rtl_fm does not properly band-limit the audio before downsampling. If you set a lower audio rate (e.g. 24k or 16k), you will hear an annoying whistle: it is the aliased 19 kHz stereo pilot tone. But what if you can't afford a 48k audio rate, for example because you are saving the stream to a small memory card? Let rtl_fm output 48k and do the resampling properly with sox, which low-pass filters the audio before reducing the rate to 16k:
rtl_fm -f 99700k -p 1 -M fm -s 144k -r 48000 -E deemp | sox -t raw -r 48000 -e signed-integer -b 16 -c 1 - -t raw - vol 6 dB rate -m 16000 | aplay -r 16000 -f S16_LE -c 1
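To see where the whistle ends up, here is a minimal sketch of the aliasing arithmetic in POSIX shell. It assumes the tone is the 19 kHz FM stereo pilot and that no anti-alias filter is applied before the rate reduction; `alias_freq` is a hypothetical helper name, not part of rtl_fm or sox.

```shell
# alias_freq TONE_HZ RATE_HZ
# Print the frequency (in Hz) at which a pure tone shows up after it is
# sampled at RATE_HZ without any anti-alias filtering.
alias_freq() {
    tone=$1
    rate=$2
    # Fold the tone into the range 0 .. rate-1 ...
    folded=$(( tone % rate ))
    # ... then mirror anything above the Nyquist frequency (rate / 2).
    if [ "$folded" -gt $(( rate / 2 )) ]; then
        folded=$(( rate - folded ))
    fi
    echo "$folded"
}

alias_freq 19000 24000   # prints 5000
alias_freq 19000 16000   # prints 3000
alias_freq 19000 48000   # prints 19000 (below Nyquist, so not aliased)
```

At 24k and 16k the pilot lands at 5 kHz and 3 kHz, well inside the range where it is clearly audible. At 48k it stays at 19 kHz, where the sox rate effect can filter it out before downsampling.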