Minimum volume threshold to detect speech – Politepix

nvrtd frst — Wed, 31 Oct 2012 07:23:44 +0000

Hi,

I am having a problem where, in a very quiet area, Pocketsphinx starts detecting very slight noises as speech (things like the hum of the fridge). It then gets stuck in a loop of detecting these noises as speech, which obviously hinders good recognition.

Is there a way to set a minimum absolute volume threshold just so it doesn’t pick up these things like the hum of a fridge?

Thanks

I am using OpenEars 1.2.2. NOTE: I have tweaked the build to use VoiceProcessingIO and have added code to set the audio session mode to “Video Recording.”

Halle Winkler — Wed, 31 Oct 2012 15:08:58 +0000

Hiya,

Sorry, there is no trivial way to do this. You can only attempt to hack ContinuousADModule.mm, with the proviso that wrong values forced into the VAD usually cause crashes. You might get better results with the slightly less sensitive default settings, because the noise reduction might be causing an artificially wide differential between minor noises and noise-suppressed quiet.

nvrtd frst — Wed, 31 Oct 2012 21:27:01 +0000

Hey Halle,

Thanks for the response. What do you mean by “less sensitive default settings?”

Thanks!

Halle Winkler — Wed, 31 Oct 2012 21:40:07 +0000

Hiya,

I meant not overriding the audio session or audio unit settings. I think the OpenEars defaults are less likely to zero out low-noise buffers, so there will be less of a difference between a low-noise buffer and a not-that-much-noise buffer. In theory, that might help the VAD not to overreact to not-that-much-noise buffers.

nvrtd frst — Wed, 31 Oct 2012 21:52:04 +0000

Thanks Halle. The reason I set the audio session mode to video recording is that I use VoiceProcessingIO so users can give voice input while sound is playing, but this significantly decreases the output volume (compared to RemoteIO). I’ve found that setting the session mode to “video recording” makes the sound output volume much better. But now I realize that there are some disadvantages as well.

Halle Winkler — Thu, 01 Nov 2012 09:03:07 +0000

It’s tricky: having noise cancellation and noise suppression is obviously a good thing, but the VAD in OpenEars is partial to non-noise-suppressed sources. I personally don’t use VoiceProcessingIO because it doesn’t seem to get the same degree of QA as RemoteIO (or it just has a lot more options that need QA attention). I’ve also had it stop working in a couple of minor OS updates on a couple of devices, which is a little too much of a needle-in-a-haystack situation for maintaining a framework.

Actually, thinking of the VAD and its issues with noise-suppressed sources, I wonder if any of the command-line options in PocketsphinxRunConfig.h would help you. It might be worth a quick look in there to see if there is anything relevant you can turn on (maybe AGC or dithering or something).
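To make the dithering idea concrete: in audio front ends, dithering generally means adding a very small amount of random noise to the samples, so that a noise-suppressed buffer is never exactly zero and the gap between "silence" and a faint sound is less extreme. The sketch below is a generic illustration in plain C under that assumption. It is not how Pocketsphinx implements its option, and `dither_buffer` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from Pocketsphinx or OpenEars.
 * Adds random noise of at most one least-significant bit to each
 * 16-bit sample, clamping so the result stays in the int16 range. */
void dither_buffer(int16_t *samples, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int noise = (rand() % 3) - 1;            /* -1, 0 or +1 */
        int32_t v = (int32_t)samples[i] + noise;
        if (v > INT16_MAX) v = INT16_MAX;        /* clamp on overflow */
        if (v < INT16_MIN) v = INT16_MIN;
        samples[i] = (int16_t)v;
    }
}
```

Noise this small is inaudible, but it keeps the energy of a "silent" frame above zero, which is the property that matters for an energy-based detector.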

Something I’ve noticed is that as a developer you will tend to look for a test space with the least ambient noise possible, because it’s unproductive to test speech applications in an uncontrolled environment. Real user environments are almost always noisier, so you might not need to worry too much about the corner case of an extremely quiet environment that has a slightly-less-quiet noise in it. I used to have a very similar issue when testing AllEars (which ended up providing the OpenEars code): a sparrow would visit my balcony and its relatively quiet cheeping would totally ruin recognition, but I haven’t received user reports of similar scenarios.