Optimization for short utterances – Politepix

Elsa — Thu, 02 Aug 2012 09:21:48 +0000

Hello,
I’m using OpenEars to recognize words one at a time rather than complete sentences. As a result, I regularly have short utterances (less than 1-2 seconds), and sometimes pocketsphinx doesn’t go into the decoding process at all, or it’s not very responsive (it starts decoding a bit late).
I’m aware that my use case is not the optimal one for pocketsphinx, but I was wondering whether it is possible to optimize it for this type of utterance?
I know that in earlier versions of OpenEars it was possible to set kSecondsOfSilenceToDetect so that pocketsphinx would get into decoding faster, but I can’t find it in the latest version.
Thank you for your help!

Reply To: Optimization for short utterances

Halle Winkler — Thu, 02 Aug 2012 15:56:12 +0000

Sure, check out the float property of PocketsphinxController “secondsOfSilenceToDetect”. I just moved it into the class so you could set it programmatically.

Elsa — Mon, 06 Aug 2012 08:55:52 +0000

Cool, thank you! It is definitely faster now.
Do you have any other advice for optimizing for short utterances? Sometimes it’s hard to get Sphinx into the decoding process: I have to repeat the same word several times or speak very close to the microphone. Maybe it’s a microphone configuration issue?
My app runs on iPad.

Halle Winkler — Mon, 06 Aug 2012 11:15:08 +0000

You could try RapidEars and see if it helps if you’re open to non-free solutions. If I recall correctly, your implementation isn’t a supported method, so you might have audio session problems.

Elsa — Mon, 06 Aug 2012 12:35:13 +0000

OK, thank you, I’ll give it a try!

woodyard — Sun, 26 Aug 2012 21:30:15 +0000

I’m doing something similar. What value would you recommend, and what values are acceptable? The default is one, correct? Can you use something like .5?

Halle Winkler — Mon, 27 Aug 2012 05:43:58 +0000

I would recommend reducing it and doing some user testing to see what the minimum is for your application before you have an issue with utterances being cut off.
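
To make that trade-off concrete, here is a minimal sketch in Python. It is not OpenEars code and does not reflect how the framework is implemented; it only models silence-based endpointing over a sequence of fixed-length frames marked as speech or silence. The function name `end_of_utterance`, the 100 ms frame length, and the sample utterance are all assumptions made up for illustration.

```python
FRAME_SECONDS = 0.1  # assumed frame length, for illustration only


def end_of_utterance(frames, seconds_of_silence):
    """Return the index of the frame at which the utterance is concluded,
    or None if the required silence never occurs after speech has started.

    frames: sequence of booleans, True = speech, False = silence.
    """
    # Convert the silence threshold to a whole number of frames.
    needed = max(1, round(seconds_of_silence / FRAME_SECONDS))
    silent_run = 0
    started = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            started = True
            silent_run = 0  # any speech resets the silence counter
        elif started:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# Hypothetical utterance: 0.3 s of speech, a 0.4 s pause,
# 0.3 s more speech, then 1 s of trailing silence.
utterance = [True] * 3 + [False] * 4 + [True] * 3 + [False] * 10

cut_with_short_threshold = end_of_utterance(utterance, 0.3)
cut_with_longer_threshold = end_of_utterance(utterance, 0.7)
```

With the 0.3 second threshold the utterance is concluded during the pause, before the second burst of speech begins at frame 7, which is the "cut off" case. With the 0.7 second threshold the pause is tolerated and the utterance ends only in the trailing silence. Real speech has pauses of varying length, which is why testing with actual users is the way to find the minimum.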

tarantoga — Thu, 27 Sep 2012 21:50:52 +0000

I was trying to lower secondsOfSilenceToDetect to very low values, but it doesn’t seem to work at all.
In the log there is always:
2012-09-27 23:47:18.423 TestOpenEars[1650:907] Pocketsphinx has detected a second of silence, concluding an utterance.
I would really like to have only a half-second delay, or maybe even 0.33.
Is that possible? Or is the paid plugin needed to get it?

Halle Winkler — Fri, 28 Sep 2012 05:41:20 +0000

The log always says “a second of silence” because that’s just what an NSLog statement says in the sample app. It isn’t related to the functionality of the property secondsOfSilenceToDetect and the log statement doesn’t come from the framework.

secondsOfSilenceToDetect currently defaults to .7 seconds, and if you change it, the delay will be shorter or longer. But the difference between .7 seconds and, for instance, .33 isn’t going to be a big perceptual difference, because you will still have the following sequence of events, all of which take time: the speech continuing until completion, the silence after the complete speech, and then the time to process the complete speech. A very short delay can also cause issues, since any intermittent noise followed by a pause can trigger recognition.
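
As a rough illustration of that sequence, the arithmetic can be sketched in Python. The function name `time_to_result` is hypothetical, and the speech and processing durations are assumed numbers, not measurements from OpenEars; only the .7 and .33 silence values come from this thread.

```python
def time_to_result(speech_seconds, seconds_of_silence, processing_seconds):
    """Total time from the start of speech until a hypothesis is available:
    the speech itself, the silence that concludes it, then the decoding."""
    return speech_seconds + seconds_of_silence + processing_seconds


# Assumed for illustration: a 1.0 s word and 0.5 s of processing time.
with_default = time_to_result(1.0, 0.7, 0.5)    # about 2.2 s in total
with_shorter = time_to_result(1.0, 0.33, 0.5)   # about 1.83 s in total

saved = with_default - with_shorter             # about 0.37 s
fraction_saved = saved / with_default           # under a fifth of the total wait
```

Under these assumptions, shortening the silence period removes about 0.37 seconds from a wait of roughly 2.2 seconds; the speech and processing portions are unchanged.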

RapidEars doesn’t use a period of silence at all, because it recognizes speech while the speech is in progress rather than performing recognition on a completed statement (for instance, if you say “go right”, it will first return the live hypothesis “go” and then “go right” as you are in the process of speaking the phrase; it doesn’t wait for a silence period to recognize). For your goal of using OpenEars-style speech recognition, which only happens after a silence, but with a shorter silence period, it isn’t necessary for you to use RapidEars. But since OpenEars defaults to a short period of silence out of the box, the difference from shortening it below the default isn’t going to be dramatic; expect a fairly small change in the user experience.

Halle Winkler — Fri, 28 Sep 2012 12:17:36 +0000

I’ve fixed the NSLog statement for the next version so the sample app doesn’t create confusion about the framework behavior, and I’ve updated the online documentation and tutorial.