January 25, 2013 at 3:21 pm (#1015471)
jbsilb (Participant)
I’m having one challenge that I’d love to get feedback from the community on:
When I start listening, there’s a bit of a lag, so I’d prefer to start the engine before any speech input is actually required. Unfortunately, that means it immediately enters recognition mode, which can trigger recognition-handling code at a point where it isn’t useful.
I’ve worked around this in the past by gating on a boolean (roughly as in the sketch below), but that seems inefficient, words still frequently get queued into the hypothesis, and the first recognition is error-prone.
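For concreteness, the gate looks something like this. It’s a minimal sketch assuming an OpenEars OpenEarsEventsObserver delegate; the acceptingSpeech flag and the class are my own illustrative names, not part of the library, and the delegate signature should be checked against the OpenEars version in use.

```objc
#import <OpenEars/OpenEarsEventsObserver.h>

// Sketch of the boolean gate described above. The acceptingSpeech flag and
// this class are illustrative, not part of OpenEars; the delegate method
// signature is from OpenEars 1.x and should be checked against your version.

@interface GatedCommandListener : NSObject <OpenEarsEventsObserverDelegate>
@property (nonatomic, assign) BOOL acceptingSpeech; // NO until input is wanted
@end

@implementation GatedCommandListener

- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                         recognitionScore:(NSString *)recognitionScore
                              utteranceID:(NSString *)utteranceID {
    if (!self.acceptingSpeech) {
        return; // Drop anything recognized before we actually want input.
    }
    NSLog(@"Accepted hypothesis: %@", hypothesis);
}

@end
```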
Is there a way to “soft start” the engine so that there isn’t a lag between the first request and the “ready” state, without starting the actual recognition process?
Thanks!
January 25, 2013 at 3:28 pm (#1015472)
Halle Winkler (Politepix)

Welcome,
This is not actually advisable, because the lag is the voice activity detection checking the noise levels in the room and calibrating itself to distinguish silence from speech under the current conditions, before the user starts speaking. If this is done at some arbitrary time well before the user talks, the calibration isn’t performed against the environment that exists while the user is actually speaking, and that will lead to error-prone recognition.
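In practice that means deferring the start call until input is imminent rather than gating callbacks. A minimal sketch, assuming the OpenEars 1.x PocketsphinxController start selector and placeholder model paths; verify both against the headers of the version in use:

```objc
#import <OpenEars/PocketsphinxController.h>

// Sketch: start the engine only when speech input is imminent, so that
// calibration happens under the same noise conditions as the user’s speech.
// PocketsphinxController and its start selector are from OpenEars 1.x;
// verify against your version. Paths are placeholders.

@interface SpeechPrompter : NSObject
@property (nonatomic, strong) PocketsphinxController *pocketsphinxController;
@property (nonatomic, copy) NSString *languageModelPath; // placeholder
@property (nonatomic, copy) NSString *dictionaryPath;    // placeholder
@end

@implementation SpeechPrompter

- (void) promptUserForSpeech {
    // Calibration and the listening loop begin here, right before input
    // is needed, rather than many seconds earlier.
    [self.pocketsphinxController startListeningWithLanguageModelAtPath:self.languageModelPath
                                                      dictionaryAtPath:self.dictionaryPath
                                                   languageModelIsJSGF:NO];
}

@end
```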
January 25, 2013 at 3:34 pm (#1015473)
jbsilb (Participant)

Hi,
Thanks for the suggestion. Strictly speaking, though, it isn’t an arbitrary time; it’s usually 15-20 seconds before the first voice input is required (in the car).
What’s the recommended audio environment for calibration? Only ambient noise?
We’d like to make sure users get some sort of cue so that they might turn off radios, etc., if that aids calibration.
January 25, 2013 at 3:40 pm (#1015474)
Halle Winkler (Politepix)

Yup, for speech recognition the optimal environment is always as quiet as possible, since background noise will either occlude the speech or cause an attempt to recognize it. So if users are in the car with just the built-in phone mic, it’s a good suggestion that they turn off the radio. The important thing about calibration is that it is done on an environment that matches the speech environment: if the user is going to talk over the radio even though you suggested otherwise, you want the radio on during calibration, because silence in that case means “the user isn’t talking, but there is quieter radio noise running in the background”.
January 25, 2013 at 5:21 pm (#1015475)
jbsilb (Participant)

The one other thing we noticed is that every time the language model changes, the system starts listening and recognizing. Is that correct?
January 25, 2013 at 5:32 pm (#1015476)
Halle Winkler (Politepix)

Sort of: these are all things that happen when the engine is started (calibration, listening, language model switching), but they aren’t responsible for starting it. Switching language models is something you can do while listening is in progress, so the impression that it starts listening comes from the context in which you are preventing entry into the listening loop.
I think what you’re seeing is that the overall listening method is recursive, so events which return it to the top of the loop will end-run your method of preventing recognition. I think the startup time is just a second or so; are you seeing significantly longer waits to start?
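For reference, a model switch during an active session looks roughly like this. A minimal sketch, assuming the OpenEars 1.x PocketsphinxController method changeLanguageModelToFile:withDictionary: and placeholder paths; confirm the selector against your version:

```objc
#import <OpenEars/PocketsphinxController.h>

// Sketch: swap language models while the listening loop keeps running.
// changeLanguageModelToFile:withDictionary: is assumed from the OpenEars 1.x
// PocketsphinxController; confirm against your version. Paths are placeholders.

@interface ModelSwitcher : NSObject
@property (nonatomic, strong) PocketsphinxController *pocketsphinxController;
@end

@implementation ModelSwitcher

- (void) switchToModelAtPath:(NSString *)lmPath dictionaryPath:(NSString *)dicPath {
    // No stop/start needed: the switch takes effect within the active session.
    [self.pocketsphinxController changeLanguageModelToFile:lmPath
                                            withDictionary:dicPath];
}

@end
```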