1) Should I use Rejecto or JSGF grammar for the wake-up language model?
2) In addition to the above I assume that wake-up keyword monitoring happens during music playback (from the device). When the keyword is spotted I change the language model as I previously said but I also turn the music down waiting for a command to be spoken. That means that the background noise level changes dramatically and OpenEars needs to recalibrate. Is there a fast way to force a predefined level of calibration?
3) Is this the best practice for my scenario?
I am surprised that I haven’t found any similar topics.
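The flow described in question 2 can be sketched as a small two-state machine. This is only an illustration of the decision logic, not OpenEars code: `WakeWordFlow`, `Mode`, the volume values, and the "NEXT SONG" command are hypothetical names chosen for the example, and "OK DEAR" is the wake-up phrase that appears in the logs later in this thread. The real app would call into the speech framework and the music player at the commented points.

```python
from enum import Enum


class Mode(Enum):
    WAKE = "wake"        # small model containing only the wake-up phrase
    COMMAND = "command"  # larger model containing the actual commands


class WakeWordFlow:
    """Tracks which language model should be active and the music volume.

    Hypothetical helper: it only models the decisions described in the post.
    """

    def __init__(self, wake_phrase, normal_volume=1.0, ducked_volume=0.2):
        self.wake_phrase = wake_phrase
        self.normal_volume = normal_volume
        self.ducked_volume = ducked_volume
        self.mode = Mode.WAKE
        self.volume = normal_volume

    def on_hypothesis(self, text):
        """Feed each recognized hypothesis; returns a command or None."""
        if self.mode is Mode.WAKE:
            if text.strip().upper() == self.wake_phrase.upper():
                # Real app: switch to the command language model here
                # and turn the music down.
                self.mode = Mode.COMMAND
                self.volume = self.ducked_volume
            return None
        # COMMAND mode: hand the command back and return to wake mode.
        # Real app: switch back to the wake-up model and restore the music.
        self.mode = Mode.WAKE
        self.volume = self.normal_volume
        return text.strip()


flow = WakeWordFlow("OK DEAR")
flow.on_hypothesis("OK DEAR")             # wake-up phrase: music is ducked
command = flow.on_hypothesis("NEXT SONG")  # hypothetical command
```

Keeping the mode and the volume in one place makes it harder to end up in command mode with the music still at full volume, which is the situation the calibration question is about.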
I think that’s only because this is a specific version of a general implementation question that does get asked, i.e. how to spot a keyword. I’ll be happy to try to help you with it.
1) Should I use Rejecto or JSGF grammar for the wake-up language model?
You can’t use JSGF with a single keyword match, I think. JSGF rules as implemented in Pocketsphinx have to have more than one potential match or they crash, to the best of my knowledge.
Is there a fast way to force a predefined level of calibration?
You won’t get good results from a predefined level since it depends on the environment and mic characteristics being similar to your predefined case, which is unlikely. Have you already tested and discovered that the ongoing recalibration isn’t doing the job?
Then if I turn the music down it seems to get my first sentences wrong. However, after a few seconds it starts getting them right again.
Any suggestions?
When I turn the music up, OpenEars’ status is “Pocketsphinx has detected speech.” It never gets past that phase because it takes the background music for someone speaking. Eventually it crashes. I need some way to set a timeout for that and/or force a recalibration at this point.
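The timeout idea above could look roughly like the following watchdog. It is a sketch under assumptions, not an OpenEars feature: `SpeechWatchdog` and its method names are hypothetical, and the app would have to call them from wherever it receives the "detected speech" and "concluding an utterance" events that show up in the logs in this thread. What to do when the watchdog fires (for example stopping and restarting listening) is left to the app.

```python
import time


class SpeechWatchdog:
    """Flags a 'speech detected' state that has lasted too long."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock              # injectable so it can be tested
        self.speech_started_at = None   # None means no utterance in progress

    def speech_detected(self):
        # Only remember the first detection of the current utterance.
        if self.speech_started_at is None:
            self.speech_started_at = self.clock()

    def utterance_concluded(self):
        self.speech_started_at = None

    def is_stuck(self):
        """True if speech has been 'in progress' for the whole timeout."""
        if self.speech_started_at is None:
            return False
        return self.clock() - self.speech_started_at >= self.timeout_seconds


watchdog = SpeechWatchdog(timeout_seconds=10)
watchdog.speech_detected()
stuck_now = watchdog.is_stuck()  # checked periodically, e.g. from a timer
```

A monotonic clock is used so that changes to the device's wall-clock time cannot trigger or suppress the timeout.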
Good to know — could you test against this beta that is designed to fix the issue where there is occasionally an endless recognition plus a crash when there is a sudden increase in volume in the background:
/wp-content/uploads/OpenEarsDistributionBeta.tar.bz2
The beta is primarily designed for fixing this problem and so far I haven’t gotten any feedback on the beta from the other developers reporting the problem, so I’d appreciate hearing whether it alleviates the issue for you.
2014-01-20 21:44:45.804 openEarsTest[6425:60b] Calibration time: 0
2014-01-20 21:44:46.482 openEarsTest[6425:60b] Pocketsphinx calibration has started.
2014-01-20 21:44:48.685 openEarsTest[6425:60b] Pocketsphinx calibration is complete.
2014-01-20 21:44:48.687 openEarsTest[6425:60b] Pocketsphinx is now listening.
2014-01-20 21:44:49.201 openEarsTest[6425:60b] Pocketsphinx has detected speech.
2014-01-20 21:44:52.374 openEarsTest[6425:60b] Pocketsphinx has detected a period of silence, concluding an utterance.
2014-01-20 21:44:52.558 openEarsTest[6425:60b] The received hypothesis is OK with a score of 0 and an ID of 000000000
2014-01-20 21:44:52.625 openEarsTest[6425:60b] Pocketsphinx is now listening.
2014-01-20 21:44:53.172 openEarsTest[6425:60b] Pocketsphinx has detected speech.
After that there is a crash. I have turned up the music right before the last log message.
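When comparing logs like the one above, it can help to check mechanically whether the last "detected speech" message was ever followed by the end of an utterance. The helper below is a hypothetical convenience for reading such logs, not part of OpenEars; it only relies on the two message texts that appear in this log.

```python
DETECTED = "Pocketsphinx has detected speech."
CONCLUDED = "concluding an utterance"


def last_unfinished_detection(log_lines):
    """Return the last 'detected speech' line with no conclusion after it.

    Returns None if every detection was followed by a concluded utterance.
    """
    pending = None
    for line in log_lines:
        if DETECTED in line:
            pending = line
        elif CONCLUDED in line:
            pending = None
    return pending


log = [
    "2014-01-20 21:44:49.201 openEarsTest[6425:60b] Pocketsphinx has detected speech.",
    "2014-01-20 21:44:52.374 openEarsTest[6425:60b] Pocketsphinx has detected a period of silence, concluding an utterance.",
    "2014-01-20 21:44:53.172 openEarsTest[6425:60b] Pocketsphinx has detected speech.",
]
stuck_line = last_unfinished_detection(log)  # the 21:44:53.172 line
```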
That’s the logging from your app. Could you show me the output with verbosePocketsphinx and OpenEarsLogging turned on so I can get some info about what is happening with the framework?
Make sure you clean your project and rebuild with the new framework installed to make sure you’re linking to the new version.
Quick question: why does your log say that your calibration time is zero?
Here is the detailed log. It kept listening to about 3 minutes of Daft Punk before crashing. :)
2014-01-20 22:01:59.379 openEarsTest[6472:60b] Starting OpenEars logging for OpenEars version 1.64 on 32-bit device: iPad running iOS version: 7.000000
2014-01-20 22:01:59.387 openEarsTest[6472:60b] Normalized array contains the following entries:
(
“OK DEAR”
)
2014-01-20 22:01:59.433 openEarsTest[6472:60b] Starting dynamic language model generation
2014-01-20 22:01:59.438 openEarsTest[6472:60b] Able to open /var/mobile/Applications/79A5CC24-FD52-45CF-BFA2-FDF965D64F65/Library/Caches/NameIWantForMyLanguageModelFiles.corpus for reading
2014-01-20 22:01:59.440 openEarsTest[6472:60b] Able to open /var/mobile/Applications/79A5CC24-FD52-45CF-BFA2-FDF965D64F65/Library/Caches/NameIWantForMyLanguageModelFiles_pipe.txt for writing
2014-01-20 22:01:59.440 openEarsTest[6472:60b] Starting text2wfreq_impl
2014-01-20 22:01:59.456 openEarsTest[6472:60b] Done with text2wfreq_impl
2014-01-20 22:01:59.458 openEarsTest[6472:60b] Able to open /var/mobile/Applications/79A5CC24-FD52-45CF-BFA2-FDF965D64F65/Library/Caches/NameIWantForMyLanguageModelFiles_pipe.txt for reading.
2014-01-20 22:01:59.460 openEarsTest[6472:60b] Able to open /var/mobile/Applications/79A5CC24-FD52-45CF-BFA2-FDF965D64F65/Library/Caches/NameIWantForMyLanguageModelFiles.vocab for reading.
2014-01-20 22:01:59.461 openEarsTest[6472:60b] Starting wfreq2vocab
2014-01-20 22:01:59.464 openEarsTest[6472:60b] Done with wfreq2vocab
2014-01-20 22:01:59.466 openEarsTest[6472:60b] Starting text2idngram
2014-01-20 22:01:59.482 openEarsTest[6472:60b] Done with text2idngram
2014-01-20 22:01:59.486 openEarsTest[6472:60b] Starting idngram2lm
2014-01-20 22:01:59.496 openEarsTest[6472:60b] Done with idngram2lm
2014-01-20 22:01:59.497 openEarsTest[6472:60b] Starting sphinx_lm_convert
2014-01-20 22:01:59.509 openEarsTest[6472:60b] Finishing sphinx_lm_convert
2014-01-20 22:01:59.514 openEarsTest[6472:60b] Done creating language model with CMUCLMTK in 0.079652 seconds.
2014-01-20 22:01:59.592 openEarsTest[6472:60b] I’m done running performDictionaryLookup and it took 0.048693 seconds
2014-01-20 22:01:59.598 openEarsTest[6472:60b] I’m done running dynamic language model generation and it took 411940919.598826 seconds
2014-01-20 22:01:59.601 openEarsTest[6472:60b] Leaving sample rate at the default of 16000.
2014-01-20 22:01:59.606 openEarsTest[6472:60b] The audio session has never been initialized so we will do that now.
2014-01-20 22:01:59.607 openEarsTest[6472:60b] Checking and resetting all audio session settings.
2014-01-20 22:01:59.609 openEarsTest[6472:60b] audioCategory is incorrect, we will change it.
2014-01-20 22:01:59.612 openEarsTest[6472:60b] audioCategory is now on the correct setting of kAudioSessionCategory_PlayAndRecord.
2014-01-20 22:01:59.614 openEarsTest[6472:60b] bluetoothInput is incorrect, we will change it.
2014-01-20 22:01:59.615 openEarsTest[6472:60b] bluetooth input is now on the correct setting of 1.
2014-01-20 22:01:59.617 openEarsTest[6472:60b] Output Device: SpeakerAndMicrophone.
2014-01-20 22:01:59.619 openEarsTest[6472:60b] categoryDefaultToSpeaker is incorrect, we will change it.
2014-01-20 22:01:59.620 openEarsTest[6472:60b] CategoryDefaultToSpeaker is now on the correct setting of 1.
2014-01-20 22:01:59.622 openEarsTest[6472:60b] preferredBufferSize is incorrect, we will change it.
2014-01-20 22:01:59.623 openEarsTest[6472:60b] PreferredBufferSize is now on the correct setting of 0.128000.
2014-01-20 22:01:59.624 openEarsTest[6472:60b] preferredSampleRateCheck is incorrect, we will change it.
2014-01-20 22:01:59.626 openEarsTest[6472:60b] preferred hardware sample rate is now on the correct setting of 16000.000000.
2014-01-20 22:01:59.659 openEarsTest[6472:60b] AudioSessionManager startAudioSession has reached the end of the initialization.
2014-01-20 22:01:59.660 openEarsTest[6472:60b] Exiting startAudioSession.
2014-01-20 22:01:59.663 openEarsTest[6472:3f07] Recognition loop has started
2014-01-20 22:01:59.834 openEarsTest[6472:3f07] Starting openAudioDevice on the device.
2014-01-20 22:01:59.835 openEarsTest[6472:3f07] Audio unit wrapper successfully created.
2014-01-20 22:01:59.844 openEarsTest[6472:3f07] Set audio route to SpeakerAndMicrophone
2014-01-20 22:01:59.846 openEarsTest[6472:3f07] Checking and resetting all audio session settings.
2014-01-20 22:01:59.848 openEarsTest[6472:3f07] audioCategory is correct, we will leave it as it is.
2014-01-20 22:01:59.849 openEarsTest[6472:3f07] bluetoothInput is correct, we will leave it as it is.
2014-01-20 22:01:59.851 openEarsTest[6472:3f07] Output Device: SpeakerAndMicrophone.
2014-01-20 22:01:59.852 openEarsTest[6472:3f07] categoryDefaultToSpeaker is correct, we will leave it as it is.
2014-01-20 22:01:59.854 openEarsTest[6472:3f07] preferredBufferSize is correct, we will leave it as it is.
2014-01-20 22:01:59.856 openEarsTest[6472:3f07] preferredSampleRateCheck is correct, we will leave it as it is.
2014-01-20 22:01:59.858 openEarsTest[6472:3f07] Setting the variables for the device and starting it.
2014-01-20 22:01:59.860 openEarsTest[6472:3f07] Looping through ringbuffer sections and pre-allocating them.
2014-01-20 22:02:00.330 openEarsTest[6472:3f07] Started audio output unit.
2014-01-20 22:02:00.333 openEarsTest[6472:3f07] Calibration has started
2014-01-20 22:02:00.334 openEarsTest[6472:60b] Pocketsphinx calibration has started.
2014-01-20 22:02:02.536 openEarsTest[6472:3f07] Calibration has completed
2014-01-20 22:02:02.537 openEarsTest[6472:60b] Pocketsphinx calibration is complete.
2014-01-20 22:02:02.538 openEarsTest[6472:3f07] Project has these words or phrases in its dictionary:
DEAR
OK
2014-01-20 22:02:02.540 openEarsTest[6472:3f07] Listening.
2014-01-20 22:02:02.541 openEarsTest[6472:60b] Pocketsphinx is now listening.
2014-01-20 22:02:03.711 openEarsTest[6472:3f07] Speech detected…
2014-01-20 22:02:03.712 openEarsTest[6472:60b] Pocketsphinx has detected speech.