First long phrase missed

Viewing 30 posts - 1 through 30 (of 30 total)

    The first long phrase I say is interrupted in the middle of the phrase.

    I was running 1.65 and Rejecto 1.65, but even after updating to 1.71 (is there a new Rejecto?) I have the same problem.

    I have a phrase with 15 words and it only gets about half of that. 1.71 captures more of the phrase but still misses the last half.

    Once it misses one long phrase, it works fine with the same phrase after that.

    Even if you say short phrases first (which it works fine with), it will still miss the first long one. After that, it works fine with long phrases.

    Have you seen this before? Do I need to preallocate something or increase something so that it catches the first one? I’m at a loss trying to find a workaround for this.

    Thanks for your help.

    Halle Winkler


    No, I haven’t seen it before, but I would generally advise against taking a phrase that long in a single recognition round since it creates so many opportunities for wrong recognition due to an intermittent noise or hesitation from the user that affects a syllable in the middle. There is no technical reason that the first long phrase wouldn’t have a sense of the silence/speech threshold unless there is something else in the app affecting audio, or the calibration isn’t long enough.


    It works great after the first try, and there is no pause or hesitation and no intermittent noise during the first try. I’m working in a relatively quiet environment. How would I do a longer calibration and how would that help?

    I don’t have any choice with the length of the phrase – we’re asking questions and as I said, it works fine AFTER the first attempt.

    Here’s a sample question:

    Which doctors have the highest net switching for my product over the last thirteen weeks?

    First try after starting the app always fails while I’m still speaking, and after that it works fine.

    In 1.65 it would give the following message:

    2014-08-14 17:07:45.144 coach[23556:a107] There is reason to suspect the VAD of being out of sync with the current background noise levels in the environment so we will recalibrate.

    In 1.71 it no longer gives that message and as I said, it gets more words in 1.71 but still misses the last half.

    Halle Winkler

    It works great after the first try, and there is no pause or hesitation and no intermittent noise during the first try

    Under real-world conditions it’s likely to create interaction stress for your users since they will hesitate and experience intermittent noise, among other challenges such as mic distance, accent, etc. Avoiding long queries via voice recognition is a suggestion I make when I give talks on the subject of speech UI, since it reduces user interaction stress.

    You can check out the calibration options in PocketsphinxController’s docs and I’d also recommend setting a longer secondsOfSilence pause time along with the longer calibration.

    It sounds like maybe there is something about the app setup that is leading to a poor calibration (maybe another audio object interrupting it or something weird when the listening loop starts), so that is an area you could troubleshoot to detect why the calibration results aren’t ideal.


    I tried all three calibration times and I even set the timeout to 1.6 seconds. Here are the results:

    2014-08-15 14:38:01.855 coach[24101:6923] Calibration has completed
    2014-08-15 14:38:01.857 coach[24101:6923] Listening.
    2014-08-15 14:38:11.276 coach[24101:6923] Speech detected…
    INFO: file_omitted(0): Resized backpointer table to 10000 entries
    INFO: file_omitted(0): Resized score stack to 200000 entries
    2014-08-15 14:38:14.734 coach[24101:6923] Stopping audio unit.
    2014-08-15 14:38:14.865 coach[24101:6923] Audio Output Unit stopped, cleaning up variable states.
    2014-08-15 14:38:14.866 coach[24101:6923] Processing speech, please wait…
    INFO: file_omitted(0): cmn_prior_update: from < 47.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: file_omitted(0): cmn_prior_update: to < 58.97 -2.76 -0.05 -2.55 -2.07 -2.80 -0.86 0.27 -0.82 -0.19 -0.09 0.13 -0.02 >
    INFO: file_omitted(0): 8462 words recognized (24/fr)
    INFO: file_omitted(0): 477916 senones evaluated (1350/fr)
    INFO: file_omitted(0): 305552 channels searched (863/fr), 41300 1st, 165791 last
    INFO: file_omitted(0): 23343 words for which last channels evaluated (65/fr)
    INFO: file_omitted(0): 14266 candidate words for entering last phone (40/fr)
    INFO: file_omitted(0): fwdtree 1.95 CPU 0.551 xRT
    INFO: file_omitted(0): fwdtree 3.60 wall 1.016 xRT
    INFO: file_omitted(0): Utterance vocabulary contains 145 words
    INFO: file_omitted(0): 4616 words recognized (13/fr)
    INFO: file_omitted(0): 337969 senones evaluated (955/fr)
    INFO: file_omitted(0): 342829 channels searched (968/fr)
    INFO: file_omitted(0): 22321 words searched (63/fr)
    INFO: file_omitted(0): 18710 word transitions (52/fr)
    INFO: file_omitted(0): fwdflat 1.27 CPU 0.358 xRT
    INFO: file_omitted(0): fwdflat 1.27 wall 0.358 xRT
    2014-08-15 14:38:16.145 coach[24101:6923] Pocketsphinx heard “WHICH DOCTORS HAVE A HIGHEST NET SWITCHING FOR MY PRODUCT OVER ALL” with a score of (0) and an utterance ID of 000000000.
    2014-08-15 14:38:16.147 coach[24101:6923] Checking and resetting all audio session settings.
    2014-08-15 14:38:16.148 coach[24101:907] _isSpeaking: 0
    2014-08-15 14:38:16.149 coach[24101:6923] audioCategory is correct, we will leave it as it is.

    Here’s the second try in the same debug session:

    2014-08-15 14:38:16.270 coach[24101:6923] Listening.
    2014-08-15 14:38:39.491 coach[24101:6923] Speech detected…
    2014-08-15 14:38:45.964 coach[24101:6923] Stopping audio unit.
    2014-08-15 14:38:46.094 coach[24101:6923] Audio Output Unit stopped, cleaning up variable states.
    2014-08-15 14:38:46.096 coach[24101:6923] Processing speech, please wait…
    INFO: file_omitted(0): cmn_prior_update: from < 59.07 -2.73 -0.26 -2.34 -2.00 -2.78 -0.94 0.16 -0.72 -0.18 0.02 0.04 -0.03 >
    INFO: file_omitted(0): cmn_prior_update: to < 57.70 -2.95 -0.09 -2.21 -1.92 -2.72 -0.87 0.11 -0.72 -0.14 0.03 0.01 -0.05 >
    INFO: file_omitted(0): 8648 words recognized (18/fr)
    INFO: file_omitted(0): 557984 senones evaluated (1136/fr)
    INFO: file_omitted(0): 315338 channels searched (642/fr), 57386 1st, 147718 last
    INFO: file_omitted(0): 28447 words for which last channels evaluated (57/fr)
    INFO: file_omitted(0): 14112 candidate words for entering last phone (28/fr)
    INFO: file_omitted(0): fwdtree 2.27 CPU 0.463 xRT
    INFO: file_omitted(0): fwdtree 6.62 wall 1.348 xRT
    INFO: file_omitted(0): Utterance vocabulary contains 127 words
    INFO: file_omitted(0): 3739 words recognized (8/fr)
    INFO: file_omitted(0): 328839 senones evaluated (670/fr)
    INFO: file_omitted(0): 276068 channels searched (562/fr)
    INFO: file_omitted(0): 21483 words searched (43/fr)
    INFO: file_omitted(0): 18104 word transitions (36/fr)
    INFO: file_omitted(0): fwdflat 1.17 CPU 0.238 xRT
    INFO: file_omitted(0): fwdflat 1.17 wall 0.238 xRT
    2014-08-15 14:38:47.281 coach[24101:6923] Pocketsphinx heard “WHICH DOCTORS HAVE THE HIGHEST NET SWITCHING FOR MY PRODUCT OVER THE LAST THIRTEEN WEEKS” with a score of (0) and an utterance ID of 000000001.
    2014-08-15 14:38:47.282 coach[24101:6923] Checking and resetting all audio session settings.
    2014-08-15 14:38:47.284 coach[24101:6923] audioCategory is correct, we will leave it as it is.

    Let me know if you see anything in this listing. As you can see, it can and does recognize the entire sentence perfectly after the first attempt.
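One way to compare the two attempts is to measure the time between "Speech detected" and "Stopping audio unit" in each log. The following is a minimal sketch in C; the helper names are hypothetical, and it assumes timestamps in the "HH:MM:SS.mmm" layout shown in the logs above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "HH:MM:SS.mmm" log timestamp to milliseconds since midnight.
   Returns -1 if the text does not match that layout.
   Assumes exactly three fractional digits, as in the logs above. */
static long timestamp_to_ms(const char *text) {
    int h, m, s, ms;
    if (sscanf(text, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Milliseconds between the "Speech detected" and "Stopping audio unit" lines. */
static long capture_window_ms(const char *speech_detected, const char *stopping) {
    long start = timestamp_to_ms(speech_detected);
    long end = timestamp_to_ms(stopping);
    assert(start >= 0 && end >= start);
    return end - start;
}
```

Applied to the two listings, `capture_window_ms("14:38:11.276", "14:38:14.734")` gives 3458 ms for the first attempt and `capture_window_ms("14:38:39.491", "14:38:45.964")` gives 6473 ms for the second, so the first capture window was roughly half as long. Both figures include whatever trailing silence the detector waited for before stopping.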

    Halle Winkler

    It looks like there is something else going on in the app preventing the initial calibration from working correctly, such as another audio object or something obstructing the calibration in the PocketsphinxController setup. I would focus troubleshooting on other app code.


    Here’s what I think is happening:

    When it receives a long phrase, the automatic gain control (AGC) is adjusting so that it starts to detect silence during the phrase. When I set the timeout to 3.6 seconds, it gives the app enough time to finish the phrase before it times out.

    From the debug log, I see that AGC is set to none with a threshold of 2.

    I’m using an iPad (2nd Gen) with its internal mic.

    Is there a way to change the averaging detection for silence so that it doesn’t change its value over 5 seconds of speech? (Hopefully I’m saying this correctly). Just in case, let me try another way. The level it uses to determine silence is appearing to change over a few seconds of speech so that eventually the speech looks like silence. I need a slower response so that speech still looks like speech after 5 seconds.

    Does that make any sense?

    Halle Winkler

    Sorry, that isn’t the issue here; there is no AGC with the audio session on the device or used in the voice audio detection. This isn’t a normal result, so I would investigate other aspects of the app, specifically other audio objects and any timing issues with other operations at the time that calibration should be executing.


    Well, I disabled everything except OpenEars and it still is happening. I’m going to try it in another app to see if I can determine what is happening. I’ll let you know what I find.

    Halle Winkler

    OK, the best way to troubleshoot this kind of issue is to take either the tutorial example or the sample app, and without changing anything else about it, just add your language model generation code and test against a recording of your phrase using pathToTestFile.

    That way, you will either get a better result, which suggests that it is something going on with the app (and then you can start adding in your related app features until you learn what causes the issue), or you will get the same result, which suggests that it is an edge case of some kind with the language model, and then you will have a test case replicating the issue that is simple enough to send me (something that only diverges from the sample app or tutorial example by a few lines of code) and I can take a look at it.

    If you end up with such a simple example which demonstrates that it is an edge case issue with the language model, go ahead and send the example to the email address your framework license was sent from, along with the audio file for testing and the device you’re testing on and its iOS version, and I’ll check it out.


    I took the sample app and made this one modification to it. When I attempt to read the following sentence, it shows “has detected finished speech” before I finish reading the sentence. After the first failed attempt, it works perfectly. However, if I press “Stop Listening” and then press “Start Listening” again, it will once again miss the first long sentence. After the first miss, it works fine. Thanks for looking into this issue.


    NSArray *firstLanguageArray = [[NSArray alloc] initWithArray:[NSArray arrayWithObjects: // All capital letters.
    @"IS A",
    @"OF A",
    @"HAS A",

    Halle Winkler

    OK, please let me know the device you’re testing on, its iOS version, and email me an audio file of yourself making this statement after 4 seconds of not speaking (fine to make a recording using and send it over, and I will convert it to a WAV) so I can be guaranteed to replicate it locally. You can use the email account your license was sent to.


    I tested on a second generation iPad running iOS 6.1.3 and on an iPad Mini running iOS 7.1.2.

    I made the recording as you suggested and wanted to try it before sending it.

    However, I noticed a very repeatable problem (besides the early termination of recognition). If I play the recording (from my iPhone) instead of speaking and regardless of volume, the sample app recognition loop won’t restart. I turned on logging, but it only gave me the following additional log entry:

    2014-08-19 12:02:49.143 OpenEarsSampleApp[26108:3a03] cont_ad_read failed, stopping.
    2014-08-19 12:02:49.246 OpenEarsSampleApp[26108:907] Setting up the continuous recognition loop has failed for some reason, please turn on [OpenEarsLogging startOpenEarsLogging] in OpenEarsConfig.h to learn more.

    The audio recording has some hiss (noise) on it since it is using the iPhone5 microphone, but even when I lower the volume to minimize the hiss, it still fails to resume, preventing me from testing the recording a second time.

    I’m going to go into my recording studio and make a decent voice recording of the phrase to make sure it fails properly and then works the second time, but I’ll send both recordings so that you can see the second problem.


    Halle Winkler

    No need to make another recording – the only thing that is of interest here is how the device mic input is interacting with the app. The goal isn’t to play it out through another device, but to convert it into a WAV and add it to the app and give the testing tool pathToTestFile its path. This will create an identical result to that of having the speech recorded by the app.

    cont_ad_read failed, stopping.

    Hmm, I’ve never heard of calibration failing on a device on an unaltered sample app regardless of input and I’m concerned about trying to replicate such unusual results. I think what we should do is for you to take your speech recording and convert it to a WAV by following the instructions in the docs regarding pathToTestFile, and then add it to your altered sample app to create a test case that replicates the issue without interaction, and send me the whole test case. The second round in which it works correctly isn’t important to replicate, we’re only interested in the first round where it isn’t finishing. Thanks!


    I’ve emailed the sample app with the audio file to your email address from your newsletters. Please let me know if you received it.


    Halle Winkler

    Thanks – let me know which device these recordings were recorded with, running which iOS version (this will be the test device).


    The recording was made on the iPhone 5s running iOS 7.1.2, but at 16 kHz and 16 bits, I doubt it matters where I recorded it. The error seems to be device and iOS independent. I have verified it exists on the following devices:

    iPad 2nd Gen running iOS 6.1.3
    iPad 4th Gen running iOS 7.1.2
    iPad Air running iOS 7.1.2
    iPad Mini running iOS 7.1.2
    iPhone 5s running iOS 7.1.2

    The iPhone 5s almost gets the whole phrase, just missing the last few words (the first time).

    Halle Winkler

    The spectral characteristics of the mic as well as the default characteristics of the HAL for the device matter to both OpenEars and Pocketsphinx for different reasons, even down to 8-bit/8k, so thank you for providing the information that allows me to do a meaningful test.

    Halle Winkler


    I’m a bit confused now – this issue does not replicate when I run your app on a 5S or other devices. This is the full logging output on all devices I tested on (with differences in the scoring and timestamps):

    2014-08-21 10:51:01.892 OpenEarsSampleApp[1276:60b] acousticModelPath is /var/mobile/Applications/9B255FB9-4AA7-4082-9BC4-04D7EEB31129/
    2014-08-21 10:51:02.553 OpenEarsSampleApp[1276:60b] Dynamic language generator completed successfully, you can find your new files FirstOpenEarsDynamicLanguageModel.DMP
     at the paths 
    2014-08-21 10:51:02.557 OpenEarsSampleApp[1276:60b] acousticModelPath is /var/mobile/Applications/9B255FB9-4AA7-4082-9BC4-04D7EEB31129/
    2014-08-21 10:51:03.184 OpenEarsSampleApp[1276:60b] Dynamic language generator completed successfully, you can find your new files SecondOpenEarsDynamicLanguageModel.DMP
     at the paths 
    2014-08-21 10:51:03.191 OpenEarsSampleApp[1276:60b] 
    Welcome to the OpenEars sample project. This project understands the words:
    and if you say "CHANGE MODEL" it will switch to its dynamically-generated model which understands the words:
    2014-08-21 10:51:09.860 OpenEarsSampleApp[1276:60b] The received hypothesis is THIS IS A LONGER TEST OF A VERY LONG SENTENCE TO DETERMINE IF THE OPEN EARS LIBRARY HAS A PROBLEM DETECTING THE LONG SENTENCE with a score of -61202 and an ID of 000000000

    I see from your email that you said “It recognizes the entire file” which sounds like it demonstrates the absence of the issue rather than the issue. Is the sample app you sent intended to replicate the issue you reported, or is the goal of the sample app to demonstrate something else? I’m trying to get a sample app which replicates the issue using a recording of you on your 5S which I can run on my 5S so I can observe your unwanted results as you yourself are experiencing them directly. If the issue never replicates when you use pathToTestFile, please let me know that.


    The problem does not show up if you use a WAV file. It only happens when the microphone is recording the audio. It detects finished speech before the end of the sentence. If you change the #ifdef switch in the sample app so that it is using the live mic, and play the audio file from another device, or simply say the sentence, you’ll see the problems.

    When I zipped the sample app, it was set to use the WAV file (sorry for the confusion).

    Halle Winkler

    OK, to clarify just in case we have a future case where a test app is needed, I was asking for a test app which used pathToTestFile along with your normal startListeningWithLanguageModelAtPath method call, not runRecognitionOnWavFileAtPath (you can read more about this in the docs for pathToTestFile), but I made a new test app so that it was possible to recreate the results using your test recording with startListeningWithLanguageModelAtPath.

    I’ve seen your issue. What you can do in order to work around this bad outcome on the first utterance is to change the #define value kExcessiveUtterancePeriod of ContinuousModel.m in OpenEars.xcodeproj to something a bit longer than 13 seconds.

    kExcessiveUtterancePeriod is very important in that it prevents your app from ever having a circumstance in which the voice audio detection becomes stuck for the app session due to an extreme change in background levels in one direction or the other which occurs too quickly for the voice activity detection to smoothly adjust to. It is set to that number due to the maximum likely sentence which can be satisfactorily recognized in the field. kExcessiveUtterancePeriod is not applied to every utterance, but it is applied to any utterance which causes a rescaling of backpointer size in pocketsphinx, i.e. utterances with a particularly large search space, which in this case is happening with the first utterance only, due to unknown causes that need more looking into – it isn’t expected.

    My advice is to increase kExcessiveUtterancePeriod by seconds until it is large enough to not interfere with your maximum utterance length, plus a little bit of buffer for slow speakers, but no longer than that, so that results in the field remain as good as possible in cases with abrupt and significant changes of background level. Maybe it should be something along the lines of 20 seconds for your app. Then recompile the framework project and this issue shouldn’t be in evidence for your app.

    At the moment I’m developing the next version of OpenEars which uses a different voice activity detector which is meant to be more accurate and more noise-robust, which will ideally mean that kExcessiveUtterancePeriod will disappear altogether because the situation in which it is required as a failsafe will no longer occur.
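The sizing advice above (longest expected utterance, plus a little buffer, but no longer) can be written as a small calculation. This is only a sketch: the helper name is hypothetical, the 13-second default is an assumption inferred from the phrase "a bit longer than 13 seconds", and the real value lives in the #define in ContinuousModel.m.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shipped value, inferred from the advice to go "a bit longer than 13 seconds". */
#define ASSUMED_DEFAULT_PERIOD_SECONDS 13

/* Hypothetical helper: smallest whole number of seconds that covers the longest
   measured utterance plus a buffer for slow speakers, never below the assumed default. */
static int suggested_period_seconds(const double *utterance_seconds, size_t count,
                                    double buffer_seconds) {
    assert(count > 0 && buffer_seconds >= 0.0);

    double longest = utterance_seconds[0];
    for (size_t i = 1; i < count; i++) {
        if (utterance_seconds[i] > longest) {
            longest = utterance_seconds[i];
        }
    }

    double target = longest + buffer_seconds;
    int period = (int)target;          /* truncate toward zero */
    if ((double)period < target) {
        period++;                      /* round up to the next whole second */
    }
    return period < ASSUMED_DEFAULT_PERIOD_SECONDS ? ASSUMED_DEFAULT_PERIOD_SECONDS
                                                   : period;
}
```

For example, a longest utterance of 17 seconds with a 3-second buffer (both numbers hypothetical) yields 20, the figure floated above. The result would then be typed into the #define by hand before recompiling the framework.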


    Thanks for your help!

    However, ContinuousModel.m isn’t available in the project.

    I tried placing the define in ContinuousModel.h hoping it was undefined in the library, but it has no effect, so it probably is defined in the library after the include for the .h file.

    #define kExcessiveUtterancePeriod 15

    Is there a value I can set once the module is initialized?

    How can I implement this change?


    Halle Winkler

    ContinuousModel.m and its #define for kExcessiveUtterancePeriod are in the project OpenEars.xcodeproj (it is in the distribution you downloaded). You’ll need to change kExcessiveUtterancePeriod and then recompile the framework.


    Thanks – I never noticed the source was included – I just copied the framework and bundles.

    I changed the value to 20.0, recompiled the bundles and framework, cleaned and compiled the sample app, but it’s still bailing early, especially on the iPad 2nd gen which bails at 5 seconds. Do I need to set LongRecognition to TRUE?

    Halle Winkler

    Do I need to set LongRecognition to TRUE?

    No, if it isn’t helping I would troubleshoot whether your newly compiled framework is linked to your app or if it is still using the old framework, and whether 20 is the right number.

    Halle Winkler

    You know, on second thought, that is a safer bet. For now change this method of PocketsphinxController:

    - (void) longRecognition {
        self.continuousModel.longRecognition = TRUE;    

    To read:

    - (void) longRecognition {
       // self.continuousModel.longRecognition = TRUE;    

    I’d like to look more into the kExcessiveUtterancePeriod issue and in the meantime this should definitely prevent that result.


    Thanks, I’ll try that.

    By the way:

    2014-08-22 10:54:39.290 OpenEarsSampleApp[29757:3c03] ###### GLG Recognition Loop: 20.000000

    I am running my compiled version and it is set to 20.0.

    Now that I know I have source code, I’ll also look into the issue here too.



    Halle Winkler

    Super – I may not be able to take a lot of feedback about that area since it’s both sensitive and on the way out (to the best of my knowledge), but it’s always good if you can dig into the framework and make the changes you want.


    I determined that the following condition occurs twice during the first long phrase:

    // Expand the backpointer tables if necessary.
    if (ngs->bpidx >= ngs->bp_table_size) {

        excessive_length_notification(); // HLW this has been added since no one ever wants this to go on for more than one round on an iPhone.

        // Double the capacity, then reallocate to the new entry count.
        ngs->bp_table_size *= 2;
        ngs->bp_table = ckd_realloc(ngs->bp_table,
                                    ngs->bp_table_size
                                    * sizeof(*ngs->bp_table));
        E_INFO("Resized backpointer table to %d entries\n", ngs->bp_table_size);
    }


    This one doesn’t happen at all:

    if (ngs->bss_head >= ngs->bscore_stack_size
        - bin_mdef_n_ciphone(ps_search_acmod(ngs)->mdef)) {

        excessive_length_notification(); // HLW this has been added since no one ever wants this to go on for more than one round on an iPhone.

        // Double the capacity, then reallocate to the new entry count.
        ngs->bscore_stack_size *= 2;
        ngs->bscore_stack = ckd_realloc(ngs->bscore_stack,
                                        ngs->bscore_stack_size
                                        * sizeof(*ngs->bscore_stack));
        E_INFO("Resized score stack to %d entries\n", ngs->bscore_stack_size);
    }

    Is there an easy way to start with ngs->bp_table_size four times the normal size so that we never hit that code?

    That would solve the problem for our application while leaving the LongRecognition and kExcessiveUtterancePeriod code intact.
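To illustrate why a fourfold starting size would avoid both resizes, here is a small stand-alone model of the doubling rule in the snippet above. It is a sketch, not framework code: the function name is hypothetical, the starting size of 5000 is inferred from the earlier "Resized backpointer table to 10000 entries" log line under doubling, and the peak index of 12000 is an invented value that happens to trigger exactly two doublings.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the growth rule above: the table doubles whenever the running index
   reaches the current size. Returns how many doublings occur before a table
   that starts at `initial_size` can absorb an index as high as `peak_index`. */
static int resizes_before_fit(size_t initial_size, size_t peak_index) {
    assert(initial_size > 0);

    int resizes = 0;
    size_t size = initial_size;
    while (peak_index >= size) {   /* same comparison as bpidx >= bp_table_size */
        size *= 2;
        resizes++;
    }
    return resizes;
}
```

Under these assumptions, `resizes_before_fit(5000, 12000)` is 2, matching the two hits reported for the first long phrase, while `resizes_before_fit(20000, 12000)` is 0, so the notification call inside the resize branch would never run. Any peak index from 10000 through 19999 behaves the same way.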

    Thanks again for the excellent support.

    Halle Winkler

    Correct, that is the presenting issue (but there is more under the surface since it’s a problematic area). That isn’t a part of the dependency code that I can troubleshoot right now due to the reasons I mentioned, but I encourage you to make any changes you like to your local version as long as you report issues to me based on a clean framework. A simpler quick-and-dirty way to deal with this at the moment would be just to take a look at this method of PocketsphinxController:

    - (void) longRecognition {
        self.continuousModel.longRecognition = TRUE;    

    And put a delay (of your preference) before the switch to TRUE since that is the only place it can happen. You could also do a check to see if it’s the first utterance (the utterances return an utterance number) and not set it to TRUE for the first one.
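The two suggestions (a delay before the switch, and skipping the first utterance) could be combined into one guard. The following is a sketch of that decision logic in plain C rather than the Objective-C of the framework; the function and parameter names are hypothetical, and it assumes utterance numbers start at 0, as the utterance IDs in the logs above do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy: decide whether the long-recognition flag may be set.
   - never for the first utterance of a listening session (number 0)
   - otherwise only once `delay_seconds` have passed since speech began */
static bool should_set_long_recognition(int utterance_number,
                                        double seconds_since_speech_start,
                                        double delay_seconds) {
    assert(utterance_number >= 0 && delay_seconds >= 0.0);

    if (utterance_number == 0) {
        return false;   /* skip the first utterance entirely */
    }
    return seconds_since_speech_start >= delay_seconds;
}
```

In the method above, the assignment to TRUE would then sit behind a check like this one. How the utterance number and elapsed time are obtained inside PocketsphinxController is left open here, since the thread does not show that code.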

    There isn’t going to be a perfect fix to this at the moment because that code is under heavy revision in my local branch and I don’t want to heavily test or make changes to the old version simultaneously. These kinds of VAD-related issues are the reason that I’m switching the VAD (voice activity detection) code and putting a lot of effort in testing and refining the new version, so both speech app developers and I can spend less time looking at VAD stuff :) .

    Thanks again for the excellent support.

    You’re welcome! Glad to help.
