Forum Replies Created
giebler (Participant)
I determined that the following condition occurs twice during the first long phrase:
// Expand the backpointer tables if necessary.
if (ngs->bpidx >= ngs->bp_table_size) {
    excessive_length_notification(); // HLW: added since no one ever wants this to go on for more than one round on an iPhone.
    ngs->bp_table_size *= 2;
    ngs->bp_table = ckd_realloc(ngs->bp_table,
                                ngs->bp_table_size
                                * sizeof(*ngs->bp_table));
    E_INFO("Resized backpointer table to %d entries\n", ngs->bp_table_size);
}
This one doesn’t happen at all:
if (ngs->bss_head >= ngs->bscore_stack_size
    - bin_mdef_n_ciphone(ps_search_acmod(ngs)->mdef)) {
    excessive_length_notification(); // HLW: added since no one ever wants this to go on for more than one round on an iPhone.
    ngs->bscore_stack_size *= 2;
    ngs->bscore_stack = ckd_realloc(ngs->bscore_stack,
                                    ngs->bscore_stack_size
                                    * sizeof(*ngs->bscore_stack));
    E_INFO("Resized score stack to %d entries\n", ngs->bscore_stack_size);
}
Is there an easy way to start with ngs->bp_table_size four times the normal size so that we never hit that code?
That would solve the problem for our application while leaving the LongRecognition and kExcessiveUtterancePeriod code intact.
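For concreteness, here is a minimal sketch of the kind of change I have in mind, assuming the initial sizes are assigned in ngram_search_init() before the first allocation; the 5000 and 100000 starting values are inferred from my logs (where the first resize reports 10000 and 200000 entries), so check them against the actual source:

/* Sketch only -- assumes ngram_search_init() sets these before allocating.
 * Starting values of 5000 and 100000 are inferred from the log lines
 * "Resized backpointer table to 10000 entries" and
 * "Resized score stack to 200000 entries" (each one doubling). */
ngs->bp_table_size = 4 * 5000;        /* 4x the apparent default */
ngs->bscore_stack_size = 4 * 100000;
ngs->bp_table = ckd_calloc(ngs->bp_table_size, sizeof(*ngs->bp_table));
ngs->bscore_stack = ckd_calloc(ngs->bscore_stack_size, sizeof(*ngs->bscore_stack));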
Thanks again for the excellent support.
giebler (Participant)
Thanks, I’ll try that.
By the way:
2014-08-22 10:54:39.290 OpenEarsSampleApp[29757:3c03] ###### GLG Recognition Loop: 20.000000
I am running my compiled version and it is set to 20.0.
Now that I know I have source code, I’ll also look into the issue here too.
Thanks,
Gary
giebler (Participant)
Thanks. I never noticed the source was included; I had just copied the framework and bundles.
I changed the value to 20.0, recompiled the bundles and framework, and cleaned and compiled the sample app, but it’s still bailing early, especially on the iPad 2nd gen, which bails at 5 seconds. Do I need to set LongRecognition to TRUE?
giebler (Participant)
Thanks for your help!
However, ContinuousModel.m isn’t available in the project.
I tried placing the define in ContinuousModel.h, hoping it was undefined in the library, but it has no effect, so it is probably defined in the library after the .h file is included.
#define kExcessiveUtterancePeriod 15
Is there a value I can set once the module is initialized?
How can I implement this change?
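To make the question concrete, here is the sort of runtime hook I’m imagining. Everything in this sketch is hypothetical; OpenEars does not expose these names, I’m just illustrating the shape of the change:

/* Hypothetical sketch only -- these names do not exist in OpenEars.
 * The idea: replace the compile-time constant with a variable that
 * the app can adjust after initialization. */
static float excessive_utterance_period = 15.0f; /* replaces kExcessiveUtterancePeriod */

void set_excessive_utterance_period(float seconds) {
    excessive_utterance_period = seconds;
}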
Thanks.
giebler (Participant)
The problem does not show up if you use a WAV file. It only happens when the microphone is recording the audio. It detects finished speech before the end of the sentence. If you change the #ifdef switch in the sample app so that it is using the live mic, and play the audio file from another device, or simply say the sentence, you’ll see the problems.
When I zipped the sample app, it was set to use the WAV file (sorry for the confusion).
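For anyone reproducing this, the toggle I mean looks roughly like the following; the macro name here is illustrative, and the one in the shipped sample app may differ:

/* Illustrative toggle only -- the real macro name in the sample app may differ. */
#define RUN_FROM_WAV_FILE 1   /* 1 = recognize the bundled WAV, 0 = live microphone */

#if RUN_FROM_WAV_FILE
    /* run recognition over the pre-recorded WAV file */
#else
    /* start the continuous listening loop on the live mic */
#endif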
giebler (Participant)
The recording was made on the iPhone 5s running iOS 7.1.2, but at 16 kHz and 16 bits I doubt it matters where I recorded it. The error seems to be device and iOS independent. I have verified it exists on the following devices:
iPad 2nd Gen running iOS 6.1.3
iPad 4th Gen running iOS 7.1.2
iPad Air running iOS 7.1.2
iPad Mini running iOS 7.1.2
iPhone 5s running iOS 7.1.2
The iPhone 5s almost gets the whole phrase, just missing the last few words (the first time).
giebler (Participant)
I’ve emailed the sample app with the audio file to your email address from your newsletters. Please let me know if you received it.
Thanks.
giebler (Participant)
I tested on a second-generation iPad running iOS 6.1.3 and on an iPad Mini running iOS 7.1.2.
I made the recording as you suggested and wanted to try it before sending it.
However, I noticed a very repeatable problem (besides the early termination of recognition). If I play the recording (from my iPhone) instead of speaking and regardless of volume, the sample app recognition loop won’t restart. I turned on logging, but it only gave me the following additional log entry:
2014-08-19 12:02:49.143 OpenEarsSampleApp[26108:3a03] cont_ad_read failed, stopping.
2014-08-19 12:02:49.246 OpenEarsSampleApp[26108:907] Setting up the continuous recognition loop has failed for some reason, please turn on [OpenEarsLogging startOpenEarsLogging] in OpenEarsConfig.h to learn more.
The audio recording has some hiss (noise) on it since it uses the iPhone 5 microphone, but even when I lower the volume to minimize the hiss, it still fails to resume, preventing me from testing the recording a second time.
I’m going to go into my recording studio and make a decent voice recording of the phrase to make sure it fails properly and then works the second time, but I’ll send both recordings so that you can see the second problem.
Thanks!
giebler (Participant)
I took the sample app and made this one modification to it. When I attempt to read the following sentence, it shows “has detected finished speech” before I finish reading the sentence. After the first failed attempt, it works perfectly. However, if I press “Stop Listening” and then press “Start Listening” again, it will once again miss the first long sentence. After the first miss, it works fine. Thanks for looking into this issue.
// THIS IS A LONGER TEST OF A VERY LONG SENTENCE TO DETERMINE IF THE OPEN EARS LIBRARY HAS A PROBLEM DETECTING VERBOSE SPEECH
NSArray *firstLanguageArray = [[NSArray alloc] initWithArray:[NSArray arrayWithObjects: // All capital letters.
    @"BACKWARD",
    @"CHANGE",
    @"FORWARD",
    @"THIS",
    @"IS A",
    @"LONGER",
    @"TEST",
    @"OF A",
    @"VERY",
    @"LONG",
    @"SENTENCE",
    @"TO",
    @"DETERMINE",
    @"IF",
    @"THE",
    @"OPEN",
    @"EARS",
    @"LIBRARY",
    @"HAS A",
    @"PROBLEM",
    @"DETECTING",
    @"VERBOSE",
    @"SPEECH",
    @"GO",
    @"LEFT",
    @"MODEL",
    @"RIGHT",
    @"TURN",
    nil]];
giebler (Participant)
Well, I disabled everything except OpenEars and it still is happening. I’m going to try it in another app to see if I can determine what is happening. I’ll let you know what I find.
giebler (Participant)
Here’s what I think is happening:
When it receives a long phrase, the average gain (AGC) is adjusting so that it starts to detect silence during the phrase. When I set the timeout to 3.6 seconds, it gives the app enough time to finish the phrase before it times out.
From the debug log, I see that AGC is set to none with a threshold of 2.
I’m using an iPad (2nd Gen) with its internal mic.
Is there a way to change the silence-detection averaging so that it doesn’t shift its value over 5 seconds of speech? (Hopefully I’m saying this correctly.) Just in case, let me try another way: the level it uses to determine silence appears to change over a few seconds of speech, so that eventually the speech looks like silence. I need a slower response, so that speech still looks like speech after 5 seconds.
Does that make any sense?
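To illustrate what I mean by a slower response, here is a generic slowly-adapting noise floor; this is only a conceptual sketch, not the actual pocketsphinx/cont_ad code:

/* Conceptual sketch only -- not the actual cont_ad implementation.
 * A smaller alpha makes the noise floor adapt more slowly, so a long
 * stretch of speech takes longer to be mistaken for background noise.
 * In practice the floor would be seeded from calibration, not zero. */
static double noise_floor = 0.0;

int looks_like_speech(double frame_energy) {
    const double alpha = 0.001;   /* slow adaptation rate */
    const double margin = 2.0;    /* threshold above the floor, cf. the log's "threshold of 2" */
    noise_floor = (1.0 - alpha) * noise_floor + alpha * frame_energy;
    return frame_energy > noise_floor + margin;
}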
giebler (Participant)
I tried all three calibration times and I even set the timeout to 1.6 seconds. Here are the results:
2014-08-15 14:38:01.855 coach[24101:6923] Calibration has completed
2014-08-15 14:38:01.857 coach[24101:6923] Listening.
2014-08-15 14:38:11.276 coach[24101:6923] Speech detected…
INFO: file_omitted(0): Resized backpointer table to 10000 entries
INFO: file_omitted(0): Resized score stack to 200000 entries
2014-08-15 14:38:14.734 coach[24101:6923] Stopping audio unit.
2014-08-15 14:38:14.865 coach[24101:6923] Audio Output Unit stopped, cleaning up variable states.
2014-08-15 14:38:14.866 coach[24101:6923] Processing speech, please wait…
INFO: file_omitted(0): cmn_prior_update: from < 47.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: file_omitted(0): cmn_prior_update: to < 58.97 -2.76 -0.05 -2.55 -2.07 -2.80 -0.86 0.27 -0.82 -0.19 -0.09 0.13 -0.02 >
INFO: file_omitted(0): 8462 words recognized (24/fr)
INFO: file_omitted(0): 477916 senones evaluated (1350/fr)
INFO: file_omitted(0): 305552 channels searched (863/fr), 41300 1st, 165791 last
INFO: file_omitted(0): 23343 words for which last channels evaluated (65/fr)
INFO: file_omitted(0): 14266 candidate words for entering last phone (40/fr)
INFO: file_omitted(0): fwdtree 1.95 CPU 0.551 xRT
INFO: file_omitted(0): fwdtree 3.60 wall 1.016 xRT
INFO: file_omitted(0): Utterance vocabulary contains 145 words
INFO: file_omitted(0): 4616 words recognized (13/fr)
INFO: file_omitted(0): 337969 senones evaluated (955/fr)
INFO: file_omitted(0): 342829 channels searched (968/fr)
INFO: file_omitted(0): 22321 words searched (63/fr)
INFO: file_omitted(0): 18710 word transitions (52/fr)
INFO: file_omitted(0): fwdflat 1.27 CPU 0.358 xRT
INFO: file_omitted(0): fwdflat 1.27 wall 0.358 xRT
2014-08-15 14:38:16.145 coach[24101:6923] Pocketsphinx heard “WHICH DOCTORS HAVE A HIGHEST NET SWITCHING FOR MY PRODUCT OVER ALL” with a score of (0) and an utterance ID of 000000000.
2014-08-15 14:38:16.147 coach[24101:6923] Checking and resetting all audio session settings.
2014-08-15 14:38:16.148 coach[24101:907] _isSpeaking: 0
2014-08-15 14:38:16.149 coach[24101:6923] audioCategory is correct, we will leave it as it is.
Here’s the second try in the same debug session:
2014-08-15 14:38:16.270 coach[24101:6923] Listening.
2014-08-15 14:38:39.491 coach[24101:6923] Speech detected…
2014-08-15 14:38:45.964 coach[24101:6923] Stopping audio unit.
2014-08-15 14:38:46.094 coach[24101:6923] Audio Output Unit stopped, cleaning up variable states.
2014-08-15 14:38:46.096 coach[24101:6923] Processing speech, please wait…
INFO: file_omitted(0): cmn_prior_update: from < 59.07 -2.73 -0.26 -2.34 -2.00 -2.78 -0.94 0.16 -0.72 -0.18 0.02 0.04 -0.03 >
INFO: file_omitted(0): cmn_prior_update: to < 57.70 -2.95 -0.09 -2.21 -1.92 -2.72 -0.87 0.11 -0.72 -0.14 0.03 0.01 -0.05 >
INFO: file_omitted(0): 8648 words recognized (18/fr)
INFO: file_omitted(0): 557984 senones evaluated (1136/fr)
INFO: file_omitted(0): 315338 channels searched (642/fr), 57386 1st, 147718 last
INFO: file_omitted(0): 28447 words for which last channels evaluated (57/fr)
INFO: file_omitted(0): 14112 candidate words for entering last phone (28/fr)
INFO: file_omitted(0): fwdtree 2.27 CPU 0.463 xRT
INFO: file_omitted(0): fwdtree 6.62 wall 1.348 xRT
INFO: file_omitted(0): Utterance vocabulary contains 127 words
INFO: file_omitted(0): 3739 words recognized (8/fr)
INFO: file_omitted(0): 328839 senones evaluated (670/fr)
INFO: file_omitted(0): 276068 channels searched (562/fr)
INFO: file_omitted(0): 21483 words searched (43/fr)
INFO: file_omitted(0): 18104 word transitions (36/fr)
INFO: file_omitted(0): fwdflat 1.17 CPU 0.238 xRT
INFO: file_omitted(0): fwdflat 1.17 wall 0.238 xRT
2014-08-15 14:38:47.281 coach[24101:6923] Pocketsphinx heard “WHICH DOCTORS HAVE THE HIGHEST NET SWITCHING FOR MY PRODUCT OVER THE LAST THIRTEEN WEEKS” with a score of (0) and an utterance ID of 000000001.
2014-08-15 14:38:47.282 coach[24101:6923] Checking and resetting all audio session settings.
2014-08-15 14:38:47.284 coach[24101:6923] audioCategory is correct, we will leave it as it is.
Let me know if you see anything in this listing. As you can see, it can and does recognize the entire sentence perfectly after the first attempt.
giebler (Participant)
It works great after the first try, and there is no pause, hesitation, or intermittent noise during that first try. I’m working in a relatively quiet environment. How would I do a longer calibration, and how would that help?
I don’t have any choice about the length of the phrase; we’re asking questions and, as I said, it works fine AFTER the first attempt.
Here’s a sample question:
Which doctors have the highest net switching for my product over the last thirteen weeks?
First try after starting the app always fails while I’m still speaking, and after that it works fine.
In 1.65 it would give the following message:
2014-08-14 17:07:45.144 coach[23556:a107] There is reason to suspect the VAD of being out of sync with the current background noise levels in the environment so we will recalibrate.
In 1.71 it no longer gives that message; as I said, 1.71 gets more words than 1.65 did, but it still misses the last half.
giebler (Participant)
Even though I was generating a new .dic file, it was failing to copy it to the proper folder and was still using the old one! Once I discovered that, your suggestions for Q1, Q2, Q3, Q4, and IMS are all working!
Thanks!
giebler (Participant)
I’m adding these entries to the cmu07a.dic file, then generating my .dic file by adding Q1, Q2, Q3, Q4, and IMS to the language array. I’ll download my language file to make sure they ended up there…
giebler (Participant)
I also can’t get it to recognize our company name (IMS), which I added to the .dic file as shown here:
IMRIE IH M ER IY
IMS AY EH M EH S
IMUS AY M AH S
Any suggestions for this one?
Thanks!
giebler (Participant)
Here’s what (and where) I put in the .dic file:
Q.S K Y UW Z
Q1 K Y UW W AH N
Q2 K Y UW T UW
Q3 K Y UW TH R IY
Q4 K Y UW F AO R
QANA K AA N AH
I had to edit the .dic file in hex since at first Xcode put spaces instead of a tab.
It still doesn’t recognize Q1, Q2, Q3 or Q4.
It comes out “Two One” or “U One” no matter how clearly I speak.
I need both “Two” and “U” (U.S.) in my recognition file.
Any other thoughts? I don’t know what else to do. Would Rejecto help?
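Since the space-versus-tab problem already bit me once, a throwaway check like this flags any .dic line with no tab; nothing OpenEars-specific here, it just assumes, as in my case, that the word and pronunciation must be tab-separated:

#include <stdio.h>
#include <string.h>

/* Throwaway sketch: print any dictionary line that lacks a tab between
 * the word and its pronunciation (e.g. where an editor inserted spaces). */
int main(int argc, char **argv) {
    FILE *fp;
    char line[1024];
    int lineno = 0;
    if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: diccheck file.dic\n");
        return 1;
    }
    while (fgets(line, sizeof line, fp)) {
        ++lineno;
        if (strchr(line, '\t') == NULL)
            printf("line %d has no tab: %s", lineno, line);
    }
    fclose(fp);
    return 0;
}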
January 30, 2013 at 7:26 am, in reply to: PocketSphinx stops listening after I change language file. #1015511
giebler (Participant)
This same thing happened to me when I updated to version 1.2.5. I have an app where I need to add words occasionally. When I generated the new files, it would go through the motions of listening but never recognize anything. Based on Geri’s solution, I deleted the old dynamic dictionary and grammar files before creating the new ones, and this eliminated the bug. Now I can use the same file names over and over. Hope this helps you track down the bug.