Got another issue to pick your brain on.
I’ve got several users who use the voice recognition feature long-term in the background on their iOS devices, although it happens when the app is in the foreground as well. What they report, and what I’ve confirmed, is that after a long period of time (I was able to see it at about 3 hours) OpenEars no longer responds to voice commands.
When I ran the debugger on my iPad, here is what happens after a long period of time:
OpenEars appears to be listening OK, and it actually does pick up the audio right before it becomes unresponsive, but in the debug prints I see this:
2014-02-27 22:08:49.110 MobiLincHD[491:d46f] Processing speech, please wait…
INFO: file_omitted(0): cmn_prior_update: from < 38.99 -7.17 -4.31 -3.34 -2.30 -1.41 -0.54 -1.32 -0.06 -0.11 -0.10 0.34 0.39 >
INFO: file_omitted(0): cmn_prior_update: to < 38.43 -7.16 -4.34 -3.22 -2.40 -1.42 -0.56 -1.35 -0.10 -0.10 -0.13 0.34 0.40 >
INFO: file_omitted(0): 20536 words recognized (42/fr)
INFO: file_omitted(0): 172257 senones evaluated (352/fr)
INFO: file_omitted(0): 80936 channels searched (165/fr), 7776 1st, 56095 last
INFO: file_omitted(0): 25358 words for which last channels evaluated (51/fr)
INFO: file_omitted(0): 5056 candidate words for entering last phone (10/fr)
INFO: file_omitted(0): fwdtree 0.72 CPU 0.147 xRT
INFO: file_omitted(0): fwdtree 5.33 wall 1.088 xRT
INFO: file_omitted(0): Utterance vocabulary contains 49 words
INFO: file_omitted(0): 10036 words recognized (20/fr)
INFO: file_omitted(0): 154390 senones evaluated (315/fr)
INFO: file_omitted(0): 94457 channels searched (192/fr)
INFO: file_omitted(0): 20010 words searched (40/fr)
INFO: file_omitted(0): 14609 word transitions (29/fr)
INFO: file_omitted(0): fwdflat 0.24 CPU 0.049 xRT
INFO: file_omitted(0): fwdflat 0.24 wall 0.049 xRT
INFO: file_omitted(0): </s> not found in last frame, using ___REJ_K.488 instead
INFO: file_omitted(0): lattice start node <s>.0 end node ___REJ_K.274
INFO: file_omitted(0): Eliminated 1108 nodes before end node
INFO: file_omitted(0): Lattice has 2561 nodes, 24194 links
and then nothing else gets printed. In a normal scenario I would usually see the following, but this never appears (I let it run for another 12 hours before killing it):
INFO: file_omitted(0): Normalizer P(O) = alpha(<sil>:94:97) = -780488
INFO: file_omitted(0): Joint P(O,S) = -788376 P(S|O) = -7888
INFO: file_omitted(0): bestpath 0.22 CPU 0.229 xRT
INFO: file_omitted(0): bestpath 0.23 wall 0.234 xRT
2014-02-27 22:08:35.843 MobiLincHD[491:d46f] Pocketsphinx heard " " with a score of (-7888) and an utterance ID of 000000039.
Any ideas why the framework isn’t able to progress past the ngram_search_lattice() call? I’m assuming that this is returning NULL, which is why I’m not seeing the call into ngram_search_bestpath(). But why would that cause a complete breakdown of sound recognition from the mic?
I haven’t seen it over short durations. This is definitely something that pops up under long-term OpenEars use (3+ hours).
I tried this using both the VoiceChat and Default modes. Both exhibited the exact same behavior.
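For reference, I’m switching between them with the audioMode property on PocketsphinxController before calling startListening, roughly like this (the property and my pocketsphinxController property name are how I remember the 1.x sample setup, so treat them as assumptions):

#import <OpenEars/PocketsphinxController.h>

- (void)configureAudioMode { // hypothetical helper in my view controller
    // Set before startListening is called; both values exhibited the same behavior for me.
    self.pocketsphinxController.audioMode = @"VoiceChat"; // or @"Default"
}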
Any ideas would be greatly appreciated!
Thanks Halle!
Wes
But, yes, I am tracking a bug regarding very long searches. It isn’t technically that they don’t return, but that the return is so delayed that it feels like a hang (the same problem from a UX perspective, IMO).
It is happening because the search space on these searches is too big for some reason (my early impression is that the cause is a very long utterance, due to persistent noise being taken for extended speech, combined with something about language model weight values). It is my current top priority to figure out and fix, but it is also a challenging issue to pin down. Here are the current correlations for this bug; I’m very interested in any new info you can give me about yours:
1. It appears consistently when a weight above 1.2 is applied to Rejecto and there is a particularly long utterance. This is verified (see the sketch after this list for where that weight is set).
2. It has been reported without Rejecto when background volume increases suddenly, although my own tests with the most recent version of the OpenEars beta do not replicate this.
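For context, that weight is the one passed when generating the Rejecto language model. A minimal sketch of where it gets set, assuming the Rejecto 1.x generator method (the exact selector, the header name and the vocabulary here are assumptions/placeholders):

#import <OpenEars/LanguageModelGenerator.h>
#import <OpenEars/AcousticModel.h>
#import <Rejecto/LanguageModelGenerator+Rejecto.h> // framework/header name is an assumption

- (void)generateWeightedRejectoModel { // hypothetical helper
    LanguageModelGenerator *generator = [[LanguageModelGenerator alloc] init];

    // The default weight is 1.0; correlation #1 above concerns weights over 1.2.
    NSError *error = [generator generateRejectingLanguageModelFromArray:@[@"HELLO", @"GOODBYE"] // placeholder vocabulary
                                                         withFilesNamed:@"RejectoModel"
                                                 withOptionalExclusions:nil
                                                        usingVowelsOnly:NO
                                                             withWeight:@1.3 // above 1.2, the verified trigger
                                                 forAcousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];
    if ([error code] != noErr) {
        NSLog(@"Rejecto model generation error: %@", error);
    }
}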
Here is the most recent beta link:
https://politepix.com/wp-content/uploads/OpenEarsDistributionBeta.tar.bz2
I can’t take any test cases that occur over periods of hours, but if you can provide me with a test case that occurs in fewer than 10 minutes, based on the sample app plus an audio recording added to pathToTestFile, I will be very happy to add it to the data on this bug and it will help to get a faster fix.
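In other words, something roughly like this in the sample app (a sketch assuming the 1.x PocketsphinxController API; the WAV name and the language model/dictionary properties are placeholders for whatever the sample app generates):

#import <OpenEars/PocketsphinxController.h>
#import <OpenEars/AcousticModel.h>

- (void)runRecordedTestCase { // hypothetical helper in the sample app's view controller
    // Pointing pathToTestFile at a recorded WAV makes recognition run against the file
    // instead of the live mic, so the problem session can be replayed.
    self.pocketsphinxController.pathToTestFile = [[NSBundle mainBundle] pathForResource:@"ProblemSession" // placeholder file name
                                                                                 ofType:@"wav"];

    [self.pocketsphinxController startListeningWithLanguageModelAtPath:self.pathToLanguageModel // placeholder properties,
                                                      dictionaryAtPath:self.pathToDictionary    // as generated in the sample app
                                                   acousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]
                                                   languageModelIsJSGF:NO];
}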
Actually, from the debug logs, it looks quite clean and is working well, but yes, there is a weird bug here at play. Sounds like you might be on to it.
To answer your points: I’m using Rejecto, but only in the default mode, meaning I’m not changing the weighting. I left it at the default (presumably 1.0).
I have no data points without Rejecto, as Rejecto was critical in my application.
It is happening because the search space on these searches is too big for some reason (my early impression is that the cause is a very long utterance, due to persistent noise being taken for extended speech
This *sounds* right to me. I could see this happening over long periods of listening, where OpenEars is exposed to the environmental sound for long enough that extended speech detection is triggered.
I’ll try out the beta release, let it run for hours to see if it behaves any differently, and let you know later today.
Thanks!
Wes
Early report: I’ve been running for about an hour, observing the sound in the room and how OpenEars handles it.
I’m seeing cases where long, sustained background noise does cause OpenEars to take a long time to process, during which the CPU shoots to 100%. I’ve seen it take as long as 60-90 seconds before it eventually returns to listening and all is normal again (0% CPU and listening normally).
This is slightly different from what I was seeing on 1.65, where OpenEars wasn’t returning in my test case but the CPU was at 0% (I never saw it peak at 100%).
Is this consistent with what you see on your end?
Is this considered “normal”?
Obviously, having the CPU shoot to 100% for long periods of time isn’t desirable, but I wanted your comments on this.
Leaving my test case running. Will check back in later.
Wes
I have received one report of non-returning that turned out to be very delayed returning.
I would be very surprised if the change in the beta could have the effect of making a bug symptom worse. It only changes occasional inaccurate probabilities to normal ones, so to the best of my knowledge it can’t really be implicated in a negative behavioral change.
What I have seen of this bug, and the reason it is very challenging, is that it is very intermittent and very nondeterministic, so what might seem like new or different behavior might be behavior that was there previously and simply hadn’t yet manifested in front of you; in other words, a new symptom is not necessarily related to the beta.
I can see about capturing the background noise that’s causing the high CPU if you are interested. I seem to have a way of reproducing it.
I agree, this is different behavior from what I saw on 1.65. I’m not sure it was “stuck” in ngram_search_lattice(), but it definitely didn’t proceed to ngram_search_bestpath().
Xcode could have been misleading about the CPU usage, but the iPad wasn’t hot, as it would have been if it had really been running at 100% for hours on end.
So far, so good, minus the high CPU on occasion. Let me know if you’d like me to capture the reproducible noise that causes the CPU to peg.
Wes
Thank you, it would be very helpful to have the test case for the long searches in the beta. BTW, I don’t know if you saw this, but SaveThatWave 1.65 now has a feature to capture an entire recognition session from startListening: to stopListening, and the demo will run for 3 minutes. So, if you can get the issue to happen in fewer than 3 minutes, you could use the demo to do a direct capture of a session that gets weird, via the new SaveThatWaveController method startSessionDebugRecord.
Then you can drop that WAV right into pathToTestFile in your sample app and I should (more or less – none of this is perfectly deterministic) be able to see what you saw.
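Roughly along these lines, as a sketch (only startSessionDebugRecord is an actual method name from this thread; the header name and the surrounding property names are assumptions/placeholders):

#import <OpenEars/PocketsphinxController.h>
#import <OpenEars/AcousticModel.h>
#import <SaveThatWaveDemo/SaveThatWaveController.h> // demo framework/header name is an assumption

- (void)startCapturedSession { // hypothetical helper
    self.saveThatWaveController = [[SaveThatWaveController alloc] init];

    // Start the session-wide debug recording before listening begins; the demo records
    // for up to 3 minutes, covering startListening: through stopListening.
    [self.saveThatWaveController startSessionDebugRecord];

    [self.pocketsphinxController startListeningWithLanguageModelAtPath:self.pathToLanguageModel // placeholder properties,
                                                      dictionaryAtPath:self.pathToDictionary    // as in the sample app
                                                   acousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]
                                                   languageModelIsJSGF:NO];

    // When the long search shows up, stop listening; the captured WAV can then be dropped
    // into pathToTestFile in the sample app to replay the session.
}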
OK, so my understanding is that the beta represents an improvement for you, because it means you aren’t permanently losing the reactivity of the UI, or at least you are not seeing any new manifestations of that behavior, but it is also (obviously) not optimal yet because of the remaining issue with the long searches. Is that correct?
Yes, so far. I’ve been running for 3+ hours and OpenEars is still going strong. I couldn’t get this far before, so, yes, at least in my single test case the beta appears to have addressed what I saw earlier in 1.65 as noted at the top of this post.
Ah, great! I didn’t know about SaveThatWave; I was thinking about how I’d accomplish that. OK, let me experiment with it and see if I can capture the session that pegs the CPU at 100%.
What’s the best way to send you the sample project with the wav file?
Wes
Yes, so far. I’ve been running for 3+ hours and OpenEars is still going strong. I couldn’t get this far before, so, yes, at least in my single test case the beta appears to have addressed what I saw earlier in 1.65 as noted at the top of this post.
OK, that’s good news – this is the first feedback I’ve gotten from reporters of this issue about the effect of the improvements in the beta, so we’ll keep our fingers crossed that we’ll continue to see only the current symptom with increasing background noise. A little more background: a delay due to suddenly increasing background noise is expected behavior, because it means that the voice activity detection no longer has a way of distinguishing the speech/silence transition, since the calibration values have become irrelevant within a single utterance. Under those conditions, it should notice that this has happened and sort itself out in about 14 seconds (this can be made a bit shorter, but there are other tradeoffs to doing so, so if it is an uncommon occurrence this timeframe is probably about right).
So we should only consider the high CPU and the inaccurate speech/silence threshold dysfunctional if it takes notably longer than 14 seconds to self-correct, or if the long CPU usage occurs in the absence of a swift increase in background noise. Sometimes completely normal searches can take 1-2 seconds and use 99% CPU, so just seeing a strenuous search isn’t a bug on its own.
What’s the best way to send you the sample project with the wav file?
Ideally, put it up somewhere I can download it and send me the link via the email address we’ve corresponded through previously.