May 30, 2013 at 11:05 am #1017333 | lookbadgers (Participant)
I understand that OpenEars uses the previous words to help detect the next word when working with sentences.
Suppose I had the corpus “THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG”.
If it heard “THE QUICK”, it increases the probability that the next sound will be matched with BROWN. I was wondering how much more it increases that probability.
If there were a pause between hearing “THE QUICK” and “BROWN”, does this increased probability still apply?
If there are likely to be pauses between words, is it better to have a corpus that contains both the sentence as a whole and the individual words, each on a new line?
May 30, 2013 at 11:44 am #1017334 | Halle Winkler (Politepix)
There isn’t an answer in the form of “it’s 20% more likely” because it is dependent on the overall number of words as well as how many composed phrases were submitted to be turned into a language model. But I can tell you what is happening under the hood.
Pocketsphinx as used by OpenEars takes into account unigrams (single words), bigrams (word pairs) and trigrams (word triplets) in language models. These are indicated in the language model as 1-gram, 2-gram and 3-gram, and are collectively referred to as n-grams. When a set of individual words is submitted (meaning no composed phrases at all, just single words by themselves), the likelihood of each bigram and trigram is equal, meaning that all combinations of the words that can be expressed as a pair or a triplet are equally likely. When a phrase is submitted (taking yours as an example), every bigram and trigram that occurs within the phrase is more likely than bigrams and trigrams composed of word combinations which do not appear within the phrase.
The <s> and </s> symbols, indicating the beginning and end of an utterance, are also taken into account for probabilities, so they appear in all of the n-gram sets. They are added automatically; you don’t do anything about them.
With this information in mind about how it is working, it should be possible for you to construct a test which answers your question for the specific app you are developing, which I can’t really advise you about from the info in the description.
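To make the corpus question concrete, here is a minimal sketch of submitting both the whole phrase and its individual words through OpenEars’ LanguageModelGenerator. The exact generation call and the way the output paths are handed back vary between OpenEars versions (and a plugin such as Rejecto provides its own generation method), so treat the names and error handling below as an outline rather than exact API.

#import <Foundation/Foundation.h>
#import <OpenEars/LanguageModelGenerator.h>

// Sketch: submit the complete phrase plus its individual words, so the
// phrase's own bigrams and trigrams become more likely while each word can
// still be recognized on its own after a pause.
static void GeneratePhraseTestModel(void) {
    LanguageModelGenerator *generator = [[LanguageModelGenerator alloc] init];

    NSArray *corpus = [NSArray arrayWithObjects:
        @"THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG", // whole phrase: its n-grams get weighted
        @"THE", @"QUICK", @"BROWN", @"FOX", @"JUMPED",   // single words: usable in isolation
        @"OVER", @"LAZY", @"DOG",
        nil];

    NSError *error = [generator generateLanguageModelFromArray:corpus
                                                withFilesNamed:@"PhraseTestModel"];
    if ([error code] != 0) { // a zero code indicates success in the OpenEars sample code
        NSLog(@"Language model generation error: %@", error);
    }
}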
May 30, 2013 at 12:26 pm #1017335 | lookbadgers (Participant)
Thank you for the reply. I now understand why it isn’t an exact probability for all apps, but instead depends on the overall number of words and the phrases that were submitted.
With that in mind, does a previous utterance have any impact on the next?
For example, with the pause between “THE QUICK” and “BROWN”: is the second utterance “BROWN” treated as a 1-gram, with the previous 2-gram “THE QUICK” disregarded?
I’ve noticed that users sometimes hesitate between words in expected phrases, and I’m trying to find out whether this has been reducing the accuracy of the resulting hypothesis after the pause.
May 30, 2013 at 12:34 pm #1017336 | Halle Winkler (Politepix)
You have to construct a test for your app in order to see if recognition accuracy is reduced as a result of pauses. It isn’t difficult: you can turn on verbose output and use this method to submit recordings with and without pauses and look at the results:
- (void) runRecognitionOnWavFileAtPath: (NSString *) wavPath usingLanguageModelAtPath: (NSString *) languageModelPath dictionaryAtPath: (NSString *) dictionaryPath languageModelIsJSGF: (BOOL) languageModelIsJSGF
If you need to gather input recordings you can make them using the SaveThatWave demo.
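For reference, a minimal sketch of that test, assuming the method above belongs to PocketsphinxController and that its verbosePocketSphinx property turns on verbose output, as in OpenEars 1.x; the recording name and the model and dictionary paths are placeholders.

#import <Foundation/Foundation.h>
#import <OpenEars/PocketsphinxController.h>

// Sketch: feed a saved recording through recognition and compare the verbose
// log output for recordings made with and without pauses.
static void RunWavTest(PocketsphinxController *pocketsphinxController) {
    pocketsphinxController.verbosePocketSphinx = TRUE; // verbose Pocketsphinx logging

    // Placeholder paths: the recording name and the model/dictionary locations
    // stand in for whatever your SaveThatWave and generation steps produced.
    NSString *wavPath = [[NSBundle mainBundle] pathForResource:@"phrase_with_pause" ofType:@"wav"];
    NSString *lmPath = @"/path/to/PhraseTestModel.languagemodel";
    NSString *dicPath = @"/path/to/PhraseTestModel.dic";

    [pocketsphinxController runRecognitionOnWavFileAtPath:wavPath
                                  usingLanguageModelAtPath:lmPath
                                          dictionaryAtPath:dicPath
                                       languageModelIsJSGF:FALSE];
}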
May 30, 2013 at 12:41 pm #1017337 | lookbadgers (Participant)
Thank you, I will give that a try. I was just hoping the question may have already been answered. I will try and post my findings when I get time to test.
May 30, 2013 at 12:48 pm #1017338 | Halle Winkler (Politepix)
That’s great — please share at least a little information about your app vocabulary (you don’t have to tell the exact words, but give us info about vocabulary size, number of phrases, and a similar kind of phrase to the one you’re testing) and things like the mic used and the length of the pauses and environmental factors so a reader can get the big picture. I don’t think this is something that will behave the same with every app or under every circumstance (which is why I can’t answer it off the cuff) so it would be most helpful to hear about the context for the findings.
June 5, 2013 at 5:02 pm #1017402 | lookbadgers (Participant)
Just trying to get this up and running. I can get SaveThatWave to work in the sample app. However, in my test application I get undefined symbols for architecture armv7:
“_OBJC_CLASS_$_SaevTheWaveController”, referenced from:
objc-class-ref in…
ld: symbol(s) not found for architecture armv7
clang error: link command failed with exit code 1
I have checked that “Other Linker Flags” contains the “-ObjC” linker flag.
June 5, 2013 at 5:06 pm #1017403 | Halle Winkler (Politepix)
Hi,
You misspelled SaveThatWaveController in your project.
June 6, 2013 at 12:59 pm #1017419 | lookbadgers (Participant)
Sorry, that was a typo when writing out the error; the typo did not exist in the code. Anyway, that is now working and I can record WAV files.
I’m calling the method runRecognitionOnWavFileAtPath but nothing happens.
I’m passing it the language model and dictionary paths generated by Rejecto.
In the end I recorded the WAV file outside of the app, so I was wondering if the problem is that it’s not in the right format:
16-bit PCM
16000 Hz
Stereo
June 6, 2013 at 1:08 pm #1017420 | Halle Winkler (Politepix)
“I’m calling the method runRecognitionOnWavFileAtPath but nothing happens.”
Do you mean that there is nothing ever returned in the OpenEarsEventsObserver pocketsphinxDidReceiveHypothesis: method?
June 6, 2013 at 1:15 pm #1017421 | lookbadgers (Participant)
That is correct, pocketsphinxDidReceiveHypothesis: is never called.
I assume you can’t use RapidEars with SaveThatWave? That might make the test I’m working on irrelevant.
June 6, 2013 at 1:24 pm #1017422 | Halle Winkler (Politepix)
It’s probably because the file is stereo. That’s correct, RapidEars doesn’t work on a single pause-bounded utterance in the sense that stock OpenEars does, so it doesn’t have a method for outputting that complete utterance as a WAV.
I construct replicable tests for RapidEars by making a recording and playing it out of a speaker and into the device, cued by a note in the console that says “press play now”. While it is not the most deterministic thing going, it is the most informative approach I can think of that actually replicates real-world behavior without interfering with RapidEars’ resource management at the same time I’m trying to test it.
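Relatedly, one way to rule the format question in or out for recordings made outside the app: a hypothetical helper (not part of OpenEars, plain AudioToolbox) that checks whether a WAV is the 16-bit, 16 kHz, mono linear PCM the thread suggests recognition expects.

#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>

// Hypothetical helper: returns YES if the WAV at wavPath is 16-bit, 16 kHz,
// mono linear PCM.
static BOOL WavFileLooksUsable(NSString *wavPath) {
    ExtAudioFileRef audioFile = NULL;
    NSURL *url = [NSURL fileURLWithPath:wavPath];
    if (ExtAudioFileOpenURL((__bridge CFURLRef)url, &audioFile) != noErr) {
        return NO; // couldn't open the file at all
    }

    AudioStreamBasicDescription format = {0};
    UInt32 size = sizeof(format);
    OSStatus status = ExtAudioFileGetProperty(audioFile,
                                              kExtAudioFileProperty_FileDataFormat,
                                              &size,
                                              &format);
    ExtAudioFileDispose(audioFile);
    if (status != noErr) {
        return NO; // couldn't read the file's data format
    }

    return format.mFormatID == kAudioFormatLinearPCM
        && format.mSampleRate == 16000.0
        && format.mChannelsPerFrame == 1
        && format.mBitsPerChannel == 16;
}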
June 6, 2013 at 3:06 pm #1017424 | lookbadgers (Participant)
I’ve created a new recording in mono but still the same problem.
Thank you for the suggestion about testing RapidEars I will have to try that at some point.
June 6, 2013 at 3:17 pm #1017425 | Halle Winkler (Politepix)
In that case the issue is going to be related to something else about the app setup. Output from SaveThatWave is known to work, so I would just get it from there, and if that doesn’t work I would look at your OpenEarsEventsObserver delegate setup.
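As a reference point for that last suggestion, a minimal sketch of an OpenEarsEventsObserver delegate setup. The view controller name is illustrative, the hypothesis callback signature is taken from the OpenEars 1.x documentation, and depending on the version other delegate methods may also need stubs.

#import <UIKit/UIKit.h>
#import <OpenEars/OpenEarsEventsObserver.h>

// Illustrative view controller showing the two pieces that commonly go
// missing: keeping a strong reference to the observer and setting its
// delegate before recognition starts.
@interface WavTestViewController : UIViewController <OpenEarsEventsObserverDelegate>
@property (strong, nonatomic) OpenEarsEventsObserver *openEarsEventsObserver;
@end

@implementation WavTestViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    self.openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init]; // must outlive the recognition run
    [self.openEarsEventsObserver setDelegate:self];                      // otherwise no callbacks arrive
}

- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                        recognitionScore:(NSString *)recognitionScore
                             utteranceID:(NSString *)utteranceID {
    NSLog(@"Hypothesis: %@ (score: %@, utterance ID: %@)", hypothesis, recognitionScore, utteranceID);
}

@end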