May 30, 2013 at 11:05 am #1017333 | lookbadgers (Participant)
I understand that OpenEars uses the previous words to help detect the next word when working with sentences.
Suppose I had the corpus “THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG”.
If it heard “THE QUICK”, it increases the probability that the next sound will be matched with BROWN. I was wondering how much more it increases that probability.
If there were a pause between hearing “THE QUICK” and “BROWN”, does this increased probability still apply?
If there are likely to be pauses between words, is it better to have a corpus that contains both the sentence as a whole and the individual words, each on a new line?
May 30, 2013 at 11:44 am #1017334 | Halle Winkler (Politepix)
There isn’t an answer in the form of “it’s 20% more likely” because it is dependent on the overall number of words as well as how many composed phrases were submitted to be turned into a language model. But I can tell you what is happening under the hood.
Pocketsphinx as used by OpenEars takes into account unigrams (single words), bigrams (word pairs) and trigrams (word triplets) in language models. These are indicated in the language model as 1-gram, 2-gram and 3-gram, and are collectively referred to as n-grams. When a set of individual words is submitted (meaning no composed phrases at all, just single words by themselves), the likelihood of each bigram and trigram is equal, meaning that all combinations of the words that can be expressed as a pair or a triplet are equally likely. When a phrase is submitted (taking yours as an example), every bigram and trigram that occurs within the phrase is more likely than bigrams and trigrams composed of word combinations which do not appear within the phrase.
The <s> and </s> symbols, indicating the beginning and end of an utterance, are also taken into account for probabilities, so they appear in all of the n-gram sets. They are added automatically; you don’t do anything about them.
With this information in mind about how it is working, it should be possible for you to construct a test which answers your question for the specific app you are developing, which I can’t really advise you about from the info in the description.
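To make the corpus question concrete, here is a minimal sketch of submitting both the whole phrase and its individual words through OpenEars’ LanguageModelGenerator. The exact generation call and the way the output paths are handed back vary between OpenEars versions (and a plugin such as Rejecto provides its own generation method), so treat the names and error handling below as an outline rather than exact API.

#import <Foundation/Foundation.h>
#import <OpenEars/LanguageModelGenerator.h>

// Sketch: submit the complete phrase plus its individual words, so the
// phrase's own bigrams and trigrams become more likely while each word can
// still be recognized on its own after a pause.
static void GeneratePhraseTestModel(void) {
    LanguageModelGenerator *generator = [[LanguageModelGenerator alloc] init];

    NSArray *corpus = [NSArray arrayWithObjects:
        @"THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG", // whole phrase: its n-grams get weighted
        @"THE", @"QUICK", @"BROWN", @"FOX", @"JUMPED",   // single words: usable in isolation
        @"OVER", @"LAZY", @"DOG",
        nil];

    NSError *error = [generator generateLanguageModelFromArray:corpus
                                                withFilesNamed:@"PhraseTestModel"];
    if ([error code] != 0) { // a zero code indicates success in the OpenEars sample code
        NSLog(@"Language model generation error: %@", error);
    }
}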
May 30, 2013 at 12:26 pm #1017335 | lookbadgers (Participant)
Thank you for the reply. I now understand why it isn’t an exact probability for all apps, but instead depends on the overall number of words and the phrases that were submitted.
With that in mind, does a previous utterance have any impact on the next?
For example, with the pause between “THE QUICK” and “BROWN”: is the second utterance “BROWN” treated as a 1-gram, with the previous 2-gram “THE QUICK” disregarded?
I’ve noticed that users sometimes hesitate between words in expected phrases, and I’m trying to find out whether this has been reducing the accuracy of the resulting hypothesis after the pause.
May 30, 2013 at 12:34 pm #1017336 | Halle Winkler (Politepix)
You have to construct a test for your app in order to see if recognition accuracy is reduced as a result of pauses. It isn’t difficult: you can turn on verbose output and use this method to submit recordings with and without pauses and look at the results:
- (void) runRecognitionOnWavFileAtPath: (NSString *) wavPath usingLanguageModelAtPath: (NSString *) languageModelPath dictionaryAtPath: (NSString *) dictionaryPath languageModelIsJSGF: (BOOL) languageModelIsJSGF
If you need to gather input recordings you can make them using the SaveThatWave demo.
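For reference, a minimal sketch of that test, assuming the method above belongs to PocketsphinxController and that its verbosePocketSphinx property turns on verbose output, as in OpenEars 1.x; the recording name and the model and dictionary paths are placeholders.

#import <Foundation/Foundation.h>
#import <OpenEars/PocketsphinxController.h>

// Sketch: feed a saved recording through recognition and compare the verbose
// log output for recordings made with and without pauses.
static void RunWavTest(PocketsphinxController *pocketsphinxController) {
    pocketsphinxController.verbosePocketSphinx = TRUE; // verbose Pocketsphinx logging

    // Placeholder paths: the recording name and the model/dictionary locations
    // stand in for whatever your SaveThatWave and generation steps produced.
    NSString *wavPath = [[NSBundle mainBundle] pathForResource:@"phrase_with_pause" ofType:@"wav"];
    NSString *lmPath = @"/path/to/PhraseTestModel.languagemodel";
    NSString *dicPath = @"/path/to/PhraseTestModel.dic";

    [pocketsphinxController runRecognitionOnWavFileAtPath:wavPath
                                  usingLanguageModelAtPath:lmPath
                                          dictionaryAtPath:dicPath
                                       languageModelIsJSGF:FALSE];
}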
May 30, 2013 at 12:41 pm #1017337 | lookbadgers (Participant)
Thank you, I will give that a try. I was just hoping the question may have already been answered. I will try and post my findings when I get time to test.
May 30, 2013 at 12:48 pm #1017338 | Halle Winkler (Politepix)
That’s great — please share at least a little information about your app vocabulary (you don’t have to tell the exact words, but give us info about vocabulary size, number of phrases, and a similar kind of phrase to the one you’re testing) and things like the mic used and the length of the pauses and environmental factors so a reader can get the big picture. I don’t think this is something that will behave the same with every app or under every circumstance (which is why I can’t answer it off the cuff) so it would be most helpful to hear about the context for the findings.
June 5, 2013 at 5:02 pm #1017402 | lookbadgers (Participant)
Just trying to get this up and running. I can get SaveThatWave to work in the sample app. However, in my test application I get undefined symbols for architecture armv7:
“_OBJC_CLASS_$_SaevTheWaveController”, referenced from:
objc-class-ref in…
ld: symbol(s) not found for architecture armv7
clang error: link command failed with exit code 1
I have checked that “Other Linker Flags” contains the “-ObjC” linker flag.
June 5, 2013 at 5:06 pm #1017403 | Halle Winkler (Politepix)
Hi,
You misspelled SaveThatWaveController in your project.
June 6, 2013 at 12:59 pm #1017419 | lookbadgers (Participant)
Sorry, that was a typo when writing out the error; the typo did not exist in the code. Anyway, that is now working and I can record WAV files.
I’m calling the method runRecognitionOnWavFileAtPath but nothing happens.
I’m passing it the language model and dictionary paths generated by Rejecto.
In the end I recorded the WAV file outside of the app, so I was wondering if the problem is that it’s not in the right format:
16-bit PCM
16000 Hz
Stereo
June 6, 2013 at 1:08 pm #1017420 | Halle Winkler (Politepix)
“I’m calling the method runRecognitionOnWavFileAtPath but nothing happens.”
Do you mean that there is nothing ever returned in the OpenEarsEventsObserver pocketsphinxDidReceiveHypothesis: method?
June 6, 2013 at 1:15 pm #1017421 | lookbadgers (Participant)
That is correct, pocketsphinxDidReceiveHypothesis: is never called.
I assume you can’t use RapidEars with SaveThatWave? That might make the test I’m working on irrelevant.
June 6, 2013 at 1:24 pm #1017422 | Halle Winkler (Politepix)
It’s probably because the file is stereo. That’s correct, RapidEars doesn’t work on a single pause-bounded utterance in the sense that stock OpenEars does, so it doesn’t have a method for outputting that complete utterance as a WAV.
I construct replicable tests for RapidEars by making a recording and playing it out of a speaker and into the device, cued by a note in the console that says “press play now”. While it is not the most deterministic thing going, it is the most informative approach I can think of that actually replicates real-world behavior without interfering with RapidEars’ resource management at the same time I’m trying to test it.
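Relatedly, one way to rule the format question in or out for recordings made outside the app: a hypothetical helper (not part of OpenEars, plain AudioToolbox) that checks whether a WAV is the 16-bit, 16 kHz, mono linear PCM the thread suggests recognition expects.

#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>

// Hypothetical helper: returns YES if the WAV at wavPath is 16-bit, 16 kHz,
// mono linear PCM.
static BOOL WavFileLooksUsable(NSString *wavPath) {
    ExtAudioFileRef audioFile = NULL;
    NSURL *url = [NSURL fileURLWithPath:wavPath];
    if (ExtAudioFileOpenURL((__bridge CFURLRef)url, &audioFile) != noErr) {
        return NO; // couldn't open the file at all
    }

    AudioStreamBasicDescription format = {0};
    UInt32 size = sizeof(format);
    OSStatus status = ExtAudioFileGetProperty(audioFile,
                                              kExtAudioFileProperty_FileDataFormat,
                                              &size,
                                              &format);
    ExtAudioFileDispose(audioFile);
    if (status != noErr) {
        return NO; // couldn't read the file's data format
    }

    return format.mFormatID == kAudioFormatLinearPCM
        && format.mSampleRate == 16000.0
        && format.mChannelsPerFrame == 1
        && format.mBitsPerChannel == 16;
}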
June 6, 2013 at 3:06 pm #1017424 | lookbadgers (Participant)
I’ve created a new recording in mono but still the same problem.
Thank you for the suggestion about testing RapidEars I will have to try that at some point.
June 6, 2013 at 3:17 pm #1017425 | Halle Winkler (Politepix)
In that case the issue is going to be related to something else about the app setup. Output from SaveThatWave is known to work, so I would just get it from there, and if that doesn’t work I would look at your OpenEarsEventsObserver delegate setup.
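As a reference point for that last suggestion, a minimal sketch of an OpenEarsEventsObserver delegate setup. The view controller name is illustrative, the hypothesis callback signature is taken from the OpenEars 1.x documentation, and depending on the version other delegate methods may also need stubs.

#import <UIKit/UIKit.h>
#import <OpenEars/OpenEarsEventsObserver.h>

// Illustrative view controller showing the two pieces that commonly go
// missing: keeping a strong reference to the observer and setting its
// delegate before recognition starts.
@interface WavTestViewController : UIViewController <OpenEarsEventsObserverDelegate>
@property (strong, nonatomic) OpenEarsEventsObserver *openEarsEventsObserver;
@end

@implementation WavTestViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    self.openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init]; // must outlive the recognition run
    [self.openEarsEventsObserver setDelegate:self];                      // otherwise no callbacks arrive
}

- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                        recognitionScore:(NSString *)recognitionScore
                             utteranceID:(NSString *)utteranceID {
    NSLog(@"Hypothesis: %@ (score: %@, utterance ID: %@)", hypothesis, recognitionScore, utteranceID);
}

@end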