Won't Recognize Q, CUE, or QUEUE

This topic has 10 replies, 2 voices, and was last updated 11 years, 2 months ago by Halle Winkler.

Viewing 11 posts - 1 through 11 (of 11 total)

February 21, 2013 at 11:03 pm #1015706

giebler
Participant

I can’t get the program to recognize any of these: Q, QUEUE or CUE.

Can you offer any hints? In a business usage, we will be using Q1, Q2, Q3 and Q4.

The program does recognize “QUARTER”, but we can’t force users to say that.

We’re buying Rejecto next week to help eliminate wrong words, but right now we’re having trouble with the right words.

Thanks!

February 21, 2013 at 11:15 pm #1015707
Halle Winkler
Politepix

Hmm, single letters that rhyme with other single letters are very challenging for recognition.

Since you already know how many quarters you need, something sneaky you can try is to make Q1 etc. a single word. That is, instead of trying to recognize the combination of Q and 1, you put a word “Q1” in your language model and edit the phonetic dictionary so that the entry for Q1 reads as follows:
```
Q1   K Y UW W AH N
```
You’d do this for each quarter. Giving each quarter a multi-syllable sound sequence, instead of a single rhyming letter, leaves the recognizer more to tell them apart by and should make them recognizable.

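
To make the fused-word idea concrete, here is a small Python sketch that assembles those entries from the phones for “Q” and each spoken digit. The phone strings are the ones used in this thread; the tab separator and the names `fused_entry` and `SPOKEN_DIGITS` are assumptions for illustration, not part of OpenEars.

```python
# Phones for "Q" and the spoken digits, copied from the entries in this thread.
Q_PHONES = "K Y UW"
SPOKEN_DIGITS = {
    "1": "W AH N",
    "2": "T UW",
    "3": "TH R IY",
    "4": "F AO R",
}


def fused_entry(prefix_word: str, prefix_phones: str, digit: str) -> str:
    """Return one dictionary line that treats e.g. "Q1" as a single word.

    The word and its phones are separated by a tab (assumed format).
    """
    phones = f"{prefix_phones} {SPOKEN_DIGITS[digit]}"
    return f"{prefix_word}{digit}\t{phones}"


if __name__ == "__main__":
    # Print the four quarter entries, ready to paste into a .dic file.
    for d in sorted(SPOKEN_DIGITS):
        print(fused_entry("Q", Q_PHONES, d))
```

Each quarter ends up with five or six phones instead of three, which is the whole point of the trick.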
February 21, 2013 at 11:17 pm #1015708

Halle Winkler
Politepix

Here’s the skinny on making custom language models before runtime:

https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/

and editing your phonetic dictionary:

https://www.politepix.com/2012/12/04/openears-tips-and-tricks-5-customizing-the-master-phonetic-dictionary-or-using-a-new-one/

February 22, 2013 at 3:25 pm #1015711

giebler
Participant

Here’s what (and where) I put in the .dic file:

```
Q.S K Y UW Z
Q1 K Y UW W AH N
Q2 K Y UW T UW
Q3 K Y UW TH R IY
Q4 K Y UW F AO R
QANA K AA N AH
```

I had to edit the .dic file in a hex editor, since at first Xcode put spaces instead of a tab.
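
Since the poster apparently needed a tab between the word and its phones, and Xcode silently inserted spaces, a small normalizer can do the same fix without a hex editor. This is a sketch that assumes the `WORD<TAB>PHONE PHONE ...` layout described in this thread; `normalize_line` and `normalize_dictionary` are hypothetical helpers, not part of OpenEars.

```python
import re

# One dictionary line: a word, whitespace, then one or more phones.
# Assumption (from this thread): the reader expects "WORD<TAB>PHONE PHONE ...".
_ENTRY = re.compile(r"^(\S+)\s+(.+?)\s*$")


def normalize_line(raw: str) -> str:
    """Put exactly one tab after the word and single spaces between phones."""
    match = _ENTRY.match(raw.strip())
    if match is None:
        # Blank line or a word with no phones: just trim it for manual review.
        return raw.strip()
    word, phones = match.groups()
    return word + "\t" + " ".join(phones.split())


def normalize_dictionary(text: str) -> str:
    """Normalize every line of a .dic file's contents."""
    return "\n".join(normalize_line(raw) for raw in text.splitlines())
```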

It still doesn’t recognize Q1, Q2, Q3 or Q4.

It comes out “Two One” or “U One” no matter how clearly I speak.

I need both “Two” and “U” (U.S.) in my recognition file.

Any other thoughts? I don’t know what else to do. Would Rejecto help?

February 22, 2013 at 3:49 pm #1015712

Halle Winkler
Politepix

Recognizing several individual words that are only a syllable long and all rhyme with each other is not a satisfactorily solved problem in speech recognition. It’s a variation of the general problem of recognizing the English alphabet, and unfortunately you can find people looking for fixes to it in every speech-recognition resource. In the case you’re describing there is no contextual cue for which word is the “real one”, so as soon as there is any distance from the mic, the sounds are going to get mixed up.

The strategy for dealing with it is going to be some combination of removing confusing words from the model and fusing words that you know will be spoken together into a single word.

For example, you don’t need the loose letter “U” if it’s only there so that “U.S.” can be recognized. In that case, make “U.S.” a word of its own:

```
U.S. Y UW AH S
```

This will also improve recognition accuracy for words spoken near utterances of “U.S.”.
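
The two tactics above (dropping loose words that only exist to build a longer one, and fusing words that are always spoken together) can be sketched as a transformation on the word list you feed to the language model generator. `fuse_vocabulary` and its `still_needed` parameter are hypothetical; which standalone words your app genuinely needs is your own design decision.

```python
from typing import Dict, Iterable, List, Set


def fuse_vocabulary(
    words: Iterable[str],
    fused: Dict[str, List[str]],
    still_needed: Set[str],
) -> List[str]:
    """Replace loose component words with fused words.

    fused maps a fused word (e.g. "U.S.") to the loose words it makes
    unnecessary (e.g. ["U"]). A component is dropped unless it is listed
    in still_needed because the app uses it on its own elsewhere.
    """
    vocabulary = list(dict.fromkeys(words))  # de-duplicate, keep order
    drop = {
        part
        for parts in fused.values()
        for part in parts
        if part not in still_needed
    }
    result = [w for w in vocabulary if w not in drop]
    for fused_word in fused:
        if fused_word not in result:
            result.append(fused_word)
    return result
```

For instance, fusing `{"U.S.": ["U"], "Q1": ["Q", "ONE"]}` while keeping `"ONE"` removes the loose “U” and “Q” but leaves “ONE” available for other phrases.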

The next issue I see is that the “Q1” etc. entries have a couple of obscure words before and after them, which suggests to me that this is a big language model. Do you have the option of switching between smaller, more contextually specific language models?

Can you handle numbers either in a language model of their own that you switch to, or with some kind of prefix, e.g. “Category 2” instead of just “2”?
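
One way to picture these suggestions: keep several small vocabularies, pick one per app state, and give bare numbers a prefix word so they stop being isolated one-syllable utterances. The state names, the `CATEGORY` prefix, the menu words and `vocabulary_for` are all hypothetical; each list would still have to be turned into its own language model and dictionary with your usual generation step.

```python
from typing import Dict, List

NUMBER_WORDS = ["ONE", "TWO", "THREE", "FOUR"]


def prefixed(prefix: str, words: List[str]) -> List[str]:
    """Attach a prefix word to each word, e.g. "CATEGORY TWO"."""
    return [f"{prefix} {w}" for w in words]


# Hypothetical app states, each with its own small vocabulary/phrase list.
VOCABULARIES: Dict[str, List[str]] = {
    "main_menu": ["REPORTS", "SETTINGS", "U.S.", "IMS"],
    "quarter_picker": ["Q1", "Q2", "Q3", "Q4"],
    "category_picker": prefixed("CATEGORY", NUMBER_WORDS),
}


def vocabulary_for(state: str) -> List[str]:
    """Return the small vocabulary to generate (or switch to) for a state."""
    return VOCABULARIES[state]
```

The design choice is that no single model ever contains “TWO”, “U” and “Q1” side by side, so the rhyming words never compete.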

The last thing is that you haven’t shown the entry in the language model or the pocketsphinx logging output, so I don’t know for sure whether your alteration is actually in your language model as far as pocketsphinx is concerned. If you remove “U” and “2”, are you able to recognize “Q1”? If not, there might be an issue in the language model in general.

If you have confirmed that the language model is OK and none of these approaches is an option for you (although they almost always are for an app whose design you control), the last possibility is to use a JSGF ruleset rather than a statistical ARPA model. Searching this forum for JSGF should help you get started.
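
For reference, a JSGF grammar that accepts only the four quarter words is very small. This Python sketch only produces the grammar text; `jsgf_alternatives` is a hypothetical helper, how the grammar is then loaded depends on your OpenEars version, and every word in the grammar still needs an entry in the phonetic dictionary.

```python
from typing import List


def jsgf_alternatives(grammar_name: str, rule_name: str, words: List[str]) -> str:
    """Build a minimal JSGF grammar whose public rule matches exactly one word."""
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )
```

Unlike an ARPA model, the grammar rules out anything that isn’t one of the listed alternatives, which is what makes it a last-resort fix for rhyming words.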

February 22, 2013 at 3:55 pm #1015713

giebler
Participant

I also can’t get it to recognize our company name (IMS), which I added to the .dic file as shown here:

```
IMRIE IH M ER IY
IMS AY EH M EH S
IMUS AY M AH S
```

Any suggestions for this one?

Thanks!
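
The IMS entry spells the name out letter by letter. Entries like that can be built from letter-name pronunciations; this sketch assumes the CMU phone set without stress marks (as in the entries in this thread) and a tab separator, and `spelled_out_entry` is a hypothetical helper. Acronyms that are said as a word (like IMUS) still need a hand-written entry.

```python
# Letter-name pronunciations in the CMU phone set, stress marks omitted.
LETTER_PHONES = {
    "A": "EY", "B": "B IY", "C": "S IY", "D": "D IY", "E": "IY",
    "F": "EH F", "G": "JH IY", "H": "EY CH", "I": "AY", "J": "JH EY",
    "K": "K EY", "L": "EH L", "M": "EH M", "N": "EH N", "O": "OW",
    "P": "P IY", "Q": "K Y UW", "R": "AA R", "S": "EH S", "T": "T IY",
    "U": "Y UW", "V": "V IY", "W": "D AH B AH L Y UW", "X": "EH K S",
    "Y": "W AY", "Z": "Z IY",
}


def spelled_out_entry(acronym: str) -> str:
    """Return a tab-separated dictionary line for an acronym spoken letter by letter."""
    letters = [c for c in acronym.upper() if c.isalpha()]
    phones = " ".join(LETTER_PHONES[c] for c in letters)
    return f"{acronym.upper()}\t{phones}"
```

`spelled_out_entry("IMS")` reproduces the poster’s own entry, `IMS` followed by `AY EH M EH S`.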

February 22, 2013 at 3:59 pm #1015714

giebler
Participant

I’m adding these entries to the cmu07a.dic file and then generating my .dic file by adding Q1, Q2, Q3, Q4 and IMS to the language array. I’ll download my language files to make sure they ended up there…
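
The entries posted earlier sit in alphabetical position (Q.S, then Q1 through Q4, then QANA). If you keep the master dictionary sorted, a sketch like this inserts custom lines at their sorted position. It assumes the file is already sorted by Python’s default string order (code-point order, which matches ASCII for these entries); whether OpenEars requires sorted order is not stated in this thread, so treat it as housekeeping. `word_of` and `insert_entries` are hypothetical helpers.

```python
import bisect
from typing import List


def word_of(line: str) -> str:
    """The word column: everything before the first run of whitespace."""
    return line.split(None, 1)[0]


def insert_entries(sorted_lines: List[str], new_lines: List[str]) -> List[str]:
    """Insert dictionary lines into an already-sorted dictionary.

    An existing entry for the same word is replaced rather than duplicated.
    """
    result = list(sorted_lines)
    keys = [word_of(line) for line in result]
    for line in new_lines:
        word = word_of(line)
        i = bisect.bisect_left(keys, word)
        if i < len(keys) and keys[i] == word:
            result[i] = line  # replace the old pronunciation
        else:
            keys.insert(i, word)
            result.insert(i, line)
    return result
```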

February 22, 2013 at 4:01 pm #1015715

Halle Winkler
Politepix

Yes, step one is definitely making sure that these new words are present in your language model and phonetic dictionary. Also, turn on verbosePocketsphinx so you receive any complaints from pocketsphinx about your language model or dictionary.
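
Checking that the new words really made it into both generated files can be scripted. This sketch reads the unigram section of an ARPA-format language model (lines of log10 probability, word, optional backoff) and the word column of the .dic file; `missing_words` and its helpers are hypothetical, and it complements rather than replaces the pocketsphinx logging.

```python
from typing import Dict, Iterable, List, Set


def arpa_unigrams(arpa_text: str) -> Set[str]:
    """Collect the words listed in the 1-grams section of an ARPA model."""
    words: Set[str] = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):  # a backslash header starts the next section
                break
            fields = line.split()
            if len(fields) >= 2:
                # Unigram lines are: log10 probability, word, optional backoff.
                words.add(fields[1])
    return words


def dictionary_words(dic_text: str) -> Set[str]:
    """Collect the word column of a phonetic dictionary."""
    return {line.split()[0] for line in dic_text.splitlines() if line.strip()}


def missing_words(
    expected: Iterable[str], arpa_text: str, dic_text: str
) -> Dict[str, List[str]]:
    """Report which expected words are absent from the model or the dictionary."""
    lm = arpa_unigrams(arpa_text)
    dic = dictionary_words(dic_text)
    wanted = list(expected)
    return {
        "missing_from_lm": sorted(w for w in wanted if w not in lm),
        "missing_from_dic": sorted(w for w in wanted if w not in dic),
    }
```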

February 22, 2013 at 4:02 pm #1015716

Halle Winkler
Politepix

Also turn on OpenEarsLogging and verboseCMUCLMTK so you get any relevant output from the process of generating the language models.

February 22, 2013 at 4:18 pm #1015717

giebler
Participant

Even though I was generating a new .dic file, it wasn’t being copied to the proper folder, so the app was still using the old one! Once I discovered that, your suggestions for Q1, Q2, Q3, Q4 and IMS are all working!
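
The root cause here was an old dictionary silently staying in use. A cheap guard, assuming you know both the path the generator writes to and the path the app actually loads (both hypothetical here), is to compare the two files’ hashes after each build:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_stale(generated: Path, in_use: Path) -> bool:
    """True if the copy the app loads is missing or differs from the fresh file."""
    if not in_use.exists():
        return True
    return file_digest(generated) != file_digest(in_use)
```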

Thanks!

February 22, 2013 at 4:24 pm #1015718

Halle Winkler
Politepix

Love to hear that :)