Duplicate words in dictionary

This topic has 2 replies, 2 voices, and was last updated 8 years, 2 months ago by Halle Winkler.

Viewing 3 posts - 1 through 3 (of 3 total)


Author

Posts
February 6, 2016 at 9:18 pm #1027839

touchapptech
Participant

I’ve found that if I add certain words, such as “CLOSE” or “RESUME”, to the array passed to generateLanguageModelFromArray or the dictionary passed to generateGrammarFromDictionary, then when Pocketsphinx starts listening I get this:

2016-02-06 13:42:30.115 myapp[1048:661228] Project has these words or phrases in its dictionary:
CLOSE
CLOSE(2)

or

2016-02-06 13:45:00.021 myapp[1060:662742] Project has these words or phrases in its dictionary:
RESUME
RESUME(2)
RESUME(3)

I even tested it by adding just the single word with no other words, and I still get the duplicates. Not all words trigger this, but others I’ve found that do are “FAVORITES”, “TV”, “ENTER” and “EXIT”, and I’m guessing there are more. I don’t know what I’m doing to cause this. I can give you the complete logs if you need them, but I thought you might know off the top of your head what causes this, and whether it is a problem or just something I should ignore. I’m trying to keep the language model as small as possible, so I would like to figure out how to keep the duplicates out.

Thanks!

February 6, 2016 at 9:27 pm #1027840

touchapptech
Participant

So I was just looking at the words again and realized there are two pronunciations of “close” and two (that I can think of) for “resume”, so I guess that accounts for some of it. But I’m not sure why it would cause duplicates for “exit”, “enter” or “favorites”. And is there a way, even for a word like “close” (in my case it is as in “close the door”), to keep it from generating the duplicates?

Thanks!

February 6, 2016 at 9:57 pm #1027841

Halle Winkler
Politepix

Hi,

This is correct behavior. An alternate pronunciation in the dictionary means there is a common accent that uses it, so if you include only the ones that correspond to your own accent, users with other accents will be excluded. It shouldn’t lead to any unwanted behavior. A recognition of any of the pronunciations will be returned in the hypothesis as just the word itself, without the (2) or (3), since that part is managed by the grammar or language model (which has only the one textual representation) rather than by the pronunciation dictionary.
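To make the (2)/(3) convention concrete, here is a minimal Python sketch. It assumes the pronunciation dictionary uses a CMUdict-style line format (“WORD PHONE PHONE …”) with alternates written as WORD(2), WORD(3). It groups the variant entries under their base word, showing that CLOSE and CLOSE(2) still count as a single vocabulary word. The helper names base_word and group_pronunciations are hypothetical and not part of the OpenEars API, and the phone strings are illustrative only.

```python
import re
from collections import defaultdict

# Matches a trailing "(n)" alternate-pronunciation marker, e.g. "CLOSE(2)".
_VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def base_word(entry: str) -> str:
    """Hypothetical helper: strip a trailing "(n)" marker, so "RESUME(3)" -> "RESUME"."""
    return _VARIANT_SUFFIX.sub("", entry)


def group_pronunciations(dictionary_lines):
    """Hypothetical helper: group CMUdict-style lines ("WORD PH1 PH2 ...") by base word."""
    groups = defaultdict(list)
    for line in dictionary_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, phones = parts[0], parts[1:]
        groups[base_word(word)].append(" ".join(phones))
    return dict(groups)


# Illustrative entries only (assumed format), not copied from the OpenEars dictionary.
sample = [
    "CLOSE K L OW S",
    "CLOSE(2) K L OW Z",
]
grouped = group_pronunciations(sample)
print(grouped)       # {'CLOSE': ['K L OW S', 'K L OW Z']}
print(len(grouped))  # 1: one vocabulary word with two pronunciations
```

The point of the grouping is that the number of distinct words is what the grammar or language model sees; the extra dictionary lines only add ways of saying those same words.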