Custom Acoustic Model – Missing g2p file

This topic has 3 replies, 2 voices, and was last updated 7 years, 12 months ago by Halle Winkler.


April 25, 2016 at 11:10 pm #1030187

Powerkey

Participant

I am trying to create a custom AcousticModel for my app. I simply replaced the contents of the LMGLL.text file in the AcousticModelEnglish bundle with my manually generated list of names.

My corpus text contains entries like…

John-Smith
Jane-Doe

My modified LookupList contains entries like…

John-Smith<tab>JH AA N S M IH TH
Jane-Doe<tab>JH EY N D OW

When the app runs I get the message that the word John-Smith does not exist in the dictionary and that it is using the fallback method to generate graphemes.

2016-04-25 13:43:46.908 TalkApp[9674:5082253] Starting OpenEars logging for OpenEars version 2.501 on 64-bit device (or build): iPhone running iOS version: 9.300000
2016-04-25 13:43:46.971 TalkApp[9674:5082253] Starting dynamic language model generation
.
INFO: ngram_model_arpa_legacy.c(504): ngrams 1=1027, 2=2050, 3=1025
INFO: ngram_model_arpa_legacy.c(136): Reading unigrams
INFO: ngram_model_arpa_legacy.c(543):     1027 = #unigrams created
INFO: ngram_model_arpa_legacy.c(196): Reading bigrams
INFO: ngram_model_arpa_legacy.c(561):     2050 = #bigrams created
INFO: ngram_model_arpa_legacy.c(562):        3 = #prob2 entries
INFO: ngram_model_arpa_legacy.c(570):        3 = #bo_wt2 entries
INFO: ngram_model_arpa_legacy.c(293): Reading trigrams
INFO: ngram_model_arpa_legacy.c(583):     1025 = #trigrams created
INFO: ngram_model_arpa_legacy.c(584):        2 = #prob3 entries
INFO: ngram_model_dmp_legacy.c(521): Building DMP model...
INFO: ngram_model_dmp_legacy.c(551):     1027 = #unigrams created
INFO: ngram_model_dmp_legacy.c(652):     2050 = #bigrams created
INFO: ngram_model_dmp_legacy.c(653):        3 = #prob2 entries
INFO: ngram_model_dmp_legacy.c(660):        3 = #bo_wt2 entries
INFO: ngram_model_dmp_legacy.c(664):     1025 = #trigrams created
INFO: ngram_model_dmp_legacy.c(665):        2 = #prob3 entries
2016-04-25 13:43:47.018 TalkApp[9674:5082253] Done creating language model with CMUCLMTK in 0.046206 seconds.
2016-04-25 13:43:47.019 TalkApp[9674:5082253] Since there is no cached version, loading the language model lookup list for the acoustic model called AcousticModelEnglish
2016-04-25 13:43:47.020 TalkApp[9674:5082253] The word John-Smith was not found in the dictionary of the acoustic model /Users/powerkey/Library/Developer/CoreSimulator/Devices/504E24B0-3556-4CFE-BAA8-E316926491B2/data/Containers/Bundle/Application/F11E5786-82FE-4C84-8A6D-5DF547950513/TalkApp.app/AcousticModelEnglish.bundle. Now using the fallback method to look it up. If this is happening more frequently than you would expect, likely causes can be that you are entering words in another language from the one you are recognizing, or that there are symbols (including numbers) that need to be spelled out or cleaned up, or you are using your own acoustic model and there is an issue with either its phonetic dictionary or it lacks a g2p file. Please get in touch at the forums for assistance with the last two possible issues.
2016-04-25 13:43:47.020 TalkApp[9674:5082253] Using convertGraphemes for the word or phrase john which doesn't appear in the dictionary
2016-04-25 13:43:47.021 TalkApp[9674:5082253] Using convertGraphemes for the word or phrase smith which doesn't appear in the dictionary
2016-04-25 13:43:47.022 TalkApp[9674:5082253] the graphemes "JH AA N S M IH TH" were created for the word John-Smith using the fallback method.

I expected that if a word in the corpus matches an entry in the LookupList, its pronunciation would be taken from the LookupList rather than generated by the fallback method. I seem to be missing something.

I also tried creating an AcousticModelCustom bundle (leaving AcousticModelEnglish alone) with my custom names, but now I get additional messages about a missing g2p file.

2016-04-25 14:04:30.109 TalkApp[9714:5137214] Starting OpenEars logging for OpenEars version 2.501 on 64-bit device (or build): iPhone running iOS version: 9.300000
2016-04-25 14:04:30.173 TalkApp[9714:5137214] Starting dynamic language model generation
.
INFO: ngram_model_arpa_legacy.c(504): ngrams 1=1028, 2=2051, 3=1026
INFO: ngram_model_arpa_legacy.c(136): Reading unigrams
INFO: ngram_model_arpa_legacy.c(543):     1028 = #unigrams created
INFO: ngram_model_arpa_legacy.c(196): Reading bigrams
INFO: ngram_model_arpa_legacy.c(561):     2051 = #bigrams created
INFO: ngram_model_arpa_legacy.c(562):        3 = #prob2 entries
INFO: ngram_model_arpa_legacy.c(570):        3 = #bo_wt2 entries
INFO: ngram_model_arpa_legacy.c(293): Reading trigrams
INFO: ngram_model_arpa_legacy.c(583):     1026 = #trigrams created
INFO: ngram_model_arpa_legacy.c(584):        2 = #prob3 entries
INFO: ngram_model_dmp_legacy.c(521): Building DMP model...
INFO: ngram_model_dmp_legacy.c(551):     1028 = #unigrams created
INFO: ngram_model_dmp_legacy.c(652):     2051 = #bigrams created
INFO: ngram_model_dmp_legacy.c(653):        3 = #prob2 entries
INFO: ngram_model_dmp_legacy.c(660):        3 = #bo_wt2 entries
INFO: ngram_model_dmp_legacy.c(664):     1026 = #trigrams created
INFO: ngram_model_dmp_legacy.c(665):        2 = #prob3 entries
2016-04-25 14:04:30.202 TalkApp[9714:5137214] Done creating language model with CMUCLMTK in 0.028684 seconds.
2016-04-25 14:04:30.203 TalkApp[9714:5137214] Since there is no cached version, loading the language model lookup list for the acoustic model called AcousticModelCustom
2016-04-25 14:04:30.204 TalkApp[9714:5137214] Since there is no cached version, loading the g2p model for the acoustic model called AcousticModelCustom
2016-04-25 14:04:30.204 TalkApp[9714:5137214] Error: an attempt was made to load the g2p file for the acoustic model at the path /Users/powerkey/Library/Developer/CoreSimulator/Devices/504E24B0-3556-4CFE-BAA8-E316926491B2/data/Containers/Bundle/Application/D106895C-2174-40BB-AABA-FF2145542035/TalkApp.app/AcousticModelCustom.bundle and it wasn't possible to complete.  This file does not appear to exist. Please ask for help in the forums and be sure to turn on all logging. An exception or unpredictable behavior should be expected now since this file is a requirement.
2016-04-25 14:04:30.204 TalkApp[9714:5137214] Error: a g2p is missing in a case where one will be needed. Expect an exception shortly. If you need help getting a new acoustic model set up with a g2p please come by the forums and inquire.
2016-04-25 14:04:30.205 TalkApp[9714:5137214] The word John-Smith was not found in the dictionary of the acoustic model /Users/powerkey/Library/Developer/CoreSimulator/Devices/504E24B0-3556-4CFE-BAA8-E316926491B2/data/Containers/Bundle/Application/D106895C-2174-40BB-AABA-FF2145542035/TalkApp.app/AcousticModelCustom.bundle. Now using the fallback method to look it up. If this is happening more frequently than you would expect, likely causes can be that you are entering words in another language from the one you are recognizing, or that there are symbols (including numbers) that need to be spelled out or cleaned up, or you are using your own acoustic model and there is an issue with either its phonetic dictionary or it lacks a g2p file. Please get in touch at the forums for assistance with the last two possible issues.
2016-04-25 14:04:30.205 TalkApp[9714:5137214] the graphemes "" were created for the word John-Smith using the fallback method.

April 26, 2016 at 8:50 am #1030193

Halle Winkler

Politepix

Hi,

The only modification I support is adding entries to an existing acoustic model lookup list in the alphabetically-correct location, but not altering the bundle contents or removing entries from the lookup list, sorry.
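For readers preparing such additions offline, here is a minimal Python sketch of adding entries at the alphabetically-correct location without altering or removing anything that is already there. It is not part of OpenEars. It assumes the lookup list holds one `WORD<tab>PHONEMES` entry per line, as in the examples earlier in this thread, and that the existing file is ordered by plain code-point comparison of the word column; check that against the real file before relying on it. The helper name `insert_entries` and the sample words are hypothetical.

```python
import bisect


def insert_entries(lines, new_entries):
    """Return a copy of `lines` with `new_entries` merged in at sorted positions.

    lines:       existing "WORD<TAB>PHONEMES" strings, ASSUMED to be sorted
                 by the word column using plain code-point order.
    new_entries: dict mapping word -> phoneme string.

    Existing entries are never changed or removed; a word that is already
    present is skipped.
    """
    result = list(lines)
    # The word column is everything before the first tab on each line.
    keys = [line.split("\t", 1)[0] for line in result]
    for word, phones in sorted(new_entries.items()):
        pos = bisect.bisect_left(keys, word)
        if pos < len(keys) and keys[pos] == word:
            continue  # already in the list: leave the original entry alone
        keys.insert(pos, word)
        result.insert(pos, f"{word}\t{phones}")
    return result


# Illustrative data only; these are not lines from the shipped lookup list.
existing = ["JANE\tJH EY N", "JOHN\tJH AA N", "SMITH\tS M IH TH"]
additions = {
    "JOHN-SMITH": "JH AA N S M IH TH",
    "JANE-DOE": "JH EY N D OW",
}
merged = insert_entries(existing, additions)
```

If the real file uses a different ordering (for example case-insensitive), the comparison in this sketch would need to change to match it.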

April 26, 2016 at 6:40 pm #1030195

Powerkey

Participant

Okay. I think I can work within that process, but I have a few questions to make sure I understand the details.

1. Will pocketsphinx only recognize full names if my corpus contains only full names?

2. Does the fallback method utilize the lookup list in AcousticModelEnglish? i.e. Would an error in the English lookup list cause problems with the fallback method?

3. If I duplicate (in the Finder) the AcousticModelEnglish.bundle, rename it to AcousticModelCustom.bundle, add it to my project and point the pathToModel method to the Custom bundle, would you expect that to work? Or, should I just modify the lookup list in the English bundle?

April 26, 2016 at 7:07 pm #1030196

Halle Winkler

Politepix

Hi,

1. Will pocketsphinx only recognize full names if my corpus contains only full names?

I think we already talked this one through in your previous question, but if we haven’t, clarify it a little more with reference to your previous questions so I can understand what differentiates it, thanks.

2. Does the fallback method utilize the lookup list in AcousticModelEnglish? i.e. Would an error in the English lookup list cause problems with the fallback method?

Sorry, the question is a bit outside of the scope of support here – make sure that you only add entries to the lookup list which are valid and in the alphabetically-correct position so there is no need to discuss acoustic model failure states. If your changes to the lookup list lead to functionality issues you should remove them.

3. If I duplicate (in the Finder) the AcousticModelEnglish.bundle, rename it to AcousticModelCustom.bundle, add it to my project and point the pathToModel method to the Custom bundle, would you expect that to work? Or, should I just modify the lookup list in the English bundle?

That should work fine.
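Since the earlier AcousticModelCustom attempt logged a missing g2p file, it may be worth confirming that a duplicated bundle still contains every file the original has. The following is a minimal sketch, not an OpenEars tool: it only compares top-level file names in two directories, and the function name `missing_files` and the command-line usage are assumptions made for illustration.

```python
import os
import sys


def missing_files(original_dir, copy_dir):
    """Return the sorted names found in original_dir but absent from copy_dir.

    Only top-level names are compared; file contents are not inspected.
    """
    original = set(os.listdir(original_dir))
    copy = set(os.listdir(copy_dir))
    return sorted(original - copy)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python check_bundle.py AcousticModelEnglish.bundle AcousticModelCustom.bundle
    for name in missing_files(sys.argv[1], sys.argv[2]):
        print("missing from copy:", name)
```

An empty result only shows that no file name was dropped during duplication; it says nothing about whether edits made to the lookup list afterwards are valid.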
