This topic has 11 replies, 2 voices, and was last updated 10 years ago by Halle Winkler.
November 18, 2012 at 4:25 pm #13072
radox1 (Participant)
I am working on a financial application where I would like the user to be able to input large numbers using their voice. For example, I would like a user to be able to input their salary as “twenty eight thousand five hundred” rather than “two eight five zero zero zero”.
I have looked around online for a number grammar which can support this but I have been unable to find one. As I imagine this is a common requirement I thought a grammar for this would be readily available. Could someone please point me in the right direction?
Thanks in advance.
November 18, 2012 at 4:28 pm #13073
Halle Winkler (Politepix)
Hello,
I’m not aware of a pre-rolled grammar for large numbers, sorry. I generally recommend not using JSGF due to slow performance and what seems like slightly buggy recognition in the engine. Have you tried generating a text corpus of number words and creating your own ARPA language model (like in this blog post: https://www.politepix.com/2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/)?
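As a rough illustration of that approach, a corpus along these lines could be generated with a short script before runtime. The sketch below is plain Python and not part of OpenEars; the output file name, the 0–999,999 range and the sampling step are only assumptions, and the word spellings would need to match the entries in your pronunciation dictionary.

-- Corpus generation sketch (Python) --

def number_to_words(n):
    """Convert an integer in 0-999,999 to its spoken English form."""
    units = ["zero", "one", "two", "three", "four", "five", "six",
             "seven", "eight", "nine", "ten", "eleven", "twelve",
             "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
             "eighteen", "nineteen"]
    tens = ["", "", "twenty", "thirty", "forty", "fifty",
            "sixty", "seventy", "eighty", "ninety"]
    if n < 20:
        return units[n]
    if n < 100:
        return (tens[n // 10] + (" " + units[n % 10] if n % 10 else "")).strip()
    if n < 1000:
        rest = n % 100
        return units[n // 100] + " hundred" + (" and " + number_to_words(rest) if rest else "")
    rest = n % 1000
    return number_to_words(n // 1000) + " thousand" + (" " + number_to_words(rest) if rest else "")

# One phrase per line; sampling a subset keeps the corpus a manageable size
# while still covering the word sequences that occur in real spoken numbers.
with open("NumberCorpus.txt", "w") as corpus:
    for n in range(0, 1000000, 137):  # the step is arbitrary, just to thin the list
        corpus.write(number_to_words(n).upper() + "\n")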
November 18, 2012 at 6:26 pm #13074
radox1 (Participant)
Hi Halle,
Thanks for the link. The text corpus needed to cover all of the possible numbers is going to be fairly large. Do you have any advice on converting the recognised strings back into numbers afterwards?
Ben
November 18, 2012 at 9:24 pm #13075
Halle Winkler (Politepix)
I’ve never thought about this task so this is not coming from a position of experience with it, but if the maximum is (for instance) 999,999, it seems to me that it would need [0-9], a set of tens incrementing by ten going up to “90”, a set of hundreds incrementing by 100 going up to “900”, and a set of thousands incrementing by 1000 going up to “9000”, so a model with a base set of 40 unigrams which have equal probability of being found in a particular bigram or trigram. Out of that you can make 999,999 with the available words “nine hundred”, “ninety”, “nine thousand”, “nine hundred”, “ninety”, “nine”. It seems that interpreting this back into digits should be possible to construct a ruleset for, since there are only a few variations on the correct statement of a number in English. I can also see why you would want a grammar, however, to have a rules-based recognition that you can be more confident about processing backwards into digits.
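As a sketch of what such a ruleset could look like (plain Python, assuming the recognised hypothesis arrives as uppercase, space-separated words such as TWENTY EIGHT THOUSAND FIVE HUNDRED; this is not an OpenEars API):

-- Words-to-number sketch (Python) --

WORD_VALUES = {
    "ZERO": 0, "ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
    "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9, "TEN": 10,
    "ELEVEN": 11, "TWELVE": 12, "THIRTEEN": 13, "FOURTEEN": 14,
    "FIFTEEN": 15, "SIXTEEN": 16, "SEVENTEEN": 17, "EIGHTEEN": 18,
    "NINETEEN": 19, "TWENTY": 20, "THIRTY": 30, "FORTY": 40,
    "FIFTY": 50, "SIXTY": 60, "SEVENTY": 70, "EIGHTY": 80, "NINETY": 90,
}

def words_to_number(phrase):
    """e.g. "TWENTY EIGHT THOUSAND FIVE HUNDRED" -> 28500."""
    total, current = 0, 0
    for word in phrase.split():
        if word == "AND":                  # filler word, carries no value
            continue
        if word in WORD_VALUES:
            current += WORD_VALUES[word]   # accumulate the group below one hundred
        elif word == "HUNDRED":
            current *= 100                 # "FIVE HUNDRED" -> 500
        elif word == "THOUSAND":
            total += current * 1000        # close out the thousands group
            current = 0
        elif word == "MILLION":
            total += current * 1000000
            current = 0
        else:
            raise ValueError("unexpected word: " + word)
    return total + current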
November 18, 2012 at 11:01 pm #13077
radox1 (Participant)
I have tried to implement something similar and it seems to be working fairly well.
I have included “and” as this is often used within numbers, e.g. “nine hundred and eighty one”.
One issue I am having is that “thirty”, “fifty” and “eighty” are often misrecognised as one another.
I will try adding “one hundred”, “two hundred” … into the grammar as this should make it slightly easier to parse.
-- Current grammar --
ONE
TWO
THREE
FOUR
FIVE
SIX
SEVEN
EIGHT
NINE
TEN
ELEVEN
TWELVE
THIRTEEN
FOURTEEN
FIFTEEN
SIXTEEN
SEVENTEEN
EIGHTEEN
NINETEEN
TWENTY
THIRTY
FORTY
FIFTY
SIXTY
SEVENTY
EIGHTY
NINETY
HUNDRED
THOUSAND
MILLION
POUND
PEE
PENCE
AND
November 18, 2012 at 11:06 pm #13078
Halle Winkler (Politepix)
Looks like a good start. There might be an accent bias hurting accuracy, since the default acoustic model is trained on US English speech. You might want to adapt the model to a variety of UK accents using your number set as the speech corpus. This may get you some improvement with the thirty/fifty/eighty issue.
November 19, 2012 at 12:07 am #13080
radox1 (Participant)
Halle, how would I go about using my number set as a speech corpus?
November 19, 2012 at 12:19 am #13081
Halle Winkler (Politepix)
To learn about how an acoustic model is adapted you probably want to check out the CMU Sphinx project; since acoustic model adaptation isn’t part of OpenEars, it isn’t something I can support here beyond pointing you to the docs at the CMU project: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
The corpus of speech you would want to use in order to adapt to a UK accent for your particular application would have a number of different speakers with the desired UK accents saying the words for which you want more accuracy (I would have them say all of the words in your language model). Basically you will want to make recordings of your speakers saying the words and then you will use the acoustic model adaptation method linked above to integrate their speech into the acoustic model. The result ought to be that your adapted acoustic model will get better at recognizing/distinguishing between those words in the accents you include. The acoustic model you end up with can be used with OpenEars just like the default acoustic model.
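One possible way to lay out the companion listing files that the adaptation tools expect is sketched below; the file names, speaker IDs and phrases are purely illustrative, and the exact format and steps are described in the CMU tutorial linked above.

-- Adaptation listing files sketch (Python) --

# Each entry pairs a recording's base name (matching its .wav file)
# with the words the speaker read; the phrases should use the same
# spellings as the entries in the pronunciation dictionary.
recordings = [
    ("uk_speaker1_0001", "TWENTY EIGHT THOUSAND FIVE HUNDRED"),
    ("uk_speaker1_0002", "THIRTY FIFTY EIGHTY"),
    # ... one entry per recorded utterance
]

with open("numbers.fileids", "w") as fileids, \
     open("numbers.transcription", "w") as transcription:
    for file_id, phrase in recordings:
        fileids.write(file_id + "\n")
        # Transcription lines take the form "<s> words </s> (file_id)".
        transcription.write("<s> %s </s> (%s)\n" % (phrase, file_id))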
November 19, 2012 at 1:44 am #13082
radox1 (Participant)
Thanks for the link. I will definitely look into that!
One more thing. Is there a way to queue things to be spoken?
Currently, if I ask the FliteController to say something whilst it is already talking, it ignores the request. Ideally I’d like it to queue the request and start it when the previous speech has finished. Will I need to implement this behaviour manually?
November 19, 2012 at 8:26 am #13083
Halle Winkler (Politepix)
This isn’t a feature of FliteController, but NeatSpeech operates with a queue and it renders the new speech in the background so that it generally starts playing instantly when the previous speech is complete, and it has a male and female UK voice.
April 21, 2014 at 3:31 pm #1020916
Halle Winkler (Politepix)
Please check out the new dynamic grammar generation added with version 1.7: https://www.politepix.com/2014/04/10/openears-1-7-introducing-dynamic-grammar-generation/
April 24, 2014 at 6:07 pm #1021025
Halle Winkler (Politepix)
In addition to the dynamic grammar generation that has been added to stock OpenEars in version 1.7, there is also a new plugin called RuleORama which can use the same API in order to generate grammars which are a bit faster and compatible with RapidEars: https://www.politepix.com/ruleorama