Large Number Grammar – Politepix

Large Number Grammar

radox1 — Sun, 18 Nov 2012 15:25:08 +0000

I am working on a financial application where I would like the user to be able to input large numbers by voice. For example, I would like a user to be able to input their salary as “twenty eight thousand five hundred” rather than “two eight five zero zero”.

I have looked around online for a number grammar which can support this but I have been unable to find one. As I imagine this is a common requirement I thought a grammar for this would be readily available. Could someone please point me in the right direction?

Thanks in advance.

Reply To: Large Number Grammar

Halle Winkler — Sun, 18 Nov 2012 15:28:52 +0000

Hello,

I’m not aware of a pre-rolled grammar for large numbers, sorry. I generally recommend not using JSGF due to slow performance and what seems like slightly buggy recognition in the engine. Have you tried generating a text corpus of number words and creating your own ARPA language model (like in this blog post: /2012/11/02/openears-tips-1-create-a-language-model-before-runtime-from-a-text-file/)?
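As a rough illustration of generating such a corpus, here is a minimal Python sketch that spells out integers as number words, one phrase per line. The function names (`below_thousand`, `number_to_words`, `build_corpus`) are hypothetical, the 1 to 999,999 range is an assumption, and the optional “and” is left out. The resulting text would still need to be turned into a language model as described in the linked post.

```python
# Words for 1-19; index 0 is a placeholder so the index matches the value.
ONES = ["", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
        "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
        "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"]
# Words for the tens; indices 0 and 1 are unused placeholders.
TENS = ["", "", "TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
        "EIGHTY", "NINETY"]


def below_thousand(n):
    """Return the list of words for 0 <= n <= 999 (empty list for 0)."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "HUNDRED"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest:
        words.append(ONES[rest])
    return words


def number_to_words(n):
    """Spell out an integer in the assumed range 1..999,999."""
    if not 0 < n < 1_000_000:
        raise ValueError("only 1..999,999 is supported in this sketch")
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words += below_thousand(thousands) + ["THOUSAND"]
    words += below_thousand(rest)
    return " ".join(words)


def build_corpus(numbers):
    """Join the spelled-out numbers into corpus text, one phrase per line."""
    return "\n".join(number_to_words(n) for n in numbers)
```

For example, `number_to_words(28500)` gives `TWENTY EIGHT THOUSAND FIVE HUNDRED`. Sampling a subset of numbers rather than writing out every value may keep the corpus file to a manageable size.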

radox1 — Sun, 18 Nov 2012 17:26:12 +0000

Hi Halle,

Thanks for the link. The text corpus to detect all of the possible numbers is going to be fairly large. Do you have any advice on then going back from the recognised strings to numbers?

Ben

Halle Winkler — Sun, 18 Nov 2012 20:24:24 +0000

I’ve never thought about this task, so this is not coming from a position of experience with it. If the maximum is (for instance) 999,999, it seems to me that the model would need the digits [0-9], a set of tens incrementing by ten up to “90”, a set of hundreds incrementing by 100 up to “900”, and a set of thousands incrementing by 1000 up to “9000”: a base set of about 40 unigrams, each with an equal probability of being found in a particular bigram or trigram. Out of that you can make 999,999 with the available words “nine hundred”, “ninety”, “nine thousand”, “nine hundred”, “ninety”, “nine”. Interpreting this back into digits should be possible with a ruleset, since there are only a few variations on the correct statement of a number in English. I can also see why you would want a grammar, however: rules-based recognition gives you output that you can be more confident about processing backwards into digits.
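One possible shape for such a ruleset, sketched in Python purely as an illustration: keep a running group value, multiply it on “hundred”, and flush it into the total on “thousand” or “million”. The function name `words_to_number` and the vocabulary tables are assumptions for this sketch. It does not check that the words arrive in a sensible order, so a real application would probably want extra validation.

```python
# Values of the words that simply add to the current group.
SMALL = {
    "ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5, "SIX": 6,
    "SEVEN": 7, "EIGHT": 8, "NINE": 9, "TEN": 10, "ELEVEN": 11,
    "TWELVE": 12, "THIRTEEN": 13, "FOURTEEN": 14, "FIFTEEN": 15,
    "SIXTEEN": 16, "SEVENTEEN": 17, "EIGHTEEN": 18, "NINETEEN": 19,
    "TWENTY": 20, "THIRTY": 30, "FORTY": 40, "FIFTY": 50, "SIXTY": 60,
    "SEVENTY": 70, "EIGHTY": 80, "NINETY": 90,
}
# Words that close a group and scale it.
SCALES = {"THOUSAND": 1_000, "MILLION": 1_000_000}


def words_to_number(hypothesis):
    """Convert a recognised phrase such as 'nine hundred and eighty one' to an int."""
    total = 0    # sum of the groups already closed by THOUSAND or MILLION
    current = 0  # value of the group being built (0..999)
    for word in hypothesis.upper().split():
        if word == "AND":
            continue  # filler word, carries no value
        if word in SMALL:
            current += SMALL[word]
        elif word == "HUNDRED":
            # A bare "hundred" is treated as "one hundred".
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unexpected word: {word}")
    return total + current
```

With this sketch, `words_to_number("twenty eight thousand five hundred")` returns `28500`. Currency words would need to be split off before calling it, since anything outside the tables raises a `ValueError`.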

radox1 — Sun, 18 Nov 2012 22:01:08 +0000

I have tried to implement something similar and it seems to be working fairly well.

I have included “and” as this is often used within numbers, for example “nine hundred and eighty one”.

One issue I am having is that “thirty”, “fifty” and “eighty” are often wrongly identified as each other.

I will try adding “one hundred”, “two hundred” … into the grammar as this should make it slightly easier to parse.

-- Current grammar --

ONE
TWO
THREE
FOUR
FIVE
SIX
SEVEN
EIGHT
NINE
TEN
ELEVEN
TWELVE
THIRTEEN
FOURTEEN
FIFTEEN
SIXTEEN
SEVENTEEN
EIGHTEEN
NINETEEN
TWENTY
THIRTY
FORTY
FIFTY
SIXTY
SEVENTY
EIGHTY
NINETY
HUNDRED
THOUSAND
MILLION
POUND
PEE
PENCE
AND

Halle Winkler — Sun, 18 Nov 2012 22:06:41 +0000

Looks like a good start. There might be an accent bias hurting accuracy, since the default acoustic model is composed of US speech. You might want to adapt the model to a variety of UK accents using your number set as the speech corpus. This may get you some improvement with the thirty/fifty/eighty issue.

radox1 — Sun, 18 Nov 2012 23:07:13 +0000

Halle, how would I go about using my number set as a speech corpus?

Halle Winkler — Sun, 18 Nov 2012 23:19:16 +0000

To learn how an acoustic model is adapted, you will probably want to check out the CMU Sphinx project. Adaptation isn’t part of OpenEars, so I can’t support it from here beyond pointing you to the CMU docs: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

The corpus of speech you would want to use in order to adapt to a UK accent for your particular application would have a number of different speakers with the desired UK accents saying the words for which you want more accuracy (I would have them say all of the words in your language model). Basically you will want to make recordings of your speakers saying the words and then you will use the acoustic model adaptation method linked above to integrate their speech into the acoustic model. The result ought to be that your adapted acoustic model will get better at recognizing/distinguishing between those words in the accents you include. The acoustic model you end up with can be used with OpenEars just like the default acoustic model.
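As a small aid for organising those recordings, here is a hedged Python sketch that pairs each word with a recording ID and a transcription line. The helper name `adaptation_prompts` and the ID scheme are hypothetical. The `<s> ... </s> (file_id)` line layout is my understanding of what the linked adaptation tutorial expects, so it should be checked against the tutorial before use.

```python
def adaptation_prompts(words, speaker):
    """Return (file IDs, transcription lines) for one speaker's recordings.

    Assumption: one recording per word, named <speaker>_<4-digit index>.
    The case of each word is kept as given and should match the
    pronunciation dictionary used with the acoustic model.
    """
    fileids = []
    transcriptions = []
    for index, word in enumerate(words, start=1):
        file_id = f"{speaker}_{index:04d}"
        fileids.append(file_id)
        # Assumed transcription layout: <s> text </s> (file_id)
        transcriptions.append(f"<s> {word} </s> ({file_id})")
    return fileids, transcriptions
```

Calling this once per speaker with the full word list would give one prompt sheet per person, which could then be written out to the ID and transcription files that the adaptation process reads.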

radox1 — Mon, 19 Nov 2012 00:44:25 +0000

Thanks for the link. I will definitely look into that!

One more thing. Is there a way to queue things to be spoken?

Currently, if I request the fliteController to say something whilst it is already talking, it ignores the request. Ideally I’d like it to queue the request and start it when the previous speech has stopped. Will I need to manually implement this behaviour?
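The queuing behaviour described above could be sketched roughly as follows. This is a language-neutral illustration written in Python, not OpenEars code: the `SpeechQueue` class, its `speak` callable and the `speech_finished` method are all hypothetical, and it assumes the speech engine provides some notification when an utterance ends.

```python
from collections import deque


class SpeechQueue:
    """Queue utterances so each one starts only after the previous one ends."""

    def __init__(self, speak):
        self._speak = speak      # callable that starts speaking a string
        self._pending = deque()  # utterances waiting for their turn
        self._speaking = False   # True while an utterance is in progress

    def say(self, text):
        """Speak immediately if idle, otherwise queue the request."""
        if self._speaking:
            self._pending.append(text)
        else:
            self._speaking = True
            self._speak(text)

    def speech_finished(self):
        """Call this from the engine's 'finished speaking' notification."""
        if self._pending:
            self._speak(self._pending.popleft())
        else:
            self._speaking = False
```

The design choice is to keep a single flag plus a FIFO, so requests are never dropped and are played in the order they were made.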

Halle Winkler — Mon, 19 Nov 2012 07:26:28 +0000

This isn’t a feature of FliteController, but NeatSpeech operates with a queue and it renders the new speech in the background so that it generally starts playing instantly when the previous speech is complete, and it has a male and female UK voice.