'Key Phrase Spotting' Guidance




    Hello Halle –

    First of all, once again, thank you for the awesome work on these plugins. I have really enjoyed using them in my app. We are down to the ‘nitty gritty’ and trying to ensure that everything is working flawlessly with our app, hence my hope to reach out to you for some advice on ‘key phrase spotting’.

    I was hoping to use OpenEars in my app for a ‘key phrase’ spotting task of recognizing the phrase ‘Hello Focus’.

    Now, when I add the phrase ‘Hello Focus’ to the regular language model, it mostly works, but occasionally it only hears one word of the phrase: ‘hello’ or ‘focus’. What I like about the standard language model is our ability to use Rejecto to ignore invalid speech.

    At the moment, I am using RuleORama to detect the phrase ‘Hello Focus’, because it is close to flawless when I speak it. Unfortunately, it allows a broad array of false positives; e.g., ‘Yellow Mocus’ sets it off. So, the problem that we are running into is that regular speech sets off the keyword.
    This happened in a demo this morning, which was pretty much the only issue with the app. So, this is obviously my top priority for development.

    That being said, we have a FluentSoft license and could use their keyword spotting technology. It is my hope that you have some advice so that we can make this work with OpenEars.

    Any thoughts? Thanks for your help!


    Halle Winkler


    A perfect rhyme with a matching number of syllables at a distance would be expected to result in a low-probability match (and probably wouldn’t be attempted during a demo, since there’s probably no reason to try out “yellow mocus” during a demo or expect users to accidentally say it in practice, though I’m just guessing here). So am I right that the issue isn’t about matches for perfect rhymes, but about other utterances that can commonly occur in real-world usage? So I can help troubleshoot, can you give an example of an utterance that gives a false positive match in real-world practice that wouldn’t be expected to?


    Hey Halle –

    Thanks for the response. There must be quite a few ‘false positives’ for hello focus, because it does go off “quite a bit” (ambiguous, I know) during conversation. That being said, to the extent that I’ve hunted for false positives, I had the app listening while I had some YouTube videos running, and here are some of the phrases that triggered it:

    “Go for it, Mr. Robbins”
    “Shell of this Person”

    Those are the only two phrases that I know for sure, but I can run some more tests and get back to you.

    One thought that I had was to identify as many of these ‘false positives’ as I can and add them to the grammar, to be ignored later if triggered. It just doesn’t seem very scientific.

    I know that this has been said before, but I am really looking to keep the spot-on hair-trigger recognition accuracy that I am achieving w/ RuleORama+RapidEars, yet get the exclusivity that I see happening w/ Rejecto.

    Right now, I can only guarantee one or the other, depending on technology. I am trying to find a way to ensure both. Is that too much to ask? (wink, wink). I kid.

    Thanks so much for your help and advice.


    I would like to add that we are also using RuleORama to listen for “command phrases”, e.g.:
    ‘Text Halle’
    ‘Call Halle’

    For the same reason as above, I have been using RuleORama for this task, because it nearly always hears the phrase and the whole phrase.

    Where we are running into problems is with false positives, just as in key phrase spotting. For example, if I say ‘Text Spaghetti Monster’ with the above grammar of only ‘Text Halle’ and ‘Call Halle’, it will match it to ‘Text Halle’.

    This is really the missing piece of the puzzle for us: figuring out how to make this listening technology more exclusionary. Right now, it is very forgiving in matching incoming audio to one of the entries in the grammar. I am looking to mimic the ability of Rejecto to throw away unwanted speech.

    Again – any help that you can offer would be appreciated.



    Another phrase that triggers ‘Hello Focus’ is ‘Hero Focus’.
    It’s very close, but not the same. Ideally, we would want the former to trigger and the latter to not.
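
    The near-miss makes sense if you look at the phrase at the phoneme level: in ARPAbet-style transcription (as used by the CMU pronouncing dictionary that OpenEars models are based on), ‘Hello Focus’ and ‘Hero Focus’ differ by only a couple of phones out of nine. A quick illustrative sketch (the pronunciations are my approximate ARPAbet transcriptions, and the distance function is just a plain Levenshtein calculation for illustration, not anything OpenEars actually computes):

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance over token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (x != y)))    # substitution
        prev = cur
    return prev[-1]

# Approximate ARPAbet pronunciations of the two phrases.
hello_focus = "HH AH L OW F OW K AH S".split()
hero_focus = "HH IH R OW F OW K AH S".split()

print(edit_distance(hello_focus, hero_focus))  # prints 2: only 2 of 9 phones differ
```

    With only two phones separating the phrases, a recognizer scoring noisy audio against a small grammar can easily prefer the in-grammar phrase, which is why near-rhymes like this are hard to exclude.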

    Halle Winkler

    Hi Tim,

    Maybe this would be a good case for setting up a replication case for me:


    The reason is that generally, when you create a wake-up phrase, the goal is that it shouldn’t sound much like utterances that are likely to be spoken, in order to avoid false positives (although some are inevitable). Many of the utterances you’re reporting will never be spoken or overheard in thousands of hours of incidental speech (hero focus, yellow mocus), so they primarily demonstrate that your choice of wake-up phrase was very good, rather than representing a case of false positives that would reward troubleshooting work. If you do extensive testing to try to eliminate rhymes that should be expected to be low-score recognitions and that are very unlikely to be spoken, you will unintentionally adjust your results towards false negatives, so it is important with speech applications to test against and debug relatively probable things.

    The overheard video speech is more important, but as you said, it is ambiguous in combination with the specific reports of syllable-matching rhymes that initiated the forum topic. So consider giving me a replication case as shown above so we can both see exactly the same results from a usage case which is probable. Note, it is very unlikely I can review any replication cases until 2016.

    I do have one suggestion for your wake-up phrase which maybe could help you get a slightly more RuleORama-like result with Rejecto – add your phrase to the LanguageModelGeneratorLookupList.text file in the acoustic model in the alphabetically correct place, i.e. right after these entries:


    add these entries:


    Please note that there is a tab between the word and the pronunciation and it has to remain there, with no spaces added to the beginning or end of the line.

    This will not prevent near-rhymes from being heard, but it will allow you to create a language model which includes the “word” HELLOFOCUS that will not be recognized as the words hello and focus separately, meaning that Rejecto can then handle unrelated utterances before and after (but not within, which you don’t want anyway since that will give you false negatives for intentional but imperfect utterances of the phrase). If it works, you should see both of these pronunciations present in your dynamically generated .dic file found in your caches folder.
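
    The exact entries from the original post are not reproduced above. Purely as an illustration of the format described (these ARPAbet pronunciations are my reconstruction from standard CMU dictionary entries for “hello” and “focus”, not necessarily the originals), a pair of fused HELLOFOCUS entries would look something like this, with a tab between the word and its pronunciation:

```
HELLOFOCUS	HH AH L OW F OW K AH S
HELLOFOCUS(2)	HH EH L OW F OW K AH S
```

    The (2) suffix is the CMU-dictionary convention for an alternate pronunciation, which is what allows both variants to appear in the generated .dic file.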
