RapidEars is a plugin for OpenEars that extends its functionality in one specific way: it performs recognition in realtime while the user is still speaking, instead of performing recognition only after the user has finished speaking and paused for a set period of time. To learn how OpenEars uses a vocabulary that you define dynamically at runtime from an array of words or phrases, see the explanation of the basics of offline speech recognition on the main OpenEars page, along with the examples in the OpenEars tutorial and in the sample app that ships with the OpenEars distribution.
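As a rough illustration of how the two fit together, here is a minimal sketch of generating a model from an array at runtime and then starting realtime listening. It assumes the 1.x-era OpenEars classes used elsewhere on this page, the path-accessor methods from the later 1.x releases, and RapidEars' startRealtimeListeningWithLanguageModelAtPath:dictionaryAtPath:acousticModelAtPath: category method on PocketsphinxController; check the RapidEars documentation for the exact current names before relying on any of them:

    // Generate a language model and phonetic dictionary at runtime from an array.
    LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
    NSArray *words = [NSArray arrayWithObjects:@"FORWARD", @"BACKWARD", @"LEFT", @"RIGHT", nil];
    NSString *name = @"MyRealtimeModel";
    NSError *err = [lmGenerator generateLanguageModelFromArray:words
                                                withFilesNamed:name
                                        forAcousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];

    if ([err code] == noErr) {
        NSString *lmPath = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:name];
        NSString *dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:name];
        // RapidEars replaces the standard startListening... call with a realtime version,
        // so hypotheses arrive while the user is still speaking rather than after a pause.
        [self.pocketsphinxController startRealtimeListeningWithLanguageModelAtPath:lmPath
                                                                  dictionaryAtPath:dicPath
                                                               acousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];
    }

With RapidEars in place, the in-progress and final hypotheses are then delivered through the delegate callbacks that the plugin adds to OpenEarsEventsObserver.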
The short answer is that all speech recognition uses a predefined list of words, stored in a text file or an in-memory structure of some kind. However, cloud-based speech recognition can use a much larger word list than recognition performed on a handheld device, because of the greater CPU and memory resources available to it, so it can give the impression of recognizing “everything a user says”. An offline SDK such as OpenEars needs to be used with a smaller, chosen word set that applies to the specific task of the application rather than being generalized to any application.
See my Rejecto module:
LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
NSArray *words = [NSArray arrayWithObjects:@"CUSTOMONE", @"CUSTOMTWO", @"CUSTOMTHREE",
                                           @"THANKYOUSCREEN", @"HOMESCREEN", @"TOOCLOSESCREEN", nil];
NSString *name = @"NameIWantForMyLanguageModelFiles";
// Generate a Rejecto language model that rejects out-of-vocabulary speech.
NSError *err = [lmGenerator generateRejectingLanguageModelFromArray:words
                                                     withFilesNamed:name
                                             withOptionalExclusions:nil
                                                    usingVowelsOnly:TRUE
                                                         withWeight:nil
                                             forAcousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];
Mano
Speech recognition is a complex problem, and depending on the requirements of your app, Rejecto may or may not work right out of the box in the way you're expecting. That isn't a scam or a programming error; it's just the usual challenge of machine perception, and it's the reason we don't already have perfect universal speech recognition working without the network on our phones.
The language model generation method has several arguments designed to let you customize its behavior for a particular vocabulary to get the best rejection performance; you can read about them in its documentation. The argument you might want to check out first is withWeight:.
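For example, here is the same call quoted above, but with an explicit weight rather than nil. The value 1.5 is purely illustrative, not a recommendation; the useful range and the best value for your app depend on your vocabulary and have to be found by testing:

    // Identical to the call above except for withWeight:, which now passes an
    // explicit NSNumber instead of nil so the strength of the rejection model
    // can be tuned against this particular vocabulary.
    NSError *err = [lmGenerator generateRejectingLanguageModelFromArray:words
                                                         withFilesNamed:name
                                                 withOptionalExclusions:nil
                                                        usingVowelsOnly:TRUE
                                                             withWeight:[NSNumber numberWithFloat:1.5]
                                                 forAcousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];

Try a few different weights with representative test speech and compare how often your in-vocabulary words are recognized against how often out-of-vocabulary speech is rejected.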
In general, offline speech recognition in an app is the kind of project where you'll be happier with the results if you go in expecting to spend a bit of time testing, refining, asking constructive questions and reading the docs, because every application is different, and an approach that works well for one may need to be altered for the next. Thanks for giving Rejecto a try!