Tagged: noise problem vadThreshold
- This topic has 21 replies, 2 voices, and was last updated 9 years, 3 months ago by maxgarmar.
January 1, 2015 at 11:21 pm #1024045 maxgarmar, Participant
Hi guys,
Until now I have been using OpenEars 1.66 in an app that currently works in Spanish, and I have to say I was really happy with the results and the accuracy.
Then some days ago I realized that this version has problems with words that have a “ch” phoneme in the middle.
For instance:
“Leche” works perfectly, but “Lechuga” is impossible to recognize.
Then I read in the changelog about the correction to Spanish phonemes in version 1.7.1, and I guess this would solve my problem and I would be the happiest guy in the world :D. But I cannot download that version, only 2.0.
I tried 2.0, and words with “ch” in the middle now work, but the accuracy is worse than in 1.66. With a TV making noise in the background, OpenEars is too sensitive, and it is crazy how it keeps listening compared with 1.66. I read all the posts about this and tried changing these values:
[[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.2];
[[OEPocketsphinxController sharedInstance] setVadThreshold:3.0];
but this does not seem to do anything. I don’t notice any difference; perhaps I am setting the values in the wrong place.
Anyway, could I download or get version 1.7.1 somehow? That would be awesome; I can’t find it anywhere. If that is not possible, could you please help me use 2.0 correctly?
Thanks in advance! And really, you are doing a good job.
January 2, 2015 at 9:18 am #1024048 Halle Winkler, Politepix
Welcome,
A couple of things that should help.
The first is that you can’t use unrealistically low secondsOfSilenceToDetect values like .1 or .2 with OpenEars 2.0 anymore. There isn’t an advantage to using a pause detection of a length which is less than a speaker’s actual pause (at least .3 seconds if not more) so this has never provided a benefit as a UI for the user and causes actual speech to be interrupted, as well as speech-like sounds to be submitted constantly for recognition. It doesn’t result in a real-time-like user experience because speech still has to be processed as a whole, and it is more frequently in the process of being submitted for recognition, meaning it is likely to not be listening at the time that the user is speaking due to processing.
secondsOfSilenceToDetect values that were significantly shorter than a speaker’s pause have been the cause of almost all of the issues reported for 2.0, so it’s pretty likely I’m going to limit that property to realistic values in upcoming versions. It will need to be increased to a value greater than .2 for 2.0.
From the upgrade guide:
Another API change is that before you start setting properties of OEPocketsphinxController for the first time, it is necessary to call its setActive method: [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
If you haven’t called setActive: and your vadThreshold settings are the first properties you’re setting, this will cause them not to take effect.
Lastly, you can set vadThreshold to a value higher than 3.0 if 3.0 is still causing oversensitive recognition, although this is probably more due to the secondsOfSilenceToDetect setting.
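The ordering described above can be sketched roughly as follows. This is only an illustration using the calls quoted in this thread: the helper name ConfigureRecognitionSettings is hypothetical and not part of OpenEars, and the values 0.5 and 3.0 are example settings within the ranges discussed here, not recommendations from the upgrade guide.

```objective-c
#import <Foundation/Foundation.h>
#import <OpenEars/OEPocketsphinxController.h>

// Hypothetical helper (the name is an assumption for illustration only).
static void ConfigureRecognitionSettings(void) {
    // 1. Activate the shared controller before touching any of its properties,
    //    exactly as quoted from the upgrade guide.
    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];

    // 2. Only now set the properties. 0.5 is an example value above the .2
    //    that is too short for 2.0.
    [[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.5];

    // 3. Example VAD threshold; it can be raised if recognition is still oversensitive.
    [[OEPocketsphinxController sharedInstance] setVadThreshold:3.0];
}
```

Calling a helper like this once, before listening is started, keeps the activation and the property settings in one place so their order cannot drift apart.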
Sorry, it isn’t possible to offer dual support to 1.x and 2.x simultaneously – if there are issues with 2.0 they need to be reported and then they will be fixed.
January 2, 2015 at 10:45 am #1024049 maxgarmar, Participant
Thanks Halle for that fast answer.
Ok, I will try the couple of things you told me, but first I would like to ask a couple of questions to fully understand how this works.
First, what are realistic values for secondsOfSilenceToDetect, given that the minimum is .3 as you said? What value would you consider best if I am recognizing at most 3 words in a phrase, though normally just one word?
Second, I would like to know the maximum value for vadThreshold so I can experiment with it.
Thanks a lot
January 2, 2015 at 11:59 am #1024050 Halle Winkler, Politepix
Hi,
The maximum value for vadThreshold is 4.0.
I would expect 2.0-3.0 to work fine for most applications as long as the secondsOfSilenceToDetect has a realistic relationship with user pauses.
Regarding secondsOfSilenceToDetect, the smallest value I would expect to begin to give a good user experience is .3, since that is getting towards the average durations that speakers pause between utterances. I would probably expect it to be a bit larger at .4 or maybe .5 since the intention isn’t to only catch the speech of speakers whose pauses are shorter than the average.
If you are ever trying to recognize more than one word at a time this should be a value that is distinctly longer than a normal inter-word pause, since setting it to the same length or shorter means you are ensuring that the user will have their speech interrupted and recognized while they are still in the process of active app-directed speech. That also means their subsequent speech will not be heard by the engine while it is in the process of performing recognition on the partial speech that it interrupted, which is a big downside risk for a speech UX. That’s what the default value of .7 is meant to help with. Probably a value like .5 is still reasonable here to split the difference between good UX in terms of getting all the speech and good UX in terms of reactivity.
I don’t think there is a big gain in setting it to a tiny value purely in terms of the sense of reactiveness, because that latency is only going to be a small percentage of the overall interaction time which includes the time the user spoke, and their actual intended period of silence, and the time that the engine took to process the entire finalized speech, which is likely to be a minimum of 3 seconds overall (assuming that everything else that happens in your UI as a result of the speech is instantaneous), making a question of tenths of a second one way or another not the biggest part of the puzzle in terms of UX.
In some cases I think the idea behind setting it very low is to give a RapidEars-like lack of latency, but RapidEars returns the user speech continuously while it is still in progress, which is just a very different UX to the UX of complete utterances being analyzed after they have been finished, regardless of the secondsOfSilenceToDetect time.
In the absence of a real-time approach like that in RapidEars, I think it’s likely to make speech UI users happier when the secondsOfSilenceToDetect corresponds to their intention to denote that their app-directed speech is finished, versus other events such as an inter-word pause or hesitation.
January 2, 2015 at 3:29 pm #1024053 maxgarmar, Participant
Hi,
Regarding what you told me about where to set vadThreshold and secondsOfSilenceToDetect: I was setting those values before the setActive line, so I expected to notice a real difference once I moved them after it. But I am sorry, nothing changed.
Tests:
vadThreshold = 4.0 (maximum)
secondsOfSilenceToDetect = 0.3
With the TV in the background at a normal volume, the recognition does not stop, and it recognizes 6 or 7 words when I said just one.
After this test I changed secondsOfSilenceToDetect to 0.4 and 0.5, but the results were no better. I really don’t get what’s going on. What am I doing wrong, Halle?
Could I somehow get version 1.7.1? I would like to test with it to see the results.
Thanks, I hope we can get this working
January 2, 2015 at 4:03 pm #1024056 Halle Winkler, Politepix
OK, can you make a replication case for me so I can see what you’re seeing exactly?
What I would need is a recording of your speech (in the environment you’re describing) added to pathToTestFile (there is a commented-out usage of pathToTestFile in the sample app and you can read more about it in OEPocketsphinxController.h) and the changes to the sample app which will demonstrate this issue (I believe the only changes should be vadThreshold, secondsOfSilenceToDetect, the generation of your language model, and the selection of the Spanish acoustic model). Once you can demonstrate the issue to yourself using the sample app, contact me through the contact form with a place to download your recording and a description of your changes to the sample app and I will test it out and get back to you. Sorry you’re having this issue and I promise I’ll check it out.
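As a rough sketch of that replication setup, the following is modeled on the commented-out pathToTestFile line in the sample app. The helper name and the file name MyNoisyRecording.wav are hypothetical, and the recording is assumed to be a mono, 16-bit, 16 kHz WAV added to the app bundle, as the sample app's own comment describes.

```objective-c
#import <Foundation/Foundation.h>
#import <OpenEars/OEPocketsphinxController.h>

// Hypothetical helper (the name is an assumption for illustration only).
static void UseRecordedTestFile(void) {
    // Activate the controller before setting any of its properties.
    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];

    // "MyNoisyRecording" is a placeholder name for the recording made in the noisy room.
    NSString *testFilePath = [[NSBundle mainBundle] pathForResource:@"MyNoisyRecording"
                                                             ofType:@"wav"];
    if (testFilePath == nil) {
        NSLog(@"Test recording not found in the app bundle.");
        return;
    }

    // Recognition will then run on the recording instead of live microphone input.
    [OEPocketsphinxController sharedInstance].pathToTestFile = testFilePath;

    // Listening is then started as usual, as the sample app does after this line.
}
```

Because the same recording is replayed on every run, changes to vadThreshold and secondsOfSilenceToDetect can be compared against identical input.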
January 2, 2015 at 9:08 pm #1024060 maxgarmar, Participant
Ok here we go.
This is the complete code I changed in ViewController.m of the sample application. You can see how I set the values and everything.
// ViewController.m
// OpenEarsSampleApp
//
// ViewController.m demonstrates the use of the OpenEars framework.
//
// Copyright Politepix UG (haftungsbeschränkt) 2014. All rights reserved.
// https://www.politepix.com
// Contact at https://www.politepix.com/contact
//
// This file is licensed under the Politepix Shared Source license found in the root of the source distribution.

// ********************************************************************************
// ********************************************************************************
// ********************************************************************************
// IMPORTANT NOTE: Audio driver and hardware behavior is completely different between the Simulator and a real device. It is not informative to test OpenEars' accuracy on the Simulator, and please do not report Simulator-only bugs since I only actively support
// the device driver. Please only do testing/bug reporting based on results on a real device such as an iPhone or iPod Touch. Thanks!
// ********************************************************************************
// ********************************************************************************
// ********************************************************************************

#import "ViewController.h"
#import <OpenEars/OEPocketsphinxController.h>
#import <OpenEars/OEFliteController.h>
#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OELogging.h>
#import <OpenEars/OEAcousticModel.h>
#import <Slt/Slt.h>

@interface ViewController()

// UI actions, not specifically related to OpenEars other than the fact that they invoke OpenEars methods.
- (IBAction) stopButtonAction;
- (IBAction) startButtonAction;
- (IBAction) suspendListeningButtonAction;
- (IBAction) resumeListeningButtonAction;

// Example for reading out the input audio levels without locking the UI using an NSTimer
- (void) startDisplayingLevels;
- (void) stopDisplayingLevels;

// These three are the important OpenEars objects that this class demonstrates the use of.
@property (nonatomic, strong) Slt *slt;
@property (nonatomic, strong) OEEventsObserver *openEarsEventsObserver;
@property (nonatomic, strong) OEPocketsphinxController *pocketsphinxController;
@property (nonatomic, strong) OEFliteController *fliteController;

// Some UI, not specifically related to OpenEars.
@property (nonatomic, strong) IBOutlet UIButton *stopButton;
@property (nonatomic, strong) IBOutlet UIButton *startButton;
@property (nonatomic, strong) IBOutlet UIButton *suspendListeningButton;
@property (nonatomic, strong) IBOutlet UIButton *resumeListeningButton;
@property (nonatomic, strong) IBOutlet UITextView *statusTextView;
@property (nonatomic, strong) IBOutlet UITextView *heardTextView;
@property (nonatomic, strong) IBOutlet UILabel *pocketsphinxDbLabel;
@property (nonatomic, strong) IBOutlet UILabel *fliteDbLabel;
@property (nonatomic, assign) BOOL usingStartingLanguageModel;
@property (nonatomic, assign) int restartAttemptsDueToPermissionRequests;
@property (nonatomic, assign) BOOL startupFailedDueToLackOfPermissions;

// Things which help us show off the dynamic language features.
@property (nonatomic, copy) NSString *pathToFirstDynamicallyGeneratedLanguageModel;
@property (nonatomic, copy) NSString *pathToFirstDynamicallyGeneratedDictionary;
@property (nonatomic, copy) NSString *pathToSecondDynamicallyGeneratedLanguageModel;
@property (nonatomic, copy) NSString *pathToSecondDynamicallyGeneratedDictionary;

// Our NSTimer that will help us read and display the input and output levels without locking the UI
@property (nonatomic, strong) NSTimer *uiUpdateTimer;

@end

@implementation ViewController

#define kLevelUpdatesPerSecond 18 // We'll have the ui update 18 times a second to show some fluidity without hitting the CPU too hard.
//#define kGetNbest // Uncomment this if you want to try out nbest

#pragma mark -
#pragma mark Memory Management

- (void)dealloc {
    [self stopDisplayingLevels];
}

#pragma mark -
#pragma mark View Lifecycle

- (void)viewDidLoad {
    [super viewDidLoad];

    self.fliteController = [[OEFliteController alloc] init];
    self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
    self.openEarsEventsObserver.delegate = self;
    self.slt = [[Slt alloc] init];

    self.restartAttemptsDueToPermissionRequests = 0;
    self.startupFailedDueToLackOfPermissions = FALSE;

    // [OELogging startOpenEarsLogging]; // Uncomment me for OELogging, which is verbose logging about internal OpenEars operations such as audio settings. If you have issues, show this logging in the forums.
    //[OEPocketsphinxController sharedInstance].verbosePocketSphinx = TRUE; // Uncomment this for much more verbose speech recognition engine output. If you have issues, show this logging in the forums.

    [self.openEarsEventsObserver setDelegate:self]; // Make this class the delegate of OpenEarsObserver so we can get all of the messages about what OpenEars is doing.

    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil]; // Call this before setting any OEPocketsphinxController characteristics
    [[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.3];
    [[OEPocketsphinxController sharedInstance] setVadThreshold:4.0];

    // This is the language model we're going to start up with. The only reason I'm making it a class property is that I reuse it a bunch of times in this example,
    // but you can pass the string contents directly to OEPocketsphinxController:startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF:

    NSArray *firstLanguageArray = @[@"ADIOS", @"LECHUGA", @"MADRID", @"BARCELONA", @"PARIS", @"ROMA", @"MAINZ", @"HOLA"];

    OELanguageModelGenerator *languageModelGenerator = [[OELanguageModelGenerator alloc] init];

    // languageModelGenerator.verboseLanguageModelGenerator = TRUE; // Uncomment me for verbose language model generator debug output.

    NSError *error = [languageModelGenerator generateLanguageModelFromArray:firstLanguageArray withFilesNamed:@"FirstOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"]]; // Change "AcousticModelSpanish" to "AcousticModelSpanish" in order to create a language model for Spanish recognition instead of English.

    if(error) {
        NSLog(@"Dynamic language generator reported error %@", [error description]);
    } else {
        self.pathToFirstDynamicallyGeneratedLanguageModel = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
        self.pathToFirstDynamicallyGeneratedDictionary = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"];
    }

    self.usingStartingLanguageModel = TRUE; // This is not an OpenEars thing, this is just so I can switch back and forth between the two models in this sample app.

    // Here is an example of dynamically creating an in-app grammar.
    // We want it to be able to response to the speech "CHANGE MODEL" and a few other things. Items we want to have recognized as a whole phrase (like "CHANGE MODEL")
    // we put into the array as one string (e.g. "CHANGE MODEL" instead of "CHANGE" and "MODEL"). This increases the probability that they will be recognized as a phrase. This works even better starting with version 1.0 of OpenEars.

    NSArray *secondLanguageArray = @[@"ADIOS", @"LECHUGA", @"MADRID", @"BARCELONA", @"PARIS", @"ROMA", @"MAINZ", @"HOLA"];

    // The last entry, quidnunc, is an example of a word which will not be found in the lookup dictionary and will be passed to the fallback method. The fallback method is slower,
    // so, for instance, creating a new language model from dictionary words will be pretty fast, but a model that has a lot of unusual names in it or invented/rare/recent-slang
    // words will be slower to generate. You can use this information to give your users good UI feedback about what the expectations for wait times should be.

    // I don't think it's beneficial to lazily instantiate OELanguageModelGenerator because you only need to give it a single message and then release it.
    // If you need to create a very large model or any size of model that has many unusual words that have to make use of the fallback generation method,
    // you will want to run this on a background thread so you can give the user some UI feedback that the task is in progress.

    // generateLanguageModelFromArray:withFilesNamed returns an NSError which will either have a value of noErr if everything went fine or a specific error if it didn't.
    error = [languageModelGenerator generateLanguageModelFromArray:secondLanguageArray withFilesNamed:@"SecondOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"]]; // Change "AcousticModelSpanish" to "AcousticModelSpanish" in order to create a language model for Spanish recognition instead of English.

    // NSError *error = [languageModelGenerator generateLanguageModelFromTextFile:[NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEarsCorpus.txt"] withFilesNamed:@"SecondOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"]]; // Try this out to see how generating a language model from a corpus works.

    if(error) {
        NSLog(@"Dynamic language generator reported error %@", [error description]);
    } else {

        self.pathToSecondDynamicallyGeneratedLanguageModel = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"SecondOpenEarsDynamicLanguageModel"]; // We'll set our new .languagemodel file to be the one to get switched to when the words "CHANGE MODEL" are recognized.
        self.pathToSecondDynamicallyGeneratedDictionary = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"SecondOpenEarsDynamicLanguageModel"]; // We'll set our new dictionary to be the one to get switched to when the words "CHANGE MODEL" are recognized.

        // Next, an informative message.
        NSLog(@"\n\nWelcome to the OpenEars sample project. This project understands the words:\nBACKWARD,\nCHANGE,\nFORWARD,\nGO,\nLEFT,\nMODEL,\nRIGHT,\nTURN,\nand if you say \"CHANGE MODEL\" it will switch to its dynamically-generated model which understands the words:\nCHANGE,\nMODEL,\nMONDAY,\nTUESDAY,\nWEDNESDAY,\nTHURSDAY,\nFRIDAY,\nSATURDAY,\nSUNDAY,\nQUIDNUNC");

        // This is how to start the continuous listening loop of an available instance of OEPocketsphinxController. We won't do this if the language generation failed since it will be listening for a command to change over to the generated language.

        [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil]; // Call this once before setting properties of the OEPocketsphinxController instance.
        [[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.3];
        [[OEPocketsphinxController sharedInstance] setVadThreshold:4.0];

        [OEPocketsphinxController sharedInstance].pathToTestFile = [[NSBundle mainBundle] pathForResource:@"openears" ofType:@"wav"]; // This is how you could use a test WAV (mono/16-bit/16k) rather than live recognition

        if(![OEPocketsphinxController sharedInstance].isListening) {
            [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
        }

        // [self startDisplayingLevels] is not an OpenEars method, just a very simple approach for level reading
        // that I've included with this sample app. My example implementation does make use of two OpenEars
        // methods: the pocketsphinxInputLevel method of OEPocketsphinxController and the fliteOutputLevel
        // method of fliteController.
        //
        // The example is meant to show one way that you can read those levels continuously without locking the UI,
        // by using an NSTimer, but the OpenEars level-reading methods
        // themselves do not include multithreading code since I believe that you will want to design your own
        // code approaches for level display that are tightly-integrated with your interaction design and the
        // graphics API you choose.
        [self startDisplayingLevels];

        // Here is some UI stuff that has nothing specifically to do with OpenEars implementation
        self.startButton.hidden = TRUE;
        self.stopButton.hidden = TRUE;
        self.suspendListeningButton.hidden = TRUE;
        self.resumeListeningButton.hidden = TRUE;
    }
}

#pragma mark -
#pragma mark OEEventsObserver delegate methods

// What follows are all of the delegate methods you can optionally use once you've instantiated an OEEventsObserver and set its delegate to self.
// I've provided some pretty granular information about the exact phase of the Pocketsphinx listening loop, the Audio Session, and Flite, but I'd expect
// that the ones that will really be needed by most projects are the following:
//
//- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID;
//- (void) audioSessionInterruptionDidBegin;
//- (void) audioSessionInterruptionDidEnd;
//- (void) audioRouteDidChangeToRoute:(NSString *)newRoute;
//- (void) pocketsphinxDidStartListening;
//- (void) pocketsphinxDidStopListening;
//
// It isn't necessary to have a OEPocketsphinxController or a OEFliteController instantiated in order to use these methods. If there isn't anything instantiated that will
// send messages to an OEEventsObserver, all that will happen is that these methods will never fire. You also do not have to create a OEEventsObserver in
// the same class or view controller in which you are doing things with a OEPocketsphinxController or OEFliteController; you can receive updates from those objects in
// any class in which you instantiate an OEEventsObserver and set its delegate to self.

// This is an optional delegate method of OEEventsObserver which delivers the text of speech that Pocketsphinx heard and analyzed, along with its accuracy score and utterance ID.
- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID {

    NSLog(@"Local callback: The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID); // Log it.

    if([hypothesis isEqualToString:@"CHANGE MODEL"]) { // If the user says "CHANGE MODEL", we will switch to the alternate model (which happens to be the dynamically generated model).

        // Here is an example of language model switching in OpenEars. Deciding on what logical basis to switch models is your responsibility.
        // For instance, when you call a customer service line and get a response tree that takes you through different options depending on what you say to it,
        // the models are being switched as you progress through it so that only relevant choices can be understood. The construction of that logical branching and
        // how to react to it is your job, OpenEars just lets you send the signal to switch the language model when you've decided it's the right time to do so.

        if(self.usingStartingLanguageModel) { // If we're on the starting model, switch to the dynamically generated one.

            // You can only change language models with ARPA grammars in OpenEars (the ones that end in .languagemodel or .DMP).
            // Trying to switch between JSGF models (the ones that end in .gram) will return no result.
            [[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:self.pathToSecondDynamicallyGeneratedLanguageModel withDictionary:self.pathToSecondDynamicallyGeneratedDictionary];
            self.usingStartingLanguageModel = FALSE;
        } else { // If we're on the dynamically generated model, switch to the start model (this is just an example of a trigger and method for switching models).
            [[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:self.pathToFirstDynamicallyGeneratedLanguageModel withDictionary:self.pathToFirstDynamicallyGeneratedDictionary];
            self.usingStartingLanguageModel = TRUE;
        }
    }

    self.heardTextView.text = [NSString stringWithFormat:@"Heard: \"%@\"", hypothesis]; // Show it in the status box.

    // This is how to use an available instance of OEFliteController. We're going to repeat back the command that we heard with the voice we've chosen.
    [self.fliteController say:[NSString stringWithFormat:@"You said %@",hypothesis] withVoice:self.slt];
}

#ifdef kGetNbest
- (void) pocketsphinxDidReceiveNBestHypothesisArray:(NSArray *)hypothesisArray { // Pocketsphinx has an n-best hypothesis dictionary.
    NSLog(@"Local callback: hypothesisArray is %@",hypothesisArray);
}
#endif

// An optional delegate method of OEEventsObserver which informs that there was an interruption to the audio session (e.g. an incoming phone call).
- (void) audioSessionInterruptionDidBegin {
    NSLog(@"Local callback: AudioSession interruption began."); // Log it.
    self.statusTextView.text = @"Status: AudioSession interruption began."; // Show it in the status box.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening) {
        error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling Pocketsphinx to stop listening (if it is listening) since it will need to restart its loop after an interruption.
        if(error) NSLog(@"Error while stopping listening in audioSessionInterruptionDidBegin: %@", error);
    }
}

// An optional delegate method of OEEventsObserver which informs that the interruption to the audio session ended.
- (void) audioSessionInterruptionDidEnd {
    NSLog(@"Local callback: AudioSession interruption ended."); // Log it.
    self.statusTextView.text = @"Status: AudioSession interruption ended."; // Show it in the status box.
    // We're restarting the previously-stopped listening loop.
    if(![OEPocketsphinxController sharedInstance].isListening){
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't currently listening.
    }
}

// An optional delegate method of OEEventsObserver which informs that the audio input became unavailable.
- (void) audioInputDidBecomeUnavailable {
    NSLog(@"Local callback: The audio input has become unavailable"); // Log it.
    self.statusTextView.text = @"Status: The audio input has become unavailable"; // Show it in the status box.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening){
        error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling Pocketsphinx to stop listening since there is no available input (but only if we are listening).
        if(error) NSLog(@"Error while stopping listening in audioInputDidBecomeUnavailable: %@", error);
    }
}

// An optional delegate method of OEEventsObserver which informs that the unavailable audio input became available again.
- (void) audioInputDidBecomeAvailable {
    NSLog(@"Local callback: The audio input is available"); // Log it.
    self.statusTextView.text = @"Status: The audio input is available"; // Show it in the status box.
    if(![OEPocketsphinxController sharedInstance].isListening) {
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition, but only if we aren't already listening.
    }
}

// An optional delegate method of OEEventsObserver which informs that there was a change to the audio route (e.g. headphones were plugged in or unplugged).
- (void) audioRouteDidChangeToRoute:(NSString *)newRoute {
    NSLog(@"Local callback: Audio route change. The new audio route is %@", newRoute); // Log it.
    self.statusTextView.text = [NSString stringWithFormat:@"Status: Audio route change. The new audio route is %@",newRoute]; // Show it in the status box.

    NSError *error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling the Pocketsphinx loop to shut down and then start listening again on the new route

    if(error)NSLog(@"Local callback: error while stopping listening in audioRouteDidChangeToRoute: %@",error);

    if(![OEPocketsphinxController sharedInstance].isListening) {
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
    }
}

// An optional delegate method of OEEventsObserver which informs that the Pocketsphinx recognition loop has entered its actual loop.
// This might be useful in debugging a conflict between another sound class and Pocketsphinx.
- (void) pocketsphinxRecognitionLoopDidStart {
    NSLog(@"Local callback: Pocketsphinx started."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx started."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is now listening for speech.
- (void) pocketsphinxDidStartListening {
    NSLog(@"Local callback: Pocketsphinx is now listening."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx is now listening."; // Show it in the status box.

    self.startButton.hidden = TRUE; // React to it with some UI changes.
    self.stopButton.hidden = FALSE;
    self.suspendListeningButton.hidden = FALSE;
    self.resumeListeningButton.hidden = TRUE;
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx detected speech and is starting to process it.
- (void) pocketsphinxDidDetectSpeech {
    NSLog(@"Local callback: Pocketsphinx has detected speech."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has detected speech."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx detected a second of silence, indicating the end of an utterance.
// This was added because developers requested being able to time the recognition speed without the speech time. The processing time is the time between
// this method being called and the hypothesis being returned.
- (void) pocketsphinxDidDetectFinishedSpeech {
    NSLog(@"Local callback: Pocketsphinx has detected a second of silence, concluding an utterance."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has detected finished speech."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx has exited its recognition loop, most
// likely in response to the OEPocketsphinxController being told to stop listening via the stopListening method.
- (void) pocketsphinxDidStopListening {
    NSLog(@"Local callback: Pocketsphinx has stopped listening."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has stopped listening."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is still in its listening loop but it is not
// Going to react to speech until listening is resumed. This can happen as a result of Flite speech being
// in progress on an audio route that doesn't support simultaneous Flite speech and Pocketsphinx recognition,
// or as a result of the OEPocketsphinxController being told to suspend recognition via the suspendRecognition method.
- (void) pocketsphinxDidSuspendRecognition {
    NSLog(@"Local callback: Pocketsphinx has suspended recognition."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has suspended recognition."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is still in its listening loop and after recognition
// having been suspended it is now resuming. This can happen as a result of Flite speech completing
// on an audio route that doesn't support simultaneous Flite speech and Pocketsphinx recognition,
// or as a result of the OEPocketsphinxController being told to resume recognition via the resumeRecognition method.
- (void) pocketsphinxDidResumeRecognition {
    NSLog(@"Local callback: Pocketsphinx has resumed recognition."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has resumed recognition."; // Show it in the status box.
}

// An optional delegate method which informs that Pocketsphinx switched over to a new language model at the given URL in the course of
// recognition. This does not imply that it is a valid file or that recognition will be successful using the file.
- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
    NSLog(@"Local callback: Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@",newLanguageModelPathAsString,newDictionaryPathAsString);
}

// An optional delegate method of OEEventsObserver which informs that Flite is speaking, most likely to be useful if debugging a
// complex interaction between sound classes. You don't have to do anything yourself in order to prevent Pocketsphinx from listening to Flite talk and trying to recognize the speech.
- (void) fliteDidStartSpeaking {
    NSLog(@"Local callback: Flite has started speaking"); // Log it.
    self.statusTextView.text = @"Status: Flite has started speaking."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Flite is finished speaking, most likely to be useful if debugging a
// complex interaction between sound classes.
- (void) fliteDidFinishSpeaking {
    NSLog(@"Local callback: Flite has finished speaking"); // Log it.
    self.statusTextView.text = @"Status: Flite has finished speaking."; // Show it in the status box.
}

- (void) pocketSphinxContinuousSetupDidFailWithReason:(NSString *)reasonForFailure { // This can let you know that something went wrong with the recognition loop startup. Turn on [OELogging startOpenEarsLogging] to learn why.
    NSLog(@"Local callback: Setting up the continuous recognition loop has failed for the reason %@, please turn on [OELogging startOpenEarsLogging] to learn more.", reasonForFailure); // Log it.
    self.statusTextView.text = @"Status: Not possible to start recognition loop."; // Show it in the status box.
}

- (void) pocketSphinxContinuousTeardownDidFailWithReason:(NSString *)reasonForFailure { // This can let you know that something went wrong with the recognition loop startup. Turn on [OELogging startOpenEarsLogging] to learn why.
    NSLog(@"Local callback: Tearing down the continuous recognition loop has failed for the reason %@, please turn on [OELogging startOpenEarsLogging] to learn more.", reasonForFailure); // Log it.
    self.statusTextView.text = @"Status: Not possible to cleanly end recognition loop."; // Show it in the status box.
}

- (void) testRecognitionCompleted { // A test file which was submitted for direct recognition via the audio driver is done.
    NSLog(@"Local callback: A test file which was submitted for direct recognition via the audio driver is done."); // Log it.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening) { // If we're listening, stop listening.
error = [[OEPocketsphinxController sharedInstance] stopListening]; if(error) NSLog(@"Error while stopping listening in testRecognitionCompleted: %@", error); } } /** Pocketsphinx couldn't start because it has no mic permissions (will only be returned on iOS7 or later).*/ - (void) pocketsphinxFailedNoMicPermissions { NSLog(@"Local callback: The user has never set mic permissions or denied permission to this app's mic, so listening will not start."); self.startupFailedDueToLackOfPermissions = TRUE; } /** The user prompt to get mic permissions, or a check of the mic permissions, has completed with a TRUE or a FALSE result (will only be returned on iOS7 or later).*/ - (void) micPermissionCheckCompleted:(BOOL)result { if(result) { self.restartAttemptsDueToPermissionRequests++; if(self.restartAttemptsDueToPermissionRequests == 1 && self.startupFailedDueToLackOfPermissions) { // If we get here because there was an attempt to start which failed due to lack of permissions, and now permissions have been requested and they returned true, we restart exactly once with the new permissions. NSError *error = nil; if([OEPocketsphinxController sharedInstance].isListening){ error = [[OEPocketsphinxController sharedInstance] stopListening]; // Stop listening if we are listening. if(error) NSLog(@"Error while stopping listening in micPermissionCheckCompleted: %@", error); } if(!error && ![OEPocketsphinxController sharedInstance].isListening) { // If there was no error and we aren't listening, start listening. [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition. 
self.startupFailedDueToLackOfPermissions = FALSE; } } } } #pragma mark - #pragma mark UI // This is not OpenEars-specific stuff, just some UI behavior - (IBAction) suspendListeningButtonAction { // This is the action for the button which suspends listening without ending the recognition loop [[OEPocketsphinxController sharedInstance] suspendRecognition]; self.startButton.hidden = TRUE; self.stopButton.hidden = FALSE; self.suspendListeningButton.hidden = TRUE; self.resumeListeningButton.hidden = FALSE; } - (IBAction) resumeListeningButtonAction { // This is the action for the button which resumes listening if it has been suspended [[OEPocketsphinxController sharedInstance] resumeRecognition]; self.startButton.hidden = TRUE; self.stopButton.hidden = FALSE; self.suspendListeningButton.hidden = FALSE; self.resumeListeningButton.hidden = TRUE; } - (IBAction) stopButtonAction { // This is the action for the button which shuts down the recognition loop. NSError *error = nil; if([OEPocketsphinxController sharedInstance].isListening) { // Stop if we are currently listening. error = [[OEPocketsphinxController sharedInstance] stopListening]; if(error)NSLog(@"Error stopping listening in stopButtonAction: %@", error); } self.startButton.hidden = FALSE; self.stopButton.hidden = TRUE; self.suspendListeningButton.hidden = TRUE; self.resumeListeningButton.hidden = TRUE; } - (IBAction) startButtonAction { // This is the action for the button which starts up the recognition loop again if it has been shut down. if(![OEPocketsphinxController sharedInstance].isListening) { [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening. 
} self.startButton.hidden = TRUE; self.stopButton.hidden = FALSE; self.suspendListeningButton.hidden = FALSE; self.resumeListeningButton.hidden = TRUE; } #pragma mark - #pragma mark Example for reading out Pocketsphinx and Flite audio levels without locking the UI by using an NSTimer // What follows are not OpenEars methods, just an approach for level reading // that I've included with this sample app. My example implementation does make use of two OpenEars // methods: the pocketsphinxInputLevel method of OEPocketsphinxController and the fliteOutputLevel // method of OEFliteController. // // The example is meant to show one way that you can read those levels continuously without locking the UI, // by using an NSTimer, but the OpenEars level-reading methods // themselves do not include multithreading code since I believe that you will want to design your own // code approaches for level display that are tightly-integrated with your interaction design and the // graphics API you choose. // // Please note that if you use my sample approach, you should pay attention to the way that the timer is always stopped in // dealloc. This should prevent you from having any difficulties with deallocating a class due to a running NSTimer process. - (void) startDisplayingLevels { // Start displaying the levels using a timer [self stopDisplayingLevels]; // We never want more than one timer valid so we'll stop any running timers first. self.uiUpdateTimer = [NSTimer scheduledTimerWithTimeInterval:1.0/kLevelUpdatesPerSecond target:self selector:@selector(updateLevelsUI) userInfo:nil repeats:YES]; } - (void) stopDisplayingLevels { // Stop displaying the levels by stopping the timer if it's running. if(self.uiUpdateTimer && [self.uiUpdateTimer isValid]) { // If there is a running timer, we'll stop it here. [self.uiUpdateTimer invalidate]; self.uiUpdateTimer = nil; } } - (void) updateLevelsUI { // And here is how we obtain the levels. 
// This method includes the actual OpenEars methods and uses their results to update the UI of this view controller.
    self.pocketsphinxDbLabel.text = [NSString stringWithFormat:@"Pocketsphinx Input level:%f",[[OEPocketsphinxController sharedInstance] pocketsphinxInputLevel]]; // pocketsphinxInputLevel is an OpenEars method of the class OEPocketsphinxController.
    if(self.fliteController.speechInProgress) {
        self.fliteDbLabel.text = [NSString stringWithFormat:@"Flite Output level: %f",[self.fliteController fliteOutputLevel]]; // fliteOutputLevel is an OpenEars method of the class OEFliteController.
    }
}

@end
This was the result in the console:
2015-01-02 20:48:16.057 OpenEarsSampleApp[1154:60b] Local callback: Pocketsphinx is now listening.
2015-01-02 20:48:16.062 OpenEarsSampleApp[1154:60b] Local callback: Pocketsphinx started.
2015-01-02 20:48:16.115 OpenEarsSampleApp[1154:60b] Local callback: Pocketsphinx has detected speech.
2015-01-02 20:48:32.753 OpenEarsSampleApp[1154:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
2015-01-02 20:48:33.136 OpenEarsSampleApp[1154:60b] Local callback: The received hypothesis is ROMA with a score of 0 and an ID of 0
2015-01-02 20:48:33.364 OpenEarsSampleApp[1154:60b] Local callback: Flite has started speaking
2015-01-02 20:48:33.372 OpenEarsSampleApp[1154:60b] Local callback: Pocketsphinx has suspended recognition.
2015-01-02 20:48:35.130 OpenEarsSampleApp[1154:60b] Local callback: Flite has finished speaking
2015-01-02 20:48:35.137 OpenEarsSampleApp[1154:60b] Local callback: Pocketsphinx has resumed recognition.
Here is the link to download the voice recording. I recorded it with the internal microphone of an iPhone 5, which works perfectly with version 1.66 of OpenEars. I hope this helps us find out what’s going on.
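As a rough aid for reading the console output above, the sketch below converts two NSLog-style timestamps into an elapsed time in milliseconds. The helper names `log_time_to_ms` and `ms_between` are hypothetical and not part of OpenEars. The code is plain C (which also compiles inside an Objective-C file) and assumes both timestamps come from the same day.

```c
#include <stdio.h>

/* Convert the "HH:MM:SS.mmm" part of an NSLog timestamp to milliseconds
   since midnight. Returns -1 if the text does not match that layout.
   Hypothetical helper for illustration only. */
long log_time_to_ms(const char *hhmmss_mmm)
{
    int h, m, s, ms;
    if (sscanf(hhmmss_mmm, "%2d:%2d:%2d.%3d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Milliseconds elapsed from `start` to `end` (same day, end not before start).
   Returns -1 if either timestamp cannot be parsed or the order is reversed. */
long ms_between(const char *start, const char *end)
{
    long a = log_time_to_ms(start);
    long b = log_time_to_ms(end);
    if (a < 0 || b < 0 || b < a)
        return -1;
    return b - a;
}
```

Applied to the log above, `ms_between("20:48:16.115", "20:48:32.753")` gives 16638, so roughly 16.6 seconds passed between "has detected speech" and the end of the utterance, while `ms_between("20:48:32.753", "20:48:33.136")` gives 383, the processing time before the hypothesis was returned.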
January 2, 2015 at 9:13 pm #1024063maxgarmarParticipant
Sorry, I tried to add the link with the editor’s “link” tag, but it doesn’t show up in the post. I edited it twice and it still didn’t work. Or perhaps it is hidden?
Anyway, here it is without the tag:
https://dl.dropboxusercontent.com/u/6380067/openears.wav.zip
and compressed.
Sorry for writing again.
Thanks
January 5, 2015 at 3:17 pm #1024077maxgarmarParticipant
Hi Halle,
How far did you get with the problem I was facing? As you can see, the recognition thread never ends during my recording (because of the TV in the background), whereas 1.x stops after every word I say. I think that’s the problem: 2.0 never stops listening because of the noise.
By the way, while playing around with your nice framework I realized that if you add a word that previously wasn’t recognized to the LanguageModelGeneratorLookupList.text file, recognition of that word works perfectly. So my problem is solved for one specific word, but it would be crazy to have to add every word the framework doesn’t recognize.
Then I looked at the same file in version 2.0 and saw that the dictionary is exactly the same as in 1.x, yet 2.0 recognizes words that 1.x does not recognize unless you add them to the dictionary.
I hope this helps you find out what’s happening.
Thanks
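The symptom described above, where background noise keeps an utterance open, can be illustrated with a toy end-of-utterance detector. This is a minimal sketch under stated assumptions, not the actual OpenEars or Pocketsphinx implementation: it assumes the audio has already been split into frames flagged as speech (1) or non-speech (0), and that an utterance ends once a fixed number of consecutive non-speech frames follows some speech. The function name `find_utterance_end` is hypothetical.

```c
#include <stddef.h>

/* Toy endpointing (illustration only, not OpenEars code).
   Returns the index of the frame that completes a run of
   `silence_frames_needed` consecutive non-speech frames after speech
   has started, or -1 if no such run occurs in the buffer. */
long find_utterance_end(const int *is_speech, size_t n,
                        size_t silence_frames_needed)
{
    int started = 0;        /* becomes 1 once any speech frame is seen */
    size_t quiet_run = 0;   /* length of the current non-speech run */

    for (size_t i = 0; i < n; i++) {
        if (is_speech[i]) {
            started = 1;
            quiet_run = 0;  /* any speech-like frame resets the silence timer */
        } else if (started) {
            quiet_run++;
            if (quiet_run >= silence_frames_needed)
                return (long)i;
        }
    }
    return -1;              /* utterance never closed */
}
```

In this model, a quiet room yields a long enough non-speech run right after each word, whereas a TV whose frames keep being flagged as speech resets the counter again and again, so the function returns -1 and the utterance stays open.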
January 5, 2015 at 4:57 pm #1024078Halle WinklerPolitepix
Yep, I’ve taken a look at your example and I think I have an idea of what is causing the issue. I will be running tests for a few days and then I may have either a fix to push or a beta for you to check out. I appreciate the good example, it was very helpful.
January 6, 2015 at 7:08 pm #1024090Halle WinklerPolitepix
OK, take a look at OpenEars 2.01 out today and see if it improves this issue. In my testing of your example, it detected your statements correctly. Make sure you bring in both the 2.01 framework and the 2.01 Spanish acoustic model into any app you are testing. Let me know how it works for you.
January 7, 2015 at 12:21 pm #1024108maxgarmarParticipant
OK Halle, I ran an exhaustive test comparing the noise handling of OpenEars 2.0 and 1.x. OpenEars 2.0.1 has improved a lot (congratulations and thanks, by the way), but it is still more sensitive than 1.x. If you like, I can send you another .wav with cases where 2.0.1 still picks up noise, but I don’t think that is necessary.
Thinking about apps that use this library, almost all of them will have noise around them, so I would increase the vadThreshold to a higher value like 6.0 to filter out more noise for people like me who think about how the library is used. If I record with Apple’s Voice Memos app, the noises around me don’t even move the app’s level bars until I speak directly into the mic, but OpenEars 2 still picks them up.
People who want to pick up more noise could then decrease this value, to 1.0 for instance.
What do you think? Please don’t hesitate to ask or collect any information from me. I am just trying to help you.
Thanks
January 7, 2015 at 12:34 pm #1024109Halle WinklerPolitepix
Yes, please send another example demonstrating the issue you’ve described in your most recent post, with a WAV and your minimal changes to the sample app, letting me know the following:
1. which device and iOS version you tested both the old version and the new version on,
2. your own log results for the old version and the new version with both verbosePocketsphinx and OpenEarsLogging/OELogging turned on. To avoid complications with the “CH” issue in your OpenEars 1.x version, please test with a language model that doesn’t have this phoneme.
A vadThreshold as high as 3.5 will suppress actual user speech in testing, so if there is still a significant difference in perceived speech onset sensitivity, it is probably due to something different. However, when I ran your test case against 2.01, nothing was recognized other than your two utterances, they were recognized correctly and immediately on completion, and that was with the default vadThreshold (2.0), so I would need a new case from you if you are seeing something different.
Thanks!
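To make the trade-off around vadThreshold concrete, here is a toy model in plain C. It is only a sketch under loud assumptions: it treats the threshold as a simple ratio between a frame's level and an estimated background level, which may differ from how OpenEars actually computes its voice activity decision, and the levels used in the usage note are invented for illustration rather than measured from the recording in this thread. The names `frame_is_speech` and `count_speech_frames` are hypothetical.

```c
#include <stddef.h>

/* Toy voice-activity decision (assumption, not the OpenEars algorithm):
   a frame counts as speech when its level exceeds the estimated
   background level by more than `threshold` times. */
int frame_is_speech(double frame_level, double noise_level, double threshold)
{
    return frame_level > noise_level * threshold;
}

/* Count how many frames in `levels` the toy decision flags as speech. */
size_t count_speech_frames(const double *levels, size_t n,
                           double noise_level, double threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (frame_is_speech(levels[i], noise_level, threshold))
            count++;
    }
    return count;
}
```

With a background level of 1.0 and invented frame levels of 2.5 (TV), 3.2 (quiet speech) and 6.0 (speech close to the mic), a threshold of 2.0 flags all three, 3.0 rejects only the TV, and 3.5 also rejects the quiet speech. That is the shape of the concern in the post above: raising the threshold far enough to silence all background sound can start to cut off real speech as well.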
January 7, 2015 at 1:15 pm #1024113maxgarmarParticipant
Well OK, let’s do it again.
1. The devices and versions are the same. I just have my app with 1.7 installed, plus your example app taken directly from the 2.0.1 download, on my iPhone 5 running iOS 7.0.1.
Sorry, but I could not capture any logs from my app because it is in production testing and I can’t modify the code. But I can tell you that it is not as sensitive as 2.0.1.
Anyway, your statement that “a vadThreshold as high as 3.5 will suppress actual user speech in testing” is enough to show that something is really wrong with 2.0.1, because here is my test and you will see the results:
2.
// ViewController.m // OpenEarsSampleApp // // ViewController.m demonstrates the use of the OpenEars framework. // // Copyright Politepix UG (haftungsbeschränkt) 2014. All rights reserved. // https://www.politepix.com // Contact at https://www.politepix.com/contact // // This file is licensed under the Politepix Shared Source license found in the root of the source distribution. // ************************************************************************************************************************************************************** // ************************************************************************************************************************************************************** // ************************************************************************************************************************************************************** // IMPORTANT NOTE: Audio driver and hardware behavior is completely different between the Simulator and a real device. It is not informative to test OpenEars' accuracy on the Simulator, and please do not report Simulator-only bugs since I only actively support // the device driver. Please only do testing/bug reporting based on results on a real device such as an iPhone or iPod Touch. Thanks! 
// ************************************************************************************************************************************************************** // ************************************************************************************************************************************************************** // ************************************************************************************************************************************************************** #import "ViewController.h" #import <OpenEars/OEPocketsphinxController.h> #import <OpenEars/OEFliteController.h> #import <OpenEars/OELanguageModelGenerator.h> #import <OpenEars/OELogging.h> #import <OpenEars/OEAcousticModel.h> #import <Slt/Slt.h> @interface ViewController() // UI actions, not specifically related to OpenEars other than the fact that they invoke OpenEars methods. - (IBAction) stopButtonAction; - (IBAction) startButtonAction; - (IBAction) suspendListeningButtonAction; - (IBAction) resumeListeningButtonAction; // Example for reading out the input audio levels without locking the UI using an NSTimer - (void) startDisplayingLevels; - (void) stopDisplayingLevels; // These three are the important OpenEars objects that this class demonstrates the use of. @property (nonatomic, strong) Slt *slt; @property (nonatomic, strong) OEEventsObserver *openEarsEventsObserver; @property (nonatomic, strong) OEPocketsphinxController *pocketsphinxController; @property (nonatomic, strong) OEFliteController *fliteController; // Some UI, not specifically related to OpenEars. 
@property (nonatomic, strong) IBOutlet UIButton *stopButton; @property (nonatomic, strong) IBOutlet UIButton *startButton; @property (nonatomic, strong) IBOutlet UIButton *suspendListeningButton; @property (nonatomic, strong) IBOutlet UIButton *resumeListeningButton; @property (nonatomic, strong) IBOutlet UITextView *statusTextView; @property (nonatomic, strong) IBOutlet UITextView *heardTextView; @property (nonatomic, strong) IBOutlet UILabel *pocketsphinxDbLabel; @property (nonatomic, strong) IBOutlet UILabel *fliteDbLabel; @property (nonatomic, assign) BOOL usingStartingLanguageModel; @property (nonatomic, assign) int restartAttemptsDueToPermissionRequests; @property (nonatomic, assign) BOOL startupFailedDueToLackOfPermissions; // Things which help us show off the dynamic language features. @property (nonatomic, copy) NSString *pathToFirstDynamicallyGeneratedLanguageModel; @property (nonatomic, copy) NSString *pathToFirstDynamicallyGeneratedDictionary; @property (nonatomic, copy) NSString *pathToSecondDynamicallyGeneratedLanguageModel; @property (nonatomic, copy) NSString *pathToSecondDynamicallyGeneratedDictionary; // Our NSTimer that will help us read and display the input and output levels without locking the UI @property (nonatomic, strong) NSTimer *uiUpdateTimer; @end @implementation ViewController #define kLevelUpdatesPerSecond 18 // We'll have the ui update 18 times a second to show some fluidity without hitting the CPU too hard. 
#define kGetNbest // Uncomment this if you want to try out nbest

#pragma mark -
#pragma mark Memory Management

- (void)dealloc {
    [self stopDisplayingLevels];
}

#pragma mark -
#pragma mark View Lifecycle

- (void)viewDidLoad {
    [super viewDidLoad];
    self.fliteController = [[OEFliteController alloc] init];
    self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
    self.openEarsEventsObserver.delegate = self;
    self.slt = [[Slt alloc] init];
    self.restartAttemptsDueToPermissionRequests = 0;
    self.startupFailedDueToLackOfPermissions = FALSE;

    [OELogging startOpenEarsLogging]; // Uncomment me for OELogging, which is verbose logging about internal OpenEars operations such as audio settings. If you have issues, show this logging in the forums.
    [OEPocketsphinxController sharedInstance].verbosePocketSphinx = TRUE; // Uncomment this for much more verbose speech recognition engine output. If you have issues, show this logging in the forums.
    [self.openEarsEventsObserver setDelegate:self]; // Make this class the delegate of OpenEarsObserver so we can get all of the messages about what OpenEars is doing.
    [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil]; // Call this before setting any OEPocketsphinxController characteristics
    [OEPocketsphinxController sharedInstance].returnNbest = TRUE;
    [OEPocketsphinxController sharedInstance].nBestNumber = 5;
    [[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.5];
    [[OEPocketsphinxController sharedInstance] setVadThreshold:3.5];

    // This is the language model we're going to start up with.
The only reason I'm making it a class property is that I reuse it a bunch of times in this example, // but you can pass the string contents directly to OEPocketsphinxController:startListeningWithLanguageModelAtPath:dictionaryAtPath:languageModelIsJSGF: NSArray *firstLanguageArray = @[@"ADIOS", @"LECHUGA", @"MADRID", @"BARCELONA", @"PARIS", @"ROMA", @"MAINZ", @"HOLA", @"CHORIZO", @"HORCHATA"]; OELanguageModelGenerator *languageModelGenerator = [[OELanguageModelGenerator alloc] init]; languageModelGenerator.verboseLanguageModelGenerator = TRUE; // Uncomment me for verbose language model generator debug output. NSError *error = [languageModelGenerator generateLanguageModelFromArray:firstLanguageArray withFilesNamed:@"FirstOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"]]; // Change "AcousticModelSpanish" to "AcousticModelSpanish" in order to create a language model for Spanish recognition instead of English. if(error) { NSLog(@"Dynamic language generator reported error %@", [error description]); } else { self.pathToFirstDynamicallyGeneratedLanguageModel = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"]; self.pathToFirstDynamicallyGeneratedDictionary = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"FirstOpenEarsDynamicLanguageModel"]; } self.usingStartingLanguageModel = TRUE; // This is not an OpenEars thing, this is just so I can switch back and forth between the two models in this sample app. // Here is an example of dynamically creating an in-app grammar. // We want it to be able to response to the speech "CHANGE MODEL" and a few other things. Items we want to have recognized as a whole phrase (like "CHANGE MODEL") // we put into the array as one string (e.g. "CHANGE MODEL" instead of "CHANGE" and "MODEL"). This increases the probability that they will be recognized as a phrase. 
This works even better starting with version 1.0 of OpenEars. NSArray *secondLanguageArray = @[@"ADIOS", @"LECHUGA", @"MADRID", @"BARCELONA", @"PARIS", @"ROMA", @"MAINZ", @"HOLA", @"CHORIZO", @"HORCHATA"]; // The last entry, quidnunc, is an example of a word which will not be found in the lookup dictionary and will be passed to the fallback method. The fallback method is slower, // so, for instance, creating a new language model from dictionary words will be pretty fast, but a model that has a lot of unusual names in it or invented/rare/recent-slang // words will be slower to generate. You can use this information to give your users good UI feedback about what the expectations for wait times should be. // I don't think it's beneficial to lazily instantiate OELanguageModelGenerator because you only need to give it a single message and then release it. // If you need to create a very large model or any size of model that has many unusual words that have to make use of the fallback generation method, // you will want to run this on a background thread so you can give the user some UI feedback that the task is in progress. // generateLanguageModelFromArray:withFilesNamed returns an NSError which will either have a value of noErr if everything went fine or a specific error if it didn't. error = [languageModelGenerator generateLanguageModelFromArray:secondLanguageArray withFilesNamed:@"SecondOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"]]; // Change "AcousticModelSpanish" to "AcousticModelSpanish" in order to create a language model for Spanish recognition instead of English. 
// NSError *error = [languageModelGenerator generateLanguageModelFromTextFile:[NSString stringWithFormat:@"%@/%@",[[NSBundle mainBundle] resourcePath], @"OpenEarsCorpus.txt"] withFilesNamed:@"SecondOpenEarsDynamicLanguageModel" forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"]]; // Try this out to see how generating a language model from a corpus works. if(error) { NSLog(@"Dynamic language generator reported error %@", [error description]); } else { self.pathToSecondDynamicallyGeneratedLanguageModel = [languageModelGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"SecondOpenEarsDynamicLanguageModel"]; // We'll set our new .languagemodel file to be the one to get switched to when the words "CHANGE MODEL" are recognized. self.pathToSecondDynamicallyGeneratedDictionary = [languageModelGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"SecondOpenEarsDynamicLanguageModel"];; // We'll set our new dictionary to be the one to get switched to when the words "CHANGE MODEL" are recognized. // Next, an informative message. NSLog(@"\n\nWelcome to the OpenEars sample project. This project understands the words:\nBACKWARD,\nCHANGE,\nFORWARD,\nGO,\nLEFT,\nMODEL,\nRIGHT,\nTURN,\nand if you say \"CHANGE MODEL\" it will switch to its dynamically-generated model which understands the words:\nCHANGE,\nMODEL,\nMONDAY,\nTUESDAY,\nWEDNESDAY,\nTHURSDAY,\nFRIDAY,\nSATURDAY,\nSUNDAY,\nQUIDNUNC"); // This is how to start the continuous listening loop of an available instance of OEPocketsphinxController. We won't do this if the language generation failed since it will be listening for a command to change over to the generated language. [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil]; // Call this once before setting properties of the OEPocketsphinxController instance. 
        [[OEPocketsphinxController sharedInstance] setSecondsOfSilenceToDetect:0.5];
        [[OEPocketsphinxController sharedInstance] setVadThreshold:3.5];
        [OEPocketsphinxController sharedInstance].pathToTestFile = [[NSBundle mainBundle] pathForResource:@"openears4" ofType:@"wav"]; // This is how you could use a test WAV (mono/16-bit/16k) rather than live recognition

        if(![OEPocketsphinxController sharedInstance].isListening) {
            [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
        }

        // [self startDisplayingLevels] is not an OpenEars method, just a very simple approach for level reading
        // that I've included with this sample app. My example implementation does make use of two OpenEars
        // methods: the pocketsphinxInputLevel method of OEPocketsphinxController and the fliteOutputLevel
        // method of fliteController.
        //
        // The example is meant to show one way that you can read those levels continuously without locking the UI,
        // by using an NSTimer, but the OpenEars level-reading methods
        // themselves do not include multithreading code since I believe that you will want to design your own
        // code approaches for level display that are tightly-integrated with your interaction design and the
        // graphics API you choose.
        [self startDisplayingLevels];

        // Here is some UI stuff that has nothing specifically to do with OpenEars implementation
        self.startButton.hidden = TRUE;
        self.stopButton.hidden = TRUE;
        self.suspendListeningButton.hidden = TRUE;
        self.resumeListeningButton.hidden = TRUE;
    }
}

#pragma mark -
#pragma mark OEEventsObserver delegate methods

// What follows are all of the delegate methods you can optionally use once you've instantiated an OEEventsObserver and set its delegate to self.
// I've provided some pretty granular information about the exact phase of the Pocketsphinx listening loop, the Audio Session, and Flite, but I'd expect // that the ones that will really be needed by most projects are the following: // //- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID; //- (void) audioSessionInterruptionDidBegin; //- (void) audioSessionInterruptionDidEnd; //- (void) audioRouteDidChangeToRoute:(NSString *)newRoute; //- (void) pocketsphinxDidStartListening; //- (void) pocketsphinxDidStopListening; // // It isn't necessary to have a OEPocketsphinxController or a OEFliteController instantiated in order to use these methods. If there isn't anything instantiated that will // send messages to an OEEventsObserver, all that will happen is that these methods will never fire. You also do not have to create a OEEventsObserver in // the same class or view controller in which you are doing things with a OEPocketsphinxController or OEFliteController; you can receive updates from those objects in // any class in which you instantiate an OEEventsObserver and set its delegate to self. // This is an optional delegate method of OEEventsObserver which delivers the text of speech that Pocketsphinx heard and analyzed, along with its accuracy score and utterance ID. - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis recognitionScore:(NSString *)recognitionScore utteranceID:(NSString *)utteranceID { NSLog(@"Local callback: The received hypothesis is %@ with a score of %@ and an ID of %@", hypothesis, recognitionScore, utteranceID); // Log it. if([hypothesis isEqualToString:@"CHANGE MODEL"]) { // If the user says "CHANGE MODEL", we will switch to the alternate model (which happens to be the dynamically generated model). // Here is an example of language model switching in OpenEars. Deciding on what logical basis to switch models is your responsibility. 
// For instance, when you call a customer service line and get a response tree that takes you through different options depending on what you say to it, // the models are being switched as you progress through it so that only relevant choices can be understood. The construction of that logical branching and // how to react to it is your job, OpenEars just lets you send the signal to switch the language model when you've decided it's the right time to do so. if(self.usingStartingLanguageModel) { // If we're on the starting model, switch to the dynamically generated one. // You can only change language models with ARPA grammars in OpenEars (the ones that end in .languagemodel or .DMP). // Trying to switch between JSGF models (the ones that end in .gram) will return no result. [[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:self.pathToSecondDynamicallyGeneratedLanguageModel withDictionary:self.pathToSecondDynamicallyGeneratedDictionary]; self.usingStartingLanguageModel = FALSE; } else { // If we're on the dynamically generated model, switch to the start model (this is just an example of a trigger and method for switching models). [[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:self.pathToFirstDynamicallyGeneratedLanguageModel withDictionary:self.pathToFirstDynamicallyGeneratedDictionary]; self.usingStartingLanguageModel = TRUE; } } self.heardTextView.text = [NSString stringWithFormat:@"Heard: \"%@\"", hypothesis]; // Show it in the status box. // This is how to use an available instance of OEFliteController. We're going to repeat back the command that we heard with the voice we've chosen. [self.fliteController say:[NSString stringWithFormat:@"You said %@",hypothesis] withVoice:self.slt]; } #ifdef kGetNbest - (void) pocketsphinxDidReceiveNBestHypothesisArray:(NSArray *)hypothesisArray { // Pocketsphinx has an n-best hypothesis dictionary. 
    NSLog(@"Local callback: hypothesisArray is %@",hypothesisArray);
}
#endif

// An optional delegate method of OEEventsObserver which informs that there was an interruption to the audio session (e.g. an incoming phone call).
- (void) audioSessionInterruptionDidBegin {
    NSLog(@"Local callback: AudioSession interruption began."); // Log it.
    self.statusTextView.text = @"Status: AudioSession interruption began."; // Show it in the status box.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening) {
        error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling Pocketsphinx to stop listening (if it is listening) since it will need to restart its loop after an interruption.
        if(error) NSLog(@"Error while stopping listening in audioSessionInterruptionDidBegin: %@", error);
    }
}

// An optional delegate method of OEEventsObserver which informs that the interruption to the audio session ended.
- (void) audioSessionInterruptionDidEnd {
    NSLog(@"Local callback: AudioSession interruption ended."); // Log it.
    self.statusTextView.text = @"Status: AudioSession interruption ended."; // Show it in the status box.
    // We're restarting the previously-stopped listening loop.
    if(![OEPocketsphinxController sharedInstance].isListening){
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't currently listening.
    }
}

// An optional delegate method of OEEventsObserver which informs that the audio input became unavailable.
- (void) audioInputDidBecomeUnavailable {
    NSLog(@"Local callback: The audio input has become unavailable"); // Log it.
    self.statusTextView.text = @"Status: The audio input has become unavailable"; // Show it in the status box.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening){
        error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling Pocketsphinx to stop listening since there is no available input (but only if we are listening).
        if(error) NSLog(@"Error while stopping listening in audioInputDidBecomeUnavailable: %@", error);
    }
}

// An optional delegate method of OEEventsObserver which informs that the unavailable audio input became available again.
- (void) audioInputDidBecomeAvailable {
    NSLog(@"Local callback: The audio input is available"); // Log it.
    self.statusTextView.text = @"Status: The audio input is available"; // Show it in the status box.
    if(![OEPocketsphinxController sharedInstance].isListening) {
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition, but only if we aren't already listening.
    }
}

// An optional delegate method of OEEventsObserver which informs that there was a change to the audio route (e.g. headphones were plugged in or unplugged).
- (void) audioRouteDidChangeToRoute:(NSString *)newRoute {
    NSLog(@"Local callback: Audio route change. The new audio route is %@", newRoute); // Log it.
    self.statusTextView.text = [NSString stringWithFormat:@"Status: Audio route change. The new audio route is %@",newRoute]; // Show it in the status box.
    NSError *error = [[OEPocketsphinxController sharedInstance] stopListening]; // React to it by telling the Pocketsphinx loop to shut down and then start listening again on the new route
    if(error)NSLog(@"Local callback: error while stopping listening in audioRouteDidChangeToRoute: %@",error);
    if(![OEPocketsphinxController sharedInstance].isListening) {
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
    }
}

// An optional delegate method of OEEventsObserver which informs that the Pocketsphinx recognition loop has entered its actual loop.
// This might be useful in debugging a conflict between another sound class and Pocketsphinx.
- (void) pocketsphinxRecognitionLoopDidStart {
    NSLog(@"Local callback: Pocketsphinx started."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx started."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is now listening for speech.
- (void) pocketsphinxDidStartListening {
    NSLog(@"Local callback: Pocketsphinx is now listening."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx is now listening."; // Show it in the status box.
    self.startButton.hidden = TRUE; // React to it with some UI changes.
    self.stopButton.hidden = FALSE;
    self.suspendListeningButton.hidden = FALSE;
    self.resumeListeningButton.hidden = TRUE;
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx detected speech and is starting to process it.
- (void) pocketsphinxDidDetectSpeech {
    NSLog(@"Local callback: Pocketsphinx has detected speech."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has detected speech."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx detected a second of silence, indicating the end of an utterance.
// This was added because developers requested being able to time the recognition speed without the speech time. The processing time is the time between
// this method being called and the hypothesis being returned.
- (void) pocketsphinxDidDetectFinishedSpeech {
    NSLog(@"Local callback: Pocketsphinx has detected a second of silence, concluding an utterance."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has detected finished speech."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx has exited its recognition loop, most
// likely in response to the OEPocketsphinxController being told to stop listening via the stopListening method.
- (void) pocketsphinxDidStopListening {
    NSLog(@"Local callback: Pocketsphinx has stopped listening."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has stopped listening."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is still in its listening loop but it is not
// going to react to speech until listening is resumed. This can happen as a result of Flite speech being
// in progress on an audio route that doesn't support simultaneous Flite speech and Pocketsphinx recognition,
// or as a result of the OEPocketsphinxController being told to suspend recognition via the suspendRecognition method.
- (void) pocketsphinxDidSuspendRecognition {
    NSLog(@"Local callback: Pocketsphinx has suspended recognition."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has suspended recognition."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Pocketsphinx is still in its listening loop and after recognition
// having been suspended it is now resuming.
// This can happen as a result of Flite speech completing
// on an audio route that doesn't support simultaneous Flite speech and Pocketsphinx recognition,
// or as a result of the OEPocketsphinxController being told to resume recognition via the resumeRecognition method.
- (void) pocketsphinxDidResumeRecognition {
    NSLog(@"Local callback: Pocketsphinx has resumed recognition."); // Log it.
    self.statusTextView.text = @"Status: Pocketsphinx has resumed recognition."; // Show it in the status box.
}

// An optional delegate method which informs that Pocketsphinx switched over to a new language model at the given URL in the course of
// recognition. This does not imply that it is a valid file or that recognition will be successful using the file.
- (void) pocketsphinxDidChangeLanguageModelToFile:(NSString *)newLanguageModelPathAsString andDictionary:(NSString *)newDictionaryPathAsString {
    NSLog(@"Local callback: Pocketsphinx is now using the following language model: \n%@ and the following dictionary: %@",newLanguageModelPathAsString,newDictionaryPathAsString);
}

// An optional delegate method of OEEventsObserver which informs that Flite is speaking, most likely to be useful if debugging a
// complex interaction between sound classes. You don't have to do anything yourself in order to prevent Pocketsphinx from listening to Flite talk and trying to recognize the speech.
- (void) fliteDidStartSpeaking {
    NSLog(@"Local callback: Flite has started speaking"); // Log it.
    self.statusTextView.text = @"Status: Flite has started speaking."; // Show it in the status box.
}

// An optional delegate method of OEEventsObserver which informs that Flite is finished speaking, most likely to be useful if debugging a
// complex interaction between sound classes.
- (void) fliteDidFinishSpeaking {
    NSLog(@"Local callback: Flite has finished speaking"); // Log it.
    self.statusTextView.text = @"Status: Flite has finished speaking."; // Show it in the status box.
}

- (void) pocketSphinxContinuousSetupDidFailWithReason:(NSString *)reasonForFailure { // This can let you know that something went wrong with the recognition loop startup. Turn on [OELogging startOpenEarsLogging] to learn why.
    NSLog(@"Local callback: Setting up the continuous recognition loop has failed for the reason %@, please turn on [OELogging startOpenEarsLogging] to learn more.", reasonForFailure); // Log it.
    self.statusTextView.text = @"Status: Not possible to start recognition loop."; // Show it in the status box.
}

- (void) pocketSphinxContinuousTeardownDidFailWithReason:(NSString *)reasonForFailure { // This can let you know that something went wrong with the recognition loop teardown. Turn on [OELogging startOpenEarsLogging] to learn why.
    NSLog(@"Local callback: Tearing down the continuous recognition loop has failed for the reason %@, please turn on [OELogging startOpenEarsLogging] to learn more.", reasonForFailure); // Log it.
    self.statusTextView.text = @"Status: Not possible to cleanly end recognition loop."; // Show it in the status box.
}

- (void) testRecognitionCompleted { // A test file which was submitted for direct recognition via the audio driver is done.
    NSLog(@"Local callback: A test file which was submitted for direct recognition via the audio driver is done."); // Log it.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening) { // If we're listening, stop listening.
        error = [[OEPocketsphinxController sharedInstance] stopListening];
        if(error) NSLog(@"Error while stopping listening in testRecognitionCompleted: %@", error);
    }
}

/** Pocketsphinx couldn't start because it has no mic permissions (will only be returned on iOS7 or later).*/
- (void) pocketsphinxFailedNoMicPermissions {
    NSLog(@"Local callback: The user has never set mic permissions or denied permission to this app's mic, so listening will not start.");
    self.startupFailedDueToLackOfPermissions = TRUE;
}

/** The user prompt to get mic permissions, or a check of the mic permissions, has completed with a TRUE or a FALSE result (will only be returned on iOS7 or later).*/
- (void) micPermissionCheckCompleted:(BOOL)result {
    if(result) {
        self.restartAttemptsDueToPermissionRequests++;
        if(self.restartAttemptsDueToPermissionRequests == 1 && self.startupFailedDueToLackOfPermissions) { // If we get here because there was an attempt to start which failed due to lack of permissions, and now permissions have been requested and they returned true, we restart exactly once with the new permissions.
            NSError *error = nil;
            if([OEPocketsphinxController sharedInstance].isListening){
                error = [[OEPocketsphinxController sharedInstance] stopListening]; // Stop listening if we are listening.
                if(error) NSLog(@"Error while stopping listening in micPermissionCheckCompleted: %@", error);
            }
            if(!error && ![OEPocketsphinxController sharedInstance].isListening) { // If there was no error and we aren't listening, start listening.
                [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition.
                self.startupFailedDueToLackOfPermissions = FALSE;
            }
        }
    }
}

#pragma mark -
#pragma mark UI

// This is not OpenEars-specific stuff, just some UI behavior

- (IBAction) suspendListeningButtonAction { // This is the action for the button which suspends listening without ending the recognition loop
    [[OEPocketsphinxController sharedInstance] suspendRecognition];
    self.startButton.hidden = TRUE;
    self.stopButton.hidden = FALSE;
    self.suspendListeningButton.hidden = TRUE;
    self.resumeListeningButton.hidden = FALSE;
}

- (IBAction) resumeListeningButtonAction { // This is the action for the button which resumes listening if it has been suspended
    [[OEPocketsphinxController sharedInstance] resumeRecognition];
    self.startButton.hidden = TRUE;
    self.stopButton.hidden = FALSE;
    self.suspendListeningButton.hidden = FALSE;
    self.resumeListeningButton.hidden = TRUE;
}

- (IBAction) stopButtonAction { // This is the action for the button which shuts down the recognition loop.
    NSError *error = nil;
    if([OEPocketsphinxController sharedInstance].isListening) { // Stop if we are currently listening.
        error = [[OEPocketsphinxController sharedInstance] stopListening];
        if(error)NSLog(@"Error stopping listening in stopButtonAction: %@", error);
    }
    self.startButton.hidden = FALSE;
    self.stopButton.hidden = TRUE;
    self.suspendListeningButton.hidden = TRUE;
    self.resumeListeningButton.hidden = TRUE;
}

- (IBAction) startButtonAction { // This is the action for the button which starts up the recognition loop again if it has been shut down.
    if(![OEPocketsphinxController sharedInstance].isListening) {
        [[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:self.pathToFirstDynamicallyGeneratedLanguageModel dictionaryAtPath:self.pathToFirstDynamicallyGeneratedDictionary acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelSpanish"] languageModelIsJSGF:FALSE]; // Start speech recognition if we aren't already listening.
    }
    self.startButton.hidden = TRUE;
    self.stopButton.hidden = FALSE;
    self.suspendListeningButton.hidden = FALSE;
    self.resumeListeningButton.hidden = TRUE;
}

#pragma mark -
#pragma mark Example for reading out Pocketsphinx and Flite audio levels without locking the UI by using an NSTimer

// What follows are not OpenEars methods, just an approach for level reading
// that I've included with this sample app. My example implementation does make use of two OpenEars
// methods: the pocketsphinxInputLevel method of OEPocketsphinxController and the fliteOutputLevel
// method of OEFliteController.
//
// The example is meant to show one way that you can read those levels continuously without locking the UI,
// by using an NSTimer, but the OpenEars level-reading methods
// themselves do not include multithreading code since I believe that you will want to design your own
// code approaches for level display that are tightly-integrated with your interaction design and the
// graphics API you choose.
//
// Please note that if you use my sample approach, you should pay attention to the way that the timer is always stopped in
// dealloc. This should prevent you from having any difficulties with deallocating a class due to a running NSTimer process.

- (void) startDisplayingLevels { // Start displaying the levels using a timer
    [self stopDisplayingLevels]; // We never want more than one timer valid so we'll stop any running timers first.
    self.uiUpdateTimer = [NSTimer scheduledTimerWithTimeInterval:1.0/kLevelUpdatesPerSecond target:self selector:@selector(updateLevelsUI) userInfo:nil repeats:YES];
}

- (void) stopDisplayingLevels { // Stop displaying the levels by stopping the timer if it's running.
    if(self.uiUpdateTimer && [self.uiUpdateTimer isValid]) { // If there is a running timer, we'll stop it here.
        [self.uiUpdateTimer invalidate];
        self.uiUpdateTimer = nil;
    }
}

- (void) updateLevelsUI { // And here is how we obtain the levels.
    // This method includes the actual OpenEars methods and uses their results to update the UI of this view controller.
    self.pocketsphinxDbLabel.text = [NSString stringWithFormat:@"Pocketsphinx Input level:%f",[[OEPocketsphinxController sharedInstance] pocketsphinxInputLevel]]; //pocketsphinxInputLevel is an OpenEars method of the class OEPocketsphinxController.
    if(self.fliteController.speechInProgress) {
        self.fliteDbLabel.text = [NSString stringWithFormat:@"Flite Output level: %f",[self.fliteController fliteOutputLevel]]; // fliteOutputLevel is an OpenEars method of the class OEFliteController.
    }
}

@end
It is definitely not suppressing anything at all. Here is the result in the log:
2015-01-07 13:01:28.314 OpenEarsSampleApp[842:60b] Starting OpenEars logging for OpenEars version 2.01 on 32-bit device (or build): iPhone running iOS version: 7.000000
2015-01-07 13:01:28.318 OpenEarsSampleApp[842:60b] Creating shared instance of OEPocketsphinxController
2015-01-07 13:01:28.380 OpenEarsSampleApp[842:60b] Starting dynamic language model generation
## Vocab generated by v2 of the CMU-Cambridge Statistcal
## Language Modeling toolkit.
##
## Includes 12 words ##
wfreq2vocab : Done.
text2idngram
Vocab : /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.vocab
Output idngram : /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.idngram
N-gram buffer size : 10
Hash table size : 5000
Temp directory : /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/cmuclmtk-jsQ2W3
Max open files : 20
FOF size : 10
n : 3
Initialising hash table…
Reading vocabulary…
Allocating memory for the n-gram buffer…
Reading text into the n-gram buffer…
20,000 n-grams processed for each “.”, 1,000,000 for each line.
Sorting n-grams…
Writing sorted n-grams to temporary file /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/cmuclmtk-jsQ2W3/1
Merging 1 temporary files…
2-grams occurring: N times > N times Sug. -spec_num value
0 21 31
1 20 1 11
2 0 1 11
3 0 1 11
4 0 1 11
5 0 1 11
6 0 1 11
7 0 1 11
8 0 1 11
9 0 1 11
10 1 0 10
3-grams occurring: N times > N times Sug. -spec_num value
0 30 40
1 30 0 10
2 0 0 10
3 0 0 10
4 0 0 10
5 0 0 10
6 0 0 10
7 0 0 10
8 0 0 10
9 0 0 10
10 0 0 10
text2idngram : Done.
read_wlist_into_siht: a list of 12 words was read from “/var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.vocab”.
read_wlist_into_array: a list of 12 words was read from “/var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.vocab”.
Unigram was renormalized to absorb a mass of 0.5
prob[UNK] = 1e-99
ARPA-style 3-gram will be written to /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.arpa
idngram2lm : Done.
INFO: cmd_ln.c(702): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.arpa \
-o /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP \
-debug 10
Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.arpa
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(504): ngrams 1=12, 2=20, 3=10
INFO: ngram_model_arpa.c(137): Reading unigrams
INFO: ngram_model_arpa.c(543): 12 = #unigrams created
INFO: ngram_model_arpa.c(197): Reading bigrams
INFO: ngram_model_arpa.c(561): 20 = #bigrams created
INFO: ngram_model_arpa.c(562): 3 = #prob2 entries
INFO: ngram_model_arpa.c(570): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(294): Reading trigrams
INFO: ngram_model_arpa.c(583): 10 = #trigrams created
INFO: ngram_model_arpa.c(584): 2 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 12 = #unigrams created
INFO: ngram_model_dmp.c(649): 20 = #bigrams created
INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 10 = #trigrams created
INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
2015-01-07 13:01:28.439 OpenEarsSampleApp[842:60b] Done creating language model with CMUCLMTK in 0.058217 seconds.
2015-01-07 13:01:28.477 OpenEarsSampleApp[842:60b] The word ADIOS was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.479 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word ADIOS
2015-01-07 13:01:28.480 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.493 OpenEarsSampleApp[842:60b] The word HORCHATA was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.495 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word HORCHATA
2015-01-07 13:01:28.496 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.503 OpenEarsSampleApp[842:60b] The word LECHUGA was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.505 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word LECHUGA
2015-01-07 13:01:28.506 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.513 OpenEarsSampleApp[842:60b] The word MAINZ was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.514 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word MAINZ
2015-01-07 13:01:28.516 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.520 OpenEarsSampleApp[842:60b] I’m done running performDictionaryLookup and it took 0.053911 seconds
2015-01-07 13:01:28.526 OpenEarsSampleApp[842:60b] I’m done running dynamic language model generation and it took 0.200190 seconds
2015-01-07 13:01:28.532 OpenEarsSampleApp[842:60b] Starting dynamic language model generation
## Vocab generated by v2 of the CMU-Cambridge Statistcal
## Language Modeling toolkit.
##
## Includes 12 words ##
wfreq2vocab : Done.
text2idngram
Vocab : /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.vocab
Output idngram : /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.idngram
N-gram buffer size : 10
Hash table size : 5000
Temp directory : /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/cmuclmtk-no6FQp
Max open files : 20
FOF size : 10
n : 3
Initialising hash table…
Reading vocabulary…
Allocating memory for the n-gram buffer…
Reading text into the n-gram buffer…
20,000 n-grams processed for each “.”, 1,000,000 for each line.
Sorting n-grams…
Writing sorted n-grams to temporary file /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/cmuclmtk-no6FQp/1
Merging 1 temporary files…
2-grams occurring: N times > N times Sug. -spec_num value
0 21 31
1 20 1 11
2 0 1 11
3 0 1 11
4 0 1 11
5 0 1 11
6 0 1 11
7 0 1 11
8 0 1 11
9 0 1 11
10 1 0 10
3-grams occurring: N times > N times Sug. -spec_num value
0 30 40
1 30 0 10
2 0 0 10
3 0 0 10
4 0 0 10
5 0 0 10
6 0 0 10
7 0 0 10
8 0 0 10
9 0 0 10
10 0 0 10
text2idngram : Done.
read_wlist_into_siht: a list of 12 words was read from “/var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.vocab”.
read_wlist_into_array: a list of 12 words was read from “/var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.vocab”.
Unigram was renormalized to absorb a mass of 0.5
prob[UNK] = 1e-99
ARPA-style 3-gram will be written to /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.arpa
idngram2lm : Done.
INFO: cmd_ln.c(702): Parsing command line:
sphinx_lm_convert \
-i /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.arpa \
-o /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.DMP \
-debug 10
Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 10
-help no no
-i /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.arpa
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/SecondOpenEarsDynamicLanguageModel.DMP
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(504): ngrams 1=12, 2=20, 3=10
INFO: ngram_model_arpa.c(137): Reading unigrams
INFO: ngram_model_arpa.c(543): 12 = #unigrams created
INFO: ngram_model_arpa.c(197): Reading bigrams
INFO: ngram_model_arpa.c(561): 20 = #bigrams created
INFO: ngram_model_arpa.c(562): 3 = #prob2 entries
INFO: ngram_model_arpa.c(570): 3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(294): Reading trigrams
INFO: ngram_model_arpa.c(583): 10 = #trigrams created
INFO: ngram_model_arpa.c(584): 2 = #prob3 entries
INFO: ngram_model_dmp.c(518): Building DMP model…
INFO: ngram_model_dmp.c(548): 12 = #unigrams created
INFO: ngram_model_dmp.c(649): 20 = #bigrams created
INFO: ngram_model_dmp.c(650): 3 = #prob2 entries
INFO: ngram_model_dmp.c(657): 3 = #bo_wt2 entries
INFO: ngram_model_dmp.c(661): 10 = #trigrams created
INFO: ngram_model_dmp.c(662): 2 = #prob3 entries
2015-01-07 13:01:28.583 OpenEarsSampleApp[842:60b] Done creating language model with CMUCLMTK in 0.049580 seconds.
2015-01-07 13:01:28.621 OpenEarsSampleApp[842:60b] The word ADIOS was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.622 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word ADIOS
2015-01-07 13:01:28.624 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.636 OpenEarsSampleApp[842:60b] The word HORCHATA was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.638 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word HORCHATA
2015-01-07 13:01:28.639 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.646 OpenEarsSampleApp[842:60b] The word LECHUGA was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.648 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word LECHUGA
2015-01-07 13:01:28.649 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.657 OpenEarsSampleApp[842:60b] The word MAINZ was not found in the dictionary /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/LanguageModelGeneratorLookupList.text/LanguageModelGeneratorLookupList.text.
2015-01-07 13:01:28.658 OpenEarsSampleApp[842:60b] Now using the fallback method to look up the word MAINZ
2015-01-07 13:01:28.659 OpenEarsSampleApp[842:60b] If this is happening more frequently than you would expect, the most likely cause for it is since you are using the Spanish phonetic lookup dictionary is that your words are not in Spanish or aren’t dictionary words, or that you are submitting the words in lowercase when they need to be entirely written in uppercase.
2015-01-07 13:01:28.664 OpenEarsSampleApp[842:60b] I’m done running performDictionaryLookup and it took 0.053968 seconds
2015-01-07 13:01:28.670 OpenEarsSampleApp[842:60b] I’m done running dynamic language model generation and it took 0.141725 seconds
2015-01-07 13:01:28.672 OpenEarsSampleApp[842:60b] Welcome to the OpenEars sample project. This project understands the words:
BACKWARD,
CHANGE,
FORWARD,
GO,
LEFT,
MODEL,
RIGHT,
TURN,
and if you say “CHANGE MODEL” it will switch to its dynamically-generated model which understands the words:
CHANGE,
MODEL,
MONDAY,
TUESDAY,
WEDNESDAY,
THURSDAY,
FRIDAY,
SATURDAY,
SUNDAY,
QUIDNUNC
2015-01-07 13:01:28.674 OpenEarsSampleApp[842:60b] Attempting to start listening session from startListeningWithLanguageModelAtPath:
2015-01-07 13:01:28.678 OpenEarsSampleApp[842:60b] User gave mic permission for this app.
2015-01-07 13:01:28.680 OpenEarsSampleApp[842:60b] Valid setSecondsOfSilence value of 0.500000 will be used.
2015-01-07 13:01:28.681 OpenEarsSampleApp[842:60b] Successfully started listening session from startListeningWithLanguageModelAtPath:
2015-01-07 13:01:28.682 OpenEarsSampleApp[842:1803] Starting listening.
2015-01-07 13:01:28.683 OpenEarsSampleApp[842:1803] about to set up audio session
2015-01-07 13:01:28.718 OpenEarsSampleApp[842:3b03] Audio route has changed for the following reason:
2015-01-07 13:01:28.723 OpenEarsSampleApp[842:3b03] There was a category change. The new category is AVAudioSessionCategoryPlayAndRecord
2015-01-07 13:01:28.732 OpenEarsSampleApp[842:3b03] This is not a case in which OpenEars notifies of a route change. At the close of this function, the new audio route is —SpeakerMicrophoneBuiltIn—. The previous route before changing to this route was <AVAudioSessionRouteDescription: 0x1667dc20,
inputs = (null);
outputs = (
“<AVAudioSessionPortDescription: 0x1667d800, type = Speaker; name = Altavoz; UID = Speaker; selectedDataSource = (null)>”
)>.
2015-01-07 13:01:29.142 OpenEarsSampleApp[842:1803] done starting audio unit
INFO: cmd_ln.c(702): Parsing command line:
\
-lm /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP \
-vad_prespeech 10 \
-vad_postspeech 50 \
-vad_threshold 3.500000 \
-remove_noise yes \
-remove_silence yes \
-bestpath yes \
-lw 6.500000 \
-dict /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic \
-hmm /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-keyphrase
-kws
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.DMP
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 10000 10000
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 10 10
-vad_threshold 2.0 3.500000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(702): Parsing command line:
\
-feat s3_1x39
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd s3_1x39
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-vad_postspeech 50 50
-vad_prespeech 10 10
-vad_threshold 2.0 3.500000e+00
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02
INFO: acmod.c(252): Parsed model-specific feature parameters from /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/feat.params
INFO: feat.c(715): Initializing feature stream to type: ‘s3_1x39′, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(518): Reading model definition: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/mdef
INFO: bin_mdef.c(181): Allocating 27954 * 8 bytes (218 KiB) for CD tree
INFO: tmat.c(206): Reading HMM transition probability matrices: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/transition_matrices
INFO: acmod.c(124): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/means
INFO: ms_gauden.c(292): 2630 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16×39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/variances
INFO: ms_gauden.c(292): 2630 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16×39
INFO: ms_gauden.c(354): 16 variance values floored
INFO: acmod.c(126): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/means
INFO: ms_gauden.c(292): 2630 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16×39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/variances
INFO: ms_gauden.c(292): 2630 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16×39
INFO: ms_gauden.c(354): 16 variance values floored
INFO: ptm_mgau.c(792): Number of codebooks exceeds 256: 2630
INFO: acmod.c(128): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/means
INFO: ms_gauden.c(292): 2630 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16×39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/variances
INFO: ms_gauden.c(292): 2630 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 16×39
INFO: ms_gauden.c(354): 16 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/OpenEarsSampleApp.app/AcousticModelSpanish.bundle/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 2630 senones: 1 features x 16 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: dict.c(320): Allocating 4106 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /var/mobile/Applications/3258C065-2A15-463F-A98B-D502DF01812B/Library/Caches/FirstOpenEarsDynamicLanguageModel.dic
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 10 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 26^3 * 2 bytes (34 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 8216 bytes (8 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 8216 bytes (8 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(79): No \data\ mark in LM file
INFO: ngram_model_dmp.c(166): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(220): ngrams 1=12, 2=20, 3=10
INFO: ngram_model_dmp.c(266): 12 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(312): 20 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(338): 10 = LM.trigrams read
INFO: ngram_model_dmp.c(363): 3 = LM.prob2 entries read
INFO: ngram_model_dmp.c(383): 3 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(403): 2 = LM.prob3 entries read
INFO: ngram_model_dmp.c(431): 1 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(487): 12 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 9 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 4 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 163
INFO: ngram_search_fwdtree.c(339): after: 9 root, 35 non-root channels, 3 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
2015-01-07 13:01:30.877 OpenEarsSampleApp[842:1803] Listening.
2015-01-07 13:01:30.879 OpenEarsSampleApp[842:1803] Project has these words or phrases in its dictionary:
ADIOS
BARCELONA
CHORIZO
HOLA
HORCHATA
LECHUGA
MADRID
MAINZ
PARIS
ROMA
2015-01-07 13:01:30.880 OpenEarsSampleApp[842:1803] Recognition loop has started
2015-01-07 13:01:30.905 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx is now listening.
2015-01-07 13:01:30.908 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx started.
2015-01-07 13:01:31.210 OpenEarsSampleApp[842:1803] Speech detected…
2015-01-07 13:01:31.212 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected speech.
2015-01-07 13:01:33.335 OpenEarsSampleApp[842:1803] End of speech detected…
INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 9.20 0.70 -0.18 -0.14 -0.30 -0.40 -0.35 -0.32 -0.38 -0.21 -0.22 -0.25 -0.19 >
INFO: ngram_search_fwdtree.c(1550): 277 words recognized (1/fr)
INFO: ngram_search_fwdtree.c(1552): 14886 senones evaluated (66/fr)
INFO: ngram_search_fwdtree.c(1556): 3780 channels searched (16/fr), 505 1st, 2058 last
INFO: ngram_search_fwdtree.c(1559): 304 words for which last channels evaluated (1/fr)
INFO: ngram_search_fwdtree.c(1561): 54 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.34 CPU 0.152 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 2.44 wall 1.084 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
2015-01-07 13:01:33.337 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
INFO: ngram_search_fwdflat.c(938): 318 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(940): 17608 senones evaluated (78/fr)
INFO: ngram_search_fwdflat.c(942): 5444 channels searched (24/fr)
INFO: ngram_search_fwdflat.c(944): 473 words searched (2/fr)
INFO: ngram_search_fwdflat.c(947): 57 word transitions (0/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.23 CPU 0.103 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.23 wall 0.103 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using HOLA.223 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node HOLA.2
INFO: ngram_search.c(1294): Eliminated 32 nodes before end node
INFO: ngram_search.c(1399): Lattice has 36 nodes, 1 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(HOLA:2:223) = -478053
INFO: ps_lattice.c(1403): Joint P(O,S) = -478053 P(S|O) = 0
INFO: ngram_search.c(890): bestpath 0.01 CPU 0.002 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.000 xRT
2015-01-07 13:01:33.569 OpenEarsSampleApp[842:1803] Pocketsphinx heard “HOLA” with a score of (0) and an utterance ID of 0.
2015-01-07 13:01:33.570 OpenEarsSampleApp[842:60b] Flite sending interrupt speech request.
2015-01-07 13:01:33.572 OpenEarsSampleApp[842:60b] Local callback: The received hypothesis is HOLA with a score of 0 and an ID of 0
2015-01-07 13:01:33.574 OpenEarsSampleApp[842:60b] I’m running flite
2015-01-07 13:01:33.716 OpenEarsSampleApp[842:60b] I’m done running flite and it took 0.140953 seconds
2015-01-07 13:01:33.717 OpenEarsSampleApp[842:60b] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-07 13:01:33.719 OpenEarsSampleApp[842:60b] Loading speech data for Flite concluded successfully.
2015-01-07 13:01:33.770 OpenEarsSampleApp[842:60b] Local callback: hypothesisArray is (
{
Hypothesis = HOLA;
Score = “-9037”;
}
)
2015-01-07 13:01:33.772 OpenEarsSampleApp[842:60b] Flite sending suspend recognition notification.
2015-01-07 13:01:33.774 OpenEarsSampleApp[842:60b] Local callback: Flite has started speaking
2015-01-07 13:01:33.780 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has suspended recognition.
2015-01-07 13:01:34.979 OpenEarsSampleApp[842:60b] AVAudioPlayer did finish playing with success flag of 1
2015-01-07 13:01:35.132 OpenEarsSampleApp[842:60b] Flite sending resume recognition notification.
2015-01-07 13:01:35.635 OpenEarsSampleApp[842:60b] Local callback: Flite has finished speaking
2015-01-07 13:01:35.642 OpenEarsSampleApp[842:60b] Valid setSecondsOfSilence value of 0.500000 will be used.
2015-01-07 13:01:35.643 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has resumed recognition.
INFO: cmn_prior.c(131): cmn_prior_update: from < 9.20 0.70 -0.18 -0.14 -0.30 -0.40 -0.35 -0.32 -0.38 -0.21 -0.22 -0.25 -0.19 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 9.20 0.70 -0.18 -0.14 -0.30 -0.40 -0.35 -0.32 -0.38 -0.21 -0.22 -0.25 -0.19 >
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
2015-01-07 13:01:37.487 OpenEarsSampleApp[842:1803] Speech detected…
2015-01-07 13:01:37.489 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected speech.
2015-01-07 13:01:41.814 OpenEarsSampleApp[842:1803] End of speech detected…
INFO: cmn_prior.c(131): cmn_prior_update: from < 9.20 0.70 -0.18 -0.14 -0.30 -0.40 -0.35 -0.32 -0.38 -0.21 -0.22 -0.25 -0.19 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 9.31 0.33 -0.27 0.11 -0.30 -0.45 -0.30 -0.15 -0.25 -0.18 -0.20 -0.17 -0.11 >
INFO: ngram_search_fwdtree.c(1550): 1881 words recognized (4/fr)
INFO: ngram_search_fwdtree.c(1552): 95871 senones evaluated (209/fr)
INFO: ngram_search_fwdtree.c(1556): 29159 channels searched (63/fr), 3690 1st, 15184 last
INFO: ngram_search_fwdtree.c(1559): 2097 words for which last channels evaluated (4/fr)
INFO: ngram_search_fwdtree.c(1561): 1023 candidate words for entering last phone (2/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.66 CPU 0.362 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 6.03 wall 1.316 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 12 words
2015-01-07 13:01:41.815 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
INFO: ngram_search_fwdflat.c(938): 1483 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(940): 94703 senones evaluated (207/fr)
INFO: ngram_search_fwdflat.c(942): 37245 channels searched (81/fr)
INFO: ngram_search_fwdflat.c(944): 3256 words searched (7/fr)
INFO: ngram_search_fwdflat.c(947): 1500 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 1.24 CPU 0.270 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 1.22 wall 0.266 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using <sil>.456 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node <sil>.386
INFO: ngram_search.c(1294): Eliminated 5 nodes before end node
INFO: ngram_search.c(1399): Lattice has 67 nodes, 69 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(<sil>:386:456) = -1095420
INFO: ps_lattice.c(1403): Joint P(O,S) = -1112740 P(S|O) = -17320
INFO: ngram_search.c(890): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.000 xRT
2015-01-07 13:01:43.036 OpenEarsSampleApp[842:1803] Pocketsphinx heard “MAINZ HORCHATA LECHUGA ADIOS” with a score of (-17320) and an utterance ID of 1.
2015-01-07 13:01:43.037 OpenEarsSampleApp[842:60b] Flite sending interrupt speech request.
2015-01-07 13:01:43.038 OpenEarsSampleApp[842:60b] Local callback: The received hypothesis is MAINZ HORCHATA LECHUGA ADIOS with a score of -17320 and an ID of 1
2015-01-07 13:01:43.041 OpenEarsSampleApp[842:60b] I’m running flite
2015-01-07 13:01:43.054 OpenEarsSampleApp[842:1803] Speech detected…
2015-01-07 13:01:43.292 OpenEarsSampleApp[842:60b] I’m done running flite and it took 0.249377 seconds
2015-01-07 13:01:43.293 OpenEarsSampleApp[842:60b] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-07 13:01:43.294 OpenEarsSampleApp[842:60b] Loading speech data for Flite concluded successfully.
2015-01-07 13:01:43.337 OpenEarsSampleApp[842:60b] Local callback: hypothesisArray is (
{
Hypothesis = “MAINZ HORCHATA LECHUGA ADIOS”;
Score = “-20469”;
},
{
Hypothesis = “MAINZ HORCHATA LECHUGA PARIS”;
Score = “-20469”;
},
{
Hypothesis = “MAINZ HORCHATA LECHUGA MAINZ”;
Score = “-20469”;
},
{
Hypothesis = “MAINZ MAINZ HORCHATA LECHUGA ADIOS”;
Score = “-20469”;
},
{
Hypothesis = “BARCELONA LECHUGA ADIOS”;
Score = “-20469”;
}
)
2015-01-07 13:01:43.339 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected speech.
2015-01-07 13:01:43.342 OpenEarsSampleApp[842:60b] Flite sending suspend recognition notification.
2015-01-07 13:01:43.344 OpenEarsSampleApp[842:60b] Local callback: Flite has started speaking
2015-01-07 13:01:43.351 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has suspended recognition.
2015-01-07 13:01:45.939 OpenEarsSampleApp[842:60b] AVAudioPlayer did finish playing with success flag of 1
2015-01-07 13:01:46.092 OpenEarsSampleApp[842:60b] Flite sending resume recognition notification.
2015-01-07 13:01:46.595 OpenEarsSampleApp[842:60b] Local callback: Flite has finished speaking
2015-01-07 13:01:46.602 OpenEarsSampleApp[842:60b] Valid setSecondsOfSilence value of 0.500000 will be used.
2015-01-07 13:01:46.603 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has resumed recognition.
INFO: cmn_prior.c(131): cmn_prior_update: from < 9.31 0.33 -0.27 0.11 -0.30 -0.45 -0.30 -0.15 -0.25 -0.18 -0.20 -0.17 -0.11 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 9.31 0.35 -0.27 0.13 -0.30 -0.46 -0.30 -0.16 -0.25 -0.19 -0.20 -0.17 -0.12 >
INFO: ngram_search_fwdtree.c(1550): 149 words recognized (3/fr)
INFO: ngram_search_fwdtree.c(1552): 8840 senones evaluated (173/fr)
INFO: ngram_search_fwdtree.c(1556): 2560 channels searched (50/fr), 423 1st, 1352 last
INFO: ngram_search_fwdtree.c(1559): 199 words for which last channels evaluated (3/fr)
INFO: ngram_search_fwdtree.c(1561): 74 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.51 CPU 0.994 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 3.76 wall 7.378 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 5 words
INFO: ngram_search_fwdflat.c(938): 55 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(940): 8242 senones evaluated (162/fr)
INFO: ngram_search_fwdflat.c(942): 3030 channels searched (59/fr)
INFO: ngram_search_fwdflat.c(944): 262 words searched (5/fr)
INFO: ngram_search_fwdflat.c(947): 144 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.12 CPU 0.226 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.12 wall 0.236 xRT
2015-01-07 13:01:48.672 OpenEarsSampleApp[842:1803] Speech detected…
2015-01-07 13:01:48.673 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected speech.
INFO: cmn_prior.c(99): cmn_prior_update: from < 9.31 0.35 -0.27 0.13 -0.30 -0.46 -0.30 -0.16 -0.25 -0.19 -0.20 -0.17 -0.12 >
INFO: cmn_prior.c(116): cmn_prior_update: to < 9.32 0.34 -0.26 0.13 -0.30 -0.46 -0.33 -0.17 -0.25 -0.19 -0.20 -0.17 -0.13 >
2015-01-07 13:01:49.894 OpenEarsSampleApp[842:1803] End of speech detected…
INFO: cmn_prior.c(131): cmn_prior_update: from < 9.32 0.34 -0.26 0.13 -0.30 -0.46 -0.33 -0.17 -0.25 -0.19 -0.20 -0.17 -0.13 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 9.41 0.38 -0.27 0.09 -0.32 -0.45 -0.29 -0.18 -0.23 -0.20 -0.20 -0.15 -0.14 >
INFO: ngram_search_fwdtree.c(1550): 440 words recognized (4/fr)
INFO: ngram_search_fwdtree.c(1552): 26366 senones evaluated (214/fr)
INFO: ngram_search_fwdtree.c(1556): 7946 channels searched (64/fr), 1022 1st, 4338 last
INFO: ngram_search_fwdtree.c(1559): 544 words for which last channels evaluated (4/fr)
INFO: ngram_search_fwdtree.c(1561): 279 candidate words for entering last phone (2/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.53 CPU 0.427 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 2.97 wall 2.418 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 8 words
2015-01-07 13:01:49.896 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
INFO: ngram_search_fwdflat.c(938): 325 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(940): 25549 senones evaluated (208/fr)
INFO: ngram_search_fwdflat.c(942): 9753 channels searched (79/fr)
INFO: ngram_search_fwdflat.c(944): 826 words searched (6/fr)
INFO: ngram_search_fwdflat.c(947): 472 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.34 CPU 0.278 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.34 wall 0.276 xRT
INFO: ngram_search.c(1215): </s> not found in last frame, using HOLA.121 instead
INFO: ngram_search.c(1268): lattice start node <s>.0 end node HOLA.100
INFO: ngram_search.c(1294): Eliminated 1 nodes before end node
INFO: ngram_search.c(1399): Lattice has 31 nodes, 26 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(HOLA:100:121) = -331976
INFO: ps_lattice.c(1403): Joint P(O,S) = -342116 P(S|O) = -10140
INFO: ngram_search.c(890): bestpath 0.01 CPU 0.006 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.000 xRT
2015-01-07 13:01:50.237 OpenEarsSampleApp[842:1803] Pocketsphinx heard “ADIOS ROMA HOLA” with a score of (-10140) and an utterance ID of 2.
2015-01-07 13:01:50.238 OpenEarsSampleApp[842:60b] Flite sending interrupt speech request.
2015-01-07 13:01:50.240 OpenEarsSampleApp[842:60b] Local callback: The received hypothesis is ADIOS ROMA HOLA with a score of -10140 and an ID of 2
2015-01-07 13:01:50.243 OpenEarsSampleApp[842:60b] I’m running flite
2015-01-07 13:01:50.459 OpenEarsSampleApp[842:60b] I’m done running flite and it took 0.214843 seconds
2015-01-07 13:01:50.460 OpenEarsSampleApp[842:60b] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-07 13:01:50.461 OpenEarsSampleApp[842:60b] Loading speech data for Flite concluded successfully.
2015-01-07 13:01:50.489 OpenEarsSampleApp[842:60b] Local callback: hypothesisArray is (
{
Hypothesis = “ADIOS ROMA HOLA”;
Score = “-5782”;
},
{
Hypothesis = “MAINZ ROMA HOLA”;
Score = “-5782”;
},
{
Hypothesis = “CHORIZO ROMA HOLA”;
Score = “-5782”;
},
{
Hypothesis = “ROMA HOLA”;
Score = “-5782”;
},
{
Hypothesis = “ADIOS ROMA HOLA”;
Score = “-5782”;
}
)
2015-01-07 13:01:50.491 OpenEarsSampleApp[842:60b] Flite sending suspend recognition notification.
2015-01-07 13:01:50.493 OpenEarsSampleApp[842:60b] Local callback: Flite has started speaking
2015-01-07 13:01:50.499 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has suspended recognition.
2015-01-07 13:01:52.627 OpenEarsSampleApp[842:60b] AVAudioPlayer did finish playing with success flag of 1
2015-01-07 13:01:52.780 OpenEarsSampleApp[842:60b] Flite sending resume recognition notification.
2015-01-07 13:01:53.283 OpenEarsSampleApp[842:60b] Local callback: Flite has finished speaking
2015-01-07 13:01:53.289 OpenEarsSampleApp[842:60b] Valid setSecondsOfSilence value of 0.500000 will be used.
2015-01-07 13:01:53.290 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has resumed recognition.
INFO: cmn_prior.c(131): cmn_prior_update: from < 9.41 0.38 -0.27 0.09 -0.32 -0.45 -0.29 -0.18 -0.23 -0.20 -0.20 -0.15 -0.14 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 9.41 0.38 -0.27 0.09 -0.32 -0.45 -0.29 -0.18 -0.23 -0.20 -0.20 -0.15 -0.14 >
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
2015-01-07 13:01:54.077 OpenEarsSampleApp[842:1803] Speech detected…
2015-01-07 13:01:54.079 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected speech.
INFO: cmn_prior.c(99): cmn_prior_update: from < 9.41 0.38 -0.27 0.09 -0.32 -0.45 -0.29 -0.18 -0.23 -0.20 -0.20 -0.15 -0.14 >
INFO: cmn_prior.c(116): cmn_prior_update: to < 9.19 0.49 -0.17 0.08 -0.34 -0.46 -0.31 -0.20 -0.24 -0.22 -0.23 -0.16 -0.18 >
2015-01-07 13:01:57.566 OpenEarsSampleApp[842:1803] End of speech detected…
INFO: cmn_prior.c(131): cmn_prior_update: from < 9.19 0.49 -0.17 0.08 -0.34 -0.46 -0.31 -0.20 -0.24 -0.22 -0.23 -0.16 -0.18 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 8.68 0.43 -0.14 0.05 -0.31 -0.46 -0.32 -0.19 -0.22 -0.20 -0.23 -0.15 -0.18 >
INFO: ngram_search_fwdtree.c(1550): 1269 words recognized (4/fr)
INFO: ngram_search_fwdtree.c(1552): 74257 senones evaluated (206/fr)
INFO: ngram_search_fwdtree.c(1556): 21743 channels searched (60/fr), 3037 1st, 11112 last
INFO: ngram_search_fwdtree.c(1559): 1527 words for which last channels evaluated (4/fr)
INFO: ngram_search_fwdtree.c(1561): 803 candidate words for entering last phone (2/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.27 CPU 0.353 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 4.12 wall 1.143 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 9 words
2015-01-07 13:01:57.567 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected a second of silence, concluding an utterance.
INFO: ngram_search_fwdflat.c(938): 966 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(940): 65712 senones evaluated (183/fr)
INFO: ngram_search_fwdflat.c(942): 24595 channels searched (68/fr)
INFO: ngram_search_fwdflat.c(944): 2147 words searched (5/fr)
INFO: ngram_search_fwdflat.c(947): 882 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.87 CPU 0.242 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.88 wall 0.244 xRT
INFO: ngram_search.c(1268): lattice start node <s>.0 end node </s>.354
INFO: ngram_search.c(1294): Eliminated 0 nodes before end node
INFO: ngram_search.c(1399): Lattice has 77 nodes, 73 links
INFO: ps_lattice.c(1368): Normalizer P(O) = alpha(</s>:354:358) = -1031992
INFO: ps_lattice.c(1403): Joint P(O,S) = -1042543 P(S|O) = -10551
INFO: ngram_search.c(890): bestpath 0.00 CPU 0.001 xRT
INFO: ngram_search.c(893): bestpath 0.00 wall 0.000 xRT
2015-01-07 13:01:58.446 OpenEarsSampleApp[842:1803] Pocketsphinx heard “ADIOS ROMA CHORIZO HORCHATA HORCHATA HORCHATA” with a score of (-10551) and an utterance ID of 3.
2015-01-07 13:01:58.448 OpenEarsSampleApp[842:60b] Flite sending interrupt speech request.
2015-01-07 13:01:58.449 OpenEarsSampleApp[842:60b] Local callback: The received hypothesis is ADIOS ROMA CHORIZO HORCHATA HORCHATA HORCHATA with a score of -10551 and an ID of 3
2015-01-07 13:01:58.452 OpenEarsSampleApp[842:60b] I’m running flite
2015-01-07 13:01:58.738 OpenEarsSampleApp[842:1803] Speech detected…
2015-01-07 13:01:58.815 OpenEarsSampleApp[842:60b] I’m done running flite and it took 0.361972 seconds
2015-01-07 13:01:58.816 OpenEarsSampleApp[842:60b] Flite audio player was nil when referenced so attempting to allocate a new audio player.
2015-01-07 13:01:58.818 OpenEarsSampleApp[842:60b] Loading speech data for Flite concluded successfully.
2015-01-07 13:01:58.847 OpenEarsSampleApp[842:60b] Local callback: hypothesisArray is (
{
Hypothesis = “ADIOS ROMA CHORIZO HORCHATA HORCHATA HORCHATA”;
Score = “-18498”;
},
{
Hypothesis = “ADIOS ROMA CHORIZO LECHUGA ROMA HORCHATA”;
Score = “-18498”;
},
{
Hypothesis = “ADIOS ROMA CHORIZO HORCHATA HOLA HORCHATA”;
Score = “-18498”;
},
{
Hypothesis = “ADIOS ROMA CHORIZO HORCHATA HORCHATA”;
Score = “-18498”;
},
{
Hypothesis = “HOLA ROMA CHORIZO HORCHATA HORCHATA HORCHATA”;
Score = “-18498”;
}
)
2015-01-07 13:01:58.848 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has detected speech.
2015-01-07 13:01:58.850 OpenEarsSampleApp[842:60b] Flite sending suspend recognition notification.
2015-01-07 13:01:58.852 OpenEarsSampleApp[842:60b] Local callback: Flite has started speaking
2015-01-07 13:01:58.859 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has suspended recognition.
2015-01-07 13:02:02.658 OpenEarsSampleApp[842:60b] AVAudioPlayer did finish playing with success flag of 1
2015-01-07 13:02:02.810 OpenEarsSampleApp[842:60b] Flite sending resume recognition notification.
2015-01-07 13:02:03.314 OpenEarsSampleApp[842:60b] Local callback: Flite has finished speaking
2015-01-07 13:02:03.320 OpenEarsSampleApp[842:60b] Valid setSecondsOfSilence value of 0.500000 will be used.
2015-01-07 13:02:03.321 OpenEarsSampleApp[842:60b] Local callback: Pocketsphinx has resumed recognition.
INFO: cmn_prior.c(131): cmn_prior_update: from < 8.68 0.43 -0.14 0.05 -0.31 -0.46 -0.32 -0.19 -0.22 -0.20 -0.23 -0.15 -0.18 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 8.79 0.45 -0.13 0.05 -0.30 -0.47 -0.36 -0.19 -0.22 -0.21 -0.24 -0.16 -0.19 >
INFO: ngram_search_fwdtree.c(1550): 600 words recognized (5/fr)
INFO: ngram_search_fwdtree.c(1552): 34910 senones evaluated (277/fr)
INFO: ngram_search_fwdtree.c(1556): 10828 channels searched (85/fr), 1096 1st, 6179 last
INFO: ngram_search_fwdtree.c(1559): 736 words for which last channels evaluated (5/fr)
INFO: ngram_search_fwdtree.c(1561): 563 candidate words for entering last phone (4/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 0.99 CPU 0.788 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 5.01 wall 3.973 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 10 words
INFO: ngram_search_fwdflat.c(938): 321 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(940): 34110 senones evaluated (271/fr)
INFO: ngram_search_fwdflat.c(942): 14032 channels searched (111/fr)
INFO: ngram_search_fwdflat.c(944): 1130 words searched (8/fr)
INFO: ngram_search_fwdflat.c(947): 464 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(950): fwdflat 0.46 CPU 0.363 xRT
INFO: ngram_search_fwdflat.c(953): fwdflat 0.46 wall 0.362 xRT
Here is the link to download the recording I am using:
https://dl.dropboxusercontent.com/u/6380067/openears4.wav.zip
In this recording I said only two words in Spanish: “MADRID” at second 10 and “ROMA” at second 20. The rest is noise around me.
I hope this helps you again.
January 7, 2015 at 8:14 pm #1024126Halle WinklerPolitepix
Anyway, regarding your phrase “a vadThreshold as high as 3.5 will suppress actual user speech in testing” is enough to know that something is really wrong with 2.0.1 because here my test and you will see the results:
But this recording is not of noises. It is a recording of continuous, human-comprehensible speech without crosstalk, and it precedes the louder speech by a long time in the recording, so it is the reference point the engine uses for distinguishing between speech and silence. It does make sense that it is being detected as speech rather than noise, since it is single-speaker speech which is clear enough to be understood by a human listener. A user using an app from a distance could have their speech detected at this power level.
I agree with you that the behavior has changed from 1.70 (or more specifically Pocketsphinx .8), but I don’t see this as a sign of something being really wrong, since it is speech at a volume which could be user speech, and the recording begins with the quieter speech and carries on for long enough that it makes sense that it is recognized. It is a bit strange that it isn’t recognized at all in 1.70.
In any case, that isn’t the behavior you want for your app, which is reasonable. I did a large amount of testing today on your case using the old and new Pocketsphinx on some Ubuntu VMs and the old and new OpenEars, and I did notice that it looks a bit like for Spanish recognition, the vadThreshold values would be more useful if they went higher as you requested earlier. In 2.01 I had similar results to 1.70 when I used a vadThreshold of 4.4 or 4.3, which is more similar to lower values with the English model (although I had better accuracy with 2.01). It seems possible that the ideal vadThreshold values may have some relationship to the acoustic models. When I’ve had more time to test it, I will check in with the Sphinx project and see if they have some ideas about whether that is the case, at which point I may add some kind of vadThreshold multiplier to OpenEars.
For now, so you can get on with things, I’ve uploaded OpenEars 2.02 which has a maximum vadThreshold of 5.0. When I set it to 4.4 in my test of your audio, it recognized “MADRID” and then “ROMA” and nothing else. I hope this is helpful.
January 8, 2015 at 12:34 pm #1024137maxgarmarParticipant
OK Halle, now I get it. Thanks so much for increasing that value to improve the response to noise with the Spanish language. But now I have a question.
Regarding: "I did notice that it looks a bit like for Spanish recognition, the vadThreshold values would be more useful if they went higher as you requested earlier. In 2.01 I had similar results to 1.70 when I used a vadThreshold of 4.4 or 4.3, which is more similar to lower values with the English model (although I had better accuracy with 2.01)"
I would like to know, then, what the “lower” values are for English that make it behave like Spanish. I mean, if 4.4 or 4.3 is for Spanish, what do you think is best for English? My app also works in English, so it would help me.
Thanks, and I will give you feedback on my test of 2.0.2.
January 8, 2015 at 12:52 pm #1024138Halle WinklerPolitepix
That’s a good question – from what I heard from other developers, they are seeing more noise suppression from 2.5-3.5 and past 3.5 starts to suppress real interactions. I am going to continue testing this out over the coming weeks to see if I can get some more concrete insight into this and perhaps discuss it with the Sphinx project once I have some good tests for their testbed (rather than mine), so there may be a later OpenEars version that has defaults that are more similar to the old behavior for each language. In the meantime, to find out the ideal settings for your own app with each language, testing is your friend :) .
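One way to approach that testing, as a minimal sketch rather than anything from OpenEars itself: generate a list of candidate vadThreshold values and run the same test recording at each one, noting where background noise stops being recognized and where real speech starts being dropped. The helper below is hypothetical (the name `vad_candidates` and the step size are assumptions). It is plain C, so it can be dropped into an Objective-C file, and each value would then be passed to `setVadThreshold:` on `[OEPocketsphinxController sharedInstance]` before a test run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of OpenEars.
   Fills `out` with thresholds lo, lo+step, ... up to and including hi.
   Returns how many values were written (never more than `cap`). */
static int vad_candidates(float lo, float hi, float step, float *out, int cap) {
    int n = 0;
    assert(out != NULL);
    if (step <= 0.0f) {
        return 0; /* a non-positive step would never reach hi */
    }
    /* Derive each value from an integer index so rounding error
       does not accumulate across iterations. */
    for (int i = 0; n < cap; i++) {
        float v = lo + step * (float)i;
        if (v > hi + 1e-4f) {
            break; /* past the upper bound (small tolerance for float rounding) */
        }
        out[n++] = v;
    }
    return n;
}
```

For example, `vad_candidates(2.0f, 5.0f, 0.5f, buf, 16)` covers the range discussed in this thread, from values where English noise suppression was reported to the 5.0 maximum in 2.02.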
January 9, 2015 at 9:26 pm #1024174Halle WinklerPolitepix
In my testing for version 2.03, a vadThreshold of 2.0 was fine for excluding a normal amount of background noise.
January 12, 2015 at 1:06 pm #1024183maxgarmarParticipant
Which language are you talking about, Halle?
I saw in the changelog that there are many improvements in 2.03. I will test it soon. Thanks
January 12, 2015 at 1:11 pm #1024184Halle WinklerPolitepix
English worked fine for me at 2.0.
January 13, 2015 at 11:20 am #1024197Halle WinklerPolitepix
Just an update that I have verification from CMU Sphinx that needing a higher VAD threshold for the Spanish acoustic model is expected behavior. This may change in the future with the creation of a new Spanish acoustic model (that I would expect can be included in OpenEars at that time). I wanted to thank you sincerely for providing me with the test cases and reports about this, because it wasn’t showing up in my own tests and it’s important to know about for usability of the current model with OpenEars 2.x.
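To collect the numbers from this thread in one place, here is a minimal sketch of a per-language starting point. The function names and the "es" language code are assumptions for illustration, not part of OpenEars. The values are only the ones reported above (about 4.4 for the Spanish model, 2.0 for English, and a maximum of 5.0 as of OpenEars 2.02) and should be treated as starting points for your own testing rather than recommended defaults.

```c
#include <string.h>

/* Maximum vadThreshold accepted as of OpenEars 2.02, per this thread. */
#define VAD_THRESHOLD_MAX 5.0f

/* Hypothetical helper: keep a requested value at or below the 2.02 maximum. */
static float clamp_vad_threshold(float value) {
    return value > VAD_THRESHOLD_MAX ? VAD_THRESHOLD_MAX : value;
}

/* Hypothetical helper: starting vadThreshold for a language code.
   "es" selects the Spanish value; anything else falls back to the
   English value. */
static float starting_vad_threshold(const char *lang) {
    float value = 2.0f; /* English was fine at 2.0 in 2.03 testing */
    if (lang != NULL && strcmp(lang, "es") == 0) {
        value = 4.4f;   /* Spanish behaved like 1.70 at 4.3 to 4.4 in 2.01 */
    }
    return clamp_vad_threshold(value);
}
```

The result would be passed to the existing call, for example `[[OEPocketsphinxController sharedInstance] setVadThreshold:starting_vad_threshold("es")];`.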
January 13, 2015 at 1:15 pm #1024199maxgarmarParticipant
No need to thank me for this. I am using it, so I just wanted to improve it as much as I can.
I think we can close this thread (although I will open a new one because of another problem :D). Thanks for helping and supporting