September 27, 2012 at 1:06 pm #11361
jay_stepin (Participant)
Hello,
I have made a demo of voice recognition with OpenEars and it's working fine. Now I want to integrate it with camera control: for example, when the user shouts START the camera will begin video recording, and it will stop recording on STOP.
My first question: is this possible with OpenEars?
And if yes, can you help me get started?
Does anyone know of any such application?

September 29, 2012 at 11:33 am #11420
Halle Winkler (Politepix)
Hi Jay,
This question is a little too broad for the forum, sorry. Feel free to ask specific questions about your code.
September 29, 2012 at 2:44 pm #11422
jay_stepin (Participant)
Okay.
So I tried to get it working with UIImagePickerController, but when the UIImagePickerController comes into play:

LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
NSArray *words = [NSArray arrayWithObjects:@"START", @"STOP", @"HAVARD", nil];
NSString *name = @"Mahadev";
NSError *err = [lmGenerator generateLanguageModelFromArray:words withFilesNamed:name];
NSLog(@"%@", err);
NSDictionary *languageGeneratorResults = nil;
lmPath = nil;
dicPath = nil;
if ([err code] == noErr) {
    languageGeneratorResults = [err userInfo];
    lmPath = [languageGeneratorResults objectForKey:@"LMPath"];
    dicPath = [languageGeneratorResults objectForKey:@"DictionaryPath"];
} else {
    NSLog(@"Error: %@", [err localizedDescription]);
}

Up to this point the code works fine, but

[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];

this method doesn't seem to trigger, so I never see

- (void) pocketsphinxDidStartListening {
    NSLog(@"Pocketsphinx is now listening.");
}

fire.
If you can help me with this, it would be awesome.
I know this is a messy kind of question, so feel free to ask if you need more details.
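(For reference, pocketsphinxDidStartListening is delivered through an OpenEarsEventsObserver whose delegate has been set. A minimal sketch of that wiring, assuming the OpenEars 1.x API as used in the sample app, with openEarsEventsObserver as an assumed property name:

// In the view controller's interface, assuming it adopts the delegate protocol:
// @interface MyViewController : UIViewController <OpenEarsEventsObserverDelegate>

// Before starting listening, e.g. in viewDidLoad; without this,
// none of the pocketsphinx... delegate callbacks will arrive:
self.openEarsEventsObserver.delegate = self;

// Delegate method that should fire once the recognition loop is up:
- (void) pocketsphinxDidStartListening {
    NSLog(@"Pocketsphinx is now listening.");
}
)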
September 29, 2012 at 2:56 pm #11423
Halle Winkler (Politepix)
What is the relationship between the picker code and the code above? It's probably not the case that startListeningWithLanguageModelAtPath: doesn't trigger, but rather that it gets to a certain point in the loop and runs into trouble. If you turn on verbosePocketSphinx and OpenEarsLogging, the output will probably tell you a lot about why startListeningWithLanguageModelAtPath: isn't getting good results. You can search the log output for the words "error" or "warning" specifically, or you can post it here (but please make sure both forms of logging have been turned on first so I can really see everything that is happening).
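A minimal sketch of turning both on, assuming the OpenEars 1.x startOpenEarsLogging class method and the verbosePocketSphinx property, set before listening starts:

// Turn on general OpenEars logging as early as possible:
[OpenEarsLogging startOpenEarsLogging];

// Turn on verbose Pocketsphinx output before starting the recognition loop:
self.pocketsphinxController.verbosePocketSphinx = TRUE;
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO];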
October 1, 2012 at 2:17 pm #11448
jay_stepin (Participant)
Hello Halle,
As per your suggestion I turned on OpenEarsLogging and got console output like:
2012-10-01 18:44:25.956 videoDemo[1085:307] Starting OpenEars logging for OpenEars version 1.2.2 on device: iPhone running iOS version: 4.100000
2012-10-01 18:44:25.965 videoDemo[1085:307] Normalized array contains the following entries:
(
START,
STOP,
HAVARD
)
2012-10-01 18:44:25.977 videoDemo[1085:307] Starting dynamic language model generation
2012-10-01 18:44:25.988 videoDemo[1085:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev.corpus for reading
2012-10-01 18:44:25.991 videoDemo[1085:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev_pipe.txt for writing
2012-10-01 18:44:25.994 videoDemo[1085:307] Starting text2wfreq_impl
2012-10-01 18:44:26.032 videoDemo[1085:307] Done with text2wfreq_impl
2012-10-01 18:44:26.037 videoDemo[1085:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev_pipe.txt for reading.
2012-10-01 18:44:26.041 videoDemo[1085:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev.vocab for reading.
2012-10-01 18:44:26.044 videoDemo[1085:307] Starting wfreq2vocab
2012-10-01 18:44:26.049 videoDemo[1085:307] Done with wfreq2vocab
2012-10-01 18:44:26.057 videoDemo[1085:307] Starting text2idngram
2012-10-01 18:44:26.094 videoDemo[1085:307] Done with text2idngram
2012-10-01 18:44:26.103 videoDemo[1085:307] Starting idngram2lm
2012-10-01 18:44:26.145 videoDemo[1085:307] Done with idngram2lm
2012-10-01 18:44:26.149 videoDemo[1085:307] Starting sphinx_lm_convert
2012-10-01 18:44:26.165 videoDemo[1085:307] Finishing sphinx_lm_convert
2012-10-01 18:44:26.178 videoDemo[1085:307] Done creating language model with CMUCLMTK in 0.198320 seconds.
2012-10-01 18:44:26.550 videoDemo[1085:307] I’m done running performDictionaryLookup and it took 0.309528 seconds
2012-10-01 18:44:26.563 videoDemo[1085:307] I'm done running dynamic language model generation and it took 370790066.563663 seconds

Just this, nothing further.
October 1, 2012 at 11:58 pm #11449
Halle Winkler (Politepix)
Hi,
I don't think that logging has verbosePocketSphinx enabled. If it does, that means your app has an issue that blocks before [self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath dictionaryAtPath:dicPath languageModelIsJSGF:NO]; rather than during it, since the verbose logging would show listening starting and then stopping somewhere, but this logging shows nothing after the language model is generated. I recommended showing the relationship between your picker code and the OpenEars code earlier; without that, or the output with verbosePocketSphinx enabled, there's no way to know what is happening, since the code above is the code that works in the sample app.
October 2, 2012 at 6:34 am #11452
jay_stepin (Participant)
Hello,
I'm really sorry, my explanation was really bad.
So I have created a view controller, put all of the OpenEars setup in viewDidLoad, and added this controller's view as the camera overlay.
Pardon me, but I don't understand your point about "I don't think that logging has verbosePocketSphinx enabled"; can you explain it more so I can help myself by giving you more info?
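(A minimal sketch of that overlay setup, assuming an iOS 4-era UIImagePickerController configured for video capture with a custom overlay; overlayController is an assumed name for the view controller hosting the OpenEars code:

#import <MobileCoreServices/MobileCoreServices.h> // for kUTTypeMovie

// Present the camera with the OpenEars view controller's view as an overlay:
UIImagePickerController *picker = [[UIImagePickerController alloc] init];
picker.sourceType = UIImagePickerControllerSourceTypeCamera;
picker.mediaTypes = [NSArray arrayWithObject:(NSString *)kUTTypeMovie]; // video only
picker.showsCameraControls = NO; // hide default controls so the overlay drives recording
picker.cameraOverlayView = overlayController.view;
[self presentModalViewController:picker animated:YES]; // iOS 4-era presentation API
)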
October 2, 2012 at 8:36 am #11454
Halle Winkler (Politepix)
Hi Jay,
No problem, just go to https://www.politepix.com/openears and search the page for “verbosePocketSphinx” and you’ll see the property definition. There is also an example of using it in the sample app view controller if you search for that string.
October 3, 2012 at 7:59 am #11488
jay_stepin (Participant)
Hello Halle,
As per your suggestion I turned on verbosePocketSphinx and got the following log in the console:
2012-10-03 12:25:30.609 videoDemo[1871:307] Starting OpenEars logging for OpenEars version 1.2.2 on device: iPhone running iOS version: 4.100000
2012-10-03 12:25:30.619 videoDemo[1871:307] Normalized array contains the following entries:
(
START,
STOP,
HAVARD
)
2012-10-03 12:25:30.631 videoDemo[1871:307] Starting dynamic language model generation
2012-10-03 12:25:30.641 videoDemo[1871:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev.corpus for reading
2012-10-03 12:25:30.645 videoDemo[1871:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev_pipe.txt for writing
2012-10-03 12:25:30.650 videoDemo[1871:307] Starting text2wfreq_impl
2012-10-03 12:25:30.692 videoDemo[1871:307] Done with text2wfreq_impl
2012-10-03 12:25:30.697 videoDemo[1871:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev_pipe.txt for reading.
2012-10-03 12:25:30.702 videoDemo[1871:307] Able to open /var/mobile/Applications/E7438B2B-52F3-40DD-BE93-F444AD537A32/Documents/Mahadev.vocab for reading.
2012-10-03 12:25:30.707 videoDemo[1871:307] Starting wfreq2vocab
2012-10-03 12:25:30.714 videoDemo[1871:307] Done with wfreq2vocab
2012-10-03 12:25:30.720 videoDemo[1871:307] Starting text2idngram
2012-10-03 12:25:30.758 videoDemo[1871:307] Done with text2idngram
2012-10-03 12:25:30.767 videoDemo[1871:307] Starting idngram2lm
2012-10-03 12:25:30.808 videoDemo[1871:307] Done with idngram2lm
2012-10-03 12:25:30.811 videoDemo[1871:307] Starting sphinx_lm_convert
2012-10-03 12:25:30.824 videoDemo[1871:307] Finishing sphinx_lm_convert
2012-10-03 12:25:30.836 videoDemo[1871:307] Done creating language model with CMUCLMTK in 0.202123 seconds.
2012-10-03 12:25:31.201 videoDemo[1871:307] I’m done running performDictionaryLookup and it took 0.300287 seconds
2012-10-03 12:25:31.213 videoDemo[1871:307] I’m done running dynamic language model generation and it took 370940131.213007 seconds
2012-10-03 12:25:31.219 videoDemo[1871:307] JAY TEsting
2012-10-03 12:25:31.226 videoDemo[1871:307] A sample rate was requested that isn’t one of the two supported values of 16000 or 8000 so we will use the default of 16000.
2012-10-03 12:25:31.241 videoDemo[1871:307] The audio session has never been initialized so we will do that now.
2012-10-03 12:25:31.247 videoDemo[1871:307] Checking and resetting all audio session settings.
2012-10-03 12:25:31.250 videoDemo[1871:307] audioCategory is incorrect, we will change it.
2012-10-03 12:25:31.253 videoDemo[1871:307] audioCategory is now on the correct setting of kAudioSessionCategory_PlayAndRecord.
2012-10-03 12:25:31.256 videoDemo[1871:307] bluetoothInput is incorrect, we will change it.
2012-10-03 12:25:31.258 videoDemo[1871:307] bluetooth input is now on the correct setting of 1.
2012-10-03 12:25:31.261 videoDemo[1871:307] categoryDefaultToSpeaker is incorrect, we will change it.
2012-10-03 12:25:31.269 videoDemo[1871:307] CategoryDefaultToSpeaker is now on the correct setting of 1.
2012-10-03 12:25:31.272 videoDemo[1871:307] preferredBufferSize is incorrect, we will change it.
2012-10-03 12:25:31.275 videoDemo[1871:307] PreferredBufferSize is now on the correct setting of 0.128000.
2012-10-03 12:25:31.278 videoDemo[1871:307] preferredSampleRateCheck is incorrect, we will change it.
2012-10-03 12:25:31.280 videoDemo[1871:307] preferred hardware sample rate is now on the correct setting of 16000.000000.
2012-10-03 12:25:31.422 videoDemo[1871:307] AudioSessionManager startAudioSession has reached the end of the initialization.
2012-10-03 12:25:31.427 videoDemo[1871:307] Exiting startAudioSession.
2012-10-03 12:25:31.434 videoDemo[1871:560f] Recognition loop has started
2012-10-03 12:25:31.568 videoDemo[1871:307] Using two-stage rotation animation. To use the smoother single-stage animation, this application must remove two-stage method implementations.
2012-10-03 12:25:31.696 videoDemo[1871:307] Using two-stage rotation animation is not supported when rotating more than one view controller or view controllers not the window delegate
2012-10-03 12:25:32.457 videoDemo[1871:560f] Starting openAudioDevice on the device.
2012-10-03 12:25:32.472 videoDemo[1871:560f] Audio unit wrapper successfully created.
2012-10-03 12:25:32.493 videoDemo[1871:560f] Set audio route to SpeakerAndMicrophone
2012-10-03 12:25:32.500 videoDemo[1871:560f] Checking and resetting all audio session settings.
2012-10-03 12:25:32.505 videoDemo[1871:560f] audioCategory is correct, we will leave it as it is.
2012-10-03 12:25:32.508 videoDemo[1871:560f] bluetoothInput is correct, we will leave it as it is.
2012-10-03 12:25:32.510 videoDemo[1871:560f] categoryDefaultToSpeaker is correct, we will leave it as it is.
2012-10-03 12:25:32.517 videoDemo[1871:560f] preferredBufferSize is correct, we will leave it as it is.
2012-10-03 12:25:32.521 videoDemo[1871:560f] preferredSampleRateCheck is correct, we will leave it as it is.
2012-10-03 12:25:32.524 videoDemo[1871:560f] Setting the variables for the device and starting it.
2012-10-03 12:25:32.537 videoDemo[1871:560f] Looping through ringbuffer sections and pre-allocating them.
2012-10-03 12:25:33.356 videoDemo[1871:560f] Started audio output unit.
2012-10-03 12:25:33.360 videoDemo[1871:560f] Calibration has started
2012-10-03 12:25:34.764 videoDemo[1871:307] The Audio Session was interrupted.
2012-10-03 12:25:35.569 videoDemo[1871:560f] Calibration has completed
2012-10-03 12:25:35.573 videoDemo[1871:560f] Project has these words in its dictionary:
HAVARD
START
STOP
2012-10-03 12:25:35.576 videoDemo[1871:560f] Listening.

October 3, 2012 at 8:11 am #11489
Halle Winkler (Politepix)
OK, I think the issue is simply that the audio stream is not provided to PocketsphinxController, since it is being used by the video picker.
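The "The Audio Session was interrupted." line in the log points the same way. A sketch of watching for that from code, assuming the 1.x OpenEarsEventsObserverDelegate interruption callbacks exist as named here:

// Assumed OpenEarsEventsObserverDelegate methods for session interruptions:
- (void) audioSessionInterruptionDidBegin {
    NSLog(@"Audio session interruption began; the picker may have taken the audio stream.");
}

- (void) audioSessionInterruptionDidEnd {
    NSLog(@"Audio session interruption ended.");
}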
October 3, 2012 at 8:17 am #11490
jay_stepin (Participant)
OK, so is there anything we can do about it?
October 3, 2012 at 8:28 am #11491
Halle Winkler (Politepix)
Solving this would be an advanced undertaking that would require you to thoroughly research the iOS audio session and do a lot of self-guided experimentation in order to learn what is needed. Maybe it's possible, but it isn't something I can walk you through, unfortunately.
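As one possible starting point for that experimentation, a sketch using the iOS 4-era C audio session API; whether the picker's capture session will actually share the microphone this way is exactly the open question, so treat it as a hypothesis to test, not a fix (it assumes the session was already initialized, which OpenEars does per the log above):

#include <AudioToolbox/AudioToolbox.h>

// Re-assert a record-capable category and ask to mix with other audio,
// in the hope that recognition and video capture can share the session:
UInt32 category = kAudioSessionCategory_PlayAndRecord;
AudioSessionSetProperty(kAudioSessionProperty_AudioCategory, sizeof(category), &category);
UInt32 mixWithOthers = 1;
AudioSessionSetProperty(kAudioSessionProperty_OverrideCategoryMixWithOthers, sizeof(mixWithOthers), &mixWithOthers);
AudioSessionSetActive(true);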
October 3, 2012 at 8:36 am #11492
jay_stepin (Participant)
Ahhh, that's heartbreaking. Thanks, mate, for all the help. I really appreciate your work and help. Many, many thanks.
October 3, 2012 at 8:37 am #11493
Halle Winkler (Politepix)
You're welcome, good luck with your app.