AVCaptureSession audio combined with speech recognition – Politepix

decoder — Tue, 12 Aug 2014 23:53:48 +0000

Hi there. I’m evaluating OpenEars with PocketsphinxController for my app (iOS 7.1, iPhone 4S).

I’m recording video and audio with AVFoundation’s AVCaptureSession (based on Apple’s AVCam sample code), and in parallel I use PocketsphinxController to recognise voice commands that start and stop the video recording.

The video/audio recording starts just fine, but after a few seconds, while I’m speaking some commands, the audio in the video capture suddenly stops and doesn’t come back unless I restart the video capture session.

Speech recognition with PocketsphinxController itself works just fine.

Here are my settings (I’m not using Flite or any other audio output playback):
self.pocketsphinxController.returnNbest = YES;
self.pocketsphinxController.nBestNumber = 5;
self.pocketsphinxController.audioSessionMixing = YES;
self.pocketsphinxController.outputAudio = NO;

I also tried self.pocketsphinxController.audioMode = @"VoiceChat"; but with no success.

Here is the debug log. The audio seems to break at approximately:

2014-08-13 01:35:53.954 [5950:8707]

….

2014-08-13 01:35:53.884 [5950:60b] Validating ROAD
2014-08-13 01:35:53.892 [5950:60b] Command ROAD Not Found in Keywords
2014-08-13 01:35:53.888 [5950:8707] audioCategory is correct, we will leave it as it is.
2014-08-13 01:35:53.894 [5950:60b] Hypothesis 1: ROAD with a score of -29048
2014-08-13 01:35:53.894 [5950:8707] bluetoothInput is incorrect, we will change it.
2014-08-13 01:35:53.896 [5950:60b] Hypothesis 2: EVIDENCE with a score of -29137
2014-08-13 01:35:53.902 [5950:60b] Hypothesis 3: SPEED with a score of -49907
2014-08-13 01:35:53.904 [5950:60b] Hypothesis 4: with a score of -50079
2014-08-13 01:35:53.910 [5950:60b] Hypothesis 5: STOP with a score of -50234
2014-08-13 01:35:53.927 [5950:8707] bluetooth input is now on the correct setting of 1.
2014-08-13 01:35:53.927 [5950:8707] bluetooth input is now on the correct setting of 1.
2014-08-13 01:35:53.954 [5950:8707] Output Device: SpeakerAndMicrophone.
2014-08-13 01:35:53.960 [5950:8707] categoryDefaultToSpeaker is correct, we will leave it as it is.
2014-08-13 01:35:53.965 [5950:8707] OverrideCategoryMixWithOthers is incorrect, we will change it.
2014-08-13 01:35:53.970 [5950:8707] OverrideCategoryMixWithOthers is now on the correct setting of 1.
2014-08-13 01:35:53.978 [5950:8707] preferredBufferSize is incorrect, we will change it.
2014-08-13 01:35:53.982 [5950:8707] PreferredBufferSize is now on the correct setting of 0.128000.
2014-08-13 01:35:53.989 [5950:8707] preferredSampleRateCheck is incorrect, we will change it.
2014-08-13 01:35:54.736 [5950:8707] preferred hardware sample rate is now on the correct setting of 16000.000000.
2014-08-13 01:35:54.738 [5950:8707] Setting the variables for the device and starting it.
2014-08-13 01:35:54.740 [5950:8707] Looping through ringbuffer sections and pre-allocating them.
2014-08-13 01:35:54.853 [5950:8707] Started audio output unit.
2014-08-13 01:35:54.862 [5950:8707] Listening.

I suspect that these audio session adjustments and the restart of the audio unit are what break the audio capture, but what could be causing that?

Thanks,
Daniel

Halle Winkler — Wed, 13 Aug 2014 07:41:33 +0000

Welcome,

Yes, this is a known issue: AV objects have audio session requirements that are as stringent as OpenEars’ own, so the two conflict. You can search these forums for the keywords “audio coexistence” or “video” to read much more about it.

decoder — Fri, 22 Aug 2014 18:44:21 +0000

@Halle: Thanks for the feedback. I didn’t find anything that fixed the issue right away, but while experimenting with different values I got it working by setting:

self.pocketsphinxController.audioMode = @"VideoRecording";
self.pocketsphinxController.audioSessionMixing = YES;
self.pocketsphinxController.outputAudio = NO;

before calling startListeningWithLanguageModelAtPath.

The problem still happens, but only very seldom, so I restart the session automatically from the pocketSphinxContinuousSetupDidFail callback.

Halle Winkler — Fri, 22 Aug 2014 18:55:38 +0000

OK, good to know. I’m working on some new coexistence code right now, so if you’d like to send me a sample app that shows the unwanted behavior for me to test against, future versions of OpenEars may handle this without any issues. The sample app has to be as simple as possible while still demonstrating the issue: everything in a single view controller, with only a few methods. If you’d like to send it, send me a note via the contact form.

decoder — Tue, 23 Sep 2014 22:05:03 +0000

Thanks for your feedback. I tested the same code on iOS 8 and ran into another issue when switching the video recording to another file: this time the video recording apparently can’t start anymore (or its delegate doesn’t fire correctly).

In general, OpenEars somehow collides with AVCaptureSession (or the other way around).

When I get some time, I will try to add voice commands to Apple’s AVCam sample (https://developer.apple.com/library/ios/samplecode/AVCam/Introduction/Intro.html) and hopefully reproduce these conflicts there.

decoder — Tue, 23 Sep 2014 22:22:28 +0000

I looked into the debug output and found that once the message “Stopping audio unit.” appears, the AVCaptureSession no longer works reliably.

Apparently, if I speak while stopping or restarting the video recording, “Stopping audio unit.” erratically appears and the problem occurs.

If I turn off OpenEars completely, the recording works fine, so it is definitely a conflict.

I forgot to say: on iOS 8, if I apply the “patch”, the video session is even more unstable.

Do you know what “Stopping audio unit.” actually means and what could be causing it?

Here’s an excerpt of the debug log:
2014-09-24 00:07:25.365 Streetcorder[4816:624437] Stopping audio unit.
2014-09-24 00:07:25.365 Streetcorder[4816:624371] Pocketsphinx has detected a period of silence, concluding an utterance.
2014-09-24 00:07:25.408 Streetcorder[4816:624437] Audio Output Unit stopped, cleaning up variable states.
2014-09-24 00:07:25.409 Streetcorder[4816:624437] Processing speech, please wait…

Halle Winkler — Wed, 24 Sep 2014 03:20:40 +0000

Yes, this is due to two conflicting audio sessions with different sample rates and bitrates. You are welcome to send me a very simple sample app demonstrating the issue if you would like to see this as a possible future feature.

Halle Winkler — Tue, 23 Dec 2014 09:28:45 +0000

This kind of video object coexistence ought to work by default with OpenEars 2.0, but I’m waiting to hear some feedback about it from the developers who have these features in their apps.

decoder — Tue, 23 Dec 2014 13:13:26 +0000

Hi Halle, thanks for the new release!
I will upgrade to 2.0 in the next few days and come back with some feedback about this issue.

Halle Winkler — Tue, 23 Dec 2014 13:20:37 +0000

Super, no rush at all, but I will be interested in your results. Remember to first remove any workarounds you put in to get this working, so they don’t interfere with the default behavior.

This feature is a work in progress and probably needs a fair amount of feedback to catch lots of cases, so don’t be discouraged if it doesn’t work ideally for your case yet. Just let me know what is happening when it doesn’t do the right thing and, if possible, give me a replication case so I can look into it.