Staccato sound sampling/SaveThatWave shows problem
March 28, 2014 at 9:41 pm #1020619 | hughescr (Participant)
I’m having some trouble in my OpenEars/SaveThatWave app. I think the same problem happens when I don’t use SaveThatWave, but SaveThatWave makes it clearer what’s going on.
My app is trying to listen to a sequence of digits [0..9] being spoken. I initialize like this:
LanguageModelGenerator *lmGenerator = [[LanguageModelGenerator alloc] init];
NSString *name = @"NumberNaming";
NSString *acousticModel = [AcousticModel pathToModel:@"AcousticModelEnglish"];
NSError *err = [lmGenerator generateLanguageModelFromArray:@[@"ONE",@"TWO",@"THREE",@"FOUR",@"FIVE",@"SIX",@"SEVEN",@"EIGHT",@"NINE",@"ZERO"]
                                            withFilesNamed:name
                                    forAcousticModelAtPath:acousticModel];
NSDictionary *languageGeneratorResults = nil;
if ([err code] == noErr) {
    languageGeneratorResults = [err userInfo]; // OpenEars returns the generated paths in the NSError's userInfo
}
self.openEarsEventsObserver.delegate = self;
self.pocketsphinxController.audioMode = @"VoiceChat";
self.pocketsphinxController.calibrationTime = 3;
self.pocketsphinxController.secondsOfSilenceToDetect = 1.5;
[self.pocketsphinxController startListeningWithLanguageModelAtPath:[languageGeneratorResults objectForKey:@"LMPath"]
                                                   dictionaryAtPath:[languageGeneratorResults objectForKey:@"DictionaryPath"]
                                                acousticModelAtPath:acousticModel
                                                languageModelIsJSGF:NO];
[self.saveThatWaveController start]; // For saving WAVs from OpenEars
and
- (PocketsphinxController *)pocketsphinxController {
    if (pocketsphinxController == nil) {
        pocketsphinxController = [[PocketsphinxController alloc] init];
        pocketsphinxController.outputAudio = TRUE;
    }
    return pocketsphinxController;
}

- (OpenEarsEventsObserver *)openEarsEventsObserver {
    if (openEarsEventsObserver == nil) {
        openEarsEventsObserver = [[OpenEarsEventsObserver alloc] init];
    }
    return openEarsEventsObserver;
}

- (SaveThatWaveController *)saveThatWaveController {
    if (saveThatWaveController == nil) {
        saveThatWaveController = [[SaveThatWaveController alloc] init];
    }
    return saveThatWaveController;
}
I call [self.pocketsphinxController suspendRecognition] in pocketsphinxDidStartListening to suspend listening until I’m ready, then [self.pocketsphinxController resumeRecognition] when I want to start listening.
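Roughly, the relevant delegate methods look like this (a sketch; readyToBegin is a stand-in for my app-specific trigger):

- (void) pocketsphinxDidStartListening {
    // Suspend immediately so nothing gets recognized until we're ready.
    [self.pocketsphinxController suspendRecognition];
}

- (void) readyToBegin { // hypothetical app-specific trigger
    [self.pocketsphinxController resumeRecognition];
}

- (void) pocketsphinxDidResumeRecognition {
    NSLog(@"Resumed; speech should now be detected.");
}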
Now, in the iOS Simulator, everything works great. When I run on actual iPad hardware, the first time I call resumeRecognition, OpenEars quickly calls me back on pocketsphinxDidResumeRecognition, but if I start talking, it doesn’t seem to hear anything for a second or two: pocketsphinxDidDetectSpeech isn’t fired and nothing I say gets hypothesized. If I wait silently for a while and then start talking, it hears me, but about half the time the audio being recorded is very staccato and recognition is hopeless. Here is a sample recording from SaveThatWave. It sounds like a buffering problem, as if chunks from other parts of the recording were stitched into the middle.
Any thoughts on what might be going on here?
March 28, 2014 at 9:51 pm #1020620 | Halle Winkler (Politepix)
Welcome,
I would expect that this is due to the VoiceChat audio mode. The alternate audio modes besides the default are offered without any guarantee of performance: their behaviors are undocumented, they change from OS version to OS version, and they aren’t part of the testbed. They were added due to several requests, but unfortunately they can only be used on an as-is basis and should not be used if you are encountering issues resulting from them.
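In practice that means not setting the mode at all, so the tested default is used:

// Use the default audio mode by removing the assignment entirely:
// self.pocketsphinxController.audioMode = @"VoiceChat"; // delete this line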
March 28, 2014 at 10:29 pm #1020621 | hughescr (Participant)
Disabling the audio mode does seem to make it record and recognize accurately. But now the audio that I’m playing with an AVPlayer during suspendRecognition is heavily muted.
March 28, 2014 at 10:44 pm #1020622 | hughescr (Participant)
Oh, I see looking at the code that you’re using the old deprecated AudioSessionGetProperty() and AudioSessionSetProperty() instead of the AVAudioSession that iOS 7 wants; I guess it’s probably not just deprecated in iOS 7 but actually broken too. I’ll see if I can hack something together with AVAudioSession that works.
March 28, 2014 at 10:50 pm #1020623 | Halle Winkler (Politepix)
This might be a basic limitation of the PlayAndRecord audio session when it isn’t used in combination with the VoiceChat mode. (Incidentally, the VoiceChat mode didn’t have this side effect of changing interaction with media object playback in previous versions of iOS, and could easily stop having it in the future, which is part of the reason I don’t build the framework around special audio mode behaviors.) But you can also try turning on PocketsphinxController’s audioSessionMixing property to see whether it is a session mixing issue rather than the somewhat buggy playback interaction with VoiceChat.
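For example (a sketch; lmPath and dicPath stand in for the generated paths from your first post):

// Test whether mixing is the issue by enabling it before listening starts.
self.pocketsphinxController.audioSessionMixing = TRUE;
[self.pocketsphinxController startListeningWithLanguageModelAtPath:lmPath
                                                   dictionaryAtPath:dicPath
                                                acousticModelAtPath:acousticModel
                                                languageModelIsJSGF:NO];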
March 28, 2014 at 10:55 pm #1020624 | Halle Winkler (Politepix)
> Oh, I see looking at the code that you’re using the old deprecated AudioSessionGetProperty() and AudioSessionSetProperty() instead of the AVAudioSession that iOS 7 wants; I guess it’s probably not just deprecated in iOS 7 but actually broken too. I’ll see if I can hack something together with AVAudioSession that works.
AVAudioSession is a higher-level wrapper around the older C-based AudioSession API, but the changes from OS to OS are not due to the API, deprecation, or breakage; there have been behavioral changes in these types of audio behavior in every OS release since iOS 3, IIRC. The PlayAndRecord audio session has always had undesirable effects on playback, since it was first introduced.
The VoiceChat-related skipping issue isn’t due to the audio API, BTW. It originates in the Pocketsphinx VAD, which isn’t designed to work with an audio mode that has noise suppression.
March 29, 2014 at 2:27 am #1020625 | hughescr (Participant)
I went and rewrote AudioSessionManager.m using the AVFoundation APIs (and at the same time converted the whole project to ARC, if you’re interested in that). Same behavior, so you’re right :) I tried the newer GameChat and VideoChat modes too, also with no luck.
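For anyone curious, the AVAudioSession-based replacement looks roughly like this (a sketch of the approach, not the actual AudioSessionManager.m rewrite; the category, mode, and error handling here are my own choices):

#import <AVFoundation/AVFoundation.h>

// Configure the session with AVAudioSession instead of the
// deprecated AudioSessionSetProperty() C API.
- (void) configureAudioSession {
    AVAudioSession *session = [AVAudioSession sharedInstance];
    NSError *error = nil;

    // PlayAndRecord is needed for simultaneous playback and capture.
    if (![session setCategory:AVAudioSessionCategoryPlayAndRecord error:&error]) {
        NSLog(@"Error setting category: %@", error);
    }

    // The default mode; VoiceChat/GameChat/VideoChat could be substituted here,
    // with the caveats about undocumented behavior discussed above.
    if (![session setMode:AVAudioSessionModeDefault error:&error]) {
        NSLog(@"Error setting mode: %@", error);
    }

    if (![session setActive:YES error:&error]) {
        NSLog(@"Error activating session: %@", error);
    }
}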
March 29, 2014 at 10:23 am #1020627 | Halle Winkler (Politepix)
Yeah, the bigger issue unfortunately is that even if those modes helped with your issue, there is no contract for their behavior, so they could stop helping on a different device or architecture, or after a minor or major OS update. That’s why OpenEars only uses the RemoteIO audio unit and defaults to the standard audio mode, even though other units and modes occasionally have better performance in various combinations depending on OS version and device.
The mode property was added after a lot of agitation for it, but I’m pretty likely to remove it in a future update, since it has led to a lot of issues and support requests, and I don’t have a great explanation for what it’s doing there if it causes problems without bringing positive results, other than “people wanted it”; i.e., I made a design mistake.