Tagged: quality, sample rate
- This topic has 31 replies, 5 voices, and was last updated 9 years, 6 months ago by Halle Winkler.
September 3, 2012 at 8:43 pm #10943 hohl (Participant)
When using OpenEars, the sound playback quality gets reduced. Is there a way to prevent the AVAudioSession from being reconfigured when PocketsphinxController is used for the first time?
September 3, 2012 at 8:48 pm #10944 Halle Winkler (Politepix)
Can you describe the reduction more specifically? It shouldn’t be possible for the bitrate or sample rate to be changed, so I’m unclear on what aspect of playback is different. You can’t use PocketsphinxController without the audio session settings it needs.
September 5, 2012 at 11:19 am #10961 hohl (Participant)
Here is what I found with logging enabled:
2012-09-05 12:13:51.599 Autoradio[5729:707] preferredBufferSize is incorrect, we will change it.
2012-09-05 12:13:51.604 Autoradio[5729:707] PreferredBufferSize is now on the correct setting of 0.128000.
2012-09-05 12:13:51.609 Autoradio[5729:707] preferredSampleRateCheck is incorrect, we will change it.
2012-09-05 12:13:51.698 Autoradio[5729:707] preferred hardware sample rate is now on the correct setting of 16000.000000.
Could this cause the reduction?
It’s hard to describe, maybe because I am not a musician. I would say everything sounds duller. Could it be a lowering of the bitrate?
September 5, 2012 at 11:33 am #10962 hohl (Participant)
I extended the logging a bit and recompiled the lib. This is what I am getting:
2012-09-05 12:29:57.733 Autoradio[5778:707] preferredBufferSize is incorrect, we will change it. Current value: 0.023000
2012-09-05 12:29:57.747 Autoradio[5778:707] PreferredBufferSize is now on the correct setting of 0.128000.
2012-09-05 12:29:57.755 Autoradio[5778:707] preferredSampleRateCheck is incorrect, we will change it. Current value: 44100.000000
2012-09-05 12:29:57.945 Autoradio[5778:707] preferred hardware sample rate is now on the correct setting of 16000.000000.
That looks like a reduction of the hardware sample rate. Could I change the check to something like “if it is the preferred kSamplesPerSecond or better”, or would that break OpenEars?
September 5, 2012 at 11:38 am #10963 hohl (Participant)
I changed it to:
if (fabs(preferredSampleRateCheck - kSamplesPerSecond) 0.0) {
in AudioSessionManager.m:400. It still works, and the reduction doesn’t take place anymore.
September 5, 2012 at 11:42 am #10965 Halle Winkler (Politepix)
Ah, I understand now: you’re using a full 44.1k rate, and PocketsphinxController requires (really requires) a 16k rate. If you convince it not to sample at 16k you will reduce the recognition accuracy severely. You’re correct that 16k recordings won’t sound as nice as 44.1k (CD quality), but if Pocketsphinx analyzed a 44.1k recording it would take forever.
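The relaxed check hohl describes in words ("the preferred kSamplesPerSecond or better") could look roughly like the sketch below. This is not the line from AudioSessionManager.m, whose comparison operator was lost in the forum's code formatting. The helper name `needs_rate_change` and its parameters are hypothetical, and the code is plain C so that it also compiles inside an Objective-C file.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of OpenEars: returns nonzero when the
 * hardware sample rate should be changed to the required rate.
 * Rates at or above the required one are left untouched.
 */
static int needs_rate_change(double current_hz, double required_hz) {
    assert(required_hz > 0.0);
    return current_hz < required_hz;
}
```

With a check like this, a session already running at 44.1k is not switched to 16k, so playback quality is preserved, but the recognizer then receives 44.1k input, which is exactly the accuracy trade-off Halle warns about.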
September 5, 2012 at 11:43 am #10966 hohl (Participant)
Something is wrong with the code tag in this forum, so I uploaded the change to line 400 in AudioSessionManager.m here: https://www.sourcedrop.net/Tyj72cb2147c9
Will this have an influence on OpenEars?
September 5, 2012 at 11:44 am #10967 Halle Winkler (Politepix)
Why do you need to make a CD-quality recording using the same stream that Pocketsphinx is using?
September 5, 2012 at 11:45 am #10968 hohl (Participant)
Ah, OK. I understand. But since it still works with the small dictionary I am using, I’ll leave it like that.
September 5, 2012 at 11:45 am #10969 Halle Winkler (Politepix)
“Will this have an influence on OpenEars?”
Yup, see my answer that slipped in ahead of your last post.
September 5, 2012 at 11:46 am #10970 Halle Winkler (Politepix)
OK, that’s your call. I think the speech will seem quite different as far as Pocketsphinx is concerned. In my experience it does still perform recognition, but with a big loss of accuracy. For a small vocabulary it’s true that you might find it tolerable regardless, so I’m glad to hear it works all right for your application. Do me a favor and mention your override in future support questions so that I can distinguish between potential issues that are normal and potential issues that could be a side effect of your change.
September 5, 2012 at 11:56 am #10971 Halle Winkler (Politepix)
Just for some background on why it’s like this: for speech perception purposes there isn’t a big improvement from sampling rates higher than 16k (and mono), which means that most speech recognition software will attempt recognition at a maximum sample rate of 16k, because there are then far fewer samples to analyze. For non-speech applications such as music it’s naturally always going to be better to use a higher sample rate, and stereo if possible. But generally, even for speech that humans listen to, you don’t get a lot of extra “bang for the buck” going from 16k to 44.1k, because the comparison standard is telephone bandwidth, which is generally standardized at 8k and compressed, making 16k PCM already a big step up. The reason the recognition is compromised is that it assumes a “chunk” of speech is likely to occur within a certain number of samples in a timeframe, and at 44.1k the speech occupies more like 3x that many samples, so it is really not going to map well to the recordings in the acoustic model (which are actually 8k, but the input functions compensate for the doubling of the input rate).
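To put rough numbers on the "more like 3x the samples" point, the sketch below counts how many samples fall into a fixed time window at each rate. The 25 ms window is an assumption chosen for illustration and is not taken from this thread or from Pocketsphinx's configuration. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole samples inside a window of window_ms milliseconds. */
static uint32_t samples_in_window(uint32_t sample_rate_hz, uint32_t window_ms) {
    assert(sample_rate_hz > 0);
    return (uint32_t)(((uint64_t)sample_rate_hz * window_ms) / 1000u);
}

/* How many times more samples rate_a_hz produces than rate_b_hz. */
static double rate_ratio(uint32_t rate_a_hz, uint32_t rate_b_hz) {
    assert(rate_b_hz > 0);
    return (double)rate_a_hz / (double)rate_b_hz;
}
```

For the assumed 25 ms window, `samples_in_window(16000, 25)` gives 400 samples and `samples_in_window(44100, 25)` gives 1102, and `rate_ratio(44100, 16000)` is about 2.76. The same stretch of speech therefore spans nearly three times as many samples as a recognizer expecting 16k input would assume.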
September 5, 2012 at 12:09 pm #10972 hohl (Participant)
But I need high-quality playback, since my application is a media player and 16k isn’t acceptable for that kind of application. Why does OpenEars need to change the global playback quality?
September 5, 2012 at 12:18 pm #10973 Halle Winkler (Politepix)
It isn’t actively changing the sample rate for playback. It is using the required recording-and-playback audio session type with a 16k record rate, which might override the playback rate as an unintended side effect. It’s actually a bit surprising to me that the playback rate of a media object is being affected at all. Can you show me your object playback code as a test sample so I can replicate it and look into it when there is time?
September 5, 2012 at 12:20 pm #10974 hohl (Participant)
I am just using AVPlayer for playback. https://developer.apple.com/library/mac/#documentation/AVFoundation/Reference/AVPlayer_Class/Reference/Reference.html
September 5, 2012 at 12:38 pm #10975 Halle Winkler (Politepix)
Right, but you must be using it during the recognition activity, because otherwise the AVPlayer audio session would completely override the OpenEars audio session. My interest is in how you are using it such that its playback settings can conflict with those of the OpenEars audio session.
September 5, 2012 at 2:24 pm #10976 hohl (Participant)
Are you looking for this?
// Configure the shared session for simultaneous playback and recording.
NSError *audioSessionError = nil;
[[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:&audioSessionError];
[[AVAudioSession sharedInstance] setActive:YES error:&audioSessionError];
if (audioSessionError != nil) {
    NSLog(@"Something went wrong with initialising the audio session!");
}
// C Audio Session API: activate the session and listen for audio route changes.
AudioSessionSetActive(true);
AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange, ARAudioSessionPropertyListener, nil);
AVPlayer just plays, and the OpenEars session starts when triggered by the user. AVPlayer keeps playing in the background, but in future I’m going to lower its volume during the OpenEars session to get better results.
September 5, 2012 at 3:09 pm #10977 Halle Winkler (Politepix)
Why are you doing your own Audio Session management (serious question, maybe there is a good reason for it despite it being in conflict with the OpenEars instructions)?
July 3, 2013 at 2:09 pm #1017602 markmakingmusic (Participant)
Is there a fix or workaround for this? Our app is a streaming music player, which uses OpenEars to detect speech, which is then used to trigger commands for the player (play, stop, pause, next, etc.).
We need playback to be set at 44.1k (which we set when our audio session is set up). However, there is an extreme degradation in quality when we enable OpenEars speech detection. Any ideas?
July 3, 2013 at 2:37 pm #1017603 Halle Winkler (Politepix)
Welcome Mark,
The issue is that PocketsphinxController can’t perform speech recognition on a sample rate other than 16k, so your choice is between 16k playback with less sound resolution, or recognition at 44.1k with lower accuracy and other potential problems such as buffer overruns. Option 3 is to separate the two functions and set the session as needed when switching between them, which doesn’t sound like it’s available to you in your use case.
The best bet for a permanent workaround would be to put some research into how to change the audio driver so that it mixes its own input requirements with external output requirements instead of using its own output settings. I can also do this when there is time, but there are a number of things ahead of that feature at the moment, so it will be a bit. Have you tried setting the AudioSessionManager allowMixing property to TRUE? A quick search of the forums should explain more about that.
February 5, 2014 at 12:02 pm #1020046 fdim (Participant)
I am bringing this topic back from the dead. I am running a fairly simple experiment. I don’t do any music manipulation in my app; I use the device’s iPod app. When I start OpenEars, by default it takes over the audio session and the music stops.
If I set the “audioSessionMixing” property to “YES”, the audio is indeed mixed, but with the problem described above: the sound quality drops, probably because the sample rate drops. Questions:
– Is there any way to set a different sample rate for recording and playback?
– Is it possible to hold two different channels or streams (I’m not sure about the terminology) that each handle audio at a different sample rate?
February 5, 2014 at 12:13 pm #1020047 Halle Winkler (Politepix)
Sorry, neither of those things is currently possible. The PlayAndRecord audio session does actually force the playback stream to be enabled and to have the same sample rate as the record stream. I very recently put a large period of research into seeing whether there is any way to decouple them in the driver, so that OpenEars could use the mic stream without having any effect on playback. I had no success and found no reports of other successful experiments with this. I’m probably not 100% done experimenting, since I would also like to release the playback settings entirely, but the last round made no headway after a lot of investigation, and it will be a few months at the earliest before it’s possible to delve into it again.
The sample rate that the driver sets for the mic stream is the only one that can be used with acceptable recognition results.
February 6, 2014 at 10:52 pm #1020071 Halle Winkler (Politepix)
Quick question, since it seems to be causing unexpectedly good results in iOS 7 in other playback-related areas that had issues due to the audio session requirements in previous versions: have you tried setting PocketsphinxController’s audioMode setting to @"VoiceChat" to see if that helps at all with playback sample rates?
February 7, 2014 at 10:17 am #1020076 fdim (Participant)
Yes, I have. From what I can hear, @"VoiceChat" only differs in volume compared to @"Default". Quality is unfortunately the same.
After a quick look, I think a possible way to solve this is to leave the default sample rate intact (44.1k) and then use an audio converter audio unit that downsamples the recording input to 16k in real time and feeds it into OpenEars. Have you been down that road?
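As a rough illustration of the downsampling step fdim proposes, the sketch below resamples mono 16-bit PCM by linear interpolation. It is not OpenEars code and not Apple's audio converter. The function name `resample_linear` and its signature are hypothetical. It also applies no low-pass filter, so a real 44.1k to 16k conversion would need to filter out content above 8 kHz first to avoid aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Resample mono 16-bit PCM from in_rate to out_rate (both in Hz) using
 * linear interpolation between neighbouring input samples.
 * Writes at most out_cap samples and returns the number written.
 * Positions are tracked with integer arithmetic to avoid rounding drift.
 */
static size_t resample_linear(const int16_t *in, size_t in_len,
                              uint32_t in_rate, uint32_t out_rate,
                              int16_t *out, size_t out_cap) {
    assert(in_rate > 0 && out_rate > 0);
    size_t n = 0;
    while (n < out_cap) {
        /* Input position of output sample n, scaled by out_rate. */
        uint64_t num = (uint64_t)n * in_rate;
        size_t i = (size_t)(num / out_rate);
        if (i >= in_len) {
            break;
        }
        int32_t a = in[i];
        /* At the last input sample there is no neighbour, so hold the value. */
        int32_t b = (i + 1 < in_len) ? in[i + 1] : a;
        int64_t rem = (int64_t)(num % out_rate);
        /* The result always lies between a and b, so it fits in int16_t. */
        out[n++] = (int16_t)(a + (int64_t)(b - a) * rem / out_rate);
    }
    return n;
}
```

Feeding 441 input samples (10 ms at 44.1k) through `resample_linear` with rates 44100 and 16000 yields 160 output samples (10 ms at 16k). Handling arbitrary input formats, as Halle points out in the reply below, is where most of the real complexity lies.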
February 11, 2014 at 6:33 pm #1020127 fdim (Participant)
Sorry to bug you again on this. I am willing to tamper with the code to try to fix this myself in the way suggested in my previous post. I just want to know whether you’ve ever tried it that way and given up on it because it failed.
February 11, 2014 at 10:03 pm #1020129 Halle Winkler (Politepix)
Hi,
No, I have more constraints there. It would have to take in and convert any sample rate and any other characteristics that might come in (such as interleaved, compressed, stereo, VBR, etc.), be tested across all of those kinds of input, and have no effect on performance or overhead, including on old phones and in RapidEars. From my perspective the way to fix that annoyance in the long term is to figure out how to successfully decouple the input callback from the output callback, because that decreases the complexity both overall and in this particular case, but that seems to be sensitive Core Audio code and extremely underdocumented, so I haven’t gotten anywhere with it yet, though not for lack of trying recently.
This is a little outside of the scope of support that I offer because it’s getting pretty low-level, and questions in this area usually lead to new questions (which is reasonable and there’s absolutely nothing wrong with it, but there aren’t necessarily the resources on my end to talk it through a lot). I completely understand that it’s important to your spec and I’m sorry I can’t support that feature right now.
September 30, 2014 at 1:33 am #1022655 dandoen (Participant)
Halle,
Love your work.
I’m having the same issue as above and am wondering if you’ve gotten any further with this?
Thanks,
Dan
September 30, 2014 at 8:25 am #1022658 Halle Winkler (Politepix)
Hi Dan,
Thank you! Yes, this is under development right now.
September 30, 2014 at 11:18 am #1022660 dandoen (Participant)
Thanks, great to hear.
Any news on when you’re planning to release the next update?
Also, I want to try out what the user hohl (in this thread) suggested (increasing the bitrate), as I have a super small vocabulary. Where can I find the source so I can alter and build it myself?
September 30, 2014 at 11:24 am #1022661 Halle Winkler (Politepix)
“Any news on when you’re planning to release the next update?”
When it’s ready ;) . It’s the only thing I’m working on, so when all the old stuff works and the new stuff works I will be very happy to release it, but I think if I gave a release date I’d probably end up making a liar of myself.
“Also, I want to try out what the user hohl (in this thread) suggested (increasing the bitrate), as I have a super small vocabulary. Where can I find the source so I can alter and build it myself?”
I really question whether this could work acceptably as a user experience but there is no harm in experimenting. The source is all in the distribution you downloaded, just open up OpenEars.xcodeproj.
September 30, 2014 at 11:51 am #1022662 dandoen (Participant)
Totally get it re: release date ;)
And again, thanks. I’ll do some testing and see if it’s acceptable for my use case.
September 30, 2014 at 1:24 pm #1022666 Halle Winkler (Politepix)
Thanks for your understanding!