Tagged: AirPlay
- This topic has 9 replies, 2 voices, and was last updated 11 years ago by Halle Winkler.
April 20, 2013 at 8:01 pm #1017012 | ransomweaver (Participant)
Hello,
My app uses OpenEars to create WAV files that are stored and played back later. My playback system uses MPMusicPlayerController to play iPod music and an OpenAL sound engine to play my custom WAV files: the app ducks the volume on the music, pauses the player, plays the WAV files, then restarts the music. The OpenAL sound engine uses kAudioSessionCategory_MediaPlayback so that it can play sounds in the background.
All this works well, except when sending the audio over AirPlay. The music sounds fine, but the OpenEars generated wav files are horribly degraded.
My feeling is that the problem is the sample rate mismatch between the 44.1kHz music and the 16kHz TTS file. Even more damning, my app used to use WAV files generated by Festival's text2wave and downloaded from a server; those files were made at 44.1kHz and worked fine with AirPlay.
So my question is: can I change Flite to create 44.1kHz speech? I see that RuntimeValues.m has input_sample_rate = 16000. I haven't tried changing it to 44100, but I suspect it wouldn't work and that only 8kHz and 16kHz are supported in Flite.
Alternatively, does anyone know of a method to upsample a 16kHz WAV to 44.1kHz? If it can at least avoid degrading the sample, I'd be satisfied with that.
April 20, 2013 at 8:10 pm #1017013 | Halle Winkler (Politepix)
Welcome,
You definitely can’t change the output rate of the Flite speech. The runtime value you’re referencing is for Pocketsphinx, unfortunately, and it is for yet-unreleased features.
April 20, 2013 at 9:13 pm #1017014 | ransomweaver (Participant)
Hi,
Thought as much. Actually, I am already amplifying the file, using a method from a Stack Overflow thread you directed me to some time ago. That method (which uses the ExtAudioFile API) reads a WAV file, including its sample rate, then sets up an AudioStreamBasicDescription to indicate the format for the returned samples. If I set that sample rate to 44100, the audio actually sounds fine over AirPlay, BUT the file is not complete; less than half of the audio is there.
I wonder if I could add interpolation to this method to raise the sample count to what a 44.1kHz file should contain.
Here's the full method. Presumably the change would go in the loop over the buffers. Any thoughts?
<code>
void ScaleAudioFileAmplitude(NSURL *theURL, float ampScale) {
OSStatus err = noErr;
ExtAudioFileRef audiofile;
ExtAudioFileOpenURL((CFURLRef)theURL, &audiofile);
assert(audiofile);
// get some info about the file's format.
AudioStreamBasicDescription fileFormat;
UInt32 size = sizeof(fileFormat);
err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_FileDataFormat, &size, &fileFormat);
// we'll need to know what type of file it is later when we write
AudioFileID aFile;
size = sizeof(aFile);
err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_AudioFile, &size, &aFile);
AudioFileTypeID fileType;
size = sizeof(fileType);
err = AudioFileGetProperty(aFile, kAudioFilePropertyFileFormat, &size, &fileType);
// tell the ExtAudioFile API what format we want samples back in
AudioStreamBasicDescription clientFormat;
bzero(&clientFormat, sizeof(clientFormat));
clientFormat.mChannelsPerFrame = fileFormat.mChannelsPerFrame;
clientFormat.mBytesPerFrame = 4;
clientFormat.mBytesPerPacket = clientFormat.mBytesPerFrame;
clientFormat.mFramesPerPacket = 1;
clientFormat.mBitsPerChannel = 32;
clientFormat.mFormatID = kAudioFormatLinearPCM;
clientFormat.mSampleRate = fileFormat.mSampleRate;
NSLog(@"Sample Rate is %1.2f",clientFormat.mSampleRate);
clientFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat | kAudioFormatFlagIsNonInterleaved;
err = ExtAudioFileSetProperty(audiofile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);
// find out how many frames we need to read
SInt64 numFrames = 0;
size = sizeof(numFrames);
err = ExtAudioFileGetProperty(audiofile, kExtAudioFileProperty_FileLengthFrames, &size, &numFrames);
// create the buffers for reading in data
AudioBufferList *bufferList = malloc(sizeof(AudioBufferList) + sizeof(AudioBuffer) * (clientFormat.mChannelsPerFrame - 1));
bufferList->mNumberBuffers = clientFormat.mChannelsPerFrame;
for (int ii=0; ii < bufferList->mNumberBuffers; ++ii) {
bufferList->mBuffers[ii].mDataByteSize = sizeof(float) * numFrames;
bufferList->mBuffers[ii].mNumberChannels = 1;
bufferList->mBuffers[ii].mData = malloc(bufferList->mBuffers[ii].mDataByteSize);
}
// read in the data
UInt32 rFrames = (UInt32)numFrames;
err = ExtAudioFileRead(audiofile, &rFrames, bufferList);
// close the file
err = ExtAudioFileDispose(audiofile);
// process the audio
for (int ii=0; ii < bufferList->mNumberBuffers; ++ii) {
float *fBuf = (float *)bufferList->mBuffers[ii].mData;
for (int jj=0; jj < rFrames; ++jj) {
*fBuf = *fBuf * ampScale;
fBuf++;
}
}
// open the file for writing
err = ExtAudioFileCreateWithURL((CFURLRef)theURL, fileType, &fileFormat, NULL, kAudioFileFlags_EraseFile, &audiofile);
// tell the ExtAudioFile API what format we'll be sending samples in
err = ExtAudioFileSetProperty(audiofile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);
// write the data
err = ExtAudioFileWrite(audiofile, rFrames, bufferList);
// close the file
ExtAudioFileDispose(audiofile);
// destroy the buffers
for (int ii=0; ii < bufferList->mNumberBuffers; ++ii) {
free(bufferList->mBuffers[ii].mData);
}
free(bufferList);
bufferList = NULL;
}
</code>
April 20, 2013 at 9:29 pm #1017015 | Halle Winkler (Politepix)
I don't have advice on this off the top of my head, but interpolating 16kHz to 44.1kHz is the kind of requirement that would probably make me wonder if the situation had become overly complicated in general. Maybe there's something simpler than combining all of those different technologies?
April 20, 2013 at 9:30 pm #1017016 | ransomweaver (Participant)
No doubt I could use one of these:
http://www.mega-nerd.com/SRC/api.html
https://github.com/timmartin/libfooid/tree/master/libresample
April 20, 2013 at 9:35 pm #1017017 | ransomweaver (Participant)
Well, I have two basic requirements:
1) the app can play iPod music
2) the app can play iPod music and the app's own WAV files in the background
And the only problem I have right now is that AirPlay doesn't like an audio stream with 44.1kHz and 16kHz audio in it at the same time.
I'm not sure how I would fix that, even with a completely different way of playing audio files, without changing the sample rate of one or the other kind of file I'm playing.
April 20, 2013 at 9:40 pm #1017018 | Halle Winkler (Politepix)
I guess I'd be curious about what AirPlay does when it gets songs with different sampling rates, since it's normal for both 44.1kHz and 48kHz to be found in audio libraries.
April 20, 2013 at 9:42 pm #1017019 | ransomweaver (Participant)
I will look into that. Maybe the problem is my kind of audio session (MediaPlayback), but I need that to keep the app alive in the background.
April 20, 2013 at 9:53 pm #1017020 | Halle Winkler (Politepix)
My instinct is that nothing should really be objecting to sample rate changes per se. The whole point of playing back a formatted file rather than a raw buffer stream is that the required data is encapsulated in the header, so a qualified player can deal with differences in file details such as sample rate, bitrate, endianness, or codec. This will also be true of other media-player-type objects such as videos. So I might be suspicious of other implementation details besides the sample rates of the files played.
I don't really have insight into what this particular issue is, and you might be 100% right that the most direct fix is to change the sample rate; I'm just sharing what my thought process would be if this were my implementation to debug.
April 20, 2013 at 10:13 pm #1017021 | Halle Winkler (Politepix)
Oh, and here's a hint that just occurred to me: remember that you have control of the voice's speed and pitch. That means you can use naive sample-rate-changing methods that alter perceived speed and pitch, and then compensate for it in the original voice settings, at least to a certain extent. That might get you to a rate that fits into 44.1kHz better than 16kHz does.
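A concrete, hedged sketch of that hint: 22050Hz is exactly half of 44100, so one naive path is to relabel the 16kHz Flite output as 22050Hz (which plays it 22050/16000 = 1.378x faster and sharper) and pre-compensate in the synthesis settings. The Flite feature names in the comments (duration_stretch, target_mean) are standard Flite voice features, but check what OpenEars actually exposes:

```c
/* Speed/pitch multiplier caused by relabeling oldRate data as newRate
 * in the file header; >1 means playback is faster and higher-pitched. */
double relabel_speedup(double oldRate, double newRate) {
    return newRate / oldRate;
}

/* duration_stretch to synthesize with so the relabeled audio
 * plays back at normal speed (stretch by the same factor). */
double compensated_stretch(double oldRate, double newRate) {
    return relabel_speedup(oldRate, newRate);
}

/* target_mean (Hz) to synthesize with so the relabeled audio
 * lands on the desired pitch after the upward shift. */
double compensated_pitch(double desiredPitchHz, double oldRate, double newRate) {
    return desiredPitchHz / relabel_speedup(oldRate, newRate);
}
```

From 22050Hz, duplicating every sample (zero-order hold) then reaches 44.1kHz without any further change in speed or pitch.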