Decoding a Wav file using OpenEars

    #1025449
    luaizak
    Participant

    Hello, I was hoping to use OpenEars to produce the same output as this command (run from the terminal):
    pocketsphinx_continuous -infile FILE.wav

    I have been able to do this successfully in Python using these files, but I don’t know how to apply this same technique to OpenEars.
    hmdir = "/usr/local/share/pocketsphinx/model/en-us/en-us"
    lmd = "/usr/local/share/pocketsphinx/model/en-us/en-us.lm.dmp"
    dictd = "/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict"

    I saw this in the OpenEarsSampleApp:
    [OEPocketsphinxController sharedInstance].pathToTestFile = [[NSBundle mainBundle] pathForResource:@"change_model_short" ofType:@"wav"];

    But this decodes the WAV file using the sample app's small language model, not the model the Python program uses.

    #1025450
    Halle Winkler
    Politepix

    Welcome,

    OpenEars is a tool for creating iOS speech UIs which uses Pocketsphinx (and other tools from CMU Sphinx) as a dependency, but since it is not a Pocketsphinx wrapper it has no APIs in common with Pocketsphinx. Pocketsphinx can be compiled for iOS, which is what you are seeking here – take a look at the CMU Sphinx GitHub repo to check out and work with the Pocketsphinx API in an iOS app.
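
    For reference, here is a rough sketch of what driving the Pocketsphinx C API directly from an iOS app might look like, assuming a 5prealpha-era checkout and that the en-us model files have been added to the app bundle. The function signatures (in particular the two-argument ps_get_hyp()) vary between versions, so verify them against the headers in the repo you check out:

    #import "pocketsphinx.h"

    // Sketch only – decodes a canonical 16kHz/16-bit/mono WAV with the C API.
    - (NSString *)decodeWavAtPath:(NSString *)wavPath {
        NSString *resources = [[NSBundle mainBundle] resourcePath];
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
            "-hmm",  [[resources stringByAppendingPathComponent:@"en-us"] UTF8String],
            "-lm",   [[resources stringByAppendingPathComponent:@"en-us.lm.dmp"] UTF8String],
            "-dict", [[resources stringByAppendingPathComponent:@"cmudict-en-us.dict"] UTF8String],
            NULL);
        ps_decoder_t *ps = ps_init(config);
        FILE *fh = fopen([wavPath UTF8String], "rb");
        fseek(fh, 44, SEEK_SET); // skip the 44-byte header of a canonical WAV
        int16 buf[512];
        size_t nsamp;
        ps_start_utt(ps);
        while ((nsamp = fread(buf, sizeof(int16), 512, fh)) > 0) {
            ps_process_raw(ps, buf, nsamp, FALSE, FALSE); // search as we read
        }
        ps_end_utt(ps);
        char const *hyp = ps_get_hyp(ps, NULL); // NULL: the score isn't needed here
        NSString *result = hyp ? [NSString stringWithUTF8String:hyp] : nil;
        fclose(fh);
        ps_free(ps);
        cmd_ln_free_r(config);
        return result;
    }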

    #1025451
    luaizak
    Participant

    Just to be clear, you cannot use OpenEars to decode a WAV file and determine the spoken words in that file? I am not really concerned about the actual words spoken; I am more interested in getting a word count for each WAV file. Thanks!

    #1025453
    Halle Winkler
    Politepix

    Certainly, it’s available as a test method. I think you saw the complete example of WAV decoding in the sample app, and there is a second WAV test method in the API as well.
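
    To make that concrete, here is a minimal sketch of the second method used with a dynamically generated model – the selector names are taken from the current OELanguageModelGenerator and OEPocketsphinxController docs, so double-check them against the framework version you are linking:

    #import <OpenEars/OELanguageModelGenerator.h>
    #import <OpenEars/OEAcousticModel.h>
    #import <OpenEars/OEPocketsphinxController.h>

    // Sketch: generate a small dynamic model, then run WAV recognition with it.
    OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
    NSArray *words = @[@"CHANGE", @"MODEL", @"WORD", @"STATEMENT"]; // your app's vocabulary
    NSError *error = [generator generateLanguageModelFromArray:words
                                                withFilesNamed:@"MyWavTestModel"
                                        forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
    if (!error) {
        NSString *lmPath = [generator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"MyWavTestModel"];
        NSString *dictPath = [generator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"MyWavTestModel"];
        [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
        [[OEPocketsphinxController sharedInstance]
            runRecognitionOnWavFileAtPath:[[NSBundle mainBundle] pathForResource:@"change_model_short" ofType:@"wav"]
                 usingLanguageModelAtPath:lmPath
                         dictionaryAtPath:dictPath
                      acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                      languageModelIsJSGF:FALSE];
    }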

    Your question, if I understand it, is about how to make OpenEars work identically to CMU’s Python harness for Pocketsphinx on CMU’s test files. That isn’t a good fit for OpenEars: it doesn’t use the language models provided by CMU, doesn’t forward Pocketsphinx’s API, isn’t designed for use with static language models, and isn’t synchronous as Pocketsphinx’s API is – and there is actually an iOS version of Pocketsphinx that can be used identically to the example you asked about. I can’t really answer your question because I don’t support the use of CMU’s large .lm.dmp and .dict files with OpenEars even if I supported the use of static language models in general – that model specifically is too large and off-topic to be accurate on a mobile phone.

    Do you want to try to talk through how to accomplish your app goals with OpenEars? That might be more straightforward than trying to get it to act like the Pocketsphinx Python wrapper with the CMU test files. Let me know!

    #1025529
    luaizak
    Participant

    Hi Halle,

    Thanks for your response. I would love to hear your thoughts on accomplishing this goal with OpenEars. All I need is a word count estimate based on a recording.

    Also,
    Do you know where I can find this: “there is actually an iOS version of Pocketsphinx that can be used identically to the example you asked about”

    #1025537
    Halle Winkler
    Politepix

    Hello,

    Why is the input a recording? Getting a word count from any possible user speech as input isn’t at all a trivial topic if you aren’t doing extremely large vocabulary recognition. Getting one from pre-existing recordings with already-known content, on the other hand, is very trivial, because you know what is on them. But it is hard to understand the role of such a piece of functionality in a mobile app without a little more explanation. Is it a recording because you are creating test cases for functionality which is intended to be used with live speech?
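
    (For the constrained-vocabulary case, once a hypothesis comes back, the count itself is trivial – a sketch using the OEEventsObserver delegate callback, assuming an observer has already been set up with this class as its delegate and that the vocabulary covers what is spoken:)

    // Rough sketch – a hypothesis only ever contains in-vocabulary words.
    - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                             recognitionScore:(NSString *)recognitionScore
                                  utteranceID:(NSString *)utteranceID {
        NSArray *words = [hypothesis componentsSeparatedByCharactersInSet:
                          [NSCharacterSet whitespaceCharacterSet]];
        NSUInteger wordCount = [[words filteredArrayUsingPredicate:
                                 [NSPredicate predicateWithFormat:@"length > 0"]] count];
        NSLog(@"Estimated word count for this utterance: %lu", (unsigned long)wordCount);
    }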

    Also,
    Do you know where I can find this: “there is actually an iOS version of Pocketsphinx that can be used identically to the example you asked about”

    Yes, from my original response:

    take a look at the CMU Sphinx GitHub repo to check out and work with the Pocketsphinx API in an iOS app.

    The CMU Sphinx project has a GitHub repo, which is the best place to get the current version of any of their platform implementations.

    #1025576
    darasan
    Participant

    Hi Halle,

    I have a related question, so I thought I would add it here – please let me know if I should move it to a new thread. I’m just looking into using OpenEars (coming from a nightmarish experience with Nuance… very impressed with OpenEars so far! :-) )

    I also want to perform recognition of speech from a WAV file, but with a known vocabulary. I want to record speech from a mic input, retain the stored audio as a WAV, and perform recognition too (not strictly in real time, but a few seconds later).

    From glancing at the docs, there is a method called [runRecognitionOnWavFileAtPath:] that I assume I can use for this purpose.

    My question: I have read about certain audio APIs not playing nicely with OpenEars regarding audio playback, e.g. in Unity: https://www.politepix.com/forums/topic/openears-unity3d-audio-playback-conflicts/

    But if it’s possible to run recognition from a WAV, why not just record a WAV using your preferred API (Unity, AVAudioRecorder, etc.) and then let [runRecognitionOnWavFileAtPath:] handle the recognition? (One reason may be cases where instant or streaming recognition is required.) Would this enable smooth integration with any audio API (and would it have solved the Unity issues in the thread above)?

    Thanks!

    #1025585
    Halle Winkler
    Politepix

    Hello,

    Thanks! Yes, you can always record your own WAV and submit it to the WAV decode method – I’ve taken some care to make sure that that method keeps as much code in common with the rest of the implementation as possible, so it can be used as a fallback or for testing. The downsides of the approach you’re speculating about are that you lose voice activity detection, and that you will either have to code your own low-latency audio implementation (which will be subject to the same coexistence constraints as OpenEars’ low-latency audio implementation) or accept a higher-latency implementation if you use one of the iOS convenience APIs for audio recording. No reason not to experiment with this, though.
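
    If you want to try it, here is a rough sketch of that flow. The 16kHz/16-bit/mono LinearPCM settings are an assumption based on the audio format Pocketsphinx decodes, and lmPath/dictPath would come from OELanguageModelGenerator as shown earlier in the thread:

    #import <AVFoundation/AVFoundation.h>
    #import <OpenEars/OEPocketsphinxController.h>
    #import <OpenEars/OEAcousticModel.h>

    @interface WavRecognizer : NSObject
    @property (nonatomic, strong) AVAudioRecorder *recorder;
    @end

    @implementation WavRecognizer

    - (NSURL *)wavURL {
        return [NSURL fileURLWithPath:
            [NSTemporaryDirectory() stringByAppendingPathComponent:@"utterance.wav"]];
    }

    - (void)startRecording {
        [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
        NSDictionary *settings = @{
            AVFormatIDKey             : @(kAudioFormatLinearPCM),
            AVSampleRateKey           : @16000.0f, // assumed: what Pocketsphinx expects
            AVNumberOfChannelsKey     : @1,
            AVLinearPCMBitDepthKey    : @16,
            AVLinearPCMIsFloatKey     : @NO,
            AVLinearPCMIsBigEndianKey : @NO
        };
        self.recorder = [[AVAudioRecorder alloc] initWithURL:[self wavURL] settings:settings error:nil];
        [self.recorder record];
    }

    // Call when the user is done speaking; lmPath and dictPath come from
    // OELanguageModelGenerator as in the earlier sketch in this thread.
    - (void)stopAndRecognizeWithLanguageModelAtPath:(NSString *)lmPath dictionaryAtPath:(NSString *)dictPath {
        [self.recorder stop];
        [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
        [[OEPocketsphinxController sharedInstance]
            runRecognitionOnWavFileAtPath:[[self wavURL] path]
                 usingLanguageModelAtPath:lmPath
                         dictionaryAtPath:dictPath
                      acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                      languageModelIsJSGF:FALSE];
    }

    @end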
