Decoding a Wav file using OpenEars

    #1025449
    luaizak
    Participant

    Hello, I was hoping to use OpenEars to produce the same output as this command (run from the terminal):
    pocketsphinx_continuous -infile FILE.wav

    I have been able to do this successfully in Python using these files, but I don’t know how to apply this same technique to OpenEars.
    hmdir = "/usr/local/share/pocketsphinx/model/en-us/en-us"
    lmd = "/usr/local/share/pocketsphinx/model/en-us/en-us.lm.dmp"
    dictd = "/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict"

    I saw this in the OpenEarsSampleApp:
    [OEPocketsphinxController sharedInstance].pathToTestFile = [[NSBundle mainBundle] pathForResource:@"change_model_short" ofType:@"wav"];

    But this decodes the WAV file using the sample app's small language model, not the model the Python program uses.

    #1025450
    Halle Winkler
    Politepix

    Welcome,

    OpenEars is a tool for creating iOS speech UIs which uses Pocketsphinx (and other tools from CMU Sphinx) as a dependency, but since it is not a Pocketsphinx wrapper it has no APIs in common with Pocketsphinx. Pocketsphinx can be compiled for iOS, which is what you are seeking here – take a look at the CMU Sphinx GitHub repo to check out and work with the Pocketsphinx API in an iOS app.
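
    For reference, here is a rough sketch of what driving the Pocketsphinx C API directly from an iOS app might look like, assuming a 5prealpha-era checkout and that the en-us model files have been added to the app bundle. The function signatures (in particular the two-argument ps_get_hyp()) vary between versions, so verify them against the headers in the repo you check out:

    #import "pocketsphinx.h"

    // Sketch only – decodes a canonical 16kHz/16-bit/mono WAV with the C API.
    - (NSString *)decodeWavAtPath:(NSString *)wavPath {
        NSString *resources = [[NSBundle mainBundle] resourcePath];
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
            "-hmm",  [[resources stringByAppendingPathComponent:@"en-us"] UTF8String],
            "-lm",   [[resources stringByAppendingPathComponent:@"en-us.lm.dmp"] UTF8String],
            "-dict", [[resources stringByAppendingPathComponent:@"cmudict-en-us.dict"] UTF8String],
            NULL);
        ps_decoder_t *ps = ps_init(config);
        FILE *fh = fopen([wavPath UTF8String], "rb");
        fseek(fh, 44, SEEK_SET); // skip the 44-byte header of a canonical WAV
        int16 buf[512];
        size_t nsamp;
        ps_start_utt(ps);
        while ((nsamp = fread(buf, sizeof(int16), 512, fh)) > 0) {
            ps_process_raw(ps, buf, nsamp, FALSE, FALSE); // search as we read
        }
        ps_end_utt(ps);
        char const *hyp = ps_get_hyp(ps, NULL); // NULL: the score isn't needed here
        NSString *result = hyp ? [NSString stringWithUTF8String:hyp] : nil;
        fclose(fh);
        ps_free(ps);
        cmd_ln_free_r(config);
        return result;
    }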

    #1025451
    luaizak
    Participant

    Just to be clear, you cannot use OpenEars to decode a WAV file and determine the spoken words in that file? I am not really concerned about the actual words spoken; I am more interested in getting a word count for each WAV file. Thanks!

    #1025453
    Halle Winkler
    Politepix

    Certainly, it’s available as a test method. I think you saw the complete example of WAV decoding in the sample app, and there is a second WAV test method in the API as well.
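
    To make that concrete, here is a minimal sketch of the second method used with a dynamically generated model – the selector names are taken from the current OELanguageModelGenerator and OEPocketsphinxController docs, so double-check them against the framework version you are linking:

    #import <OpenEars/OELanguageModelGenerator.h>
    #import <OpenEars/OEAcousticModel.h>
    #import <OpenEars/OEPocketsphinxController.h>

    // Sketch: generate a small dynamic model, then run WAV recognition with it.
    OELanguageModelGenerator *generator = [[OELanguageModelGenerator alloc] init];
    NSArray *words = @[@"CHANGE", @"MODEL", @"WORD", @"STATEMENT"]; // your app's vocabulary
    NSError *error = [generator generateLanguageModelFromArray:words
                                                withFilesNamed:@"MyWavTestModel"
                                        forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
    if (!error) {
        NSString *lmPath = [generator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"MyWavTestModel"];
        NSString *dictPath = [generator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"MyWavTestModel"];
        [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
        [[OEPocketsphinxController sharedInstance]
            runRecognitionOnWavFileAtPath:[[NSBundle mainBundle] pathForResource:@"change_model_short" ofType:@"wav"]
                 usingLanguageModelAtPath:lmPath
                         dictionaryAtPath:dictPath
                      acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                      languageModelIsJSGF:FALSE];
    }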

    Your question, if I understand it, is about how to make OpenEars work identically to CMU’s Python harness for Pocketsphinx on CMU’s test files. That isn’t a good fit for OpenEars: it doesn’t use the language models provided by CMU, doesn’t forward Pocketsphinx’s API, isn’t designed for use with static language models, and isn’t synchronous as Pocketsphinx’s API is – and there is actually an iOS version of Pocketsphinx that can be used identically to the example you asked about. I can’t really answer your question because I don’t support the use of CMU’s large .lm.dmp and .dict files with OpenEars even if I supported the use of static language models in general – that model specifically is too large and off-topic to be accurate on a mobile phone.

    Do you want to try to talk through how to accomplish your app goals with OpenEars? That might be more straightforward than trying to get it to act like the Pocketsphinx Python wrapper with the CMU test files. Let me know!

    #1025529
    luaizak
    Participant

    Hi Halle,

    Thanks for your response. I would love to hear your thoughts on accomplishing this goal with OpenEars. All I need is a word count estimate based on a recording.

    Also,
    Do you know where I can find this: “there is actually an iOS version of Pocketsphinx that can be used identically to the example you asked about”

    #1025537
    Halle Winkler
    Politepix

    Hello,

    Why is the input a recording? Getting a word count from any possible user speech as input isn’t at all a trivial topic if you aren’t doing extremely large vocabulary recognition. Getting one from pre-existing recordings with already-known content, on the other hand, is very trivial, because you know what is on them. But it is hard to understand the role of such a piece of functionality in a mobile app without a little more explanation. Is it a recording because you are creating test cases for functionality which is intended to be used with live speech?
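
    (For the constrained-vocabulary case, once a hypothesis comes back, the count itself is trivial – a sketch using the OEEventsObserver delegate callback, assuming an observer has already been set up with this class as its delegate and that the vocabulary covers what is spoken:)

    // Rough sketch – a hypothesis only ever contains in-vocabulary words.
    - (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                             recognitionScore:(NSString *)recognitionScore
                                  utteranceID:(NSString *)utteranceID {
        NSArray *words = [hypothesis componentsSeparatedByCharactersInSet:
                          [NSCharacterSet whitespaceCharacterSet]];
        NSUInteger wordCount = [[words filteredArrayUsingPredicate:
                                 [NSPredicate predicateWithFormat:@"length > 0"]] count];
        NSLog(@"Estimated word count for this utterance: %lu", (unsigned long)wordCount);
    }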

    Also,
    Do you know where I can find this: “there is actually an iOS version of Pocketsphinx that can be used identically to the example you asked about”

    Yes, from my original response:

    take a look at the CMU Sphinx GitHub repo to check out and work with the Pocketsphinx API in an iOS app.

    The CMU Sphinx project has a GitHub repo, which is the best place to get the current version of any of their platform implementations.

    #1025576
    darasan
    Participant

    Hi Halle,

    I have a related question, so I thought I would add it here – please let me know if I should move it to a new thread. I’m just looking into using OpenEars (coming from a nightmarish experience with Nuance… very impressed with OpenEars so far! :-) )

    I also want to perform recognition of speech from a WAV file, but with a known vocabulary. I want to record speech from a mic input, retain the stored audio as a WAV, and perform recognition too (not strictly in real time, but a few seconds later).

    From glancing at the docs, there is a method called [runRecognitionOnWavFileAtPath:] that I assume I can use for this purpose.

    My question: I have read about certain audio APIs not playing nicely with OpenEars regarding audio playback, e.g. in Unity: https://www.politepix.com/forums/topic/openears-unity3d-audio-playback-conflicts/

    But if it’s possible to run recognition from a WAV, why not just record a WAV using your preferred API (Unity, AVAudioRecorder, etc.) and then let [runRecognitionOnWavFileAtPath:] handle the recognition? (One reason may be cases where instant or streaming recognition is required.) Would this enable smooth integration with any audio API (and would it have solved the Unity issues in the thread above)?

    Thanks!

    #1025585
    Halle Winkler
    Politepix

    Hello,

    Thanks! Yes, you can always record your own WAV and submit it to the WAV decode method – I’ve taken some care to make sure that that method keeps as much code in common with the rest of the implementation as possible, so it can be used as a fallback or for testing. The downsides of the approach you’re speculating about are that you lose voice activity detection, and that you will either have to code your own low-latency audio implementation (which will be subject to the same coexistence constraints as OpenEars’ low-latency audio implementation) or accept a higher-latency implementation if you use one of the iOS convenience APIs for audio recording. No reason not to experiment with this, though.
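
    If you want to try it, here is a rough sketch of that flow. The 16kHz/16-bit/mono LinearPCM settings are an assumption based on the audio format Pocketsphinx decodes, and lmPath/dictPath would come from OELanguageModelGenerator as shown earlier in the thread:

    #import <AVFoundation/AVFoundation.h>
    #import <OpenEars/OEPocketsphinxController.h>
    #import <OpenEars/OEAcousticModel.h>

    @interface WavRecognizer : NSObject
    @property (nonatomic, strong) AVAudioRecorder *recorder;
    @end

    @implementation WavRecognizer

    - (NSURL *)wavURL {
        return [NSURL fileURLWithPath:
            [NSTemporaryDirectory() stringByAppendingPathComponent:@"utterance.wav"]];
    }

    - (void)startRecording {
        [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
        NSDictionary *settings = @{
            AVFormatIDKey             : @(kAudioFormatLinearPCM),
            AVSampleRateKey           : @16000.0f, // assumed: what Pocketsphinx expects
            AVNumberOfChannelsKey     : @1,
            AVLinearPCMBitDepthKey    : @16,
            AVLinearPCMIsFloatKey     : @NO,
            AVLinearPCMIsBigEndianKey : @NO
        };
        self.recorder = [[AVAudioRecorder alloc] initWithURL:[self wavURL] settings:settings error:nil];
        [self.recorder record];
    }

    // Call when the user is done speaking; lmPath and dictPath come from
    // OELanguageModelGenerator as in the earlier sketch in this thread.
    - (void)stopAndRecognizeWithLanguageModelAtPath:(NSString *)lmPath dictionaryAtPath:(NSString *)dictPath {
        [self.recorder stop];
        [[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
        [[OEPocketsphinxController sharedInstance]
            runRecognitionOnWavFileAtPath:[[self wavURL] path]
                 usingLanguageModelAtPath:lmPath
                         dictionaryAtPath:dictPath
                      acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                      languageModelIsJSGF:FALSE];
    }

    @end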
