OpenEars – Politepix

OpenEars 2.5 and all plugins out now!

Halle Winkler — Mon, 22 Feb 2016 16:22:34 +0000

¿Qué Hay De Nuevo? After a just-slightly-longer-than-expected incubation period (ahem), it is my pleasure to introduce OpenEars 2.5: Hear All The Languages.

Image by Allie Brosh

Well, perhaps not all of the languages. Many of the languages! I’ve developed a language-agnostic grapheme-to-phoneme algorithm which works from a file format fast enough for a phone, and although perfection remains elusive, it is 5 to 15 times more accurate at generating phonemes than a naive letter-based algorithm. As a result, OpenEars is now able to support speech recognition in English, Spanish, Mandarin Chinese, French, German, and Dutch. The new languages work with all of the speech recognition features of OpenEars, including dynamic language model generation and switching, grammar generation and switching, and of course hypothesis return.
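To make the contrast with a naive letter-based approach concrete, here is a minimal Python sketch. It is not the OpenEars algorithm, whose internals this post does not describe. The rule tables, the phoneme symbols, and the function names are all invented for illustration. It only shows why mapping one letter to one phoneme tends to lose against rules that try multi-letter graphemes first.

```python
# Illustrative only: toy tables, not OpenEars' data or its phoneme set.
LETTER_TO_PHONEME = {
    "a": "AA", "c": "K", "e": "EH", "h": "HH", "i": "IH",
    "n": "N", "s": "S", "t": "T", "u": "UH",
}

# Hypothetical multi-letter graphemes, tried before single letters.
MULTI_LETTER_RULES = {
    "sch": "SH",
    "ch": "X",
    "ei": "AY",
}


def naive_g2p(word):
    """Map each letter to one phoneme, ignoring its neighbours."""
    return [LETTER_TO_PHONEME[ch] for ch in word.lower() if ch in LETTER_TO_PHONEME]


def longest_match_g2p(word):
    """Greedy longest match over multi-letter graphemes, then single letters."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for length in (3, 2):
            chunk = word[i:i + length]
            if chunk in MULTI_LETTER_RULES:
                phonemes.append(MULTI_LETTER_RULES[chunk])
                i += len(chunk)
                break
        else:
            # No multi-letter rule matched here; fall back to the single letter.
            if word[i] in LETTER_TO_PHONEME:
                phonemes.append(LETTER_TO_PHONEME[word[i]])
            i += 1
    return phonemes


print(naive_g2p("Schein"))
print(longest_match_g2p("Schein"))
```

With these toy tables the naive version emits one phoneme per letter for "Schein", while the longest-match version collapses "sch" and "ei" into single phonemes, which is closer to how the word is actually spoken.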

As a benefit of the flexibility of this format, today Politepix will also release 2.5 versions of every plugin. RapidEars, Rejecto and RuleORama are now compatible with the languages OpenEars is compatible with. NeatSpeech 2.5 is a compatibility update, but won’t be adding TTS output for the new languages. SaveThatWave also has a compatibility update.

By the way, if you are using another language’s Sphinx-compatible acoustic model and you want to make it compatible with OpenEars, you can get in touch to discuss options – it is a pretty flexible approach so I expect to be able to apply it to more languages.

Speech recognition quality will vary significantly with the speed and accuracy of the acoustic model used, so support for non-English acoustic models is very much on a best-effort basis moving forward. Tut mir leid, Schatz!

The next feature of the OpenEars Platform 2.5 is bitcode. OpenEars will now ship with embedded bitcode, and bitcode is now a configurable option for paid plugin framework purchases (subject to a handling fee). Plugin framework demos will also now have a non-recompilable bitcode segment so that they install easily, but the bitcode attached to the demos can’t be submitted to the App Store. Demos can’t be submitted to the App Store in any case, so this shouldn’t be an issue.

Bitcode is also supported on a best-effort basis, meaning that no warranty is made for what it does when it is recompiled. This is not because I don’t care, but because no one knows a) the minutiae of how it will be recompiled, b) what architectures it will be recompiled for, or c) what the strategic importance of bitcode really is to Apple, and d) it is a fair assumption that there will not be any communication about what is happening if it doesn’t work. Realistically, that is a process Politepix has no control over. What is the advantage of bitcode from the developer perspective? Hoe komt een ezel aan twee lange oren?

This OpenEars Platform update also fixes all verified bugs to date, which is my very favorite kind of update. Other than the ones which add Chinese speech recognition, 哇!

OpenEars 2.5 is free as always. The 2.5 paid plugin framework license upgrades are free if your purchase was made after August 17th, 2014. Upgrades for licenses purchased before August 17th, 2014 are 50% off. The bitcode handling fee is an extra fee for all upgrades if you want to add bitcode to your frameworks. The new Licensee site has rolled out (same URL as before), and on it you will find coupons for your upgrades in your download area, either for a free upgrade or a 50%-off upgrade for each of your licensed plugins. Quel délice!

This is a big update with a lot of moving parts, so if you encounter any surprises or frustrations, no te preocupes, just visit the forums, let me know what’s up, and I will be happy to help.

OpenEars 2.5 can be downloaded here and you can browse the compatible language acoustic models here. Make sure to check out the license for the language to be sure you’re able to use it with your project.

I’m delighted to be able to bring the real potential of localization to offline speech recognition with OpenEars, and I can’t wait to see what you do with it. And to the developers who have been waiting for their language to be compatible, a heartfelt welcome/bienvenue/Willkommen/Welkom/Bienvenido/歡迎!

-Halle

OpenEars 2.04 and compatibility versions of all plugins out now, with no more uppercase requirements

Halle Winkler — Sun, 10 May 2015 11:17:20 +0000

Today I’m happy to announce that OpenEars 2.04 and all plugins are out now. This is primarily a bugfix release to reduce memory overhead in OEPocketsphinxController and RapidEars while listening, and to prevent a very rare crash that could happen when stopping listening with RapidEars while a lattice search is still running. However, there is one significant change which should be a nice improvement for many developers, and I wanted to quickly point it out and explain it so that everyone can start taking advantage of it ASAP.

When I first designed (OE)LanguageModelGenerator years ago, I made the decision to require text input in uppercase letters for best results because it allowed the most optimization for very fast creation of dynamic language models. This didn’t seem like a big trade-off because at the time the size of a language model needed to be quite small in order to perform well during speech recognition on supported devices such as the 2nd-gen iPhone, which meant that for the most part it was command-and-control applications that were being developed with OpenEars. For a command-and-control vocabulary, word case is not such a big consideration in a UI because the words are out of context. Rather than transforming the developer’s text input automatically, I made the decision to support both all-caps and mixed-case input, but to explain in the docs and in the logging output that mixed-case input would be sent to the fallback phoneme lookup technique. That fallback finds fewer available pronunciations, which affects accuracy for words with multiple pronunciations. This felt like the least-bad compromise between the strongly competing concerns of speed, minimizing complexity, and not discarding the developer’s intentional choices.

Over the last couple of years, as the devices and the framework and dependencies have gotten faster, it has become a viable choice with OpenEars to use larger vocabularies, and as a result more app developers have been using it with a broader variety of input sources such as written texts, speeches, etc., which is delightful to see. For that kind of application, the case of the input and output format matters to the developer and the user. The uppercase requirement/advantage no longer supported the goals of the developer or of pleasing UX and needed to be improved, so I revisited this early decision and found a way to do case-insensitive lookup without changing the baseline generation speed, and also improved the generation speed for larger models. That means that you can use normal word and sentence casing in your input text and it will be returned by your speech recognition hypothesis with the same casing intact, and larger text input will generate models faster (this doesn’t affect recognition speed, just how long dynamic model and grammar generation take).
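As a rough illustration of the idea (not the actual OELanguageModelGenerator code, which this post does not show), one common way to get case-insensitive lookup while keeping the developer's casing is to normalize only the lookup key and carry the original word through untouched. The dictionary entries and function names below are assumptions made up for this Python sketch.

```python
# Toy pronunciation dictionary keyed by uppercase words (entries are illustrative).
PRONUNCIATIONS = {
    "READ": ["R IY D", "R EH D"],
    "THE": ["DH AH", "DH IY"],
    "SAND": ["S AE N D"],
}


def lookup(word):
    """Return (word as written, pronunciations) using a case-insensitive key."""
    return word, PRONUNCIATIONS.get(word.upper(), [])


def build_entries(text):
    """Look up every whitespace-separated word, keeping the caller's casing."""
    return [lookup(word) for word in text.split()]


for written, prons in build_entries("Read the Sand"):
    print(written, prons)
```

Because only the key is uppercased, "Read" still finds both of its pronunciations, and the casing that was typed is what would be handed back later.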

Handling of punctuation in input has also improved. When developers don’t do their own text cleaning to remove symbols which are too ambiguous to transcribe and probably not intended to be spoken (for instance, symbols like { or ^ or `), OELanguageModelGenerator will clean the input, and the result will be consistent across all the plugins and the different model/grammar types. Symbols that can’t be transcribed will be removed. Symbols which can be transcribed will usually be transcribed by the best-effort fallback grapheme generator. Symbols which aren’t significant for recognition purposes (such as . or , or ; or ? or !) will be left in place and will become part of your model. You should still take a look at your input when you know it in advance and decide whether it would be better for you to transcribe your symbols into words yourself. This applies especially to numbers, because only you know for sure whether you want 1600 to be transcribed as ‘sixteen-hundred’ or ‘one thousand six hundred’ or ‘a thousand six hundred’ or ‘one six oh oh’.
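The three-way treatment of symbols can be sketched in Python as follows. This is only an approximation under stated assumptions: the symbol tables, the spoken forms, and the function names are hypothetical, and the real rules in OELanguageModelGenerator are not listed in this post. The second half shows doing the number transcription yourself before handing text over.

```python
# Hypothetical symbol classes; the real rules are not given in the post.
UNTRANSCRIBABLE = set("{}^`")
SPOKEN_SYMBOLS = {"&": "and", "%": "percent", "+": "plus"}
# Sentence punctuation such as . , ; ? ! is in neither table, so it stays in place.


def clean_text(text):
    """Drop untranscribable symbols, spell out spoken ones, keep sentence punctuation."""
    pieces = []
    for ch in text:
        if ch in UNTRANSCRIBABLE:
            continue
        if ch in SPOKEN_SYMBOLS:
            pieces.append(" " + SPOKEN_SYMBOLS[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse any doubled spaces introduced by the substitutions.
    return " ".join("".join(pieces).split())


# One number, several defensible readings: the caller has to choose.
READINGS_OF_1600 = {
    "hundreds": "sixteen hundred",
    "full": "one thousand six hundred",
    "digits": "one six oh oh",
}


def spell_out_1600(text, style):
    """Replace the bare token 1600 with the reading the caller picked."""
    return " ".join(
        READINGS_OF_1600[style] if token == "1600" else token
        for token in text.split()
    )


print(clean_text("Salt & pepper {now}, please!"))
print(spell_out_1600("Built in 1600", "digits"))
```

The point of the second function is that no automatic cleaner can know which reading was intended, so transcribing numbers in advance removes the guesswork.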

An example of this last point would be if you used the sentence “The Sand Snakes are with me.” as input. OELanguageModelGenerator will successfully find multiple pronunciations for any word in this sentence that has more than one pronunciation. It will leave the case intact and there will be no accuracy decline from that. The period (full stop) at the end will stay attached to the word “me” in the model, meaning that when OEPocketsphinxController returns a hypothesis matching an utterance of the sentence, the returned text will still have the period attached. If this isn’t the desired result and you don’t want the individual words in this input to carry hints about their position in a sentence or statement, you can still give the text to OELanguageModelGenerator without sentence punctuation, but the assumption now is that if you give sentence punctuation as input, it’s because you intend for it to be returned in a hypothesis. That also means that if you create a language model rather than a grammar, you can sometimes see a word with a period or comma appear in a position other than its input position. That is something to think about when using punctuation and when evaluating whether to use a language model (a statistical model; words can be returned out of order, so a word with a period attached can theoretically appear in the middle of a sentence if someone walks by the user and says it) or a grammar (a ruleset; the order you choose is the order that will be returned).
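The consequences of keeping punctuation attached can be sketched with a toy Python example. It deliberately simplifies: a real statistical language model assigns probabilities rather than a yes/no answer, and a real grammar is a ruleset rather than a single fixed sentence. All function names here are hypothetical and none of this is OpenEars API.

```python
SENTENCE_PUNCTUATION = ".,;?!"


def tokens_with_punctuation(sentence):
    """Split on whitespace only, so 'me.' stays a single token."""
    return sentence.split()


def tokens_without_punctuation(sentence):
    """Strip sentence punctuation first if it should not come back in a hypothesis."""
    stripped = (word.strip(SENTENCE_PUNCTUATION) for word in sentence.split())
    return [word for word in stripped if word]


def grammar_allows(hypothesis, sentence):
    """Grammar-like check: only the exact input order is valid."""
    return hypothesis == tokens_with_punctuation(sentence)


def vocabulary_allows(hypothesis, sentence):
    """Language-model-like check: any sequence of known tokens is possible."""
    vocabulary = set(tokens_with_punctuation(sentence))
    return all(token in vocabulary for token in hypothesis)


sentence = "The Sand Snakes are with me."
out_of_order = ["me.", "The", "Sand", "Snakes"]
print(tokens_with_punctuation(sentence))
print(grammar_allows(out_of_order, sentence))
print(vocabulary_allows(out_of_order, sentence))
```

The out-of-order hypothesis is rejected by the grammar-like check but remains possible under the vocabulary-based one, which is how a token such as "me." could surface mid-sentence.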

The decision tree I use for these punctuation transformations and non-transformations is basically a simplified non-interactive version of my interactive text-cleaning tool TheKnownUnknowns, so please feel free to take a look at TheKnownUnknowns alongside OELanguageModelGenerator for more info about considerations with different symbols. Please also feel free to use TheKnownUnknowns for preparing texts for OpenEars where you’d like to make your own decisions in advance about how to transcribe difficult cases. It is primarily designed to quickly clean text corpora before creating an acoustic model using long alignment, and for similar tasks on large texts that have to be prepared for some kind of transcription-related norm, but it is also a good tool for interactively cleaning text you want to use with OpenEars in advance, since the two have their design and major assumptions in common.

Although this is not directly a recognition accuracy change, my sense is that there was a cluster of minor accuracy-related symptoms in some apps related to non-transcribable symbols entering the generator, mixed-case being used without realizing it affected how many pronunciations were found, and the possibility that unknown transcribable or ignorable symbols were being handled differently by the language model/grammar lookup than by the phonetic dictionary lookup, which theoretically could result in never-matching words. Projects that were experiencing any of these issues should see an improvement to accuracy from this change.
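The "never-matching words" case is easiest to see with two lookups that normalize the same input differently. The Python sketch below is a hypothetical reconstruction of that kind of mismatch, not a description of how any OpenEars version behaved internally. Both normalizer functions are invented for illustration.

```python
def model_token(word):
    """Hypothetical language-model side: uppercases but keeps every symbol."""
    return word.upper()


def dictionary_key(word):
    """Hypothetical dictionary side: uppercases and drops non-alphanumerics."""
    return "".join(ch for ch in word.upper() if ch.isalnum())


def unmatched_words(words):
    """Words whose model token has no identically spelled dictionary key."""
    return [word for word in words if model_token(word) != dictionary_key(word)]


print(unmatched_words(["rock&roll", "snakes", "me."]))
```

Any word listed by `unmatched_words` would exist in the model under one spelling and in the dictionary under another, so it could never be recognized. Running one shared cleaning step before both lookups avoids that.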

As always, OpenEars can be downloaded here and the new plugins can either be downloaded at your demo link or your licensed framework link. I hope this little improvement helps you make great apps!

OpenEars 1.3.0 out now with Pocketsphinx and Sphinxbase .8

Halle Winkler — Fri, 14 Jun 2013 09:13:07 +0000

Just a quick note: after months of testing the OpenEars 1.3.0 preview, which includes Pocketsphinx and Sphinxbase .8, with no issues coming up with those new libraries, OpenEars 1.3.0 is now the release version of OpenEars. It has been released in a non-preview version with a couple of bugfixes. As always, ask questions and let me know how it’s working for you in the forums.
