Key Words from Reading
Tagged: German Speech Recognition
- This topic has 19 replies, 2 voices, and was last updated 7 years, 1 month ago by Halle Winkler.
March 6, 2017 at 10:12 am #1031643 matukh (Participant)
Hello, OpenEars team.
I need to recognize some key words in German speech. I used OpenEars with Rejecto, but during recognition, other words from the dictionary were detected even though they did not occur in the text. I changed the weight and vadThreshold values but didn't get the results I needed. How can I improve recognition?
March 6, 2017 at 10:36 am #1031644 Halle Winkler (Politepix)
Welcome,
The different acoustic models have very different accuracy and performance levels, so a lot of variance should be expected (this is discussed on the page about other languages), but we can investigate this a bit to see if there are any underlying causes that can be addressed other than the model itself. How are you obtaining the German speech that you are testing?
March 6, 2017 at 10:40 am #1031645 matukh (Participant)
Hi, Halle. Thanks for the fast response. We are testing in 2 different ways: using translator dictation and having people read the text. But the results were not good. Sometimes words were detected from a simple sigh.
March 6, 2017 at 11:01 am #1031646 Halle Winkler (Politepix)
Can you clarify more about what translator dictation is?
March 6, 2017 at 11:07 am #1031647 matukh (Participant)
We put the text into Yandex Translate and press Play. For example, we use the text “Was für ein seltsames Geräusch?” (“What a strange noise?”; the key word is “Geräusch”): https://translate.yandex.ua/?lang=de-uk&text=Was%20für%20ein%20seltsames%20Geräusch%3F . Here is also a screenshot: http://prntscr.com/eglx5o
March 6, 2017 at 11:13 am #1031648 Halle Winkler (Politepix)
OK, and what are the regional German accents of the people reading the text you mentioned as the second approach? i.e. what parts of Germany did they grow up in?
March 6, 2017 at 11:20 am #1031649 matukh (Participant)
It is „Hochdeutsch“, the „official“ German. Everyone who tests it tries to speak as „pure“ a German as possible, since it is clear that the framework will have problems with accents and dialects. Thanks, Halle
March 6, 2017 at 11:23 am #1031650 Halle Winkler (Politepix)
OK, so when you test with people you are only ever testing using native German speakers, is that an accurate statement?
March 6, 2017 at 11:25 am #1031651 matukh (Participant)
Yes, that is correct.
March 6, 2017 at 11:40 am #1031652 Halle Winkler (Politepix)
Can you tell me a little bit about the process of evaluating the results? i.e., how do you hear about the recognition rate and results from the native speakers who are testing for you, and how do you reproduce results you’ve heard about? The reason that I ask is that usually we don’t have an office full of native speakers of other languages we can just observe directly as we could for our own native language, so the method for gathering a large cross-section of native speaker data for another language is the underlying condition that makes it possible to improve recognition by adjusting the speech interface or framework settings.
March 6, 2017 at 1:17 pm #1031653 matukh (Participant)
OK. Our testers read text that contains a key word. We receive the recognition response quickly, in about 2-5 seconds; sometimes it can take up to 10 seconds. The response arrives after a short pause. The response can contain words from our dictionary even though those words do not occur in the text. Sometimes, while recognition is in progress, we receive a response with the word “Ruhe” without any speech or noise. Recognizing the word “Geräusch” seems almost impossible: we only receive that word after saying it more than 10 times.
When initializing the language model generator, we set usingVowelsOnly = YES and weight = @(1.5). We tested with other parameters, but we got worse results.
Did I answer your question?
March 6, 2017 at 1:48 pm #1031654 Halle Winkler (Politepix)
No, my question is more about how the testing is happening – I don’t expect that you have a group of native German-speaker testers that you can directly observe in your office as they use the app in the same way you could do so for a local language, so how are you observing their testing and receiving the results?
March 6, 2017 at 2:00 pm #1031655 matukh (Participant)
We send the app to testers, and they see all recognized words in a textView. For some key words we play audio. After testing, they write to us about the recognition process and its quality. After that, we make some changes and send a new version. We have done this many times and our team took the best result of our testing, but that result does not meet our needs. Did I describe our testing process correctly?
March 6, 2017 at 2:29 pm #1031656 Halle Winkler (Politepix)
Yes, thank you for clarifying. It’s quite important to get good testing data before you start trying to fix issues by altering settings, because changing them on the basis of bad data will result in worse results for the average user, and in a situation where it isn’t possible to get help (for instance from me) due to issues with subjective data collection, such as having too few reports, non-replicable reports, or reports in which other occurrences that you don’t know about affected recognition (for instance noise or distance). So let’s talk a little bit about how to set up tests for languages that aren’t being tested firsthand in the office.
The first thing to keep in mind is that you can’t use any synthesized speech (like Yandex) because it doesn’t have enough data, so that will only confuse your troubleshooting process.
The second thing is that when you test with humans, you shouldn’t rely on subjective reports of interactions when troubleshooting, even in your own language but especially with a language you aren’t testing natively in-house, at least until you have a high level of confidence in what is happening, because you have no way of seeing the environmental situation or of replicating the results. In that case, I am actually a third party removed from the original subjective report, so I can’t help effectively (assuming it isn’t just a limitation of the acoustic model but something in the framework that I can help with).
It’s possible for you to obtain complete recordings of the user speech and then to feed them into OpenEars in test mode, so you can replicate the user’s experience. This post is about giving me replicable cases, but it also explains how to use the SaveThatWave demo in order to obtain audio and then use pathToTestFile so you can observe the results yourself: https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/
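Once recordings are being replayed through pathToTestFile, the results can be tallied mechanically instead of being taken from testers’ impressions. The following is a minimal, hypothetical offline helper: it assumes you have logged, for each recording, the script the tester read and the hypothesis returned when that file was replayed. The names `Trial` and `score_trials` and the record layout are illustrative assumptions, not part of OpenEars or Rejecto.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One replayed recording (hypothetical log format; fill it from your own logs)."""
    recording: str   # e.g. the WAV file name captured with SaveThatWave
    spoken: str      # the script text the tester read aloud ("" for non-speech clips)
    hypothesis: str  # the hypothesis returned when replaying the file


def score_trials(trials, keywords):
    """Count keyword hits, misses and false accepts across replayed trials.

    hit:          keyword in the script and in the hypothesis
    miss:         keyword in the script but not in the hypothesis
    false accept: keyword in the hypothesis that was never spoken
    """
    keys = {k.lower() for k in keywords}
    totals = {"hits": 0, "misses": 0, "false_accepts": 0}
    for t in trials:
        # Normalize case and strip trailing punctuation from the script words.
        spoken = {w.strip("?!.,").lower() for w in t.spoken.split()} & keys
        heard = {w.lower() for w in t.hypothesis.split()} & keys
        totals["hits"] += len(spoken & heard)
        totals["misses"] += len(spoken - heard)
        totals["false_accepts"] += len(heard - spoken)
    return totals


if __name__ == "__main__":
    # Illustrative entries only, not real test results.
    trials = [
        Trial("take1.wav", "Was für ein seltsames Geräusch?", "GERÄUSCH"),
        Trial("take2.wav", "Was für ein seltsames Geräusch?", "RUHE"),
        Trial("sigh.wav", "", "RUHE"),
    ]
    print(score_trials(trials, ["Geräusch", "Ruhe"]))
    # prints {'hits': 1, 'misses': 1, 'false_accepts': 2}
```

Because every number comes from a file that can be replayed, a surprising tally can be reproduced and shared as part of a replication case.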
For your own app, of course, it isn’t necessary to put it all inside of the sample app’s code (that’s just if you want to show it to me for help), but it should get you started with setting up replicable testing for your own app.
It’s important not to turn on Rejecto until you are very confident that vadThreshold is correct for the acoustic model (this would usually mean that a sigh is not processed as speech). You may need to test this yourself; it isn’t really necessary to be a native speaker in order to make sure that the vadThreshold is rejecting as much non-speech as possible. It does sound like vadThreshold should be higher in your case.
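One way to make “a sigh is not processed as speech” measurable is to replay a small set of labelled clips (some containing only sighs, breathing or room noise, some containing real speech) at several candidate vadThreshold values and note, for each run, whether any hypothesis came back. The hypothetical helper below then picks the lowest candidate that rejects every non-speech clip while still responding to every speech clip. The function name, the results layout, and the threshold values in the example are assumptions for illustration, not recommended settings.

```python
def pick_vad_threshold(results):
    """Return the lowest threshold that rejects all non-speech clips while
    still producing a hypothesis for every speech clip, or None if none does.

    `results` maps a candidate vadThreshold to a list of
    (clip_is_speech, got_hypothesis) pairs, gathered by replaying each
    labelled clip at that threshold.
    """
    for threshold in sorted(results):
        runs = results[threshold]
        noise_rejected = all(not got for is_speech, got in runs if not is_speech)
        speech_kept = all(got for is_speech, got in runs if is_speech)
        if noise_rejected and speech_kept:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical observations: (clip is real speech?, hypothesis returned?)
    observed = {
        2.0: [(False, True), (False, True), (True, True)],     # sighs still recognized
        3.0: [(False, False), (False, True), (True, True)],
        3.5: [(False, False), (False, False), (True, True)],
        4.5: [(False, False), (False, False), (True, False)],  # speech lost as well
    }
    print(pick_vad_threshold(observed))  # prints 3.5
```

Checking the speech clips as well as the noise clips matters: a threshold high enough to ignore every sigh is not useful if it also starts ignoring the key word.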
Once you have confidence in vadThreshold, you can obtain recorded speech as described in the linked post, and start to tune your Rejecto settings (starting from the default settings). If you continue to get unexpected results, you can give me a full replication case as described in the linked post so I can look into whether it’s a settings issue.
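When tuning Rejecto from its defaults, it helps to replay the same recordings at each candidate weight and compare counts of hits, misses and false accepts, rather than comparing impressions. The sketch below is a hypothetical decision rule, not Rejecto’s own logic: candidates whose hit rate drops below an assumed floor (`min_hit_rate`, chosen arbitrarily here) are discarded, the fewest false accepts wins among the rest, and ties go to the weight you started from. Pass whatever weight you began with as `default`; the counts in the example are made up for illustration.

```python
def choose_setting(tallies, default, min_hit_rate=0.8):
    """Pick a Rejecto weight from replay tallies, preferring the starting weight.

    `tallies` maps a candidate weight to a dict with "hits", "misses" and
    "false_accepts" counts from replaying the same recordings at that weight.
    """
    def hit_rate(t):
        spoken = t["hits"] + t["misses"]
        return t["hits"] / spoken if spoken else 0.0

    usable = [w for w, t in tallies.items() if hit_rate(t) >= min_hit_rate]
    if not usable:
        return default  # nothing qualified: keep the starting weight
    # Sort key: fewest false accepts first; on a tie, False (== default) sorts first.
    return min(usable, key=lambda w: (tallies[w]["false_accepts"], w != default))


if __name__ == "__main__":
    # Hypothetical tallies; 1.0 stands in for the weight you started from.
    tallies = {
        1.0: {"hits": 18, "misses": 2, "false_accepts": 6},
        1.5: {"hits": 17, "misses": 3, "false_accepts": 3},
        2.0: {"hits": 12, "misses": 8, "false_accepts": 1},  # hit rate 0.6, discarded
    }
    print(choose_setting(tallies, default=1.0))  # prints 1.5
```

Preferring the starting weight on ties keeps you from drifting away from the defaults on the strength of noise in a small test set.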
March 6, 2017 at 2:35 pm #1031657 matukh (Participant)
OK. Thanks for the great answer. I will try your advice.
Best regards, Vladimir
March 6, 2017 at 2:54 pm #1031658 Halle Winkler (Politepix)
Hi Vladimir,
You’re welcome, and good luck with your investigations!
Best regards,
Halle
March 7, 2017 at 3:55 pm #1031662 matukh (Participant)
Hi, Halle.
I changed the settings following your advice, but I continue to get unexpected results. Can you help me more? Also, you wrote: “Once you have confidence in vadThreshold, you can obtain recorded speech as described in the linked post, and start to tune your Rejecto settings (starting from the default settings). If you continue to get unexpected results, you can give me a full replication case as described in the linked post so I can look into whether it’s a settings issue.” Where can I find the post about replication?
Best regards, Vladimir
March 7, 2017 at 8:15 pm #1031663 Halle Winkler (Politepix)
Hi Vladimir,
It’s the link I gave you above:
https://www.politepix.com/forums/topic/how-to-create-a-minimal-case-for-replication/
I can’t guarantee it is something I can assist with, but I can take a look as long as you follow the instructions in that post very carefully.
March 9, 2017 at 8:18 am #1031666 matukh (Participant)
Hi, Halle.
I created a replication case. Here is a shared link to it: https://www.dropbox.com/s/9ipr58wv8azx92u/OpenEarsDistribution.zip?dl=0
I hope it was created correctly and that we can solve this problem. Thanks for your help.
Best regards,
Vladimir
March 9, 2017 at 9:15 am #1031667 Halle Winkler (Politepix)
Hi Vladimir,
The primary recognition issue is going to be testing with a non-native German speaker. I will close this up since the issue is pretty straightforward, but if you want to follow my advice above later on regarding how to test, it’s fine to open a new topic if you continue to have unexpected results, thanks.
- The topic ‘Key Words from Reading’ is closed to new replies.