From time to time we receive questions from the users of tazti. This blog post is the first in an occasional series that addresses user questions.
Today’s question is whether it’s possible to configure tazti to improve its speech recognition, particularly in noisy environments. How a speech recognition program copes with noise has a big effect on how well it works for you. Some noise is almost always present, but both the type of noise and its volume matter. The noise could be traffic outside, the ambient sound of a cafe, a busy workplace, a TV playing in the same room, or your younger brother playing in the same room. Most speech recognition software has more trouble when the background noise is a human voice, but, surprisingly, even non-speech sounds can be misinterpreted as words.
How well a speech recognition program responds to ambient noise also depends on the type of speech recognition engine it uses. In general, there are two types of speech recognition engines: speaker dependent and speaker independent. Speaker dependent engines are designed to listen for a specific person’s voice. This tuning to a particular voice is usually accomplished through what is called training. Training usually consists of the main user of the program reading one or more scripts or written excerpts while the speech engine listens. The speech engine then examines the recorded data of the user speaking and uses it to build a model of the speaker’s voice. Afterwards, it compares sound picked up by the microphone against the user’s voice model to determine whether it should try to recognize that sound and, if so, what is being said.
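To make the "compare against a voice model" idea concrete, here is a deliberately tiny Python sketch. It is not how tazti or any real speech engine is implemented. It assumes each audio clip has already been reduced to a short list of numbers (the feature vectors below are made up), treats the "voice model" as the average of the training vectors, and uses cosine similarity with an arbitrary threshold of 0.9. Real engines use far richer acoustic features and statistical models; all function names here are hypothetical.

```python
import math

def mean_vector(vectors):
    """Average equal-length feature vectors into a single profile vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train_voice_model(training_features):
    """'Training': summarize the user's recorded samples as one average vector."""
    return mean_vector(training_features)

def matches_speaker(voice_model, features, threshold=0.9):
    """Decide whether incoming sound is close enough to the trained voice.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine_similarity(voice_model, features) >= threshold

# Made-up feature vectors standing in for the user's training recordings.
training = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [1.0, 0.1, 0.2]]
model = train_voice_model(training)

same_speaker = matches_speaker(model, [0.85, 0.15, 0.3])  # resembles the training data
background = matches_speaker(model, [0.1, 0.9, 0.7])      # very different sound

print("trained user accepted:", same_speaker)
print("background sound accepted:", background)
```

In this toy version, sound that does not resemble the trained voice is simply ignored, which is the intuition behind why a speaker dependent engine can shrug off background chatter.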
There’s a great benefit to using a speaker dependent speech engine. These engines often have better accuracy because they are highly tuned to the voice of the person who performed the training. They are also better able to discriminate between background noise, including human speech in the background, and the voice of the person who trained them.
The drawback of speaker dependent speech engines is that if more than one person uses the device, each person has to train the engine separately, and you need to manually switch between training profiles when a different person starts to speak. This situation might come up when transcribing a dialog, an interview, or a meeting with a group of people. This is where a speaker independent speech engine comes in handy. It will recognize speech from any source, not just the voice it was trained on, and you don’t need to switch back and forth between models.
There are other fundamental differences under the hood between the two types of speech recognition engines, but for our purposes here, what is important to understand is that tazti is a speaker independent engine, so it will do a pretty good job for anyone without requiring training or switching between profiles. The flip side is that it might be more likely to (accurately) pick up the TV conversation in the background or the chatter in the cafe you are working in (but really, should you be talking out loud to your computer in a cafe in the first place, we must ask?). Because tazti is speaker independent, we have not included any configuration tools or methods to tweak or tune the speech recognition. What we can recommend to get the very best recognition out of tazti with minimal errors is the following: (1) try to use tazti in the quietest place you can, particularly one isolated from human speech (this includes TV, radio, video games, cafe conversation, parties, etc.), and (2) try to get a good quality, noise canceling microphone. We can’t recommend a particular brand or model, but you would be surprised by how much a good quality headset mic will improve things.
Thanks for reading and keep on speaking (to your computer, that is!)