This is Craig Thrall's TypePad Profile.
Craig Thrall
Recent Activity
There are three different problems with very different error rates: recognizing anybody's structured speech, recognizing one person's unstructured speech, and recognizing anybody's unstructured speech.
If you can define a structure around your interactions with the recognizer (http://en.wikipedia.org/wiki/VoiceXML), you can narrow the list of possible matches and greatly increase the confidence measure of the transcription. We're using this approach here: http://www.fidelus.com/locator2.html
This works today, and it works well. We've done demos over a speakerphone at a noisy trade-show booth with very few issues.
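Here is a minimal sketch of why structure helps. It assumes the recognizer returns an n-best list of (text, log-score) hypotheses and models the grammar as a plain set of allowed phrases, a simplified stand-in for a real VoiceXML grammar. The function names and sample scores are hypothetical. Dropping hypotheses the grammar rejects and renormalizing over the survivors is what raises the confidence of the top match.

```python
import math

def posteriors(hypotheses):
    """Turn (text, log_score) pairs into probabilities that sum to 1 (softmax)."""
    peak = max(score for _, score in hypotheses)  # subtract max for numerical stability
    weights = {text: math.exp(score - peak) for text, score in hypotheses}
    total = sum(weights.values())
    return {text: w / total for text, w in weights.items()}

def best(hypotheses):
    """Return the most likely hypothesis and its normalized confidence."""
    probs = posteriors(hypotheses)
    top = max(probs, key=probs.get)
    return top, probs[top]

def constrained_best(hypotheses, grammar):
    """Keep only hypotheses the grammar allows, then pick the best among them."""
    allowed = [(text, score) for text, score in hypotheses if text in grammar]
    if not allowed:
        return None, 0.0  # nothing in-grammar: the recognizer should reprompt
    return best(allowed)

# Hypothetical n-best list from a noisy call, with made-up log scores.
N_BEST = [
    ("call john smith", -1.0),
    ("call jon smyth", -1.2),
    ("all john's myth", -1.1),
    ("tall john smith", -1.5),
    ("call jane doe", -3.0),
]
# Hypothetical grammar: the only utterances this dialog accepts.
GRAMMAR = {"call john smith", "call jane doe"}

if __name__ == "__main__":
    print(best(N_BEST))                     # top guess, confidence about 0.29
    print(constrained_best(N_BEST, GRAMMAR))  # same guess, confidence about 0.88
```

Both calls pick "call john smith", but the unconstrained confidence is spread across sound-alike phrases, while the grammar leaves only two candidates to compete.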
You are exactly right that recognizing anybody's *unstructured* speech is a very hard problem. That's why commercial services still use human transcribers when the confidence measure falls below a certain threshold. Over time, these services learn the voices of the people who call you frequently.
I don't think Google uses human transcribers, which is why GVoice transcriptions are hilarious.
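The human-fallback approach boils down to a confidence threshold. A minimal sketch follows, assuming the recognizer reports a confidence in [0, 1]; the `Transcript` type, the `route` function, and the 0.85 cutoff are all hypothetical, and a real service would tune the threshold (and possibly adapt it per caller).

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    caller: str
    text: str
    confidence: float  # recognizer's confidence in [0, 1]

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cutoff, not taken from any real service

def route(t: Transcript, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Deliver transcripts the machine is sure of; queue the rest for a person."""
    return "automatic" if t.confidence >= threshold else "human_review"

if __name__ == "__main__":
    inbox = [
        Transcript("alice", "call me back after lunch", 0.93),
        Transcript("bob", "uh meet at the thing by the place", 0.41),
    ]
    for msg in inbox:
        print(msg.caller, route(msg))  # alice automatic, then bob human_review
```

A service with no human review step would deliver every transcript regardless of confidence, which is how low-confidence guesses reach the user unfiltered.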
Whatever Happened to Voice Recognition?
Remember that scene in Star Trek IV where Scotty tried to use a Mac Plus? Using a mouse or keyboard to control a computer? Don't be silly. In the future, clearly there's only one way computers will be controlled: by speaking to them. There's only one teeny-tiny problem with this magical fu...
Craig Thrall is now following The Typepad Team
Jun 21, 2010
