This is Craig Thrall's Typepad Profile.
Craig Thrall
Recent Activity
There are three different problems here, with very different error rates: recognizing anybody's structured speech, recognizing one person's unstructured speech, and recognizing anybody's unstructured speech.

If you can define a structure around your interactions with the recognizer (http://en.wikipedia.org/wiki/VoiceXML), you can cut down the list of possible matches and substantially increase the confidence measure of the transcription. We're using this approach here: http://www.fidelus.com/locator2.html This works today, and works well. We've done demos on a speakerphone at a noisy tradeshow booth with very few issues.

You are exactly right that recognizing anybody's *unstructured* speech is a very hard problem to solve. That's why commercial services still use people to do transcriptions when the confidence measure falls below a certain threshold. Over time, these services learn the voices of the people who call you frequently. I don't think Google uses human transcribers, which is why GVoice transcriptions are hilarious.
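To make the two ideas above concrete, here is a minimal Python sketch of how restricting the candidate list can raise confidence, and how a threshold can decide when to hand a transcription to a person. Everything in it is an assumption for illustration: the `Hypothesis` type, the scores, the `constrain` and `route` helpers, and the 0.8 threshold are hypothetical and do not describe VoiceXML, the Fidelus locator, or any commercial service.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    score: float  # hypothetical non-negative recognizer score


def normalize(hypotheses):
    """Rescale scores so they sum to 1 and can be read as confidences."""
    total = sum(h.score for h in hypotheses)
    if total == 0:
        return []
    return [Hypothesis(h.text, h.score / total) for h in hypotheses]


def constrain(hypotheses, grammar):
    """Drop hypotheses the grammar does not allow, then renormalize.

    `grammar` is a plain set of allowed phrases here; a real structured
    dialog would define it per prompt.
    """
    return normalize([h for h in hypotheses if h.text in grammar])


def route(hypotheses, threshold=0.8):
    """Accept the best hypothesis automatically, or send it to a human."""
    if not hypotheses:
        return ("human", None)
    best = max(hypotheses, key=lambda h: h.score)
    if best.score >= threshold:
        return ("auto", best.text)
    return ("human", best.text)


# Made-up candidates for a caller saying a city name.
raw = [
    Hypothesis("boston", 0.4),
    Hypothesis("austin", 0.3),
    Hypothesis("bossed in", 0.2),
    Hypothesis("lost in", 0.1),
]

unstructured = route(normalize(raw))
structured = route(constrain(raw, {"boston", "denver"}))
```

With no structure, the best candidate competes against every other phrase and stays under the threshold, so it is routed to a person. Once the prompt only allows a short list of answers, the same audio yields a single surviving candidate and can be accepted automatically.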
Craig Thrall is now following The Typepad Team
Jun 21, 2010