This is Jonathanstray's Typepad Profile.
Recent Activity
Ah. Perhaps I now see the difference between our premises. My assumption is that question answering by smart humans is going to be so massively amplified by algorithmic question answering systems -- really they're just very clever search engines -- that this is where we want to focus our investment right now, if we want the fastest possible increase in the general quality, and the fastest possible decrease in the general cost, of getting a question answered for a random member of the majority of humanity. Or, let me put a question to you instead: how do you foresee this massive new quantity of structured data being used during the question answering process in the future?
I interpret Watson in a different way: it doesn't matter what it cost. The point is that they demonstrated and advanced the state of the art in extracting useful knowledge from all available data sources, structured and unstructured. And when you read the technical papers, structured data wasn't at the heart of that success. General ontologies and DBpedia tables worked well for certain closed domains (say, presidents, countries, species, and basic constraint relations like country-isn't-a-person) but weren't the main knowledge store or type inference engine. The main knowledge store was unstructured text from a huge variety of places (Wikipedia, newswire archives, web crawls, etc.) plus an open type inference system based on statistical patterns of word use (the algorithm is called PRISMATIC). Yeah, structured data is great. But what do we want to use it for? Answering someone's question, right? At the moment, the best open-domain question answering techniques are in Watson, which improved state-of-the-art accuracy from about 30% to >80% in half a decade. As for the hardware required: every time you do a Google search you use that much computing power. If I were going to base a startup on answering people's questions, I would be stockpiling unstructured data -- and smart humans, as you so rightly point out.
You make a very good argument, and I agree with it in the sense that "answer" is a good atomic unit (stories will continue to be popular, but you know that). But I wonder how structured our structured data really has to be before it's useful. Consider how well IBM's Watson does at answering questions from a huge database of unstructured text. I don't think we yet know enough about knowledge representation to get ambitious about representing the world as metadata.
Jonathanstray is now following The Typepad Team
Jun 6, 2011