TypePad Profile
Scientist at Microsoft's MSN; co-creator of BlogPulse. I blog at http://datamining.typepad.com/data_mining.
Interests: strategy, computational linguistics, weblogs, artificial intelligence, data/text mining, gis, social media.
Recent Activity
Working in Bing Local Search brings together a number of interesting challenges. Firstly, we are in a moderately sized organization, which means that our org chart has some rough similarities to our high level system architecture. This means that we... Continue reading
The web search community, in recent months and years, has heard quite a bit about the 'knowledge graph'. The basic concept is reasonably straightforward - instead of a graph of pages, we propose a graph of knowledge where the nodes... Continue reading
The early days of web search were essentially about observation. The web search engine observed the web (documents, links and user behaviours) and then delivered results based on those observations. In recent years we have started to see more of... Continue reading
Having recently returned from a trip to Kauai where I used my beach search engine with middling success, I've now got a few updates out on the site. Firstly, there is a full map showing either all the beaches in... Continue reading
At the beginning of the year, Google's results page for {restaurants in seattle} looked like this: Note: the size of the map, the presence and colouration of the main column ad block and the use of the restaurant's web page... Continue reading
When I pushed out the first version of Beach G33k / Beach Ball (I'm not actually sure what the thing is called) - a search engine for beaches - I did as close to nothing in terms of features as... Continue reading
Viliam - Thanks for the comments. I'm deliberately putting this out there with plenty of issues. In the online software world, one can wait for a product to be perfect and never release it, or release early and often and get feedback along the way. The latter is what I hope to do with this system. Regarding lemmatisation, I now have this working on my dev system and will push it out this week. Hopefully you can take another look then. Some interesting issues arise. For example, while it is fine to stem 'snorkeling' to 'snorkel', it is less clear whether you want to return hits on 'surf' when searching for 'surfing'.
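To make the stemming trade-off in this comment concrete, here is a minimal sketch. It is not the code behind the beach search engine: the suffix-stripping rule, the `PROTECTED` set and the `stem` and `matches` helpers are all hypothetical names chosen for illustration. The idea is simply that some terms are stemmed freely while others are kept whole, so that a query for 'surfing' does not match every mention of 'surf'.

```python
# Hypothetical illustration only: a naive "-ing" stripper with an
# exception list. A real system would more likely use a proper
# lemmatiser or stemmer; every name here is an assumption.

# Terms we assume should NOT be reduced, because the stem means
# something different to the searcher ('surf' vs. 'surfing').
PROTECTED = {"surfing"}


def stem(term, protected=PROTECTED):
    """Lower-case a term and strip a trailing 'ing' unless it is protected."""
    term = term.lower()
    if term in protected:
        return term
    # Only strip from longer words so that 'sing' or 'ring' are left alone.
    if term.endswith("ing") and len(term) > 5:
        return term[:-3]
    return term


def matches(query, doc_terms):
    """Return True if any document term has the same stem as the query."""
    q = stem(query)
    return any(stem(t) == q for t in doc_terms)


if __name__ == "__main__":
    print(matches("snorkeling", ["snorkel", "reef"]))  # stems agree
    print(matches("surfing", ["surf", "break"]))       # protected term
```

With this sketch, a query for 'snorkeling' matches a beach description that mentions 'snorkel', while a query for 'surfing' only matches descriptions that actually say 'surfing'. Whether 'surfing' belongs on the protected list is exactly the judgment call described above.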
I've written recently about building a perfect beach search engine. Here is a brief example of using the site. Let's imagine you want to find a beach that offers snorkeling, but you want to find one that is shallow because... Continue reading
There are plenty of techno pundits out there who are on a mission and rarely dip in to data to determine if reality is or is not cooperating with them. From the outside, one has to be somewhat creative to... Continue reading
Currently, as I've mentioned in previous posts, beaches are a strangely under-served segment of the local search space. Searches on Google and Bing for beaches are fielded by entities such as resorts and restaurants that happen to be matches for... Continue reading
[I work at Microsoft where I work on projects that drive data quality in our local search experiences on Bing and other clients.] Most of the civilized world, by this time, has heard about Apple's fumble with their new mapping... Continue reading
This week, Apple got a rude awakening with its initial foray into the world of local search and mapping. The media and user backlash to their iOS upgrade which removes Google as the maps and local search partner and replaces... Continue reading
We will soon be embarking on a short trip to Hawai'i. Naturally, I'm turning to search engines to find out about the best beaches to go to. However, it turns out that this simple problem - where to go on... Continue reading
I very much like Google's visual presentation of certain answers to local search queries that involve regions of space. 'Pittsburgh' '98115' 'Fife' Continue reading
I've been tracking Google's local search experience for a while (I work in Bing's competing local search product). Since February 2012, I've noticed only three variants or changes to the way in which local search results are presented on the... Continue reading
I'm late to this, but it is certainly worth posting. A team of researchers at CMU have been working on mining foursquare checkin data to determine behaviourally defined neighborhoods ('livehoods'). They have put together a site - livehoods.org - which... Continue reading
In a previous post, I pointed out that while many of us use laptops, desktops and tablets with a widescreen form factor display, many websites fail to leverage much of the space. Some of the commenters indicated that they thought... Continue reading
@Marc Machielse - using more screen doesn't mean having longer lines spanning a wider column. Think of the evolution of television - widescreen was adopted there and, amazingly, textual information and other non-video data have been adapted to that presentation format (think of news programmes). I'm a big fan of negative space (what designers call the empty spaces that form part of a design), but I would also like to know what *opportunities* are seen in these wider screens. For example, I find all the advertising that clutters up the flow of reading to be horrible from a design perspective. It is, in fact, designed to make for poor reading. Why not explore how the additional horizontal space could be used instead?
We've all been using widescreen desktops and laptops for a while. When will the web catch up? Urbanspoon: BBC Bing Google Plus Continue reading
Recently, there have been a number of announcements regarding the redesign of Bing's main search experience. The key difference is the use of three parallel zones in the SERP. Along with the traditional page results area, there are two new... Continue reading
I've been reading the coverage on the new Microsoft tablet launch - the very cool looking Surface. Over on track // microsoft, the twitter buzz seems pretty hot. I note that the main cluster around the launch has perhaps the... Continue reading
1. Be Clear: Is Your Problem Really A Big Data Problem? There are many big data problems out there requiring huge compute scale, innovations in computation paradigms, vast storage space and so on. But just because your data takes up... Continue reading
[The idea behind 'zero tolerance search' posts is to illustrate real life search interactions that show how far we have to go in leveraging the explicit and implicit data in the web and elsewhere.] Yesterday, I heard part of an... Continue reading
track // microsoft (and games and movies) now includes a simple graph indicating the attention being given to each cluster of posts. This graph shows the total of tweets per hour for all posts in the cluster. Below is an... Continue reading