20-minute review of the latest lookup engines

This post is the result of reviewing lookup engines for at most 20 minutes, so take it with a large pinch of salt.

Both Wolfram|Alpha and Google Squared are great attempts at compiling “data” and doing something useful with it. Wolfram leaves the better impression because of its focused-answer approach: you do not get pages and pages of links to sift through.

Wolfram: It faced the challenge of building a massive crawled index, as the others have, to answer search queries, so it went for specialized datasets that allow certain kinds of computation on them. The process: crunch the data, hand-categorize it, and create charts/visuals where possible. On top of that, metamodel the “known stuff”; being in a specialized field, they have a whole lot of algorithms/formulae to back it up. So they cover domains like science, weather and geography, where a lot of data exists today. Humans will definitely be required to massage/clean the data so that the right “inference” can take place later on in the engine. This is apparent from http://www63.wolframalpha.com/participate/participate.html

I was really skeptical of their “computational” claim, but a scoped query shows that computation might be happening. Try a query like “mars” and look at the answers; now try “distance between mars and Jupiter”, and it actually computes it. It can do this for known entities with allowed operations. (Bing and Google try to point to Wikipedia; G^2 does not understand it at all.)
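To make “known entities with allowed operations” concrete, here is a minimal sketch. It is an assumption, not Wolfram|Alpha's implementation: a tiny entity table and a whitelist of operations. The orbital radii are approximate semi-major axes in AU, and the closest/farthest figures assume circular, coplanar orbits. The names `ENTITIES`, `OPERATIONS` and `answer` are hypothetical.

```python
# Minimal sketch (assumption): known entities + a whitelist of allowed operations.
# Not Wolfram|Alpha's actual implementation. Values are approximate mean orbital
# radii (semi-major axes) in astronomical units.
AU_KM = 149_597_870.7  # kilometres per astronomical unit

ENTITIES = {
    "mars": {"type": "planet", "semi_major_axis_au": 1.524},
    "jupiter": {"type": "planet", "semi_major_axis_au": 5.203},
}


def distance_range_km(a: str, b: str) -> tuple[float, float]:
    """Rough closest/farthest separation, assuming circular, coplanar orbits."""
    ra = ENTITIES[a]["semi_major_axis_au"]
    rb = ENTITIES[b]["semi_major_axis_au"]
    return abs(ra - rb) * AU_KM, (ra + rb) * AU_KM


# Only operations registered here can be "computed"; anything else is refused.
OPERATIONS = {"distance between": distance_range_km}


def answer(query: str):
    q = query.lower()
    for phrase, op in OPERATIONS.items():
        if q.startswith(phrase):
            # "distance between mars and jupiter" -> ["mars", "jupiter"]
            names = q[len(phrase):].replace(" and ", " ").split()
            if len(names) == 2 and all(n in ENTITIES for n in names):
                return op(*names)
    return None  # unknown entity or operation: no computed answer


closest, farthest = answer("distance between mars and Jupiter")
print(f"{closest:,.0f} km to {farthest:,.0f} km")
```

The point of the whitelist is that a query only gets a computed answer when both the entities and the operation are known; everything else falls through to “no answer”.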

So if Wolfram is actually doing computation, that is a big thing. But also look at their history: they are a computation software firm. They would love to have everything neatly categorized, and would love information from the search vendors about “data” queries to see where they can do “computation”.

The saving grace, if you notice, is that it is not a great generic search engine. No ego surfing; Wolfram does not know you 🙂.

Wolfram is open about where it sources its information, and it shows how it is interpreting the query, pretty much a first among major search engines.

G^2: Google calculated how popular a given item was by building up PageRank and refreshing it for generic phrase search. The challenge was how to use the existing PageRank to attempt categorization. G^2 is not exactly a search engine but a comparison engine (PriceGrabber and the like have done this much better for more tangible goods). Given a query, it breaks it down into a known fact that can be compared across certain dimensions. This is where the magic might come in, if they are doing it purely from the massive index and related URL-based information. (It is not apparent from a simple search like “pakistan india china”: the query gets interpreted as “country” and the items are compared.) It adds interactivity in the form of being able to add those extra dimensions. If you add each of the items separately, you start seeing the source of the information: Wikipedia.
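The “break a query into comparable items” step described above can be sketched roughly as follows. This is a hypothetical illustration, not how Google Squared worked internally: every item must map onto one shared category, and each requested dimension becomes a column (an unknown dimension just yields `None`, i.e. an empty cell). `KNOWN` and `compare` are made-up names, and the table holds only a few capitals for illustration.

```python
# Hypothetical sketch: turn a multi-entity query into a comparison table.
# The entity table is illustrative only; it is not Google Squared data.
KNOWN = {
    "pakistan": {"type": "country", "capital": "Islamabad"},
    "india": {"type": "country", "capital": "New Delhi"},
    "china": {"type": "country", "capital": "Beijing"},
}


def compare(query: str, dimensions: list[str]):
    items = query.lower().split()
    # Every item must be known and all must share a single category.
    if not items or any(i not in KNOWN for i in items):
        return None
    types = {KNOWN[i]["type"] for i in items}
    if len(types) != 1:
        return None
    category = types.pop()
    header = ["item"] + dimensions
    # Missing attributes become None, like an empty cell in the "square".
    rows = [[i] + [KNOWN[i].get(d) for d in dimensions] for i in items]
    return category, header, rows


print(compare("pakistan india china", ["capital", "population"]))
```

Adding a dimension (here `population`) is what the post calls the extra interactivity; in this sketch the cell stays empty because the toy table has no such attribute.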

Computation: of course they do 1123*2 and the like, but computation like Wolfram's is not possible.

But this could be great for verticals like legal and pharma: the ability to compare witnesses/dates, or drugs by chemicals/tests/results. If they are serious about it, they need to open up both the process and the API for verticals.

Queries

To see the difference, type “Bank of America stock price” into both. What do you see?

Wolfram does extra work by looking up the “data” and plotting it. I really wonder whether they have a static plot of the data across the x range.

G^2 just goes through existing pages, finds the “best bet: company!” and “tries” to categorize the available information (page data) into semantic meaning of some sort.

Now type just “Bank of America” and compare the results.

Queries that humans really want to run

  1. “When was MSFT more than $40?”: the data exists, but who will interpret it? (Wolfram, Google Squared)
  2. You can run basic queries like “rainfall in Delhi this month” (Wolfram is better; Google Squared does not know what to do here).
  3. But not “what is the average rainfall in Delhi this month” (Google Squared has more resources to refer to than at launch; Wolfram attempts to break down the location).
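Once the data is structured, the “human” queries above reduce to a filter and an aggregate. This is a minimal sketch with made-up placeholder series (the prices and rainfall figures are not real data), and `dates_above` / `monthly_average` are hypothetical helper names. The hard part the engines struggle with is mapping the English question onto these calls, not the computation itself.

```python
# Hypothetical sketch: "when was MSFT more than $40" and "average rainfall in
# Delhi this month" as a filter and an aggregate over structured data.
# All numbers below are placeholders, not real prices or rainfall.
from datetime import date
from statistics import mean

msft_close = {  # date -> closing price in USD (placeholder values)
    date(2009, 5, 1): 38.0,
    date(2009, 5, 4): 41.5,
    date(2009, 5, 5): 42.0,
    date(2009, 5, 6): 39.0,
}

delhi_rain_mm = {  # date -> daily rainfall in mm (placeholder values)
    date(2009, 5, 1): 0.0,
    date(2009, 5, 2): 4.0,
    date(2009, 5, 3): 2.0,
}


def dates_above(series, threshold):
    """Dates on which the value exceeded the threshold, in order."""
    return sorted(d for d, v in series.items() if v > threshold)


def monthly_average(series, year, month):
    """Mean of the values recorded in the given month, or None if no data."""
    values = [v for d, v in series.items() if d.year == year and d.month == month]
    return mean(values) if values else None


print(dates_above(msft_close, 40))
print(monthly_average(delhi_rain_mm, 2009, 5))
```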

Again, try putting these queries through all the engines and compare the results. Wolfram comes out strong wherever structured data is available. It is worth checking Wolfram's “input interpretation” tag for clues about what they are doing and how granular they can go.

Bing

Bing's embedded engine, Powerset, hopefully does more than the left-side filters for queries in domains like music/movies/travel; as it stands, it certainly does not do justice to its fame. Hopefully it can be combined with FAST for interesting work in the future.

Where Bing shines at present:

1. Entertainment search (look at the left-side filters, which provide instant access to the right information).

2. Travel: this is visible when using the US as your location and running searches like “flights from seattle to san francisco”.

See the answer at the top, which indicates how fares are expected to behave. Click that link and you get the mashup (integration from Farecast etc.).

3. Quality of information: both Bing and Yahoo provide top links from well-vetted sources, whereas Google surfaces more user-generated content. This matters most for recent news and healthcare-related searches, where the right source is important.

4. Local search, especially from mobile. I now use it all the time; it provides locally relevant information.

IMHO, machine learning is not yet advanced enough to make sense of presentation data (locked in JS/HTML) unless that data is freed up and an ontology prevails over the content.

Background information on facts-based engines:

  1. http://start.csail.mit.edu/
  2. http://www.trueknowledge.com/: look at the kinds of searches/Q&A you can do, and at the narrow domain. Try their Firefox add-in for Bing and explore it.
    1. http://blog.trueknowledge.com/2009/05/how-to-build-a-universal-answer-engine-ten-vital-principles.html
    2. http://www.cyc.com: inspiration for Wolfram and other Q&A engines.
    3. http://www.twine.com/technology: attempts at crowdsourcing existing information while hiding the RDF/OWL details.
      1. If the data were available in RDF/OWL format, we could possibly use SPARQL (http://www.w3.org/TR/rdf-sparql-query/), but this is a long way off. Check out http://sindice.com/map to see some of the “categorization” happening.
      2. Side note: Oracle decided to support triples natively (http://www.oracle.com/technology/tech/semantic_technologies/index.html).
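To give a feel for why RDF triples plus SPARQL are attractive, here is a toy sketch of the core idea behind a SPARQL basic graph pattern: data as (subject, predicate, object) triples, queried with patterns in which `?x` is a variable that must bind consistently across patterns. It is an illustration only, not a SPARQL engine or any library's API; the triples and the `match` / `query` helpers are hypothetical.

```python
# Toy sketch of SPARQL-style basic graph pattern matching over triples.
# Not a real SPARQL engine; the data and helper names are hypothetical.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("alice", "wrote", "paper1"),
    ("bob", "worksAt", "acme"),
    ("paper1", "publishedIn", "2009"),
]


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the same value across all patterns.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples=TRIPLES):
    """Join the patterns left to right, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in triples
            if (b2 := match(pattern, t, b)) is not None
        ]
    return bindings


# Roughly: WHERE { ?person worksAt acme . ?person wrote ?doc . ?doc publishedIn ?year }
print(query([
    ("?person", "worksAt", "acme"),
    ("?person", "wrote", "?doc"),
    ("?doc", "publishedIn", "?year"),
]))
```

Real SPARQL adds IRIs, prefixes, FILTER, OPTIONAL and much more; the sketch only shows the variable-binding join at its heart.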

Queries to try (across search engines)

Note how the different search engines behave:

Wolfram tries to identify temporal information, location, color, etc., based on the sources/domains it knows about.

Google/Bing bring up the number of pages containing the phrase.

TK's answer provides clues to how it actually classifies the query across its known ontological buckets.

  1. Who was ruler of china in 1000 AD (click More… in Wolfram to see/guess what is really happening; you can only query what you have 🙂)
  2. Children population in China (qualify it with a year)
  3. When did Rajiv Gandhi die? Only TK attempts an answer (http://www.trueknowledge.com/q/when_did_Rajiv_Gandhi_die); see the behind-the-scenes working when it shows the question/fact etc.
  4. When did INS Khukri sink? (Again, Google/Bing bring up tons of links.) No answer; TK is honest: no facts in its database.

So it boils down to what we are looking for: “facts/answers”, or just “information from which we want to draw answers”. True answers/facts are possible when there is valid data (birthday/location/event etc.) or when a set of people organizes incoming data into a massive fact engine (TK/Wolfram). Then an inference engine can sift through the facts, or a computation engine (Wolfram) can try to provide absolute answers/facts. “Phrase lookup” will be well served by the “Googles and Bings”.
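The fact-engine half of that split can be sketched as a store of curated facts plus inference rules that derive new answers from stored facts, and an honest “no answer” when neither applies (as with the INS Khukri query). This is a minimal sketch under that assumption, not True Knowledge's or Wolfram's design; the relation names, `FACTS`, `RULES` and `ask` are hypothetical, and only two dates are stored.

```python
# Minimal sketch (assumption): curated facts + one inference rule + honest "no answer".
# Not True Knowledge's or Wolfram|Alpha's design; relation names are hypothetical.
from datetime import date

FACTS = {
    ("Rajiv Gandhi", "born_on"): date(1944, 8, 20),
    ("Rajiv Gandhi", "died_on"): date(1991, 5, 21),
}


def age_at_death(entity):
    """Inference rule: derive a new fact from two stored facts."""
    born = FACTS.get((entity, "born_on"))
    died = FACTS.get((entity, "died_on"))
    if born is None or died is None:
        return None
    # Subtract one year if the birthday had not yet come round in the year of death.
    return died.year - born.year - ((died.month, died.day) < (born.month, born.day))


RULES = {"age_at_death": age_at_death}


def ask(entity, relation):
    if (entity, relation) in FACTS:
        return FACTS[(entity, relation)]  # direct fact lookup
    if relation in RULES:
        return RULES[relation](entity)  # inferred answer (may still be None)
    return None  # no fact, no rule: admit it instead of returning links


print(ask("Rajiv Gandhi", "died_on"))
print(ask("Rajiv Gandhi", "age_at_death"))
print(ask("INS Khukri", "sank_on"))
```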

It is all good competition for the end consumer. These will be complementary offerings, and they need a lot of polish. The dream would be SPARQL applied to existing data: your data from LinkedIn could be pulled in and associated with your publications, forum opinions and photos on Flickr/Picasa in a more authoritative way. But most probably it will remain a dream. Chetan Kunte is mighty impressed with Wolfram, and his enthusiasm prompted me to look at it in this detail.

Interesting facts:

Sergey Brin interned at Wolfram.

Wolfram generates strong opinions.

http://www.americanscientist.org/issues/id.3261,y.0,no.,content.true,page.2,css.print/issue.aspx (more balanced)

http://www.cscs.umich.edu/~crshalizi/reviews/wolfram/ (crude)

http://chrishecker.com/Kurt_Gödel_is_Laughing_His_Ass_Off_Right_Now
