via Jeff's Search Engine Caffe by jeff.dalton on 2/2/09
Alon Halevy, head of the "Deep Web" surfacing project at Google, gave a talk at the New England Database Day Conference at MIT. Details of his talk are still sparse, but ReadWriteWeb has a partial writeup.
In the article, RWW outlines ways that Google hopes to improve search by leveraging structured data, leading towards improved semantic and product search. One way Google will gather the data is by harvesting it from tables: there are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.
He reportedly also outlined some of the key DB application challenges Google is working on: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data-level synonym discovery...
I really like Alon's research. See also "Google's Deep Web Crawl" and "Web-scale Data Integration: You can only afford to Pay As You Go".