
The Ivory Toolkit with the SMRF Retrieval Engine (under Hadoop Framework)

As IR datasets grow in size, a powerful platform for rapid indexing and searching is needed. Ivory is a newly announced experimental platform built on top of Hadoop, and it could be a good choice as collections reach the billion-document scale. The system has already shown very competitive performance, and I believe it will be the next successful experimental platform if more documentation is provided. However, out of the box, Ivory does not implement as many retrieval algorithms as Terrier (which itself is not exhaustive). Adding such algorithms would also be a future step for our LabLucene project (currently being prepared for release). Besides the MapReduce framework, we would also like to integrate the Indri Query Language into LabLucene. After these two major steps, we would expect a first release of LabLucene. Right now, I am just starting to learn Hadoop, and I would appreciate some help. Anyone who wants to get involved in this unfunded project is warmly welcome.


The Ivory Toolkit with the SMRF Retrieval Engine

Ivory is a Hadoop toolkit for Web-scale information retrieval research that features a retrieval engine based on Markov Random Fields, appropriately named SMRF (Searching with Markov Random Fields). This open-source project began in Spring 2009 and represents a collaboration between the University of Maryland and Yahoo! Research. Ivory takes full advantage of the Hadoop distributed environment (the MapReduce programming model and the underlying distributed file system) for both indexing and retrieval.
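To give a flavor of what "retrieval based on Markov Random Fields" means in practice, here is a minimal sketch of the sequential-dependence ranking function (Metzler and Croft's model, the family behind SMRF). The feature definition, the 0.85/0.10/0.05 clique weights, and the Dirichlet smoothing parameter are conventional defaults from the literature, not values taken from Ivory's actual implementation:

```java
// Sketch of a sequential-dependence MRF score: each query term, each ordered
// window (exact bigram), and each unordered window contributes a Dirichlet-
// smoothed log-probability feature, and the three groups are combined with
// fixed weights. Illustrative only; not Ivory's actual code.
public class SmrfSketch {

    // Dirichlet-smoothed log feature for one clique (term or window).
    // count   : occurrences of the term/window in the document
    // cfCount : occurrences in the whole collection
    // colLen  : total number of tokens in the collection
    // docLen  : number of tokens in the document
    // mu      : Dirichlet smoothing parameter (2500 is a common default)
    static double feature(int count, long cfCount, long colLen, int docLen, double mu) {
        double prior = (double) cfCount / colLen;
        return Math.log((count + mu * prior) / (docLen + mu));
    }

    // Combine the summed term, ordered-window, and unordered-window features
    // with the conventional 0.85 / 0.10 / 0.05 weights.
    public static double score(double termSum, double orderedSum, double unorderedSum) {
        return 0.85 * termSum + 0.10 * orderedSum + 0.05 * unorderedSum;
    }
}
```

Because the score is a weighted sum of per-clique features, it decomposes cleanly over postings, which is what makes it practical to evaluate inside a distributed retrieval engine.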
In order to temper expectations, please note that Ivory is not meant to serve as a full-featured search engine (e.g., Lucene), but rather aimed at information retrieval researchers who need access to low-level data structures and who generally know their way around retrieval algorithms. As a result, a lot of “niceties” are simply missing—for example, fancy interfaces or ingestion support for different file types. It goes without saying that Ivory is a bit rough around the edges, but our philosophy is to release early and release often. In short, Ivory is experimental!
Ivory was specifically designed to work with Hadoop “out of the box” on the ClueWeb09 collection, a 1 billion page (25 TB) Web crawl distributed by Carnegie Mellon University. The initial release of Ivory is meant to serve as a reference implementation of indexing and retrieval algorithms that can operate at the multi-terabyte scale. Another interesting experimental aspect of Ivory is its retrieval architecture: we’ve been playing with retrieval engines that directly read postings from HDFS. The getting started guide with TREC disks 4-5 provides more details.
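Reading postings directly from HDFS means the retrieval engine decodes compressed inverted-list bytes as they stream in. As a rough illustration of what such a low-level structure looks like, here is a gap-plus-variable-byte postings codec; this is a generic inverted-index layout (hypothetical, not Ivory's actual on-disk format):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Sketch of a postings list stored as (docid-gap, term-frequency) pairs in
// variable-byte encoding. Generic illustration, not Ivory's actual format.
public class PostingsSketch {

    // Write one integer in variable-byte form: low 7 bits per byte,
    // high bit set on the final byte as a terminator.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while (v >= 128) {
            out.write(v & 0x7F);
            v >>>= 7;
        }
        out.write(v | 0x80);
    }

    static int readVInt(ByteArrayInputStream in) {
        int v = 0, shift = 0, b;
        while (((b = in.read()) & 0x80) == 0) {
            v |= b << shift;
            shift += 7;
        }
        return v | ((b & 0x7F) << shift);
    }

    // postings: array of {docid, tf} pairs with strictly increasing docids.
    public static byte[] encode(int[][] postings) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, postings.length);
        int prev = 0;
        for (int[] p : postings) {
            writeVInt(out, p[0] - prev); // store the docid gap, not the docid
            writeVInt(out, p[1]);        // term frequency
            prev = p[0];
        }
        return out.toByteArray();
    }

    public static int[][] decode(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = readVInt(in);
        int[][] postings = new int[n][2];
        int docid = 0;
        for (int i = 0; i < n; i++) {
            docid += readVInt(in);       // accumulate gaps back into docids
            postings[i][0] = docid;
            postings[i][1] = readVInt(in);
        }
        return postings;
    }
}
```

In an HDFS-backed engine, the same bytes would arrive through Hadoop's `FileSystem`/`FSDataInputStream` API rather than a local byte array; the decoding logic is unchanged, which is what makes serving postings straight off the distributed file system feasible.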


