
The Ivory Toolkit with the SMRF Retrieval Engine (under Hadoop Framework)

As IR datasets grow in size, a powerful platform for rapid indexing and searching seems to be needed. Ivory is a newly announced search platform built on Hadoop, and it could be a good choice once we reach the billion-document era.

Adopting Hadoop would also be a natural next step for our SaberLucene project (not yet released). Besides the MapReduce framework, we would also like to integrate the Indri Query Language into SaberLucene. Once these two major steps are complete, we expect to make a first release of SaberLucene. Any help would be appreciated.

——————————————-

The Ivory Toolkit with the SMRF Retrieval Engine

Ivory is a Hadoop toolkit for Web-scale information retrieval research that features a retrieval engine based on Markov Random Fields, appropriately named SMRF (Searching with Markov Random Fields). This open-source project began in Spring 2009 and represents a collaboration between the University of Maryland and Yahoo! Research. Ivory takes full advantage of the Hadoop distributed environment (the MapReduce programming model and the underlying distributed file system) for both indexing and retrieval.
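To make the indexing side more concrete, here is a minimal Python sketch of how an inverted index can be built in the MapReduce style: mappers emit (term, (docid, tf)) pairs, the framework groups them by term, and reducers sort each term's postings by document id. It is an in-memory simulation, not Hadoop code and not Ivory's actual indexer; the function names (map_doc, shuffle, reduce_term, build_index) and the lowercase whitespace tokenizer are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_doc(docid, text):
    """Map phase: emit (term, (docid, term frequency)) once per distinct term."""
    counts = defaultdict(int)
    for term in text.lower().split():  # hypothetical tokenizer: lowercase + whitespace
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (docid, tf)

def shuffle(pairs):
    """Shuffle/sort phase: group all values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_term(term, postings):
    """Reduce phase: sort one term's postings by docid to form its postings list."""
    return term, sorted(postings)

def build_index(docs):
    """Run map, shuffle, and reduce over a {docid: text} collection."""
    mapped = chain.from_iterable(map_doc(d, t) for d, t in docs.items())
    return dict(reduce_term(t, p) for t, p in shuffle(mapped).items())

docs = {1: "hadoop scales retrieval", 2: "retrieval with markov random fields retrieval"}
index = build_index(docs)
print(index["retrieval"])  # [(1, 1), (2, 2)]
```

On a real cluster the map and reduce functions would run in parallel across many machines and the shuffle would be handled by the framework itself; the sketch only mirrors the data flow.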

In order to temper expectations, please note that Ivory is not meant to serve as a full-featured search engine (e.g., Lucene), but rather aimed at information retrieval researchers who need access to low-level data structures and who generally know their way around retrieval algorithms. As a result, a lot of “niceties” are simply missing—for example, fancy interfaces or ingestion support for different file types. It goes without saying that Ivory is a bit rough around the edges, but our philosophy is to release early and release often. In short, Ivory is experimental!

Ivory was specifically designed to work with Hadoop “out of the box” on the ClueWeb09 collection, a 1 billion page (25 TB) Web crawl distributed by Carnegie Mellon University. The initial release of Ivory is meant to serve as a reference implementation of indexing and retrieval algorithms that can operate at the multi-terabyte scale. Another interesting experimental aspect of Ivory is its retrieval architecture: we’ve been playing with retrieval engines that directly read postings from HDFS. The getting started guide with TREC disks 4-5 provides more details.
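The idea of a retrieval engine that reads postings straight from a file system can be sketched roughly as follows. This is a hypothetical illustration, not Ivory's on-disk format or API: postings are serialized as fixed-width big-endian integers, an in-memory io.BytesIO object stands in for a stream opened on HDFS, the offsets table is an assumed term dictionary, and documents are scored with a plain tf-idf sum rather than SMRF's Markov Random Field model.

```python
import io
import math
import struct

def write_postings(postings):
    """Hypothetical layout: a 4-byte count, then (docid, tf) as two 4-byte ints each."""
    buf = struct.pack(">I", len(postings))
    for docid, tf in postings:
        buf += struct.pack(">II", docid, tf)
    return buf

def read_postings(stream):
    """Read one postings list sequentially from the stream's current position."""
    (n,) = struct.unpack(">I", stream.read(4))
    for _ in range(n):
        yield struct.unpack(">II", stream.read(8))

def score(query_terms, offsets, stream, num_docs):
    """Term-at-a-time evaluation: seek to each term's postings, accumulate tf-idf."""
    scores = {}
    for term in query_terms:
        if term not in offsets:
            continue
        stream.seek(offsets[term])  # jump directly to this term's postings
        postings = list(read_postings(stream))
        idf = math.log(num_docs / len(postings))
        for docid, tf in postings:
            scores[docid] = scores.get(docid, 0.0) + tf * idf
    # Highest score first; ties broken by ascending docid.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Usage: serialize a tiny index into one blob and remember each term's offset.
index = {"hadoop": [(1, 1)], "retrieval": [(1, 1), (2, 2)], "markov": [(2, 1)]}
blob, offsets = b"", {}
for term, postings in index.items():
    offsets[term] = len(blob)
    blob += write_postings(postings)
stream = io.BytesIO(blob)  # stand-in for a stream opened on HDFS
print(score(["hadoop", "markov"], offsets, stream, num_docs=2))
```

A real deployment would open the stream through Hadoop's file system client and would typically compress postings; the fixed-width layout here only keeps the seek-and-read pattern easy to follow.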

Download

Documentation
