By jeffye, on March 31st, 2009

Today I would like to share a series of IR-related blogs and tutorials. I believe they will be beneficial to your research, and may even spark a novel idea.
This post will be updated continually.
Blogs
Search Engines Architecture Course – Agenda of the graduate course Search Engines Architecture.
Web Mining Course – Agenda of [...]
By jeffye, on March 28th, 2009

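This counts as good news: the datasets available for IR research have made a big leap in scale. The largest collection I had worked with before was under 30 million documents; GOV2 is only about 50 GB compressed, and then there is SEWM's CWT200G. This one is dozens of times larger. Exciting! I wonder how many labs can still handle a dataset this big, and how much of a qualitative change this quantitative change will bring. I also wonder how the algorithms that ranked well on small data will perform now.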
ClueWeb09, a corpus of 1 billion web documents, is now available. The upcoming TREC 2009 will use it. You can also see the crawl stats. It ships on four 1.5 TB hard drives.
By jeffye, on March 6th, 2009

Are you, like me, bothered by how to use these Latin abbreviations? The explanations and comparisons below show how they differ.
i.e.
- that is (stands for id est in Latin). You can also use it in place of “in other words.”
etc.
- and so on, and the rest (abbreviation for et cetera).
e.g.
- for instance, for example (abbreviation for exempli gratia in Latin). Remember e.g. by thinking of it as “example given” and then follow it with a few examples, e.g. apples, oranges, bananas.
et al means roughly “and [...]
By jeffye, on March 5th, 2009

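The OXFORD Collocations Dictionary for Students of English is a dictionary strongly recommended by Li Xiaolai, a teacher at New Oriental. I tried it today and it really is good: I fell in love with it at first sight. All the explanations are in English, yet they feel very clear. When I look up an English word in an ordinary dictionary I get some translations, but honestly I may still not understand how the word is used or in which situations it fits. This dictionary explains that very clearly in English. In a word, it's just what I need. Cheers! Its most distinctive feature, of course, is its collocation guidance; only with that do we really know how a sentence should be phrased. For example, if we want to describe a small improvement, is it tiny improvement, little improvement, or slight improvement? This dictionary gives you the answer.
That said, for English-English definitions my favorite is still Google Dictionary; I like the way it explains words. In short, Google Dictionary combined with the Oxford dictionary is my personal perfect solution.
All of these can be found in the Lingoes dictionary software. The power of its combined dictionary features is astonishing, and I recommend it for study.
Here are some links for online lookup:
Inside the Great Firewall: http://zye.me/ocd/
http://www.xiaolai.net/ocd/index.html
Outside the Great Firewall: http://5yiso.appspot.com/
CHM e-book download: download link
-----------------------------------------------------------------
The OXFORD Collocations Dictionary for Students of English is a fairly comprehensive dictionary of English collocations. It has 9,000 entries covering 150,000 collocations, with more than 50,000 example sentences, many of them drawn from real, recent corpora. It includes 25 usage notes grouped by topic and 10 pages of classified illustrations that introduce collocations in different fields, so that English words can be put to work for you. Do you want to speak natural, idiomatic English? Do you want to expand your vocabulary? Do you want to improve your writing? Any English learner will undoubtedly answer yes. With the actual needs of English learners in mind, the Oxford Collocations Dictionary explores the combinatory relationships between English words from a fresh angle.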
By jeffye, on February 27th, 2009

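BM family weighting schemes: introduction and important literature
1. Okapi BM25 is a very important ranking formula in IR. "BM" stands for "best match", and its theoretical basis is probability theory. It grew out of the probabilistic retrieval work that Stephen E. Robertson began in the 1970s; it is Professor Robertson's signature work and established his eminent standing in the IR field.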
//////////////Okapi bm25 formula/////////////////////
double K = k_1 * ((1 - b) + b * docLength / averageDocumentLength) + tf;
return Idf.log((numberOfDocuments - n_t + 0.5d) / (n_t + 0.5d)) * [...]
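Since the excerpt above is cut off, here is a minimal, self-contained sketch of the per-term BM25 score in its commonly cited textbook form. It is not the original code. The class name `Bm25Sketch`, the parameter values k_1 = 1.2 and b = 0.75, and the use of the natural logarithm in place of `Idf.log` are all assumptions made for illustration. The excerpt adds tf into K, presumably so that the later division is by K alone; the sketch keeps K as the pure length-normalisation term and adds tf in the denominator.

```java
// Sketch of Okapi BM25 term scoring in its standard textbook form.
// Assumptions (not from the original post): class name, K1/B values, natural log.
final class Bm25Sketch {
    static final double K1 = 1.2;   // term-frequency saturation (assumed value)
    static final double B = 0.75;   // document-length normalisation (assumed value)

    // IDF part: log((N - n_t + 0.5) / (n_t + 0.5)), as in the excerpt.
    static double idf(long numberOfDocuments, long nT) {
        return Math.log((numberOfDocuments - nT + 0.5d) / (nT + 0.5d));
    }

    // Score contribution of one query term to one document.
    static double score(double tf, double docLength, double averageDocumentLength,
                        long numberOfDocuments, long nT) {
        // K grows with document length, so long documents need a higher tf
        // to reach the same score as short ones.
        double K = K1 * ((1 - B) + B * docLength / averageDocumentLength);
        return idf(numberOfDocuments, nT) * ((K1 + 1d) * tf) / (K + tf);
    }

    public static void main(String[] args) {
        // Same term frequency in a short and a long document.
        System.out.println(score(3, 80, 100, 1000, 10));
        System.out.println(score(3, 300, 100, 1000, 10));
    }
}
```

With these assumptions the score is zero when tf is zero, rises with tf but saturates, and is lower for a longer document at the same tf. Note that the IDF part turns negative for terms that occur in more than half of the collection.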