Datasets
- Chinese sentiment mining corpus: ChnSentiCorp
- 10 years of CLEF data (ad-hoc, domain-specific, and geoCLEF) are available through DIRECT upon registration.
- Google Ngram Viewer
- LIBSVM Data Sets
- ClueWeb09 Related Data: Freebase Annotations of the ClueWeb Corpora, v1 (FACC1)
Community Question Answering Datasets
Tutorials
- Introduction to Computational Advertising
- Stanford Sentiment Symposium Tutorial
- Statistical Data Mining Tutorials – Slides by Andrew Moore
- Performance Engineering of Software Systems (video lecture)
- Introduction to Data Science (video lecture)
- The Hitchhiker’s Guide to Python
- Matlab Summary and Tutorial, Tutorial 2
- Advanced Data Structures
- A Programmer's Guide to Data Mining
- Online Tutorials for Programming Languages
- Mathematical Optimization with SciPy
- Machine Learning Surveys
- Bayesian Modeling, Inference, Prediction and Decision-Making 10-day short course sponsored by eBay and Google
- Deep Learning from Stanford
- http://scipy-lectures.github.io/intro/matplotlib/matplotlib.html
- Sentiment analysis
Open Course Video
Free textbooks
- “Introduction to Information Retrieval” teaches you how a search engine works, in great detail.
- “Mining Massive Data Sets” covers a variety of big-data principles that apply to different types of information.
- Handbook of Natural Language Processing Wiki
- The Definitive Guide to Jython
- A Programmer's Guide to Data Mining
- An Introduction to Data Mining
- http://it-ebooks.info/ Free IT books download
- Introduction to Machine Learning
Open Course:
Python tools
- PDFMiner -- Python PDF parser and analyzer
- Pure Python IP Location API -- Can be used on Google App Engine with a little tweaking.
- http://code.google.com/p/google-api-python-client/
- http://www.bluehost.com/
- SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
- Tentative NumPy Tutorial, NumPy Example List
- NumPy and SciPy Documentation
- Pattern -- It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics), clustering and classification (k-means, k-NN, SVM), and data visualization (graph networks).
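The tf-idf and cosine similarity metrics mentioned for Pattern can be tried out without the library. What follows is a minimal standard-library sketch, not Pattern's API: the function names `tfidf_vectors` and `cosine_similarity`, the plain `tf * log(N / df)` weighting, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative documents only; any tokenized text would do.
docs = [
    "the movie was great".split(),
    "the movie was terrible".split(),
    "stock prices fell sharply".split(),
]
vecs = tfidf_vectors(docs)
sim_reviews = cosine_similarity(vecs[0], vecs[1])    # share several terms
sim_unrelated = cosine_similarity(vecs[0], vecs[2])  # share no terms
```

The two movie reviews share terms and so get a similarity strictly between 0 and 1, while the unrelated pair scores exactly 0. Real toolkits typically add smoothing and normalization on top of this basic weighting.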
Misc
- Key Scientific Challenges Program (Yahoo!)
- Some basic IR baselines from the Yahoo! Research Wiki.
- IR baselines from Terrier
- The Top 10 Data-Mining Links of 2011
- X-RIME: Hadoop based large scale social network analysis
- International academic journals recommended by the China Computer Federation (CCF)
- Online edition of Webster's Dictionary (Webster Online)
- Oxford Collocations Dictionary for Students of English (free online search)
- Google Dictionary Online
- DictService Web Service
- HTML color selector
- Web color Selector online
- CSS templates
- Free scripts download
- IPLocator
Radio & Video
Blogs to Read: