Open Source Packages and Toolkits
ICTCLAS in Java.
It is reported to have higher accuracy than ICTCLAS and many other similar packages,
and it also supports user-defined dictionaries.
A simple Python segmenter based on maximum matching.
It is just 34 lines of code.
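Forward maximum matching fits in a few lines. The sketch below is an illustration, not the 34-line segmenter itself; the function name `max_match` and the rule that an unknown single character is always accepted as a word are assumptions. At each position it tries the longest dictionary word first and shrinks the window until it finds a match.

```python
from typing import Optional


def max_match(text: str, dictionary: set[str], max_len: Optional[int] = None) -> list[str]:
    """Segment `text` greedily, left to right, with forward maximum matching."""
    if max_len is None:
        # The longest dictionary entry bounds how far we ever need to look ahead.
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

With the dictionary `{"研究", "研究生", "生命", "起源"}`, `max_match("研究生命的起源", ...)` returns `['研究生', '命', '的', '起源']`. The greedy rule takes 研究生 ("graduate student") and misreads a phrase that means "study the origin of life". That is the kind of error the chunk-based rules of MMSEG are designed to reduce.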
[Open-source Chinese word segmentation projects]: SCWS http://t.cn/hda5lb, ICTCLAS http://t.cn/hgTZs3, HTTPCWS http://t.cn/zjNwvvv, Paoding (庖丁解牛) segmenter http://t.cn/hCZC2z, CC-CEDICT http://t.cn/zjNZsss
MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm
Written in C.
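MMSEG's "complex" variant looks ahead at chunks of up to three words and chooses among them with ordered rules: maximum matching (longest chunk), largest average word length, smallest variance of word lengths, and largest sum of the degree of morphemic freedom of one-character words. The Python sketch below is not the C implementation. It covers only the first three rules (rule 4 needs character-frequency data), and it assumes that any single character counts as a candidate word. All function names are hypothetical.

```python
from statistics import pvariance


def candidate_words(text, start, dictionary, max_len):
    """Words that may begin at `start`: the single character plus any longer dictionary hits."""
    found = [text[start]]
    for size in range(2, min(max_len, len(text) - start) + 1):
        if text[start:start + size] in dictionary:
            found.append(text[start:start + size])
    return found


def chunks(text, start, dictionary, max_len, depth=3):
    """All sequences of up to `depth` consecutive candidate words starting at `start`."""
    if depth == 0 or start >= len(text):
        return [[]]
    result = []
    for word in candidate_words(text, start, dictionary, max_len):
        for rest in chunks(text, start + len(word), dictionary, max_len, depth - 1):
            result.append([word] + rest)
    return result


def chunk_rank(chunk):
    """Sort key implementing MMSEG rules 1-3 (higher is better)."""
    lengths = [len(w) for w in chunk]
    total = sum(lengths)
    return (
        total,                  # rule 1: maximum matching (longest chunk)
        total / len(lengths),   # rule 2: largest average word length
        -pvariance(lengths),    # rule 3: smallest variance of word lengths
    )


def mmseg_complex(text, dictionary):
    """Segment `text` by repeatedly committing to the first word of the best chunk."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Remaining ties (rule 4 is omitted here) go to the first chunk enumerated.
        best = max(chunks(text, i, dictionary, max_len), key=chunk_rank)
        words.append(best[0])
        i += len(best[0])
    return words
```

With the dictionary `{"研究", "研究生", "生命", "起源"}`, `mmseg_complex("研究生命的起源", ...)` returns `['研究', '生命', '的', '起源']`. The chunks 研究生/命/的 and 研究/生命/的 tie on rules 1 and 2, and rule 3 prefers the second because its word lengths are more even. Plain greedy forward matching would instead commit to 研究生 here.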
A Chinese NLP toolkit written in Java:
- Information Retrieval: Text Classification, News Clustering
- Chinese Processing: Word Segmentation, POS Tagging, Entity Recognition, Keyword Extraction, Dependency Grammar Parsing, Time Phrase Recognition
- Structural Learning: Online Learning, Hierarchical Classification, Clustering, Reasoning
A series of text processing tools.
| Component | Description |
|---|---|
| Templated Lists and Maps | Dynamically resizing lists holding objects or primitive data types such as int, double, etc. |
| Templated Multi-dimensional Matrices | Dense and sparse fixed-size (non-resizable) 1-, 2-, 3- and d-dimensional matrices holding objects or primitive data types such as int, double, etc.; also known as multi-dimensional arrays or data cubes. |
| Linear Algebra | Standard matrix operations and decompositions: LU, QR, Cholesky, eigenvalue, singular value. |
| Histogramming | Compact, extensible, modular and performant histogramming functionality. AIDA offers the histogramming features of HTL and HBOOK. |
| Mathematics | Tools for basic and advanced mathematics: arithmetic and algebra, polynomials and Chebyshev series, Bessel and Airy functions, constants and units, trigonometric functions, etc. |
| Statistics | Tools for basic and advanced statistics: estimators, gamma functions, beta functions, probabilities, special integrals, etc. |
| Random Numbers and Random Sampling | Strong yet quick. Partly a port of CLHEP. |
| util.concurrent | Efficient utility classes commonly encountered in parallel and concurrent programming. |
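As a small illustration of one decomposition named in the Linear Algebra row, here is a Cholesky factorization sketch in plain Python. It is not the Java API of the library in the table. It assumes a dense, symmetric, positive definite input given as a list of rows, and it returns the lower-triangular factor L with A = L·Lᵀ.

```python
import math


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Return lower-triangular L with a == L @ L^T; `a` must be symmetric positive definite."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already-computed parts of rows i and j.
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower
```

For `[[4, 12, -16], [12, 37, -43], [-16, -43, 98]]` this yields `[[2, 0, 0], [6, 1, 0], [-8, 5, 3]]`. The routine only reads the lower triangle of the input. A production library would also handle near-singular inputs and blocked memory layouts.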
jforests is a Java library that implements many tree-based learning algorithms.
jforests can be used for regression, classification and ranking problems. Its tutorial shows how to learn a ranking model with the LambdaMART algorithm.
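LambdaMART fits gradient-boosted regression trees to per-document "lambdas": pairwise RankNet gradients scaled by how much NDCG would change if the two documents swapped positions. The sketch below, which follows Burges's formulation, computes only those lambdas for a single query, with no tree fitting. It does not use jforests's API. The names and the `sigma` default are assumptions, and the sign convention here is that a positive lambda means "push this document's score up."

```python
import math


def dcg_gain(rel):
    """Exponential gain used by NDCG."""
    return 2 ** rel - 1


def lambda_gradients(scores, relevance, sigma=1.0):
    """Negative cost gradient (the 'lambda') per document for one query, NDCG-weighted."""
    n = len(scores)
    # 0-based rank positions implied by the current scores, best first (stable on ties).
    order = sorted(range(n), key=lambda d: -scores[d])
    rank = [0] * n
    for pos, d in enumerate(order):
        rank[d] = pos
    ideal = sorted(relevance, reverse=True)
    idcg = sum(dcg_gain(r) / math.log2(pos + 2) for pos, r in enumerate(ideal))
    lambdas = [0.0] * n
    if idcg == 0:
        return lambdas  # no relevant documents: this query contributes no gradient
    for i in range(n):
        for j in range(n):
            if relevance[i] <= relevance[j]:
                continue  # only pairs where i should rank above j
            # |change in NDCG| if documents i and j swapped positions.
            delta = abs(
                (dcg_gain(relevance[i]) - dcg_gain(relevance[j]))
                * (1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2))
            ) / idcg
            # RankNet pairwise term: large when i is currently scored below j.
            x = min(sigma * (scores[i] - scores[j]), 700.0)  # clamp to avoid overflow
            rho = sigma / (1 + math.exp(x))
            lambdas[i] += rho * delta
            lambdas[j] -= rho * delta
    return lambdas
```

In a full LambdaMART loop, each boosting round would fit a regression tree to these lambdas across all queries and add its (shrunken) output to the scores.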
Free and open source, high-performance, distributed memory object caching system, generic in nature but intended for speeding up dynamic web applications by alleviating database load.
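A cache like this reduces database load through the cache-aside pattern: read from the cache first, and on a miss load from the database and store the result with an expiry. The sketch below uses a hypothetical in-process `TTLCache` class as a stand-in for a networked cache client. `get_user` and `load_from_db` are also illustrative names, not any real client library's API.

```python
import time


class TTLCache:
    """Hypothetical in-process stand-in for a networked key/value cache with expiry."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def get_user(user_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:  # note: a stored None is indistinguishable from a miss here
        value = load_from_db(user_id)
        cache.set(key, value, ttl_seconds)
    return value
```

Repeated reads of the same key within the TTL hit only the cache, so the database sees one query per key per expiry window. Writes would typically update the database and then delete or overwrite the cached key.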
This is the old version of PyWordNet, which was contributed to the NLTK project in 2006. Refer to NLTK for a more recent Python implementation of WordNet that has been updated to WordNet 2.1 and extended with some of the WordNet similarity scoring algorithms.
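The simplest of the WordNet similarity measures is path similarity: 1 / (1 + length of the shortest path between two concepts in the is-a taxonomy). NLTK exposes this on synsets as `Synset.path_similarity`. The sketch below computes the same quantity over a hypothetical toy hypernym table rather than real WordNet data, and it compares plain strings instead of synsets.

```python
from collections import deque

# Hypothetical toy hypernym links (child -> parent); real WordNet is far larger.
HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def path_similarity(a, b, hypernyms=HYPERNYMS):
    """1 / (1 + shortest is-a path length between a and b), or None if unconnected."""
    # Build an undirected adjacency map (rebuilt per call for simplicity).
    graph = {}
    for child, parent in hypernyms.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    # Breadth-first search gives the shortest path length in edges.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1 / (1 + dist[node])
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None
```

Here `path_similarity("dog", "cat")` is 0.2, because the path dog → canine → carnivore → feline → cat has four edges. Identical concepts score 1.0.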