Toolkits
- FlexCRFs: Flexible Conditional Random Fields
- CRFTagger: CRF English POS Tagger
- CRFChunker: CRF English Phrase Chunker
- JTextPro: A Java-based Text Processing Toolkit
- JWebPro: A Java-based Web Processing Toolkit
- JVnSegmenter: A Java-based Vietnamese Word Segmentation Tool
Redis
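What is Redis
REmote DIctionary Server (Redis) is a key-value store written by Salvatore Sanfilippo. Redis provides a rich set of data structures, including lists, sets, ordered sets, and hashes, as well as the same string values that Memcached offers. It also provides a rich set of operations on these data structures.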
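To make the data structures above concrete, here is a minimal in-process analogy that uses only standard Java collections. It does not talk to a Redis server and does not use any Redis client library. The class name `RedisTypesAnalogy` and its fields are hypothetical, and the Redis command names in the comments are given only for orientation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory analogy of the Redis value types (hypothetical names, no server involved). */
class RedisTypesAnalogy {
    // list: push and pop at either end, roughly LPUSH / RPUSH / LPOP / RPOP
    final Deque<String> list = new ArrayDeque<>();
    // set: unordered unique members, roughly SADD / SISMEMBER
    final Set<String> set = new HashSet<>();
    // hash: field -> value pairs stored under one key, roughly HSET / HGET
    final Map<String, String> hash = new HashMap<>();
    // ordered set: unique members, each with a numeric score, roughly ZADD
    final Map<String, Double> scores = new HashMap<>();

    /** Members ordered by ascending score, ties broken by member name (roughly ZRANGE 0 -1). */
    List<String> membersByScore() {
        List<String> members = new ArrayList<>(scores.keySet());
        members.sort(Comparator
                .comparingDouble((String member) -> scores.get(member))
                .thenComparing(Comparator.naturalOrder()));
        return members;
    }
}
```

The ordered set is the one type without a direct counterpart in `java.util`, which is why the sketch keeps a member-to-score map and sorts on demand.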
JCharset
language-detection
This is a language detection library implemented in plain Java.
http://code.google.com/p/language-detection/
- Generate language profiles from Wikipedia abstract xml
- Detect language of a text using naive Bayesian filter
- Over 99% precision for 53 languages
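The naive Bayesian approach mentioned above can be sketched in a few lines of plain Java. This is not the library's code or API: the class `NaiveBayesLanguageDetector`, the use of character trigrams, add-one smoothing, and a uniform prior over languages are all assumptions made for illustration, and the two training sentences stand in for real language profiles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy naive Bayes language detector over character trigrams (illustrative only). */
class NaiveBayesLanguageDetector {
    // language -> (trigram -> count): a tiny "language profile"
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();
    // language -> total number of trigrams seen in training
    private final Map<String, Integer> totals = new HashMap<>();
    // all distinct trigrams seen in any language, used for add-one smoothing
    private final Set<String> vocabulary = new HashSet<>();

    void train(String language, String text) {
        Map<String, Integer> profile = profiles.computeIfAbsent(language, k -> new HashMap<>());
        for (String gram : trigrams(text)) {
            profile.merge(gram, 1, Integer::sum);
            totals.merge(language, 1, Integer::sum);
            vocabulary.add(gram);
        }
    }

    /** Returns the language with the highest log-likelihood, or null if nothing was trained. */
    String detect(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        List<String> grams = trigrams(text);
        for (Map.Entry<String, Map<String, Integer>> entry : profiles.entrySet()) {
            Map<String, Integer> profile = entry.getValue();
            double denominator = totals.getOrDefault(entry.getKey(), 0) + vocabulary.size();
            double score = 0.0; // uniform prior, so only the likelihood terms matter
            for (String gram : grams) {
                score += Math.log((profile.getOrDefault(gram, 0) + 1) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Lower-cases the text, pads it with spaces, and returns all character trigrams. */
    private static List<String> trigrams(String text) {
        String padded = " " + text.toLowerCase(Locale.ROOT) + " ";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }
}
```

A possible usage: call `train("en", ...)` and `train("de", ...)` with sample text for each language, then call `detect(...)` on an unseen sentence. Real profiles would be built from far more text than two sentences.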
LingPipe
LingPipe is a toolkit for processing text using computational linguistics. LingPipe is used for tasks such as:
- Find the names of people, organizations or locations in news
- Automatically classify Twitter search results into categories
- Suggest correct spellings of queries
- It also includes basic implementations of many NLP models and techniques, such as HMM, CRF, LM, chunking, SVD, POS tagging, clustering, classification (naive Bayes, logistic regression, …), EM, and plenty more.
OpenNLP
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
The goal of the OpenNLP project is to create a mature toolkit for the above-mentioned tasks. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.
NLTK Package
The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing. A free online book is available. (If you use the library for academic research, please cite the book.)
Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. http://nltk.org/book
Apache Tika
Supported document formats:
- HyperText Markup Language
- XML and derived formats
- Microsoft Office document formats
- OpenDocument Format
- Portable Document Format
- Electronic Publication Format
- Rich Text Format
- Compression and packaging formats
- Text formats
- Audio formats
- Image formats
- Video formats
- Java class files and archives
- The mbox format
JNotify
JNotify is a Java library that allows Java applications to listen to file system events, such as:
- File created
- File modified
- File renamed
- File deleted
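For comparison, the same kind of events can be observed with the JDK's own `java.nio.file.WatchService`. The sketch below uses that standard API, not JNotify's, and the class and method names (`DirectoryWatchSketch`, `createAndObserve`) are hypothetical. Note that `WatchService` has no dedicated rename event: a rename typically shows up as a delete followed by a create.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Watches a temporary directory with the JDK WatchService (not JNotify). */
class DirectoryWatchSketch {
    /**
     * Creates a temp directory, registers it for create/modify/delete events,
     * creates fileName inside it, and returns the first batch of events
     * as "KIND:file" strings (empty if nothing arrives before the timeout).
     */
    static List<String> createAndObserve(String fileName, long timeoutMillis) {
        List<String> seen = new ArrayList<>();
        try {
            Path dir = Files.createTempDirectory("watch-sketch");
            try (WatchService watcher = dir.getFileSystem().newWatchService()) {
                dir.register(watcher,
                        StandardWatchEventKinds.ENTRY_CREATE,
                        StandardWatchEventKinds.ENTRY_MODIFY,
                        StandardWatchEventKinds.ENTRY_DELETE);
                Files.createFile(dir.resolve(fileName)); // the change we expect to observe
                WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (key != null) {
                    for (WatchEvent<?> event : key.pollEvents()) {
                        seen.add(event.kind().name() + ":" + event.context());
                    }
                    key.reset(); // re-arm the key so later events would be queued again
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Delivery latency depends on the platform's watcher implementation, so the timeout should be generous. A long-running listener would loop on `watcher.take()` instead of polling once.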
Colt
Colt provides a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java.
| Feature | Description |
| --- | --- |
| Templated Lists and Maps | Dynamically resizing lists holding objects or primitive data types such as int, double, etc. Operations on primitive arrays, algorithms on Colt lists, and JAL algorithms can be freely mixed at zero copy overhead. Automatically growing and shrinking maps holding objects or primitive data types such as int, double, etc. Space-efficient, high-performance BitVectors and BitMatrices. |
| Templated Multi-dimensional matrices | Dense and sparse fixed-size (non-resizable) 1-, 2-, 3- and d-dimensional matrices holding objects or primitive data types such as int, double, etc. Also known as multi-dimensional arrays or Data Cubes. |
| Linear Algebra | Standard matrix operations and decompositions: LU, QR, Cholesky, Eigenvalue, Singular value. |
| Histogramming | Compact, extensible, modular and performant histogramming functionality. AIDA offers the histogramming features of HTL and HBOOK. |
| Mathematics | Tools for basic and advanced mathematics: Arithmetics and Algebra, Polynomials and Chebyshev series, Bessel and Airy functions, Constants and Units, Trigonometric functions, etc. |
| Statistics | Tools for basic and advanced statistics: Estimators, Gamma functions, Beta functions, Probabilities, Special integrals, etc. |
| Random Numbers and Random Sampling | Strong yet quick. Partly a port of CLHEP. |
| util.concurrent | Efficient utility classes commonly encountered in parallel and concurrent programming. |
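As an illustration of what one of the decompositions in the Linear Algebra row computes, here is a minimal Cholesky decomposition on plain `double[][]` arrays. It does not use Colt's classes or API. The name `CholeskySketch` is hypothetical, and the input is assumed to be a symmetric positive-definite matrix.

```java
/** Textbook Cholesky decomposition on plain arrays (not Colt's implementation). */
class CholeskySketch {
    /**
     * Returns the lower-triangular matrix L such that L * L^T equals a.
     * Assumes a is square and symmetric; throws if it is not positive definite.
     */
    static double[][] decompose(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                // subtract the contribution of the columns already computed
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum); // diagonal entry
                } else {
                    l[i][j] = sum / l[j][j];  // entry below the diagonal
                }
            }
        }
        return l;
    }
}
```

For the matrix `{{4, 12, -16}, {12, 37, -43}, {-16, -43, 98}}` the factor is `{{2, 0, 0}, {6, 1, 0}, {-8, 5, 3}}`. A library such as Colt is still the better choice for large or sparse matrices.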
Qt Jambi
Qt is the de facto standard C++ framework for high performance cross-platform software development. Qt Jambi is the Qt library made available to Java. It is an open source technology aimed at all desktop programmers wanting to write rich GUI clients using the Java language, while at the same time taking advantage of Qt’s power and efficiency.
The technology provides new possibilities for both Java and C++ programmers: It enables Java developers to take advantage of Qt’s features from within Java Standard Edition 5.0 and Java Enterprise Edition 5.0 as well as later versions. In addition, Qt Jambi also enables C++ programmers to easily integrate their Qt code with Java by providing the Qt Jambi generator.
For a more comprehensive description of what Qt Jambi provides, see here.
The new Qt Jambi website was released on 10.03.2012 after many delays. The old website can still be seen at http://old.qt-jambi.org.