Gensim – Topic Modelling for Humans
Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.
Some other packages from Blei's website. online lda, tmve(online) --Topic Model Visualization Engine,
Some other packages from Blei's website. online lda, tmve(online) --Topic Model Visualization Engine,
PyML – machine learning in Python
PyML is an interactive object oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods. It is supported on Linux and Mac OS X.
-----------------------------------------------------------------------------------------------
mlpy
mlpy is a Python module for Machine Learning built on top of NumPy/SciPy and the GNU Scientific Libraries.
mlpy provides a wide range of state-of-the-art machine learning methods for supervisedand unsupervised problems and it is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. mlpy is multiplatform, it works with Python 2 and 3 and it is Open Source, distributed under the GNU General Public License version 3.
scikit-learn: machine learning in Python
scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit scientific Python world (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts:machine-learning as a versatile tool for science and engineering.
NLTK Package
The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing. A free online book is available. (If you use the library for academic research, please cite the book.)
Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc. http://nltk.org/book
Theano
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:
- tight integration with numpy – Use numpy.ndarray in Theano-compiled functions.
- transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU.(float32 only)
- efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs.
- speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
- dynamic C code generation – Evaluate expressions faster.
- extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
Check out how Theano can be used for Machine Learning: Deep Learning Tutorials and
the source code here
the source code here
sluggerml
SluggerML- baseball stats! Using datasets from retrosheet.org and seanlahman.com/baseball-archive/statistics. Depends on scikit-learn and python-nltk (and reportlab if you're building reports). On Debian, requires the following to build the aforementioned: python-dev python-numpy python-scipy
MDP
The Modular toolkit for Data Processing (MDP) package is a library of widely used data processing algorithms, and the possibility to combine them together to form pipelines for building more complex data processing software.
From the user’s perspective, MDP consists of a collection of units, which process data. For example, these include algorithms for supervised and unsupervised learning, principal and independent components analysis and classification.
From the developer’s perspective, MDP is a framework that makes the implementation of new supervised and unsupervised learning algorithms easy and straightforward. The basic class, Node, takes care of tedious tasks like numerical type and dimensionality checking, leaving the developer free to concentrate on the implementation of the learning and execution phases.