Pattern for Red Hen

Introduction

Pattern is a particularly user-friendly NLP framework in the form of a Python module.

It has tools for data mining (APIs for Google, Twitter and Wikipedia, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.

This Python module is free, well-documented, and bundled with 50+ examples and 350+ unit tests.

The pattern.en module has many NLP tools, with counterparts for a few other languages. The pattern.web module has harvesting tools, and pattern.vector has machine learning tools. The pattern.graph module has network analysis and visualization tools, pattern.metrics has statistical functions, and canvas.js offers a rich set of advanced visualizations. Pattern supports both Python 2.7 and Python 3.6. The main developer, Tom de Smedt, is part of the Red Hen Lab community.

The combination of web harvesting and NLP tools is particularly useful. In addition, "pattern.vector has tools for machine learning, such as a bag-of-words Model that consists of Document objects, each with a Vector, clustering algorithms, classification algorithms (NaiveBayes, k-NN, SVM, LSA, neural networks) and functions for training and testing (e.g., chi-squared feature selection, grid search, confusion matrix, ...)."
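
To make the bag-of-words idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not Pattern's own implementation and does not call Pattern's API. The class names Document and Model echo the description quoted above, but their methods, the tokenize and cosine_similarity helpers, and the toy training sentences are all assumptions made for illustration. Each document becomes a vector of word counts, and a new text is labeled by a k-NN vote among the most similar training documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class Document:
    """Illustrative bag-of-words document: word counts plus an optional label."""

    def __init__(self, text, label=None):
        self.label = label
        self.vector = Counter(tokenize(text))  # maps word -> count


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * v2.get(word, 0) for word, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


class Model:
    """Illustrative model: a list of labeled documents with k-NN classification."""

    def __init__(self, documents):
        self.documents = list(documents)

    def classify(self, text, k=3):
        """Return the majority label among the k most similar documents."""
        query = Document(text)
        ranked = sorted(
            self.documents,
            key=lambda d: cosine_similarity(query.vector, d.vector),
            reverse=True,
        )
        votes = Counter(d.label for d in ranked[:k])
        return votes.most_common(1)[0][0]


# Toy training data, invented for this sketch.
model = Model([
    Document("the cat chased the mouse", label="animal"),
    Document("the dog barked at the cat", label="animal"),
    Document("a dog and a cat play", label="animal"),
    Document("the stock market fell sharply", label="finance"),
    Document("investors sold stock to buy bonds", label="finance"),
    Document("the bank raised interest rates", label="finance"),
])

print(model.classify("my cat and dog"))
print(model.classify("stock prices and interest rates"))
```

Raw word counts and cosine similarity keep the sketch short. A real pipeline would typically add weighting such as tf-idf, stop-word removal, and a held-out test set, which is the kind of functionality the pattern.vector description above refers to.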

For detailed examples and exercises, see https://www.clips.uantwerpen.be/pages/pattern