The principle of maximum entropy is a method for analyzing
available qualitative information in order to determine a unique
epistemic probability distribution. It states that the least biased
distribution that encodes certain given information is that which
maximizes the information entropy.
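As a minimal sketch of the idea, assume the only given information is that the probabilities sum to one over four outcomes. The example distributions and the helper name `entropy` below are illustrative assumptions, not part of any particular library. Under that single constraint the uniform distribution has the highest Shannon entropy.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Three candidate distributions over four outcomes (illustrative values).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

# Rank the candidates from highest to lowest entropy.
ranked = sorted(
    [("uniform", uniform), ("skewed", skewed), ("certain", certain)],
    key=lambda item: entropy(item[1]),
    reverse=True,
)
for name, dist in ranked:
    print(name, round(entropy(dist), 3))
```

With further constraints (for example a fixed mean), the maximizing distribution is generally no longer uniform; this sketch covers only the unconstrained case.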
Support vector machines (SVMs) are a set of related
supervised learning methods used for classification and regression.
They belong to a family of generalized linear classifiers. They can
also be considered a special case of Tikhonov regularization. A special
property of SVMs is that they simultaneously minimize the empirical
classification error and maximize the geometric margin; hence they are
also known as maximum margin classifiers.
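The two quantities mentioned above can be illustrated with a small sketch. It assumes a linear classifier with weights `w` and bias `b`, uses hinge loss as a common stand-in for the empirical classification error, and compares two hand-picked separating lines on toy data. All names and values are illustrative assumptions; no training is performed here.

```python
import math

def score(w, b, x):
    """Signed output of the linear classifier w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the separating hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * score(w, b, x) / norm for x, y in zip(points, labels))

def hinge_loss(w, b, points, labels):
    """Average hinge loss: zero when every point satisfies y * score >= 1."""
    losses = [max(0.0, 1.0 - y * score(w, b, x)) for x, y in zip(points, labels)]
    return sum(losses) / len(losses)

# Toy, linearly separable data (illustrative values).
points = [(2, 2), (3, 3), (0, 0), (-1, 0)]
labels = [1, 1, -1, -1]

# Two candidate separators; both classify every point correctly.
diagonal = ((1.0, 1.0), -2.0)
vertical = ((1.0, 0.0), -1.0)

for name, (w, b) in [("diagonal", diagonal), ("vertical", vertical)]:
    print(name, hinge_loss(w, b, points, labels),
          round(geometric_margin(w, b, points, labels), 3))
```

Both separators have zero hinge loss on this data, so a maximum margin classifier would prefer the one with the larger geometric margin.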
MALLET is a Java-based package for statistical natural
language processing, document classification, clustering, topic
modeling, information extraction, and other machine learning
applications to text.
Weka is a collection of machine learning algorithms for data
mining tasks. The algorithms can either be applied directly to a
dataset or called from your own Java code. Weka contains tools for data
pre-processing, classification, regression, clustering, association
rules, and visualization. It is also well-suited for developing new
machine learning schemes.
OpenNLP is an organizational center for open source projects
related to natural language processing. Its primary role is to
encourage and facilitate the collaboration of researchers and
developers on such projects.
GATE is...
* the Eclipse of Natural Language Engineering, the Lucene of Information Extraction, a leading toolkit for Text Mining
* used worldwide by thousands of scientists, companies, teachers and students
* comprised of an architecture, a free open source framework (or SDK) and graphical development environment
* used for all sorts of language processing tasks, including Information Extraction in many languages
* funded by the EPSRC, BBSRC, AHRC, the EU and commercial users
* 100% Java reference implementation of ISO TC37/SC4 and used with XCES in the ANC
* 10 years old in 2005, used in many research projects and compatible with IBM's UIMA
* based on MVC, mobile code, continuous integration, and test-driven development, with code hosted on SourceForge
WordNet® is a large lexical database of English, developed
under the direction of George A. Miller. Nouns, verbs, adjectives and
adverbs are grouped into sets of cognitive synonyms (synsets), each
expressing a distinct concept. Synsets are interlinked by means of
conceptual-semantic and lexical relations. The resulting network of
meaningfully related words and concepts can be navigated with the
browser. WordNet is also freely and publicly available for download.
WordNet's structure makes it a useful tool for computational
linguistics and natural language processing.
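The synset structure described above can be sketched as a small graph. The entries below are toy data invented for illustration, not actual WordNet records or identifiers, and only one relation type (hypernym, meaning "is a kind of") is modeled.

```python
# Toy synsets: each groups words that express one concept and links to a
# more general concept. Illustrative only, not real WordNet data.
synsets = {
    "dog.n": {"words": ["dog", "domestic dog"], "hypernym": "canine.n"},
    "canine.n": {"words": ["canine", "canid"], "hypernym": "animal.n"},
    "animal.n": {"words": ["animal", "creature"], "hypernym": None},
}

def hypernym_path(synset_id):
    """Follow hypernym links from a synset up to the most general concept."""
    path = []
    while synset_id is not None:
        path.append(synset_id)
        synset_id = synsets[synset_id]["hypernym"]
    return path

def synonyms(word):
    """Return the words that share a synset with the given word."""
    found = set()
    for entry in synsets.values():
        if word in entry["words"]:
            found.update(w for w in entry["words"] if w != word)
    return sorted(found)

print(hypernym_path("dog.n"))
print(synonyms("canine"))
```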
The Apache Lucene project develops open-source search software, including:
* Lucene Java, our flagship sub-project, provides Java-based indexing and search technology.
* Nutch builds on Lucene Java to provide web search application software.
* Lucy is a loose C port of Lucene Java, with Perl and Ruby bindings.
* Solr is a high performance search server built using Lucene Java, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
* Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Lucene Java search engine to the C# and .NET platform utilizing Microsoft .NET Framework. Lucene.Net is currently under incubation.
* Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika is currently under incubation.
* Mahout is a new Lucene subproject with the goal of creating a suite of scalable machine learning libraries.
The Lemur Toolkit is an open-source toolkit designed to
facilitate research in language modeling and information retrieval.
Lemur supports a wide range of industrial and research language
applications such as ad-hoc retrieval, site-search, and text mining.
Commons Math is a library of lightweight, self-contained
mathematics and statistics components addressing the most common
problems not available in the Java programming language or Commons
Lang.
JBoss Seam is a powerful new application framework for
building next generation Web 2.0 applications by unifying and
integrating technologies such as Asynchronous JavaScript and XML
(AJAX), Java Server Faces (JSF), Enterprise Java Beans (EJB3), Java
Portlets and Business Process Management (BPM).
The NLM Unified Medical Language System (UMLS) project
develops and distributes multi-purpose, electronic "Knowledge Sources"
and associated lexical tools for system developers. Researchers will
find the UMLS products useful in investigating knowledge representation
and retrieval questions.
PubMed is a service of the U.S. National Library of Medicine
that includes over 17 million citations from MEDLINE and other life
science journals for biomedical articles back to the 1950s. PubMed
includes links to full text articles and other related resources.
The project I am working on. It can answer ad-hoc clinical
questions. You can enter words, a phrase, or a question to ask it.
Answers are automatically organized and ranked in a structured way.