Sentence segmentation, tokenization (normalization, americanize, etc.)
A lemma is a word that stands at the head of a definition in a dictionary. ... A lexeme is a unit of meaning, and can be more than one word: it is the set of all forms that share the same meaning, while the lemma is the particular form chosen by convention to represent the lexeme.
Word segmentation is needed for languages such as Chinese and Arabic, where tokens are not simply whitespace-delimited.
Part-of-speech (POS) tagger, assigns parts of speech and other token labels to each word (noun, verb, plural noun, etc.)
Named entity recognizer (NER), extracts named entities (PERSON, ORGANIZATION, LOCATION)
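A minimal sketch of these stages, assuming the Stanford CoreNLP Java API (which matches the annotators described here); the class name and sample sentence are illustrative:

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.Properties;

public class PipelineDemo {
  public static void main(String[] args) {
    // Enable the annotators discussed above: tokenizer, sentence
    // splitter, POS tagger, lemmatizer, and named entity recognizer.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    CoreDocument doc = new CoreDocument("Joe Smith moved to Paris. He works at IBM.");
    pipeline.annotate(doc);

    for (CoreSentence sentence : doc.sentences()) {
      System.out.println("Sentence: " + sentence.text());
      for (CoreLabel tok : sentence.tokens()) {
        // word, POS tag (e.g. NNP), lemma (e.g. "move" for "moved"),
        // and NER label (e.g. PERSON, LOCATION, ORGANIZATION, or O)
        System.out.printf("%-8s %-5s %-8s %s%n",
            tok.word(), tok.tag(), tok.lemma(), tok.ner());
      }
    }
  }
}
```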
Looks like a grammatical parser: it works out the grammatical structure of sentences (constituency and dependency parses).
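A minimal sketch, again assuming CoreNLP's parse annotator; it prints both views of the structure for an illustrative sentence:

```java
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.Properties;

public class ParseDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    CoreDocument doc = new CoreDocument("The quick brown fox jumps over the lazy dog.");
    pipeline.annotate(doc);

    CoreSentence sentence = doc.sentences().get(0);
    // Phrase-structure tree, e.g. (ROOT (S (NP ...) (VP ...)))
    System.out.println(sentence.constituencyParse());
    // Typed dependencies, e.g. nsubj(jumps, fox), det(fox, The)
    System.out.println(sentence.dependencyParse());
  }
}
```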
Coreference resolution: resolves "he", "she", "it", "his", etc. to the entities they refer to.
Deterministic: fast, rule-based; available for English and Chinese
Statistical: machine-learning based; English only
Neural: the most accurate but slow; English and Chinese
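A sketch of switching between these three systems, assuming CoreNLP's coref.algorithm property; the sample text is illustrative:

```java
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.Properties;

public class CorefDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,coref");
    // Choose among the three systems above:
    // "deterministic", "statistical", or "neural".
    props.setProperty("coref.algorithm", "neural");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    CoreDocument doc = new CoreDocument(
        "Barack Obama was born in Hawaii. He was elected president in 2008.");
    pipeline.annotate(doc);

    // Each chain groups the mentions that refer to the same entity,
    // e.g. {"Barack Obama", "He"}.
    for (CorefChain chain : doc.corefChains().values()) {
      System.out.println(chain);
    }
  }
}
```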
Classification, e.g. email -> spam/normal.
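To make the classification idea concrete, here is a toy multinomial Naive Bayes sketch; it is hand-rolled for illustration, not any particular library's API, and the tiny training examples are invented:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy spam/normal classifier: count word frequencies per class, then
// pick the class with the highest log-probability. A real system would
// use a proper library and a large corpus.
public class SpamDemo {
  static Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
  static Map<String, Integer> docCounts = new HashMap<>();
  static Map<String, Integer> totalWords = new HashMap<>();
  static Set<String> vocab = new HashSet<>();

  static void train(String label, String text) {
    docCounts.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String w : text.toLowerCase().split("\\s+")) {
      counts.merge(w, 1, Integer::sum);
      totalWords.merge(label, 1, Integer::sum);
      vocab.add(w);
    }
  }

  static String classify(String text) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    int totalDocs = docCounts.values().stream().mapToInt(Integer::intValue).sum();
    for (String label : docCounts.keySet()) {
      double score = Math.log((double) docCounts.get(label) / totalDocs);
      for (String w : text.toLowerCase().split("\\s+")) {
        int count = wordCounts.get(label).getOrDefault(w, 0);
        // Laplace smoothing so unseen words do not zero out a class.
        score += Math.log((count + 1.0) / (totalWords.get(label) + vocab.size()));
      }
      if (score > bestScore) { bestScore = score; best = label; }
    }
    return best;
  }

  public static void main(String[] args) {
    train("spam",   "win money now claim your free prize");
    train("normal", "meeting notes attached see you tomorrow");
    System.out.println(classify("claim your prize now")); // -> spam
  }
}
```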
Input: seed sets (dictionaries) of entities for each class, plus unlabeled text
Output: more entities belonging to those classes, extracted from the text
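A toy single-round bootstrapping sketch of this seed-driven extraction; it is hand-rolled for illustration (not the actual pattern-learning API), and the seeds and text are invented:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Learn contexts around seed entities, then harvest new words that
// occur in the same contexts in the unlabeled text.
public class BootstrapDemo {
  public static void main(String[] args) {
    Set<String> seeds = new HashSet<>(Arrays.asList("aspirin", "ibuprofen"));
    String text = "The doctor prescribed aspirin today. The doctor prescribed ibuprofen today. "
                + "The doctor prescribed paracetamol today.";

    // 1. Learn patterns: the word immediately before and after each seed.
    Set<String> patterns = new HashSet<>();
    String[] tokens = text.replaceAll("[.]", "").split("\\s+");
    for (int i = 1; i < tokens.length - 1; i++) {
      if (seeds.contains(tokens[i])) {
        patterns.add(tokens[i - 1] + " _ " + tokens[i + 1]);
      }
    }

    // 2. Apply patterns: any new word in the same slot becomes a candidate.
    Set<String> extracted = new HashSet<>();
    for (int i = 1; i < tokens.length - 1; i++) {
      if (patterns.contains(tokens[i - 1] + " _ " + tokens[i + 1]) && !seeds.contains(tokens[i])) {
        extracted.add(tokens[i]);
      }
    }
    System.out.println(extracted); // -> [paracetamol]
  }
}
```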
Extraction of relation tuples (typically binary relations) from plain text, with no need to specify a schema in advance.
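A minimal sketch, assuming CoreNLP's OpenIE annotator; the sentence is illustrative:

```java
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.Properties;

public class OpenIEDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("Obama was born in Hawaii.");
    pipeline.annotate(doc);

    // Each triple is (subject, relation, object); no schema was given.
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (RelationTriple triple : sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
        System.out.println(triple.subjectGloss() + "\t"
            + triple.relationGloss() + "\t" + triple.objectGloss());
      }
    }
  }
}
```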
https://drops.dagstuhl.de/opus/volltexte/2016/6008/pdf/OASIcs-SLATE-2016-3.pdf
OpenNLP appears to have the best results among NLTK, OpenNLP, CoreNLP, and Pattern.
However, OpenNLP is language-agnostic: it is a tool for training NLP models, so we would need to train our own model, and it is not necessarily easy to get a good one for Chinese. Training appears to require tagged data (see the sketch below).
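A sketch of what that training looks like, assuming OpenNLP's name-finder API; person.train is a hypothetical corpus file in OpenNLP's tagged format:

```java
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

import java.io.File;
import java.nio.charset.StandardCharsets;

public class TrainNerDemo {
  public static void main(String[] args) throws Exception {
    // person.train holds tagged sentences, e.g.:
    //   <START:person> Pierre Vinken <END> , 61 years old , will join ...
    InputStreamFactory in = new MarkableFileInputStreamFactory(new File("person.train"));
    ObjectStream<NameSample> samples =
        new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8));

    // Training needs this tagged data; there is no pre-built model here.
    TokenNameFinderModel model = NameFinderME.train(
        "en", "person", samples, TrainingParameters.defaultParams(),
        new TokenNameFinderFactory());

    model.serialize(new File("person.bin"));
  }
}
```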
Second place seems to be AWS Comprehend:
https://aws.amazon.com/comprehend/features/
It lacks some desired features, such as a parser.
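A minimal sketch of calling Comprehend, assuming the AWS SDK for Java v2; credentials come from the default provider chain, and the sample text is a placeholder:

```java
import software.amazon.awssdk.services.comprehend.ComprehendClient;
import software.amazon.awssdk.services.comprehend.model.DetectEntitiesRequest;
import software.amazon.awssdk.services.comprehend.model.DetectEntitiesResponse;

public class ComprehendDemo {
  public static void main(String[] args) {
    try (ComprehendClient client = ComprehendClient.create()) {
      DetectEntitiesRequest request = DetectEntitiesRequest.builder()
          .text("Jeff moved to Seattle in 2010.")
          .languageCode("en")
          .build();
      DetectEntitiesResponse response = client.detectEntities(request);
      // Entities, key phrases, sentiment, etc. are supported;
      // full syntactic parsing is not.
      response.entities().forEach(e ->
          System.out.println(e.text() + " -> " + e.typeAsString()));
    }
  }
}
```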
https://cloud.ibm.com/apidocs/natural-language-understanding#text-analytics-features
Here too, the desired features are not complete.
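For comparison, a sketch of calling the NLU analyze endpoint, assuming the IBM Watson Java SDK; the API key, service URL, and text are placeholders:

```java
import com.ibm.cloud.sdk.core.security.IamAuthenticator;
import com.ibm.watson.natural_language_understanding.v1.NaturalLanguageUnderstanding;
import com.ibm.watson.natural_language_understanding.v1.model.AnalysisResults;
import com.ibm.watson.natural_language_understanding.v1.model.AnalyzeOptions;
import com.ibm.watson.natural_language_understanding.v1.model.EntitiesOptions;
import com.ibm.watson.natural_language_understanding.v1.model.Features;

public class WatsonNluDemo {
  public static void main(String[] args) {
    IamAuthenticator authenticator = new IamAuthenticator("apikey");
    NaturalLanguageUnderstanding service =
        new NaturalLanguageUnderstanding("2021-08-01", authenticator);
    service.setServiceUrl(
        "https://api.us-south.natural-language-understanding.watson.cloud.ibm.com");

    // Request only entity extraction; other features (keywords,
    // sentiment, syntax, ...) are toggled the same way.
    Features features = new Features.Builder()
        .entities(new EntitiesOptions.Builder().limit(10).build())
        .build();
    AnalyzeOptions options = new AnalyzeOptions.Builder()
        .text("IBM is headquartered in Armonk, New York.")
        .features(features)
        .build();

    AnalysisResults results = service.analyze(options).execute().getResult();
    System.out.println(results);
  }
}
```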