When AI researchers first started working with text, there were many challenges to overcome, and I would like to discuss some of them in this post.
Probably all of those challenges could be summarized in one big, general one: how to make AI "understand" text, learn from it, and then perform specific tasks.
Stated like that, the challenge seems pretty general, since language is how we communicate ideas.
This was a big challenge that now seems very close to being solved (some think it already is). But independently of the answer to that question, what I will try to do here is a philosophical analysis, from a philosophy-of-science point of view, of the implications behind this whole set of NLP challenges.
At the beginning, text was thought to be similar to numbers, where there is an evident correspondence with reality. As Penrose explains in his book The Road to Reality (a book for physicists, I have to say), numbers lie naturally in this correspondence with physical reality. Think, for example, of the Pythagorean theorem: the hypotenuse of a physical right triangle stands in an exact relation to its legs, and arithmetic can prove it. That is to say, if you apply the mathematical method to measurements of the triangle, you will verify how accurate it is in the physical world.
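To make that correspondence concrete, take the standard 3-4-5 right triangle (my example, not Penrose's): the theorem says c² = a² + b², so c = √(3² + 4²) = √25 = 5. Draw the triangle with legs of 3 and 4 units, measure the hypotenuse, and you get 5 units: the arithmetic and the physical measurement agree.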
Text could be seen in almost the same way: what are words, if not a description (a verbose one, but a description at least) of the physical world? So one of the first problems was to find a representation of words as numbers, and then to find the mathematical methods to manipulate that representation in order to simulate (or emulate) understanding and produce an output.
One of the first attempts, of limited success, was distributional semantics, which tries to describe the meaning of a word with a vector representation, taking into account its distributional evidence rather than looking up its definition in a dictionary. The idea behind this approach is that words with similar meanings co-occur in similar contexts. One advantage of this approach is that it lets us understand and monitor the semantic evolution of words across time and domains, a problem known as lexical semantic change.
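Here is a minimal sketch of that distributional idea. The toy corpus and the window size are my own illustrative choices, not taken from any real dataset or library:

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical sentences, just for illustration).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

window = 1  # words at distance <= 1 count as co-occurring

# Co-occurrence table: cooc[w][c] = how often context word c
# appears within the window around target word w.
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[target][tokens[j]] += 1

print(cooc["cat"])  # Counter({'the': 2, 'drinks': 1, 'chases': 1})
print(cooc["dog"])  # Counter({'the': 2, 'drinks': 1})
```

Note that "cat" and "dog" end up with overlapping context counts (both co-occur with "the" and "drinks"): that overlap is exactly the distributional signal the approach relies on.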
To put the distributional semantics approach into practice, the bag-of-words representation was invented, which encodes text as a binary vector (1 if a word appears in a context, 0 if not). This representation is known as the Vector Space Model, because each word is represented as a vector.
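A minimal sketch of that binary bag-of-words encoding follows; the corpus and function names are illustrative, not from any particular library:

```python
# Toy documents (hypothetical, for illustration only).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
]

# Vocabulary: one dimension per distinct word, in a fixed order.
vocabulary = sorted({word for doc in corpus for word in doc.split()})

def to_vector(text):
    """Binary bag-of-words: 1 if the word appears in the text, 0 otherwise."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocabulary]

print(vocabulary)
# ['cat', 'dog', 'drinks', 'milk', 'the', 'water']
print(to_vector("the cat drinks milk"))
# [1, 0, 1, 1, 1, 0]
```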
Additional NLP problems were approached in the same way, namely text classification, word similarity, semantic relation extraction, word-sense disambiguation, and so on.
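Word similarity, for instance, reduces to comparing vectors once words live in a vector space; a common choice is cosine similarity. A small sketch, with hypothetical context-count vectors standing in for real data:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical context counts for "cat" and "dog" over the
# contexts ['the', 'drinks', 'chases', 'milk'] (made-up numbers).
cat = [2, 1, 1, 1]
dog = [2, 1, 1, 0]
print(cosine_similarity(cat, dog))  # ~0.93: similar contexts, similar meaning
```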