Concept:
The idea behind this project is to write a program that learns any new language by reading text from the web or from a collection of text files. To start the process, a set of known keywords in the target language is entered and searched for on the web (e.g. just google them): "English: house, trouble, circumstances, globe, glasses, running".
The full text of every matching site is then scanned. During this scan, a large tree of words is built; this tree will also have some automaton-like properties.
Not only the number of appearances of each word is recorded, but also its most important connections to other words in the tree.
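A minimal sketch of such a structure in Python (the class name `WordGraph`, the tokenizer, and the choice to link each word to its immediate predecessor are illustrative assumptions, not part of the original design):

```python
import re
from collections import Counter, defaultdict

class WordGraph:
    """Per-word appearance counts plus weighted links between
    words that occur next to each other in the scanned text."""

    def __init__(self):
        self.counts = Counter()            # word -> number of appearances
        self.links = defaultdict(Counter)  # word -> {neighbour word: strength}

    def scan(self, text):
        words = re.findall(r"[a-zA-Z']+", text.lower())
        for i, word in enumerate(words):
            self.counts[word] += 1
            if i > 0:                      # strengthen the link from the previous word
                self.links[words[i - 1]][word] += 1

graph = WordGraph()
graph.scan("the house near the globe and the house")
# graph.counts["the"] is 3, and the link the -> house has strength 2
```

Linking only adjacent words is the simplest choice; a sliding window of two or three words would capture "most important connections" more broadly at the cost of a bigger structure.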
There should also be a clean-up pass, triggered after a number of scanning cycles, that removes words with very few appearances as well as words with few connections to other words.
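The clean-up pass could look like the following sketch, operating on plain count and link dictionaries (the threshold values are illustrative assumptions):

```python
def prune(counts, links, min_count=2, min_links=1):
    """Remove rare or poorly connected words.
    counts: word -> appearances; links: word -> {neighbour: strength}."""
    doomed = {w for w in counts
              if counts[w] < min_count or len(links.get(w, ())) < min_links}
    for w in doomed:
        del counts[w]            # drop the word itself
        links.pop(w, None)       # drop its outgoing links
    for nbrs in links.values():  # drop incoming links to removed words
        for w in list(nbrs):
            if w in doomed:
                del nbrs[w]
    return doomed

counts = {"house": 5, "globe": 3, "xyzzq": 1}
links = {"house": {"globe": 2}, "globe": {"house": 2}}
prune(counts, links)  # "xyzzq" is removed: seen once, no connections
```

Incoming links must be cleaned as well as outgoing ones, otherwise deleted words would linger as dangling neighbours in the surviving entries.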
Status:
In the early '90s I started with a program, written in Borland Turbo C, that processed text files, extracting the words and counting them. A first concept of the tree structure was designed at that time.
In the year 2015 I wrote a simple program in Visual C++ to read text from the web.
Since then, nothing else has been done... so the project is "on hold".
What could this tool do?
- learn any language that is available on the web
- check spelling of words
- check syntax of sentences
- help with semantics
- etc...
Problems:
- processing and storing such a big tree efficiently...
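One pragmatic option for the storage problem is to keep each word's record in an on-disk key-value store instead of RAM. A sketch using Python's standard-library `shelve` module (an assumption for illustration, not the project's chosen approach):

```python
import os
import shelve
import tempfile

# Each word maps to one on-disk record holding its count and its
# neighbour links, so the whole tree never has to fit in memory.
path = os.path.join(tempfile.mkdtemp(), "wordgraph")

with shelve.open(path) as db:
    record = db.get("house", {"count": 0, "links": {}})
    record["count"] += 1
    record["links"]["globe"] = record["links"].get("globe", 0) + 1
    db["house"] = record  # write back; shelve does not track in-place mutations

# Reopening the store shows the record survived the process boundary.
with shelve.open(path) as db:
    stored = dict(db["house"])
```

For a truly large tree, the same word-to-record layout would carry over to an embedded database such as SQLite with little change to the surrounding code.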