A process to organize the information: the Ece nodes

Morphosyntactic Linguistic Wavelets (MLW)

Wavelets are mathematical tools used to decompose (transform) data into components (coefficients) that describe different levels of detail. MLW can be seen as an extension of traditional wavelets, because the two share several characteristics:

Table 1. Traditional wavelets vs. MLW characteristics.

We use MLW to decompose texts in order to classify and organize their useful portions. The steps to bring a text into an MLW are:

1. Take the original text sample
2. Translate text into an oriented graph (called Eci) preserving most morphosyntactic properties
3. Apply filtering using the most suitable approach
4. If the abstraction granularity and level of detail are insufficient for the current problem, insert a new node, called Ece, in the knowledge organization
5. Repeat from step 3
6. Take the resulting sequence of filters as the current representation of the knowledge about, and the ontology of, the text
7. Take the resulting Eci as the internal representation of the new text event
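
The steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the actual MLW implementation: the `Eci` class, the token-to-next-token graph, the two toy filters (`keep_unique`, `drop_short`) and the `is_sufficient` test are all hypothetical stand-ins for the morphosyntactic processing, which is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Eci:
    """Hypothetical stand-in for an Eci: tokens plus directed edges."""
    tokens: list
    edges: set = field(default_factory=set)


def text_to_eci(text):
    # Step 2 (simplified assumption): link each token to the next one.
    tokens = text.lower().split()
    edges = {(i, i + 1) for i in range(len(tokens) - 1)}
    return Eci(tokens, edges)


def keep_unique(tokens):
    # Toy filter: drop repeated tokens, keeping first occurrences.
    return list(dict.fromkeys(tokens))


def drop_short(tokens):
    # Toy filter: drop tokens of three characters or fewer.
    return [t for t in tokens if len(t) > 3]


def decompose(text, filters, is_sufficient):
    eci = text_to_eci(text)              # steps 1-2
    current = eci.tokens
    applied, ece_nodes = [], []
    for flt in filters:
        current = flt(current)           # step 3: apply a filter
        applied.append(flt.__name__)
        if is_sufficient(current):
            break
        ece_nodes.append(list(current))  # step 4: insert a new Ece node
        # step 5: the loop goes back to step 3 with the next filter
    return eci, applied, ece_nodes       # steps 6-7


eci, applied, ece_nodes = decompose(
    "the cat sat on the mat with the cat",
    [keep_unique, drop_short],
    is_sufficient=lambda tokens: len(tokens) <= 3,
)
print(applied, len(ece_nodes), len(eci.edges))
```

In this sketch the list `applied` plays the role of the filtering sequence kept in step 6, and `eci` is the internal representation kept in step 7.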

The information is organized and gradually inserted in successive Ece nodes, which grow from the specific associations of similarities automatically found in the Ecis. In this way the nodes branch out or collapse according to the evolution of the process. Every new text triggers a tuning of the organization that tends to have minimal impact in the long term. The figure below shows an example.

Every Ece must be considered a set of Ecis. In the picture, an Eci is an element placed in one of four areas inside Ece1. Every area has a typical Eci, the centroid, which shares certain characteristics with its neighborhood. For instance, Eci-33 belongs to the area (or cluster) led by Eci-32 (the centroid). The similarities are guaranteed by the filtering process (the process by which all the Ecis are compared to each other and evaluated as similar or not). Note, however, that the filtering can vary according to the evolution of the node.
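
As a minimal sketch of how an Eci could be attached to the cluster of its most similar centroid, the snippet below compares sets of features. The Jaccard similarity, the feature names and the centroid `Eci-10` are assumptions made for illustration only; the real comparison is carried out by the filtering process.

```python
def jaccard(a, b):
    """Assumed similarity: shared features over all features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_centroid(eci, centroids):
    """Return the name of the centroid most similar to the given Eci."""
    return max(centroids, key=lambda name: jaccard(eci, centroids[name]))


# Hypothetical feature sets for two centroids inside Ece1.
centroids = {
    "Eci-32": {"noun", "verb", "adjective"},
    "Eci-10": {"adverb", "preposition"},
}
eci_33 = {"noun", "verb"}

leader = nearest_centroid(eci_33, centroids)
print(leader)
```

Here `eci_33` shares two of three features with `Eci-32` and none with `Eci-10`, so it joins the cluster led by `Eci-32`.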

As the model follows the steps described previously, more text is fed to it and more Ecis are inserted. At some point a cluster holds too many Ecis. At that moment it is broken down into pieces: one or more descendant Eces are created, each holding a cohesive subset of the Ecis. The whole organization relies on a similarity distance, which is described in the Testing section.
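
The splitting of an overcrowded cluster could look roughly like the sketch below. The capacity limit `max_size`, the Jaccard similarity and the two-seed strategy are assumptions chosen to keep the example short; they are not the similarity distance described in the Testing section.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity between two Ecis seen as feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_cluster(ecis):
    """Split a cluster in two around its two least similar members."""
    i, j = min(
        combinations(range(len(ecis)), 2),
        key=lambda pair: jaccard(ecis[pair[0]], ecis[pair[1]]),
    )
    seed_a, seed_b = ecis[i], ecis[j]
    group_a, group_b = [], []
    for eci in ecis:
        if jaccard(eci, seed_a) >= jaccard(eci, seed_b):
            group_a.append(eci)
        else:
            group_b.append(eci)
    return [group_a, group_b]


def insert(cluster, eci, max_size=3):
    """Add an Eci; return the descendant clusters if the limit is exceeded."""
    cluster = cluster + [eci]
    if len(cluster) > max_size:
        return split_cluster(cluster)
    return [cluster]


cluster = [frozenset("ab"), frozenset("abc"), frozenset("xy")]
descendants = insert(cluster, frozenset("xyz"))
print(descendants)
```

With the fourth Eci the cluster exceeds the assumed limit of three and is divided into two descendants, each of which keeps the Ecis that resemble one another.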

After many insertions the nodes are expanded and the organization becomes larger, forming a complex structure.

The nodes have many branches, and each Eci structure belongs to just one node at a time. There is an association between node position and meaning, because the upper-level Ece results from the promotion of the centroids of the node. Child nodes behave in the same way, resulting in a top-down organization in which common characteristics dominate the arrangement of the structure.
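
One possible reading of the promotion of centroids is sketched below. It assumes that the centroid of a cluster is its medoid (the member with the highest total similarity to the other members) and that similarity is the Jaccard index over feature sets; both choices are illustrative assumptions.

```python
def jaccard(a, b):
    """Assumed similarity between two Ecis seen as feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def medoid(ecis):
    """Return the member most similar, in total, to the other members."""
    return max(
        ecis,
        key=lambda e: sum(jaccard(e, other) for other in ecis if other is not e),
    )


def promote(child_clusters):
    """Build the upper-level content from one centroid per child cluster."""
    return [medoid(cluster) for cluster in child_clusters]


children = [
    [frozenset("ab"), frozenset("abc"), frozenset("abcd")],
    [frozenset("xy")],
]
upper_level = promote(children)
print(upper_level)
```

The upper level then holds one representative per child cluster, which is why the position of a node reflects the characteristics shared by everything below it.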

The picture shows a portion of the global organization of the information from a macro perspective. Grey boxes indicate Ece nodes with branches (that is, with child nodes), and black boxes indicate terminal nodes. A terminal node may hold one or more Eci structures.
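
The organization described above can be modeled as a simple tree. The sketch below is an assumption about a convenient data structure, with hypothetical node and Eci names; it only shows how branching and terminal nodes can be told apart.

```python
from dataclasses import dataclass, field


@dataclass
class EceNode:
    """Hypothetical Ece node holding Ecis and, optionally, child nodes."""
    name: str
    ecis: list = field(default_factory=list)
    children: list = field(default_factory=list)

    @property
    def is_terminal(self):
        # A terminal node (black box in the picture) has no children.
        return not self.children


def terminal_nodes(node):
    """Collect the terminal nodes below a node, left to right."""
    if node.is_terminal:
        return [node]
    found = []
    for child in node.children:
        found.extend(terminal_nodes(child))
    return found


root = EceNode(
    "Ece1",
    children=[
        EceNode("Ece2", ecis=["Eci-1", "Eci-2"]),
        EceNode("Ece3", children=[EceNode("Ece4", ecis=["Eci-3"])]),
    ],
)
leaf_names = [node.name for node in terminal_nodes(root)]
print(leaf_names)
```

In this example `Ece1` and `Ece3` are branching nodes, while `Ece2` and `Ece4` are terminal nodes holding two and one Eci structures respectively.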
