On this page, we will use an ML model to classify our documents as reviews of either Chinese or Italian restaurants, based on the terms each document contains.
It builds on the workflow from the previous page; here we will focus on the nodes in the red box.
With the Color Manager node, we assign a custom color to each of the restaurant categories (Italian or Chinese).
With the Column Filter node, we remove the Document column. This leaves us with a table containing only the restaurant category column and the Term columns. The Term columns contain only 1s and 0s, which makes the table suitable as input for the ML model.
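To make the shape of this table concrete, here is a minimal Python sketch of a binary term table built outside KNIME. The sample reviews, the vocabulary, and the helper name `to_binary_row` are illustrative assumptions and are not taken from the workflow.

```python
# Hypothetical sample data; the real workflow uses the preprocessed reviews.
reviews = [
    ("Italian", "great pizza and fresh pasta"),
    ("Chinese", "tasty noodles and crispy dumplings"),
]

# Hypothetical vocabulary; each term becomes one Term column.
vocabulary = ["pizza", "pasta", "noodles", "dumplings"]


def to_binary_row(category, text, terms):
    """Return the category followed by one 1/0 flag per term."""
    words = set(text.lower().split())
    return [category] + [1 if term in words else 0 for term in terms]


table = [to_binary_row(category, text, vocabulary) for category, text in reviews]

for row in table:
    print(row)
```

Each row holds the category followed by one flag per term: 1 if the term occurs in the review, 0 otherwise.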
With the Table Partitioner node, we split our dataset into a Training set and a Test set.
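The idea behind this split can be sketched in a few lines of Python. This is only an illustration: the 70/30 ratio, the fixed seed, and the function name `partition` are assumptions, since the page does not state which settings the node uses.

```python
import random


def partition(rows, train_percent=70, seed=42):
    """Shuffle a copy of rows and split it into (training, test) lists.

    train_percent and seed are assumed values, not the workflow's settings.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten table rows
training, test = partition(rows)
print(len(training), len(test))
```

Shuffling before cutting keeps the two sets from simply reflecting the original row order, and every row ends up in exactly one of the two sets.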
We then execute the Decision Tree node on the Training set.
Now that our Model is trained, we apply it to the Test set with the Decision Tree Predictor node.
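For readers curious about what training and prediction involve, the following is a minimal sketch of a decision tree on binary term columns. It is not KNIME's implementation: the Gini impurity split criterion, the depth limit, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


def train(rows, labels, depth=3):
    """Build a tree: a leaf is a label, a node is (column, if_zero, if_one)."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for col in range(len(rows[0])):
        zero = [lab for row, lab in zip(rows, labels) if row[col] == 0]
        one = [lab for row, lab in zip(rows, labels) if row[col] == 1]
        if not zero or not one:
            continue  # this column does not separate the rows
        score = (len(zero) * gini(zero) + len(one) * gini(one)) / len(labels)
        if best is None or score < best[0]:
            best = (score, col)
    if best is None:
        return majority(labels)
    col = best[1]
    zero = [(row, lab) for row, lab in zip(rows, labels) if row[col] == 0]
    one = [(row, lab) for row, lab in zip(rows, labels) if row[col] == 1]
    return (
        col,
        train([r for r, _ in zero], [lab for _, lab in zero], depth - 1),
        train([r for r, _ in one], [lab for _, lab in one], depth - 1),
    )


def predict(tree, row):
    """Follow the branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        col, if_zero, if_one = tree
        tree = if_one if row[col] == 1 else if_zero
    return tree


# Hypothetical term columns: pizza, pasta, noodles, dumplings
train_rows = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
train_labels = ["Italian", "Italian", "Chinese", "Chinese"]
test_rows = [[1, 0, 0, 0], [0, 0, 0, 1]]

tree = train(train_rows, train_labels)
predictions = [predict(tree, row) for row in test_rows]
print(predictions)
```

The training step plays the role of the learner, and `predict` plays the role of the predictor: it only reads the finished tree and never sees the training labels.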
Finally, with the Scorer node, we evaluate how our Model performed. It classified 105 reviews correctly and 14 incorrectly, giving an accuracy of 88.2%.
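The percentage is simply the share of correctly classified reviews among all reviews in the Test set. A short Python check of the arithmetic, using the two counts reported above (the variable names are illustrative):

```python
correct = 105
incorrect = 14

# Share of correct predictions among all test reviews
accuracy = correct / (correct + incorrect)
print(f"{accuracy:.1%}")
```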
On the next page, we will see an example of how to create a Tag Cloud visualization.