Ryan used SAS Enterprise Miner to apply text modelling and unsupervised learning techniques to an unstructured dataset. The former comprises text parsing, text filtering, and text transformation, while the latter includes techniques such as clustering and topic (or concept) extraction.
Clustering is an exploratory technique that discovers natural groupings in data. In cluster analysis, homogeneous objects are grouped into the same cluster and heterogeneous objects into different clusters. Topic extraction identifies and creates topics of interest based on the co-occurrence of terms in the documents. The modelling steps support both techniques by transforming the unstructured dataset into a structured numerical representation that the unsupervised learning techniques can analyse.
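The flow described above (parse, filter, transform, then group and summarize) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SAS Enterprise Miner workflow Ryan used: the toy corpus, the stop-word list, the `threshold` value, and every function name are hypothetical. The grouping is a simple similarity-threshold pass rather than a production clustering algorithm, and listing the terms shared by a group is only a crude stand-in for topic extraction.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, for illustration only.
DOCS = [
    "The bank approved the loan and the mortgage",
    "Loan interest and mortgage rates at the bank",
    "The team won the football match",
    "A football match between the two team sides",
]
STOP_WORDS = {"the", "and", "at", "a", "between", "two"}


def parse_and_filter(text):
    """Text parsing and filtering: lowercase, tokenize, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Text transformation: turn each document into a TF-IDF weight vector."""
    tokenized = [parse_and_filter(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency times a smoothed inverse document frequency.
        vectors.append(
            [counts[t] * (math.log(len(docs) / doc_freq[t]) + 1.0) for t in vocab]
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(vectors, threshold=0.2):
    """Put each document in the first group whose first member is similar enough."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def shared_terms(docs, cluster):
    """Terms that occur in every document of a group (a rough topic label)."""
    token_sets = [set(parse_and_filter(docs[i])) for i in cluster]
    return sorted(set.intersection(*token_sets))


clusters = group_by_similarity(tfidf_vectors(DOCS))
for cluster in clusters:
    print(cluster, shared_terms(DOCS, cluster))
```

On this toy corpus the two finance sentences fall into one group and the two football sentences into another, because documents in the same group share weighted terms while documents in different groups share none.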
In addition, this report applies the cross-industry standard process for data mining (CRISP-DM) in the following section of the study. CRISP-DM is an approach that manages a data or text mining project as an idealized sequence of events. The framework has six stages: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. Refer to his research for more information.
*This work was completed as coursework for his Specialist Diploma at Republic Polytechnic.