Data Mining with SPSS - TwoStep Cluster

Unlike many other clustering algorithms, TwoStep Cluster:

- Can select the most appropriate number of clusters automatically.
- Handles both continuous and categorical variables.
- Can save the cluster model to an external XML file, which can later be read back and updated with more recent data.
- Is fairly robust to violations of its assumptions (independence of the variables and their assumed distributions).
- Handles large data files efficiently.
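The automatic choice of the number of clusters rests on an information criterion. TwoStep first computes BIC (or AIC, if selected) for each candidate number of clusters J. It then refines that estimate using the change in merge distance between successive clusterings. This description follows my understanding of IBM's algorithm documentation and should be checked against it:

$$\mathrm{BIC}(J) = -2\sum_{j=1}^{J}\xi_j + m_J\log N,\qquad m_J = J\Big(2K^A + \sum_{k=1}^{K^B}(L_k-1)\Big)$$

The symbols are:

- ξ_j is the log-likelihood term of cluster j.
- N is the number of cases.
- K^A is the number of continuous variables.
- K^B is the number of categorical variables.
- L_k is the number of categories of categorical variable k.

The minimal Python sketch below covers only the BIC step. It handles a single continuous variable, so m_J = 2J, and takes a hand-supplied list of candidate partitions. The function names and the use of the population variance are illustrative assumptions, not SPSS's implementation.

```python
import math
from statistics import pvariance

def xi_continuous(cluster, overall_var):
    """Log-likelihood term xi_v of one cluster for a single continuous variable."""
    n = len(cluster)
    # Adding the overall variance keeps log() finite for one-point clusters
    return -n * 0.5 * math.log(overall_var + pvariance(cluster))

def bic(partition, overall_var, n_total):
    """BIC(J) = -2 * sum(xi_j) + m_J * log(N); m_J = 2J for one continuous variable."""
    m_j = 2 * len(partition)
    log_lik = sum(xi_continuous(cluster, overall_var) for cluster in partition)
    return -2 * log_lik + m_j * math.log(n_total)

def best_partition(data, candidates):
    """Return the candidate partition (a list of clusters) with the smallest BIC."""
    overall_var = pvariance(data)
    return min(candidates, key=lambda p: bic(p, overall_var, len(data)))

if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # toy values with two obvious groups
    candidates = [
        [data],                           # J = 1
        [data[:3], data[3:]],             # J = 2
        [data[:3], data[3:5], data[5:]],  # J = 3
    ]
    print("chosen J:", len(best_partition(data, candidates)))
```

On this toy data the two-cluster partition has the lowest BIC. The penalty term m_J log N grows with J, which stops the criterion from always preferring more clusters.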

Default distance

Log likelihood: the likelihood measure places a probability distribution on the variables. Continuous variables are assumed to be normally distributed and categorical variables to be multinomial. All variables are also assumed to be independent of one another.
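For reference, IBM's algorithm documentation for TwoStep defines the log-likelihood distance between clusters j and s roughly as follows. This is as I recall it; please check it against the IBM SPSS Statistics Algorithms guide:

$$d(j,s) = \xi_j + \xi_s - \xi_{\langle j,s\rangle},\qquad \xi_v = -N_v\Big(\sum_{k=1}^{K^A}\tfrac12\log(\hat\sigma_k^2+\hat\sigma_{vk}^2) + \sum_{k=1}^{K^B}\hat E_{vk}\Big),\qquad \hat E_{vk} = -\sum_{l=1}^{L_k}\frac{N_{vkl}}{N_v}\log\frac{N_{vkl}}{N_v}$$

The symbols are:

- K^A and K^B are the numbers of continuous and categorical variables.
- σ̂²_k is the variance of continuous variable k over all cases.
- σ̂²_vk is the variance of that variable within cluster v.
- N_vkl is the number of cases in cluster v that fall in category l of variable k.

The Python sketch below computes d(j,s) for clusters stored as lists of tuples. The helper names (`xi`, `log_likelihood_distance`), the tuple layout and the use of the population variance are assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import pvariance

def xi(cluster, cont_idx, cat_idx, overall_var):
    """Log-likelihood term xi_v of one cluster (a list of tuples)."""
    n = len(cluster)
    cont_term = 0.0
    for k in cont_idx:
        # Normal model: 1/2 * log(overall variance + within-cluster variance)
        within = pvariance([row[k] for row in cluster])
        cont_term += 0.5 * math.log(overall_var[k] + within)
    cat_term = 0.0
    for k in cat_idx:
        # Multinomial model: entropy of the category frequencies inside the cluster
        counts = Counter(row[k] for row in cluster)
        cat_term -= sum((c / n) * math.log(c / n) for c in counts.values())
    return -n * (cont_term + cat_term)

def log_likelihood_distance(cj, cs, cont_idx, cat_idx, overall_var):
    """d(j, s) = xi_j + xi_s - xi_<j,s>: loss of log-likelihood when j and s merge."""
    merged = cj + cs
    return (xi(cj, cont_idx, cat_idx, overall_var)
            + xi(cs, cont_idx, cat_idx, overall_var)
            - xi(merged, cont_idx, cat_idx, overall_var))

if __name__ == "__main__":
    # Toy records: column 0 is continuous, column 1 is categorical
    data = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.3, "b")]
    overall_var = {0: pvariance([row[0] for row in data])}
    near = log_likelihood_distance([data[0]], [data[1]], [0], [1], overall_var)
    far = log_likelihood_distance([data[0]], [data[2]], [0], [1], overall_var)
    print(f"near: {near:.3f}  far: {far:.3f}")  # the far pair gets the larger distance
```

Merging clusters whose records look alike costs little likelihood, so their distance is small. This is the kind of quantity the hierarchical step uses to decide which clusters to combine.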

Assumptions

- Continuous variables are normally distributed.
- Categorical variables follow a multinomial distribution.
- All variables are independent.

How can the assumptions of this method be verified?

Author

- Gabriela Esperón. Researcher @ AIGROUP, working in the PROA Project. Professor of mathematics @ Universidad de Palermo.
