A cluster contains:
- a centroid of type Data, which represents the central data point,
- a set of data points,
- a number.

Statistics can also be stored, such as the minimum distance from a data point to the centroid, the maximum distance, the average distance, etc.
```java
import java.util.ArrayList;
import java.util.Locale; // used by toString()

class Cluster {
    /** data associated with the cluster */
    ArrayList<Data> dataSet;
    /** computed data that represents the center of the cluster */
    Data centroid;
    /** number of clusters created so far */
    private static int nb;
    /** number of this cluster */
    int no;
    /** minimal distance between one of the data and the centroid */
    double minDist;
    /** maximal distance between one of the data and the centroid */
    double maxDist;
    /** average of the distances between the data and the centroid */
    double moyDist;

    public Cluster() {
        dataSet = new ArrayList<>();
        centroid = new Data();
        no = nb++;
    }

    /** initialise the cluster with a given centroid */
    public Cluster(Data centroid) {
        this();
        this.centroid = centroid;
    }

    /** add a data to the cluster (and tell the data that it is associated with this cluster) */
    public void add(Data data) {
        dataSet.add(data);
        data.setCluster(this);
    }

    /** remove a data from the cluster
     *  (and tell the data that it is no longer associated with this cluster) */
    public void remove(Data data) {
        // only detach the data if it really belonged to this cluster
        if (dataSet.remove(data)) {
            data.setCluster(null);
        }
    }
}
```
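To see the two-way link between a cluster and its data in action, here is a small self-contained sketch. The `Data` class below is a hypothetical stand-in that only stores the back-reference (its `getCluster()` getter is an assumption, not part of the tutorial), and `Cluster` is trimmed down to the numbering and membership logic; like the class above, `remove` only clears the back-reference when the data was actually a member.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for the tutorial's Data class:
// it only keeps a reference to the cluster that owns it.
class Data {
    private Cluster cluster;

    void setCluster(Cluster cluster) { this.cluster = cluster; }

    Cluster getCluster() { return cluster; } // hypothetical getter
}

// Trimmed-down Cluster keeping only numbering and membership.
class Cluster {
    private static int nb;                 // number of clusters created so far
    final int no;                          // number of this cluster
    final List<Data> dataSet = new ArrayList<>();

    Cluster() { no = nb++; }

    void add(Data data) {
        dataSet.add(data);
        data.setCluster(this);             // the data now knows its cluster
    }

    void remove(Data data) {
        if (dataSet.remove(data)) {        // ignore data that was not a member
            data.setCluster(null);
        }
    }
}

class ClusterMembershipDemo {
    public static void main(String[] args) {
        Cluster c0 = new Cluster();
        Cluster c1 = new Cluster();
        Data d = new Data();

        c0.add(d);                         // d belongs to cluster 0
        c0.remove(d);                      // detached again
        c1.add(d);                         // reassigned to cluster 1

        System.out.println(c0.no + " " + c0.dataSet.size()); // 0 0
        System.out.println(c1.no + " " + c1.dataSet.size()); // 1 1
        System.out.println(d.getCluster() == c1);            // true
    }
}
```

Note that `add` does not remove the data from its previous cluster: when a data point is reassigned, `remove` has to be called on the old cluster first, otherwise the point ends up in both lists.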
The main method of the class recomputes the coordinates of the centroid so that each coordinate is the mean of the corresponding coordinates of the cluster's data points.
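In symbols, writing $n$ for the number of data points in the cluster (`nbElt`) and $x_{k,i}$ for the $i$-th coordinate of the $k$-th data point, the update sets each coordinate of the centroid $c$ to:

$$c_i = \frac{1}{n} \sum_{k=1}^{n} x_{k,i}, \qquad i = 0, \dots, \mathit{dim}-1$$

The method below applies this formula to the normalized values and, when they change, to the raw values as well.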
```java
/** recompute the center of the cluster: each coordinate becomes the mean of the data's coordinates */
public void centralize() {
    int nbElt = dataSet.size();
    if (nbElt > 0) {
        int dim = dataSet.get(0).length;
        for (int i = 0; i < dim; i++) {
            // a lambda can only capture effectively final locals, so copy the loop index
            final int idx = i;
            // average of the normalized i-th values
            double average = dataSet.stream().mapToDouble(d -> d.getNormValue(idx)).sum() / nbElt;
            if (centroid.getNormValue(i) != average) {
                centroid.setNormValue(i, average);
                // average of the raw i-th values
                double rawAverage = dataSet.stream().mapToDouble(d -> d.getValue(idx)).sum() / nbElt;
                centroid.setValue(i, rawAverage);
            }
        }
    }
}
```
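The same per-coordinate mean can be checked in isolation. This sketch assumes the points are plain `double[]` arrays of equal length (a hypothetical simplification of `Data`) and uses `DoubleStream.average()` instead of summing and dividing; it also shows that copying the loop index into an effectively final local is enough for the lambda, with no array wrapper needed. Unlike `centralize()`, which silently does nothing on an empty cluster, this helper rejects empty input.

```java
import java.util.Arrays;
import java.util.List;

class CentroidSketch {
    /** Returns the coordinate-wise mean of the given points (all of the same dimension). */
    static double[] mean(List<double[]> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("cannot average an empty cluster");
        }
        int dim = points.get(0).length;
        double[] centroid = new double[dim];
        for (int i = 0; i < dim; i++) {
            final int idx = i; // effectively final copy, usable inside the lambda
            centroid[i] = points.stream().mapToDouble(p -> p[idx]).average().orElse(0.0);
        }
        return centroid;
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(
                new double[]{0.0, 0.0},
                new double[]{2.0, 4.0},
                new double[]{4.0, 2.0});
        System.out.println(Arrays.toString(mean(pts))); // [2.0, 2.0]
    }
}
```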
Let's add a method that computes the statistics and one that converts the cluster to a string:
```java
/** compute the stats (minimal distance from a data to the centroid, maximal distance, average of the distances) */
public void computeStats() {
    if (dataSet.isEmpty()) {
        // no data: do not leave infinite values in the stats
        minDist = maxDist = moyDist = 0;
        return;
    }
    double somDist = 0;
    minDist = Double.POSITIVE_INFINITY;
    maxDist = Double.NEGATIVE_INFINITY;
    for (Data data : dataSet) {
        double dist = data.distNorm(centroid);
        if (dist < minDist) minDist = dist;
        if (dist > maxDist) maxDist = dist;
        somDist += dist;
    }
    moyDist = somDist / dataSet.size();
}

/** @return the number of the cluster, its number of data, the stats and (for small clusters) the data */
@Override
public String toString() {
    StringBuilder sb = new StringBuilder("cluster " + no + ", nb elts = " + dataSet.size() + "\n");
    sb.append("--> centroid = ").append(centroid).append("\n--> data:\n");
    // list the data only for small clusters, to keep the output readable
    if (dataSet.size() < 50)
        for (Data data : dataSet) sb.append(data).append("\n");
    // Locale.ENGLISH (java.util.Locale) forces '.' as the decimal separator
    sb.append("--> dist min=").append(String.format(Locale.ENGLISH, "%.2f", minDist));
    sb.append("; dist max=").append(String.format(Locale.ENGLISH, "%.2f", maxDist));
    sb.append("; average dist=").append(String.format(Locale.ENGLISH, "%.2f", moyDist));
    sb.append(" \n---- ");
    return sb.toString();
}
```
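As an alternative design for `computeStats()`, the JDK's `DoubleSummaryStatistics` collects the minimum, maximum and average in one pass. The sketch below is standalone: points are plain `double[]` arrays, and the Euclidean `dist` helper is an assumed stand-in for the tutorial's `distNorm`, whose exact definition is not shown here.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Locale;

class ClusterStatsSketch {
    /** Euclidean distance, assumed here as a stand-in for distNorm. */
    static double dist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Min, max and average distance from each point to the centroid, in one pass. */
    static DoubleSummaryStatistics stats(List<double[]> points, double[] centroid) {
        return points.stream()
                .mapToDouble(p -> dist(p, centroid))
                .summaryStatistics();
    }

    public static void main(String[] args) {
        double[] centroid = {0.0, 0.0};
        List<double[]> pts = List.of(
                new double[]{3.0, 4.0},   // distance 5
                new double[]{0.0, 1.0},   // distance 1
                new double[]{0.0, 3.0});  // distance 3
        DoubleSummaryStatistics s = stats(pts, centroid);
        System.out.println(String.format(Locale.ENGLISH,
                "dist min=%.2f; dist max=%.2f; average dist=%.2f",
                s.getMin(), s.getMax(), s.getAverage()));
        // dist min=1.00; dist max=5.00; average dist=3.00
    }
}
```

On an empty stream, `getMin()` returns `Double.POSITIVE_INFINITY` and `getMax()` returns `Double.NEGATIVE_INFINITY` (the same sentinels `computeStats()` starts from), while `getAverage()` returns 0, so an empty cluster still needs an explicit guard.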