A Study of Clustering in Education

This page shares supplementary files for our book chapter.

The MATLAB and Python code used in the study is given below.

The title of the study and its authors are listed below. Questions and comments can be sent to the authors by e-mail.

MACHINE LEARNING FOR ENHANCED CLASSROOM HOMOGENEITY IN PRIMARY EDUCATION

Faruk BULUT*

Istanbul Esenyurt University, İstanbul, Türkiye

farukbulut ( at ) esenyurt.edu.tr

*Corresponding Author

İlknur DÖNMEZ

Türkiye Bilimsel ve Teknik Araştırma Kurumu, Gebze, Türkiye

ilknur.donmez ( at ) tubitak.edu.tr

İbrahim Furkan İNCE

Nişantaşı University, İstanbul, Türkiye

ibrahim.ince ( at ) nisantasi.edu.tr

Pavel PETROV

University of Economics - Varna, Bulgaria

petrov ( at ) ue-varna.bg

How should the procedural steps and process management be handled in this study?

Procedural Steps

For a high-quality academic or commercial project to be carried out, the official steps must be completed fully and on time. Obtaining reliable, real-world data is a requirement for this kind of study. In this context, an application was submitted to the İzmir Provincial Directorate of National Education in 2016 in order to carry out a similar study. So that the necessary data could be collected, the survey questions passed ethics committee review and the official permits were obtained. Samples of the official correspondence are shown in Appendix 1, Appendix 2, and Appendix 3. This correspondence, which would normally remain confidential, is presented here as a model for prospective researchers, so that they can anticipate what steps they will need to take before starting their own work.

Process Management

Comprehensive, labor-intensive studies of this kind should be handled within a defined project management framework. The processes to be followed should be identified, a detailed work-time schedule should be drawn up, and a Gantt chart should be prepared to monitor progress. The proposed project can be carried out by following the stages below:

Identifying the distinct factors that affect success in educational sciences and examining how students are currently distributed across classes.
Identifying which student attributes affect success and set a student apart from other students.
Determining the most suitable features to use in the learning and clustering algorithms, and designing the survey.
Obtaining the necessary official permits.
Conducting the survey in the schools where permission was granted.
Converting the data collected through the surveys into numeric data.
Preparing the dataset.
Determining the most suitable model for applying semi-supervised and unsupervised algorithms.
Having educators who work at the relevant institutions check the results obtained with the semi-supervised algorithms.
Carrying out performance evaluations.
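The step of converting survey answers into numeric data could be sketched roughly as follows. This is only an illustration: the question names (`reads_at_home`, `attended_preschool`, `siblings`), the answer scales, and the sample responses are hypothetical and are not taken from the study's questionnaire. Min-max scaling is one common choice for putting features on a comparable range before distance-based clustering; the study may have used a different preprocessing step.

```python
# Hypothetical survey schema: the question names and answer scales below are
# illustrative assumptions, not the study's actual questionnaire.
LIKERT = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3, "always": 4}
YES_NO = {"no": 0, "yes": 1}


def encode_response(response):
    """Map one student's survey answers to a numeric feature vector."""
    return [
        LIKERT[response["reads_at_home"]],
        YES_NO[response["attended_preschool"]],
        float(response["siblings"]),
    ]


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no feature dominates distances."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Made-up responses for three students.
responses = [
    {"reads_at_home": "often", "attended_preschool": "yes", "siblings": 1},
    {"reads_at_home": "never", "attended_preschool": "no", "siblings": 3},
    {"reads_at_home": "always", "attended_preschool": "yes", "siblings": 0},
]
encoded = [encode_response(r) for r in responses]
scaled = min_max_scale(encoded)
print(scaled)
```

Each row of `scaled` could then be written out as one row of a file such as Data.csv and passed to the clustering scripts.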

The work-time schedule can be planned as follows:

Table 1 gives the work-time schedule that should be prepared for a budget-supported academic project. In this table, each stage (success criterion) is assessed as a percentage, and its importance to the overall success of the project is also given as a percentage.

The stages in Table 1 have been converted into a Gantt chart, as shown in Figure 6. This chart is a visual monitoring and control tool that is frequently used in project management.
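As a rough illustration of how percentage weights and a Gantt-style view fit together, the sketch below computes a weighted overall progress figure and prints a text-only timeline. All numbers here (start months, durations, weights, completion values) are made-up placeholders and do not reproduce Table 1 or Figure 6; the stage names are abbreviated from the stage list on this page.

```python
# Hypothetical schedule: every number below is a placeholder, not a value
# from Table 1.
stages = [
    # (name, start_month, duration_months, weight_percent, completion_percent)
    ("Identify success factors", 1, 2, 20, 100),
    ("Design survey, get permits", 3, 2, 20, 100),
    ("Run survey, build dataset", 5, 3, 30, 50),
    ("Cluster and evaluate", 8, 3, 30, 0),
]


def overall_progress(stages):
    """Weighted average of stage completion, in percent."""
    total_weight = sum(weight for _, _, _, weight, _ in stages)
    done = sum(weight * completion for _, _, _, weight, completion in stages)
    return done / total_weight


def text_gantt(stages):
    """Render one row per stage, with '#' marking the months it occupies."""
    last_month = max(start + duration - 1 for _, start, duration, _, _ in stages)
    rows = []
    for name, start, duration, _, _ in stages:
        bar = "".join(
            "#" if start <= month < start + duration else "."
            for month in range(1, last_month + 1)
        )
        rows.append(f"{name:<28}{bar}")
    return "\n".join(rows)


print(text_gantt(stages))
print(f"Overall progress: {overall_progress(stages):.1f}%")
```

With these placeholder values the weighted progress is (20·100 + 20·100 + 30·50 + 30·0) / 100 = 55%.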

Completing all official phases on time is crucial for high-quality academic or commercial studies. This involves securing reliable data and managing the project within a framework, including outlining processes, developing a schedule, and tracking progress with tools like a Gantt chart. The project involves identifying success factors in education, student attributes affecting success, suitable features for learning algorithms, obtaining permits, conducting surveys in schools, data transformation, dataset preparation, choosing the best model for algorithms, and evaluating performance with educators' assistance.

The official documents that have been obtained, and that can be obtained, for this kind of study are as follows:

% Clustering in Education - Clustering Codes - MATLAB
% Faruk BULUT
% 2021
rng('default') % For reproducibility
% Three synthetic Gaussian blobs, 100 points each
X = [randn(100,2)*0.9+ones(100,2);
     randn(100,2)*0.5-ones(100,2);
     randn(100,2)*0.1];
% Partition the points into k = 2 clusters
[idx,C] = kmeans(X,2);
figure
gscatter(X(:,1),X(:,2),idx,'bgmryk')
hold on
title('Example k-means Clustering (k=2)')
% Mark the cluster centers
plot(C(:,1),C(:,2),'kx')
legend('1. Cluster','2. Cluster','Cluster Centers')
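For readers who want to see what a k-means call does internally, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not MATLAB's implementation: it uses simple random initialization rather than k-means++, runs a single start, and the toy points are made up for illustration.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest center, then
    move each center to the mean of its points, until assignments settle."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers (not k-means++)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Two well-separated toy blobs (made-up points, not the study's data).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Which blob ends up as cluster 0 depends on the random initialization, so only the grouping is meaningful, not the label numbers.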

% Clustering in Education - Voronoi Codes
% Faruk BULUT
% 2021
clc; clear;
rng('default') % For reproducibility
% Three synthetic Gaussian blobs, 100 points each
X = [randn(100,2)*0.9+ones(100,2);
     randn(100,2)*0.5-ones(100,2);
     randn(100,2)*0.1];
% Partition the points into k = 3 clusters
[idx,C] = kmeans(X,3);
figure
gscatter(X(:,1),X(:,2),idx,'bgmryk')
hold on
title('Example Clustering with Voronoi (k=3)')
% Mark the cluster centers
plot(C(:,1),C(:,2),'kx')
% Draw the Voronoi cells of the three cluster centers
voronoi(C(:,1),C(:,2))
axis equal
legend('1. Cluster','2. Cluster','3. Cluster','Cluster Centers')
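The Voronoi cells drawn around the cluster centers are exactly the regions in which a point is closer to that center than to any other, so finding the cell of a new point amounts to a nearest-center lookup. The sketch below shows this under the assumption of Euclidean distance; the center coordinates are made-up stand-ins for the matrix `C` returned by `kmeans`.

```python
import math


def voronoi_cell(point, centers):
    """Index of the nearest center, i.e. the Voronoi cell containing `point`."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


# Made-up centers standing in for the matrix C returned by kmeans.
centers = [(1.0, 1.0), (-1.0, -1.0), (0.0, 0.0)]

# Made-up new observations to place into cells.
new_points = [(0.9, 1.2), (-0.8, -1.1), (0.1, -0.1), (0.6, 0.6)]
cells = [voronoi_cell(p, centers) for p in new_points]
print(cells)  # [0, 1, 2, 0]
```

The last point, (0.6, 0.6), lies about 0.57 from the first center and about 0.85 from the origin, so it falls in cell 0.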

# DBSCAN_OPTICS_Kümeleme, İlknur DÖNMEZ
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import OPTICS, cluster_optics_dbscan

# Load the dataset; every column is used as a feature
df = pd.read_csv(r'Data.csv')
X = df.iloc[:, :].values

# Configure OPTICS
clust = OPTICS(min_samples=6, xi=0.9999999, min_cluster_size=.09)

fig = plt.figure(dpi=1200, figsize=(8, 6))
plt.rcParams.update({
    'font.family': 'Times New Roman',
    'axes.labelsize': 'x-small',
    'axes.titlesize': 'x-small',
    'xtick.labelsize': 'xx-small',
    'ytick.labelsize': 'xx-small'})

# Run the fit
clust.fit(X)

# Extract DBSCAN-style labels from the OPTICS ordering at eps = 1.3 and eps = 1.5
labels_050 = cluster_optics_dbscan(reachability=clust.reachability_,
                                   core_distances=clust.core_distances_,
                                   ordering=clust.ordering_, eps=1.3)
labels_200 = cluster_optics_dbscan(reachability=clust.reachability_,
                                   core_distances=clust.core_distances_,
                                   ordering=clust.ordering_, eps=1.5)

# Reachability distances and labels in cluster order
space = np.arange(len(X))
reachability = clust.reachability_[clust.ordering_]
labels = clust.labels_[clust.ordering_]

# Layout: reachability plot on top, three scatter plots below
G = gridspec.GridSpec(2, 3)
ax1 = plt.subplot(G[0, :])
ax2 = plt.subplot(G[1, 0])
ax3 = plt.subplot(G[1, 1])
ax4 = plt.subplot(G[1, 2])

# Reachability plot
colors = ['g.', 'r.', 'b.', 'y.', 'c.']
for klass, color in zip(range(0, 5), colors):
    Xk = space[labels == klass]
    Rk = reachability[labels == klass]
    ax1.plot(Xk, Rk, color, alpha=0.3)
# Points labeled -1 are noise
ax1.plot(space[labels == -1], reachability[labels == -1], 'k.', alpha=0.3)
# Horizontal reference lines at reachability 2.0 and 0.5
ax1.plot(space, np.full_like(space, 2., dtype=float), 'k-', alpha=0.5)
ax1.plot(space, np.full_like(space, 0.5, dtype=float), 'k-.', alpha=0.5)
ax1.set_ylabel('Reachability (epsilon distance)')
ax1.set_title('Reachability distance')

# OPTICS
colors = ['g.', 'r.', 'b.', 'y.', 'c.']
for klass, color in zip(range(0, 5), colors):
    Xk = X[clust.labels_ == klass]
    ax2.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax2.plot(X[clust.labels_ == -1, 0], X[clust.labels_ == -1, 1], 'k+', alpha=0.5)
ax2.set_title('Automatic Clustering\nOPTICS')

# DBSCAN-style extraction at eps = 1.3
colors = ['g', 'greenyellow', 'olive', 'r', 'b', 'c']
for klass, color in zip(range(0, 6), colors):
    Xk = X[labels_050 == klass]
    ax3.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3, marker='.')
ax3.plot(X[labels_050 == -1, 0], X[labels_050 == -1, 1], 'k+', alpha=0.1)
ax3.set_title('Epsilon distance 1.3\nDBSCAN')

# DBSCAN-style extraction at eps = 1.5
colors = ['g.', 'm.', 'y.', 'c.']
for klass, color in zip(range(0, 4), colors):
    Xk = X[labels_200 == klass]
    ax4.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax4.plot(X[labels_200 == -1, 0], X[labels_200 == -1, 1], 'k+', alpha=0.1)
ax4.set_title('Epsilon distance 1.5\nDBSCAN')

plt.tight_layout()
plt.show()
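The script above extracts DBSCAN-style labels from a fitted OPTICS model at a chosen epsilon. To make the meaning of those labels concrete, here is a minimal, unoptimized sketch of the DBSCAN labeling rule itself in plain Python. It is not scikit-learn's implementation, and the toy points and the parameter values are made up; it only illustrates how core points, border points, and noise (label -1) arise and how the result changes with epsilon.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is a core point: expand
                queue.extend(j_neighbors)
    return labels


# Two tight toy groups plus one isolated point (made-up data).
points = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (5.5, 5.5),
    (10.0, 0.0),
]
labels_tight = dbscan(points, eps=0.8, min_samples=3)
labels_wide = dbscan(points, eps=8.0, min_samples=3)
print(labels_tight)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(labels_wide)   # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With the small epsilon the isolated point is reported as noise; with the large epsilon everything is chained into a single cluster, which is why comparing several epsilon values side by side is useful.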

# After applying t-SNE:
import sklearn.manifold
from sklearn.cluster import OPTICS, cluster_optics_dbscan
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np

# Project the data to two dimensions with t-SNE before clustering
X = df.iloc[:, :].values
tsne = sklearn.manifold.TSNE(n_components=2, random_state=0)
X = tsne.fit_transform(X)

fig = plt.figure(dpi=1200, figsize=(8, 6))
plt.rcParams.update({
    'font.family': 'Times New Roman',
    'axes.labelsize': 'x-small',
    'axes.titlesize': 'x-small',
    'xtick.labelsize': 'xx-small',
    'ytick.labelsize': 'xx-small'})

clust = OPTICS(min_samples=3, xi=.05, min_cluster_size=.09)

# Run the fit
clust.fit(X)

# Extract DBSCAN-style labels from the OPTICS ordering at eps = 2 and eps = 2.1
labels_05 = cluster_optics_dbscan(reachability=clust.reachability_,
                                  core_distances=clust.core_distances_,
                                  ordering=clust.ordering_, eps=2)
labels_10 = cluster_optics_dbscan(reachability=clust.reachability_,
                                  core_distances=clust.core_distances_,
                                  ordering=clust.ordering_, eps=2.1)

# Reachability distances and labels in cluster order
space = np.arange(len(X))
reachability = clust.reachability_[clust.ordering_]
labels = clust.labels_[clust.ordering_]

# Layout: reachability plot on top, three scatter plots below
G = gridspec.GridSpec(2, 3)
ax1 = plt.subplot(G[0, :])
ax2 = plt.subplot(G[1, 0])
ax3 = plt.subplot(G[1, 1])
ax4 = plt.subplot(G[1, 2])

# Reachability plot
colors = ['g.', 'r.', 'b.', 'y.', 'c.']
for klass, color in zip(range(0, 5), colors):
    Xk = space[labels == klass]
    Rk = reachability[labels == klass]
    ax1.plot(Xk, Rk, color, alpha=0.3, markersize=5)
# Points labeled -1 are noise
ax1.plot(space[labels == -1], reachability[labels == -1], 'k.', alpha=0.3, markersize=5)
# Horizontal reference lines at reachability 2.0 and 0.5
ax1.plot(space, np.full_like(space, 2., dtype=float), 'k-', alpha=0.5, markersize=5)
ax1.plot(space, np.full_like(space, 0.5, dtype=float), 'k-.', alpha=0.5, markersize=5)
ax1.set_ylabel('Reachability (epsilon distance)')
ax1.set_title('Reachability distance')

# OPTICS
colors = ['g.', 'r.', 'b.', 'y.', 'c.']
for klass, color in zip(range(0, 5), colors):
    Xk = X[clust.labels_ == klass]
    ax2.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax2.plot(X[clust.labels_ == -1, 0], X[clust.labels_ == -1, 1], 'k+', alpha=0.5)
ax2.set_title('Automatic Clustering\nOPTICS')

# DBSCAN-style extraction at eps = 2 (labels computed on the t-SNE projection)
colors = ['g', 'greenyellow', 'olive', 'r', 'b', 'c']
for klass, color in zip(range(0, 6), colors):
    Xk = X[labels_05 == klass]
    ax3.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3, marker='.')
ax3.plot(X[labels_05 == -1, 0], X[labels_05 == -1, 1], 'k+', alpha=0.1)
ax3.set_title('Epsilon distance 2\nDBSCAN')

# DBSCAN-style extraction at eps = 2.1 (labels computed on the t-SNE projection)
colors = ['g.', 'm.', 'y.', 'c.']
for klass, color in zip(range(0, 4), colors):
    Xk = X[labels_10 == klass]
    ax4.plot(Xk[:, 0], Xk[:, 1], color, alpha=0.3)
ax4.plot(X[labels_10 == -1, 0], X[labels_10 == -1, 1], 'k+', alpha=0.1)
ax4.set_title('Epsilon distance 2.1\nDBSCAN')

plt.tight_layout()
plt.show()

# kMeans_Aglomeratif_Kümeleme, İlknur DÖNMEZ
# K-means
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("Data.csv")
X = df.iloc[:, :].values

# Elbow method: record the within-cluster sum of squares for k = 1..10
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(3, 2), dpi=300)
plt.plot(range(1, 11), wcss)
#plt.title('The Elbow Method')
plt.xlabel('Number of clusters', size=11)
plt.ylabel('Within-cluster sum of squares', size=11)
plt.savefig('filename3.png', dpi=1200)
plt.show()

# Fit k-means with k = 2
kmeans = KMeans(n_clusters=2, init='k-means++', random_state=42)
y_kmeans = kmeans.fit_predict(X)
labels = kmeans.labels_

# Plot the clusters obtained using k-means (columns F1 and F3 of Data.csv)
fig = plt.figure()
ax = fig.add_subplot(111)
scatter = ax.scatter(df['F1'], df['F3'], c=labels, s=5, cmap='viridis')
ax.set_xlabel('Feature 1', size=10)
ax.set_ylabel('Feature 2', size=10)
plt.colorbar(scatter)
plt.savefig('Kmeans.png', dpi=1200)
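The elbow plot in the k-means script is built from `kmeans.inertia_`, the within-cluster sum of squares (WCSS). The sketch below computes that quantity by hand and adds one simple way to locate the bend in the curve numerically. The second-difference rule is an assumption of this sketch, not something the script does (the script reads the elbow off the plot), and the WCSS curve used in the example is made up.

```python
def wcss(points, labels, centers):
    """Within-cluster sum of squares: total squared distance of every point
    to the center of its assigned cluster."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )


def elbow_by_second_difference(ks, wcss_values):
    """Heuristic (an assumption of this sketch): pick the k where the WCSS
    curve bends most, i.e. where the second difference is largest."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = wcss_values[i - 1] - 2 * wcss_values[i] + wcss_values[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Toy 1-D check: two points per cluster, centers at the cluster means.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
labels = [0, 0, 1, 1]
centers = [(1.0,), (11.0,)]
print(wcss(points, labels, centers))  # 1 + 1 + 1 + 1 = 4.0

# Made-up WCSS curve for k = 1..5 with a visible bend at k = 2.
elbow = elbow_by_second_difference([1, 2, 3, 4, 5], [100.0, 30.0, 25.0, 22.0, 20.0])
print(elbow)  # 2
```

A numeric rule like this is best treated as a cross-check on the plot, since real WCSS curves often bend gradually.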

# Agglomerative
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

# Dendrogram of the Ward linkage, drawn on its own figure
plt.figure()
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))

# Ward linkage only supports Euclidean distance, so no metric argument is passed
model = AgglomerativeClustering(n_clusters=2, linkage='ward')
model.fit(X)
agglabels = model.labels_

# Scatter plot with column 2 on the x-axis and column 0 on the y-axis,
# one color per cluster label (labels that do not occur plot nothing)
fig = plt.figure()
ax = fig.add_subplot(111)
for label, color in zip(range(4), ['red', 'blue', 'green', 'black']):
    ax.scatter(X[agglabels == label, 2], X[agglabels == label, 0],
               s=5, marker='o', color=color)
ax.set_xlabel('Feature 1', size=10)
ax.set_ylabel('Feature 2', size=10)
plt.show()
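To show what Ward linkage optimizes, here is a naive sketch of agglomerative clustering in plain Python. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least, using the standard Ward merge cost n_a·n_b/(n_a+n_b)·‖c_a − c_b‖². It is a cubic-time illustration on made-up points, not the algorithm scikit-learn or SciPy actually run.

```python
def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clustering(points, n_clusters):
    """Naive agglomeration: start with singletons and repeatedly merge the
    pair of clusters with the smallest Ward cost."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters


# Two made-up groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
clusters = ward_clustering(points, n_clusters=2)
for c in clusters:
    print(sorted(c))
```

The sequence of merge costs produced by this procedure is what the dendrogram heights visualize.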

# SOM_Kümeleme, İlknur DÖNMEZ
# Apply the SOM algorithm to our data:
from minisom import MiniSom
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv("Data.csv")
data = data.values

# Initialization and training
som_shape = (1, 2)
som = MiniSom(som_shape[0], som_shape[1], data.shape[1], sigma=.5, learning_rate=.5,
              neighborhood_function='gaussian', random_seed=10)
som.train_batch(data, 500, verbose=True)

# each neuron represents a cluster
winner_coordinates = np.array([som.winner(x) for x in data]).T
# with np.ravel_multi_index we convert the bidimensional
# coordinates to a monodimensional index
cluster_index = np.ravel_multi_index(winner_coordinates, som_shape)

# %matplotlib inline  (IPython/Jupyter magic; uncomment only in a notebook)

# plotting the clusters using the first 2 dimensions of the data
for c in np.unique(cluster_index):
    plt.scatter(data[cluster_index == c, 0],
                data[cluster_index == c, 1], label='cluster='+str(c), s=5, alpha=.7)

# plotting centroids
for centroid in som.get_weights():
    plt.scatter(centroid[:, 0], centroid[:, 1], marker='x',
                s=5, linewidths=5, color='k', label='cluster center')
plt.legend()
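The SOM script turns each sample's winning neuron, given as a (row, column) coordinate on the map, into a single cluster number with `np.ravel_multi_index`. In its default row-major order that conversion is just `row * n_cols + col`, as the small sketch below illustrates without NumPy. The 2 x 3 map in the second example is hypothetical; the script itself uses a 1 x 2 map.

```python
def flat_index(row, col, shape):
    """Row-major flattening of a 2-D grid coordinate, the same convention
    np.ravel_multi_index uses by default."""
    n_rows, n_cols = shape
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError(f"({row}, {col}) is outside a grid of shape {shape}")
    return row * n_cols + col


# On the 1 x 2 map used in the script, the two neurons map to clusters 0 and 1.
som_shape = (1, 2)
print([flat_index(r, c, som_shape) for r, c in [(0, 0), (0, 1)]])  # [0, 1]

# On a hypothetical 2 x 3 map, neuron (1, 2) would become cluster 5.
print(flat_index(1, 2, (2, 3)))  # 5
```

Because every neuron becomes its own cluster, the map shape directly sets the number of clusters: a 1 x 2 map yields two.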
