Title of the academic study:

Meta Learning-Driven Prediction of a Base Learner’s Accuracy via Geometrical Complexity Measures over Diverse Datasets


Faruk BULUT, İlknur DÖNMEZ


The study has been submitted to a journal and is currently under review. If it is accepted for publication, please cite it in your work as follows:

Faruk BULUT, İlknur DÖNMEZ, Meta Learning-Driven Prediction of a Base Learner’s Accuracy via Geometrical Complexity Measures over Diverse Datasets, ABC Journal, 2025.


Web link of the article: xxx


The Codes, Datasets, and Related Files

1 - DT_regression.arff : A file in Weka's ARFF format containing the meta-features of the datasets. The last attribute is the Decision Tree (DT) accuracy rate. The Linear Regression model is built from this file. Download it from here.
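As a rough illustration of the regression step, the MATLAB sketch below fits an ordinary least-squares model that predicts the last column (DT accuracy) from the remaining columns. It does not read the ARFF file: the random matrix M is a hypothetical stand-in for its numeric content, and plain least squares is an assumption that may differ from the defaults of Weka's LinearRegression (for example its attribute selection).

```matlab
% Sketch only: M is a random stand-in for the numeric content of
% DT_regression.arff (one row per dataset, last column = DT accuracy).
rng(1);                                  % reproducible illustration
M = rand(40, 15);                        % hypothetical: 14 meta-features + accuracy

X = M(:, 1:end-1);                       % meta-features (predictors)
y = M(:, end);                           % DT accuracy (response)

% Ordinary least squares with an intercept column
Xd = [ones(size(X, 1), 1), X];
b = Xd \ y;                              % b(1) = intercept, b(2:end) = slopes

yHat = Xd * b;                           % fitted accuracies
rmse = sqrt(mean((y - yHat).^2));        % training error of the fit
fprintf('Training RMSE: %.4f\n', rmse);
```

With the real data, the backslash operator returns the least-squares coefficients directly, without needing any toolbox.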


2 - DT_regression_Normalized : The same data as DT_regression.arff, with all attributes normalized. Download it from here.
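The normalization method is not stated here. Assuming a per-attribute min-max scaling to [0, 1] (a common choice, not confirmed by the source), the transformation could be sketched in MATLAB as follows; M is a hypothetical stand-in matrix, not the real data.

```matlab
% Sketch: column-wise min-max normalization to [0, 1].
% Assumption: min-max scaling; the source does not say which method was used.
rng(1);
M = rand(40, 15) * 10;                   % hypothetical raw attribute values

colMin = min(M, [], 1);
colMax = max(M, [], 1);
N = (M - colMin) ./ (colMax - colMin);   % implicit expansion (R2016b or later);
                                         % a constant column would produce NaN

% Each column of N now spans exactly [0, 1]
disp([min(N); max(N)]);
```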


3 - Dataset spesifications.xlsx : This file lists the external features of all datasets. Download it from here.


4 - Normalized Datasets.xlsx : Some of the tables from the article. Download it from here.


5 - MATLAB codes UCI datset analyzer : Source code of the UCI dataset analyzer. Download it from here.


6 - DCoL (Data Complexity Library) software can be downloaded from: http://dcol.sourceforge.net/

You need the DCoL-v1.1.tar.gz file. If you cannot download it, please e-mail me at bulutfaruk [at] gmail [dot] com. Note that this software must be run on a Linux platform; Ubuntu is recommended.


7 - 115_UCI_Datasets.rar : The dataset collection (about 11 MB). All datasets are derived from the UCI Machine Learning Repository. They are packed in a compressed archive, so extract it first. Download it from here.


8 - MATLAB_Code_Decision_Tree.m : A MATLAB script that computes the Decision Tree accuracy for each dataset. Download it from here. It requires MATLAB (©), version R2015 or newer.
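The contents of MATLAB_Code_Decision_Tree.m are not reproduced here. As a hedged sketch of how a Decision Tree accuracy can be estimated in MATLAB, the example below uses fitctree with 10-fold cross-validation on the fisheriris sample data shipped with the Statistics and Machine Learning Toolbox. Both the dataset and the 10-fold protocol are assumptions for illustration, not necessarily what the original script does.

```matlab
% Sketch: 10-fold cross-validated Decision Tree accuracy.
% Assumptions: fisheriris stands in for one UCI dataset, and 10-fold CV is
% used as the validation scheme (the original script's protocol is not stated).
load fisheriris                      % provides meas (150x4) and species (150x1 cell)
rng(0);                              % reproducible fold assignment

tree = fitctree(meas, species);      % classification tree
cvTree = crossval(tree, 'KFold', 10);
dtAcc = 1 - kfoldLoss(cvTree);       % kfoldLoss returns the misclassification rate

fprintf('DT accuracy: %.4f\n', dtAcc);
```

The same three calls (fitctree, crossval, kfoldLoss) can be placed inside a loop over the dataset names to collect one accuracy value per dataset.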


9 - The names of the datasets, written as a MATLAB cell array:
alldatasets={'hillValley','bank','liver-disorders','bupa','cmc.2c2','liv','cmc.2c2','bpa','hab','cmc.2c0','breast-cancer','haberman','credit-g','yea.2c0','cylinder-bands','sonar','pim','glass.2c1','diabetes','lung-cancer','cmc.2c1','cmc.2c1','transfusion','vehicle.2c1','h-s','abalone.2c6','vehicle.2c0','veh.2c0','heart-statlog','abalone.2c7','gls.2c0','glass.2c0','colic','hepatitis','primary-tumor.2c0','abalone.2c5','column3C.2c0','column3C.2c2','lymph','abalone.2c8','waveform.2c0','wav40.2c0','wav21.2c0','mag','autos.2c1','balance-scale.2c0','autos.2c2','bankruptcy','waveform.2c2','bal.2c0','waveform.2c1','credit-a','abalone.2c4','glass.2c2','labor','ionosphere','col10.2c4','ecoli.2c1','audiology.2c3','ringnorm','col10.2c5','spambase','tic-tac-toe','ecoli.2c3','audiology.2c4','balance-scale.2c1','spa','monk','thy.2c0','ecoli.2c2','vehicle.2c3','wineCultivars.2c1','wdbc','ecoli.2c0','wne.2c0','wineCultivars.2c2','wineCultivars.2c0','vote','win.2c0','iris.2c1','splice.2c2','authors.2c0','vehicle.2c2','iris.2c2','tao','ozone','zoo.2c2','column3C.2c1','audiology.2c0','pageblocks.2c0','pbc.2c0','ecoli.2c4','solar-flare_1','d159','sick','pageblocks.2c4','pageblocks.2c1','anneal.2c1','kr-vs-kp','opt.2c0','statlog-sgm.2c0','seg.2c0','col10.2c6','zoo.2c0','soybean.2c3','pageblocks.2c3','pen.2c0','pageblocks.2c2','hypothyroid.2c0','mushroom','badges','badges2','col10.2c0','iris.2c0','zoo.2c3'};
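The list can be processed with a simple loop. The sketch below copies only the first seven names (paste the full alldatasets line to process everything) and also shows how repeated entries can be detected; as printed, the full list has 115 entries, in which 'cmc.2c2' and 'cmc.2c1' each appear twice. How each dataset file is loaded is not described here, so that step is left as a comment.

```matlab
% First seven names of alldatasets (replace with the full list to use it)
alldatasets = {'hillValley','bank','liver-disorders','bupa','cmc.2c2','liv','cmc.2c2'};

for k = 1:numel(alldatasets)
    % Loading and evaluating dataset alldatasets{k} would go here
    % (the file layout of 115_UCI_Datasets is not specified).
    fprintf('%3d  %s\n', k, alldatasets{k});
end

% Detect names that occur more than once
[uniqueNames, ~, idx] = unique(alldatasets);
counts = accumarray(idx(:), 1);          % occurrences of each unique name
repeated = uniqueNames(counts > 1);
fprintf('Entries: %d, unique: %d\n', numel(alldatasets), numel(uniqueNames));
disp(repeated);
```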


10 - The complete experimental results in Excel format are available here.


11 - The complete normalized experimental results (the long form of Table 1, 114 rows) can be accessed here.


12 - The Correlation Matrix

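% Correlation matrix of the complexity measures and DT accuracy
% Requires the Statistics and Machine Learning Toolbox (corr) and MATLAB R2017a or later (heatmap, string arrays).

% Load the whole experimental results into A: one row per dataset and 15 columns,
% in the same order as the labels below (14 complexity measures, then DTAcc).

r = corr(A);                                   % Pearson correlation between columns

% Hide the redundant upper triangle (above the diagonal)
upperMask = triu(true(size(r)), 1);
r(upperMask) = NaN;

% Plot results: NaN cells are drawn in white
h = heatmap(r, 'MissingDataColor', 'w');
labels = ["F1", "F1v", "F2", "F3", "F4", "L1", "L2", "L3", "N1", "N2", "N3", "N4", "T1", "T2", "DTAcc"];
h.XDisplayLabels = labels;
h.YDisplayLabels = labels;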
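Since DTAcc is the quantity being predicted, the last row of the correlation matrix shows how strongly each measure relates to it. The sketch below ranks the 14 measures by the absolute value of their Pearson correlation with DTAcc. It uses corrcoef (base MATLAB, which gives the same Pearson coefficients as corr's default) and a random 114-by-15 matrix as a stand-in for A, so the printed numbers are meaningless until the real results are loaded.

```matlab
% Stand-in for the results matrix A (random numbers, illustration only)
rng(2);
A = rand(114, 15);
labels = ["F1","F1v","F2","F3","F4","L1","L2","L3","N1","N2","N3","N4","T1","T2","DTAcc"];

R = corrcoef(A);                         % 15x15 Pearson correlation matrix
rAcc = R(end, 1:end-1);                  % correlation of each measure with DTAcc

% Rank measures by the strength (absolute value) of their correlation
[~, order] = sort(abs(rAcc), 'descend');
T = table(labels(order)', rAcc(order)', 'VariableNames', {'Measure', 'rWithDTAcc'});
disp(T)
```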