
EPM Data Set

Educational Process Mining (EPM): A Learning Analytics Data Set (Version 1.0)


The experiments were carried out with a group of 115 first-year undergraduate Engineering students at the University of Genoa.

We carried out this study in a simulation environment named Deeds (Digital Electronics Education and Design Suite), which is used for e-learning in digital electronics. The environment provides learning materials to the students through specialized browsers and asks them to solve problems of varying difficulty. For more information about the Deeds simulator used for this course look at:

and to know more about the exercise contents of each session, see 'exercises_info.txt'.

Our data set contains the students' time series of activities during six laboratory sessions of the digital electronics course. There are 6 folders, one per session, containing the students' data. Each 'Session' folder contains up to 99 CSV files, each dedicated to the log of one student during that session. The number of files in each folder varies with the number of students present in that session. Each file contains 13 features. See 'features_info.txt' for more details.
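The folder layout described above can be read with a short script. This is only a minimal sketch under stated assumptions, not part of the data set: the function names are hypothetical, every file inside a session folder is assumed to be comma-separated, and no assumption is made about file names or about whether a header row is present. Check 'features_info.txt' before relying on column positions.

```python
import csv
from pathlib import Path

# Number of features per file as stated in this README (see 'features_info.txt').
EXPECTED_FEATURES = 13


def load_sessions(root):
    """Return {session_folder_name: {file_stem: list_of_rows}}.

    Assumption: `root` holds one subfolder per session, and each subfolder
    holds one comma-separated log file per student. Rows are kept as lists
    of strings; empty lines are skipped.
    """
    sessions = {}
    for session_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        logs = {}
        for log_path in sorted(p for p in session_dir.iterdir() if p.is_file()):
            with log_path.open(newline="") as handle:
                rows = [row for row in csv.reader(handle) if row]
            logs[log_path.stem] = rows
        sessions[session_dir.name] = logs
    return sessions


def rows_with_unexpected_width(rows, expected=EXPECTED_FEATURES):
    """Return the 0-based indices of rows whose field count differs from `expected`."""
    return [i for i, row in enumerate(rows) if len(row) != expected]
```

For example, `load_sessions("Processes")` would return one entry per session folder, assuming the 'Processes' folder has been unpacked into the working directory. Running `rows_with_unexpected_width` on each log is a cheap check that a file was parsed with the right delimiter.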

For details of the activities performed by the students during the course, see 'activities_info.txt'.

The data set includes the following files:

- 'README.txt'

- 'features_info.txt': contains information about the variables used in the feature vector.

- 'features.txt': list of all features.

- 'activities_info.txt': contains information about the variable 'activity'.

- 'activities.txt': list of all activities.

- 'exercises_info.txt': contains information about the variable 'exercise'.

- 'grades_info.txt': contains information about the grade data.


- 'Processes': contains the data files from Session 1 to 6. 

- 'logs.txt': shows information about the log data per student Id. It shows whether a student has a log in each session (0: has no log, 1: has log).

- 'final_grades.xlsx': contains the results of the final exam in two sheets.

- 'intermediate_grades.xlsx': contains the grades for the students' assignments per session.

- 'final_exam.pdf': shows the content of the final exam (original in Italian).

- 'final_exam_ENG.pdf': shows the content of the final exam translated into English.
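The 0/1 convention described for 'logs.txt' (0: has no log, 1: has log) can be reproduced from the session folders themselves, which is useful as a consistency check. The sketch below is illustrative only: the function name and the input shape (a mapping from session name to the student Ids found in that session) are assumptions, and the actual layout of 'logs.txt' is not specified here.

```python
def presence_table(students_per_session):
    """Map each student Id to a list of 0/1 flags, one flag per session.

    `students_per_session` is a mapping {session_name: iterable of student Ids};
    the order of its keys determines the order of the flags.
    Student Ids are assumed to be of a single sortable type (e.g. all strings).
    """
    session_names = list(students_per_session)
    present = {name: set(ids) for name, ids in students_per_session.items()}
    all_students = sorted(set().union(*present.values()))
    return {
        student: [1 if student in present[name] else 0 for name in session_names]
        for student in all_students
    }


# Toy input, not real data: student "2" attended both sessions.
example = presence_table({"Session 1": ["1", "2"], "Session 2": ["2", "3"]})
```

Comparing such a table against 'logs.txt' would flag students whose log files are missing from a session folder.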


Use of this data set in publications must be acknowledged by referencing the following publication [1]:

[1] M. Vahdat, L. Oneto, D. Anguita, M. Funk, M. Rauterberg.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: G. Conole et al. (eds.): EC-TEL 2015, LNCS 9307, pp. 352-366. Springer (2015).
DOI: 10.1007/978-3-319-24258-3_26

This data set is distributed AS-IS and no responsibility implied or explicit can be addressed to the authors or their institutions for its use or misuse. Any commercial use is prohibited.

Other Related Publications:

[2] M. Vahdat, L. Oneto, A. Ghio, G. Donzellini, D. Anguita, M. Funk, M. Rauterberg.: A learning analytics methodology to profile students behavior and explore interactions with a digital electronics simulator. In: de Freitas, S., Rensing, C., Ley, T., Munoz-Merino, P.J. (eds.) EC-TEL 2014. LNCS, vol. 8719, pp. 596–597. Springer (2014).

[3] M. Vahdat, A. Ghio, L. Oneto, D. Anguita, M. Funk, M. Rauterberg, Advances in learning analytics and educational data mining, in: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2015).

Mehrnoosh Vahdat, Luca Oneto, Davide Anguita, Mathias Funk, and Matthias Rauterberg. September 2015.