Student Population Modelling

An M.Sc. dissertation project (V. Mitsionis, Hellenic Open University) related to this research produced software, along with video tutorials on how to use it at your own university.

It is all available (in English, under a Creative Commons licence) at http://users.thesp.sch.gr/vmitsionis/sn_predictor/

Here are the key working assumptions and how they relate to the workflow.

At the start of each academic year, each student may enroll in any subset of the available/allowable modules. The choice depends on the student's individual study pace, on module prerequisites and on university regulations about the minimum and maximum number of enrollments per student per year. Taken together, these factors mean that some study paths are more popular than others; in our software this is reflected by transition probabilities between "study-states".

These probabilities have to be calculated from university registry data (or, failing that, guessed at).
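As a rough illustration of that calculation (not the code shipped with the project), the sketch below counts how often one study-state follows another and normalises the counts per source state. The function name transition_probabilities and the placeholder states "A", "B", "C" are assumptions made for this example only; how a study-state is actually defined is up to the software.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Estimate P(next state | current state) from observed study paths.

    `paths` is a list of study paths, each an ordered list of hashable
    study-states (placeholder labels here, not real registry states).
    """
    counts = defaultdict(Counter)
    for path in paths:
        # Every pair of consecutive states is one observed transition.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    probabilities = {}
    for src, destinations in counts.items():
        total = sum(destinations.values())
        probabilities[src] = {dst: n / total for dst, n in destinations.items()}
    return probabilities


if __name__ == "__main__":
    # Three hypothetical students; the labels are placeholders.
    example_paths = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
    for src, row in transition_probabilities(example_paths).items():
        print(src, row)
```

States that never appear as the source of a transition (for example, final states) simply have no row in the result.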

Attached is an example (see .gif file) of how one might calculate actual enrollment numbers across distinct study paths (calculating the probabilities is a straightforward extension). We use Graphviz to visualize the graph, after generating its Graphviz-compatible description (see attached .DOT.txt file). To avoid clutter, we have attached a variant that prunes paths with very small enrollment numbers.
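To give an idea of what such a Graphviz description looks like, here is a minimal sketch that writes a DOT digraph from edge counts and drops edges below a threshold. It is not the attached tooling: the function name to_dot, the min_count parameter and the sample edge counts are assumptions made for illustration.

```python
def to_dot(edge_counts, min_count=1):
    """Build a Graphviz DOT description from {(src, dst): enrollments}.

    Edges with fewer than `min_count` enrollments are pruned, which keeps
    rarely used study paths from cluttering the picture.
    """
    lines = ["digraph study_paths {"]
    for (src, dst), n in sorted(edge_counts.items()):
        if n < min_count:
            continue  # prune rarely taken transitions
        lines.append(f'    "{src}" -> "{dst}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    edges = {("A", "B"): 12, ("A", "C"): 1, ("B", "C"): 7}
    print(to_dot(edges, min_count=2))
```

Saving the output to a file, say graph.dot, and running "dot -Tgif graph.dot -o graph.gif" should render it as an image, assuming Graphviz is installed.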

Here's our workflow to come up with such a graph.

First, let's see a snapshot from the registry data:

27259;2004;PLS50;-1

27260;2004;PLS50;5.9

27260;2004;PLS51;5.4

27260;2005;PLS61;6.8

27260;2005;PLS62;6.5

27260;2006;PLSDE;9.5

27261;2004;PLS50;6.8

27261;2005;PLS51;6.2

27261;2006;PLS61;7.5

27261;2007;PLS60;6.5

The first line tells us that the student with ID 27259 enrolled in course PLS50 in year 2004 and failed to sit the exams (recorded as -1).

The second line tells us that the student with ID 27260 enrolled in course PLS50 in year 2004 and got a passing (>=5) grade (5.9).

Data is sorted first by student ID and then by enrollment year.
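For readers who want to experiment with this format before trying the attached tools, the following sketch reads the semicolon-separated lines shown above and groups them per student and per year. It is an illustration only; the names parse_registry, study_path and passed are assumptions, and treating the set of modules enrolled in during one year as a study-state is also an assumption.

```python
from collections import defaultdict

PASS_MARK = 5.0  # a grade >= 5 is a pass; -1 means the exam was not sat

SAMPLE = """\
27259;2004;PLS50;-1
27260;2004;PLS50;5.9
27260;2004;PLS51;5.4
27260;2005;PLS61;6.8
27260;2005;PLS62;6.5
27260;2006;PLSDE;9.5
27261;2004;PLS50;6.8
27261;2005;PLS51;6.2
27261;2006;PLS61;7.5
27261;2007;PLS60;6.5
"""


def parse_registry(text):
    """Return {student_id: {year: [(module, grade), ...]}}."""
    records = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        student_id, year, module, grade = line.split(";")
        records[student_id][int(year)].append((module, float(grade)))
    return records


def passed(grade):
    """True if the grade is a pass under the >= 5 rule."""
    return grade >= PASS_MARK


def study_path(years):
    """Ordered list of per-year module tuples for one student."""
    return [tuple(sorted(m for m, _ in years[y])) for y in sorted(years)]


if __name__ == "__main__":
    registry = parse_registry(SAMPLE)
    for student_id, years in registry.items():
        print(student_id, study_path(years))
```

Because the records are grouped in dictionaries and the years are sorted explicitly, this sketch does not rely on the input being pre-sorted, although sorted input is what the registry export provides.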

The graph description is produced using two tools: (1) a lexical analyser (attached .tar.gz file) that produces a raw and verbose graph description based on the registry data and (2) a wrapper bash script (attached .sh file) that invokes the lexical analyser and post-processes the output to generate the final graph description. Both have been tested in Cygwin under Windows XP.

We would love to hear from you if you'd like to test the programs with your own data and get back to us with comments, suggestions, etc. Start by reading the comments in the code to get a better sense of the context, since you may want to change a few values and constants.