As part of the Summer Undergraduate Scholars Program (SUSP) at UW Platteville, I received a grant to assist one of my professors with their research over the summer. Three other students and I continued the research begun by the previous summer's group of students.
The previous summer's research team had done some analysis and found that 79%* of the students in UW Platteville's Computer Science/Software Engineering (CSSE) department had struggled at some point in their academic career. 'Struggling' was defined as either having a C- or below in one or more classes, or having a GPA at or below 2.0 for one or more semesters. The data set consisted of all students who had graduated from UW Platteville from 2013-2018 and had declared themselves as a CSSE major for at least one semester. Having that many students struggle is certainly not a good sign, so we wanted to determine whether those struggling students could have been helped by changing their course sequences, possibly even changing majors.
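To make the definition concrete, here is a minimal sketch of how that flag could be computed. The DataFrame layout, column names, and grade values are illustrative assumptions, not the actual dataset schema or the previous team's code.

```python
import pandas as pd

# Hypothetical transcript table: one row per (student, term, course).
# Column names and values are assumptions for illustration only.
transcripts = pd.DataFrame({
    "student_id": [1, 1, 2, 2],
    "term":       ["F13", "F13", "F13", "S14"],
    "course":     ["CS1", "MATH1", "CS1", "CS2"],
    "grade_pts":  [1.67, 3.0, 2.0, 3.33],   # C- = 1.67 on a 4.0 scale
})

# Criterion 1: a C- (1.67 grade points) or below in any single course.
low_course = transcripts.groupby("student_id")["grade_pts"].min() <= 1.67

# Criterion 2: a term GPA at or below 2.0 in any semester
# (unweighted mean here; the real data would weight by credits).
term_gpa = transcripts.groupby(["student_id", "term"])["grade_pts"].mean()
low_term = (term_gpa <= 2.0).groupby(level="student_id").any()

# A student is flagged as 'struggling' if either criterion is met.
struggling = low_course | low_term
print(struggling)
```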
We decided early on to use principal stratification to determine whether the treatment (a new course sequence) was actually effective; however, we soon found that principal stratification didn't work the way we needed it to. After this discovery we switched to counterfactual inference. This method allowed us to answer the question "What if Student A had taken Student B's course path instead?" by using machine learning prediction models to predict a student's grade in an alternate course.
For example, if we were testing a student who was struggling as a Software Engineering (SE) major, we would want to see how they would have done in the Computer Technology (CT) or Computer Information Systems (CIS) course sequences. To predict their performance in the CT sequence, we would train the prediction models on the CT students who had taken that course sequence. We would then use the SE student's grades in the early courses as the test input for the model, and finally we would receive grade predictions for the SE student on the CT path.
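The sketch below shows the general shape of that counterfactual query. It is not the exact model or data we used; the synthetic grades, feature layout, and choice of a random forest regressor are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: CT students who completed the CT sequence.
# Each row of X holds grades in the early, shared courses; y is the grade
# in one later CT-specific course. Values are synthetic, for illustration.
rng = np.random.default_rng(0)
X_ct_early = rng.uniform(1.0, 4.0, size=(200, 4))                 # early-course grades
y_ct_later = np.clip(X_ct_early.mean(axis=1)
                     + rng.normal(0, 0.3, 200), 0.0, 4.0)          # later-course grade

# Train the prediction model only on students who actually took the CT path.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_ct_early, y_ct_later)

# Counterfactual query: feed in a struggling SE student's early-course grades
# and read off the predicted grade had they followed the CT sequence instead.
se_student_early = np.array([[2.0, 1.7, 2.3, 2.0]])
predicted_ct_grade = model.predict(se_student_early)[0]
print(f"Predicted grade on the CT path: {predicted_ct_grade:.2f}")
```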
My main contribution to this research was finding the common course sequences for each major and identifying the groups of students who could be used for training. Once this was done, I prepared all of the test students for each experiment and analyzed the results.
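One simple way to find those common sequences is to count how often each ordered sequence of courses appears within a major; the sketch below illustrates that idea with made-up course codes and student records, not the actual process or data I used.

```python
from collections import Counter

# Hypothetical records: for each (major, student) pair, the ordered tuple of
# courses they took, already sorted by term. Course codes are illustrative.
student_sequences = {
    ("SE", 101): ("CS1", "CS2", "SE273", "SE278"),
    ("SE", 102): ("CS1", "CS2", "SE273", "SE278"),
    ("SE", 103): ("CS1", "CS2", "CS223", "SE273"),
    ("CT", 201): ("CS1", "CS2", "CS223", "CS243"),
}

# Count how often each exact sequence occurs within each major.
counts = {}
for (major, _sid), seq in student_sequences.items():
    counts.setdefault(major, Counter())[seq] += 1

# The most common sequence per major defines that major's course path, and
# the students who followed it become the training group for that path.
for major, ctr in counts.items():
    seq, n = ctr.most_common(1)[0]
    print(major, "->", " > ".join(seq), f"({n} students)")
```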
Unfortunately, due to the data and methodology we used, our results suffer from multiple kinds of data shift. The three main types of data shift are Covariate Shift, Prior Probability Shift, and Concept Shift. From what we had time to analyze, we suspect that our results suffer from all of these, and we believe this is because all of the data we used came from a real data set. This causes the prediction models to typically predict roughly the average grade for each class, with a few exceptions of course. Because of this, we got the exact opposite of the results we expected: they showed that switching from Software Engineering (commonly the most challenging of the three majors) to one of the other majors would have a negative impact on the student. In reality, most students do much better after switching to one of the less challenging majors.
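As one example of how covariate shift could be checked, the sketch below compares the distribution of an input feature (early-course grades) between the training group and the test group with a two-sample Kolmogorov-Smirnov test. This is not an analysis we ran; the distributions and threshold are assumptions used purely to illustrate the idea.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical grade distributions for one early course: the CT students the
# model was trained on versus the struggling SE students it was tested on.
rng = np.random.default_rng(1)
train_grades = rng.normal(3.0, 0.5, 300)   # CT training group (synthetic)
test_grades = rng.normal(2.3, 0.7, 60)     # SE test group (synthetic)

# A two-sample Kolmogorov-Smirnov test flags covariate shift: the input
# (grade) distribution differs between the training and test populations.
stat, p_value = ks_2samp(train_grades, test_grades)
print(f"KS statistic = {stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Feature distributions differ -> likely covariate shift")
```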
Even though we don't believe our results to be correct, there was one alarming finding that affected them quite heavily. In order for a struggling student to be eligible to switch majors, they must have completed at least the first two or three common courses (the courses colored white in the course sequence chart above). While searching for dismissed students who could have tried a different course sequence, I found that only 14 of the dismissed students had made it past those first few courses before being dismissed or dropping out. This shows that the majority of dismissals in the CSSE department happen before the students' third or fourth semester.
We ran out of time before we could try any solutions to the data shift, so this research is unfortunately not complete. If there is a plausible way to address this in future summer research, then perhaps the project will be continued.
For a more in-depth view of the entire research project, here are some links to our group poster and a presentation that I gave at the end of the summer.
One of the last courses in our Software Engineering curriculum at UW Platteville requires us to choose a research topic and spend the semester preparing a research paper and a 35-40 minute presentation on it. I chose the topic of Test Automation because quality assurance is one of my main interests in the software field, and I was intrigued by the idea of automating the testing process. My paper is mainly an informational paper on the various benefits and limitations of test automation, and how and where it can be implemented. The paper is currently still in its early stages, but I will upload it and include a link here once it is in a more complete state.