INTERNSHIP PROJECT

A novel grading system to address the performance gap

Final Summative Report
Delta Internship, UW-Madison

Balázs Strenner

March 30, 2015


Abstract

A novel grading system was introduced in a math class that adjusted each student's workload to be inversely related to the student's performance in the class. The goal was to decrease the performance gap in the class and to incentivize struggling students to work more and succeed. Although the effect of the new grading system on performance could not be measured accurately, students' reactions to the grading system provided useful qualitative feedback for assessing the success of the intervention.


1 Introduction

My project stemmed from the observation that in courses I had previously taught, students who failed tended to be lagging behind already at the beginning of the semester. It bothered me a lot that even though I could identify these at-risk students in the first few weeks, I was never able to help them escape failure. The goal of my project was to devise and implement a strategy that would improve this situation.

The idea was a grading system designed to give poorly performing students a strong incentive to put more effort into the course and thus catch up. (The details of the grading system can be found in the Appendix.) To accommodate individual learning styles, this extra effort was defined very flexibly: anything that helped the student learn was rewarded.

2 Methods

The project took place in a Math 132 class I taught in Fall 2014. This is the capstone course in the Math 130-131-132 sequence for future elementary and middle school teachers, covering probability, algebra and statistics at the middle school level. The initial enrollment was 12, but one student dropped the course shortly after the midpoint of the semester.

2.1 Math performance

My original plan for measuring the success of the intervention was to analyze homework and exam scores throughout the semester and check whether the gap between well and poorly performing students decreased. I soon abandoned this plan because homework scores turned out to be a very unreliable measure of performance. Some students easily did well on exams but were sloppy on the homework, while some struggling students scored high on homework because they regularly asked for help in office hours. Exam scores seemed to be a relatively accurate measure of knowledge, but I did not think that two midterm exams and a final exam would give me enough data in such a small class.

2.2 Math anxiety

I was also interested in how students’ math anxiety changed throughout the semester relative to their performance. Math anxiety was measured at the beginning and end of the semester using the Abbreviated Math Anxiety Scale (AMAS) developed by Hopko et al. (2003). The scale consists of 9 Likert-type items, each asking how anxious the student feels in a particular situation involving math.

On the post-test, students were also asked whether they felt that their scores were higher or lower than on the pre-test, and what the reasons for the change were. Students filled out both tests anonymously, using a unique identifier that allowed individual pre- and post-test scores to be compared.

2.3 Survey on the grading system

After the first midterm I became aware that a few students disliked the unusual grading system and thought it was unfair. Around the second midterm it also became clear that the grading system was not working as intended, since it did not seem to motivate the targeted students as desired. I decided to switch to a more conventional grading system provided that all students preferred it, and this presented a perfect opportunity to survey my students about their preferences and thus collect useful data for my research.

The survey (conducted online and anonymously) asked for students’ opinions of the original grading system as well as two alternatives. The first alternative was simply a modified version of the original in which the two midterms together counted for only 25% of the Completion Grade instead of 50%. Since the homework and participation scores were very high and it was the midterms that pulled down the Completion Grades, this would have reduced the amount of extra work needed to bring the Completion Grade up for students who were not doing well on the midterms.

The second alternative was a more radical change: a conventional weighted average for calculating final grades, with the following weights: homework and participation (10%), two midterm exams (25% each) and final exam (40%). This version eliminated the need for extra credit activities entirely, but kept the relative weights of the grade items about the same.
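As an illustration, the second alternative is an ordinary weighted average and can be sketched in a few lines of Python. Only the four weights come from the description above; the dictionary keys, the function name alternative2_final_grade and the example student's scores are hypothetical.

```python
# Weights of the 2nd alternative grading system, as listed in the text.
WEIGHTS = {
    "homework_participation": 0.10,
    "midterm1": 0.25,
    "midterm2": 0.25,
    "final_exam": 0.40,
}


def alternative2_final_grade(scores: dict[str, float]) -> float:
    """Return the final grade (0-100) as a weighted average of item scores (0-100).

    `scores` must contain a score for every key in WEIGHTS (hypothetical keys).
    """
    # The weights sum to 1, so no further normalization is needed.
    return sum(weight * scores[item] for item, weight in WEIGHTS.items())


# Hypothetical student: strong homework, weaker exams.
example = {"homework_participation": 95, "midterm1": 70, "midterm2": 75, "final_exam": 80}
print(round(alternative2_final_grade(example), 2))  # prints 77.75
```

Unlike the original system, no extra credit enters this calculation, so a student's grade depends only on the scores already earned.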

Students were asked to rate how much they preferred each grading system on a scale of 1 to 5 and to provide a written explanation of their choice. After that, they were asked to rate the extent to which certain factors influenced their preferences.

3 Results

3.1 Math anxiety

Ten students filled out both the pre- and post-tests. As Table 1 shows, mean scores increased for almost all items. Five students had substantially higher scores (by more than 10%) on the post-test, and only one student had a substantially lower score.

Item        1     2*    3     4*    5*    6     7     8*    9     Overall
Pre-test    1.25  3.54  1.42  3.67  3.67  1.13  1.21  3.58  1.58  2.34
Post-test   1.4   3.9   1.9   4.1   3.6   1.7   1.9   3.9   1.8   2.69

Table 1: Mean scores for each of the 9 items of the anxiety scale at the beginning and end of the semester. For each item, 1 corresponds to no anxiety and 5 to high anxiety. Starred items ask about a math test. (E.g., item 2 reads “Thinking about an upcoming math test one day before.”)

Six students correctly guessed how their score had changed since the pre-test, and no student guessed the opposite of what actually happened (an increase when there was actually a decrease, or vice versa).

Asked about the reasons for the change in anxiety, a large majority (7 of the 10 students) mentioned end-of-semester stress and upcoming tests. This was the only reason mentioned by more than one student, and no student mentioned anything about how the class was actually taught.

3.2 Survey on the grading system

Students showed the highest preference for the 2nd alternative and slightly favored the 1st alternative over the original grading system (see Table 2). Table 3 shows the mean ratings of how strongly various factors influenced their preferences.

Grading system     Mean preference rating
Original           1.5
1st alternative    2.2
2nd alternative    4

Table 2: The rating was on a scale of 1 to 5 with 5 being the highest preference and 1 the lowest.

Factor                                   Mean importance rating
Experience with similar grading systems  3.2
Less extra credit activities to do       2.2
Fairness                                 3.4
Higher grade                             2.9
Clarity of the grading system            3

Table 3: The rating was on a scale of 1 to 5 with 5 meaning “very important” and 1 meaning “not at all important”.

In the written explanations, by far the most common complaint (mentioned by 7 of the 10 students) about the original system and the first alternative was that they depended too heavily on the exams. For the same reason, some students also criticized the second alternative, preferring homework to count for 15% or 20% of the final grade instead of 10%. A few students expressed concern about their grade depending heavily (40%) on the final exam specifically.

4 Discussion

4.1 Math anxiety

A large majority of students (70%) said that they were more anxious at the end of the semester because of the stress caused by upcoming exams. This suggests that the AMAS may not be an effective classroom tool for measuring general math anxiety, because its scores are strongly influenced by momentary circumstances (whether or not it is exam time). (The high anxiety about exams is also reflected in the fact that the items about math tests scored much higher than the other items.) Therefore, no conclusions about the effectiveness of the intervention could be drawn from this data.
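The comparison between test-related and other items can be checked directly against Table 1. The short Python sketch below uses only the item means reported in that table (per-student responses are not reported here) and averages the starred, test-related items separately from the rest; the variable and function names are illustrative.

```python
# Item means from Table 1, items 1-9 in order.
pre = [1.25, 3.54, 1.42, 3.67, 3.67, 1.13, 1.21, 3.58, 1.58]
post = [1.4, 3.9, 1.9, 4.1, 3.6, 1.7, 1.9, 3.9, 1.8]
TEST_ITEMS = {2, 4, 5, 8}  # starred items, which ask about a math test


def split_means(scores: list[float]) -> tuple[float, float]:
    """Return (mean of test-related items, mean of the remaining items)."""
    test_scores = [s for i, s in enumerate(scores, start=1) if i in TEST_ITEMS]
    other_scores = [s for i, s in enumerate(scores, start=1) if i not in TEST_ITEMS]
    return sum(test_scores) / len(test_scores), sum(other_scores) / len(other_scores)


for label, scores in (("Pre-test", pre), ("Post-test", post)):
    test_mean, other_mean = split_means(scores)
    print(f"{label}: test items {test_mean:.2f}, other items {other_mean:.2f}")
```

On both tests the starred items average roughly 3.6 to 3.9, while the other items stay below 2 (about 1.3 on the pre-test and 1.7 on the post-test).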

4.2 Survey on the grading system

It is clear that students preferred the 2nd alternative the most and the original system the least.

It is much less clear what conclusions can be drawn from the ratings of the importance of the various factors. Only the factor “Less extra credit activities to do” received a markedly different (lower) rating than the others, but this is not surprising, since most students were doing well in the class and therefore had little or no extra credit work to do. Also, the rating for fairness, which was the highest, should be interpreted with caution, since some students raised that issue in front of the class multiple times during the semester, and this might have biased other students’ opinions.

It was the written explanations that revealed that students’ main issue with the original system was the heavy dependence of their grades on the exams, but the data collected are not sufficient to tell why students dislike this. I speculate that the main factor is that students have relatively good control over the other grade items. For instance, investing a sufficient amount of time and getting help from peers or the instructor practically guarantees a good homework grade. Exams, however, involve much more uncertainty, and students may feel that their exam grades involve an element of randomness.

In my experience, however, this is not the case. Even though exam performance clearly involves some randomness (whether the exam happens to include the problems the student is most comfortable with, how well the student slept the night before, etc.), I always felt that exam grades reflected knowledge well. They certainly reflect it better than homework grades, which can easily be misleading, since students may hire tutors to do their homework or purchase textbook solution manuals. This is precisely why I chose to make the grade calculation depend heavily on the exams: I thought it was fairer and a more accurate measure of knowledge.

5 Learning experience

Unfortunately, the intervention did not have the desired effect. The new grading system proved too unusual and complicated for students to understand properly. Some students did far more extra credit work than they needed, while one student who did need to do a significant amount did not do any. Moreover, my impression was that simply talking to the at-risk students in person encouraged them to improve at least as much as the grading system did.

Students seemed to regard the required extra credit activities as a burden rather than a learning experience, and therefore all of them chose the very uncreative task of writing solutions to past exam problems. As a result, this invitation to creative and authentic learning was ineffective, since students solve past exam problems to prepare for the exams anyway.

Not only did the intervention fail to serve its intended purpose; students also disliked it because it made their grades depend heavily on the exams. I may have failed to foresee this problem because in the educational culture I come from (Hungary), grades are commonly based on a single oral final exam.

6 Reflection on Delta pillars

Regarding teaching-as-research, I mostly learned what one should not do. I made several mistakes when planning the classroom research.

First, I intended to measure the change in math performance through homework performance throughout the semester. While this provides a larger dataset than exam scores or a pre- and post-test, I realized that homework performance reflects actual math performance very poorly. For instance, some weaker students came to office hours regularly or studied with peers and therefore had good homework scores, while some stronger students turned in sloppy work and got lower scores. (They probably thought they were going to do well on the exams anyway, so the homework, which was only a small percentage of their grade, would not matter much.)

It also turned out that measuring the change in math anxiety with pre- and post-tests at the beginning and end of the semester is pointless. The tests showed higher anxiety at the end of the semester for nearly every item on the instrument. Students consistently reported at the end of the semester that their anxiety was higher because the exam period was approaching. The fact that students naturally become nervous about end-of-semester exams seems to make it very hard, if not impossible, to draw conclusions about how the class itself affects students’ anxiety towards the subject.

During my project, the Internship Seminar provided an inspiring learning community. Although hearing about other people’s projects did not turn out to have much influence on my own, I now have many new ideas for the future. The seminar also made it possible to learn how others think about teaching-as-research projects, and to develop certain skills, such as writing a good questionnaire.

The main goal of my project was to address the diversity of math skills students bring to the class by creating a grading system that gives poorly performing students more incentive to work harder, and thus to improve and catch up by the end of the semester. Unfortunately, the intervention appears to have been unsuccessful, for several reasons. The grading system turned out to be complicated and unusual, so students had a hard time understanding it properly. Moreover, some students (interestingly, good students!) found it unfair that poorly performing students needed to work more. Most importantly, I had the impression that talking to the weaker students in person motivated them far more than the grading system did.

Appendix – The details of the grading system

The following formula was used to calculate the final numerical grade:

Final Grade = Completion Grade × Exam Total

Here the Exam Total is the average of the grades of the three exams, weighted as: Midterm 1 (25%), Midterm 2 (30%), Final Exam (45%).

The Completion Grade is the weighted average of the remaining grade items (homework and participation) and the two midterms, plus any extra credit work that makes up lost points: Homework and participation (50%), Midterm 1 (25%), Midterm 2 (25%), Extra credit activities/projects (unlimited). The Completion Grade was capped at 100%.

Even students struggling with the homework and midterms had the opportunity to obtain 100% for the Completion Grade with sufficient effort to make up for the lost points. The more a student was struggling with the course, the more extra effort the student needed to put into the class. For example, a student with perfect homework and midterm scores had a Completion Grade of 100% without doing any other activities. A student who did fairly well on these items, averaging 85%, needed to complete some projects to reach 100%, while a student who averaged only 60% needed to do even more.

An example of calculating the final numerical grade is as follows. Assume that a student had an Exam Total of 80% and a Completion Grade of 95%. Then the final numerical grade is 0.95 × 0.8 = 0.76 = 76%.
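The appendix formula can be sketched in Python as below. This is an illustrative sketch, not the script actually used in the course: the function names are hypothetical, scores are expressed as fractions between 0 and 1, and adding extra credit directly as a fraction of the Completion Grade is an assumption, since the report only says that extra credit made up lost points.

```python
# Sketch of the appendix grading formula. All scores are fractions in [0, 1].
# Function names are hypothetical; only the weights and the 100% cap come from the text.


def exam_total(midterm1: float, midterm2: float, final_exam: float) -> float:
    """Weighted exam average: Midterm 1 (25%), Midterm 2 (30%), Final Exam (45%)."""
    return 0.25 * midterm1 + 0.30 * midterm2 + 0.45 * final_exam


def completion_grade(homework: float, midterm1: float, midterm2: float,
                     extra_credit: float = 0.0) -> float:
    """Homework/participation (50%) and midterms (25% each), plus extra credit, capped at 100%.

    Assumption: extra credit is already expressed as a fraction of the
    Completion Grade; the report does not say how projects were converted to points.
    """
    base = 0.50 * homework + 0.25 * midterm1 + 0.25 * midterm2
    return min(1.0, base + extra_credit)


def final_grade(completion: float, exams: float) -> float:
    """Final Grade = Completion Grade times Exam Total."""
    return completion * exams


# Worked example from the text: Completion Grade 95%, Exam Total 80%.
print(round(final_grade(0.95, 0.80), 2))  # prints 0.76

# Hypothetical student averaging 85% on homework and midterms: extra credit
# worth 0.15 or more brings the Completion Grade to the cap.
print(completion_grade(0.85, 0.85, 0.85, extra_credit=0.20))  # prints 1.0
```

Because the Completion Grade multiplies the Exam Total, a student who reaches the 100% cap keeps the full Exam Total as the final grade, while any shortfall in the Completion Grade scales the final grade down proportionally.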

References

Hopko, Derek R., Mahadevan, Rajan, Bare, Robert L., & Hunt, Melissa K. 2003. The Abbreviated Math Anxiety Scale (AMAS): construction, validity, and reliability. Assessment, 10(2), 178–182.
