Description

Post date: Oct 23, 2009 3:46:30 AM

Why I do this

Back in 1994, I strongly believed that Penn St. was the best team in college football. But Nebraska remained #1 through the course of both teams' undefeated seasons and easily won the final vote in both polls. There had been #1 controversies in 1989, 1990, 1991, and 1993 as well, so this was just the last straw for me. At the beginning of the next season, I started doing my own ratings just like I imagine poll voters do: looking at results, adding in observations from games I saw, and putting the teams in order. This was long before the days of DVR, and I never figured out how to use a VCR to record something I wasn't watching, so I mostly used my ratings to decide which games to watch on Saturdays.

In 2003, I believed LSU was (1) most deserving of the BCS title game, and then (2) more deserving of #1 at the end of the year. In following the progression of the computer polls (which became relevant to the national title with the advent of the BCS), I read a lot about them and wrote to the authors of some of the BCS ratings. For example, I asked Jeff Sagarin what his ratings would have been in 2002 had Miami not been flagged for pass interference in the first overtime. I searched for a long time for a good way of rating teams that involved neither a secret formula nor very complex math (such as logarithms and least-squares). I finally found one, GBE, and my own ratings began developing from there.

College Football News has an interesting take on ranking teams:

"You must rank teams based on how good you believe they are at the moment. That's the point when it comes to putting the teams in some order. However, once the year is complete, it's only fair to take the subjectivity out of it and go by what actually happened on the field."

I believe that first part is utter nonsense. So if a winless team beats the best team in the country, I should then rank the winless team #1 the following week, assuming I was correct in picking out who #1 was up to that point? This philosophy is also why you see "forgiveness votes" given to a team like USC, which lost to Stanford a couple of years ago. USC lost ground in the rankings for a while, but since (I imagine) voters believed they were good (and they were decent on the whole) apart from that one moment, which had passed, they were put back ahead of other teams with perhaps better on-field results.

The last part, "take the subjectivity out of it and go by what actually happened on the field," is exactly what needs to be done, but why on earth would you want to wait until the year is complete? If you do, there is no point in doing weekly ratings at all.

General description

What I try to measure is what a team has accomplished to date through wins and what it has failed to accomplish in losses. Note that as of the rankings linked to above, the average team's score is -.33. So I suppose, like any other ranking, some teams will view a middling rank as an accomplishment, and some will not be satisfied with even a very high rank. The average number being negative says nothing about how I think one should feel about a given team being average. I just want to rate teams objectively with an eye toward who the best teams are, rather than placing the most average team 60th.

Leaving out the math for a moment: I do not use a strict RPI formula, but my system is record-based like the RPI, and there is no complex math like there is in an Elo formula. This is a two-tiered ranking system. An initial rating, which is similar to the RPI, is computed first; it allows me to then rate teams as opponents. This initial rating is similar to the RPI in that opponents' winning percentage counts for twice as much as opponents' opponents' winning percentage. It is dissimilar in that the strength of schedule (SoS) is on a 12-point scale and is multiplied by the record. Then I make a dramatic departure from the RPI. Every win (as long as the opponent has any value in the initial rating I just described) adds to a team, and every loss subtracts from a team. So a team with no wins has only negative results. Unlike in the RPI, the value of a win is not separated from the strength of the opponent beaten. For instance, in the RPI, if team A and team B both beat team C, whichever of A and B has the better SoS (which results from unrelated games) gets more points. My rating ensures team A and team B get identical credit for the same win. Margin of victory is not considered except under a very narrow set of circumstances that exists only to address the heightened degree of difficulty a team faces when playing on the road (see below).

Here is how I get the SoS for the initial rating: (1/(opponents' winning percentage)) x 2 + (1/(opponents' opponents' winning percentage)), and then I subtract that number from 12. If it's a really bad schedule (like Hawaii's for much of 2007), that first number comes out higher than 12, so the formula doesn't work (which is why some teams go unrated early on). The rest of what I do is multiply the SoS by the winning percentage (with a couple of adjustments for home/away and FCS {I-AA} opponents), which gives me a "value" for each team. Beating a team of a certain value gives you one number (the better the value, the higher the number); losing to them gives you a different number (the better the value, the lower the number). Then I subtract the loss numbers from the win numbers (again with home/away adjustments and adjustments for wins over FCS {I-AA} teams).
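A minimal sketch of that first-tier calculation, in Python (the function name is arbitrary, and the home/away and FCS adjustments are left out, so treat it as an illustration rather than the exact spreadsheet):

    def initial_value(win_pct, opp_win_pct, opp_opp_win_pct):
        # First-tier "value": SoS = 12 - (2/OWP + 1/OOWP), then
        # value = SoS * winning percentage.
        if opp_win_pct == 0 or opp_opp_win_pct == 0:
            return None                  # nothing to work with yet
        sos = 12.0 - (2.0 / opp_win_pct + 1.0 / opp_opp_win_pct)
        if sos <= 0:
            return None                  # schedule too weak so far; team goes unrated
        return sos * win_pct

    # A 4-0 team whose opponents are .500 and whose opponents' opponents are
    # .550 comes out at (12 - (2/.5 + 1/.55)) * 1.0, or about 6.18.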

The second tier of the ratings: technical description and illustration

The opponent's value is divided by 30 to give me the "win number," and the "loss number" is 1/(the opponent's value times 2). Losses hurt more than wins help, but the priority is to determine the best teams, so I think this is appropriate. I want a bad loss to hurt, and when it is a close call in strength of wins, I'd rather benefit the 1-loss team with the more understandable loss. Teams that win most of the time get a little more leniency for playing weaker teams. But the other side of that coin is that if most of the opponents are not very good, there is still a relatively limited opportunity to accumulate points.
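A sketch of that second-tier bookkeeping, again in Python with names of my own choosing (the home/away and I-AA adjustments described below are left out):

    def win_number(opponent_value):
        # Credit for a win: the opponent's initial value divided by 30.
        return opponent_value / 30.0

    def loss_number(opponent_value):
        # Penalty for a loss: 1/(2 * opponent's value), so a loss to a
        # stronger (higher-value) opponent subtracts less.
        return 1.0 / (2.0 * opponent_value)

    def season_score(results):
        # results: list of (won, opponent_value) pairs. Wins over opponents
        # with no positive value add nothing; how a loss to an unrated
        # opponent is handled isn't spelled out above, so such games are
        # simply skipped in this sketch.
        total = 0.0
        for won, opp_value in results:
            if not opp_value or opp_value <= 0:
                continue
            total += win_number(opp_value) if won else -loss_number(opp_value)
        return total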

In 2008's final ratings, Utah did not have a loss and got a lot of credit for beating once-beaten Alabama but still finished behind Florida and Texas. Florida had a worse loss than Texas did (even though the team to which Florida lost, Ole Miss, defeated the team to which Texas lost, Texas Tech), but Florida had more good wins. Texas had more good wins than Utah. Alabama (a loser to Utah) and Oklahoma (a loser to Texas) were similar, with Oklahoma a little higher. Also, Texas' other wins were still enough to overcome the one loss. I think this is an appropriate balance. The loss didn't hurt Texas so much that they dropped below Utah, but losses still make enough of a difference to help the better records go to the top. I don't want the rest of the season to drag a team like Utah down too much despite a good win like that, but a team without good wins and an otherwise similar schedule would have no chance of getting so high.

Home/away

This is all I do for home and away games. First of all, the adjustment is only triggered if the home team wins by 3 or less or in overtime. It is pretty consistent, across many seasons and both NFL and college football, that a home team on average has a 3-point advantage. Most ratings that consider home/away treat all games the same way, but I do not believe in this, since it probably didn't matter where the game was played if a team won by more than that. Think about it this way: if the game were played in an unfamiliar, centrally located stadium with no people, the game would start off 0-0 and location would have no effect on what happened. So the average advantage there is exactly 0. That means a team winning by even 1 has shown itself to be superior for that instance. It gets a little murkier when it is not a neutral environment. Since the average advantage is about 3 points for the home team, that's all I'm willing to consider, even though of course a crowd can help or hurt one play, and that play could have a much bigger impact than three points (an interception return instead of a touchdown, for example, {assuming one-point PATs} is a 14-point swing). Anyway, the only impact is this: for the winning home team's rating, its victim is counted as 9/10 as good a team as its rating says, and for the losing road team's rating, its opponent is counted as 11/10 as good as its rating actually indicates. This makes the win count for just a little less, and the loss subtracts just a little less.
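A rough sketch of that adjustment (parameter names are mine; it simply scales the opponent's value before the win or loss number is computed):

    def adjusted_opponent_value(opp_value, rating_team_is_home,
                                home_margin, went_overtime):
        # Triggered only when the home team wins by 3 or less, or in overtime:
        # the home winner's opponent counts as 9/10 of its value (a slightly
        # smaller win), and the road loser's opponent counts as 11/10 of its
        # value (a slightly smaller loss, since the loss number is 1/(2*value)).
        close_home_win = home_margin > 0 and (home_margin <= 3 or went_overtime)
        if not close_home_win:
            return opp_value
        return opp_value * (0.9 if rating_team_is_home else 1.1)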

I-AA games

I used a complicated formula, which I can't even quote you exactly, to determine how much credit to give for I-AA wins. Basically, I differentiated how I-AA opponents perform against each other (collectively .500, of course) from how they perform against I-A teams in general.

This enabled me to value the average win against a I-AA opponent. So you do get more credit for beating a I-AA team that is successful against other I-AA teams than for beating one that is not, but even beating an otherwise perfect I-AA team that has a win over a mediocre I-A opponent will only get you about as much credit as beating a mediocre I-A team yourself. Just as with I-A opponents, if a I-AA team has no wins, you get no credit. And I made it so that if the I-AA team has only a few wins, beating it is only marginally better than beating a I-A team with no wins. But there is no automatic quality that exists just because a team is I-A (Western Kentucky, which finished #120 in 2008, proved nothing by moving up to I-A and losing). You don't get any credit for suiting up and going onto the field, no matter who your opponent is. A I-AA team that beats I-AA opponents has still shown some degree of competence in a football game.
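Since I can't reproduce the actual formula, the following is only an illustration of the behavior just described, with a made-up constant standing in for the value of a mediocre I-A team; it is not the real calculation:

    # MEDIOCRE_FBS_VALUE is a purely hypothetical stand-in, not a number
    # from the actual formula.
    MEDIOCRE_FBS_VALUE = 3.0

    def fcs_opponent_value(fcs_wins, fcs_games):
        # Captures only the stated behavior: no wins means no credit, more
        # success against other I-AA teams means more credit, and even a
        # near-perfect I-AA team tops out around what a mediocre I-A
        # opponent is worth.
        if fcs_games == 0 or fcs_wins == 0:
            return 0.0
        return MEDIOCRE_FBS_VALUE * (fcs_wins / fcs_games)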

As for how this affects SoS, I-AA wins are not factored in (and with the adjustment mentioned on 9/27, I'm going to subtract back out the I-AA losses). Your first reaction may be that this helps teams, and opponents of teams, who play I-AA games, but except as to the SoS ratings themselves, this is not the case. The credit a team gets is added after the first step of the calculation. SoS reflects only a team's schedule against I-A opponents, although FBS (I-A) opponents who play FCS teams have those FCS games reflected in the initial ratings.

SoS computation

There are numerous ways to calculate SoS from opponents' and opponents' opponents' records, as I do. Of course I did not invent this idea. Some systems just add together twice the wins and losses of opponents (if they even count that part twice) plus the cumulative records of opponents' opponents. You can also compute the average winning percentage for each component (opp. and opp. opp.), again factoring in opponents twice. I believe that method is unsatisfactory. I want each game to have the same weight (which is why I made the adjustment mentioned above). The rating is based on accomplishments. Just as beating a team in Week 1 is equal to beating that team in Week 12, each week should have the same impact on SoS as any other. So what I do is pool opponents' wins and losses game by game. For example, if after 4 weeks a team has played a 1-3 team, a 2-2 team, and a 3-1 team, and had a bye week, the total record of its opponents would be 6-6. It would be the same if the records were 1-4, 2-1, and 3-1. I think an opponent that has played more games should count for more because there is more basis on which to judge it. This also keeps FCS wins by opponents from playing too large a role.
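In code form, that pooling looks roughly like this (a sketch; the opponents-counted-twice weighting and the opponents'-opponents component would be layered on in the same way):

    def pooled_win_pct(opponent_records):
        # opponent_records: list of (wins, losses) for each opponent played.
        # Pools all of the opponents' games together, so an opponent that has
        # played more games carries more weight, rather than averaging each
        # opponent's own winning percentage.
        wins = sum(w for w, _ in opponent_records)
        losses = sum(l for _, l in opponent_records)
        games = wins + losses
        return wins / games if games else None

    # The example above: (1-3), (2-2), (3-1) pools to 6-6, or .500 --
    # exactly the same as (1-4), (2-1), (3-1), which also pools to 6-6.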

Although many ranking systems (Sagarin, for instance) compute SoS by averaging opponents' final ratings (after the fact), while I compute an SoS rating as a preliminary step, something I have in common with those systems is that there is more to the formula than SoS and record. The SoS I list may or may not correspond to an average of opponents' ratings, or to the final rating divided by winning percentage. Please be aware of this fact when using my SoS to argue for or against a certain team.

Timing of first rating

I sometimes still get unusual results (such as -25 ratings), and even teams that are impossible to rate, in the first couple of weeks of doing these ratings in October. As the season goes on, the chance of these events diminishes. Five weeks of play seems to take care of most of the statistical anomalies that run afoul of my system, so beginning the ratings any earlier would be neither constructive nor instructive.