NCAA Tournament: Predicting the Bracket, At Large Selections

Updated January 2017

As discussed on the "NCAA Tournament: Selection, Seeding, and Bracketing Criteria" page, there are specific criteria the Women's Soccer Committee must apply in making the 33 at large selections for the NCAA Tournament.  In preparing this page, I'm assuming that readers are familiar with those criteria.  If not, I urge you first to go to that page and become familiar with the criteria. I describe them under the heading "Selection Criteria."

To better understand how the Committee weights the different criteria and decides how a team "scores" under each criterion, I did a study of the Committee's at large selections over the last ten years, from 2007 through 2016.  In the study, I looked for criteria-related patterns that every team receiving an at large selection met and that every team denied a selection met.  From the study, I identified a number of selection and non-selection patterns that are consistent with every Committee decision over that period.  These patterns are set out below.

The patterns serve two purposes:

Anyone interested in predicting the Committee's at large selections can apply the patterns to the field of teams that are not Automatic Qualifiers, to see which selections and non-selections the Committee will make if its decisions are to be consistent with its decisions over the last ten years.  And, if the patterns as applied to the current year's data identify fewer than a full field of 33 at large selections, or identify more than 33, a careful look at how many of the patterns, if any, each candidate meets for and against selection can help identify which teams are "in play" for the last of the at large positions and which are likely to be awarded them.

Perhaps more important, the patterns are a vehicle for testing whether the Committee's decision-making is consistent from year to year.  If the Committee goes off the beaten path, its decisions will be inconsistent with the patterns.

In order to identify patterns, I had to consider the likely intended purpose and use of each criterion.  This is not always obvious.  For example, how does the Committee use the Adjusted RPI?  What about the Adjusted Non-Conference RPI?  What is the purpose of the "Results Against Teams Already Selected" criterion?  What is the purpose of the "Results Over the Last Eight Games" criterion?  How important is a team's conference's average Adjusted RPI; and how important is the position a team finished in its conference's regular season and tournament?

I also had to consider practical factors that would influence the Committee's decision-making process such as:

The compressed amount of time within which the Committee must make its decisions.  Although the Committee meets periodically during the season, it does not receive the final end-of-season results from the NCAA staff until Sunday afternoon of the last day of the regular season.  The Committee then has less than 24 hours within which to fill out the Tournament bracket in order to be able to announce it on Monday afternoon.

The very large amount of information available to the Committee.  There are enough data to be overwhelming, so Committee members need to figure out ways to organize them and keep their analyses relatively simple.

My system does not pretend to match the actual thought process the Committee as a whole, or any individual member, goes through in arriving at decisions.  Rather, my system simply looks at the Committee's decisions from the outside, compares those decisions to the criteria and the criteria-related data, and identifies patterns with which all of the Committee's decisions over the last ten years have been consistent.  Those patterns are what this page sets out.

THE CRITERIA AND HOW I QUANTIFY TEAMS' CRITERIA-BASED "SCORES"

Some of the NCAA's criteria are based on pure numbers -- such as a team's Adjusted RPI and its ARPI rank -- and some are less defined -- such as Results Against Common Opponents.  The following is a list of the criteria my system considers and, where a criterion is less defined, an explanation of how I quantify the data related to that criterion in order to be able to identify the Committee's decision-making patterns:

Adjusted RPI

ARPI Rank

Non-Conference ARPI

NCARPI Rank

Conference Average ARPI

Conference Average ARPI Rank, in relation to the other conferences

Conference Standing:  For this, I determine where a team finished in its conference regular season using the typical point system of 3 points for a win and 1 point for a tie.  I also determine where a team finished in its conference tournament, if the conference has a tournament.  As an example of how I determine teams' conference tournament finishes, if a conference has a four-team tournament, the tournament champion finishes 1, the runner-up finishes 2, and the two losing semi-finalists each finish 3.5 (the average of 3 and 4).  If there are six other teams in the conference, I assign them tournament finishing positions based on where they finished in the conference regular season competition.  Once I have a team's regular season and tournament finishing positions, I average them to produce the team's Conference Standing.  For a team whose conference has no tournament, its Conference Standing is simply its conference regular season finishing position.

It's important to be clear that when my system looks at Conference Standing, it combines a team's regular season conference standing and its tournament finishing position.  It does not look at regular season standing and tournament finishing position separately.  Some advocates for teams that finished high in their conference regular season competition have taken the position that the Committee separately should consider and make decisions based on their conference regular season finish.  I've carefully reviewed the data in relation to the Committee's decisions over the last 10 years and have concluded that the Committee, at least as a whole, does not do that.
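
To make the calculation concrete, here is a minimal sketch of the Conference Standing computation in Python.  The function names and the example positions are illustrative only; they are not part of the NCAA's or the Committee's materials.

```python
# Minimal sketch of the Conference Standing calculation described above.
# Function names and example positions are illustrative only.

def conference_points(wins, ties):
    """Regular season conference points: 3 per win, 1 per tie."""
    return 3 * wins + 1 * ties

def conference_standing(regular_season_finish, tournament_finish=None):
    """Average of the regular season and conference tournament finishing
    positions.  For a conference with no tournament, pass None and the
    regular season finish alone is the Conference Standing.  A losing
    semi-finalist in a four-team tournament would be passed 3.5 (the
    average of positions 3 and 4)."""
    if tournament_finish is None:
        return regular_season_finish
    return (regular_season_finish + tournament_finish) / 2

print(conference_points(8, 3))        # 27 regular season points
# Example: the regular season champion loses in the tournament semi-finals.
print(conference_standing(1, 3.5))    # 2.25
```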

Head to Head Results:  For this criterion, I consider all game results where two teams in the ARPI Top 60 play each other.  I use the top 60 because over my ten year study period, no team outside the top 60 has received an at large selection.  (In fact, no team outside the top 57 has been selected, when teams are rated using the 2015 -- in other words, current -- ARPI formula.)  For each Top 60 head-to-head game, I assign points to the two teams as follows:

If Team A has beaten Team B, then Team A receives 2 points

If Team A has lost to Team B, then Team A receives -2 points

If Team A has tied Team B @ Team B's site, then Team A receives 1 point

If Team A has tied Team B @ a neutral site, then Team A receives 0 points

If Team A has tied Team B @ Team A's site, then Team A receives -1 point

I go through this Head to Head points award process for both Team A and Team B.  And, if Team A and Team B have played multiple times, I go through the point award process for each game.  (Teams have played as many as three times within the same season.)

Once I've awarded points for all Top 60 head to head games, I add up each team's points from all of its Top 60 head to head games and divide the sum by the number of head to head games the team played.  This gives the team an average point score per Top 60 head to head game.  This amount is the team's Top 60 Head to Head Results score.  Occasionally a Top 60 team has no head to head games with any other Top 60 team.  Under my point assignment system, that team receives a Top 60 Head to Head Results score of 0.
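
Here is a minimal sketch of this Head to Head scoring in Python.  The game record layout (team names, a result code, and a venue code) is simplified for illustration and is not how the actual data are organized.

```python
# Minimal sketch of the Top 60 Head to Head Results score.  Each game is
# recorded as (team_a, team_b, result, venue): result is "A" (team_a won),
# "B" (team_b won), or "T" (tie); venue is "A", "B", or "N" (neutral).
# This layout and the team names below are illustrative only.

from collections import defaultdict

def h2h_points(result, venue, perspective):
    """Points for one Top 60 head to head game, from one team's perspective."""
    if result == perspective:          # this team won
        return 2
    if result in ("A", "B"):           # this team lost
        return -2
    if venue == "N":                   # tie at a neutral site
        return 0
    return 1 if venue != perspective else -1   # tie: +1 on the road, -1 at home

def h2h_scores(games):
    """Average head to head points per game for every team that appears.
    A Top 60 team with no Top 60 head to head games never appears here;
    such a team is assigned a score of 0."""
    totals, counts = defaultdict(float), defaultdict(int)
    for team_a, team_b, result, venue in games:
        totals[team_a] += h2h_points(result, venue, "A")
        totals[team_b] += h2h_points(result, venue, "B")
        counts[team_a] += 1
        counts[team_b] += 1
    return {team: totals[team] / counts[team] for team in totals}

games = [("North", "South", "A", "B"),   # North won at South's site
         ("North", "East", "T", "A")]    # North and East tied at North's site
print(h2h_scores(games))                 # {'North': 0.5, 'South': -2.0, 'East': 1.0}
```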

Common Opponent Results.  For this criterion, I consider each pairing of Top 60 teams to see if and where those two teams had common opponents.  For each common opponent that two paired teams have, I look at each team's result against the common opponent and assign that result a point value using the same point assignments as for head to head results above (+2, +1, 0, -1, -2).  For each team I then add together all of its points for common opponents with the other team, which gives me the total common opponent points for each team.  I then determine the difference between the two teams' total common opponent points.  The team with the higher total common opponent points will have a positive difference and the other team will have a negative difference (unless the total common opponent points of the two teams are equal, in which case the difference will be 0 for each team).

Once I've done this for all pairings of top 60 teams, I add up the differences that each team has to get its total differences.  I then divide each team's total differences by the number of common opponent games it played to get an average difference per common opponent.  This amount is the team's Top 60 Common Opponent Results score.
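
And here is a corresponding sketch of the Common Opponent calculation.  For brevity it assumes each team has at most one result against a given common opponent, recorded as a point value under the same +2/+1/0/-1/-2 scheme; the data layout is my own simplification.

```python
# Minimal sketch of the Top 60 Common Opponent Results score.  `results`
# maps (team, common_opponent) -> the point value of that team's result
# against that opponent, scored +2/+1/0/-1/-2 as in the head to head
# system.  One result per team/opponent pair is assumed for brevity.

from itertools import combinations
from collections import defaultdict

def common_opponent_scores(results, top60):
    diffs = defaultdict(float)   # sum of pairing differences, per team
    games = defaultdict(int)     # common opponent games played, per team
    for team_a, team_b in combinations(top60, 2):
        opponents_a = {opp for (team, opp) in results if team == team_a}
        opponents_b = {opp for (team, opp) in results if team == team_b}
        common = opponents_a & opponents_b
        if not common:
            continue
        points_a = sum(results[(team_a, opp)] for opp in common)
        points_b = sum(results[(team_b, opp)] for opp in common)
        diffs[team_a] += points_a - points_b
        diffs[team_b] += points_b - points_a
        games[team_a] += len(common)
        games[team_b] += len(common)
    # Average difference per common opponent game.
    return {team: diffs[team] / games[team] for team in games}

# Example: North beat a shared opponent that South lost to.
results = {("North", "Valley"): 2, ("South", "Valley"): -2}
print(common_opponent_scores(results, ["North", "South"]))   # {'North': 4.0, 'South': -4.0}
```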

Common Opponent Results Rank.

Results Against Top 50 Teams.  One of the NCAA's criteria is Results Against Teams Already Selected for the bracket, if the result is against a team ranked #75 or better by the ARPI.  For what I'm doing, this is impossible to program in advance of the Committee making its decisions.  As a surrogate for this criterion, I use a team's results against top 50 teams.  (I could have used top 60 teams, but from what I've seen over my ten year study period, the Committee does not consider results against teams ranked below #50 to be "impressive.")

To score these results, I use a geometrically increasing scoring system as follows:

[Scoring table omitted: point values awarded for wins and ties against Top 50 opponents, grouped by opponent ARPI rank.]

A key point to understand about the Results Against Top 50 Teams formula is that it looks only at good results (wins and ties) and is very heavily weighted towards good results against highly ranked teams.  Thus a good result against a team ranked 1 or 2 in the ARPI is worth 6 times as much as a similar result against a team ranked 7 to 9.  A good result against a team ranked 7 to 9 is worth 5 times as much as a similar result against a team ranked 13 to 15.  And so on.
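
As a rough illustration of how such a scoring function operates, here is a Python sketch.  The actual point values and rank groupings come from the scoring table above and are not reproduced here; the sketch leaves them as an input to be supplied, and the data layout is my own simplification.

```python
# Sketch of the Results Against Top 50 Teams score.  Only wins and ties
# against Top 50 opponents earn points; losses earn nothing.  The
# bracket_points mapping -- (low_rank, high_rank) -> point value -- must be
# filled in from the scoring table; its values are not reproduced here.

def top50_results_score(results, bracket_points):
    """`results` is a list of (opponent_rank, result) tuples, with result
    "W", "T", or "L".  Whether wins and ties earn different values is a
    detail of the table, so this sketch treats them the same."""
    total = 0.0
    for opponent_rank, result in results:
        if opponent_rank > 50 or result not in ("W", "T"):
            continue
        for (low, high), points in bracket_points.items():
            if low <= opponent_rank <= high:
                total += points
                break
    return total
```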

Results Against Top 50 Teams Rank.  Once I have determined the top 60 ARPI teams' Results Against Top 50 Teams scores, I also rank the top 60 teams based on these scores.

I use this scoring system for results against Top 50 ranked teams because, after reviewing a number of years of Committee decisions, it appears to me that the primary purpose of the Results Against Teams Already Selected criterion is to show the level at which a team has demonstrated it is able to compete.  So, for example, a team that has beaten or tied the ARPI #1 or 2 ranked team is going to be in a highly preferred position for an at large selection, since it directly has demonstrated it can compete at the highest level possible, indeed might be able to win the College Cup.  And a team that has not beaten or tied any top 50 team is going to be in an un-preferred position, since it hasn't demonstrated directly that it can compete even at the top 50 level.

As further justification for the formula heavily weighting results against highly ranked teams, consider the following chart based on six years' data:

[Chart omitted: percentage of games that Top 50 teams lose or tie, by rank, with a logarithmic trend line.]

This chart shows the percentage of games that top 50 teams lose or tie, by rank.  The blue line represents the actual data.  The black line is a computer generated logarithmic trend line based on the data.  The chart demonstrates that it is much more difficult to win or tie a game against an opponent at the high end of the rankings (ranked #1 or 2, for example) than against opponents at even nearby poorer ranking levels.

And consider, in addition, the following chart also based on six years' data:

[Chart omitted: average number of Top 50 opponents per year that teams play, by rank, with a logarithmic trend line.]

This chart shows the average number of top 50 opponents per year that teams play, by rank.  The pink line represents the actual data.  The black line is a computer generated logarithmic trend line based on the data.  The chart shows that, especially at the high end of the rankings, higher ranked teams play more games against top 50 teams than more poorly ranked teams do.

Thus not only do the most highly ranked teams lose or tie considerably fewer games than even teams nearby in the rankings, but they also accomplish that against more difficult competition.  This justifies the Results Against Top 50 Teams formula's giving very high weights to good results against very highly ranked teams, even in comparison to nearby ranked teams.

Results Over the Last Eight Games.  The purpose of this criterion is not clear.  If one actually gives value to all of the results from a team's last eight games, a good portion of those games already have received value under the Results Against Top 50 Teams criterion.  One could disregard those games and give value -- either positive or negative -- to the other last eight games results, but this seems to me an unlikely purpose for the criterion.  My conclusion is that the purpose of this criterion is somewhat the converse of the purpose of the Results Against Teams Already Selected criterion:  the Last Eight Games criterion, I believe, is meant to deal with poor results, whereas the Results Against Teams Already Selected criterion deals with good results.

However, programming to evaluate only a team's last eight games is difficult because of differences in teams' schedules.  I therefore use a surrogate for this criterion: poor results over the entire season.  To score this criterion, a team receives a negative score for each poor result -- a tie or a loss -- against each opponent ranked #56 or poorer.  (I chose #56 a few years ago, when no team with a rank of #56 or poorer had received an at large selection.  Currently, using the 2015 ARPI for ratings, the #56 rank level would be replaced by #58.  The difference is small enough that I haven't changed the scoring system.)  For each such poor result, the negative points assigned are as follows:

Tie v opponent ranked #56 to #100:  -1
Loss v opponent ranked #56 to #100:  -2

Tie v #101 to #150:  -3
Loss v #101 to #150:  -4

Tie v #151 to #200:  -5
Loss v #151 to #200:  -6

Tie v #201 or poorer:  -7
Loss v #201 or poorer:  -8
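
Here is a minimal Python sketch of this poor-results scoring, using the penalty values listed above; the opponent ranks and results in the example are hypothetical.

```python
# Minimal sketch of the poor-results surrogate for the Last Eight Games
# criterion, using the penalty values listed above.  The example opponent
# ranks and results are hypothetical.

def poor_result_penalty(opponent_rank, result):
    """Negative points for a tie ("T") or loss ("L") to an opponent ranked
    #56 or poorer; 0 for any other result or any better-ranked opponent."""
    if result not in ("T", "L") or opponent_rank < 56:
        return 0
    if opponent_rank <= 100:
        return -1 if result == "T" else -2
    if opponent_rank <= 150:
        return -3 if result == "T" else -4
    if opponent_rank <= 200:
        return -5 if result == "T" else -6
    return -7 if result == "T" else -8

def season_poor_results_score(games):
    """Sum of penalties over the entire season; `games` is a list of
    (opponent_rank, result) tuples."""
    return sum(poor_result_penalty(rank, result) for rank, result in games)

# Example: a win vs #10, a tie vs #120, and a loss vs #210.
print(season_poor_results_score([(10, "W"), (120, "T"), (210, "L")]))   # -11
```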

IDENTIFYING THE PATTERNS

With the above criteria and scoring systems, I then have done two things (for the RPI-related criteria, this is based on the 2015 formula):

First, for each individual criterion, I looked to see whether teams that met or exceeded a particular level of the criterion always received an at large selection, or conversely whether teams that met or fell below a particular level never received one.  As an example, the data show that teams with ARPIs of 0.5987 or better always have received at large selections.  Conversely, teams with ARPIs of 0.5702 or poorer never have received at large selections.  Thus I treat the former as a pattern such that, if a team meets it, it is expected to receive an at large selection; and the latter as a pattern such that, if a team meets it, it is expected not to receive an at large selection.
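
Here is a minimal Python sketch of such a single-criterion pattern test, using the ARPI thresholds just quoted.  For rank-based criteria, where a lower number is better, the comparisons would simply be reversed.

```python
# Minimal sketch of a single-criterion pattern test, using the ARPI
# thresholds quoted above.  For rank-based criteria, where a lower number
# is better, the two comparisons would simply be reversed.

def arpi_pattern(arpi, always_selected_at=0.5987, never_selected_at=0.5702):
    """Return "yes", "no", or None when neither pattern applies."""
    if arpi >= always_selected_at:
        return "yes"    # matches the always-selected pattern
    if arpi <= never_selected_at:
        return "no"     # matches the never-selected pattern
    return None         # in between: this criterion alone decides nothing

print(arpi_pattern(0.6100), arpi_pattern(0.5800), arpi_pattern(0.5600))
# yes None no
```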

Second, I looked at the criteria in pairs.  Since there are 13 individual criteria, pairing them all (13 items taken 2 at a time) means looking at 78 pairs.  For each pair, I apply a formula to get a combined value for the two criteria, with the criteria weighted so each contributes 50% of the combined value.  An example of such a pairing is a team's Conference Average ARPI Rank paired with the team's own ARPI Rank.  If a team's value for the ARPI Rank and Conference ARPI Rank pair is 43 or less, then the team always has received an at large selection.  Conversely, if the team's value is 77.4 or more, then the team never has received an at large selection.
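
By way of illustration, the sketch below combines two rank-style criteria with a simple 50/50 average -- a simplified stand-in for the actual per-pair formulas shown in the table below -- and applies the thresholds quoted above for the ARPI Rank and Conference ARPI Rank pair.

```python
# Illustrative sketch of a paired-criteria pattern test.  The 50/50
# combination below is a simplified stand-in for the formulas in the
# standards table; the thresholds are the ones quoted above for the
# ARPI Rank / Conference Average ARPI Rank pair.

def paired_value(arpi_rank, conference_arpi_rank):
    # Each criterion contributes 50% of the combined value.
    return 0.5 * arpi_rank + 0.5 * conference_arpi_rank

def paired_pattern(value, yes_at_or_below=43, no_at_or_above=77.4):
    if value <= yes_at_or_below:
        return "yes"
    if value >= no_at_or_above:
        return "no"
    return None

# Example: ARPI rank 30 in a conference whose average ARPI ranks 4th.
print(paired_pattern(paired_value(30, 4)))   # yes
```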

Note: for some criteria and paired criteria, a higher value is better; and for others, a lower value is better.

After going through the above process, the table below shows the patterns I've identified for teams always having received and always having been denied an at large selection to the NCAA Tournament.  In the table, the "Code" column is simply something I use for data organization purposes, so you can ignore it.  The "Factor" column lists the individual criteria and the paired criteria and, for the paired criteria, shows the formula I apply to compute the value for the pair with the two criteria weighted at 50% each.  The "At Large" column shows the historic pattern for always having received an at large selection.  The "No At Large" column shows the historic pattern for never having received an at large selection.  In the "<= or >=" column, the first indicator applies to the At Large pattern and the second to the No At Large pattern -- thus for the ARPI, a team always has received an at large selection if its ARPI has been >= (greater than or equal to) 0.5987, and it always has been denied an at large selection if its ARPI has been <= (less than or equal to) 0.5702.

When I apply these patterns to the data over the last 10 years, they produce 94% of the Committee's at large selections.  For the other 6% -- typically 1 selection per year, but ranging as high as 6 and averaging 2 per year -- there is a group of teams, ranging from 2 at the low end to 6 at the high end, that meet neither any "yes" pattern nor any "no" pattern.  The last at large selections came from these teams.


2017 Bracket Formation AL Standards

[Table omitted: for each individual and paired factor, the Code, the Factor formula, the At Large standard, the No At Large standard, and the <= or >= indicators.]