Getting the Correct Data

Updated April 2024

In order for any statistical rating system to work well, it must use correct and complete data.  This page describes the NCAA's data gathering system for Division I women's soccer.  It also describes the process I use to be sure I have correct and complete data when I generate RPI ratings.  Finally, this page describes the most common ways in which data problems occur.

NCAA GAME SCORE REPORTING SYSTEM

Each year, before its first game or by September 1, whichever comes first, each Division I school is supposed to enter its season schedule into the NCAA's on-line statistics database.  The NCAA statistics database for the 2023 season is here: NCAA 2023 Division I Women's Soccer Data.  (The NCAA system is not yet set up for the 2024 season.)  This link takes you to the current day Scoreboard page for Division I women's soccer, which shows the day's games and, once schools have entered them into the system, the results.  If you click on a team's name, you will go to its schedule page, and from there you can use links to find the team's roster and various team statistics.  At the top of the Scoreboard page there's a calendar icon you can use to go to the scoreboard for a different day's games.  For each game there's also a box score link if you want to see the game's box score.  In addition, if a team does not have a game on the day the Scoreboard shows but you want to go to its schedule page, you can type its name into the Team Search box on the upper right and then, in the drop down menu, click on the team whose schedule you want to see.  This will take you to that team's schedule page.

Once the season begins, by 10:00 pm local time on Sunday, each school is supposed to enter its game results from the preceding week into the NCAA statistics system.  Most teams enter their game results sooner than that, shortly after the games are over.  The system is set up so that if two opposing schools enter inconsistent data, the system will "red flag" the inconsistency for correction.  The NCAA staff monitors the database to be sure any inconsistencies get corrected.
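To illustrate the idea behind the dual-entry "red flag" safeguard described above, here is a minimal Python sketch.  It is not the NCAA's actual code, which I have not seen.  The Report structure, its field names, and the red_flags function are all hypothetical, and the sketch assumes two teams play each other at most once on a given date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One school's report of a single game (hypothetical structure)."""
    reporter: str       # school that entered the report
    opponent: str
    date: str           # ISO date, e.g. "2023-09-03"
    goals_for: int
    goals_against: int


def game_key(report):
    """Identify a game by its date and the unordered pair of teams."""
    return (report.date, frozenset((report.reporter, report.opponent)))


def red_flags(reports):
    """Return (game key, reason) pairs for games that need a manual check."""
    by_game = {}
    for r in reports:
        by_game.setdefault(game_key(r), []).append(r)

    flagged = []
    for key, entries in by_game.items():
        # The safeguard only works when each of the two schools reports once.
        if len(entries) != 2 or entries[0].reporter == entries[1].reporter:
            flagged.append((key, "missing or duplicate report"))
            continue
        a, b = entries
        # One school's goals_for must equal the other's goals_against.
        if (a.goals_for, a.goals_against) != (b.goals_against, b.goals_for):
            flagged.append((key, "inconsistent scores"))
    return flagged
```

Note that a game reported by only one party cannot be cross-checked at all, which is why the sketch flags it separately from a score mismatch.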

The NCAA's statistics database is the source for the data the NCAA uses to compute the RPI.  From this database, the NCAA's system extracts the RPI-relevant data and stores it separately within the RPI computation system.  If the statistics database is incomplete, or if it contains errors that the NCAA staff knows of and can verify, then the staff can locate and enter the missing data and can correct erroneous data within the RPI system.  It is important to know that the NCAA staff responsible for the RPI has the authority and ability to make corrections to the RPI data but does not have the authority to make changes to the statistics data.  A result of this is that one cannot fully verify that the RPI data are correct by checking the statistics data.  In some cases, the statistics data will be incorrect but the RPI data will be correct due to corrections the NCAA RPI staff has made.

The NCAA system and staff, with occasional outside help, do a good job of getting complete and correct data into the RPI system.

MY DATA GATHERING SYSTEM

In advance of each season, I use each school's on-line schedule page to prepare a master schedule for the season.  I then use the All White Kit College Women's Soccer Schedule Composite Schedule page and the NCAA Women’s Soccer Statistics as my primary data sources for game results.  The All White Kit website is a complete, accurate, and current system for Division I women's soccer.  If you use the "Information" drop down menu at the upper left of each of the AWK webpages, you also can connect to a variety of pages with useful information about teams, conferences, RPI ratings, and the recent history of Division I women's college soccer.  From several of the site's pages, you can connect to a detailed record page for each team or alternatively can use a link to go directly to the team's website schedule page.  The system also has other valuable features that avid Division I women's soccer fans will enjoy and that Division I women's soccer coaching staffs will find very useful.  This is a truly great resource for Division I women's college soccer fans and coaches.  I highly recommend it.

If needed, I also use the individual college athletics websites, which are the most accurate source of information.  Whenever I have doubt about a game's result, I confirm the result by going to the athletics websites of both colleges involved in the game.  I also occasionally go to the conference website for the college for which I am seeking game results.

As discussed on the "NCAA Tournament: Bracket Procedure" page, the NCAA publishes RPI rankings weekly during the regular season, starting after completion of the fifth weekend of competition.  It also publishes detailed RPI reports at the NCAA RPI Archive weekly on the same schedule.  As soon as the NCAA publishes its rankings, if my rankings differ from theirs, I compare my individual team win/loss/tie and other detailed data to theirs.  By doing this, I can identify inconsistencies between my data and the NCAA's.  Once I've identified inconsistencies, I check the college websites and determine whether I have data errors or the NCAA does.  If the errors are mine, I make corrections.  I also coordinate my corrections with the All White Kit system.  If the errors are the NCAA's, I notify the appropriate NCAA staffer indicating the documentation I have for the error.  The NCAA then can do its own independent check and make appropriate corrections to its RPI data (but, as described above, not necessarily to the game statistics data).  In past years, some of the data errors I've identified have been mine.  Those entering data into the NCAA's system, however, also make errors; and on occasion, the NCAA RPI staff does not discover the errors.  When I have identified NCAA data errors for their staff, they always have made appropriate corrections.
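The comparison step described above can be sketched in a few lines of Python.  This is only an illustration, not the tool I actually use.  The compare_records function and the dictionary layout (team name mapped to a wins, losses, ties tuple) are assumptions made for the example.

```python
def compare_records(mine, ncaa):
    """Return {team: (my_record, ncaa_record)} for every team whose records differ.

    Each argument maps a team name to a (wins, losses, ties) tuple.
    A team missing from one source is reported with None on that side.
    """
    differences = {}
    for team in sorted(set(mine) | set(ncaa)):
        my_rec = mine.get(team)
        ncaa_rec = ncaa.get(team)
        if my_rec != ncaa_rec:
            differences[team] = (my_rec, ncaa_rec)
    return differences
```

A comparison like this only says that the two sources disagree about a team.  Deciding which source is right still requires checking the schools' own websites game by game.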

By going through this process, I can be certain that the data I use are the same data the NCAA uses and are correct and complete.  My process also helps the AWK system be sure its data are correct and complete.

PROBLEMS THAT CAN OCCUR IN THE NCAA'S DATA GATHERING PROCESS

There are several ways data errors can creep into the NCAA's system:

1.  Someone must report game result data into the NCAA's Game by Game system.  If no one reports a game result, then the game doesn't get counted.  For every game, each of the two schools is responsible for reporting the game result, and the system red-flags any inconsistent game result reports.  The two schools, however, do not always both report their game results.  In addition, it appears that for some conferences, the schools rely on their conference staff to report conference tournament game data rather than doing it themselves.  For these games, only the conference staff is entering data for the opposing teams, so the dual entry safeguard is not in effect.

2.  The system, like any data gathering system, depends on the correct reporting of game results.  These include not only win-loss-tie data, but also home-away-neutral data since the RPI formula's bonus/penalty adjustment amounts are based on whether games are at home, away, or neutral sites.  In my experience, most of the NCAA's errors are in the game location data, in particular related to early season tournaments and end-of-season conference tournaments where games are at neutral sites.  For these games, the NCAA has given explicit instructions about reporting whether games actually are home, away, or neutral.  Unfortunately, however, neutral site tournament games often have teams designated as home/away even though the games are neutral for both opponents; and occasionally those reporting the data into the NCAA's system use the home/away designations when they should be using a neutral site designation.

3.  Another issue is the correct reporting for conference tournament games that go to shootouts.  Shootout games are ties for RPI purposes, not wins/losses.  Unless each person in the reporting chain understands this, it is possible for a shootout to be reported and entered as a win by one team and a loss by the other.  This occasionally happens.

4.  Assuming the person entering the data has the correct data to enter, the person still must enter it correctly.  This does not always occur.
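Several of the problems listed above lend themselves to simple automated checks.  The Python sketch below shows what such checks might look like.  It is a hypothetical illustration only: the GameRecord fields, the missing_games and audit functions, and the assumption that two teams meet at most once are all mine, and none of this reflects how the NCAA's system actually stores or validates data.

```python
from dataclasses import dataclass


@dataclass
class GameRecord:
    """One game as it appears in a results database (hypothetical fields)."""
    team_a: str
    team_b: str
    reported_site: str            # "home_away" or "neutral"
    reported_result: str          # "a_win", "b_win", or "tie"
    went_to_shootout: bool = False
    neutral_venue: bool = False   # verified separately, e.g. from school websites


def missing_games(scheduled, reported):
    """Return scheduled (team, team) pairs that have no reported result.

    Simplification: assumes two teams meet at most once in the schedule.
    """
    reported_pairs = {frozenset((g.team_a, g.team_b)) for g in reported}
    return [pair for pair in scheduled if frozenset(pair) not in reported_pairs]


def audit(game):
    """Return a list of likely reporting problems for one game."""
    problems = []
    # Shootout games are ties for RPI purposes, not wins or losses.
    if game.went_to_shootout and game.reported_result != "tie":
        problems.append("shootout game should be recorded as a tie")
    # Tournament brackets often label a "home" team even at a neutral site.
    if game.neutral_venue and game.reported_site != "neutral":
        problems.append("neutral-site game reported with home/away designation")
    return problems
```

Checks like these can point to records worth a second look, but they cannot catch a plain data entry mistake such as a mistyped score.  For that, comparing against the schools' own websites remains necessary.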