Sean Forman is the founder and operator of the site baseball-reference.com, one of the preeminent baseball statistics sites on the internet. He was kind enough to give me a few minutes of his time for an interview. Below is the transcript.
Charles Geier: So how labor-intensive is it to maintain a site as big as yours?
Sean Forman: Well, we’ve set it up so that there is very little… generally, if things go well, the site updates itself each day. So we’re not collecting any data; we buy feeds for the data, and it gets put into our database. Everything is pretty much automated. Every once in a while something will happen where we have to go in and fix it by hand, but the baseball site updates, and day-to-day operation is pretty much automated.
CG: Now the statistics feeds that you receive, do they come from some place like Elias (Sports Bureau)?
SF: No. There are companies out there that will sell you stats, you know, for a fee.
CG: OK. Now as far as statistics go, what do you think about newly emergent statistics such as VORP or Win Shares? Do you think they have changed the minds of people who are very home run, RBI, average-centric?
SF: No, in general no. I think those are largely niche stats. I think they are more telling than traditional stats, but I think it will be a long time before we see VORP in a newspaper or run in a television news broadcast. I think they have more use, but I don’t think they are going to supplant the traditional stats.
CG: Now is that from a fan standpoint, or from a franchise standpoint? Do you think there are GMs who have had their eyes opened by these stats?
SF: Oh, there are definitely GMs who pay attention to this. Every team has a number-cruncher somewhere on their staff, so I think the franchises themselves are becoming aware of some of these alternative numbers, and have probably developed some of their own that we just don’t know about. But from a fan perspective, obviously guys like Bill James and the Baseball Prospectus have their audience, but it’s still a drop in the bucket compared to the total audience for baseball.
CG: If you had to describe a typical visitor to baseball-reference.com, what type of people would you say are typically visiting your site? Do you think it’s just the “I want to check something out” type, or maybe more of an in-depth person, or a fantasy player?
SF: I think it’s all of the above. We get people who are looking for the box score of the first game they went to when they grew up, people who are settling arguments like “who was the Mets left-fielder in 1967?” and then we get the hard-core people who want to win an argument, like analyzing how someone did vs. righties and lefties so they can win an argument on a board. We also have lots of media who use the site as well. We have a wide cross-section. You can kind of look at us as a utility company. We want to be the first place everybody goes when they are looking up statistical information in sports.
CG: Now having used the site myself, I feel like I am able to get a better sense of how good a particular player was, and of his career overall, due to the amount of information available. Do you think that a site like yours allows for certain players to be reevaluated, when you can see everything laid out, you can use the “Compare” function, etc.?
SF: Yes, definitely. I think things like “Similar players”, looking at the number of times they were on the leaderboards, using the “Neutralize” feature, I think that definitely allows you to compare players more readily. I think comparing Ty Cobb to Ted Williams is going to be tough because of the eras they played in, but I think we do give you tools that allow those comparisons to take place.
CG: I was personally noticing that players who have a stellar reputation, sometimes when you compare them to someone who may not have quite as big a name, you can see that in an apples to apples comparison, a player might deserve more or less credit than they have historically been given.
SF: Yeah, I think context is incredibly important. I think people tend to underestimate the importance that context has in evaluating baseball stats. If you are going to compare Sandy Koufax and Pedro Martinez head-to-head, Koufax looks like the much better pitcher. Now if you take into account the context in which they played, e.g. run scoring, quality of offenses, etc., Martinez is, in my opinion, a far, far better pitcher than Koufax was. You can make the numbers tell you whatever you want. You get a lot of suspicious Hall of Fame arguments based on numbers. I think with the right balance, you can come up with a player’s true value to their team in pretty good detail.
CG: Do you think that there can be an overarching, unifying statistic? I know it’s not possible to boil a player’s value down to just one stat, but do you see, with things like sabermetrics, the emergence of more unifying statistics offensively, defensively and for pitching?
SF: To be honest, I think we’re already there. I think we are measuring 95% of what goes on in the field and what goes into winning ballgames. I tend to tire of people going back and forth trying to squeeze those last couple of percentages out of a game. I tend to think we’ve already got what we need to determine, in pretty good order, how the players should be ordered in terms of pitching and offense.
Defense is a little tougher; you know, the opportunity for a defender is different than the opportunity for a batter or a pitcher. I don’t know if we’ll ever get a great grasp on defense, just because it is so hard to measure. As far as pitching and hitting, if you look at the numbers we already have, you get 95-97% of what happened on the field.
CG: Given the fact that your profession is what it is, what are your feelings when people discuss the “intangibles of a player”? People will frequently mention someone like (Derek) Jeter, who, when you break him down along many of these advanced statistical metrics, comes out on the short end of the stick in terms of range factor, runs allowed, etc. Do you believe in the notion of the intangible element to a player, or the “clutch” factor?
SF: I believe that we are not measuring 100% of what happens. There is 3-5% of the game that we are not measuring, that is, by definition, intangible. Whether Jeter’s 5% intangibles are off the charts, we don’t know. I mean, the guy’s won a lot of ballgames. I think, often, we attempt to come up with an explanation for things that happen, when really there is no explanation other than random things happening. I think when you start talking about intangibles, you are speaking about things that by definition can’t be measured, so you can define them however you want.
CG: Are you a fantasy sports player yourself?
SF: I do play a little bit.
CG: And how do you do in your baseball leagues?
SF: I do OK. Before I started the site, I actually got into web design because I started doing a site called the Iowa Farm Report, and I created a means of tracking minor league prospects. When I was doing that I spent a huge amount of time tracking players and trying to predict what they were going to do, and I did fairly well in my fantasy leagues. Nowadays, I have a couple of Scoresheet teams and I am kind of middle of the pack. I am not willing to invest the time necessary to win those leagues anymore. I have a family now, and other things I enjoy doing, so I am a middling player. I certainly don’t dominate by any stretch.
CG: Now, as far as baseball-reference and the other reference sites go, what improvements would you like to make to make the site even better?
SF: I am in the process of redesigning baseball-reference to work on the same platform as the other reference sites. All of those sites were relaunched in the past year in order to allow some sorting features, and basically their platforms are all consistent and they have the same look. I am working on converting baseball-reference to the same format. I haven’t looked lately, but you are talking like 40-50 thousand lines of code that need to be essentially rewritten.
Beyond that, I think this next year we will probably look at trying to do some things with mobile. I got a smartphone myself within the past year, and I find myself wanting to look things up while I’m at the ballpark. So, I think something that works on a small format screen is definitely something we are going to look into. I also think we want to improve our subscription areas. More of the same and doing it better.
CG: I want to thank you for your time. If I have any follow-up questions, may I email you again?
SF: Sure, no problem.