As often as they're confused with stats, sabermetrics are not stats. Sabermetrics are a way to use numbers to try to represent baseball in hopes of getting a better understanding of what's actually transpiring in the game.
Nobody understands sabermetrics more than Bill James. After all, the baseball historian and longtime senior adviser to the Boston Red Sox coined the term more than 35 years ago.
With the 86th annual Major League Baseball All-Star Game being played tonight at Great American Ball Park in Cincinnati, expect a whole lot of sabermetric analysis from here on out as the season heads into its stretch run and the pennant races.
Here, James talks to Tech Times about everything sabermetrics, including its common misconceptions, how it cuts to the core of players' value, and why there's probably an entire layer of data analysis that we haven't tapped just yet. Play ball.
When did you coin the term sabermetrics?
It was a little before 1980. It might have been '78 or something.
What would you say is the biggest misconception of sabermetrics?
Sabermetrics are not stats. In fact, sabermetrics don't really have a damn thing to do with stats. It's really a misunderstanding. Sabermetrics are related to statistics in the same way that economics is related to economic statistics and the same way that physics is related to numbers about the universe. The need for the term is obviously an ill-fated effort to make that distinction.
If you're looking at the stats for small advantages, that's not sabermetrics. If you're using the numbers to try to represent the baseball universe and trying to understand what is actually going on in the game, that's what makes sabermetrics. You're trying to use the data to answer questions that are of interest to everybody, but which there's no other way to answer.
For example, what are the characteristics of winning teams? How important is speed relative to power? How important is infield defense relative to outfield defense? Sabermetrics is using the numbers and other methods to try to get a better understanding of those kinds of issues.
What I want people to understand is sabermetrics is trying to answer the same questions that everyone else is trying to answer. We're trying to learn the same things that a scout is trying to learn. We're trying to learn the same things that a general manager is trying to learn. We're trying to learn the same thing that a sports writer is trying to learn.
The numbers are a pathway toward trying to understand those same issues that everybody else is understanding and when people in my field forget that, they do so at their own peril. We're at our most effective when we're trying to understand what everyone else is trying to understand — is this player finished or is he just in a slump? Does this player have great potential or did he just happen to hit .350 last year because he was lucky?
Under stats, ESPN.com lists 'Sabermetric Batting Stats' and 'Sabermetric Pitching Stats.' Why does the association with stats stick to sabermetrics so much?
You have to understand, and in our field we have to understand, that people see the world the way they see it. Our field, like any other field of knowledge, produces a great many numbers, and people looking at us from the outside tend to see the numbers that we produce. I don't object, nor am I offended by or hurt by or annoyed by people not in the field referring to stats created by our industry as being sabermetric stats. I understand why they do that. It's kind of inevitable, but the confusion is if you think those kinds of stats are analysis in and of themselves.
How do sabermetrics give you a sharper insight than merely just stats?
Let's take the defensive spectrum. The defensive spectrum is an organized alignment of the defensive positions with the shortstop on one end and first base on the other. The shortstops are fast, they tend to be younger, they need to have a good throwing arm. First basemen and left fielders tend to be much slower, they tend to be older, they don't need to throw and because there's less pressure put on them as defensive players, they're expected to hit much more.
The defensive spectrum is actually a universal concept in sports and probably relates to things other than sports. In any sport — maybe not in football because in football the positions are absolutely fixed and a player doesn't come into the league as a quarterback and exit the league as a linebacker — but in soccer or basketball, where there is some interaction between the positions, I don't know what the spectrum is in those sports, but I certainly know the same principles have got to apply.
The center in basketball is the first base ... it's a position in which speed is less emphasized and strength is more emphasized. When you're trying to sort out what is the relative value of speed versus power, it becomes useful to have a sense of what the defensive spectrum is and where a player is on that spectrum. It's a useful part of the analysis.
On offense in baseball, power is vastly more valuable than speed, but the value of speed in baseball is predominantly in the field. There's a reason why a first baseman doesn't play shortstop. He's not quick enough and the cost of having a slow shortstop would be so overwhelming that nobody would really do it. So, it's a way of seeing where the value of this skill fits into the overall composition of the game, if that makes any sense.
Absolutely. There are some sabermetrics that are flat-out complicated. Do you get that often?
Yeah, a lot of it now is over my head, but think about what that means. You and I are fairly sophisticated tech guys, right? And if it's going over our head or it's hard for us, it's just not permeating the culture at all. The things that are done that have value are things which can be regarded as output of the field. Nobody hires a meteorologist because they want to know about meteorology. They want to know about the weather. In order for us to be useful, we've got to be the same way. We have to produce intelligible things rather than complicated things that people don't really understand.
Has a portion of sabermetrics gotten out of control in the sense that it's too tangled?
I wouldn't want to criticize what anyone else is doing if I don't understand it. If you look at what makes an airplane fly, I'm sure I don't understand 99 percent of it, but I still benefit from it when I need to fly. If people produce things that are so complicated that we don't understand, it doesn't necessarily mean that they're not tremendously valuable. It just means that they're not tremendously valuable yet. They might be valuable once they get incorporated into a workable scheme of the game.
Comparing when you started as senior adviser to the Red Sox in October 2002 with today's game, can you give us an idea of how widespread the use of sabermetrics is in the majors right now?
In 2002, the Red Sox were sort of at the leading edge of analytical thinking within the game. If we were at the same point now that we were at in 2002, we might be 29th among the 30 teams. It's not in any way comparable where we're at now as opposed to then.
Are there still teams that perhaps don't put as much emphasis on these numbers as others?
Certainly there are some, yes, that emphasize that type of analysis much more than others. But even those that are most behind the curve still embrace a lot of things that are produced by sabermetrics, without realizing where they come from.
The [infield/outfield] shifts come out of data analysis, they come out of our field. Even if you don't believe in that stuff, everybody shifts right now. It's not like you don't believe in it so much that you don't employ the same defense as everybody else does.
The pitch framing has become a big thing in the past three or four years and we all understand now that this is an element of the game that has been overlooked for a long time. And it's not that it wasn't there or that there was zero awareness of it. But since people in our field have documented that it's important, it's harder to ignore.
So, I don't think you'd get a team that said, 'We don't believe in that pitch-framing stuff.' They might not believe in pitch-framing data, but they are still not going to ignore a fairly well-proven stat that a catcher can help a pitcher by the way he receives the ball. Some people are ahead and some people are behind, but there is nobody who isn't part of the parade at some level.
Is there any such thing as a tell-all sabermetric when it comes to pitching and batting?
People have been developing tell-all metrics for a long time. What I always look at is what are they missing? If there's a tell-all metric that isn't missing anything, then we're done, right? They don't need us any more. I always look at those metrics like, 'What are we missing?' There's an enormous amount that's real and central to the game, but that we can't measure.
When we look at 'Fielding Independent Pitching' and 'Late-Inning Pressure Situations,' they're two very different sabermetrics, but they, more than a lot of others, seem to really cut to the core of the makeup of a player.
What we're always trying to do is see through the illusions created by the numbers and see what is underneath and real and the fielding independent pitching numbers are quite helpful in that respect because it's a systemized, organized effort to filter out the things that are in the pitcher's record which aren't real. They're not related to his skill, it's just something that happened. That's tremendously helpful and tremendously significant.
If you go back to the oldest way of looking at a pitcher — 1975 — pitchers were evaluated by win-loss records. You'd have a pitcher sometimes who might have an ERA of 4.80, but they scored a ton of runs for him and he finished 17-9. People actually thought that he was a great pitcher because he had this ability to pitch well enough and win.
In the modern world, we know that it's nonsense and they just scored a lot of runs for him. Even the dumbest guy in baseball knows that win-loss records aren't that reliable because the offense doesn't even out for people. That's a circumstance-dependent record. ERA is a circumstance-dependent record. But even if you filter out the illusions in ERA and the illusions in run support, some guys are just lucky. Fielding independent pitching stats are an effort to filter that out and to the extent that they're successful, it's tremendously useful to do that.
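For readers who want to see that filtering idea in numbers, here is a minimal sketch of the commonly published Fielding Independent Pitching formula, which counts only home runs, walks, hit batters and strikeouts. It is not something James spells out in this interview. The function name `fip` is hypothetical, and the default constant of 3.10 is an assumed placeholder: in practice the constant is set each season so that league-average FIP matches league-average ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching, using the commonly published formula.

    hr, bb, hbp, k: home runs, walks, hit batters and strikeouts allowed.
    ip: innings pitched as a true decimal (200 1/3 innings is 200.333...,
        not the box-score shorthand 200.1).
    constant: assumed placeholder; normally recomputed per league season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Only outcomes the fielders never touch enter the calculation.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Made-up season line: 20 HR, 50 BB, 5 HBP, 200 K over 200 innings.
# (13*20 + 3*55 - 2*200) / 200 + 3.10, which comes to about 3.2
example = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
```

Run support, balls in play and defense never appear in the inputs, which is how the formula tries to strip circumstance out of a pitcher's record.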
How does pitch count affect sabermetrics for pitchers?
There are people who will say sabermetrics are nonsense, but they don't start their pitchers and let them throw 150 pitches. Hard data drives out soft data and soft opinions. It's unfortunate, but it's a reality of life. If it can't be measured, people don't really pay that much attention to it. Since about 1980, it has to do with sabermetrics, but it more has to do with the computer age - we reached a point where we could keep track of the number of pitches that everybody threw.
You've seen that there were many pitchers that were being worked to death at a very young age. The arrival of that information swept the game and people focused on it and used it. But I wouldn't say that the use of the pitch count has significantly reduced injuries or significantly reduced injuries to young pitchers. I don't know that there's solid proof that it's a positive contribution.
Is there another layer of uncharted territory that sabermetrics hasn't gotten to, but that could be on the horizon?
My view on the world is we have an ocean of ignorance and a small island of knowledge. You can convert areas of ignorance into areas of knowledge forever and it doesn't have that much impact because you still don't understand the world. None of us do. The world is a million times more complicated than any of us understand. We haven't done anything yet to compare with potentially what we could do.