Anyone can look at a stat page, a box score, or a website and easily find the elite hitters of today or days past: the ones who always hit .320, or always hit 40 home runs, or always slug .550. What is harder to figure out is why some players can put up a line of .283/.388/.557 with 35 HR and 101 RBI one year, and then .235/.396/.463 with 23 HR and 58 RBI the next (thank you, 2005-2006 Morgan Ensberg, for the example). Or why some players will be phenomenal for one season, mediocre for the next two, great again, and then regress back to more pedestrian numbers.
In the past four or five years, Batting Average on Balls in Play (BABIP) has been looked at as a key indicator of consistent (or inconsistent) offensive performance. The definition is just what it sounds like: it is the batting average on every ball a batter puts into “play” on the field. So home runs and strikeouts are not counted in this stat. The formula is (H-HR)/(AB-K-HR). Foul balls that are “play”able are also counted, so a foul grounder is not included, but a pop foul caught by the third baseman is. A BABIP of about .288 to .290 is considered average according to most sources I have found. The Hardball Times and Baseball Prospectus both have good definitions and explanations, or you can just click the Stats Glossary link above.
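As a minimal sketch, here is that formula in Python. The function name and the sample stat line are made up for illustration. Many sources also add sacrifice flies to the denominator; this version leaves them out to match the formula as written above.

```python
def babip(hits, home_runs, at_bats, strikeouts):
    """Batting average on balls in play: (H - HR) / (AB - K - HR)."""
    balls_in_play = at_bats - strikeouts - home_runs
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, not a real player: 150 H, 20 HR, 500 AB, 100 K.
example = babip(hits=150, home_runs=20, at_bats=500, strikeouts=100)
print(f"{example:.3f}")  # 130 / 380, prints 0.342
```

That hypothetical hitter would sit a little more than 50 points above the .288 to .290 average.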
The thing I like about BABIP is that it does not depend on any statistic other than making contact with the ball. It can be looked at before you measure anything else about the at bat. Everything from RBIs to doubles to runs scored to stolen bases happens after BABIP. It’s like saying, “Let’s see how good you are at producing by putting the ball in play, then we’ll talk.”
Anyway, here are the top 20 in BABIP for the National League in 2006 taken from The Hardball Times:
Looking closer at this top 20 list from ‘06, I see eleven who are producing at about the level you would expect this year (Cabrera, Holliday, Hawpe, Utley, Ramirez, Gonzalez, Bay, Lo Duca, Taveras, Freel, Tracy), one who is extremely over-producing (Jenkins), and one who is on the DL (Roberts). That leaves seven who are legitimately struggling or having an “off-year,” at least by their past accomplishments. A few of those can be attributed to minor injury, but I want to examine that group.
Freddy Sanchez - 2006 numbers - .344/.378/.473:
Sanchez was injured to start this year, causing him to miss a few games the first week. But since then, what has happened to last year’s batting champ? His numbers this year show only a .262/.298/.311 line, plus a BABIP of only .308. This case is actually one of the easiest to decipher. For a line drive hitter who never hits home runs, all of his hits are in play. And when your BABIP is 80 to 90 points above league average, a lot of regression should be expected the next year.
Ryan Howard - .313/.425/.659:
You don’t see many big masher types on the top 20 list for BABIP. In fact, Holliday is about the closest you can come to a similar power hitter. The reason? So many of their hits are home runs, and they hit so many fly balls, that their line drive ratios are smaller, leading to fewer hits when they don’t hit home runs. For a 50-home-run hitter to have a BABIP of .363 is rare (Howard is the only one in at least four years), and between his injury and a BABIP of .264, Howard is coming back down to earth in 2007.
David Wright - .311/.381/.531:
Wright is an interesting case study, because in 2006 his BABIP was an off-the-charts .350, but his line drive percentage (LD%) was 19.5%, just above average. In 2005, Wright’s BABIP was still a great .343, but his LD% was 25.4%, third in the majors. This year, Wright has settled into a merely really good .318 BABIP and 22.7% LD%. Without a phenomenal number in either one of those categories this year, he has settled into .262/.358/.404 numbers. Is it possible we are seeing the real David Wright, or at least something closer to it?
Jamey Carroll - .300/.377/.404:
This is your prototypical example of an average to below-average player having a career year and then falling back down again. The cause? It has to be his high BABIP of .342 in ‘06. In 2005, his line was .251/.333/.284 with a BABIP of .307, and so far in 2007, he is looking at .183/.309/.207 with a BABIP of .231.
Garrett Atkins - .329/.409/.556:
I don’t know what this means, but there are five Rockies in the top 30 for BABIP in 2006, and they would hold three of the top eight spots this year, but Taveras is a few at bats shy of qualifying. Dropping off the top of the list this year is Atkins. Remember, 2006 was Atkins’ first full season in MLB, so basing his talent level on that one season would be foolish. So far this year, Atkins is at .243/.329/.360. That will undoubtedly improve, as will his .279 BABIP, partly due to the Coors factor, but let’s temper our expectations of him until we see some more playing time.
Rafael Furcal - .300/.369/.445:
2006 was a career year for Furcal, with career highs in average, slugging pct., hits, HR, RBI, walks, and Runs Created. This can be attributed to two things: severe spikes in BABIP and average with RISP, both well above career norms. BABIP for Furcal last year was .335, and average with runners in scoring position was an outstanding .344. In 2007, while also fighting the injury bug, Furcal has only managed .254/.328/.314 with a .286 BABIP. It will be interesting to see how he performs once truly healthy.
Ryan Zimmerman - .287/.351/.471:
While a lot of RBI opportunities may be missing in 2007 with Soriano and Nick Johnson gone from the Nats’ lineup, BABIP reflects only what the individual does. So when a brand new rookie tears up the league with a .329 BABIP, you have to take that with a grain of salt, whether it’s Soriano and Johnson or Ronnie Belliard and Dmitri Young in the lineup. With pitchers undoubtedly being careful with him, Zimmerman has only put up .250/.301/.355 numbers so far in ‘07 with a .284 BABIP, below average for the league.
Of the eleven mentioned above as having years consistent with what you might expect, only Lo Duca at .293 has a BABIP that has dropped below .310 (20 points above average) this year. If they have not already done so, the eleven players on my list are developing into consistent hitters in terms of BABIP. Even the younger players of the eleven, such as Ramirez or Taveras, are showing marked improvement in the category. We may be on our way to seeing a trend that consistent production in terms of BABIP leads to overall, year-to-year consistent offensive production. Fluctuations can lead to severe up- or downswings by a hitter no matter what they have done in the past.
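The BABIP swings quoted in the player notes above can be lined up side by side. This is only a sketch using the figures already given in this post; Sanchez and Atkins are left out because their exact 2006 BABIP is not quoted, and the helper name is mine.

```python
# (2006 BABIP, 2007 BABIP as of May 13) as quoted in the player notes.
babip_by_year = {
    "Howard": (0.363, 0.264),
    "Wright": (0.350, 0.318),
    "Carroll": (0.342, 0.231),
    "Furcal": (0.335, 0.286),
    "Zimmerman": (0.329, 0.284),
}


def drop_in_points(before, after):
    """Change in BABIP expressed in 'points' (thousandths)."""
    return round((before - after) * 1000)


drops = {name: drop_in_points(b, a) for name, (b, a) in babip_by_year.items()}

# Largest drop first.
for name, points in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} down {points} points")
```

Carroll (111 points) and Howard (99) have fallen the furthest, while Wright (32) has given back the least.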
Another stat I mentioned throughout the descriptions of the seven struggling players was Line Drive Percentage (LD%). Essentially this is the percentage of your balls in play that are line drives. Typically, 75% of line drives end up as hits. When you start comparing the seven “strugglers’” BABIP and LD%, you get some interesting results. Here is the same list as before, but now with their LD% for 2006:
Except for Jason Bay, all of these players in 2006 were right around or above the traditional average LD%. (In a paper entitled The Effect of Batted Ball Types on Balls in Play, JC Bradbury found that the mean LD% was 17.8% in 2004, but in this forum, someone named Krishna108 claims to have calculated a mean of 20.9% for 2002-2006. THT allows you to sort through LD% for the past four years here.) That in itself is anticlimactic, but looking at just the seven “strugglers” from this year tells a unique story.
Not only are all seven of them in the top 20 in the NL for BABIP for 2006, but six of the seven were in the top 26 in NL LD% for the same season (Wright checked in at #40).
So if you are looking at the definition of seven lucky seasons, this might fit the bill - being in the top 26 in both BABIP and LD% for any given season. Regression to the mean is expected in both statistics, and thus one can expect a drop in production, whether it be slight or massive.
Who are some candidates to watch as we go forward this year and look into next? Watch for regression from these possible candidates (with their 2007 BABIP so far): Derrek Lee (.469), Aaron Rowand (.385), Ryan Theriot (.371), and Russell Martin (.368).
______________________
All stats are as of May 13, 2007.


Almost immediately after I posted this, I had an uneasy feeling. I reported that the seven strugglers from this year also appeared in the top 26 of 2006’s LD%. And they did. I also said that being part of both of those lists suggested good luck and that they were due for a crash.
But what I left out was that eight of the remaining 13 from the BABIP list, none of whom are struggling this year, also appeared in that same top 26 (Cabrera, Gonzalez, Jenkins, Hawpe, Lo Duca, Tracy, Freel, and Holliday). So I felt like I fudged the numbers a little bit, and the illustration of players who perform over their heads in both categories being doomed for failure ends up being a poor one.
So, I went back and looked at the strugglers and saw that they, on average, have played in 452.14 games or 2.8 seasons. The other eight who are doing well have played in an average of 576.13 games or 3.6 seasons, almost a full season more than the others. So I thought perhaps experience has something to do with it.
But then something else hit me. The differences have to be a result of their LD% for this year.
You see, besides David Wright, not only have the strugglers failed to live up to their previous gaudy BABIP numbers, but they have failed to match their 2006 AND career averages in line drive percentage. On the other hand, the eight who are performing at expected levels are right in line with their career numbers, half of them even putting up better numbers than in the past.
Fewer line drives lead to fewer balls in play that have a high percentage of being hits (percentages of hits from GB and FB are very low), which leads to lower numbers in all rate and counting stats, which leads to a struggling year.
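Here is a rough back-of-the-envelope version of that chain, using the figure mentioned earlier that about 75% of line drives end up as hits. Treating LD% as a share of balls in play is a simplifying assumption, and the function name is mine. The LD% values are David Wright's as quoted above.

```python
LINE_DRIVE_HIT_RATE = 0.75  # "typically, 75% of line drives end up as hits"


def babip_points_from_line_drives(ld_pct):
    """BABIP points (thousandths) contributed by line drives alone.

    Assumes ld_pct is the percentage of balls in play that are line drives.
    """
    return ld_pct / 100 * LINE_DRIVE_HIT_RATE * 1000


wright_ld_pct = {2005: 25.4, 2006: 19.5, 2007: 22.7}

for year, ld_pct in wright_ld_pct.items():
    points = babip_points_from_line_drives(ld_pct)
    print(f"{year}: LD% {ld_pct} -> about {points:.0f} BABIP points from line drives")
```

Under that assumption, each percentage point of LD% carries 7.5 points of BABIP from line drives. The net effect of losing a line drive is smaller, because the ground ball or fly ball that replaces it still falls for a hit some of the time.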
So, all of this to say something that should be very simple. When evaluating players, whether it be for fantasy or fun or your future role as a GM, look at current rates of BABIP and LD% and compare them to a player’s career numbers. Do they seem way too high? Well, then they are due for a little bit of a fall. Do they seem too low based on past production? Maybe a hot streak is in the near future. The pattern I am starting to see is that you can’t have one without the other.
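That comparison can be sketched in a few lines. The 30-point BABIP margin and 3-point LD% margin below are arbitrary placeholders of mine, not established cutoffs, and the function name is made up. The sample inputs are Derrek Lee's figures as of May 13, 2007: a career BABIP of .325 and LD% of 22.1%, against a 2007 BABIP of .469 and LD% of 27%.

```python
def regression_signal(current_babip, career_babip, current_ld, career_ld,
                      babip_margin=0.030, ld_margin=3.0):
    """Compare current BABIP and LD% to career norms.

    Both margins are illustrative guesses, not established cutoffs.
    """
    babip_gap = current_babip - career_babip
    ld_gap = current_ld - career_ld
    if babip_gap > babip_margin and ld_gap > ld_margin:
        return "running hot: expect some regression"
    if babip_gap < -babip_margin and ld_gap < -ld_margin:
        return "running cold: a rebound is plausible"
    if abs(babip_gap) > babip_margin or abs(ld_gap) > ld_margin:
        return "mixed signals: only one stat is out of line"
    return "roughly in line with career norms"


# Derrek Lee, 2007 so far versus career.
print(regression_signal(0.469, 0.325, 27.0, 22.1))
```

Requiring both stats to be out of line before calling a hitter hot or cold mirrors the point that you can't have one without the other.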
This is a very interesting article to me. I’ve been wondering what happened to Garrett Atkins this year and this explains it a little.
I am curious to know what Derrek Lee’s numbers have been over the past four years (keeping in mind the wrist injury from last year), just to see what to really expect, because I feel like he’s been consistent in almost all production since his days in Florida.
I do agree with Theriot and Rowand falling off. Theriot because I really don’t think he’s this good and there have been no indicators of him being this good. Rowand because he’s not as young as he used to be and I just expect his production to continue to fall.
The one I’m not sure of is Martin. His numbers last year and the strong start this year indicate that this might not be a fluke. I do think it’s too early in his career and we don’t have a big enough sample size to tell, but I expect good things from him for quite a few years.
Derrek Lee’s season-by-season BABIP from 1998 to 2007: .281, .280, .325, .331, .333, .305, .306, .349 (2005), .333 (2006, injury), .469. His career BABIP is .325, which is great. The reason I point out Lee is that .469 is almost 200 points higher than league average! So some regression is bound to come. But think of it this way: it may come in the form of more homers, but with a lower batting average and fewer doubles (17 doubles to 2 HR is quite the anomaly so far this year).
His career LD% is 22.1% and he is at 27% so far in 2007, three points higher than he has ever had.
You’re right about Russell Martin. He just does not have a big enough sample size to really judge yet; barely 600 career plate appearances.
I agree that Lee’s numbers were starting out above average this year, but I wondered just how far above his average they were. It’s a lot, but even if they fall to his average, that’s not too shabby. Thanks for that.
Wonderful post, r-dawg….the logic and the stats are nice…..hmmm, you’re almost getting me interested in fantasy baseball….
Another interesting way to look at this is to reverse it and look at pitchers. How many hits do they give up on balls put into play? There seems to be no consistent rate of hits per ball in play for pitchers, which somewhat explains the inconsistency in pitching. In baseball the rule is that everything evens out over a season. Someone makes a great play on a great hit of yours, and later on in the season an outfielder will misjudge a routine play and you will get that hit back. What it really seems like is that everything evens out over a career and not a season. Regression to the mean is common among hitters and pitchers. So the pitchers who strike out the most batters, walk the fewest batters, and give up the fewest home runs will be the best pitchers year in and year out. Everything else is more about luck.