No matter how many new pitching metrics sabermetricians invent, it seems increasingly difficult to supplant WHIP as one of the (if not THE) premier pitching metrics. Since 1979, WHIP has withstood the test of time, and front office personnel continue to this day to use Okrent’s fantasy baseball-designed statistic. As saberists, we think, contemplate, and collaborate to create statistics more representative of a pitcher’s abilities. However, none seems to supplant the simple WHIP.
Our community has witnessed excellence with the creation of elegant metrics like xFIP and tERA, which strive to eliminate external factors from a pitcher’s performance. The biggest hurdle we try to overcome lies in the eyes of the general fan. It is unfortunate that ESPN, MLB.tv, FOX, and regional networks haven’t picked up on these and other greats like wOBA and RE24.
In lieu of trying to create an entirely new metric, we’ll use Okrent’s WHIP as the framework for our evolutionary pWHIP. This will be a bit of a process, so hang tight.
We’ll start by building off of the philosophy that we ought to strive to measure players’ performances and not their production. To summarize, I’ve identified the distinction here:
Performance – the execution of an action. In baseball, a hitter’s performance is the sum of all actions within his control. It’s how he performs.
Production – something produced; output. That is, a hitter’s production is the outcome of his performance. It’s what he produces.
More specifically, a hitter’s performance should be measured by what he can control within the batter’s box (basically, how hard he hits the ball and at what launch angle). His production is the outcome of the play (i.e. where the ball and he end up). Many times, the production does not appropriately define the performance (example 1 & example 2). Therefore, I theorize that we ought to measure the performance and ignore the production. A quick note: later, this will get a bit complicated as we need to formulate a baseline to measure performance. In order to do so, we’ll use the large sample of production to define performances. Let’s continue.
The benefits of WHIP:
- Widely known/used/adopted
The drawbacks of WHIP:
- Measures a pitcher’s performance based off of a hitter’s production
- For balls in play, uses a binary form of measurement (either a hit or an out)
- Defense dependent
- Doesn’t account for errors
We’ve chosen WHIP as the framework for explaining our performance vs. production philosophy because of its primary benefit: nearly everyone already knows it. In the development that follows, we will address each of the four drawbacks in turn.
Addressing Drawback #1 – Measure Performance Not Production
As most know, WHIP is defined as: [BB + H] / IP. A pitcher can most certainly control walks (does he place the ball in the zone or not?), so let’s turn to hits. Let’s ask ourselves, “what are hits?” Hits are the outcome of a hitter’s performance.
Wait a minute… we’re currently measuring a pitcher’s performance based off the outcome of a hitter’s performance? I submit that we ought to measure a pitcher’s performance based off the inverse of the hitter’s performance. After all, a hitter strives for “timing” as a pitcher attempts to “disrupt his timing.” So, how shall we measure hitters’ performances? We’ll take a look at which hitter performances typically lead to which outcomes. That is, let’s inventory the exit velocities and launch angles to determine which are most likely to lead to outs and which lead to hits (using April 2009 MLB HITf/x data).
First, let’s take a look at the location of outs (red).
To help understand the data, it may help to refer to the following.
And now, hits in green.
And again for illustration.
Let’s now have some fun. We’ll overlay hits on top of outs and fit a curve to the leading edge of the data set (the mathematics behind the curve will be explained at a later date). To steal a term from economics, we’ll call this the “Efficient Frontier” of hitting.
In baseball terms, a hitter strives to display “timing” during his performance by “barreling” the ball. If successful, he will obtain optimal exit velocity and launch angle; and performing as such will inherently lead to the production of more hits.
To summarize where we are, the plot above shows how similar performances can lead to different outcomes. Therefore, we ought to measure hitters’ and pitchers’ abilities based off of how they perform and not what they produce. That is, we should observe the exit velocity and launch angle to determine a hitter’s performance, and in turn, a pitcher’s performance. Many NCAA programs already treat this as a staple, recording it on their QAB (quality at bat) chart as a “hard hit ball.” MLB organizations have the ability to measure it using HITf/x.
Now we understand that we must measure the performance, but what happens when two similar performances lead to different outcomes? This leads us to the second drawback of WHIP.
Addressing Drawback #2 – Measure Performance in Scalars not Binary
(Note: this is the identical approach I took to measuring a hitter’s pOBA.) To move from a binary measure to a sliding scale, I’ve broken down the exit velocities (0 – 120 mph) into 3 mph increments. Similarly, I’ve divided the launch angle (-70 to +90 degrees) into 4 degree segments. This gives us a 40×40 matrix or 1,600 “bins” (many of which are empty) to define any given performance. As similar performances lie within any given “bin,” we’ll average their outcomes for future reference when measuring performance. With fewer than 5,000 AB’s analyzed, the resolution is marginal with multiple outliers. Given Sportvision’s larger data set, we could provide a significantly more impressive model to formulate pWHIP. The plot below displays mean WHIP within each bin. (Black bins are almost certainly hits, fading to lighter grey for less likely hits.) Focus on the core of the “Efficient Frontier” as the hitter’s goal of high performance; and so, a pitcher will strive to achieve high performance away from this core.
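The binning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the plots: the function names `bin_index` and `build_lut`, the `(exit_velocity, launch_angle, is_hit)` tuple layout, and the choice of 3 mph by 4 degree cells (the sizes that yield a 40×40 grid over 0 to 120 mph and -70 to +90 degrees) are my own.

```python
from collections import defaultdict

# Assumed grid: 3 mph x 4 degree cells, which gives 40 x 40 bins.
EV_MIN, EV_MAX, EV_STEP = 0.0, 120.0, 3.0    # exit velocity, mph
LA_MIN, LA_MAX, LA_STEP = -70.0, 90.0, 4.0   # launch angle, degrees
N_EV = int((EV_MAX - EV_MIN) / EV_STEP)      # number of velocity bins
N_LA = int((LA_MAX - LA_MIN) / LA_STEP)      # number of angle bins


def bin_index(exit_velocity, launch_angle):
    """Map one batted ball to its (velocity_bin, angle_bin) cell."""
    # Clamp to the grid so readings at or beyond the edges land in an edge bin.
    ev = min(max(exit_velocity, EV_MIN), EV_MAX)
    la = min(max(launch_angle, LA_MIN), LA_MAX)
    i = min(int((ev - EV_MIN) // EV_STEP), N_EV - 1)
    j = min(int((la - LA_MIN) // LA_STEP), N_LA - 1)
    return i, j


def build_lut(batted_balls):
    """Average the outcomes in each bin.

    batted_balls: iterable of (exit_velocity, launch_angle, is_hit) tuples.
    Returns {bin: fraction of balls in that bin that went for hits}.
    Bins with no observations are simply absent from the result.
    """
    totals = defaultdict(lambda: [0, 0])  # bin -> [hits, balls in play]
    for exit_velocity, launch_angle, is_hit in batted_balls:
        cell = totals[bin_index(exit_velocity, launch_angle)]
        cell[0] += 1 if is_hit else 0
        cell[1] += 1
    return {b: hits / n for b, (hits, n) in totals.items()}


# Hypothetical sample: two similar line drives, one caught and one not.
sample = [(100.0, 10.0, True), (101.0, 11.0, False), (62.0, -30.0, False)]
lut = build_lut(sample)
```

With a sample this small the averages are meaningless; the point is only that two similar performances share a bin and therefore share one averaged value.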
We have now succeeded in eliminating the judgment of a hitter’s (and thus a pitcher’s) performance by binary means. It is now a sliding scale dependent upon exit velocity and launch angle. To explain using an example, a bloop single off a pitcher is no longer treated as a hit (a “1”). It is treated as 0.10 of a hit (assuming 1 in every 10 bloopers falls in for a hit). We’ll call this HLV (hitting look-up value). Similarly, a crushing line drive that gets snared by a 3B is no longer an out (or “0”). The HLV here is 0.80 (assuming 8 of every 10 similarly crushed balls go for hits). Thus, we have a table of 1,600 HLV’s where each HLV bin corresponds to a specific 3 mph exit velocity and 4 degree launch angle. Temporarily, our new metric stands as:
pWHIP = [BB + HLV] / IP
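To see how the two formulas diverge, here is a minimal Python sketch. It assumes HLV in the formula means the sum of the look-up values over every ball put in play, and it expresses innings as outs recorded (IP = outs / 3). The function names and the stat line are hypothetical; only the 0.10 bloop and the 0.80 line drive come from the example above.

```python
def whip(walks, hits, outs):
    """Traditional WHIP: (BB + H) / IP, with IP expressed as outs / 3."""
    return (walks + hits) / (outs / 3)


def pwhip(walks, hlvs, outs):
    """Interim pWHIP: (BB + sum of HLVs for all balls in play) / IP."""
    return (walks + sum(hlvs)) / (outs / 3)


# Hypothetical three-inning outing: 1 walk, 5 strikeouts, 5 balls in play.
# Balls in play: a bloop single (HLV 0.10), a snared line drive (HLV 0.80),
# and three routine outs with made-up HLVs.
hlvs = [0.10, 0.80, 0.05, 0.20, 0.15]

traditional = whip(walks=1, hits=1, outs=9)        # only the bloop counts
performance = pwhip(walks=1, hlvs=hlvs, outs=9)    # every ball in play counts
```

In this made-up outing the pitcher’s pWHIP comes out higher than his WHIP, because the snared line drive counts against him far more than the bloop single does.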
Addressing Drawback #3 – Defense Independent
This methodology has now removed a pitcher’s defense and substituted it with the larger sample of the mean of all MLB defenders. Therefore, ornate adjustments are not necessary. This metric measures a pitcher’s performance regardless of his own defense!
Addressing Drawback #4 – Accounting for ROE’s
One small hiccup in the simplicity of WHIP is that it doesn’t account for ROE’s. That is, if a hitter reaches base on an error, a pitcher is not credited with an out. In essence, he “did his job,” but factors external to his control forced him to move forward in the inning without receiving credit for the out. To utilize our model to its full potential, we must account for errors to measure a pitcher’s abilities. After all, errors do happen. A pitcher who strikes out fewer hitters must rely on his defense more.
This is a bit tricky; I hope to provide clarity here. In generating our large data set LUT (look-up table), we’ll include ROE’s by hitters as “hits.” That is, all we truly care about (again, when creating the large sample set) is whether an exit velocity/launch angle combination led to a hitter reaching base or not. This will allow us to account for all balls put in-play while associating each 3 mph x 4 degree bin with an HLV. Because we have now included ROE’s in our metric, we must expand pWHIP slightly.
pWHIP = [BB + HLV] / [IP + ROE/3]
where HLV is the sum of the associated values from our exit velocity/launch angle LUT over every ball put in play.
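As a rough Python sketch of the expanded formula (the function name and the numbers are hypothetical, and HLV is again assumed to be a sum over all balls in play): since IP is outs divided by three, adding ROE/3 to the denominator amounts to crediting the pitcher with one extra out for each hitter who reached on an error.

```python
def pwhip_with_roe(walks, hlvs, outs, roe):
    """pWHIP = (BB + sum of HLVs) / (IP + ROE / 3), with IP = outs / 3.

    hlvs: look-up values for every ball in play, including those that
          ended in an error.
    roe:  number of hitters who reached on an error.
    """
    innings = outs / 3
    denominator = innings + roe / 3   # same as (outs + roe) / 3
    if denominator <= 0:
        raise ValueError("need at least one out or ROE")
    return (walks + sum(hlvs)) / denominator


# Hypothetical line: 6 innings (18 outs), 2 walks, 1 ROE, three balls in play
# with made-up HLVs (a real outing would list every ball in play).
value = pwhip_with_roe(walks=2, hlvs=[0.10, 0.80, 0.30], outs=18, roe=1)
```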
WHIP is an outstanding metric; however, it falls short by discretely measuring individual performances by their individual outcomes. A broken-bat bloop single or a 7-hop ground ball through the left side is an unfortunate outcome of an otherwise high pitching performance. Measure a performance by the performance, not the outcome. This should be the new standard.
We could expand into the third dimension and include spray angle, but that discussion is for another day. Clearly hitters have distinct sprays and, thus, can control this angle to a certain extent; however, I contend a hitter cannot control his spray angle enough to create hits and avoid outs. Next week, I’ll compare WHIP & pWHIP for all qualified pitchers in the April 2009 data set.