pWHIP & the Efficient Frontier: Using HITf/x to improve baseball’s greatest pitching metric

No matter how many new pitching metrics sabermetricians invent, it seems increasingly difficult to supplant WHIP as one of the (if not THE) premier pitching metrics.  Since 1979, WHIP has withstood the test of time, and front office personnel continue to this day to use Okrent’s statistic, which was originally designed for fantasy baseball.  As saberists, we think, contemplate, and collaborate to create statistics more representative of a pitcher’s abilities.  However, none has managed to displace the simple WHIP.

Our community has witnessed excellence with the creation of elegant metrics like xFIP and tERA, which strive to eliminate external factors from a pitcher’s performance.  The biggest hurdle we try to overcome lies in the eyes of the general fan.  It is unfortunate that ESPN, MLB.tv, FOX, and regional networks haven’t picked up on these and other greats like wOBA and RE24.

In lieu of trying to create an entirely new metric, we’ll use Okrent’s WHIP as the framework for our evolutionary pWHIP.  This will be a bit of a process, so hang tight.

We’ll start by building off of the philosophy that we ought to strive to measure players’ performances and not their production.  To summarize, I’ve identified the distinction here:

Performance – the execution of an action. In baseball, a hitter’s performance is the sum of all actions within his control. It’s how he performs.

Production – something produced; output. That is, a hitter’s production is the outcome of his performance. It’s what he produces.

More specifically, a hitter’s performance should be measured by what he can control within the batter’s box (basically, how hard he hits the ball and at what launch angle). His production is the outcome of the play (i.e. where the ball and he end up).  Many times, the production does not appropriately define the performance (example 1 & example 2).  Therefore, I theorize that we ought to measure the performance and ignore the production.  A quick note: later, this will get a bit complicated as we need to formulate a baseline to measure performance.  In order to do so, we’ll use the large sample of production to define performances.  Let’s continue.

The benefits of WHIP:

  1. Simplicity
  2. Widely known/used/adopted

The drawbacks of WHIP:

  1. Measures a pitcher’s performance based off of a hitter’s production
  2. For balls in play, uses a binary form of measurement (either a hit or an out)
  3. Defense dependent
  4. Doesn’t account for errors

We’ve chosen WHIP as the framework for teaching our performance vs. production philosophy because of its primary benefits.  During the development that follows, we will address each of drawbacks 1 – 4.

Addressing Drawback #1 – Measure Performance Not Production

As most know, WHIP is defined as: [BB + H] / IP.  A pitcher can most certainly control walks (does he place the ball in the zone or not?), so let’s turn to hits.  Let’s ask ourselves, “what are hits?”  Hits are the outcome of a hitter’s performance.

Wait a minute… we’re currently measuring a pitcher’s performance based off the outcome of a hitter’s performance?  I submit that we ought to measure a pitcher’s performance based off the inverse of the hitter’s performance.  After all, a hitter strives for “timing” as a pitcher attempts to “disrupt his timing.”  So, how shall we measure hitters’ performances? We’ll take a look at which hitter performances typically lead to which outcomes.  That is, let’s inventory the exit velocities and launch angles to determine which are most likely to lead to outs and which lead to hits (using April 2009 MLB HITf/x data).

First, let’s take a look at the location of outs (red).

[Figure: outs (red)]

To help interpret the data, refer to the following.

[Figure: explanatory illustration for the outs plot]

And now, hits in green.

[Figure: hits (green)]

And again for illustration.

[Figure: explanatory illustration for the hits plot]

Let’s now have some fun. We’ll overlay hits on top of outs and fit a curve to the leading edge of the data set (the mathematics behind the curve will be explained at a later date).  To steal a term from economics, we’ll call this the “Efficient Frontier” of hitting.

[Figure: hits overlaid on outs, with the fitted “Efficient Frontier” curve]

In baseball terms, a hitter strives to display “timing” during his performance by “barreling” the ball.  If successful, he will obtain optimal exit velocity and launch angle; and performing as such will inherently lead to the production of more hits.

To summarize where we are, the plot above shows how similar performances can lead to different outcomes.  Therefore, we ought to measure hitters’ and pitchers’ abilities based off of how they perform and not what they produce.  That is, we should observe the exit velocity and launch angle to determine a hitter’s performance, and in turn, a pitcher’s performance.  Many NCAA programs already treat this as a staple, recording it on their QAB (quality at bat) charts as a “hard hit ball.”  MLB organizations have the ability to measure it using HITf/x.

Now we understand that we must measure the performance, but what happens when two similar performances lead to different outcomes?  This leads us to the second drawback of WHIP.

Addressing Drawback #2 – Measure Performance in Scalars not Binary

(Note: this is the identical approach I took to measuring a hitter’s pOBA.)  To measure in scalars, I’ve broken down the exit velocities (0 – 120 mph) into 3 mph increments.  Similarly, I’ve divided the launch angle (-70 to +90 degrees) into 4 degree segments.  This gives us a 40×40 matrix or 1,600 “bins” (many of which are empty) to define any given performance.  As similar performances lie within any given “bin,” we’ll average their outcomes for future reference when measuring performance.  With fewer than 5,000 AB’s analyzed, the resolution is marginal with multiple outliers.  Given Sportvision’s larger data set, we could provide a significantly more impressive model to formulate pWHIP.  The plot below displays the mean hit value within each bin.  (Black bins are almost certainly hits, fading to lighter grey for less likely hits.)  Focus on the core of the “Efficient Frontier” as the hitter’s goal of high performance; a pitcher, in turn, will strive to achieve high performance away from this core.

[Figure: mean hit value per exit velocity/launch angle bin (black = almost certainly a hit, lighter grey = less likely)]
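The binning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the input is assumed to be a list of (exit velocity, launch angle, was it a hit) tuples, and the bin edges simply split the stated ranges (0 to 120 mph, -70 to +90 degrees) into 40 equal-width bins each, which works out to 3 mph by 4 degrees.

```python
from collections import defaultdict

# Ranges taken from the text; 40 equal-width bins along each axis.
EV_MIN, EV_MAX = 0.0, 120.0    # exit velocity, mph
LA_MIN, LA_MAX = -70.0, 90.0   # launch angle, degrees
N_BINS = 40


def bin_index(value, lo, hi, n_bins=N_BINS):
    """Map a value to a bin number 0..n_bins-1, clamping out-of-range values."""
    width = (hi - lo) / n_bins
    idx = int((value - lo) // width)
    return min(max(idx, 0), n_bins - 1)


def build_hit_table(batted_balls):
    """Average the binary outcome (hit = 1, out = 0) inside each bin.

    batted_balls: iterable of (exit_velo_mph, launch_angle_deg, is_hit).
    Returns {(ev_bin, la_bin): mean hit value}; empty bins are simply absent.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [hits, total balls]
    for ev, la, is_hit in batted_balls:
        key = (bin_index(ev, EV_MIN, EV_MAX), bin_index(la, LA_MIN, LA_MAX))
        counts[key][0] += 1 if is_hit else 0
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


# Made-up batted balls, purely for illustration (not HITf/x data).
sample = [(100.0, 10.0, True), (101.0, 11.0, False), (60.0, 45.0, False)]
table = build_hit_table(sample)
```

With real data, each value in `table` plays the role of one cell in the grey-scale plot: the share of similarly struck balls that went for hits.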

We have now succeeded in eliminating the judgment of a hitter’s (and thus a pitcher’s) performance by binary means.  It is now a sliding scale dependent upon exit velocity and launch angle.  To explain using an example, a bloop single off a pitcher is no longer treated as a hit (a “1”).  It is treated as 0.10 of a hit (assuming 1 in every 10 bloopers falls in for a hit).  We’ll call this HLV (hitting look-up value). Similarly, a crushing line drive that gets snared by a 3B is no longer an out (or “0”).  The HLV here is 0.80 (assuming 8 of every 10 similarly crushed balls go for hits).  Thus, we have a table of 1,600 HLV’s where each HLV bin corresponds to a specific 3 mph exit velocity range and 4 degree launch angle range.  Temporarily, our new metric stands as follows, with HLV summed over every ball put in play against the pitcher:

pWHIP = [BB + HLV] / IP
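As a quick worked example (a hypothetical inning, using only the two HLV values quoted above), suppose a pitcher issues one walk, gives up a bloop single with an HLV of 0.10, has a line drive with an HLV of 0.80 snared for an out, and strikes out two hitters. Over that one inning:

```latex
\begin{align*}
\text{WHIP}  &= \frac{BB + H}{IP} = \frac{1 + 1}{1} = 2.00 \\
\text{pWHIP} &= \frac{BB + \sum \text{HLV}}{IP} = \frac{1 + (0.10 + 0.80)}{1} = 1.90
\end{align*}
```

The bloop single costs the pitcher far less than a full hit, while the snared liner still counts heavily against him even though it produced an out.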

Addressing Drawback #3 – Defense Independent

This methodology has now removed a pitcher’s own defense and replaced it with the mean of all MLB defenders over the larger sample.  Therefore, ornate adjustments are not necessary.  This metric measures a pitcher’s performance regardless of his own defense!

Addressing Drawback #4 – Accounting for ROE’s

One small hiccup in the simplicity of WHIP is that it doesn’t account for ROE’s.  That is, if a hitter reaches base on an error, a pitcher is not credited with an out.  In essence, he “did his job” but factors external to his control forced him to move forward in the inning without receiving credit for the out.  To utilize our model to its full potential, we must account for errors to measure a pitcher’s abilities.  After all, errors do happen.  A pitcher who strikes out fewer hitters must rely on his defense more.

This is a bit tricky; I hope to provide clarity here.  In generating our large data set LUT (look-up table), we’ll include ROE’s by hitters as “hits.”  That is, all we truly care about (again, when creating the large sample set) is whether an exit velocity/launch angle combination led to a hitter reaching base or not.  This will allow us to account for all balls put in-play while associating each 3 mph x 4 degree bin with an HLV.  Because we have now included ROE’s in our metric, we must expand pWHIP slightly.

pWHIP = [BB + HLV] / [IP + ROE/3]

where HLV is the sum, over all balls put in play against the pitcher, of the associated values from our exit velocity/launch angle LUT.
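A minimal sketch of the expanded formula, assuming the per-ball HLV values have already been looked up. The function name and its inputs are hypothetical; innings are passed as outs recorded so that fractional innings need no special notation.

```python
def p_whip(walks, hlv_values, outs_recorded, roe=0):
    """pWHIP = [BB + sum(HLV)] / [IP + ROE/3].

    walks:         walks issued
    hlv_values:    one look-up value per ball put in play
    outs_recorded: outs credited to the pitcher (IP = outs / 3)
    roe:           hitters who reached on an error
    """
    innings = outs_recorded / 3.0
    denominator = innings + roe / 3.0
    if denominator == 0:
        raise ValueError("no outs or ROE's recorded; pWHIP is undefined")
    return (walks + sum(hlv_values)) / denominator


# Hypothetical inning: 1 walk, three balls in play, 3 outs, no errors.
no_error = p_whip(1, [0.10, 0.80, 0.05], outs_recorded=3)

# Same inning, but one extra hitter reached on an error (HLV assumed 0.05).
with_error = p_whip(1, [0.10, 0.80, 0.05, 0.05], outs_recorded=3, roe=1)
```

Note how the ROE adds a third of an inning to the denominator: the pitcher gets credit for the out he effectively earned, while the ball itself still contributes its HLV to the numerator.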

Summary

WHIP is an outstanding metric; however, it falls short by discretely measuring individual performances by their individual outcomes.  A broken-bat bloop single or a 7-hop ground ball through the left side is an unfortunate outcome of an otherwise high pitching performance.  Measure a performance by the performance, not the outcome.  This should be the new standard.

Future Work

We could expand into the third dimension and include spray angle; however, that discussion is for another day.  Clearly hitters have distinct sprays and, thus, can control this angle to a certain extent, but I contend a hitter cannot control his spray angle enough to create hits and avoid outs.  Next week, I’ll compare WHIP & pWHIP for all qualified pitchers in the April 2009 data set.

Performance vs. Production

Special thanks to Harry Pavlidis (@HarryPav) for supplying me with the HITf/x data (April ’09 public release) earlier tonight. As I’m new to the world of practicing sabermetrics, I appreciated his introductory article.

Growing up, whenever I was at the plate, I was taught by my Dad and other coaches to “find a way to get it done.” If a bloop single scored the run, then I succeeded. It wasn’t until I became a coach myself that my perspective changed (slightly). I began coaching five years ago and was introduced to the concept of “Performance vs. Production,” defined below:

Performance – the execution of an action. In baseball, a hitter’s performance is the sum of all actions within his control. It’s how he performs.

Production – something produced; output. That is, a hitter’s production is the outcome of his performance. It’s what he produces.

I’ve appreciated the increasing desire among sabermetricians to understand a player’s value without external factors. Tailored statistics such as DIPS, FIP, and tERA have provided us with the ability to analyze pitchers without influence from the other 8 players on the field. Even without context such as the out/base state or Tango’s leverage index, these provide outstanding value.

The “Performance vs. Production” philosophy teaches us to focus on what we can control. A hitter who crushed a ball with a 100+ mph exit velocity at a +10 degree angle (from horizontal) may not get out of the box as the third baseman snares the line drive. In the old days, this scenario would have me believe that I did nothing but fail. However, I now realize how the 100+ mph hitter above actually performed at an extraordinarily high level, yet his production fell to some bad luck. When analyzing how a hitter performs, we must temporarily remove the outcome and focus on what the hitter can control. Fundamentally, three basic factors describe a hitter’s performance: the exit velocity off his bat, the launch angle, and the spray angle.

In developing this philosophy, I’ll make the sweeping assumption that a hitter cannot control his spray angle. (Yes, our eyes and spray charts tell us that certain players like Ortiz & Bonds predominantly pull the baseball.) However, at the macro scale, most “good” hitters hit the ball where it’s pitched. That is, a hitter keeps his hands inside the baseball on an inside pitch to barrel it to his pull side, while outside pitches are allowed to get deeper and are taken to the oppo field. On the micro level, a hitter is taught to keep his bat through the hitting zone longer; once his hands have started and his bat travels through the zone, his bat’s spray angle continuously changes, and he cannot make an adjustment without severely sacrificing bat speed.

Therefore, we’ll focus on the two factors a hitter most certainly can control: ball exit velocity and launch angle. Traditionally these are recognized as “bat speed” and “timing,” respectively. A quick analysis of the available HITf/x data shows a distribution of each of these factors below.

[Figure: histogram of exit velocity]

Notice that the average batted ball left the bat at 82.3 mph, at a launch angle 13.3 degrees above horizontal.

[Figure: histogram of launch angle]

Now plotting every batted ball’s launch angle/exit velocity combination, we observe the following scatter plot.

[Figure: scatter plot of launch angle vs. exit velocity for every batted ball]

This is where things get interesting (and fun). We can now view the two factors within a hitter’s control in one location. However, we need to find a way to measure them. Long term, increased performance will lead to increased production. So let’s measure the performance. Tango’s wOBA is awesome; however, it focuses on a hitter’s production. A hitter can’t control how an opposing player fields his batted ball, so why should we measure his value based off of these factors? A hitter has virtually no control over reaching on errors, and I would argue (at the MLB level) he has very little control over HBPs. [I’m a longtime HBP advocate for players as long as they are properly taught how to protect themselves (maybe a future teaching post). Yes, he controls where he stands in the box and how he reacts to an inside pitch, but in general, HBPs are controlled by the pitcher.] Even singles, doubles, and triples possess a large defensive bias.

Therefore, let’s remove luck, chance, and defensive alignment/skill and evaluate a hitter’s performance. To do so, we’ll take a look at which exit velocity and launch angle combinations lead to which expected outcomes. I’ve broken each of the two factors into 40 “bins.” Specifically, I broke exit velocity into 3 mph increments and launch angle into 4 degree increments. Therefore, the 40×40 matrix provides 1,600 bins (many are empty, shown in grey). Within each bin, a certain number of batted balls were tracked and marked with a specific outcome (e.g. out, single, HR, etc.). I then averaged the wOBA within each 3 mph x 4 degree bin. The heat map below displays the outcome of that analysis. Red & yellow denote high wOBA (HRs & XBH’s); teal/green depict medium wOBA (think singles); blue represents poor wOBA (outs). Optimal hitting occurs approximately between 18-40 degrees in excess of 95 mph.

[Figure: heat map of mean wOBA by exit velocity and launch angle bin]

Due to the limited data set (< 5,000 AB’s), the resolution is grainy, at best. I’m imagining Sportvision’s database by now would provide quite an elegant heat map. Given the opportunity to analyze real-time HITf/x data, I would measure a hitter’s performance using Tango’s wOBA as a foundation. However, instead of assigning the linear weight associated with the outcome of a hitter’s AB, let’s analyze the exit velocity and launch angle, then use a LUT (look-up table, with significantly higher resolution) to find the average linear weight associated with all balls hit within that bin. Accounting for all hits, outs, and ROEs and then averaging will enable us to substantially reduce the inflation or suppression of measured performance due to factors external to the hitter. This allows us to utilize the large data set of the broad population to analyze an individual hitter’s performance. Maybe we call it pOBA (Performance On-Base Average).
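The look-up idea can be sketched as follows. Everything here is an assumption made for illustration: the names are hypothetical, the linear weights are placeholder numbers rather than official wOBA coefficients, the bin widths are adjustable constants, and the sketch covers batted balls only (how walks or strikeouts would enter pOBA is not specified above).

```python
from collections import defaultdict

# Placeholder linear weights, for illustration only (NOT official wOBA values).
WEIGHTS = {"out": 0.0, "roe": 0.9, "single": 0.9,
           "double": 1.25, "triple": 1.6, "hr": 2.0}

EV_WIDTH = 3.0   # mph per bin (assumed; adjust to the chosen resolution)
LA_WIDTH = 4.0   # degrees per bin (assumed)
LA_MIN = -70.0   # lowest launch angle covered by the table


def bin_key(exit_velo, launch_angle):
    """Identify the bin that a batted ball falls into."""
    return (int(exit_velo // EV_WIDTH), int((launch_angle - LA_MIN) // LA_WIDTH))


def build_weight_lut(league_balls):
    """Average the linear weight of every league-wide ball within each bin.

    league_balls: iterable of (exit_velo, launch_angle, outcome_name).
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [weight sum, ball count]
    for ev, la, outcome in league_balls:
        cell = totals[bin_key(ev, la)]
        cell[0] += WEIGHTS[outcome]
        cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def p_oba(hitter_balls, lut):
    """Mean look-up value over a hitter's batted balls, ignoring his outcomes.

    Balls that land in a bin with no league data are skipped.
    """
    values = [lut[bin_key(ev, la)] for ev, la in hitter_balls
              if bin_key(ev, la) in lut]
    return sum(values) / len(values) if values else None


# Made-up league sample and hitter, purely for illustration.
league = [(100.0, 25.0, "hr"), (101.0, 25.5, "out"), (70.0, 0.0, "single")]
lut = build_weight_lut(league)
hitter_score = p_oba([(100.5, 24.5), (70.5, 1.0)], lut)
```

The key design point is that `p_oba` never looks at what actually happened on the hitter's own batted balls; it only asks what similar contact has been worth across the broad population.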

We could then make an adjustment for baserunning speed/ability using FIELDf/x. For balls hit to the outfield, we could then easily perform park adjustments, if desired. After all, in the infield, home to 1st is 88′ – 9″ regardless of the stadium in which one plays. Pair our newly created context independent pOBA statistic with my longtime favorite RE24, and you’ve got an incredible analysis for MLB hitters.

The Creation of ePPA: Estimating P/PA Given Count Information

With the wealth of information available at the MLB level, P/PA (pitches per plate appearance) data can be easily queried through your favorite database. (I currently use a combination of MySQL, MATLAB, and R for all of our analytics here at Diamond Charts.) However, at the collegiate level, detailed P/PA data must be derived from the pitch count. Sounds simple, but don’t forget that two-strike foul balls aren’t accounted for in a standard play-by-play. Therefore, in order to provide a slightly more accurate P/PA estimation, we’ve created ePPA (estimated P/PA) to account for those foul balls that occur with two strikes. Methodology is below.

Assumption: NCAA hitters foul off as many 2-strike pitches as MLB hitters. (Later we can turn this assumption into a hypothesis and test its validity.)

Utilizing MLB PBP data from 2002-2012, we find 1.003M PA’s with two strikes prior to the action pitch. During these PA’s we find an actual P/PA of 5.11 while observing a count-based P/PA (simply, balls + strikes + 1) of 4.62. The plot below displays the distribution of total strikes encompassing all 1.003M 2-strike MLB PA’s over the past decade. As we can see, approximately 70% of PA’s with 2 strikes have 2 total strikes (that is, they have 0 fouls on 2-strike counts). However, an adjustment clearly remains necessary for the other 30%. In case you’re curious, the largest number of total strikes seen in one PA in the past decade occurred in 2004 in Los Angeles, when Alex Cora of the Dodgers homered off Matt Clement of the Chicago Cubs in an 18-pitch at-bat.

[Figure: histogram of total strikes in 2-strike PA’s]

One step further:
To find an ePPA that more closely approximates the actual P/PA, we have broken these numbers down by the four possible 2-strike counts. The table below displays the actual P/PA alongside the count-based P/PA (cPPA = balls + strikes + 1) and the difference between them.

Balls in count   Actual P/PA   cPPA   Diff
0                3.243         3      0.243
1                4.370         4      0.370
2                5.533         5      0.533
3                6.747         6      0.747

The table makes sense: the more balls/pitches a hitter sees, the more opportunities he has to foul off a 2-strike pitch. Thus, for every 2-strike PA, we add the differential factor based upon the number of balls in the count, providing a slightly more accurate estimate of P/PA (ePPA = cPPA + diff). Because P/PA is a cumulative statistic (one that should be analyzed over a large data set), these simplistic approximations will suffice.
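The adjustment can be sketched as a small look-up, using the differentials from the table above. The function names are hypothetical, and the count passed in is assumed to be the count prior to the action pitch, as in the MLB sample.

```python
# Differential (actual P/PA minus cPPA) for 2-strike PA's, keyed by balls.
TWO_STRIKE_DIFF = {0: 0.243, 1: 0.370, 2: 0.533, 3: 0.747}


def count_based_ppa(balls, strikes):
    """cPPA: pitches implied by the count, plus the action pitch."""
    return balls + strikes + 1


def estimated_ppa(balls, strikes):
    """ePPA = cPPA + diff, where diff applies only to 2-strike counts."""
    cppa = count_based_ppa(balls, strikes)
    if strikes == 2:
        return cppa + TWO_STRIKE_DIFF[balls]
    return float(cppa)


def mean_eppa(final_counts):
    """Average ePPA over a list of (balls, strikes) counts, one per PA."""
    return sum(estimated_ppa(b, s) for b, s in final_counts) / len(final_counts)


# Three hypothetical PA's ending on 0-2, 3-2, and 1-1 counts.
example = mean_eppa([(0, 2), (3, 2), (1, 1)])
```

Only the two 2-strike PA's pick up an adjustment here; the 1-1 PA is counted at its plain cPPA of 3.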

Effect:
2-strike counts make up approximately half of all plate appearances, so the adjustment adds approximately 0.24 P/PA overall for each hitter (a bit more for more patient hitters, a bit less for the aggressive). Put differently, on the order of 10 foul balls/game (per team) occur with two strikes.
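The arithmetic behind those two figures, using the 5.11 and 4.62 averages reported earlier. The figure of roughly 38 plate appearances per team per game is my own round-number assumption, not something drawn from the data set:

```latex
\begin{align*}
\text{adjustment per 2-strike PA} &= 5.11 - 4.62 = 0.49 \\
\text{adjustment per PA overall}  &\approx 0.5 \times 0.49 = 0.245 \\
\text{2-strike fouls per team per game} &\approx 38 \times 0.245 \approx 9.3
\end{align*}
```

Both results line up with the rounded values quoted above: about 0.24 P/PA per hitter and on the order of 10 two-strike fouls per team per game.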