General SABR | Diamond Charts' SABRlog

Special thanks to Harry Pavlidis (@HarryPav) for supplying me with the HITf/x data (April ’09 public release) earlier tonight. As I’m new to the world of practicing sabermetrics, I appreciated his introductory article.

Growing up, I was taught by my Dad and other coaches to “find a way to get it done” at the plate. If a bloop single scored the run, then I succeeded. It wasn’t until I became a coach myself that my perspective changed (slightly). I began coaching five years ago and was introduced to the concept of “Performance vs. Production,” defined below:

Performance – the execution of an action. In baseball, a hitter’s performance is the sum of all actions within his control. It’s how he performs.

Production – something produced; output. That is, a hitter’s production is the outcome of his performance. It’s what he produces.

I’ve appreciated the increasing desire among sabermetricians to understand a player’s value without external factors. Tailored statistics such as DIPS, FIP, and tERA have provided us with the ability to analyze pitchers without influence from the other eight players on the field. Even without context such as the out/base state or Tango’s leverage index, these provide outstanding value.

The “Performance vs. Production” philosophy teaches us to focus on what we can control. A hitter who crushes a ball with a 100+ mph exit velocity at a +10 degree angle (from horizontal) may not even get out of the box before the third baseman snares the line drive. In the old days, this scenario would have had me believe that I did nothing but fail. However, I now realize that the 100+ mph hitter above actually performed at an extraordinarily high level, yet his production fell victim to some bad luck. When analyzing how a hitter performs, we must temporarily remove the outcome and focus on what the hitter can control. Fundamentally, three basic factors describe a hitter’s performance: the exit velocity off his bat, the launch angle, and the spray angle.

In developing this philosophy, I’ll make the sweeping assumption that a hitter cannot control his spray angle. (Yes, our eyes and spray charts tell us that certain players like Ortiz & Bonds predominantly pull the baseball.) However, at the macro scale, most “good” hitters hit the ball where it’s pitched. That is, a hitter keeps his hands inside the baseball on an inside pitch to barrel it to his pull side, while outside pitches are allowed to get deeper and are taken to the oppo field. On the micro level, a hitter is taught to keep his bat in the hitting zone longer; once his hands have started and his bat travels through the zone, the bat’s spray angle changes continuously, with no ability to make an adjustment without severely sacrificing bat speed.

Therefore, we’ll focus on the two factors a hitter most certainly can control: ball exit velocity and launch angle. Traditionally these are recognized as “bat speed” and “timing,” respectively. A quick analysis of the available HITf/x data shows a distribution of each of these factors below.

Notice that the average batted ball left the bat at an exit velocity of 82.3 mph and a launch angle 13.3 degrees above horizontal.

Now plotting every batted ball’s launch angle/exit velocity combination, we observe the following scatter plot.

This is where things get interesting (and fun). We can now view the two factors within a hitter’s control in one location. However, we need to find a way to measure it. Long term, increased performance will lead to increased production. So let’s measure the performance. Tango’s wOBA is awesome; however, it focuses on a hitter’s production. A hitter can’t control how an opposing player fields his batted ball, so why should we measure his value based on these factors? A hitter has virtually no control over reaching on errors, and I would argue (at the MLB level) he has very little control over HBPs. [I’m a longtime HBP advocate for players as long as they are properly taught how to protect themselves (maybe a future teaching post). Yes, he controls where he stands in the box and how he reacts to an inside pitch, but in general, HBPs are controlled by the pitcher.] Even singles, doubles, and triples possess a large defensive bias.

Therefore, let’s remove luck, chance, and defensive alignment/skill and evaluate a hitter’s performance. To do so, we’ll take a look at which exit velocity and launch angle combinations lead to which expected outcomes. I’ve broken each of the two factors into 40 “bins”: exit velocity in 4 mph increments and launch angle in 3 degree increments. The resulting 40×40 matrix provides 1,600 bins (many are empty, shown in grey). Within each bin, a certain number of batted balls were tracked and marked with a specific outcome (e.g. out, single, HR, etc.). I then averaged the wOBA within each 4 mph x 3 degree bin. The heat map below displays the outcome of that analysis. Red & yellow denote high wOBA (HRs & XBH’s); teal/green depict medium wOBA (think singles); blue represents poor wOBA (outs). Optimal hitting occurs at launch angles of approximately 18-40 degrees with exit velocities in excess of 95 mph.
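For anyone who wants to try the binning themselves, here is a minimal Python sketch of the idea, not the actual code behind the heat map. The 4 mph and 3 degree bin widths and the 40×40 grid come from the description above. Everything else is an assumption for illustration: the grid origins (`VELO_MIN`, `ANGLE_MIN`), the function names, the sample batted balls, and especially the linear weights in `WOBA_WEIGHTS`, which are round placeholder values rather than real wOBA coefficients.

```python
from collections import defaultdict

# ASSUMPTION: placeholder linear weights for illustration only.
# Real wOBA weights differ and change from season to season.
WOBA_WEIGHTS = {"out": 0.0, "single": 0.9, "double": 1.25,
                "triple": 1.6, "hr": 2.0, "roe": 0.9}

VELO_STEP = 4.0     # mph per bin (from the post)
ANGLE_STEP = 3.0    # degrees per bin (from the post)
N_BINS = 40         # 40 x 40 grid (from the post)
VELO_MIN = 0.0      # ASSUMPTION: the grid origin is not stated in the post
ANGLE_MIN = -60.0   # ASSUMPTION: the grid origin is not stated in the post


def bin_index(exit_velo, launch_angle):
    """Return (velo_bin, angle_bin), or None if the ball falls off the grid."""
    v = int((exit_velo - VELO_MIN) // VELO_STEP)
    a = int((launch_angle - ANGLE_MIN) // ANGLE_STEP)
    if 0 <= v < N_BINS and 0 <= a < N_BINS:
        return v, a
    return None


def build_lut(batted_balls):
    """Average the linear weight of every batted ball within each bin.

    batted_balls: iterable of (exit_velo_mph, launch_angle_deg, outcome).
    Returns {(velo_bin, angle_bin): mean weight}; empty bins are absent.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for velo, angle, outcome in batted_balls:
        key = bin_index(velo, angle)
        if key is None:
            continue  # off the grid, so not counted
        totals[key] += WOBA_WEIGHTS[outcome]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in counts}


# Made-up sample: the first two balls land in the same bin.
sample = [
    (101.0, 10.0, "out"),     # scorched liner, snared by the third baseman
    (102.5, 11.0, "single"),
    (98.0, 28.0, "hr"),
    (70.0, -12.0, "out"),
]
lut = build_lut(sample)
```

In this toy sample, the snared line drive and the single share a bin, so both are credited with that bin's average rather than with their individual outcomes.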

Due to the limited data set (< 5,000 AB’s), the resolution is grainy, at best. I’m imagining Sportvision’s database by now would provide quite an elegant heat map. Given the opportunity to analyze real-time HITf/x data, I would measure a hitter’s performance using Tango’s wOBA as a foundation. However, in lieu of assigning the linear weight associated with the outcome of a hitter’s AB, let’s analyze the exit velocity and launch angle, then use a LUT (lookup table, with significantly higher resolution) to find the average linear weight associated with all balls hit within that bin. Accounting for all hits, outs, and ROEs, then averaging, will enable us to substantially reduce the performance inflation/suppression caused by factors external to the hitter. This allows us to utilize the large data set of the broad population to analyze an individual hitter’s performance. Maybe we call it pOBA (Performance On-Base Average).
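A rough sketch of how the pOBA lookup could work in Python follows. It is only one reading of the idea, under stated assumptions: the names `bin_key` and `poba` are hypothetical, the LUT values are invented, bins are keyed by simple floor division, and only batted balls are averaged (the post does not say how walks or strikeouts would enter).

```python
def bin_key(exit_velo, launch_angle, velo_step=4.0, angle_step=3.0):
    """Map a batted ball to a (velo_bin, angle_bin) key via floor division."""
    return int(exit_velo // velo_step), int(launch_angle // angle_step)


def poba(batted_balls, lut, default=None):
    """Average the population-wide bin value over one hitter's batted balls.

    batted_balls: iterable of (exit_velo_mph, launch_angle_deg). The actual
    outcome of each ball is deliberately ignored.
    lut: {(velo_bin, angle_bin): average linear weight for that bin}.
    Balls whose bin is missing from the LUT are skipped unless a default
    value is supplied. Returns None if no ball could be scored.
    """
    values = []
    for velo, angle in batted_balls:
        value = lut.get(bin_key(velo, angle), default)
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


# ASSUMPTION: invented LUT values, for illustration only.
league_lut = {(25, 3): 0.62, (24, 9): 1.45, (17, -4): 0.05}

hitter = [(101.0, 10.0), (98.0, 28.0), (70.0, -12.0)]
hitter_poba = poba(hitter, league_lut)
```

The hitter is credited with what balls like his typically produce across the whole population, which is the point of the exercise: the third baseman's glove never enters the calculation.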

We could then make an adjustment for baserunning speed/ability using FIELDf/x. For balls hit to the outfield, we could then easily perform park adjustments, if desired. After all, in the infield, home to 1st is 88′ – 9″ regardless of the stadium in which one plays. Pair our newly created context-independent pOBA statistic with my longtime favorite RE24, and you’ve got an incredible analysis for MLB hitters.

With the wealth of information available at the MLB level, P/PA (pitches per plate appearance) data can be easily queried through your favorite database. (I currently use a combination of MySQL, MATLAB, and R for all of our analytics here at Diamond Charts.) However, at the collegiate level, detailed P/PA data must be derived from the pitch count. Sounds simple, but don’t forget that two-strike foul balls aren’t accounted for in a standard play-by-play. Therefore, in order to provide a slightly more accurate P/PA estimation, we’ve created ePPA (estimated P/PA) to account for those foul balls that occur with two strikes. Methodology is below.

Assumption: NCAA hitters foul off as many 2-strike pitches as MLB hitters. (Later we can turn this assumption into a hypothesis and test its validity.)

Utilizing MLB PBP data from 2002-2012, we find 1.003M PA’s with two strikes prior to the action pitch. During these PA’s we find an actual P/PA of 5.11 while observing a count-based P/PA (simply, balls + strikes + 1) of 4.62. The plot below displays the distribution of total strikes encompassing all 1.003M 2-strike MLB PA’s over the past decade. As we can see, approximately 70% of PA’s with 2 strikes have 2 total strikes (that is, they have 0 fouls on 2-strike counts). However, an adjustment clearly remains necessary for the other 30%. In case you’re curious, the most total strikes seen in one PA in the past decade occurred in 2004 in Los Angeles, as Alex Cora of the Dodgers homered off Matt Clement of the Chicago Cubs in an 18-pitch at-bat.

One step further:
To find an ePPA that more closely approximates the actual P/PA, we have broken these numbers down by the four possible 2-strike counts. The table below displays the actual P/PA alongside the count-based P/PA (cPPA = balls + strikes + 1) and the difference between them.

Balls in count	Actual P/PA	cPPA	Diff (P/PA - cPPA)
0	3.243	3	0.243
1	4.370	4	0.370
2	5.533	5	0.533
3	6.747	6	0.747

The table makes sense: the more balls/pitches a hitter sees, the more opportunities he has to foul off a 2-strike pitch. Thus, for every 2-strike PA, we add the differential factor based upon the number of balls in the count, providing a slightly more accurate representation of P/PA (ePPA = cPPA + diff). Because P/PA is a cumulative statistic (one that should be analyzed over a large data set), these simplistic approximations will suffice.
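The adjustment is simple enough to sketch in a few lines of Python. The differentials are copied from the table above; the function names (`cppa`, `eppa`, `mean_eppa`) and the sample counts are hypothetical, and the sketch assumes each PA is recorded as the (balls, strikes) count prior to the action pitch.

```python
# Differential (actual P/PA minus count-based P/PA) by balls in the count,
# taken from the table above (MLB 2-strike PAs, 2002-2012).
TWO_STRIKE_DIFF = {0: 0.243, 1: 0.370, 2: 0.533, 3: 0.747}


def cppa(balls, strikes):
    """Count-based pitches for one PA: balls + strikes + the action pitch."""
    return balls + strikes + 1


def eppa(balls, strikes):
    """Estimated pitches for one PA, adding expected 2-strike fouls."""
    base = cppa(balls, strikes)
    if strikes == 2:
        return base + TWO_STRIKE_DIFF[balls]
    return base  # no adjustment unless the PA reached two strikes


def mean_eppa(final_counts):
    """Average ePPA over a list of (balls, strikes) counts."""
    return sum(eppa(b, s) for b, s in final_counts) / len(final_counts)


# Made-up example: three PAs ending on 0-2, 1-1, and 3-2 counts.
hitter_counts = [(0, 2), (1, 1), (3, 2)]
hitter_eppa = mean_eppa(hitter_counts)
```

Only the two 2-strike PAs in the example pick up an adjustment; the PA ending on a 1-1 count keeps its count-based value of 3.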

Effect:
2-strike counts make up approximately half of all plate appearances, thus providing an overall addition of approximately 0.24 P/PA for each hitter (a bit more for more patient hitters, a bit less for the aggressive). Thinking differently, on the order of 10 foul balls/game (per team) occur with two strikes.
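The arithmetic behind those two figures can be checked quickly in Python. The P/PA values and the "approximately half" share come from the text above; the 38 plate appearances per team per game is my own round assumption, used only to show that the foul-ball figure lands on the order of 10.

```python
ACTUAL_PPA = 5.11        # actual P/PA in 2-strike PAs (from the text)
COUNT_PPA = 4.62         # count-based P/PA in 2-strike PAs (from the text)
TWO_STRIKE_SHARE = 0.5   # "approximately half" of all PAs reach two strikes
PA_PER_TEAM_GAME = 38    # ASSUMPTION: rough PAs per team per game

# Average number of 2-strike fouls in a PA that reaches two strikes.
fouls_per_two_strike_pa = ACTUAL_PPA - COUNT_PPA

# Spread over all PAs, this is the overall bump to P/PA.
added_ppa = TWO_STRIKE_SHARE * fouls_per_two_strike_pa

# Scale to a full game for one team.
fouls_per_team_game = added_ppa * PA_PER_TEAM_GAME

print(round(fouls_per_two_strike_pa, 2))
print(round(added_ppa, 3))
print(round(fouls_per_team_game, 1))
```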
