Brief Hiatus

Just a quick note for readers: Diamond Charts is working diligently to upgrade its capabilities and will resume SABRlog posts in June. We appreciate your interest.

All the best,
DC

pWHIP & the Efficient Frontier: Using HITf/x to improve baseball’s greatest pitching metric

No matter how many new pitching metrics sabermetricians invent, it seems increasingly difficult to supplant WHIP as one of the (if not THE) premier pitching metrics. Since 1979, WHIP has withstood the test of time, and front office personnel continue to use Okrent’s fantasy baseball-designed statistic to this day. As saberists, we think, contemplate, and collaborate to create statistics more representative of a pitcher’s abilities. However, none has managed to displace the simple WHIP.

Our community has witnessed excellence with the creation of elegant metrics like xFIP and tERA, which strive to eliminate external factors from a pitcher’s performance.  The biggest hurdle we try to overcome lies in the eyes of the general fan.  It is unfortunate that ESPN, MLB.tv, FOX, and regional networks haven’t picked up on these and other greats like wOBA and RE24.

In lieu of trying to create an entirely new metric, we’ll use Okrent’s WHIP as the framework for our evolutionary pWHIP.  This will be a bit of a process, so hang tight.

We’ll start by building off of the philosophy that we ought to strive to measure players’ performances and not their production.  To summarize, I’ve identified the distinction here:

Performance – the execution of an action. In baseball, a hitter’s performance is the sum of all actions within his control. It’s how he performs.

Production – something produced; output. That is, a hitter’s production is the outcome of his performance. It’s what he produces.

More specifically, a hitter’s performance should be measured by what he can control within the batter’s box (basically, how hard he hits the ball and at what launch angle). His production is the outcome of the play (i.e. where the ball and he end up).  Many times, the production does not appropriately define the performance (example 1 & example 2).  Therefore, I theorize that we ought to measure the performance and ignore the production.  A quick note: later, this will get a bit complicated as we need to formulate a baseline to measure performance.  In order to do so, we’ll use the large sample of production to define performances.  Let’s continue.

The benefits of WHIP:

  1. Simplicity
  2. Widely known/used/adopted

The drawbacks of WHIP:

  1. Measures a pitcher’s performance based off of a hitter’s production
  2. For balls in play, uses a binary form of measurement (either a hit or an out)
  3. Defense dependent
  4. Doesn’t account for errors

We’ve chosen to use WHIP as the framework for the education of our performance vs. production philosophy due to its primary benefits.  And during the development that ensues, we will address each of the drawbacks 1 – 4.

Addressing Drawback #1 – Measure Performance Not Production

As most know, WHIP is defined as: [BB + H] / IP.  A pitcher can most certainly control walks (does he place the ball in the zone or not?), so let’s turn to hits.  Let’s ask ourselves, “what are hits?”  Hits are the outcome of a hitter’s performance.

Wait a minute… we’re currently measuring a pitcher’s performance based off the outcome of a hitter’s performance?  I submit that we ought to measure a pitcher’s performance based off the inverse of the hitter’s performance.  After all, a hitter strives for “timing” as a pitcher attempts to “disrupt his timing.”  So, how shall we measure hitters’ performances? We’ll take a look at which hitter performances typically lead to which outcomes.  That is, let’s inventory the exit velocities and launch angles to determine which are most likely to lead to outs and which lead to hits (using April 2009 MLB HITf/x data).

First, let’s take a look at the location of outs (red).

[Figure: plot of outs, shown in red]

To help understand the data, it may help to refer to the following.

[Figure: illustrative diagram]

And now, hits in green.

[Figure: plot of hits, shown in green]

And again for illustration.

[Figure: illustrative diagram]

Let’s now have some fun. We’ll overlay hits on top of outs and fit a curve to the leading edge of the data set (the mathematics behind the curve will be explained at a later date).  To steal a term from economics, we’ll call this the “Efficient Frontier” of hitting.

[Figure: hits overlaid on outs, with the fitted “Efficient Frontier” curve]

In baseball terms, a hitter strives to display “timing” during his performance by “barreling” the ball.  If successful, he will obtain optimal exit velocity and launch angle; and performing as such will inherently lead to the production of more hits.

To summarize where we are, the plot above shows how similar performances can lead to different outcomes. Therefore, we ought to measure hitters’ and pitchers’ abilities based off of how they perform and not what they produce. That is, we should observe the exit velocity and launch angle to determine a hitter’s performance and, in turn, a pitcher’s performance. Many NCAA programs already track this as a staple of their QAB (quality at bat) charts, recording it as a “hard hit ball.” MLB organizations have the ability to measure it using HITf/x.

Now we understand that we must measure the performance, but what happens when two similar performances lead to different outcomes? This brings us to the second drawback of WHIP.

Addressing Drawback #2 – Measure Performance in Scalars not Binary

(Note: this is the identical approach I took to measuring a hitter’s pOBA.) To move from a binary measure to a scalar one, I’ve broken down the exit velocities (0 – 120 mph) into 3 mph increments. Similarly, I’ve divided the launch angle (-70 to +90 degrees) into 4 degree segments. This gives us a 40×40 matrix, or 1,600 “bins” (many of which are empty), to define any given performance. As similar performances lie within any given “bin,” we’ll average their outcomes for future reference when measuring performance. With fewer than 5,000 AB’s analyzed, the resolution is marginal with multiple outliers. Given Sportvision’s larger data set, we could provide a significantly more impressive model to formulate pWHIP. The plot below displays the mean outcome (share of hits) within each bin. (Black bins are almost certainly hits, fading to lighter grey for less likely hits.) Focus on the core of the “Efficient Frontier” as the hitter’s goal of high performance; a pitcher, in turn, strives to keep batted balls away from this core.

[Figure: heat map of the mean value within each bin, black for near-certain hits fading to lighter grey]

We have now succeeded in eliminating the judgment of a hitter’s (and thus a pitcher’s) performance by binary means. It is now a sliding scale dependent upon exit velocity and launch angle. To explain using an example, a bloop single off a pitcher is no longer treated as a hit (a “1”). It is treated as 0.10 of a hit (assuming 1 in every 10 bloopers falls in for a hit). We’ll call this HLV (hitting look-up value). Similarly, a crushing line drive that gets snared by a 3B is no longer an out (or “0”). The HLV here is 0.80 (assuming 8 of every 10 similarly crushed balls go for hits). Thus, we have a table of 1,600 HLV’s where each HLV bin corresponds to a specific 3 mph band of exit velocity and 4 degree band of launch angle. Temporarily, our new metric stands as:

pWHIP = [BB + HLV] / IP
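
To make the look-up idea concrete, here is a minimal sketch of how such an HLV table could be built. It assumes 3 mph by 4 degree bins over 0 to 120 mph and -70 to +90 degrees (which yields the 40×40 grid); the function names, the clamping of edge values, and the sample batted balls are all hypothetical and are not taken from the HITf/x data.

```python
from collections import defaultdict

N_BINS = 40
EV_MIN, EV_STEP = 0.0, 3.0     # mph: 40 bins cover 0 to 120 mph (assumed grid)
LA_MIN, LA_STEP = -70.0, 4.0   # degrees: 40 bins cover -70 to +90 (assumed grid)

def bin_index(exit_velo, launch_angle):
    """Return the (velocity bin, angle bin) for one batted ball, clamped to the grid."""
    i = int((exit_velo - EV_MIN) // EV_STEP)
    j = int((launch_angle - LA_MIN) // LA_STEP)
    clamp = lambda k: min(max(k, 0), N_BINS - 1)
    return clamp(i), clamp(j)

def build_hlv_table(batted_balls):
    """batted_balls: iterable of (exit_velo, launch_angle, was_hit).
    Returns {bin: share of balls in that bin that went for hits}."""
    hits, total = defaultdict(int), defaultdict(int)
    for exit_velo, launch_angle, was_hit in batted_balls:
        b = bin_index(exit_velo, launch_angle)
        total[b] += 1
        hits[b] += 1 if was_hit else 0
    return {b: hits[b] / total[b] for b in total}

# Made-up sample: 10 bloopers (1 falls in) and 10 crushed liners (8 fall in).
sample = ([(65.0, 30.0, True)] + [(65.0, 30.0, False)] * 9
          + [(102.0, 10.0, True)] * 8 + [(102.0, 10.0, False)] * 2)
hlv_table = build_hlv_table(sample)
bloop_hlv = hlv_table[bin_index(65.0, 30.0)]   # 1 hit in 10 balls
liner_hlv = hlv_table[bin_index(102.0, 10.0)]  # 8 hits in 10 balls
```

Empty bins simply never appear in the dictionary, so a real implementation would need to decide how to handle a batted ball that lands in a bin with no history (a neighboring-bin average is one option).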

Addressing Drawback #3 – Defense Independent

This methodology has now removed a pitcher’s defense and substituted it with the larger sample of the mean of all MLB defenders.  Therefore, ornate adjustments are not necessary.  This metric measures a pitcher’s performance regardless of his own defense!

Addressing Drawback #4 – Accounting for ROE’s

One small hiccup in the simplicity of WHIP is that it doesn’t account for ROE’s. That is, if a hitter reaches base on an error, a pitcher is not credited with an out. In essence, he “did his job,” but factors external to his control forced him to move forward in the inning without receiving credit for the out. To utilize our model to its full potential, we must account for errors to measure a pitcher’s abilities. After all, errors do happen. A pitcher who strikes out fewer hitters must rely on his defense more.

This is a bit tricky; I hope to provide clarity here.  In generating our large data set LUT (look-up table), we’ll include ROE’s by hitters as “hits.”  That is, all we truly care about (again, when creating the large sample set) is whether an exit velocity/launch angle combination led to a hitter reaching base or not.  This will allow us to account for all balls put in-play while associating each 3 mph x 4 degree bin with an HLV.  Because we have now included ROE’s in our metric, we must expand pWHIP slightly.

pWHIP = [BB + HLV] / [IP + ROE/3]

where HLV is the associated value from our exit velocity/launch angle LUT.
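
As a quick worked example of the formula, the sketch below computes pWHIP with and without the ROE term. It assumes that “HLV” in the formula means the sum of look-up values over every ball put in play, and that innings are expressed as true thirds rather than box-score notation; the function names and the sample line are hypothetical.

```python
def innings_from_outs(outs_recorded):
    """Convert outs to true innings (19 outs -> 6.333...), not box-score '6.1' notation."""
    return outs_recorded / 3

def pwhip(walks, hlv_total, outs_recorded, roe=0):
    """pWHIP = (BB + HLV) / (IP + ROE / 3).

    hlv_total is assumed to be the sum of look-up values over every ball in play;
    each ROE adds one third of an inning to the denominator.
    """
    innings = innings_from_outs(outs_recorded)
    return (walks + hlv_total) / (innings + roe / 3)

# Hypothetical start: 2 walks, HLVs summing to 4.5, 18 outs recorded, 1 ROE.
without_roe = pwhip(walks=2, hlv_total=4.5, outs_recorded=18)
with_roe = pwhip(walks=2, hlv_total=4.5, outs_recorded=18, roe=1)
```

With `roe=0` the function reduces to the earlier form, [BB + HLV] / IP, so the ROE term only ever lowers a pitcher’s pWHIP.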

Summary

WHIP is an outstanding metric; however, it falls short by discretely measuring individual performances by their individual outcomes. A broken-bat bloop single or a 7-hop ground ball through the left side is an unfortunate outcome of an otherwise high pitching performance. Measure a performance by the performance, not the outcome. This should be the new standard.

Future Work

We could expand into the third dimension and include spray angle; however, that discussion is for another day. Clearly hitters have distinct sprays and, thus, can control this angle to a certain extent, but I contend a hitter cannot control his spray angle enough to create hits and avoid outs. Next week, I’ll compare WHIP & pWHIP for all qualified pitchers in the April 2009 data set.

Performance vs. Production

Special thanks to Harry Pavlidis (@HarryPav) for supplying me with the HITf/x data (April ’09 public release) earlier tonight. As I’m new to the world of practicing sabermetrics, I appreciated his introductory article.

Growing up, I was taught by my Dad and other coaches to “find a way to get it done” at the plate. If a bloop single scored the run, then I succeeded. It wasn’t until I became a coach myself that my perspective changed (slightly). I began coaching five years ago and was introduced to the concept of “Performance vs. Production,” defined below:

Performance – the execution of an action. In baseball, a hitter’s performance is the sum of all actions within his control. It’s how he performs.

Production – something produced; output. That is, a hitter’s production is the outcome of his performance. It’s what he produces.

I’ve appreciated the increasing desire for sabermetricians to understand a player’s value without external factors. Tailored statistics such as DIPS, FIP, and tERA have provided us with the ability to analyze pitchers without influence from the other 8 players on the field. Even without context such as the out/base state or Tango’s leverage index, these provide outstanding value.

The “Performance vs. Production” philosophy teaches us to focus on what we can control. A hitter who crushed a ball with a 100+ mph exit velocity at a +10 degree angle (from horizontal) may not get out of the box as the third baseman snares the line drive. In the old days, this scenario would have me believe that I did nothing but fail. However, I now realize how the 100+ mph hitter above actually performed at an extraordinarily high level, yet his production fell to some bad luck. When analyzing how a hitter performs, we must temporarily remove the outcome and focus on what the hitter can control. Fundamentally, three basic factors describe a hitter’s performance: the exit velocity off his bat, the launch angle, and the spray angle.

In developing this philosophy, I’ll make the sweeping assumption that a hitter cannot control his spray angle. (Yes, our eyes and spray charts tell us that certain players like Ortiz & Bonds predominantly pull the baseball.) However, at the macro scale, most “good” hitters hit the ball where it’s pitched. That is, a hitter keeps his hands inside the baseball on an inside pitch to barrel it to his pull side, while outside pitches are allowed to get deeper and are taken to the oppo field. On the micro level, hitters are taught to keep their bat through the hitting zone longer; once a hitter’s hands have started and his bat travels through the zone, his bat’s spray angle changes continuously, with no ability to make an adjustment without severely sacrificing bat speed.

Therefore, we’ll focus on the two factors a hitter most certainly can control: ball exit velocity and launch angle. Traditionally these are recognized as “bat speed” and “timing,” respectively. A quick analysis of the available HITf/x data shows a distribution of each of these factors below.

hit_fx_exit_velo_hist

Notice the average at bat produced a ball exit velocity of 82.3 mph at a launch angle 13.3 degrees above horizontal.

hit_fx_launch_angle_hist

Now plotting every batted ball’s launch angle/exit velocity combination, we observe the following scatter plot.

hit_fx_scatter

This is where things get interesting (and fun). We can now view the two factors within a hitter’s control in one location. However, we need to find a way to measure it. Long term, increased performance will lead to increased production. So let’s measure the performance. Tango’s wOBA is awesome; however, it focuses on a hitter’s production. A hitter can’t control how an opposing player fields his batted ball, so why should we measure his value based off of these factors? A hitter has virtually no control over reaching on errors, and I would argue (at the MLB level) he has very little control over HBPs. [I’m a longtime HBP advocate for players as long as they are properly taught how to protect themselves (maybe a future teaching post). Yes, he controls where he stands in the box and how he reacts to an inside pitch, but in general, HBPs are controlled by the pitcher.] Even singles, doubles, and triples possess a large defensive bias.

Therefore, let’s remove luck, chance, and defensive alignment/skill and evaluate a hitter’s performance. To do so, we’ll take a look at what exit velocity and launch angle combinations will lead to expected outcomes. I’ve broken each of the two factors into 40 “bins.” Specifically, I broke exit velocity into 4 mph increments and launch angle into 3 degree increments. Therefore, the 40×40 matrix provides 1,600 bins (many are empty, shown in grey). Within each bin, a certain number of batted balls were tracked and marked with a specific outcome (e.g. out, single, HR, etc.). I then averaged the wOBA within each 4 mph x 3 degree bin. The heat map below displays the outcome of that analysis. Red & yellow denote high wOBA (HRs & XBH’s); teal/green depict medium wOBA (think singles); blue represents poor wOBA (outs). Optimal hitting occurs approximately between 18-40 degrees in excess of 95 mph.

HITfx_heat_map

Due to the limited data set (< 5,000 AB’s), the resolution is grainy, at best. I’m imagining Sportvision’s database by now would provide quite an elegant heat map. Given the opportunity to analyze realtime HITf/x data, I would measure a hitter’s performance using Tango’s wOBA as a foundation. However, in lieu of assigning the linear weight associated with the outcome of a hitter’s AB, let’s analyze the exit velocity and launch angle, then use a LUT (look up table, with significantly higher resolution) to find the average linear weight associated with all balls hit within that bin. Accounting for all hits, outs, and ROEs then averaging will enable us to substantially reduce performance increase/suppression due to factors external to the hitter. This allows us to utilize the large data set of the broad population to analyze an individual hitter’s performance. Maybe we call it pOBA (Performance On-Base Average).
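
A rough sketch of that look-up approach follows. The linear weights here are illustrative round numbers rather than published wOBA weights, the bin origins (0 mph and -70 degrees) are assumptions, and every function name is hypothetical. The sketch covers only the batted-ball portion of a pOBA-style figure.

```python
from collections import defaultdict

# Illustrative linear weights only; published wOBA weights differ and vary by season.
LINEAR_WEIGHT = {"out": 0.0, "ROE": 0.9, "1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}

def make_binner(ev_step=4.0, la_step=3.0, ev_min=0.0, la_min=-70.0):
    """Return a function mapping (exit_velo, launch_angle) to a bin key."""
    def binner(exit_velo, launch_angle):
        return (int((exit_velo - ev_min) // ev_step),
                int((launch_angle - la_min) // la_step))
    return binner

def build_weight_lut(league_balls, binner):
    """league_balls: iterable of (exit_velo, launch_angle, outcome).
    Returns {bin: mean linear weight of every ball hit into that bin}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for exit_velo, launch_angle, outcome in league_balls:
        b = binner(exit_velo, launch_angle)
        sums[b] += LINEAR_WEIGHT[outcome]
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in counts}

def performance_value(hitter_balls, lut, binner, fallback=0.0):
    """Mean looked-up weight over a hitter's batted balls; his own outcomes are ignored."""
    values = [lut.get(binner(ev, la), fallback) for ev, la in hitter_balls]
    return sum(values) / len(values) if values else 0.0

# Made-up league sample and a made-up hitter with two batted balls.
binner = make_binner()
league = [(100.0, 25.0, "HR"), (101.0, 25.5, "out"),
          (60.0, -10.0, "out"), (61.0, -10.0, "1B"), (62.0, -9.0, "out")]
lut = build_weight_lut(league, binner)
hitter_value = performance_value([(100.5, 25.0), (60.5, -10.0)], lut, binner)
```

Note that the hitter’s own results never enter `performance_value`; only where his batted balls land in the exit velocity/launch angle grid matters, which is the point of measuring performance rather than production.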

We could then make an adjustment for baserunning speed/ability using FIELDf/x. For balls hit to the outfield, we could then easily perform park adjustments, if desired. After all, in the infield, home to 1st is 88′-9″ regardless of the stadium in which one plays. Pair our newly created context-independent pOBA statistic with my longtime favorite RE24, and you’ve got an incredible analysis for MLB hitters.

Game Theory: Down Angle

Let’s talk managerial strategy. Imagine runners on 2nd & 3rd with 1 out and the defense is “playing in.” As a baserunner at 3rd, how are you instructed to react on a groundball? Most likely, you’re taught to “see it through the infield.” But should you? Let’s evaluate the decision using some numbers.

Before we begin, we need to understand a concept called “run expectancy” (RE). Given any base/out situation, the RE is the average number of runs scored from that moment until the end of that inning. This can be found empirically (from historical data) or through simulations (using Markov chains). Tom Tango popularized this idea using MLB data more than a decade ago. We’ve followed his lead, using the empirical approach with NCAA data since the adoption of BBCOR bat restrictions.
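
For readers who want to see the empirical approach in miniature, the sketch below averages the runs scored from each base/out state to the end of the inning. The function name and the handful of observations are made up for illustration; a real table would be built from full play-by-play data.

```python
from collections import defaultdict

def run_expectancy(observations):
    """observations: iterable of (bases, outs, runs_to_end_of_inning).
    Returns {(bases, outs): average runs scored from that state onward}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for bases, outs, runs in observations:
        totals[(bases, outs)] += runs
        counts[(bases, outs)] += 1
    return {state: totals[state] / counts[state] for state in counts}

# Made-up observations; "_23" means runners on 2nd & 3rd, "___" means bases empty.
observed = [("_23", 1, 2), ("_23", 1, 0), ("_23", 1, 3),
            ("___", 0, 0), ("___", 0, 1)]
re_table = run_expectancy(observed)
re_23_one_out = re_table[("_23", 1)]   # average of 2, 0, and 3 runs
```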

Back to our situation: 2nd & 3rd with 1 out. The NCAA RE in this situation is 1.562 runs, but that’s not so important. What we want to evaluate is the RE after this play is complete. Depending on the baserunning decision our final RE may be different. To keep the complexity of this evaluation to a minimum, we need to establish a few assumptions.

Assumption 1. Only one error may be made on a play, and any error made does not advance baserunners any further than the base they already occupy. (Think throwing error caught by 1B that pulls him off the bag.)

Assumption 2. The 3rd base coach has clairvoyance in deciding the fate of the runner at 2nd (and 2nd only). In other words, on a single, he’s always perfect in his decision on whether or not to send the runner home when rounding 3rd. If he’ll be out, he holds him at 3rd. If he’ll score, he waves him home.

Assumption 3. The hitter will never end up past 1st base. Clearly there are several scenarios in which this could occur; however, to limit complexity, we’ll stick with this as our final assumption.

Now we can move to our decision. As described earlier, we have two options. First, we have the traditional “see it through the infield” choice. Second, we have our “down angle” philosophy. Each decision has its related outcomes.

Option 1. Runners freeze until the ball is through the infield, then advance. Given a routine groundball, this option has four basic possible outcomes A – D.

A – Hitter grounds out. Runners remain at 2nd & 3rd. 0 runs scored. 2 outs.

B – Hitter grounds to a fielder who makes an error. Runners remain at 2nd & 3rd, and the hitter is safe at 1st. 0 runs scored. 1 out.

C – Hitter singles through the infield. Because they froze to see the ball through, runners move up one base each. Hitter is safe at 1st. Baserunners now at 1st & 3rd. 1 run scored. 1 out.

D – Hitter singles through the infield. Although they froze, both runners ended up scoring. Hitter is safe at 1st. 2 runs scored. 1 out.

Option 2. Runners move on down angle contact. That is, they advance upon a downward launch angle regardless of the spray angle. Given a routine groundball, this option has five potential outcomes E – J.

E – Hitter grounds into a FC as a fielder throws home to tag the runner out. Runners now at 1st & 3rd. 0 runs scored. 2 outs.

F – Hitter grounds to a fielder who makes an error. Runners now at 1st & 3rd. 1 run scored. 1 out.

G – Hitter singles through the infield. Although they were moving on contact, only one runner scored. Hitter is safe at 1st. Baserunners now at 1st & 3rd. 1 run scored. 1 out.

H – Hitter singles through the infield. Because they were moving on contact, both runners were able to score. Hitter is safe at 1st. 2 runs scored. 1 out.

J – Hitter grounds to a fielder who is unable to make the play at home, but can make the play at 1st. Runner from 2nd advances to 3rd. 1 run scored. 2 outs.

Assigning Probabilities:

Now that we’ve identified the two options and their corresponding potential outcomes, it’s time to apply some probabilities of occurrence for each specific outcome. Then, we can complete the decisional analysis. Our justification:

1. The average NCAA fielding percentage on all plays is 0.965 (thus, the error in decision #1 has a 3.5% chance of occurring). However, the play at home is unique and rare. Therefore, we’ll assume that an error here occurs twice as often, giving a 7.0% chance of occurrence. (The runner has a lead, is off on contact, and it’s a tag play at home: many more variables than a standard 6-3 putout.)

2. Given the ball is hit on the ground with the infield in, the ball will get through for a single 40% (estimated) of the time.

3. Of all singles with 1 out, a runner on 2nd will score on 47.6% of them when a runner reacts normally (doesn’t freeze to see the ball through the infield). When freezing, the runner on 2nd will score approximately 25% of the time.

4. 10% of the groundballs will result in a fielder being unable to make a play at home, but will be able to get the out at 1st (think diving stop, but not enough time to throw home).

Analyzing the traditional “see it through the infield” decision:

Decision  Outcome  Bases Occupied  Runs Scored  Outs  RE of End State  Runs + RE  Probability of Occurrence  e-Value
1         A        _23             0            2     0.684            0.684      0.665                      0.45
1         B        123             0            1     1.792            1.792      0.035                      0.06
1         C        1_3             1            1     1.357            2.357      0.300                      0.71
1         D        1__             2            1     0.630            2.630      0.100                      0.26
Total                                                                                                        1.49

And now the “down angle” method:

Decision  Outcome  Bases Occupied  Runs Scored  Outs  RE of End State  Runs + RE  Probability of Occurrence  e-Value
2         E        1_3             0            2     0.611            0.611      0.430                      0.26
2         F        1_3             1            1     1.357            2.357      0.070                      0.16
2         G        1_3             1            1     1.357            2.357      0.210                      0.49
2         H        1__             2            1     0.630            2.630      0.190                      0.50
2         J        __3             1            2     0.449            1.449      0.100                      0.14
Total                                                                                                        1.57

With the runs, outs, RE’s, and probabilities identified, we can compute an expected value for each outcome: simply (Runs + RE) * probability. Summing those values gives the final e-value for each decision. Without getting too technical, the final e-value is our new RE the moment the ball leaves the hitter’s bat on a negative launch angle. As we find a value for each decision, we want to continually choose the higher (assuming risk neutrality) to generate more runs over time. Thus, long term, the down angle philosophy will increase RE by 0.08 runs (1.57 versus 1.49). Next week, we’ll provide a sensitivity analysis to determine how dependent our findings are on the various assigned probabilities.
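
The arithmetic behind the two totals can be reproduced with a few lines. This is only a sketch: the numbers are copied from the two tables as printed, while the tuple layout and function names are hypothetical.

```python
# Each row: (outcome, runs scored, RE of end state, probability), as printed in the tables.
OPTION_1 = [
    ("A", 0, 0.684, 0.665),
    ("B", 0, 1.792, 0.035),
    ("C", 1, 1.357, 0.300),
    ("D", 2, 0.630, 0.100),
]
OPTION_2 = [
    ("E", 0, 0.611, 0.430),
    ("F", 1, 1.357, 0.070),
    ("G", 1, 1.357, 0.210),
    ("H", 2, 0.630, 0.190),
    ("J", 1, 0.449, 0.100),
]

def expected_value(outcomes):
    """Sum of (Runs + RE) * probability over every outcome of one decision."""
    return sum((runs + re) * p for _, runs, re, p in outcomes)

def total_probability(outcomes):
    """Sum of the probability column; ideally 1.0 for a complete set of outcomes."""
    return sum(p for *_, p in outcomes)

see_it_through = expected_value(OPTION_1)
down_angle = expected_value(OPTION_2)
gain = down_angle - see_it_through
```

Summing the probability column is a cheap sanity check. With the values as printed, Option 2 sums to 1.00 while Option 1 sums to 1.10, so the size of the gap between the two decisions depends on how that first column is normalized, which is worth revisiting in the sensitivity analysis.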

Remember, you can’t evaluate a decision based on its outcome. Separate the two. In summary, go down angle… and go home.

RBI Hitting Approach: Should it Change?

Having had a lengthy discussion with the three other coaches on our collegiate staff regarding a hitter’s approach with RISP, I thought I’d make an attempt to analyze how professional hitters might alter their approach in RBI situations. Specifically, we’re focusing on runners at 3rd or 2nd/3rd with less than 2 outs. We have eliminated all situations in which a runner resides on 1st since a coach/manager may elect to employ a strategic play (e.g. a hit & run). We have also eliminated all bunt attempts from this study. We truly want to evaluate the approach a hitter may take without external factors such as managerial strategy.

With a proven, established approach already in place in our college program, a new perspective provided the suggestion to be more aggressive and “eat the RBI early.” That is, a hitter in an RBI situation ought to look for something he can hit early in the count. This philosophy to “look for something you can hit versus drive” may allow one to avoid more strikeouts, but will it lead to success?

First, let’s establish the “standard” approach. Seems to me, the simplest situation is 0 on, 0 out. A hitter’s goal is to simply get on base. With 500,011 MLB PA’s from 2002-2012, this is clearly enough data to establish our standard. (Other situations such as 0 on, 1 out could be considered; however, certain nuances of the game could affect the P/PA. For example, a 1-pitch AB by the leadoff hitter may force the second hitter of the inning to take a pitch to avoid the quick inning.) Nevertheless, we find 3.8175 P/PA with 0 on and 0 outs. In case you’re curious, for hitters with at least 20 PA’s, I’ve plotted the leadoff hitting approach below.

ppa_0on_0out

Now we’ll compare to our RBI situations. See the table below for the base states and the corresponding P/PA for the population as a whole.

Outs  Bases Occupied  P/PA
< 2   3rd             3.8175
< 2   2nd & 3rd       3.8255

Such small differences in P/PA suggest no meaningful difference from the P/PA in our standard situation. However, to answer the question properly, we need to state a hypothesis (that the distributions share the same mean) and test its statistical significance. To do so, we could do old school calculations using Student’s t-test, or we can use modern day software to simply tell us that these two distributions are statistically indistinguishable (that is, at the 95% confidence level, we cannot reject the hypothesis that they share the same mean). However, let’s take a look at how individual hitters (with at least 30 PA’s) alter their approach from leadoff to RBI situations.

ppa_diff

The plot above shows how some individual hitters will, in fact, alter their approach in RBI situations.  Keep in mind that pitchers are taught to pitch less to contact with RISP.  That is, they will attempt to execute pitches with more selectivity.  Typically, they are taught one (or more) of three philosophies in potential RBI situations:

  1. work the “lower U” of the strike zone
  2. attempt to achieve pitch height AND width (i.e. hit the corners of the zone)
  3. go to offspeed, typically down, and especially if it’s a swing and miss pitch (resulting in more pitches out of the zone)

A quick look at PITCHf/x data tells us that a pitcher will throw pitches in the strike zone approximately 52% of the time to a leadoff hitter and just 45% of the time in our noted RBI situations. All else equal, fewer pitches in the zone should push the hitter’s P/PA upward in these situations. Next week, we’ll take a look at the RBI success rate as a function of the altered approach. Stay tuned…

Baserunning 101: Home to 1st is not 90′

Again, home to 1st is not 90′.

It’s 88′-9″.

20130417-124443.jpg

When baseball was created, the 90 feet we all know was defined as the distance to the baselines. However, 1st (and 3rd) base are situated on/inside the 90 degree angle that the baselines create. Now, you may be asking yourself, “who cares?” In reality, nobody should really care about the actual distance from home to 1st (it really is 88′-9″ from the back corner of home plate to the FRONT of 1st base… 90′ baseline minus a 15″ base).

The key here is the word “front.” When telling yourself, or teaching another, to run to first base, encourage him/her to hit the front part of the bag. How so? Well, this is something that can’t quite be explicitly taught. Even sabermetrics won’t help us here (yet). What you can do, however, is get out of the box and run hard. Then, with three strides remaining, make a small adjustment to slightly lengthen or shorten your strides to ensure you hit the front (not middle, not back) part of the bag. This adjustment requires athleticism and that “feel” for the game we all look for.

Too many times (from little league to the pros), we’ve witnessed players hit the back half of the base rather than the front. Sometimes, even, they won’t hit it at all.

20130417-125936.jpg
Picture courtesy of the NY Times

Focus on the little things to improve your game.

The Creation of ePPA: Estimating P/PA Given Count Information

With the wealth of information available at the MLB level, P/PA (pitches per plate appearance) data can be easily queried through your favorite database. (I currently use a combination of MySQL, MATLAB, and R for all of our analytics here at Diamond Charts.) However, at the collegiate level, detailed P/PA data must be derived from the pitch count. Sounds simple, but don’t forget that two-strike foul balls aren’t accounted for in a standard play-by-play. Therefore, in order to provide a slightly more accurate P/PA estimation, we’ve created ePPA (estimated P/PA) to account for those foul balls that occur with two strikes. Methodology is below.

Assumption: NCAA hitters foul off as many 2-strike pitches as MLB hitters. (Later we can turn this assumption into a hypothesis and test its validity.)

Utilizing MLB PBP data from 2002-2012, we find 1.003M PA’s with two strikes prior to the action pitch. During these PA’s we find an actual P/PA of 5.11 while observing a count-based P/PA (simply, balls + strikes + 1) of 4.62. The plot below displays the distribution of total strikes encompassing all 1.003M 2-strike MLB PA’s over the past decade. As we can see, approximately 70% of PA’s with 2 strikes have 2 total strikes (that is, they have 0 fouls on 2-strike counts). However, an adjustment clearly remains necessary for the other 30%. In case you’re curious, the greatest number of total strikes seen in one PA in the past decade occurred in 2004 in Los Angeles, as Alex Cora of the Dodgers homered off Matt Clement of the Chicago Cubs in an 18-pitch at-bat.

2strikes_hist

One step further:
To bring ePPA closer to the actual P/PA, we have broken these numbers down into the four possible 2-strike counts. The table below displays the actual P/PA alongside the count-based P/PA (cPPA = balls + strikes + 1) and the difference between them.

Balls  Actual P/PA  cPPA  diff
0      3.243        3     0.243
1      4.370        4     0.370
2      5.533        5     0.533
3      6.747        6     0.747

The table makes sense: the more balls/pitches a hitter sees, the more opportunities he has to foul off a 2-strike pitch. Thus, for every 2-strike PA, we add the differential factor based upon the number of balls in the count, providing a slightly more accurate estimate of P/PA (ePPA = cPPA + diff). Because P/PA is a cumulative statistic (one that should be analyzed over a large data set), these simple approximations will suffice.
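
The adjustment can be sketched in a few lines. The differentials come straight from the table above; the function names and the sample counts are hypothetical, and each count is taken to be the count immediately before the action pitch.

```python
# Mean extra pitches from 2-strike fouls, keyed by balls in the count (from the table).
FOUL_ADJUSTMENT = {0: 0.243, 1: 0.370, 2: 0.533, 3: 0.747}

def count_based_ppa(balls, strikes):
    """cPPA = balls + strikes + 1 (the count before the action pitch, plus that pitch)."""
    return balls + strikes + 1

def estimated_ppa(balls, strikes):
    """ePPA = cPPA + diff for 2-strike PA's; other PA's need no foul adjustment."""
    cppa = count_based_ppa(balls, strikes)
    if strikes == 2:
        return cppa + FOUL_ADJUSTMENT[balls]
    return float(cppa)

def mean_eppa(final_counts):
    """Average ePPA over a list of (balls, strikes) counts, one per plate appearance."""
    return sum(estimated_ppa(b, s) for b, s in final_counts) / len(final_counts)

# Hypothetical hitter: four PA's ending on 0-2, 3-2, 1-1, and 2-0 counts.
hitter_eppa = mean_eppa([(0, 2), (3, 2), (1, 1), (2, 0)])
```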

Effect:
2-strike counts make up approximately half of all plate appearances, thus providing an overall addition of approximately 0.24 P/PA for each hitter (a bit more for more patient hitters, a bit less for the aggressive). Thinking differently, on the order of 10 foul balls/game (per team) occur with two strikes.

Hello to the Sabermetric Community

Hello to all baseball fans, coaches, players, front office personnel, and sabermetricians. Having launched Diamond Charts just a few months ago, we are at the infant stages of our dive into the sabermetric world. We’ve created this sabrlog as a fun outlet to document our endeavor. We encourage discussion and hope you won’t hesitate to point out flaws, whether in methodology or even simple oversights.

As former collegiate players and a current coach, we enjoy teaching the game. Thus, this sabrlog may also include a sprinkling of (hopefully) interesting aspects related to the fundamentals of baseball. Our current focus is on the collegiate game, so many of our articles will address some aspect of it. However, we’ll often use MLB data to “learn from the pros” and make inferences to improve the college game. We’ll strive to provide high quality articles (in lieu of high quantity) that are both on point and deep enough to provide an accurate analysis. We hope you enjoy.

All the best,
Diamond Charts

[Twitter @DiamondCharts]