Using "Moneyball" statistics to analyze Big West Conference baseball

Eric Stubben is a mechanical engineering senior and the Mustang News sports editor. Jacob Phillips is a mechanical engineering senior. Together, they applied sabermetric analysis to Big West Conference baseball.

“But can he get on base?”

The 2011 movie Moneyball brought the advanced baseball topics of sabermetrics and statistical analysis into the limelight. Highlighted by Oakland Athletics general manager Billy Beane’s obsession with players who could get on base, the movie portrayed the unheralded rise of Beane’s team from lovable losers to American League West champions.

Despite sabermetrics’ popularity in professional baseball, most information on college teams is confined to traditional statistics. We set out to analyze the offensive side of Big West Conference baseball using more advanced principles.

Last season, the Cal Poly baseball team had six players drafted — including four position players. We ran those players through our metrics to determine how good each player really was.

The statistic, “Season WAR,” is explained later in the article, but it’s basically a measure of how good a player really is. As the numbers show, second baseman Mark Mathias, who was Cal Poly’s best player last season, earned his third-round draft pick.

So why did we go through all the trouble just to analyze how last year’s rather forgettable season turned out?

All of this is a trial run and justification for where we want to go next. We want to provide weekly updates of the Big West Conference’s best teams and offensive players so that we can project who will earn end-of-season accolades. We want to determine which Big West players will be drafted, and where those players will be drafted relative to each other. We want the Big West Conference baseball conversation to extend beyond scouts and distant analysts and into the realm of code, equations and projections.

Why sabermetrics?

Sabermetrics analyze more than just the basics. While basic baseball stats, including batting average, RBI and hits, are simple, eye-catching statistics, sabermetrics look at the whole picture. Sure, certain players hit dozens of home runs in a season or knock in several runs, but sabermetrics undermine the logic that flashy players who jack home runs are necessarily more important than scrawny guys who wreak havoc on the base paths.

In essence, sabermetrics take a slew of data and compare each player against the league average, or “replacement level” player. In this project, the league average is based on the Big West Conference.

The nice thing about sabermetric calculations is that they all boil down to one statistic: Wins Above Replacement (WAR).

WAR is a number that measures how good a player is compared to the league average and how many theoretical wins a player adds to his team each season.

For example, a player whose talent sits exactly at replacement level has a WAR of 0.0. A great player like Mike Trout, who finished second in last year’s American League Most Valuable Player voting, maintains an extremely high WAR. According to Baseball Reference, Trout’s WAR last season was 9.4.

How did last season look?

Looking at each team’s total WAR (the sum of each player’s WAR), we can see which teams performed the best offensively in 2015.
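The team total described above is just a grouped sum. A minimal Python sketch, assuming each player is stored as a (team, player, WAR) row; the team names, player names and WAR values below are hypothetical placeholders, not results from our analysis:

```python
from collections import defaultdict


def team_war_totals(player_rows):
    """Sum individual WAR values by team.

    player_rows is an iterable of (team, player, war) tuples.
    """
    totals = defaultdict(float)
    for team, _player, war in player_rows:
        totals[team] += war
    return dict(totals)


# Hypothetical rows for illustration only.
rows = [
    ("Team A", "Player 1", 2.0),
    ("Team A", "Player 2", 0.5),
    ("Team B", "Player 3", 1.25),
]
totals = team_war_totals(rows)

# Rank teams from highest to lowest total WAR.
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)
```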

Based on our analysis, four of the top five offensive teams in 2015 finished the year as the top four teams in the conference: Cal State Fullerton, UC Santa Barbara, UC Irvine and Cal Poly. The lone exception was sabermetric offensive leader, UC Davis.

Looking at traditional stats, it quickly becomes obvious that the Aggies were near the top of the league in nearly every offensive statistic. However, they failed miserably when it came to pitching, which accounts for their 9-15 Big West Conference record and seventh-place finish.

Individually, our algorithm pans out nicely, too. The best players from our analysis stack up relatively well against last season’s all-conference teams.

How does our algorithm work?

Disclaimer: If you are a person who has a general hatred of math and trusts us enough to accept our results, skip over this section for the sake of your mental health.

There is no set algorithm that defines WAR. The two most common websites that calculate sabermetric values, Fangraphs and Baseball Reference, often have slight disagreements on players’ worth.

Luckily, we found enough information on Fangraphs’ website to meander our way through our own sabermetric calculations, beginning with batting runs.

Batting Runs

The first value we needed to calculate was “weighted on-base average” (wOBA). The statistic roughly resembles on-base percentage, but factors in sacrifice flies (SF) and separates intentional and unintentional walks (IBB and uBB). Other variables in the calculation pictured below include singles (1B), doubles (2B), triples (3B), home runs (HR), total walks (BB) and at bats (AB).
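For readers who prefer code to equations, here is a rough Python sketch of wOBA in the general form Fangraphs describes. The linear weights are illustrative placeholders, since Fangraphs publishes different values each season and the weights fitted to the Big West are not listed in this article. The hit-by-pitch term is part of the Fangraphs form but is not among the variables named above, so it defaults to zero. The function name, parameter names and sample stat line are all hypothetical.

```python
# Illustrative linear weights only (assumption): real values are season- and
# league-specific and are not taken from this article.
ILLUSTRATIVE_WEIGHTS = {
    "uBB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.25, "3B": 1.60, "HR": 2.05,
}


def woba(singles, doubles, triples, home_runs, walks, intentional_walks,
         at_bats, sac_flies, hit_by_pitch=0, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted on-base average in the general Fangraphs form."""
    unintentional_walks = walks - intentional_walks  # uBB = BB - IBB
    numerator = (weights["uBB"] * unintentional_walks
                 + weights["HBP"] * hit_by_pitch
                 + weights["1B"] * singles
                 + weights["2B"] * doubles
                 + weights["3B"] * triples
                 + weights["HR"] * home_runs)
    denominator = at_bats + walks - intentional_walks + sac_flies + hit_by_pitch
    if denominator == 0:
        return 0.0  # no qualifying plate appearances
    return numerator / denominator


# Hypothetical stat line, not a real player.
example = woba(singles=40, doubles=12, triples=2, home_runs=5,
               walks=20, intentional_walks=2, at_bats=200, sac_flies=3)
print(round(example, 3))
```

Passing the weights in as a dictionary makes it easy to swap in values fitted to a different league or season.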

Calculating wOBA allowed for the next, more important step: weighted runs above average (wRAA), based on the calculation shown below.

The wRAA statistic is simply a way to weigh wOBA for each player against the league’s wOBA based on each player’s number of plate appearances (PA) and is the statistic used to define “batting runs” when calculating a player’s WAR. Basically, wRAA helps weigh a player against the league and helps get us closer to a WAR value. In our case, we defined the “wOBA scale” value to be around 1.0, very similar to the value used in Major League Baseball (MLB) calculations.
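As a hedged illustration, the relationship described above can be written in a few lines of Python. This follows the general Fangraphs form, wRAA = ((wOBA - league wOBA) / wOBA scale) x PA, with the wOBA scale defaulting to the value of about 1.0 mentioned above. The function name and the sample numbers are hypothetical.

```python
def wraa(player_woba, league_woba, plate_appearances, woba_scale=1.0):
    """Weighted runs above average: batting runs relative to the league.

    woba_scale defaults to 1.0, the approximate value quoted in the article.
    """
    if woba_scale == 0:
        raise ValueError("woba_scale must be non-zero")
    return (player_woba - league_woba) / woba_scale * plate_appearances


# Hypothetical hitter: .380 wOBA in a .330 league over 250 plate appearances.
batting_runs = wraa(0.380, 0.330, 250)
print(round(batting_runs, 1))
```

A hitter exactly at the league wOBA comes out at zero batting runs no matter how many plate appearances he has, which is the sense in which wRAA weighs a player against the league.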

Baserunning Runs

The main variable used to measure baserunning effectiveness is “weighted stolen bases,” or wSB. The metric is calculated as follows:

In the previous equation, “runSB” and “runCS” are the number of runs created by a stolen base (SB) and the number of runs taken away by a caught stealing (CS), respectively. The caught stealing equation is shown below, where Fangraphs estimates the runs per out value to be around 0.151. The runSB statistic is calculated by the same equation, just with a positive value.

As noted in the wSB equation, the league-wide averages for caught stealing and stolen bases must also be calculated. We used the following equation, as presented by Fangraphs.
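The baserunning pieces above can be sketched in Python as follows. This is an approximation under stated assumptions: the caught-stealing value uses the Fangraphs form of 2 x (runs per out) + 0.075 with a negative sign, the stolen-base value mirrors it with a positive sign as described above, and "opportunities" is taken to be singles plus walks plus hit-by-pitches minus intentional walks, per the Fangraphs definition. All function names and sample totals are hypothetical.

```python
def run_cs(runs_per_out=0.151):
    """Run cost of a caught stealing (negative), in the Fangraphs form."""
    return -(2 * runs_per_out + 0.075)


def league_wsb_rate(lg_sb, lg_cs, lg_opportunities, run_sb, run_cs_value):
    """League stolen-base runs per opportunity (lgwSB)."""
    return (lg_sb * run_sb + lg_cs * run_cs_value) / lg_opportunities


def wsb(sb, cs, opportunities, lg_rate, run_sb, run_cs_value):
    """Weighted stolen-base runs relative to a league-average runner."""
    return sb * run_sb + cs * run_cs_value - lg_rate * opportunities


cs_value = run_cs()    # roughly -0.377 with runs per out of 0.151
sb_value = -cs_value   # same magnitude, positive sign, as the article describes

# Hypothetical league and player totals.
lg_rate = league_wsb_rate(lg_sb=400, lg_cs=150, lg_opportunities=5000,
                          run_sb=sb_value, run_cs_value=cs_value)
player_wsb = wsb(sb=20, cs=5, opportunities=70, lg_rate=lg_rate,
                 run_sb=sb_value, run_cs_value=cs_value)
print(round(player_wsb, 2))
```

The run values are passed in as parameters so that other estimates can be substituted without changing the functions.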

Positional Adjustment

Of course, some positions are easier to play than others. A catcher exerts more energy on defense than a first baseman does. Likewise, a shortstop uses much more energy than a left fielder. In theory, a player who uses less energy on defense should be a better hitter than a player who exerts more. Again based on Fangraphs’ calculations, positional adjustments are listed in the following table.

Of course, all of Fangraphs’ data is based on a full MLB season. To compensate, we simply scaled our positional adjustment values relative to a 162-game MLB season.
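One plausible way to do that scaling is sketched below in Python. The per-position values are assumptions based on commonly cited Fangraphs figures in runs per 162 games, not a reproduction of the table from this article. Prorating by games played is a simplification (Fangraphs itself prorates by innings), and the function name and sample inputs are hypothetical.

```python
# Assumed runs per 162-game season, based on commonly cited Fangraphs values.
POSITIONAL_RUNS_PER_162 = {
    "C": 12.5, "1B": -12.5, "2B": 2.5, "3B": 2.5, "SS": 7.5,
    "LF": -7.5, "CF": 2.5, "RF": -7.5, "DH": -17.5,
}


def positional_adjustment(position, games_played, full_season_games=162):
    """Prorate a full-season positional adjustment to the games actually played."""
    return POSITIONAL_RUNS_PER_162[position] * games_played / full_season_games


# Hypothetical shortstop and first baseman who each played 54 games.
shortstop_runs = positional_adjustment("SS", 54)
first_base_runs = positional_adjustment("1B", 54)
print(shortstop_runs, round(first_base_runs, 2))
```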

Replacement Level Runs

In order to determine how good a player is, we have to define what a replacement level player is. Relative to the league, we have to determine how many runs a replacement level player must create through batting, baserunning and positional adjustment. The equation we used to determine a replacement level player’s runs is given below.

In this case, the “PA” statistic refers to the total number of plate appearances per player. Essentially, this means that players with more plate appearances are more valuable than players with fewer plate appearances.
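A rough Python sketch of this step follows, assuming the general Fangraphs form in which a league-wide pool of replacement wins is converted to runs and split among players by plate appearances. In the MLB version, to our understanding, that pool is 570 wins over a 2,430-game season. Scaling the pool by a smaller league's game count is an assumption made here for illustration, and the function names and sample inputs are hypothetical.

```python
def league_replacement_wins(league_games, mlb_wins=570.0, mlb_games=2430.0):
    """Scale the MLB position-player replacement pool to a league's game count.

    The 570-wins-per-2,430-games ratio is the MLB figure; applying it to a
    college league is an assumption for illustration.
    """
    return mlb_wins * league_games / mlb_games


def replacement_runs(plate_appearances, league_plate_appearances,
                     runs_per_win, replacement_wins):
    """Replacement-level runs credited to one player, proportional to his PA."""
    if league_plate_appearances <= 0:
        raise ValueError("league_plate_appearances must be positive")
    return (replacement_wins * (runs_per_win / league_plate_appearances)
            * plate_appearances)


# Hypothetical inputs: 70 replacement wins in the league, 20,000 league PA.
player_replacement = replacement_runs(plate_appearances=200,
                                      league_plate_appearances=20000,
                                      runs_per_win=7.5,
                                      replacement_wins=70.0)
print(round(player_replacement, 2))
```

Because the result is linear in plate appearances, a player with twice the plate appearances is credited with twice the replacement-level runs.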

We also had to assume a value for “runs per win.” In fact, the creation of this value involved our most lively discussion. In any given year, the average “runs per win” for an MLB team hovers between nine and 10 runs. However, that value includes both “offensive runs” and “defensive runs.”

According to Fangraphs, offensive runs account for only 57 percent of a team’s total runs. The other 43 percent come from defense and pitching. At roughly 10 total runs per win, that means a team must generate roughly 5.7 offensive runs per win. Since our calculations deal only with offense, this is the value we care about. Because college games produce more runs than MLB games, we settled on a “runs per win” value of 7.5.
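The arithmetic behind the 5.7 figure is small enough to check directly. The short Python sketch below applies the 57 percent offensive share to the nine-to-10 range quoted above. The function name is hypothetical, and the college value of 7.5 remains a judgment call rather than something this calculation produces.

```python
def offensive_runs_per_win(total_runs_per_win, offensive_share=0.57):
    """Portion of the runs-per-win figure attributed to offense."""
    return total_runs_per_win * offensive_share


low = offensive_runs_per_win(9.0)    # lower end of the MLB range
high = offensive_runs_per_win(10.0)  # upper end of the MLB range
print(round(low, 2), round(high, 2))

# The college value used in this analysis is chosen, not derived.
COLLEGE_RUNS_PER_WIN = 7.5
```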

Total WAR

If you’re like most readers and began skimming the article after the first equation, you should start reading again here. The past sections all lead up to our total WAR calculation for each player. The equation we used is shown below.
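In code form, the final step can be sketched as a sum of the four run components divided by runs per win. This Python version assumes the offense-only structure described in this article: no fielding term, and no league adjustment term of the kind Fangraphs includes. The function name and the component values are hypothetical.

```python
def offensive_war(batting_runs, baserunning_runs, positional_runs,
                  replacement_runs, runs_per_win=7.5):
    """Offense-only WAR: total runs above replacement converted to wins."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    total_runs = (batting_runs + baserunning_runs
                  + positional_runs + replacement_runs)
    return total_runs / runs_per_win


# Hypothetical component values for one player.
war = offensive_war(batting_runs=12.5, baserunning_runs=1.5,
                    positional_runs=2.5, replacement_runs=6.0)
print(war)
```

The default of 7.5 runs per win is the value settled on earlier in the article; passing a different value shows how sensitive a player's WAR is to that assumption.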

More about our model

We’re students, not professionals. We don’t have all the resources we need to make the most accurate model possible, so we had to make some assumptions.

First of all, we had to ignore advanced baserunning metrics. Some professional organizations use camera tracking systems to track player efficiency on the base paths. For the same reason, we had to omit defensive and pitching WAR. Too many defensive and pitching WAR metrics rely on camera tracking for us to include them.

Other assumptions for constant values were previously listed in this article.

What’s Next?

Not defensive WAR. Not pitching WAR. Don’t get too excited – as previously stated, we just don’t have the resources to do so.

As mentioned at the top, we do, however, plan to keep weekly, updated sabermetric statistics and projections throughout the 2016 season. The projections should help us determine who the best players and teams in the Big West Conference are offensively.
