Artwork by @SportsGuyGio
Economics 400 research paper written by a senior at Salem State University, testing the Bill James “Moneyball” theory against the 2018 MLB season.
This paper applies the Bill James “Moneyball” theory to basic team-level statistics for each of the 30 organizations in Major League Baseball in 2018. The theory holds that modern sabermetrics can let franchises such as the 2002 Oakland Athletics compete with larger-market clubs such as the New York Yankees, despite the multi-million dollar payroll gap between them. Using team statistics from the 2018 MLB season drawn from sources such as ESPN, MLB, and Baseball Reference, the paper finds that James’ idea of giving sabermetrics a seat in the front office remains applicable and relevant in today’s game.
University of Kansas economics graduate Bill James is widely known as the father of sabermetrics, having transformed the way the sport of baseball is viewed today. James is best known for the ideas behind “Moneyball”: exploiting what the market undervalues and thereby giving low-payroll organizations a seat at the contenders’ table. In this research I look for relationships between specific team statistics and results in the 2018 Major League Baseball season, focusing on on-base percentage, the statistic James’ theory is built on, and its direct effect on wins in 2018.
To begin, I considered the factors that could affect an organization’s place in the win column by the end of the season. To identify what leads to a high win percentage in MLB, I compiled each team’s on-base percentage, salary, average runs allowed by starting pitchers, average runs allowed by the bullpen, home runs, and batting average. These six variables laid the foundation for my goal of identifying where a team’s success came from in 2018.
The Bill James approach to running a front office holds that a general manager should focus primarily on on-base percentage, not home runs, in order to build a high-performing team at an efficient cost. According to James, giving long-term, multi-million dollar contracts to players with big names and high home-run totals is not efficient, nor does it make sense for an aspiring contender with ambitions of winning a World Series title in October.
The best-known example of James’ contribution to the sabermetric shift in Major League Baseball is Billy Beane and the 2002 Oakland Athletics.
Beane, then the Athletics’ general manager, took a page out of the Bill James book of baseball and applied his thinking on on-base percentage. At the time, the approach was viewed as sacrilegious by many across the league and by baseball enthusiasts everywhere. Still, working with $40,004,167 in a league where big-market clubs have more than double that to spend on payroll is far from ideal for a would-be contender. Beane would ultimately be the man responsible for both debuting and bringing validity to the sabermetric approach of constructing a professional baseball team to tackle a 162-game season.
The 2002 Oakland Athletics approached their season in a way that was unfriendly to traditional baseball enthusiasts, who rely on home runs, RBIs, and batting average when evaluating talent and identifying a ball club’s main source of runs. Bill James, the father of modern-day sabermetrics, influenced the baseball world, most notably Athletics General Manager Billy Beane, with the ideas now known as “Moneyball”. According to James, the theory simply exploits the overvaluing of power and the undervaluing of on-base percentage, the statistic from which James believed runs were chiefly generated. Beane used James’ theory to build a 103-win Athletics team in 2002, matching the win total of the ’02 Yankees, who finished 103-58. However, dividing each payroll by 103 wins shows that Oakland ($39,722,689) spent roughly $385,700 per win, versus roughly $1.11 million per win for New York ($114,457,768).
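The cost-per-win comparison is simple division and can be checked directly. The sketch below uses only the payroll and win totals quoted in this section; the function name `cost_per_win` is an illustrative assumption and not part of the original analysis.

```python
def cost_per_win(payroll: float, wins: int) -> float:
    """Return payroll dollars spent per regular-season win."""
    if wins <= 0:
        raise ValueError("wins must be positive")
    return payroll / wins


# Payroll and win totals quoted in the text for the 2002 season.
athletics = cost_per_win(39_722_689, 103)
yankees = cost_per_win(114_457_768, 103)

print(f"Athletics: ${athletics:,.0f} per win")
print(f"Yankees:   ${yankees:,.0f} per win")
print(f"Ratio:     {yankees / athletics:.2f}x")
```

Dividing payroll by wins in this way gives about $385,700 per win for Oakland and about $1.11 million per win for New York, a ratio of roughly 2.9 to 1. Dividing by the 162 games on the schedule instead of by wins would give smaller per-game figures, so it matters which denominator is used.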
This new perspective on statistical analysis opened the gate for later successes such as Theo Epstein leading the Boston Red Sox to end their 86-year “curse” in 2004. James has since served as a consultant for the Red Sox and has reshaped the way both fans and front-office executives evaluate players, most notably those who might not otherwise have received the recognition and market value that they do today.
Moneyball has since paved the way for the prototypical “stat nerd”, who is now valued and welcomed in the front office of a big-league organization. Bill James, a former bean factory employee and current advisor to the Boston Red Sox, revolutionized the way those who do not contribute on the field can affect an organization from the front office.
Similar papers, such as Brown, D. T., Link, C. R., & Rubin, S. L. (2017), Moneyball after 10 years: How have major league baseball salaries adjusted?, Journal of Sports Economics, have also analyzed the aftermath of James’ introduction of sabermetrics to baseball, but from a free-agent market perspective. More specifically, they examine how James’ emphasis on a player’s on-base percentage has shifted the way teams evaluate the free-agent and trade markets by giving that statistic greater weight. The result is a higher market value for a once undervalued core of players whose on-base percentage did not earn them as much in the professional baseball market before the sabermetric era of Major League Baseball.
Using what the authors call the “pre-MB era” as a reference point, Brown, Link, and Rubin (2017) show that Major League Baseball and its players experienced a notable shift during this period. They do so through equations such as their “Equation 1”, or the “H&S model”, which uses on-base percentage, slugging percentage, and plate appearances to estimate the value of a player to his team. With these three statistics, the equation helps identify the source of the value a team relies on receiving from a particular player, breaking a scout’s analysis down to numbers.
Applying this equation, they further found that over the first three years of the post-MB era, the coefficient on on-base percentage rose sharply from the 2003 season to the 2004 season, from 1.43 to 4.11. It did not increase nearly as much during the following two seasons, but the values remained higher than that of the ’03 season.
Congdon-Hohman, J., & Lanning, J. A. (2018), Beyond moneyball: Changing compensation in MLB, Journal of Sports Economics, contribute related findings, taking a similar approach from a different perspective: they examine the value of specific statistics emphasized by professional organizations in various periods of Major League Baseball. Their approach separates on-base plus slugging into its components in order to isolate the value of slugging and of on-base percentage to a team. Their “Regression of Log Player Salary on Aggregate Statistics” model splits Major League Baseball history into six cohorts and reveals a significant increase in valuation within the 1980s to 1990s and 2000s cohorts of the model. With all that said, in my own research I seek a correlation between investment in on-base percentage and the on-field success of Major League Baseball organizations following the 2003 season.
Like the earlier work on Bill James and his contribution to the history of Major League Baseball, this paper references and applies James’ work to identify where MLB clubs’ spending matters most.
This paper applies the Moneyball theory to results from the 2018 MLB season in order to identify what raises an organization’s win percentage by the end of the season. It uses regression results to explain the role of on-base percentage in determining a team’s place in the 2018 standings, ultimately supporting the continued application of the Bill James philosophy in building a contender.
In contrast to the papers referenced above, this paper takes a basic approach, using a single season’s results to show that investing in on-base percentage, with the aim of finishing in the top tier of baseball in that statistical category by season’s end, is still associated with an increase in wins.
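All variables used in this research appear in the regression equation below, which models win percentage as a linear function of the six independent variables. RASP and RABP denote average runs allowed by starting pitchers and by the bullpen, BA is batting average, and HRs is home runs: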
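$$\text{WinPct}_i = \beta_0 + \beta_1\,\text{OBP}_i + \beta_2\,\text{Payroll}_i + \beta_3\,\text{RASP}_i + \beta_4\,\text{RABP}_i + \beta_5\,\text{HRs}_i + \beta_6\,\text{BA}_i + \varepsilon_i$$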
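A linear model of this form is typically estimated by ordinary least squares. The sketch below is a minimal pure-Python version that solves the normal equations; it is an illustration under stated assumptions, not the software actually used for this paper. The function name `fit_ols` is hypothetical, the example uses only two predictors for brevity, and the numbers are synthetic placeholders rather than 2018 MLB figures.

```python
from typing import List, Sequence


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X'X) b = X'y with Gauss-Jordan elimination.
    Returns the coefficients as [b0, b1, ..., bk].
    """
    if len(rows) != len(y):
        raise ValueError("rows and y must have the same length")
    X = [[1.0, *map(float, r)] for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic placeholder rows (OBP, bullpen runs allowed); NOT 2018 MLB data.
predictors = [(0.30, 4.0), (0.32, 3.5), (0.31, 4.5), (0.33, 3.8), (0.29, 4.2)]
# Win percentages generated from a known line so the fit can be checked.
win_pct = [0.1 + 2.0 * obp - 0.05 * rabp for obp, rabp in predictors]

coefficients = fit_ols(predictors, win_pct)
print([round(b, 4) for b in coefficients])
```

Because the placeholder win percentages are generated from a known line, the fit should recover an intercept near 0.1 and slopes near 2.0 and -0.05. With real team data, each row would hold all six independent variables for one of the 30 clubs.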
Figures 1-6 (Statistical Comparison Scatter Plots)
The method I used to calculate on-base percentage for every MLB organization in 2018 does not count batters who reached base on an error. I calculated each team’s 2018 OBP by adding its total base hits to its walks and dividing that sum by its total at-bats in the regular season. I then plotted the calculated on-base percentage against win percentage in the scatter plot below. The 2018 Boston Red Sox, who went on to win the World Series, posted the highest on-base percentage that season, tied with their Fall Classic opponent, the Los Angeles Dodgers.
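The calculation described above can be written as a one-line function. For comparison, the sketch also includes the official on-base percentage formula, which adds hit-by-pitches to the numerator and walks, hit-by-pitches, and sacrifice flies to the denominator. The function names and the team totals in the example are illustrative placeholders, not figures from the 2018 data set.

```python
def simplified_obp(hits: int, walks: int, at_bats: int) -> float:
    """OBP as computed in this paper: (H + BB) / AB."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return (hits + walks) / at_bats


def official_obp(hits: int, walks: int, hit_by_pitch: int,
                 at_bats: int, sac_flies: int) -> float:
    """Official OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sac_flies
    if denominator <= 0:
        raise ValueError("no plate appearances to evaluate")
    return (hits + walks + hit_by_pitch) / denominator


# Placeholder season totals for one hypothetical team.
paper_value = simplified_obp(hits=1500, walks=550, at_bats=5500)
official_value = official_obp(hits=1500, walks=550, hit_by_pitch=60,
                              at_bats=5500, sac_flies=45)

print(f"simplified: {paper_value:.3f}")
print(f"official:   {official_value:.3f}")
```

Because the simplified version leaves walks out of the denominator, it runs higher than the official figure for realistic team totals. The team values in this paper are therefore best compared with one another rather than with published OBP numbers.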
Figure 1 (Win-Percentage vs. On-Base Percentage, 2018):
Much like Brown, Link, and Rubin (2017), I used what they refer to as “post-Moneyball era” statistics to get a sense of how the sport has shifted since the groundbreaking 2002 Oakland Athletics. In the 2018 season, there are various examples of MLB organizations with top-tier on-base percentages finishing with top-tier records after the 162-game season.
The results reveal a positive linear relationship between on-base percentage and win percentage for the 2018 season. As James’ theory states, a higher on-base percentage goes with more wins by the end of the season. Houston, Los Angeles, Boston, and New York all finished in the top tier of team on-base percentage in 2018. In fact it was Boston, the leader in on-base percentage among all 30 franchises in 2018, that ultimately won the World Series in October, which lends further support to James’ theory carrying over into on-field results.
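The strength of a linear relationship like the one in Figure 1 can be summarized with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is one way to compute it. The function name `pearson_r` and the four data points are illustrative assumptions, not values from the 2018 data set.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Placeholder team values (OBP, win percentage); NOT 2018 MLB data.
obp = [0.31, 0.32, 0.33, 0.34]
win_pct = [0.45, 0.50, 0.52, 0.60]
print(f"r = {pearson_r(obp, win_pct):.3f}")
```

A value near 1 would correspond to the tight upward pattern described for on-base percentage, while a value near 0 would correspond to a scattered plot.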
While Billy Beane and the 2002 Oakland Athletics followed their regular season with only a brief appearance in the postseason, the 2018 Boston Red Sox carried their regular-season form through to the end of October, in a sense accomplishing what Beane and the Athletics could not.
Nevertheless, there are multiple factors to take into account when trying to identify the primary source of an MLB team’s high or low win percentage. Statistics are only one of many potential factors to consider when drawing a connection, and factors beyond anyone’s control come into play in something as unpredictable as professional sports. This leads to the next set of data, presented in the second figure: MLB win percentage versus salary.
To construct this figure, I calculated the average salary for each organization by dividing the total payroll of each of the 30 MLB teams by the number of players on its roster. The average salary for the 2018 season ranged from just over $2 million per player to nearly $7.5 million per player across the league. This gives a frame of reference for how much each organization invests in its players on average, regardless of the size of its market. Financial investment limits the quality of players a front office is able to deliver to its fans every season. We refer to a team’s payroll to get a sense of how flexible and creative a general manager can be in any given off-season, given the payroll resources available or the lack thereof. However, it must be noted that Bill James’ philosophy gives no weight to the idea that a high payroll leads to a high win percentage by a season’s conclusion.
Figure 2 (Win-Percentage vs. Average Salary Per Player, 2018):
Fortunately for the theory’s legitimacy, Figure 2 reveals a more scattered relationship than the one for on-base percentage in Figure 1. The plot suggests that salary is not a strong predictor of win percentage in Major League Baseball for a season. As James preached, every organization has a shot at contention; the issue has simply been which organizations are willing to adapt to an ever-evolving game. James viewed those in support of record-setting contracts as “medieval” and blind to the true source of wins within an organization. In the 2018 MLB season there are specific organizations whose elite payroll and freedom in the free-agent market can be credited for their place in the final standings. Nevertheless, salary alone has not established itself as a measure responsible for wins.
Next came perhaps the most commonly referenced statistic in analyzing any organization’s offense: batting average. Like salary, this variable was averaged over the players on each MLB roster. The statistic expresses, as a rate, the average offensive production a player delivers. The following figure (Figure 3) plots the average batting average of the players on each MLB team against the team’s win percentage.
Figure 3 (Win-Percentage vs. Team Batting Average, 2018):
Figure 3, which plots each team’s average batting average for 2018 against its win percentage, shows another relationship consistent with James’ views. Batting average played, and still plays, a primary role in a player’s value in the eyes of fans, teams, and enthusiasts everywhere. From the Moneyball point of view, batting average is simply another relic of 19th-century baseball, a perspective that long predates the introduction of sabermetrics to the game. While this statistic is still given great consideration in All-Star balloting, MVP voting, and even the free-agent market, it again does not provide a strong measure when calculated as a whole for a given organization. It is a flawed and messy measure at the team level because of the gaps between players in games played and at-bats. It is a measure suited to evaluating individual players, not franchises.
To improve this research and take a more objective stance in testing the lasting legitimacy of James’ ideas in baseball, I recognized that offensive statistics are not the only ones to take into account in a franchise’s 162-game success or lack thereof. In a sport where producing runs is crucial, preventing them is crucial too. This is where earned run average enters the research.
Earned run average, or ERA, is the average number of earned runs a pitcher allows per nine innings, usually reported to two decimal places. A low ERA is therefore essential to any team’s hopes of winning as many games as possible in the regular season and of continuing that success in the October playoffs.
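The standard ERA formula scales earned runs to a nine-inning game: nine times earned runs, divided by innings pitched. The sketch below takes outs recorded rather than innings, which avoids the ambiguity of box-score notation such as "6.1" for six and one-third innings. The function name `era` and the example totals are illustrative assumptions.

```python
def era(earned_runs: int, outs_recorded: int) -> float:
    """Earned run average: 9 * earned runs / innings pitched."""
    if outs_recorded <= 0:
        raise ValueError("at least one out must be recorded")
    innings_pitched = outs_recorded / 3  # three outs per inning
    return 9 * earned_runs / innings_pitched


# A pitcher who allows 2 earned runs over a complete 9-inning game (27 outs).
print(f"{era(2, 27):.2f}")

# Placeholder totals for a hypothetical bullpen: 70 earned runs in 200 innings.
print(f"{era(70, 600):.2f}")
```

Computing the figure separately for a team's starters and for its relievers gives the two ERA values that the starting-pitching and bullpen comparisons rely on.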
To provide a more dissected measure, I applied this statistic as two separate ERAs: one for starting pitching and one for bullpen pitching. This gave me two averages for each of the 30 MLB teams in 2018 and a visual comparison of the impact of the starting rotation and of the bullpen for every organization.
Figure 4 (Win-Percentage vs. Starting Pitcher ERA, 2018):
Figure 5 (Win-Percentage vs. Bullpen Pitching ERA, 2018):
Results for starting-pitcher and bullpen ERA were similar, yet bullpen pitching produced the more linear plot. Again, Bill James and his Moneyball theory dealt with statistics from the offensive point of view more than anything else. While Billy Beane and other front-office executives across Major League Baseball can and have applied James’ beliefs in building a rotation and bullpen, the idea centered on offense. That said, we can certainly use runs allowed by pitching in the search for win-generating statistics. In 2002, Billy Beane and the Oakland Athletics embraced this way of thinking in laying the foundation for their pitching staff as well.
The final statistic used in this test of Moneyball is home runs, perhaps the simplest, most iconic, and most thrilling fan-favorite stat in baseball to this day. Home runs are perhaps the biggest enemy of Bill James and his ideas about baseball: a flashy, crowd-awakening event in which a player drives a ball 300-plus feet into the stands. While home runs do provide memorable, iconic moments on baseball’s biggest stage, James argues that they are overrated and reflect a misunderstanding of where wins come from.
Figure 6 (Win-Percentage vs. Home Run Total, 2018):
As for home run totals in the 2018 season, the plot provides no concrete evidence of a correlation between home runs hit and win percentage. Under the Moneyball theory, home runs are an overrated way of determining where wins come from on the field. However impactful they may be within a particular game or series, they can be deceiving; a 162-game season is far too long to rely on the long ball alone for where a team stands at the end of the year.
After the scatter-plot findings, I reorganized my results by ordering the independent variables by significance. The final order was: on-base percentage, runs allowed by the bullpen, batting average, franchise payroll, and average runs allowed by the starting pitching staff. The OBP results remained consistent across the various regressions in which I ran it against the most significant of the other independent variables.
Regression Testing/ Results
To determine the relationship between the dependent variable (win percentage) and the independent variables (the statistics presented in Figures 1-6), I ran several regressions to test the significance of each individual statistic. Significance can be read from a regression by referring to the p-value column.
Initial results from the first regression supported Bill James and his philosophy. The p-value for on-base percentage in the first test was 0.057603, which indicates statistical significance at the 10% level, though it falls just short of the conventional 5% threshold. This supports the relevance of Moneyball in 2018, 16 years after its breakthrough in professional baseball. The only other independent variable that showed any significance was runs allowed by the bullpen, with a p-value of 0.003947. This result is interesting, but it is not one that challenges Bill James. Again, Moneyball is an offense-driven idea, based on improving on-field performance from the players who take the field and step into the batter’s box.
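Reading significance from a p-value means comparing it with a chosen threshold. The sketch below applies one common star convention (below 0.10, 0.05, and 0.01) to the two p-values reported above. The function name `significance_stars` and the choice of thresholds are assumptions for illustration; other conventions exist, and the original regression output may have used a different one.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to stars: * below 0.10, ** below 0.05, *** below 0.01."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# p-values reported in the text for the first regression.
reported = {
    "OBP": 0.057603,
    "Runs allowed by bullpen": 0.003947,
}
for name, p in reported.items():
    print(f"{name}: p = {p} {significance_stars(p)}")
```

Under this convention, on-base percentage earns one star (significant at the 10% level) and bullpen runs allowed earns three (significant at the 1% level).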
Initially my hypothesis stood solely with James and on-base percentage. Seeing different significance results for starting-pitcher and bullpen earned run averages certainly came as a surprise. Nevertheless, this was only the first step of the regression testing run on the variables in this research.
Dummy Variable Application
The next step in analyzing my results was to add several dummy variables to the next regression. Dummy variables take a value of either one or zero and help reveal the presence or absence of an effect that could shift the results. The dummy variables in my research were the divisions within MLB, except for the American League East, which serves as the omitted baseline. By including these divisions, I was able to estimate the effect of playing in a specific division.
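The division dummy variables can be built with a short encoding step. The sketch below assumes each team is labeled with one of the six MLB divisions and leaves out the American League East, so a team from that division gets zeros in every column. The function name `division_dummies` and the label strings are illustrative assumptions.

```python
from typing import Dict

DIVISIONS = ["AL East", "AL Central", "AL West",
             "NL East", "NL Central", "NL West"]
BASELINE = "AL East"  # omitted division; its teams get all zeros


def division_dummies(division: str) -> Dict[str, int]:
    """Return the five 0/1 dummy variables for a team's division."""
    if division not in DIVISIONS:
        raise ValueError(f"unknown division: {division}")
    return {d: int(d == division) for d in DIVISIONS if d != BASELINE}


print(division_dummies("NL West"))
print(division_dummies("AL East"))
```

Omitting one division avoids perfect collinearity with the intercept. Each dummy coefficient is then read as the difference in win percentage relative to the omitted division, holding the other variables fixed.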
Figure 7 (MLB Division Dummy Variable Results):
Figure 7 presents the results of applying the division dummy variables, with the American League East omitted. I identified two of the five tested divisions as notable: the National League West, with a p-value of 0.3972, and the National League East, with a p-value of 0.361173. Both values are well above the conventional 0.05 and 0.10 thresholds, so these division effects should be read as suggestive at most rather than as statistically significant. These are findings that Bill James and his Moneyball theory cannot exactly take into account, simply because division effects are not essential to the main objective of his theory. Yes, James aimed to exploit flaws and misunderstandings in how players are valued, so that a low-payroll front office in a small market could compete with those in larger markets. However, taken at face value, the results for the National League East and West suggest that those two divisions were less competitive than the others: they contain some of the lowest-performing organizations and produce lower win-loss percentages than any other division in Major League Baseball.
Unfortunately, this finding does not serve much purpose for this research. From the general manager’s seat in a front office, your division’s win percentage is not a useful measure when you sit among the lowest-performing teams in baseball. Instead, your goal is simply to finish with the highest possible win percentage yourself in order to lock up a playoff spot by the regular season’s end.
On-Base Percentage Findings in Following Regressions
The next step in the data analysis was to check the consistency of the on-base percentage statistic. Again, this stat is the primary focus of both Bill James’ findings and this research, which aims to support James’ theory in modern times. The main question is whether an organization with a high on-base percentage will also finish with a high win percentage in MLB.
In the follow-up regression presented in Figure 8 below, the independent variables that proved insignificant, namely home runs, batting average, and payroll, were removed. This left a test of only those variables that had shown significance in affecting a team’s win percentage.
Figure 8 (Modified Significant Variable Regression):
As the regression results show, on-base percentage consistently remained the most significant statistic affecting the win percentage of professional baseball organizations in 2018. This regression further supported the ongoing relationship between a team’s on-base percentage and its win total. With an R-squared value above the 0.7 mark, the independent variables in this regression account for more than 70% of the variation in the dependent variable, win percentage over a 162-game regular season.
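R-squared is the share of variation in the dependent variable that the fitted model explains: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows that calculation. The function name `r_squared` and the sample values are illustrative placeholders, not output from the regressions in this paper.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty samples of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when actual values are constant")
    return 1 - ss_res / ss_tot


# Placeholder win percentages and model predictions; NOT 2018 MLB data.
actual_win_pct = [0.45, 0.50, 0.55, 0.60]
fitted_win_pct = [0.46, 0.49, 0.56, 0.59]
print(f"R^2 = {r_squared(actual_win_pct, fitted_win_pct):.3f}")
```

A value above 0.7, as reported for the follow-up regression, means the model's predictions leave less than 30% of the variation in win percentage unexplained.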
In the process of running regressions to support my point, I ran four further regressions after the initial regression on the original data set. In each, on-base percentage proved to be statistically significant and a major factor that can be attributed to an organization’s win percentage in MLB for 2018.
Figure 9 (Summary Statistics):
The summary statistics present the findings of the modified regressions above. Only two of the six proposed independent variables showed any significance in relation to win percentage in 2018. After the various regression tests, on-base percentage stood out as the most consistent source of significant impact on a team’s record in this season.
| OBP (On-Base %) | 1.63 | ** | 0.79 |
| BA (Batting Average) | -0.47 | | 0.57 |
| ERA (Starting Pitching) | -0.004 | | 0.03 |
All told, the results indicate that Bill James’ Moneyball theory on the significance of on-base percentage and its correlation with win percentage in Major League Baseball remains alive and well. Nevertheless, it is up to those at the top of Major League Baseball organizations to apply James’ philosophy and priorities to their own team builds. Front-office heads across MLB should take into account their investment in on-base percentage and how much it could lift their team in the win column in seasons ahead. While Billy Beane and the 2002 Oakland Athletics did not ultimately reach the end goal of a World Series title, they still managed to change the game in that single season. Whether through their record-breaking 20-game win streak or the fact that they matched the 103-win total of the 2002 New York Yankees, whose payroll was nearly triple Oakland’s, this chapter in Major League Baseball introduced a new, modern generation of statistics.
The modern application of sabermetrics is a chapter in the history of baseball for which Bill James is responsible. His model for uncovering undervalued talent in Major League Baseball can be, and has been, applied by organizations in both large and small markets, regardless of their payroll flexibility. The importance of on-base percentage cannot be denied, and it must therefore remain among the factors most considered in a front office.
With earned run average among bullpen pitchers showing significance, future research could use the Bill James sabermetric approach to uncover overlooked pitching statistics, identifying the primary source of success on the mound and relating those findings to win percentage to potentially discover a correlation that James himself did not consider.
Bill James and his theories may not have been viewed as ideal, or perhaps even relevant, in the eyes of a pre-21st-century baseball enthusiast. Nevertheless, the Moneyball theory produces results that support it. It did so in 2002 and has continued to do so across the league 16 Major League Baseball seasons later.