2004 MLB Wins Regression
Essay by review • February 10, 2011 • Research Paper • 1,920 Words (8 Pages) • 2,503 Views
On Wednesday, October 27, 2004, the Curse of the Bambino was finally lifted from the city of Boston and its long-suffering baseball fans (see Appendix A for more on the Curse). For the first time in 86 years, the Boston Red Sox were the world champions of baseball.
There is no arguing that the 2004 Red Sox were a good team that played excellent baseball throughout the season. The team was led not by talent cultivated through the Red Sox' farm system but by high-priced acquisitions such as Pedro Martinez, Manny Ramirez, Keith Foulke, Curt Schilling and David Ortiz. The average age of a Red Sox team member was 31.1 years, the oldest team average in the league. Additionally, the cumulative payroll for the 2004 Red Sox was the second highest in Major League Baseball at $125,208,542, or $4,173,618 per player. These two statistics describe some of the off-field demographic makeup of the 2004 Red Sox. In addition to being a veteran and well-paid ball club, the Red Sox also performed well on the field. The team batting average (number of hits divided by number of official at-bats) of the Red Sox was tied for the highest of the 30 Major League teams at 0.282. In terms of pitching statistics, the Red Sox ranked in the top third in earned run average (E.R.A.; the number of earned runs allowed per nine innings of play). Fielding average (number of successful fielding attempts divided by total number of fielding attempts) is the only major statistic where the Red Sox were significantly below the mean, ranking in the bottom quartile.
I am interested in analyzing the Major League Baseball data from the 2004 season to determine the factors that best predict success (measured by the number of team wins). I am especially interested in the relationship between wins and payroll, because payroll is something the ball club's management can control. On-field performance is less controllable by the team's management because it has a larger 'human performance' element. For each independent variable, I will obtain the linear regression equation and report the additional number of wins associated with a marginal increase in that variable. In addition to payroll, I am also interested in the relationship between wins and other major statistical categories: team age, team batting average, team earned run average and team fielding percentage. After this analysis, I hope to determine which variables have the highest correlation with winning baseball games in the 2004 season.
Major League Baseball is the only major professional sport that does not have a salary cap (the maximum total payroll that a team can pay its players). For example, the National Football League has a salary cap for 2004 of about $75 million and the National Basketball Association has a salary cap for 2004 of approximately $44 million. There are multiple reasons for a salary cap, but two of the underlying reasons are parity (equality) and competitiveness. It is assumed that without a salary cap, large-market teams such as New York, Los Angeles, and Chicago will be able to 'buy up' all the good players, leaving small-market cities such as Minneapolis, Cincinnati, and Phoenix with the less-talented leftovers. Additionally, teams that win more games and reach the playoffs and World Series receive extra revenue from TV. If large-market teams already have an advantage in winning games and playing in the post-season, this extra revenue widens the discrepancy even further.
In 1998, MLB Commissioner Bud Selig formed a panel to report on the economic conditions within baseball. One of the findings of that panel was that team payrolls have become increasingly disparate; the gap between "rich" and "poor" teams is not only wide, but it is growing. The effect, according to the panel, is a dramatic decline in the parity and competitiveness of MLB.
In this report, most of the data being analyzed was gathered from the Major League Baseball website (www.mlb.com) or the ESPN website (espn.go.com).
The dependent variable for each analysis is a team's total number of victories in the 2004 season (including the playoffs and World Series). The independent variables for these regressions are (1) team payroll, (2) average team age, (3) team batting average, (4) team earned run average, and (5) team fielding percentage. Each independent variable is regressed against wins separately, as a simple linear regression. Regression results were obtained using the least squares method in the Data Analysis tool within Microsoft Excel.
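For readers who do not use Excel, the least squares calculation can be sketched in a few lines of Python. This is only an illustration of the general method, not the spreadsheet used for this report. The function name simple_linear_regression and the toy numbers are assumptions made up for the example; they are not 2004 MLB data.

```python
from math import sqrt

def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return {"intercept": intercept, "slope": slope, "r": r, "r_squared": r ** 2}

# Made-up toy numbers that lie exactly on y = 1 + 2x (NOT MLB data)
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
fit = simple_linear_regression(toy_x, toy_y)
print(fit)
```

With the real data, x would hold one independent variable (for example, payroll in millions of dollars) for each of the 30 teams and y would hold each team's total wins.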
Regressing team wins on payroll yields a correlation coefficient of 0.56, a coefficient of determination of 0.31 and an estimated simple linear regression equation of:
Total Wins = 64 + 0.26 x (team payroll, in millions of dollars)
A correlation coefficient of 0.56 shows a moderate, positive relationship between payroll and wins. This relationship can also be seen in the scatter plot of the two variables (see attached chart entitled 2004 MLB Total Team Wins and Team Payroll). A coefficient of determination of 0.31 means that about 31% of the total variance in wins is explained by the variance in payroll. As expected, the regression equation has a positive slope: every additional $1,000,000 of payroll is associated with approximately 0.26 additional wins. The p-value for payroll tests whether the payroll variable significantly predicts the number of wins. The null hypothesis states that payroll is NOT a significant predictor of wins. With a p-value of 0.001, which is below the 5% significance level, I reject the null hypothesis. Therefore I accept the alternative hypothesis, that payroll is a significant predictor of wins.
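As a rough check on these figures, the short Python sketch below applies the payroll equation to the Red Sox payroll and recomputes the significance test from the reported correlation. It assumes n = 30 teams and uses the standard result that, in a simple linear regression, the slope test is equivalent to a t-test on the correlation coefficient with n - 2 degrees of freedom. The function names and the tabulated critical value 2.048 (two-tailed, 5%, 28 degrees of freedom) are not part of the Excel output; they are supplied here for illustration.

```python
from math import sqrt

# Values reported in this paper for the payroll regression
INTERCEPT = 64.0
SLOPE_PER_MILLION = 0.26
R = 0.56
N_TEAMS = 30  # assumption: one observation per MLB team

def predicted_wins(payroll_dollars):
    """Fitted total wins for a given payroll, using the equation in the text."""
    return INTERCEPT + SLOPE_PER_MILLION * (payroll_dollars / 1_000_000)

def t_statistic(r, n):
    """t statistic for testing zero correlation (equivalently, zero slope)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

red_sox_wins = predicted_wins(125_208_542)  # roughly 96.6 fitted wins
t_payroll = t_statistic(R, N_TEAMS)         # roughly 3.58

# Two-tailed 5% critical value of Student's t with 28 degrees of freedom
T_CRITICAL = 2.048
payroll_is_significant = abs(t_payroll) > T_CRITICAL

print(round(red_sox_wins, 1), round(t_payroll, 2), payroll_is_significant)
```

A t statistic of about 3.58 is well beyond the critical value, which agrees with the small p-value reported above. Because the inputs are rounded, the result will not match the Excel output to every decimal place.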
Regressing team wins on average player age yields a correlation coefficient of 0.70, a coefficient of determination of 0.49 and an estimated simple linear regression equation of:
Total Wins = -156.7 + 8.6 x (average team age, in years)
For 2004, there appears to be a strong, positive relationship between the number of team wins and the average age of the team: each additional year of average age is associated with about 8.6 more wins. Further analysis of this relationship is provided later in the report. Once again, at a 5% significance level I reject the null hypothesis that age is not a significant predictor of wins (p-value: 0.000016).
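To make the age equation concrete, the following Python sketch evaluates it at the Red Sox average age of 31.1 years and shows where the fitted line crosses zero wins. The function name is invented for this example. The zero-crossing is included only to illustrate why the large negative intercept should not be read literally: it lies far outside any realistic team age, so the line is meaningful only within the range of ages actually observed.

```python
# Coefficients reported in this paper for the age regression
AGE_INTERCEPT = -156.7
AGE_SLOPE = 8.6  # additional wins per extra year of average team age

def predicted_wins_from_age(avg_age):
    """Fitted total wins for a given average team age."""
    return AGE_INTERCEPT + AGE_SLOPE * avg_age

# Fitted wins at the Red Sox average age (about 110.8)
red_sox_fit = predicted_wins_from_age(31.1)

# Average age at which the fitted line reaches zero wins (about 18.2 years);
# this is an extrapolation artifact, not a prediction about real teams
zero_win_age = -AGE_INTERCEPT / AGE_SLOPE

print(round(red_sox_fit, 1), round(zero_win_age, 1))
```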
Regressing team wins on team batting average, an offensive statistic, once again shows a moderate, positive relationship. The results are a correlation coefficient of 0.57, a coefficient of determination of 0.32 and an estimated simple linear regression
...
...