Back in 2013, I was in the middle of a 3-month endurance studying period for the GMAT. During this time, the Baltimore Orioles reached out to me to apply for their 2014 Internship. As part of their application, they provided a series of Baseball Analytics questions for me to answer. Here are those questions and my answers. This was so much fun to do, even though I did not receive a position with the Orioles. I thought it'd be fun to share this with you all.
Baltimore Orioles 2014 Internship Questionnaire
Please answer either of the following two questions in about 400 words or less.
1) What is the best way to use a pitching staff? We know this question is vague; take it in any direction you like.
2) Which of the following two players would you prefer? Explain your thought process, but also focus on one aspect to compare and contrast most specifically. Assume both are 27-year-olds.
A LHH who is above-average defensively at RF & LF
.255/.330/.415 in 2,000 career PA
.260/.350/.430 in 700 PA vs. RHP in 2012-13
.227/.250/.330 in 300 PA vs. LHP in 2012-13
.245/.305/.385 over his last 150 PA
Career Pitches/PA: 3.70
A RHH who is average defensively at 3B/2B
.250/.320/.400 in 2,000 career PA
.241/.309/.387 in 700 PA vs. RHP in 2012-13
.270/.345/.430 in 300 PA vs. LHP in 2012-13
.265/.350/.460 over his last 150 PA
Career Pitches/PA: 4.10
Please answer the following four questions. For questions 4-6, feel free to use bullet points or incomplete sentences.
3) Please propose an analytics project that you think would be of assistance to the Baltimore Orioles’ front office. Be as specific as possible: What question would you be attempting to answer? What data would you use? How would you go about collecting that data? What programs and statistical techniques would you use to analyze the data? How do you foresee this project helping the front office? 600 words or less.
4) In your opinion, what area represents the next potential source of competitive advantage for MLB front offices? Just identify it and explain it briefly in roughly 50 words or less.
5) Please list 4-5 of your favorite baseball-related websites or books. For each item in the list, include a note or two explaining why you find this source interesting or informative.
6) Please list the programs, computer languages, and foreign languages in which you are fluent. For each one, what is your comfort level on a scale of 1-10? No further explanation is necessary.
1 = very basic familiarity, 10 = completely fluent.
Example: Excel (7)
Question 7 is completely optional. You will not be penalized for skipping it; in fact, it is our hope that many – if not most – candidates leave it blank. Our aim is not to take up hours of your time, but if there’s something you’d like to add that we haven’t yet covered, this is your space.
7) Is there anything else about you we should know? Be creative.
The following analysis assumes that unspecified attributes of the situation (e.g., team need, player cost, etc.) are equal for both players.
We are presented with two options: 1) an above-average defensive RF/LF (Player A) and 2) an average defensive 2B/3B (Player B).
The defensive value of a position is inversely related to the supply of players who can play it. With the low supply of SS and C, teams are more willing to accept less offensive performance, since there is an inherent scarcity premium. Using results from a study on defensive positional value, 2B/3B were worth +2.5 runs relative to average per season and RF/LF were worth -7.5 runs relative to average per season. Thus, an average 2B/3B produces 10.0 more runs than an average RF/LF.
However, the option presented is an above-average RF/LF. For this analysis, I define “above average” as a player whose defensive value sits at the 75th percentile of his position's data sample. For the 2012-2013 seasons, an above-average RF/LF was worth approximately 7-10 more runs per season than an average RF/LF. Given the small range of values and the margin of error, I conclude that an above-average defensive RF/LF is worth about the same as an average 2B/3B.
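The positional arithmetic above can be laid out as a quick sketch. The run values are the ones quoted in the text; the variable names are only illustrative, and the result is a rough range rather than a precise estimate.

```python
# Positional adjustments in runs per season, as quoted in the text.
POS_ADJ_2B_3B = 2.5
POS_ADJ_RF_LF = -7.5

# Gap between an average 2B/3B and an average RF/LF.
positional_gap = POS_ADJ_2B_3B - POS_ADJ_RF_LF

# Extra defensive runs for a 75th-percentile RF/LF (low, high), per the text.
above_avg_bonus = (7.0, 10.0)

# What remains of the gap once the corner outfielder's glove is credited,
# ordered from smallest to largest remaining gap.
remaining_gap = tuple(positional_gap - bonus for bonus in reversed(above_avg_bonus))

print(positional_gap)  # runs separating the two average defenders
print(remaining_gap)   # runs still favoring the 2B/3B after the bonus
```

A remaining gap of roughly zero to three runs per season is small enough to sit inside the margin of error, which is why the two defensive profiles are treated as about equal.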
Over their 2,000 career PA, Player A bests Player B in all triple-slash categories (AVG, OBP, SLG). We are also given platoon data (700 PA vs. R, 300 PA vs. L) and each player's most recent 150 PA. 150 plate appearances is not nearly a large enough sample to reliably evaluate a player’s true talent level for the statistics used (AVG, OBP, SLG). For that reason, these 150 PA will be weighted very low in the decision matrix.
In 2012-2013, 29% of all PA league-wide were vs. L and 71% were vs. R. Each player's own split (30% vs. L, 70% vs. R) closely matches that, so both were essentially used as “regular” (non-platoon) everyday players. Assuming this usage continues, one can combine the 700 PA vs. R and the 300 PA vs. L into one sample of 1,000 PA, weighting each split by its PA. This yields an interesting result: after rounding, both players have the same triple-slash line (.250/.320/.400) for the 2012-2013 seasons. The league-average triple slash over those seasons was .254/.319/.401. Over their past 1,000 PA, both players produced the same offensively, and both were decidedly average at that.
In 2013, the average Pitches/PA was 3.85; Player A (3.70) was below average and Player B (4.10) was above average. A recent study suggests that “running up pitch counts” is becoming less effective as pitchers throw more strikes, issue fewer walks, and take advantage of batters’ passivity. As such, Pitches/PA does not tell us much in regard to forecasting.
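The combined 1,000 PA lines can be reproduced with a PA-weighted average of each player's platoon splits. This is a minimal sketch: the function name is illustrative, and weighting by PA is an approximation, since AVG and SLG are strictly per-at-bat rates and at-bat splits are not given. The league PA totals are the ones listed in the footnotes.

```python
def combine_splits(splits):
    """Return the PA-weighted (AVG, OBP, SLG) line for a list of
    (pa, (avg, obp, slg)) tuples, rounded to three decimals."""
    total_pa = sum(pa for pa, _ in splits)
    return tuple(
        round(sum(pa * line[i] for pa, line in splits) / total_pa, 3)
        for i in range(3)
    )

# 2012-13 splits from the questionnaire: (PA, (AVG, OBP, SLG)).
player_a = combine_splits([(700, (.260, .350, .430)), (300, (.227, .250, .330))])
player_b = combine_splits([(700, (.241, .309, .387)), (300, (.270, .345, .430))])

# League-wide share of PA vs. LHP in 2012-13, from the footnoted totals.
league_vs_l_share = 108_162 / (108_162 + 260_890)

print(player_a)  # (0.25, 0.32, 0.4)
print(player_b)  # (0.25, 0.32, 0.4)
print(round(league_vs_l_share, 2))  # 0.29
```

Before rounding, Player B's line is a hair lower (.2497/.3198/.3999), so the two lines match to three decimals rather than exactly.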
Conclusion: Slight edge to Player A for better career numbers
The difference in value between these players on paper is negligible, as both are approximately equal in defense and offense over the 2012-2013 seasons. Without knowing more team factors, I would pick Player A. He has a better career triple-slash line, so for forecasting purposes, I would regress him toward his career mean, which yields a slightly higher forecast than for Player B, whose career mean is lower. Also, since he is a lefty and there are substantially more plate appearances vs. R, a team can lean on his strength while minimizing his weakness.
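To make the regression argument concrete, here is a rough sketch that blends each player's combined 2012-13 line with his career line. The blending weight (`ballast_pa`) is a hypothetical tuning constant, not something from the original analysis, and the career sample overlaps the recent one, so treat this as an illustration of the direction of the effect rather than a real projection.

```python
def regress_toward(recent, career, recent_pa, ballast_pa):
    """Shrink a recent rate toward a career rate.

    ballast_pa is a hypothetical constant: the number of PA of
    career-level performance blended in with the recent sample.
    """
    weight = recent_pa / (recent_pa + ballast_pa)
    return weight * recent + (1 - weight) * career

RECENT = (.250, .320, .400)     # both players, combined 2012-13 splits
CAREER_A = (.255, .330, .415)   # Player A career AVG/OBP/SLG
CAREER_B = (.250, .320, .400)   # Player B career AVG/OBP/SLG

def forecast(career, ballast_pa=1000):
    # Apply the same shrinkage to AVG, OBP, and SLG in turn.
    return tuple(
        regress_toward(r, c, recent_pa=1000, ballast_pa=ballast_pa)
        for r, c in zip(RECENT, career)
    )

forecast_a = forecast(CAREER_A)
forecast_b = forecast(CAREER_B)
print(forecast_a, forecast_b)
```

Because both players share the same recent line, any positive weight on the career line leaves Player A's forecast above Player B's in all three categories; the size of the edge depends entirely on the assumed weight.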
Since 1994, when the league expanded to 6 divisions with the addition of the Central Division in each league, the AL East has had the highest average win total required to win the division, at approximately 96 wins, whereas the NL West has required just over 90 wins on average. That is a 6-win difference! The Baltimore Orioles have played in the most competitive division over the past 20 years. For my research, I would do everything in my power to compile a comprehensive proposal to realign the divisions to level the playing field and provide all teams an equally competitive landscape to compete in.
I don’t foresee this happening in the near future, so my analytics project would center on a different topic. In the current landscape of baseball, more teams are locking up their young talent earlier and longer and not allowing them to leave via free agency until past their primes. This has caused the talent pool for free agency to dry up, increasing salaries for the players who do reach free agency, which makes it even more important to lock up the young, valuable players you control.
This year, Manny Machado's name has entered the discussion of the top position players in the game, alongside the other two top young talents, Mike Trout and Bryce Harper. Prior to his injury this year, he was profiling as the type of player you lock up for many years. However, this injury injects significant risk into this proposition. But does it have to?
The risk that is encountered when we are discussing injury is the risk of the unknown. What will the future bring for Machado? How long will it take for him to recover? What will his future production look like? Will he ever be the same? Can he stay at a premium position of 3B or shortstop? Can he remain an elite defensive player? All of these questions boil down to “how can we reliably predict his future production?” If you could reliably predict his future production, you could project his value and in turn, know what kind of contract to offer.
For this analytics project, I would focus on building a medical database covering the injuries that baseball players suffer most often. I would draw on data from both the general population and a specific subset of baseball players; data from other sports could serve as a potential third sample set. For each injury, I would collect data on recovery time, failed recoveries, repeat injuries, related injuries, and many other items. For baseball players, I would look at the production of players with similar injuries and how they performed pre- and post-injury.
To gather data, I would scour medical journals and databases for every piece of data I can pull out. Working with the Baltimore Orioles medical staff and medical consultants would be a necessity. Once the information has been gathered in one repository, I would use statistical tools like JMP to see which variables are correlated most with the output factors useful for us (recovery time, post-recovery performance vs. pre-recovery, etc.).
Although this research project would have a broad scope of injuries, my focus would be on Machado. If we were to develop a probabilistic model of the potential outcomes for Manny Machado’s recovery and future production, we could answer two major questions for the evaluation of a contract extension for Machado:
1) Do we know enough to mitigate risk from this injury to sign him long-term?
2) What terms should we favor in contract negotiations? Information asymmetry between the Orioles and Machado’s representation could provide a favorable negotiating position.
The most difficult challenge of this project is gathering the information all in one place. Once it is stored locally, the other members of the Orioles organization would have access to it and be able to perform their own analysis, which would greatly stimulate the propagation of ideas.
I think teams will begin to exploit holes and technicalities in the relatively new CBA. Some teams have already adopted strategies that spend more money on over-slot picks and recoup the savings with under-slot picks in later rounds. One possibility is sign-and-trade deals similar to those in the NBA. In the MLB version, a team with a protected pick could sign a player and trade him to a team with an unprotected pick (note: I’m not sure whether this is legal under the CBA).
- Fangraphs (website: fangraphs.com) - This is a staple of the sabermetrics community. The diversity and depth of the analysis are exceptional.
- Crashburn Alley (website: crashburnalley.com/) - I’m a native Philadelphian and this site has by far the best writers and analysis for Phillies baseball.
- Moneyball (book) - Michael Lewis is one of my favorite writers. Any book that combines one of my favorite writers with one of my passions (baseball) is an instant favorite of mine.
- The Book (book) and Inside The Book (website: insidethebook.com) - In my opinion, Tom Tango provides some of the best analysis of baseball online.
- Brooks Baseball (website: brooksbaseball.net) - Dan Brooks continually improves his BrooksBaseball website, allowing the masses to view PITCHf/x data through a terrific GUI.
JMP Statistical Analysis (5)
MacAree, Graham (Unknown). Positional Adjustment. Fangraphs. Retrieved October 27, 2013, from http://www.fangraphs.com/library/misc/war/positional-adjustment/
Team runs saved per position data gathered and compiled from fangraphs.com database
Carleton, Russell A. (2012, July). Baseball Therapy - It's a Small Sample Size After All. Baseball Prospectus. Retrieved October 29, 2013, from http://www.baseballprospectus.com/article.php?articleid=17659
Vs. R PA and vs. L PA data gathered and compiled from fangraphs.com database. 108,162 PAs vs. L; 260,890 PAs vs. R in 2012-2013 seasons
Verducci, Tom (2013, June). Wedge Issue: Why plate discipline has now gone too far. Sports Illustrated. Retrieved October 28, 2013, from http://sportsillustrated.cnn.com/mlb/news/20130604/dustin-ackley-eric-wedge-sabermetrics/
Data gathered and compiled from Baseball Databank (years 1994-2010) (http://www.baseball-databank.org) SQL database and from Fangraphs (years 2011-2013) (http://www.fangraphs.com)
Cameron, Dave (2012, May). Orioles Wisely Lock Up Adam Jones. Fangraphs. Retrieved November 7, 2013, from http://www.fangraphs.com/blogs/orioles-wisely-lock-up-adam-jones/
Baseball Prospectus (2013, May). The Lineup Card - Making the Case for Harper, Machado, or Trout. Baseball Prospectus. Retrieved November 7, 2013, from http://www.baseballprospectus.com/article.php?articleid=20734