05/07/2013 06:05 pm ET Updated Jul 07, 2013

When Big Data in Sports Fails


As discussed in Michael Lewis' book, Moneyball, baseball has become the most quantitative, statistics-driven sport in the world. It's now common knowledge that professional teams employ analysts responsible for interpreting statistics like on-base percentage (OBP) and runs batted in (RBI) as well as more recently developed categories like wins above replacement (WAR) and on-base plus slugging (OPS). This exhaustive data analysis is done for nearly every professional player in both the major and minor leagues in order to find and acquire talent to field the best possible team.

Fortunately for pro teams, the statistical analysis on professional players is trustworthy because the athletes all compete on a level playing field: against other professional athletes.

But what about the statistics for amateur athletes, where the quality of competition varies dramatically? Nate Silver, renowned economist and statistician, recently told attendees at the MIT Sloan Sports Analytics Conference that "drafting and developing talent properly still represents the greatest value-add to a sports organization." In scouting amateur players, however, this presents a tremendous challenge because traditional statistics can be misleading.

In high school, for example, the drastic range of skill and competition makes some athletes appear to be elite when in fact they aren't. The same can be said of college and, combined with the use of metal bats, statistics can misrepresent the true talent level of players in offense, defense, and pitching.

Adding further complexity are the growth and changing dynamics of the amateur baseball market as a whole.

Amateur Baseball Market in the United States (2011)
College players: 31,264
Junior college players: 13,649
High school players: 471,025

Domestically, teams are responsible for over half a million amateur prospects in a given year. In 2012, Major League Baseball reported that nearly 30 percent of roster players came from countries outside the U.S., such as the Dominican Republic, Venezuela, Japan, Puerto Rico, Cuba, Canada and Taiwan. This means there are potentially millions of amateur prospects that teams must attempt to discover, evaluate and accurately project.

Trusted Relationships and High Quality Info Delivered To Decision-Makers

To Mr. Silver's point, how do teams find great amateur talent when the traditional statistical data can't be trusted?

The answer is trusted relationships. Given the statistical limitations in evaluating amateur athletes, sports organizations rely on high-quality qualitative information from sources they trust. These credible sources come in the form of scouts, college coaches, and high school coaches. From professional teams and their scouts to college teams and their coaches, the process is predominantly relationship-driven.

A famous example of this process occurred in 1992, when the Yankees selected a high school prospect out of the state of Michigan despite the fact that he weighed only 159 lbs at the time. The prospect was Derek Jeter, and the scout was Dick Groch. Like most prospects, Jeter dominated his high school competition and had great numerical baseball stats. Groch recognized that Jeter possessed qualitative abilities and intangibles that baseball stats couldn't measure. The Yankees organization trusted Groch's assessment of Jeter and drafted him sixth overall because the information was coming from a credible source.

To improve this process, the biggest question has become: how do teams acquire this essential insight on more amateur athletes faster and cheaper?

In an attempt to manage all of this prospect data, teams are using various customer relationship management (CRM) tools and enterprise software developed by both large corporations and startups. Scouting reports have become digitized. Video clips are now accessible from smartphones and tablets. However, significant holes exist.

More Data Doesn't Equate to Better Data

Big data solutions can help teams filter out specific types of players, but they often don't go nearly deep enough when evaluating a player's potential. With the current model in place, it's not cost-efficient to see all players who meet a team's specific criteria. Doing so would mean collecting large volumes of sensitive information, discerning the credibility of those information sources, eliminating all candidates who don't meet the team's needs, and then further evaluating the remaining players.

The New York Times once quoted Gary Hughes, a scout of 43 years and a special assistant to the Cubs' general manager, Jim Hendry, as saying, "a prospect's 'makeup' -- his emotional and psychological stability, along with his self-confidence -- is as much a part of the assessment process as his physical tools, and it's an intangible."

In 1992, five other MLB teams passed on Derek Jeter, evidently missing the intangibles that the Yankees did not. This past year's American League Rookie of the Year, Mike Trout, was initially passed over by the majority of Major League teams in the first round of the 2009 draft.

How many teams are kicking themselves for missing these players? Furthermore, how many remarkably talented prospects go unnoticed each year? How many wins are sacrificed? How can technology help ensure teams improve their chances of realizing this value? The key is empowering trust and improving the process by which credible information reaches decision-makers.

Do Other Sports Have The Same Problem Baseball Does?

Baseball is deeply ingrained with statistics, but what about other sports like football and basketball? Like Mr. Silver, other experts speaking at the MIT Sloan Conference said significant opportunities exist.

"The NFL is way behind the other sports (in terms of analysis)," says Scott Pioli, former Kansas City Chiefs general manager. Dallas Mavericks owner Mark Cuban says, "The top 10 guys are easier to spot. How do we improve in identifying the others?" Just as in baseball, the number of international players in the NBA rises each year and, with the use of technology, so do the scouting costs. Knowing the limitations of amateur stats, professional teams and college programs are trying to find the best way to acquire better information on more athletes faster and cheaper from sources they can trust.

If drafting effectively presents the biggest value-add opportunity for any team, the scouting process in baseball and other sports will continue to evolve. Talent recruiting in sports is very different from other industries. LinkedIn doesn't get specific enough to help these unique "employers." It's vital that teams receive the highest quality information from the sources they trust most. This applies to a professional scout looking for the next Cy Young Award winner but also to an NCAA coach looking for a pitcher who meets his athletic, academic, and qualitative requirements. Most importantly, given how sensitive this information is in such a highly competitive industry, teams must be afforded comprehensive privacy and protection for their coveted information.