Co-authored with Steve Krawciw
A New York Times article covering the latest Triple Crown winner, American Pharoah, noted that the horse was identified as having exceptional potential when he was only 1 year old. The prediction was made by a team of data scientists who estimated the horse's future performance by measuring the size of his heart and other characteristics and comparing them with those of past race winners. On the future potential of the horse, the data scientists advised the owner "to sell the house, but keep the horse." Their prediction paid off - American Pharoah won. The real victory, however, can be assigned to data science - the researchers' ability to identify the winner ahead of time based on quantitative metrics.
At its core, the data science behind the horse's win is similar to the methods deployed by modern analysts of financial markets. By observing and measuring recurring characteristics and phenomena in the financial markets, data scientists are able to pinpoint winning stocks, predict market crashes, detect market manipulation and the like.
With time, financial data analysis is becoming increasingly precise and data-intensive. This is driven in part by the ever-plummeting cost of the technology required to crunch data, by ever-expanding data availability, and by the success of data science in financial operations. Big data analyses often drift to the shortest time frames, involving data captured in milliseconds and microseconds. Firms like Getco, Virtu and Quantlab have relied on short-term data analyses over the past couple of decades. Institutions are not the only ones to gain from short-term financial data analyses; smaller investors can benefit as well.
Why would ordinary investors care about big data in finance, and big data at high frequencies in particular? To answer this question, consider an average investor, Joe, who wants to do something mundane: buy or sell a stock or another financial instrument at the market opening price. To do so, Joe has two methods at his disposal:
1) Joe can place a market order that tells his broker or an exchange to fill his order as soon as possible at the best price available.
2) Joe can place a limit order specifying a particular price, but no time limit for his trade.
If Joe chooses the market order route, he is virtually certain to have his order filled, but possibly at a much worse price than the opening bid or even ask price. During the few minutes immediately following the market open, prices strive to incorporate all of the information pent up overnight, when the markets are closed. The information is transmitted into the markets via traders' orders, and the disparity of views causes prices to bounce violently up and down for a brief period, until the traders reach a consensus. Because of this volatility, Joe's market order may be filled at the worst possible price, possibly erasing his projected gain from the trade.
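The cost of a poor fill is simple arithmetic. The sketch below is only an illustration: the function name `slippage_cost` and the prices in the usage lines are hypothetical, and it assumes the whole order fills at a single price.

```python
def slippage_cost(reference_price: float, fill_price: float,
                  shares: int, side: str = "buy") -> float:
    """Return the money lost (positive) or saved (negative) by being
    filled at fill_price instead of reference_price.

    For a buy, paying more than the reference is a cost.
    For a sell, receiving less than the reference is a cost.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1 if side == "buy" else -1
    return sign * (fill_price - reference_price) * shares


# Hypothetical numbers: Joe sees an opening ask of 50.00, but his
# market order to buy 100 shares is filled at 50.50.
buy_cost = slippage_cost(50.00, 50.50, 100, side="buy")

# Hypothetical numbers: Joe sees an opening bid of 50.00, but his
# market order to sell 100 shares is filled at 49.75.
sell_cost = slippage_cost(50.00, 49.75, 100, side="sell")
```

With these made-up prices, the buy costs Joe an extra 50 dollars and the sell an extra 25 dollars relative to the quotes he saw at the open.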
As an alternative, Joe may choose to place a limit order and specify the price at which he is willing to buy the financial instrument of interest. Here, Joe faces another decision: the price itself. If Joe chooses a price that is too low, his order may never execute. If the price is too high, he does nothing to outperform his market-order scenario. How can Joe determine a price that is just right, that is, both favorable and likely to result in a timely execution?
A simple yet effective strategy could be to place a limit order at a "mid-price": the average of the bid and ask at the market open. To do so, however, one needs a timely source of market data from which to calculate the mid-price. (Most brokers provide their clients with free access to data that is 15 minutes delayed - too slow for Joe to identify the right price and execute his strategy.) This basic example illustrates just one application of high-frequency big data that, over time, can make a million-dollar difference between the accounts of investors who embrace the data and the accounts of those who do not. The mantra of today's investors in financial and horse markets alike may be "keep data science, and choose to trade the assets according to the science's predictions."
What other inferences can one make from market data using data science? The possibilities are limitless. As in any field of research, identifying working models involves lots of trial and error, data observation, and, eventually, intuition about what works and what does not. The data is the first necessary stepping stone.
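The mid-price calculation can be sketched in a few lines. The function names and quotes below are hypothetical. The fill check is a deliberate simplification: it assumes a limit buy executes as soon as the ask drops to the limit price, and it ignores queue position, order size and tick-size rounding.

```python
from typing import Iterable


def mid_price(bid: float, ask: float) -> float:
    """Average of the best bid and best ask."""
    if bid <= 0 or ask <= 0:
        raise ValueError("bid and ask must be positive")
    if bid > ask:
        raise ValueError("bid above ask: the quote is crossed or stale")
    return (bid + ask) / 2


def limit_buy_would_fill(limit_price: float,
                         later_asks: Iterable[float]) -> bool:
    """Simplified check: a limit buy fills if any later ask is at or
    below the limit price. Real fills also depend on queue position."""
    return any(ask <= limit_price for ask in later_asks)


# Hypothetical opening quote: bid 49.50, ask 50.50.
limit = mid_price(49.50, 50.50)

# Hypothetical asks observed after the open.
filled = limit_buy_would_fill(limit, [50.50, 50.25, 50.00])
```

With these made-up quotes the limit price is 50.00, and the order fills once the ask falls to that level. Fed with delayed quotes, the same calculation would produce a price based on a market that has already moved.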