We deal with trade-offs all the time. "You can have it good, fast or cheap... pick any two." The implementation constraints for this decision tree are clear-cut and obvious. If you want it good and fast, it won't be cheap. If you want it fast and cheap, it won't be good. If you want it good and cheap, it won't be fast.
The data science question you ask may be as simple as, "Do you want it good, fast or cheap?" but achieving a desired outcome will most likely require more than just the right mathematical and computer science tools. Successful implementation may require zero-sum trade-offs that have an artistic side to them.
Here are seven data science issues that will compete for your attention as you think about solving problems that enable data-driven business decisions.
Algorithmic Complexity -- How quickly or slowly will the algorithm(s) perform? Algorithms are processes or rules followed to solve problems. They can be extremely simple or remarkably complex. An algorithm's speed of execution depends both on how its workload grows with the size of the input and on the computing environment (how fast or slow is the CPU, front-side bus speed, storage seek time, etc.).
Quantity of Data -- How big is the data set? Big Data may be an overused marketing term, but in practice, the data set you need to analyze may be colossal. The size of the data set will have a substantial impact on the amount of computer power required to accomplish your analysis.
Input Speed -- How fast will the data have to be ingested? Will it come in real time from the Twitter firehose or at many gigabits per second over a Fibre Channel connection from a data warehouse? Is it sitting on a storage device that is local to the CPU? Is there a wireless network connection involved? Is the data coming through various networks, bounced around the earth via satellite and reassembled via load balancing tools before you can get to it? And on and on...
Output Speed -- Do you need your results in real time, as in right now? Do you want an hourly report, or will a daily or weekly digest suffice? I've been in meetings where executives have asked for real-time reports (because dashboards are cool) when a daily or weekly digest would more than suffice. The consequences of asking for something to be delivered in real time when real-time reporting is not a necessity are dire. It is not uncommon for this simple ego-driven request to double, triple or even quadruple the cost of a system. Careful what you ask for; you just might have to pay for it.
Accuracy -- What level of accuracy is required to achieve the desired outcome? Are approximations acceptable or does the problem require nth decimal place accuracy? Put another way, is it okay to be 80 percent sure or do you need to be 100 percent sure? This is a trade-off that is worth a Socratic discussion with your mathematician.
Confidence -- What is the acceptable range of confidence in the results? You can make your own scale for confidence levels but a systemic "high confidence" rating is a very important component of any calculation.
Data-Set Complexity -- How complex is the data set? Is it structured or unstructured? How much data overlap exists (annual financial reports in the presence of monthly reports, etc.)? Are component parts linearly separable? Are the data distributed in multi-dimensional arrays, etc.? There is practically no end to the hot mess of data-set complexity, and it will have a huge impact on the efficacy and even the feasibility of any analytic technique.
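To see how algorithmic complexity and quantity of data feed off each other, here is a minimal Python sketch. It is an illustration only, not taken from any particular project. It counts the work done by two ways of answering the same question: "Does this list contain a duplicate?" The function names are hypothetical, and counting comparisons is just a stand-in for real running time, which also depends on the computing environment.

```python
def has_duplicates_pairwise(items):
    """Compare every pair of items: about n * (n - 1) / 2 comparisons when no duplicate exists."""
    comparisons = 0
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicates_hashed(items):
    """Remember values already seen in a set: one membership check per item."""
    seen = set()
    checks = 0
    for item in items:
        checks += 1
        if item in seen:
            return True, checks
        seen.add(item)
    return False, checks


if __name__ == "__main__":
    for n in (500, 1_000, 2_000):
        data = list(range(n))  # no duplicates, so both functions do their full work
        _, pairwise = has_duplicates_pairwise(data)
        _, hashed = has_duplicates_hashed(data)
        print(f"n={n:>5}  pairwise comparisons={pairwise:>9}  hashed checks={hashed:>5}")
```

Doubling the data roughly quadruples the pairwise work but only doubles the hashed work, which is why the choice of algorithm matters more and more as the data set grows.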
The Art of Selecting Analytic Techniques in a Zero-Sum Game
Pick any one of the above components as most important and the other six will have to give. Selecting analytic techniques is a delicate balance of art and science.
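One concrete instance of this give-and-take is accuracy and confidence versus quantity of data. The sketch below is a rough illustration under stated assumptions, not a prescription: it uses the textbook normal-approximation formula for estimating a proportion from a simple random sample, and the function name, the table of z-scores and the default values are all hypothetical choices made for this example.

```python
import math

# z-scores for common two-sided confidence levels (standard normal quantiles).
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def required_sample_size(margin_of_error, confidence=0.95, proportion=0.5):
    """Rows to sample so an estimated proportion lands within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / E^2 and assumes simple random sampling
    from a much larger data set. proportion=0.5 is the worst case, meaning
    it yields the largest required sample.
    """
    z = Z_SCORES[confidence]
    n = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    for confidence in (0.90, 0.95, 0.99):
        for margin in (0.05, 0.025, 0.0125):
            rows = required_sample_size(margin, confidence)
            print(f"confidence={confidence:.2f}  margin=+/-{margin:.4f}  rows needed={rows}")
```

Under these assumptions, being within plus or minus 5 percentage points at 95 percent confidence takes about 385 rows. Halving the margin to 2.5 points roughly quadruples that to 1,537, and asking for 99 percent confidence at the original margin pushes it to 664. Tighter accuracy and higher confidence are paid for in data, and therefore in input speed, compute and cost.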
But there's more... While selecting analytic techniques is a zero-sum game, the process is often made more complex by the addition of business policies, legal restrictions or regulatory constraints. How will you deal with personally identifiable information (PII) or protected health information (PHI)? Both require special handling and a targeted approach.
We have a team ready to help you prepare to work with your data, understand the opportunities afforded by machine learning and pattern matching and even do a data science readiness assessment. Just shoot me an email and I'll be happy to work with you to help you achieve your business goals.
I'm the Managing Director of the Digital Media Group at Landmark|ShellyPalmer, a tech-focused investment banking and advisory firm specializing in M&A, financings, and strategic partnerships. You may also know me as Fox 5 New York's on-air tech expert. Follow me @shellypalmer or visit shellypalmer.com for more info.