This post was published on the now-closed HuffPost Contributor platform. Contributors control their own work and posted freely to our site.
When an enterprise has Big Data on hand, the urge to turn it into actionable information that helps the business thrive is natural. Let’s zoom in on how Machine Learning can be applied to business. To illustrate, we’ll use the results of a problem-solving session that took place at a datathon.
A datathon is a hackathon focused on solving problems with Machine Learning.
The Knot to Unravel: Real-world Client Data
A bank, a sponsor of the datathon, provided proprietary real-world client data in an anonymized form. The datathon participants were to analyze the datasets by generating multiple hypotheses and identifying the viable ones. The problem was expected to be solved using cluster analysis, a method of unsupervised learning.
Unsupervised learning is a type of machine learning in which the system draws inferences from datasets that consist of input data without labeled responses. Cluster analysis, one of its methods, helps find hidden patterns or groupings in data based on specific parameters. For instance, it could be used to segment the subscribers of a mobile services provider.
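To make the idea of cluster analysis concrete, here is a minimal sketch of density-based clustering in the style of DBSCAN, one of the tools the team used. It is written in plain Python for illustration only and is not the participants' code. The subscriber data, the `eps` radius, and the `min_samples` threshold are all made-up assumptions.

```python
from math import dist

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_samples:
                queue.extend(nearby)  # j is a core point: keep expanding
    return labels

# Made-up subscribers: (hundreds of call minutes, GB of mobile data) per month.
subscribers = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # light users
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),   # heavy users
    (9.0, 0.5),                           # an unusual subscriber
]
labels = dbscan(subscribers, eps=0.5, min_samples=2)
print(labels)
```

No labels are supplied up front: the two groups and the stray point emerge from the distances alone, which is what makes the approach unsupervised.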
Multiparameter Data as Basis for Hypotheses
The team was to form hypotheses from data with disparate input parameters. These included a description of the product or service acquired, the amount paid with credit cards issued by the bank, and user demographics (age and sex). Most of the data fell into categories based on high-level payment destinations: shops, gas stations, services, etc. Some categories were described in more detail, naming specific merchants such as AliExpress, Uber, Burger King, and iTunes. The team's main hypothesis was that analyzing how users spend their money would yield a rather informative user portrait.
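One way to turn raw payments into a "user portrait" is to compute, for each client, the share of total spending that goes to each category. The sketch below assumes a hypothetical `(client_id, category, amount)` record layout and made-up amounts; the actual structure of the bank's dataset is not described in this post.

```python
from collections import defaultdict

# Hypothetical transactions: (client_id, payment category, amount).
transactions = [
    ("c1", "shops", 60.0),
    ("c1", "gas stations", 40.0),
    ("c2", "Uber", 30.0),
    ("c2", "iTunes", 10.0),
    ("c2", "shops", 60.0),
]

def spending_profiles(rows):
    """Map each client to the share of total spend that went to each category."""
    totals = defaultdict(lambda: defaultdict(float))
    for client, category, amount in rows:
        totals[client][category] += amount
    profiles = {}
    for client, by_category in totals.items():
        spent = sum(by_category.values())
        profiles[client] = {cat: amt / spent for cat, amt in by_category.items()}
    return profiles

profiles = spending_profiles(transactions)
for client, shares in profiles.items():
    print(client, shares)
```

Using shares rather than absolute amounts keeps big spenders and small spenders with similar habits close together, which is a design choice made for this illustration.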
Data Processing, Pattern and Correlation Analysis
Processing Unlabeled Data
The team processed the unlabeled data in three steps: they reduced its dimensionality, clustered it, and ran a correlation analysis. Their tools were Python, t-SNE (for dimensionality reduction), DBSCAN (for clustering), and Matplotlib (for plotting). The participants also sanity-checked the data against real-world expectations. This is how they spotted an outlier among the payment amounts: 16,000 for a single Uber ride. On closer inspection, the amount turned out to be denominated in a foreign currency. Once the team had converted amounts in the major currencies to a common currency and screened out the rare ones, the data became more informative.
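The currency clean-up can be sketched as follows. The currency codes, exchange rates, and rows are placeholders invented for this example; the post does not say which currencies or rates were involved.

```python
# Hypothetical rates to a common currency ("BASE"); values are illustrative only.
RATES_TO_BASE = {"BASE": 1.0, "CUR_A": 1.25, "CUR_B": 0.001}

def normalize(rows, rates=RATES_TO_BASE):
    """Convert amounts to the common currency; drop rows in currencies without a rate."""
    kept = []
    for category, amount, currency in rows:
        rate = rates.get(currency)
        if rate is None:
            continue  # rare currency with no known rate: screened out
        kept.append((category, round(amount * rate, 2)))
    return kept

# Made-up rows: (payment category, amount, currency code).
rows = [
    ("Uber", 16000.0, "CUR_B"),   # looks like an outlier until it is converted
    ("Uber", 12.0, "BASE"),
    ("shops", 40.0, "CUR_A"),
    ("services", 500.0, "RARE"),  # rare currency, dropped
]
clean = normalize(rows)
print(clean)
```

After conversion, the 16,000 ride sits in the same range as the other rides, so it no longer distorts any distance-based step that follows.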
Identifying Patterns and Correlations
The team identified several meaningful patterns and interdependencies. The graph analysis and cluster analysis showed a correlation between commuting via Uber and shopping on iTunes. Another cluster located nearby suggested that credit card holders with foreign currency accounts tend to be young people who are regulars at local bars and restaurants.
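A correlation of this kind can be checked with the Pearson coefficient, which ranges from -1 to 1. The sketch below uses made-up per-client spending figures, not the datathon data, and the post does not state which correlation measure the team actually used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up monthly spend per client; position i in both lists is the same client.
uber = [0.0, 5.0, 10.0, 20.0, 40.0]
itunes = [1.0, 2.0, 3.0, 6.0, 11.0]

r = pearson(uber, itunes)
print(f"Uber vs. iTunes spend: r = {r:.3f}")
```

A value close to 1 means that clients who spend more on one service also tend to spend more on the other. It says nothing about why, which matches the article's point that the method does not look for cause and effect.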
Unbiased Hypotheses Verification
The datathon participants demonstrated that unsupervised learning is a good fit for testing hypotheses without bias. The method does not aim to identify cause-and-effect relations or to achieve stable results, aims that might otherwise add subjectivity to the data processing. For instance, the assumption that an average fast-food lover would frequent different fast-food brands did not hold up during this problem-solving session.
Unsupervised learning will benefit service providers that have accumulated large volumes of data about their clients. It gives analysts and marketers an impartial view of customer behavior: how client activity changes when the company modifies a service or introduces a new one, and whether the existing service offering has a weak spot that needs fixing.
The topic covered is the result of Olga Babik’s contribution.
Olga Babik is a tech blogger and marketing specialist with Softeq, a software company in Houston, TX. Olga collaborates closely with the Softeq engineering team, who work on a variety of IoT projects with a focus on big data mining and machine learning processing at the backend. She highlights her colleagues’ first-hand experience and skills in prototyping, devising, integrating, deploying, and supporting connected solutions driven by firmware, software, and hardware.