05/18/2016 02:27 pm ET Updated May 19, 2017

Unsupervised Machine Learning Could Help Us Solve the Unsolvable

In machine learning, the ultimate goal is to train a computer to learn and infer as a human does, while taking into account far more information and making better decisions in far less time than a person could. While the potential applications of machine learning multiply daily, the time and effort required to train optimized machine learning systems are substantial and, as Dr. Yoshua Bengio noted in a recent interview, constant training and feeding of information is not a "natural" way to learn and obtain knowledge. These factors point to the potential of unsupervised learning, which, if done successfully, can learn and infer largely without the aid of a human.

Only within the last decade has machine learning gained a foothold, thanks to the explosion of data via the internet paired with cheaper and more powerful computer hardware. Most of this learning is 'supervised': if researchers want a system to distinguish between images of a two-story house and a high-rise, the system is traditionally trained on thousands of labeled images from both categories until it can accurately tell one from the other.

'Deep learning' is a sophisticated branch of machine learning, built on artificial neural networks with many layers, that is capable of making use of coveted unsupervised learning techniques. The beauty of unsupervised learning is its ability to detect patterns and extract information from a 'noisy' environment, potentially with better accuracy and scalability than supervised learning. Though the first "primitive" deep learning algorithms were conceptualized and published back in the 1960s, the first real breakthrough came in 2012, when Google and Stanford researchers built a deep learning system that learned to identify cats in digital images without ever being given labeled examples of cats.

Deep learning is still in its infancy. At present, most unsupervised learning systems still require some human feedback and training after the initial data analysis. But the ability of a system to learn accurately "on its own", with minimal or no human feedback, not only reduces the human labor hours and long-term costs involved but also has far-reaching implications for the types of problems that these systems may be able to help humans solve.

When differentiating between traditional supervised machine learning and the unsupervised learning that deep learning makes possible, it's helpful to think of the common analogy of sorting a basket of fruit. In a supervised learning system, the machine has already been trained to recognize types of fruit, e.g. bananas, apples, cherries and pears. The system has a built-in understanding of the 'response variable', i.e. the type of fruit that goes with a given set of features, and 'knows' to look for those features when classifying future examples that it encounters. In supervised learning, the learning itself is the end goal.
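To make the supervised half of the fruit analogy concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two features (a 'redness' score and a length in centimeters), the made-up measurements, and the choice of a simple nearest-centroid classifier. It is not how any particular production system works; it only shows that the labels are supplied up front.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled training data: (redness from 0 to 1, length in cm) -> fruit type.
# The label is the 'response variable'; the numbers are the features.
TRAINING = [
    ((0.90, 2.0), "cherry"),
    ((0.85, 2.4), "cherry"),
    ((0.70, 7.5), "apple"),
    ((0.60, 8.0), "apple"),
    ((0.10, 18.0), "banana"),
    ((0.15, 20.0), "banana"),
]


def train(examples):
    """Average the feature vectors of each labeled class into one centroid."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
print(classify(centroids, (0.80, 2.1)))   # a small, red fruit
print(classify(centroids, (0.20, 19.0)))  # a long, yellowish fruit
```

The system can only ever answer with one of the fruit types it was shown during training, which is the defining trait of the supervised approach.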

In contrast, unsupervised learning systems freely analyze patterns in 'unlabeled data', with no corresponding error or reward linked to a conclusion. This is similar to 'associative' or 'discovery' learning in humans, something that we do very well (and often take for granted). For example, when an unsupervised system is asked to sort or arrange fruits based on raw observations, the system might 'choose' to arrange the fruit by color, placing strawberries and cherries in a 'red' category; or it might sort by observed size, grouping pears, apples, and oranges in a 'medium-sized' category. This kind of grouping is commonly known as 'clustering', and it is a widely used approach by which these systems categorize information. Unsupervised learning is a stepping stone, a means to another end such as categorization or finding potential correlations or solutions that humans or supervised learning systems alone could not spot.
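The clustering idea can be sketched in a few lines of Python. The example below is illustrative only: the fruit measurements are invented, the apple is assumed to be a red one, and the method is a bare-bones one-dimensional k-means with two clusters, chosen for brevity rather than because any system mentioned here uses it. Note that the fruit names are never shown to the algorithm; they are used only to print the result.

```python
# Hypothetical unlabeled observations: name -> (redness from 0 to 1, size in cm).
FRUITS = {
    "strawberry": (0.90, 3.0),
    "cherry": (0.95, 2.0),
    "pear": (0.20, 8.0),
    "apple": (0.85, 7.5),
    "orange": (0.30, 7.0),
}


def kmeans_1d(values, k=2, max_iter=50):
    """Cluster numbers into k groups (k >= 2) and return a cluster index per value."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centers evenly across the range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assignments = None
    for _ in range(max_iter):
        # Assign each value to its nearest center.
        new = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new == assignments:
            break  # nothing moved, so the clustering has settled
        assignments = new
        # Move each center to the mean of the values assigned to it.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignments


def group_by(feature_index):
    """Cluster the fruit on one feature and return the groups as sets of names."""
    names = list(FRUITS)
    labels = kmeans_1d([FRUITS[name][feature_index] for name in names])
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(label, set()).add(name)
    return list(clusters.values())


print("Grouped by color:", group_by(0))
print("Grouped by size: ", group_by(1))
```

With these made-up numbers, grouping by color and grouping by size split the basket differently, and nothing in the data says which split is the 'right' one. That judgment is left to whoever interprets the output.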

Despite significant progress, the underlying processes of unsupervised learning, meaning what is really happening at the level of the artificial neurons, are still a mystery. Risks include building models that fail in unexpected situations. The real gap, however, lies in the system's inability to explain its findings or results; human beings are still the ultimate interpreters. An unsupervised learning system may come up with a set of conclusions that are nonsensical or seemingly indecipherable to human beings, so finding ways to connect analysis with meaning is a significant code that still needs to be cracked.

Still, deep learning achievements over the past five years have been significant. In addition to the 2012 breakthrough by Andrew Ng's team at Google and Stanford, others are working on building successful unsupervised learning systems, including Facebook's AI Research (FAIR) team and OpenAI. Natural language processing is a major area in which unsupervised learning systems have been used with success. Search the web, and you'll find potential applications of unsupervised learning in other areas, including identifying Internet spam, intrusion detection in unpredictable environments, and functional genomics.

At present, supervised learning methods are still more accurate at pinpointing abnormalities, though unsupervised learning methods are more flexible in their ability to interpret raw data. A breakthrough in understanding deep learning processes at a foundational level could further revolutionize the field, vastly improving accuracy in the interpretation of information and opening up opportunities for future applications that we may not even be able to imagine today.