11/14/2016 04:38 pm ET Updated Nov 15, 2017

The Impact of Machine Learning on Healthcare

"What according to you are the uses of machine learning and data science in health care?" originally appeared on Quora, the knowledge-sharing network where compelling questions are answered by people with unique insights.

Answer by Sebastian Raschka, author of Python Machine Learning and computational biology PhD candidate at MSU, on Quora.

I am not working in the health-care field, but I have met several people who work at the intersection of machine learning and health care. For example, the Mias lab in our department (G. Mias Lab) focuses on collecting and integrating omics data from various online databases and resources to predict risk factors for certain diseases. And Samantha Kleinberg, author of Why, is doing remarkable research applying and developing various statistical modeling techniques related to health care (Samantha Kleinberg).

Looking at the biomedical literature, I think that the classic approach for characterizing the function of a particular protein or gene is to study it in isolation (by knocking it out or overexpressing it) and link it to a certain phenotype. This bottom-up approach is certainly necessary to identify the key players related to health. However, a gene or protein is by nature only a small part of a bigger, more complex system, and I believe that pooling information from different types of experiments and devices could provide potentially useful features for understanding this bigger picture and advancing health care. In particular, I am thinking of monitoring different risk factors over time. If we can do that efficiently, I believe the benefits for health care could be enormous.

I'd say that one of our goals is to catch health-related issues early, before they become real problems: for instance, by tracking the risk factors related to diabetes before a person actually develops diabetes. Developing better treatments for people who have diabetes is really important, but if we better understand which combination of circumstances increases the risk of diabetes, we could potentially help a large proportion of people avoid developing the disease in the first place. Without having done any research in this field, I would say that integrating information such as family history, gene expression levels, age, shopping behavior, exercise regimens, and so forth could help detect high risk early.
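To make the idea of integrating risk factors a bit more concrete, here is a minimal Python sketch of one standard way to combine several heterogeneous features into a single risk estimate: logistic regression fitted with plain batch gradient descent. The three features (family history as 0/1, age divided by 100, weekly exercise hours divided by 10) and the six training examples are made-up toy values for illustration only, not real data; a real model would need far more data, more features, and careful validation.

```python
import math

# Toy logistic-regression sketch. Features per person (all hypothetical):
# [family_history (0/1), age / 100, weekly_exercise_hours / 10]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_risk(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability of the outcome for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_risk(w, b, xi) - yi  # d(log-loss)/d(logit)
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up toy data: 1 = developed the condition, 0 = did not.
X_toy = [[1, 0.65, 0.1], [1, 0.55, 0.2], [0, 0.70, 0.1],
         [0, 0.30, 0.8], [0, 0.25, 0.6], [1, 0.35, 0.9]]
y_toy = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X_toy, y_toy)
print(f"estimated risk, 60-year-old with family history and little exercise: "
      f"{predict_risk(w, b, [1, 0.60, 0.1]):.2f}")
```

Logistic regression is chosen here only because its output reads directly as a probability and its weights are easy to inspect; it is one of many possible models, not a recommendation for clinical use.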

We are collecting more and more data, part of which is made available for research in anonymized form, and it may seem that plugging it into machine learning algorithms to build predictive models should be a piece of cake. However, one of the main challenges is that this data is highly heterogeneous, and cleaning and combining data from different databases is probably the bottleneck. There is also, of course, the privacy concern: data is anonymized in a way that makes it hard to link features across different datasets. That said, I believe that companies such as Apple are working on solutions to track data anonymously on electronic devices such as smartphones.
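As a small illustration of why cleaning and combining data takes so much effort, here is a hedged Python sketch of harmonizing two sources before merging them. The two export formats, their field names (`id`, `glucose_mmol_l`, `PatientID`, `GLU (mg/dL)`) and the helper functions are hypothetical; the only domain fact used is the approximate conversion 1 mmol/L of glucose ≈ 18 mg/dL. Real pipelines also have to handle missing values, conflicting records, and inconsistent timestamps.

```python
from dataclasses import dataclass

# Approximate conversion for blood glucose (molar mass ~180 g/mol).
MG_DL_PER_MMOL_L = 18.0

@dataclass
class GlucoseRecord:
    patient_key: str       # normalized identifier shared across sources
    glucose_mg_dl: float   # all values converted to a single unit

def from_clinic(row: dict) -> GlucoseRecord:
    """Hypothetical clinic export: padded lowercase IDs, values in mmol/L."""
    return GlucoseRecord(row["id"].strip().upper(),
                         float(row["glucose_mmol_l"]) * MG_DL_PER_MMOL_L)

def from_lab(row: dict) -> GlucoseRecord:
    """Hypothetical lab export: different field names, mg/dL stored as text."""
    return GlucoseRecord(row["PatientID"].strip().upper(),
                         float(row["GLU (mg/dL)"]))

def merge_by_key(*sources: list[GlucoseRecord]) -> dict[str, list[float]]:
    """Group harmonized measurements from all sources by patient key."""
    merged: dict[str, list[float]] = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec.patient_key, []).append(rec.glucose_mg_dl)
    return merged

clinic_rows = [{"id": " p001", "glucose_mmol_l": 5.5}]
lab_rows = [{"PatientID": "P001", "GLU (mg/dL)": "101"}]
merged = merge_by_key([from_clinic(r) for r in clinic_rows],
                      [from_lab(r) for r in lab_rows])
print(merged)  # {'P001': [99.0, 101.0]}
```

Converting every source into one common record type before merging keeps the unit and naming quirks confined to small per-source adapter functions.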

I think that finding a good way to make all this personal data available to researchers, in a form that is useful yet anonymous, is the first step toward building better systems for detecting health problems. Once this problem is solved, I believe it could pave the way for personal warning systems that combine data such as shopping behavior, daily exercise and diet information, and perhaps personal genomes and occasional blood tests.
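One widely used building block for sharing linkable data without exposing names is keyed pseudonymization: each data holder replaces a direct identifier with an HMAC of it under a secret key, so records from different sources can still be joined on the resulting token. The sketch below uses Python's standard `hmac` and `hashlib` modules; the data sources, field names, values, and the key are all hypothetical. Note that this is pseudonymization, not full anonymization: whoever holds the key can re-link tokens to people, and the remaining attributes may still allow re-identification.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed SHA-256 HMAC token."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()

# Hypothetical key, shared only by the data holders (never by researchers).
KEY = b"example-secret-key-do-not-use"

# Each (hypothetical) source pseudonymizes before release;
# raw e-mail addresses never leave the data holder.
shopping = {pseudonymize("Alice@example.com", KEY): {"sugary_drinks_per_week": 9}}
exercise = {pseudonymize(" alice@example.com ", KEY): {"active_minutes_per_day": 15}}

# A researcher can join the released tables on the token alone.
linked = {token: {**shopping[token], **exercise[token]}
          for token in shopping.keys() & exercise.keys()}
print(linked)
```

Normalizing the identifier (trimming whitespace, lowercasing) before hashing matters: without it, trivially different spellings of the same identifier would produce unrelated tokens and the records would fail to link.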
