MIT-based researcher Rick Young is one of the world's top molecular biologists. His laboratory at the Whitehead Institute for Biomedical Research has helped define many of the key principles of how gene expression is regulated, especially in stem cells and cancer cells. At this month's symposium organized by the International Society for Stem Cell Research (ISSCR), Rick presented some very provocative data that are bound to spark controversy about how researchers should assess gene expression.
While the DNA in cells encodes somewhere between 20,000 and 35,000 genes, only a fraction of these genes are functionally active in any given cell type. Based on the current understanding of cells, the function of a cell is primarily regulated by the genes that are being actively expressed (i.e., genes from which RNA is being generated and then translated into functionally active proteins). Distinct cells in a person, such as neurons or heart cells, share the same DNA, but they have very different sets of genes that are actively "expressed" by being transcribed into RNA. To determine which genes encoded in the DNA are active in a given cell type, molecular biologists therefore routinely measure the number of RNA copies of the genes. It has become very common for molecular biology laboratories to use global gene expression analyses to understand the molecular signature of a cell. These global analyses extract the RNA from the cells being studied and measure the expression of thousands of genes in a single experiment. The comparison of the gene expression profiles of different groups of cells, such as cancer cells and their healthy counterparts, has uncovered many important new mechanisms of gene regulation. The Gene Expression Omnibus is a public repository for the huge amount of molecular information that has been generated. So far, more than 800,000 samples have been analyzed, covering the gene expression in a vast array of organisms and disease states.
Rick himself has extensively used such expression analyses to characterize cancer cells and stem cells, but at the ISSCR symposium he showed that most of these analyses are based on the erroneous assumption that the total RNA content in cells remains constant. When the gene expression in cancer cells is compared with that of healthy non-cancer cells, the analysis is routinely performed by normalizing or standardizing the RNA content. The same amount of RNA from cancer cells and non-cancer cells is used, so the global analyses can detect only relative differences in gene expression. A problem arises when one cell type is generating far more RNA per cell than the cell type it is being compared with.
In a paper published in the journal Cell entitled "Revisiting Global Gene Expression Analysis," Rick Young and his colleagues discuss their recent discovery that the cancer-linked gene regulator c-Myc increases total gene expression by two- to threefold. Cells expressing the c-Myc gene therefore contain far more total RNA than cells that don't express it. This means that most genes will be expressed at substantially higher levels in the c-Myc cells. However, if one were to perform a traditional gene expression analysis comparing c-Myc cells with cells without c-Myc, one would "control" for these differences in RNA amount by using the same amount of RNA for both cell types. This traditional standardization makes a lot of sense; after all, how would one be able to compare the gene expression profile in the two samples if one did not use the same amount of total RNA? The problem with this common-sense standardization is that it misses global shifts in gene expression, like those initiated by potent regulators such as c-Myc: if nearly every gene goes up by the same factor, each gene's share of the total RNA stays the same and the two profiles look identical. According to Rick Young, one answer to the problem is to include an additional control by "spiking" the samples with defined amounts of known RNA. This additional control would allow the analysis of an absolute change in gene expression, in addition to the relative changes that current gene analyses can detect.
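To make the argument concrete, here is a small toy simulation in Python. It is only a sketch under stated assumptions and not the method used in the paper: the gene names, the per-cell transcript counts, the 2.5-fold amplification and the idea of adding the spike-in RNA in a fixed amount per cell are all made up for illustration.

```python
# Toy model: all names and numbers below are hypothetical, for illustration only.
control_cell = {"gene_a": 100, "gene_b": 300, "gene_c": 600}  # transcripts per cell
AMPLIFICATION = 2.5  # assumed global increase, within the two- to threefold range
myc_cell = {gene: n * AMPLIFICATION for gene, n in control_cell.items()}

SPIKE = "spike_in"
SPIKE_PER_CELL = 100  # assumed known amount of spike-in RNA added per cell


def measure(cell, depth=1_000_000, spike_per_cell=0):
    """Simulate an assay loaded with a fixed total amount of RNA.

    The signal for each RNA reflects only its share of the pool,
    not how much RNA each cell contained.
    """
    pool = dict(cell)
    if spike_per_cell:
        pool[SPIKE] = spike_per_cell
    total = sum(pool.values())
    return {name: depth * n / total for name, n in pool.items()}


def rescale_to_spike(signal, spike_per_cell=SPIKE_PER_CELL):
    """Convert signals back to per-cell amounts using the known spike-in."""
    factor = spike_per_cell / signal[SPIKE]
    return {name: s * factor for name, s in signal.items() if name != SPIKE}


def fold_changes(treated, reference):
    """Fold change of each gene in `treated` relative to `reference`."""
    return {gene: treated[gene] / reference[gene] for gene in reference}


# Traditional analysis: equal total RNA, no spike-in control.
relative = fold_changes(measure(myc_cell), measure(control_cell))

# Same assay, but each sample carries a known amount of spike-in per cell.
absolute = fold_changes(
    rescale_to_spike(measure(myc_cell, spike_per_cell=SPIKE_PER_CELL)),
    rescale_to_spike(measure(control_cell, spike_per_cell=SPIKE_PER_CELL)),
)

print("relative fold changes:", relative)
print("spike-in fold changes:", absolute)
```

In this toy setup the traditional comparison reports a fold change of 1 for every gene, as if nothing had happened, while the spike-in rescaling recovers the roughly 2.5-fold increase that was built into the simulated c-Myc cells.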
In some ways this seems like a minor technical point, but I think that it actually points to a very central problem in how we perform gene expression analysis, as well as many other assays in cell biology and molecular biology. One is easily tempted to use exciting, large-scale analyses to study the genome, epigenome, proteome or phenome of cells. These high-tech analyses generate mountains of data, and we spend an inordinate amount of time trying to make sense of the data. However, we sometimes forget to question the very basic assumptions that we have made. My mentor Till Roenneberg taught me how important it is to use the right controls in every experiment. The key word here is "right" controls, because merely including controls without thinking about their appropriateness is not sufficient. I think that Rick Young's work is an important reminder for all of us to continuously reevaluate the assumptions we make, because such a reevaluation is a prerequisite for good research practice.
An earlier version of this article was posted on the Scilogs stem cells blog The Next Regeneration.