The Value of ENCODE

09/24/2012 09:41 am ET | Updated Nov 24, 2012

Aren't we humans marvellous, with our trillions of cells, our long life spans, complex organs and higher cognitive functions? How lovely to look down on the lower creatures, like a humble microscopic worm called C. elegans. Poor old C. elegans, with its measly thousand or so cells, pitiful little organs and limited brain power. You might as well compare a single Enid Blyton story with the complete works of Shakespeare. You'll never find a worm doing the fantastic things we can do, from creating the rules of ping-pong to sequencing the entire three-billion-letter text of the human genome. Yet it was that very achievement, just over a decade ago, that made scientists feel a little less smug about our status as superior organisms.

It had long been assumed that we humans are top of the evolutionary tree in complexity because we possess the most complicated DNA blueprint (our genome). And it's certainly true that each of our cells contains a lot more DNA than that of poor old C. elegans. But the unexpected finding from the 2001 sequence release was that we and the little worm have almost exactly the same number, and the same types, of genes (a gene being defined as a stretch of DNA that codes for a protein, and proteins being the business molecules in cells). We and the worm each possess around 21,000 genes. The only reason our genome is so much bigger is that we have loads of DNA that doesn't code for proteins. In fact, a whopping 98 percent of our genome is this non-coding DNA, more cruelly referred to as "junk."

But as the old Victorian expression about finding value in rubbish has it, there's brass in muck. A huge consortium of scientists has spent the last six years sifting through this supposed garbage tip of DNA and now the ENCODE (Encyclopedia of DNA Elements) project is starting to find the vast number of tiny flecks of gold and gemstone fragments that account for the overwhelming richness of the human organism.

Imagine you are standing on top of a Himalayan peak some 25,000 feet high, on a day with cloud cover up to 20,000 feet. All you would see would be the other mountain peaks. That's the situation we were in when we first decoded the human genome and saw only the genes. Now, with ENCODE, the cloud has partially cleared and we can see not just how the mountains are linked together, but details of the valleys, the bridges, the forests, the villages and much more besides.

The junk DNA is actually crammed full of information. It's as if Shakespeare wrote fifty lines of stage direction for every line of dialogue. And what stage directions. Forget about "Exit, pursued by a bear." These would be more along the lines of "If performing Hamlet in Vancouver and The Tempest in Perth, then put the stress on the fourth syllable of this line of Macbeth. Unless there's an amateur production of Richard III in Mombasa and it's raining in Quito."

Similarly, the various bits of junk DNA are saturated with such instructions. These act like volume controllers, governing how strongly individual genes are expressed in specific cells at different times, and how they respond if the environment changes. The human genome employs a huge variety of molecules and mechanisms to make the greatest use of this additional level of information. By contrast, there is very little of this extra complexity in the C. elegans genome, so the tiny worm is stuck with a relatively rigid pattern of gene expression and is far more hard-wired than we are. In visual terms, both we and the worm have a palette of primary colours, but the lower organisms aren't able to mix them to create hundreds of different shades.

The differences extend all the way through the kingdom of life. Humans and chimps share in the region of 99 percent of the genetic instruction book, but yet again we humans are able to use our information in a greater and more sophisticated variety of ways. This is particularly the case in the brain, and may be the explanation for why we are the smartest of all the apes.

Of course no system is perfect, least of all one that is designed to confer a lot of flexibility. Ever since the human genome was sequenced, scientists have been puzzled by some really odd research findings. When they searched the genome for regions associated with the risk of human diseases, from multiple sclerosis to type 2 diabetes, they were bemused to discover that the suspect regions weren't in genes. The findings from ENCODE are starting to show that many of these candidates for disease susceptibility are in parts of the junk DNA that control expression of key genes in relevant cell types.

It's not clear yet if the new data will help us treat human disease. Each individual component of the vast regulatory networks may in isolation only have a relatively minor effect. Creating a drug to target just one regulator may have as little impact as removing one small brick from a large well-designed house. But the more we understand the regulatory networks involved in health and disease, the more chance we have of identifying the key vulnerable pinch points in our organ systems, and this may drive new treatment approaches.

ENCODE has cost nearly $200 million and involved hundreds of scientists so far, and is by no means finished. Some scientists are concerned it could start to consume a disproportionate amount of funding. After all, there will always be more cell types to analyse, more details to explore and new questions to ask. To go back to the mountains analogy, do we need to see the houses in the village or the rice in the pots in the houses? But after such a spectacularly successful start, it's unlikely that the foot will come off the gas pedal just yet.