Bob Dylan, ENCODE and Evolutionary Theory: The Times They Are A-Changin'

09/12/2012 04:59 pm ET | Updated Nov 12, 2012
James A. Shapiro, author of 'Evolution: A View from the 21st Century'; Professor of Microbiology, University of Chicago

"And don't criticize
What you can't understand"

Last week, the ENCODE project (ENCyclopedia Of DNA Elements) released a tremendous amount of new information about our genomes. The results of literally hundreds of millions of experiments using the most current "high-throughput" technologies provided the data for over a dozen scientific papers in the journals Nature and Genome Research. The conclusions about the organization and expression of the human genome were significant enough to make the front page of The New York Times.

The massive collaborative project examined how our genomes are copied into RNA, how they interact with regulatory proteins, and how they are compacted into chromatin, the DNA-protein structure that organizes the genome for cellular differentiation. ENCODE examined DNA from dozens of cell types to determine whether the results changed in specific ways from one kind of cell to another. Cell-type specificity is a strong indication that the data are biologically relevant.

The most striking finding was summarized as follows:

"One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA'. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA's transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles. Of note, these results show that many DNA variants previously correlated with certain diseases lie within or very near non-coding functional DNA elements, providing new leads for linking genetic variation and disease."

In other words, the old idea of the genome as a string of genes interspersed with unimportant noncoding DNA is no longer tenable. Many eminent scientists had argued that noncoding DNA, much of it repeated at many different locations, was nothing more than "junk DNA." ENCODE revealed that most (and probably just about all) of this noncoding and repetitive DNA contains essential regulatory information. Moreover, much of it is also copied into RNA that has additional, though still unknown, functions.

I have long had a personal interest in the repetitive part of our genomes (as much as two-thirds of all our DNA) because it is composed of mobile genetic elements. I first discovered these elements in bacteria during my thesis research in 1968. I remember being scientifically offended by a 1980 article by Francis Crick and Leslie Orgel describing this DNA as "selfish" and functionless.

My interest in the roles of repetitive and mobile DNA has continued since my thesis more than four decades ago. The initial sequencing of the human genome in 2001 found that over 40% of it consists of mobile repeats spread throughout our genomes, roughly thirty times the amount of protein-coding DNA.

In 2005, Rick von Sternberg and I published two articles on the functional importance of repetitive DNA. The major article was entitled "Why repetitive DNA is essential to genome function."

These articles with Rick are important to me (and to this blog) for two reasons. The first is that shortly after we submitted them, Rick briefly became a celebrity of the Intelligent Design movement. Critics have used my co-authorship with Rick as a pretext for "guilt-by-association" claims that I have some ID or Creationist agenda, an allegation with no basis in anything I have written.

The second reason the two articles with Rick are important is that they were, frankly, prescient, anticipating the recent ENCODE results. Our basic idea was that the genome is a highly sophisticated information storage organelle. Just as electronic data storage devices must be formatted, the genome must be highly formatted by generic (i.e., repeated) signals that make it possible to access the stored information when and where it will be useful.

The abstract of our paper tells the story:

"ABSTRACT: There are clear theoretical reasons and many well-documented examples which show that repetitive DNA is essential for genome function. Generic repeated signals in the DNA are necessary to format expression of unique coding sequence files and to organise additional functions essential for genome replication and accurate transmission to progeny cells. Repetitive DNA sequence elements are also fundamental to the cooperative molecular interactions forming nucleoprotein complexes. Here, we review the surprising abundance of repetitive DNA in many genomes, describe its structural diversity, and discuss dozens of cases where the functional importance of repetitive elements has been studied in molecular detail. In particular, the fact that repeat elements serve either as initiators or boundaries for heterochromatin domains and provide a significant fraction of scaffolding/matrix attachment regions (S/MARs) suggests that the repetitive component of the genome plays a major architectonic role in higher order physical structuring. Employing an information science model, the 'functionalist' perspective on repetitive DNA leads to new ways of thinking about the systemic organisation of cellular genomes and provides several novel possibilities involving repeat elements in evolutionarily significant genome reorganisation. These ideas may facilitate the interpretation of comparisons between sequenced genomes, where the repetitive DNA component is often greater than the coding sequence component."

Although we could not predict in detail all the ways repeated DNA would serve genome functions, I think our statements stand up well in light of the recent data. Without knowing the specifics, we were correct in asserting that the genome had to be highly formatted to serve as the marvelous information organelle it is in every living cell and organism.

So, while Rick's evolutionary philosophy differs from mine, I am grateful to him for doing so much work on a paper that remains a source of justified scientific pride. Viewing the genome in informational terms, and mobile DNA as a potent force for genome organization, are ideas central to the arguments presented on this blog and in my book.