11/30/2012 04:15 pm ET | Updated Jan 30, 2013

Why the 'Gene' Concept Holds Back Evolutionary Thinking

Wendell Read just sent me notice of a new paper in Genome Biology linking transposable elements, long intergenic non-coding RNAs, and cell type. This paper highlights difficulties in understanding genome evolution using the conventional idea of "genes."

In the early days of genetics, following the rediscovery of Mendel's principles in 1900, there was a furious effort to define the nature of Mendel's regularly segregating "factors" (a properly neutral term). In 1909, the Danish botanist Wilhelm Johannsen coined the term "gene" ("gen" in German) to denote a fundamental unit of heredity.

Over the following decades, genes took on a theoretical life all their own, as described in Evelyn Fox Keller's 2002 book, The Century of the Gene. In a 1948 Scientific American article, soon-to-be Nobel Laureate George Beadle wrote: "genes are the basic units of all living things."

When Barbara McClintock and Curt Stern demonstrated simultaneously in 1931 that genetic markers of plants (maize) and animals (Drosophila) are located on visible chromosomes, the idea began to crystallize that the genotype consisted of linear arrays of genes strung along chromosomes like "beads on a string."

This notion of the genome as a collection of discrete gene units prevailed when the neo-Darwinian "Modern Synthesis" emerged in the pre-DNA 1940s. Some prominent theorists even proposed that evolution could be defined simply as a change over time in the frequencies of different gene forms in a population.
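To make that definition concrete, here is a minimal sketch of what "a change over time in the frequencies of different gene forms" amounts to when computed. The two generations, the allele labels "A" and "a", and the function names are hypothetical illustrations, not data or methods from any study mentioned here.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the relative frequency of each allele in a list of diploid genotypes.

    Each genotype is a two-letter string such as "Aa" (hypothetical notation).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def frequency_change(before, after):
    """Return the per-allele difference in frequency between two generations."""
    alleles = set(before) | set(after)
    return {a: after.get(a, 0.0) - before.get(a, 0.0) for a in alleles}


# Hypothetical populations of four individuals each.
generation_1 = ["AA", "Aa", "Aa", "aa"]
generation_2 = ["AA", "AA", "Aa", "Aa"]

f1 = allele_frequencies(generation_1)
f2 = allele_frequencies(generation_2)
delta = frequency_change(f1, f2)

print(f1)
print(f2)
print(delta)
```

Note what the calculation assumes: that every individual can be scored for a discrete, countable "gene form" at a well-defined position. That is exactly the bead-like unit the rest of this post calls into question.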

The identification of DNA as the key molecule of heredity and Crick's Central Dogma of Molecular Biology initially seemed to confirm Beadle and Tatum's "one gene -- one enzyme" hypothesis. However, molecular genetics quickly introduced difficulties with the theory of atomistic genes aligned like beads on a string.

A major challenge was Britten and Kohne's 1968 discovery of massive amounts of repetitive DNA in certain genomes. Today, we know our DNA contains over 30 times as many base-pairs in repeats as it does in protein coding sequences. By the conventional view, if genes are the only important actors, then these surprisingly abundant "intergenic" repeats must constitute "junk DNA" and be "ultimate parasites" in the genome.

As readers of this and other science blogs know well, the junk DNA idea has been challenged by the large-scale ENCODE project, designed to produce the "Encyclopedia Of DNA Elements" independently of theoretical prejudices. In its first few years, ENCODE has documented cell type-specific biochemical activity in over 80 percent of this repetitive DNA and known functions in 20 percent.

The basic issue is that molecular genetics has made it impossible to provide a consistent, or even useful, definition of the term "gene." In March 2009, I attended a workshop at the Santa Fe Institute entitled "Complexity of the Gene Concept." Although we had a lot of smart people around the table, we failed as a group to agree on a clear meaning for the term.

The modern concept of the genome has no basic units. It has literally become "systems all the way down." There are piecemeal coding sequences, expression signals, splicing signals, regulatory signals, epigenetic formatting signals, and many other "DNA elements" (to use the neutral ENCODE terminology) that participate in the multiple functions involved in genome expression, replication, transmission, repair and evolution.

Various combinations of coding sequences and signals operate dynamically to produce multiple RNA and protein molecules from a single stretch of DNA. Different regions of RNA and DNA join together to encode protein products. Distant sites in the genome cooperate to control genome expression and replication. Every cellular and organismal trait is "determined" by molecules encoded at numerous genome locations.

A particularly important novelty highlighted by the Genome Biology paper is the unexpected and burgeoning role of so-called "non-coding" RNAs (ncRNAs) in all aspects of genome function. Cells transcribe many functional ncRNAs from so-called "intergenic" regions that had no functional importance according to the genocentric theory.

From an EVO-DEVO point of view, it is important to note that many morphogenetic changes in evolution occur at regulatory sites rather than coding sequences. Moreover, we continue to discover how many of these changes occur "intergenically" and involve supposedly "selfish" mobile elements:

"Of the ~1.1 million constrained elements that arose during the 90 million years between the divergence from marsupials and the eutherian radiation, we can trace 19 percent to mobile element exaptations."

One of the major limitations of the genocentric view is that it does not readily account for the evolution of genomic networks encoding complex traits. Once we take into account both the combinatorial view of genome system architecture and the ability of mobile elements to distribute regulatory motifs in bursts, network evolution becomes far easier to explain.

Conventional thinkers may claim that molecular data only add details to a well-established evolutionary paradigm. But the diehard defenders of orthodoxy in evolutionary biology are grievously mistaken in their stubbornness. DNA and molecular genetics have brought us to a fundamentally new conceptual understanding of genomes, how they are organized and how they function.

Shortly before he passed away, Kurt Vonnegut told a radio interviewer that the public senses something amiss with what they have been told about evolution. Maybe the new, high-tech understanding of genomes will help reverse the disastrously low level in the U.S. of public understanding of evolutionary biology.

NCSE, please take note!