Huffpost Science

James A. Shapiro

How Natural Genetic Engineering Solves Problems in Protein Evolution

A good way to see how natural genetic engineering facilitates the evolutionary process is to review what we have learned about protein evolution. Many of the links in this posting will be to articles from Scientific American, which should be easier for non-biologists to understand.

In the 1940s, the link between genetic information and proteins was established by the "one gene-one enzyme" hypothesis of Beadle and Tatum. This link was the basis for working out the "genetic code" for amino acids. The Central Dogma of Molecular Biology postulated that the two main tasks of DNA sequences were to encode their own replication and the amino acid sequences in proteins. The encoded proteins would determine the characters of cells and organisms.

When this view of DNA function was translated into conventional evolutionary thinking, random mutations were seen as copying errors that altered the DNA sequence one base-pair at a time and, consequently, protein sequences one amino acid at a time. This idea fit with the neo-Darwinian notion of gradual accidental change and provided a molecular picture of how proteins, the working molecules of the cell, could evolve new structures and functions.

A 1985 Scientific American article by Allan Wilson entitled "The Molecular Basis of Evolution" summarizes this one-amino-acid-at-a-time view of protein evolution. Countless evolutionary biology papers have been written based on measuring the frequencies of base changes in DNA coding sequences that either altered the encoded amino acid (non-synonymous mutations) or left it the same (synonymous mutations).
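
To make the synonymous/non-synonymous distinction concrete, here is a minimal Python sketch. It assumes the standard genetic code and two already-aligned coding sequences of equal length; the function names `classify_substitution` and `count_changes` are hypothetical helpers invented for this illustration, not part of any published tool. It counts each differing codon once, which is a simplification of how real studies estimate substitution frequencies.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated with the
# bases in the order T, C, A, G, and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'non-synonymous'."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "non-synonymous"


def count_changes(seq_a, seq_b):
    """Count (synonymous, non-synonymous) codon differences between two
    aligned, equal-length coding sequences (hypothetical helper)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if a == b:
            continue  # identical codons contribute nothing
        if classify_substitution(a, b) == "synonymous":
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


print(classify_substitution("GAA", "GAG"))       # both codons encode glutamate
print(classify_substitution("GAG", "GTG"))       # glutamate becomes valine
print(count_changes("ATGGAAGGT", "ATGGAGGTT"))   # one change of each kind
```

The first call prints `synonymous`, the second `non-synonymous`, and the third `(1, 1)`. Every step in this picture changes at most one amino acid, which is exactly the limitation discussed next.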

The fact that single amino acid changes could not easily account for many aspects of protein evolution was generally ignored. For example, it was difficult to explain on the basis of successive single amino acid substitutions how proteins changed their size, formed completely novel structures, or combined the capacities to bind multiple different molecules. These difficulties were even used by Intelligent Design advocates to argue that proteins could not evolve by natural means.

Our understanding of protein evolution was changed in an unexpected and revolutionary way in the 1970s by two major genomic discoveries. One observation was that many protein coding sequences are not continuous but are broken up into separate pieces called "exons" (expressed sequences), which are separated by "introns" (intervening sequences). This gave rise to the hypothesis that different exons could be combined to make novel proteins.

The second discovery was that proteins are not organized simply as strings of individual amino acids. Certain subprotein regions, called "domains" and composed of dozens to hundreds of amino acids, were found to be present repeatedly. Sometimes the domains were repeated in the same protein, and generally they were found to be present in many different proteins. This modular organization as series of domains was a radically new way to think about protein structure. It fit beautifully with the idea of proteins encoded by different combinations of exons.

By laboratory genetic engineering, it was possible to add domains together in new combinations. The results showed that the domains retained their functions when combined in new ways. Thus, they were truly independent modules capable of rearrangement.

Genome sequence data has abundantly confirmed that domain recombination has been a major source of functional novelty in evolution. The report of the draft human genome in 2001 contained two separate figures illustrating protein evolution by "domain accretion" and "domain shuffling." Protein evolution by domain rearrangement is so much the norm that major databases of domains (rather than of whole proteins) are often the first places bioinformaticians look for clues about the functional significance of new coding sequences.

If we think about how domain accretion and shuffling must happen naturally, we realize that they have to involve cutting and splicing extended DNA sequences encoding the multiple amino acids in each domain. In other words, protein evolution requires natural genetic engineering.

Fortunately, we know how different extended sequences can be spliced together by several different mechanisms in living cells. Some of these mechanisms operate purely at the DNA level, while others involve RNA that is reverse-transcribed to DNA and inserted in the genome. Real-time experiments and genome sequence data both tell us that the two kinds of mechanism can play, and have played, major roles in domain rearrangements during evolutionary history.

From the point of view of generating novelty, the ability to combine different pre-existing functional modules in new combinations is far more efficient than taking a slow random walk through protein sequence space one amino acid at a time. Domain combinations arise in a single cell generation, so the process is far more rapid than accumulating single amino acid changes.

Since the modules each have functionality, it is likely that new combinations will maintain some functionality as well, whereas many amino acid changes will either make no difference or prove detrimental. In addition, there are some functional protein structures that cannot be reached from other structures by changing individual amino acids.

Naturally, the new combined functionalities have to be useful to the cell or organism to survive selection. But combinatorial mathematics tells us that a rather small number of individual domains can generate a very large number of distinct combinations. Moreover, as we shall see in a future posting, natural genetic engineering provides the means to generate new domains that have never existed before. So there is no apparent limit to the versatility of natural genetic engineering.
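
The combinatorial point can be sketched in a few lines of Python. The model here is an assumption made purely for illustration: a protein is treated as an ordered string of domains, any domain may be reused, and all orderings count as distinct. The domain names and the numbers (100 domains, up to 4 per protein) are hypothetical and are not drawn from genome data.

```python
from itertools import product


def count_architectures(n_domains, max_length):
    """Number of ordered domain strings of length 1..max_length, with reuse
    allowed: n + n**2 + ... + n**max_length."""
    return sum(n_domains ** k for k in range(1, max_length + 1))


def enumerate_architectures(domains, max_length):
    """Yield every ordered domain string up to max_length (small inputs only)."""
    for k in range(1, max_length + 1):
        for combo in product(domains, repeat=k):
            yield combo


# A toy check with three illustrative domain names: 3 single-domain
# proteins plus 9 two-domain proteins.
toy_domains = ["kinase", "SH2", "SH3"]
print(len(list(enumerate_architectures(toy_domains, 2))))  # 12
print(count_architectures(3, 2))                           # 12

# With a hypothetical repertoire of 100 domains and at most 4 per protein:
print(count_architectures(100, 4))                         # 101010100
```

Under these assumptions, a hundred domains already allow more than a hundred million distinct arrangements. Only a fraction of them would be useful and survive selection, but the count shows how quickly the space of candidates grows.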

When I pointed out the potential of domain shuffling by natural genetic engineering to Intelligent Design advocates who claimed protein evolution by natural mechanisms was impossible, they refused to recognize genomic data as irrefutable evidence and insisted on real-time experiments. I disagree with them strongly on the DNA sequence data.

But I do agree that it would be good to have experiments revealing more about the natural capacity for protein evolution. The superiority of domain shuffling has been demonstrated in biotechnology manipulations. Now is the time for geneticists to explore the in vivo potential of natural genetic engineering in a more adventurous fashion (about which more later).