Dr. Rob Dillon, Coordinator

Tuesday, July 15, 2008

Gene Trees and Species Trees

I’ve just returned from the 2008 meeting of the American Malacological Society in Carbondale, where sometimes our science moved forward, and sometimes it seemed as though we were fighting to hold it back. Confusion over molecular phylogenetic techniques - what they can and cannot tell us about evolution - seems to be pervasive in our discipline, and may be growing. During the discussion that concluded the final symposium of the meeting, “Describing Mollusk Species in the 21st Century,” one of our colleagues went so far as to venture, “We all agree that gene trees are the same as species trees, right?” And mine may have been the only voice, of perhaps 100 present, raised in protest.

Our collective confusion seems to stem, at least partly, from a misunderstanding of the process known as “coalescence.” Through the entire week in Carbondale, as would be typical for any meeting of systematic biologists, speakers presented evolution as a branching process, unfolding from the past to the present as a series of random bifurcations to an often mind-numbingly large number of tip sequences sampled today. In 1982, however, J. F. C. Kingman (1) opened a fertile field of theoretical inquiry when he became the first to model evolution from the present to the past, as a random process of binary joining. Kingman called the particular mathematical process involved the "n-coalescent."

Kingman’s initial coalescent model was simple genetic drift viewed backward in time. Assuming a constant population size, random mating, and no selection, he showed that it is possible to describe the probability distributions of the genealogical trees of all the alleles in a population, and the time it would take them to coalesce into a most recent common ancestor. In the last 25 years, theoreticians have developed Kingman's model and explored all three of its primary assumptions in great detail.
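The backward-in-time view can be sketched in a few lines of Python. This is only a minimal illustration, not Kingman's own formulation. It assumes a diploid population of constant effective size N, time measured in generations, and the usual continuous-time approximation in which, while k lineages remain, the wait until the next pair joins is exponentially distributed with mean 4N / (k(k - 1)) generations. The function names and the values of n and N are hypothetical.

```python
import random


def simulate_tmrca(n, N, rng):
    """Simulate the time (in generations) for n sampled alleles to
    coalesce into their most recent common ancestor.

    Assumes a diploid population of constant size N, random mating,
    and no selection (the three assumptions named in the text).
    """
    t = 0.0
    k = n  # number of ancestral lineages still distinct
    while k > 1:
        # Each of the k*(k-1)/2 pairs joins at rate 1/(2N) per generation.
        rate = k * (k - 1) / (4.0 * N)
        t += rng.expovariate(rate)
        k -= 1  # one binary joining event
    return t


def expected_tmrca(n, N):
    """Expected time to the common ancestor: 4N * (1 - 1/n) generations."""
    return 4.0 * N * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)
    n, N, reps = 10, 1000, 20000  # hypothetical sample and population sizes
    mean_t = sum(simulate_tmrca(n, N, rng) for _ in range(reps)) / reps
    print(f"simulated mean TMRCA: {mean_t:.0f} generations")
    print(f"expected TMRCA:       {expected_tmrca(n, N):.0f} generations")
```

Note that most of the expected waiting time is spent on the final joining of the last two lineages (2N of the roughly 4N generations), which is one reason deep coalescence can span more than one speciation event.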

The effects of population subdivision on the coalescent process, for example, are vividly demonstrated by the results of W. B. Jennings & S. V. Edwards, published in the September 2005 issue of Evolution (2). This work initially escaped my attention, and perhaps the attention of many of our colleagues, because (Shame on me!) the research deals with birds, the opposite of mollusks.

Overcoming their poor choice of study organism, however, Jennings & Edwards reviewed a variety of observations from historic biogeography, morphology, and behavior to suggest that two Australian species of grass finch, Poephila acuticauda and P. hecki, are sister species, diverged more recently from each other than from Poephila cincta. The authors then sequenced single copies of genes from 30 anonymous nuclear loci for the three species, and obtained the expected acuticauda/hecki sister relationship in 16 cases. Their sequence data suggested that acuticauda and cincta were sister species in 7 cases, hecki and cincta were sister species in 5 cases, and yielded ambiguous results for the remaining two genes, for an overall success of 16/30 = 53%.

My attention was called to these results as I was struggling through Chapter 5 of John Wakeley’s new book, “Coalescent Theory, An Introduction" (3). Quoting Wakeley directly (p 164), “A fundamental realization is that gene genealogies (often called gene trees) are not identical to phylogenies, or species trees.” The probability that the two types of trees are discordant is a function of the internode time, T. This is not the total time since the origin of the taxa being examined, nor the time since they diverged, but the internodal time during which they were diverging (Wakeley's Figure 5.4, at left).

The phenomenon has been termed "incomplete lineage sorting," and has been considered a problem by phylogenetic systematists (4). But evolutionary scientists consider such sequence polymorphism a potential source of important data. Given a discordance of 12/30 and a generation time of one year, Jennings & Edwards estimated T from the divergence of the outgroup P. cincta to the divergence of the sister species acuticauda and hecki to be about 300,000 years.
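The link between discordance and T can be made concrete with the standard three-species result under the neutral coalescent: if T is the internode time measured in units of 2N generations (N being the effective size of the ancestral population), a gene tree disagrees with the species tree with probability (2/3)exp(-T). The sketch below simply inverts that relationship. It is an illustration under those assumptions, not a reconstruction of the method Jennings & Edwards actually used, and the function names are hypothetical. Converting T to years requires an estimate of the ancestral N, which is not given here, so it is left as a parameter.

```python
import math


def discordance_probability(T):
    """Probability that a three-taxon gene tree disagrees with the
    species tree, for internode time T in units of 2N generations."""
    return (2.0 / 3.0) * math.exp(-T)


def internode_time_from_discordance(p_discordant):
    """Invert P = (2/3) * exp(-T) to estimate T from an observed
    proportion of discordant gene trees."""
    if not 0.0 < p_discordant <= 2.0 / 3.0:
        raise ValueError("proportion must lie in (0, 2/3] under this model")
    return -math.log(1.5 * p_discordant)


def coalescent_units_to_years(T, N_e, generation_time_years=1.0):
    """Convert T to years. N_e (ancestral effective population size)
    must be supplied by the user; no value is assumed here."""
    return T * 2.0 * N_e * generation_time_years


if __name__ == "__main__":
    # 12 of 30 loci were discordant (7 + 5), as reported in the text.
    T = internode_time_from_discordance(12 / 30)
    print(f"T = {T:.3f} coalescent units")
```

With the 12/30 figure quoted above, T comes out to about 0.51 coalescent units. Excluding the two ambiguous loci (12/28) would give a somewhat smaller T.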

At least as important as understanding the difference between a gene tree and a phylogeny, however, is understanding the difference between a phylogeny and a model of biological speciation. Ornithologists did not recognize the three sets of finch populations by the specific nomena acuticauda, hecki, and cincta because of any of the 30 genes studied by Jennings & Edwards. Rather, the species were described on the basis of the plumage, song, and other behaviors by which the birds distinguish each other. Traits such as these are the result of natural selection.

Evolutionary science was born, 150 years ago, when a well-studied gentleman from England proposed that speciation might be the result of natural selection. But the assumptions underlying the 30 gene trees made by Jennings & Edwards, and indeed the assumptions underlying every phylogenetic tree shown by every researcher at the AMS meeting in Carbondale, were strictly neutral. So a statement to the effect that “we all agree that gene trees are the same as species trees” is equivalent to saying, “None of us here believes that Darwin foolishness, do we?”

I would suggest that gene trees of the sort typically on display at our recent meeting are best understood as weak, null hypotheses of population relationships. Even in this era of rapid and cheap sequencing, we malacologists do not typically examine multiple individuals per population, or multiple populations per species. And if we sequence more than one gene per individual, we typically concatenate our sequences into a single analysis. Thus our inference regarding the actual evolutionary relationships between the populations represented at the tips of our gene trees is very, very weak.

But given a gene tree, and heaven knows we were given a lot of them in Carbondale, one might well test to see if it corresponds to the species tree. Under any model, neutral or otherwise, it might or it might not. The recent literature includes several papers reporting striking discordance between gene trees and species trees in groups of freshwater snails (5 - 8). On the other hand, Chuck Lydeard, Amy Wethington, and I have found fairly close correspondence between gene trees (CO1 and 16S) and species trees (estimated from experiments measuring both prezygotic and postzygotic reproductive isolation) in the freshwater pulmonate family Physidae (9).

So in conclusion, I object to the statement, “We all agree that gene trees are the same as species trees” for three reasons – it is incorrect, wrong, and bad. It is incorrect under the neutral model, since it neglects the error associated with incomplete lineage sorting, and it has been demonstrated wrong by 150 years of observations on the importance of selection in the speciation process. And it is bad because it’s a science-stopper. The relationship between any particular gene tree and any set of natural populations is a fertile area of inquiry; one which I hope will see renewed interest in the future.

PS - From Kevin Roe:
From: kjroe@iastate.edu
Sent: Thursday, July 17, 2008 11:24 AM
To: Strong, Ellen; Kevin J. Roe
Cc: fwgna@hotmail.com; Dillon Jr., Robert T.
Subject: Re: FW: Gene Trees and Species Trees

To the list members: Although I am not a member of the FWGNA group I want to clarify something, namely the question posed at the AMS meeting in Carbondale that prompted Rob to alert the community to the process of lineage sorting etc. The question I asked was: "Does everyone here agree that gene trees equal species trees?" because to me that was the implication of where the discussion was going. Personally, I do not think that gene trees necessarily equal species trees and therefore am concerned about the use of barcoding for species delineation (intentionally or unintentionally).

Kevin J. Roe
Natural Resource Ecology & Management
Iowa State University
339 Science II
Ames, IA 50011-3221


(1) Kingman, J. F. C. (2000) Origins of the coalescent: 1974 - 1982. Genetics 156: 1461-1463.

(2) Jennings, W. B. & Edwards, S. V. (2005) Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution 59: 2033-2047.

(3) Wakeley, J. (2008) Coalescent Theory, An Introduction. Roberts & Company, Greenwood Village, CO. 326 pp.

(4) Maddison, W. P. (1997) Gene trees in species trees. Systematic Biology 46: 523-536.

(5) Dillon, R. T., Jr. & R. C. Frankis (2004) High levels of mitochondrial DNA sequence divergence in isolated populations of freshwater snails of the genus Goniobasis. Am. Malac. Bull. 19: 69-77.

(6) Lee, T., H. C. Hong, J. J. Kim and D. Ó Foighil (2007) Phylogenetic incongruence involving nuclear and mitochondrial markers in Korean populations of the freshwater snail genus Semisulcospira (Cerithioidea: Pleuroceridae). Molec. Phylog. Evol. 43: 386-397. See my essay of February '08 for more.

(7) Walther, A., T. Lee, J. B. Burch, and D. Ó Foighil (2006) E Pluribus Unum: A phylogenetic and phylogeographic reassessment of Laevapex (Pulmonata: Ancylidae), a North American genus of freshwater limpets. Molecular Phylogenetics and Evolution, 40: 501-516. See my essay of July '07 for more.

(8) Dillon, R. T., Jr. & J. D. Robinson (in press) The snails the dinosaurs saw: are the pleurocerid populations of the Older Appalachians a relict of the Paleozoic Era? JNABS.

(9) The manuscript detailing most of our observations on reproductive isolation is still in preparation. But the gene tree has been published: Wethington, A.R. & C. Lydeard (2007) A molecular phylogeny of Physidae (Gastropoda: Basommatophora) based on mitochondrial DNA sequences. Journal of Molluscan Studies 73: 241 - 257. See my essay of October '07 for more.