Proving Darwin: Fun with Endogenous Retroviruses!
This post is part of a larger series on evolution; see the series index for all the posts.
I used to be a skeptic of evolution. When I first started reading about the issue several years ago, I was intrigued by some of the evidence I found for change over time, and absolutely amazed at all the evolutionary changes that had been observed in the lab and in the wild, mainly because I had never known that any evolution had ever been observed. Still, I was reluctant to believe that humans and chimps had evolved from a common ancestor millions of years ago without absolute proof, or at least without a piece of evidence strong enough to amount to a 99.99999% proof. This was in no small part because (1) I thought that if I were wrong about evolution I might burn in hell, and I didn’t want to take that risk, and (2) I was still in the process of leaving behind the black-and-white, absolutist worldview of my fundamentalist upbringing. One day, while reading the 29 Evidences for Macroevolution, I stumbled upon a piece of evidence so powerful that it put the question of creation vs. evolution beyond all reasonable doubt, even by my somewhat unreasonable standards: the evidence from endogenous retroviruses.
Endogenous retroviruses start out as just that: retroviruses. They infect humans. They infect other species. But they have a trick up their sleeve: when a retrovirus infects a cell, it inserts a DNA copy of its own genome into the host’s DNA! When this happens in a sperm or an egg, the inserted virus (now an endogenous retrovirus) will appear in the DNA of the son or daughter that develops from it. When that child grows up and has children of its own, they can inherit the endogenous retrovirus, and they can pass it on to their children, and so on down the line. (Because a new insertion sits on only one of the parent’s two copies of that chromosome, each child has about a fifty-fifty chance of inheriting it.)
Now here’s the really interesting part, the part you have to pay attention to. Do you know what happens when the same retrovirus infects two different individuals of the same species and becomes an endogenous retrovirus (hereafter abbreviated ‘ERV’) in each? The ERV almost always ends up in a different part of the genome (DNA code) of each one! To illustrate this, let’s say that before the ERV inserted itself, the genome looked like this:
[Gene 1] [Gene 2] [Gene 3] [Gene 4] [Gene 5]
And let’s say that after the ERV got in there, it looked like this:
[Gene 1] [Gene 2] [Gene 3] [ERV] [Gene 4] [Gene 5]
Because the virus inserts itself at an essentially random spot in the genome, a separate ERV infection in another individual might look like this:
[Gene 1] [ERV] [Gene 2] [Gene 3] [Gene 4] [Gene 5]
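The random insertion described above can be mimicked with a toy simulation. This is a minimal sketch under simplifying assumptions of my own: the genome is just a list of gene names, and every gap between (or around) the genes is treated as equally likely, which real retroviruses don’t follow exactly. It also estimates how often two independent infections land in the same gap.

```python
import random

def insert_erv(genes, rng):
    # A genome with n genes has n + 1 gaps: before the first gene,
    # between each neighbouring pair, and after the last gene.
    # This sketch treats every gap as equally likely (an assumption).
    slot = rng.randrange(len(genes) + 1)
    return genes[:slot] + ["ERV"] + genes[slot:]

def show(genome):
    # Render a genome in the bracketed style used in the post.
    return " ".join(f"[{part}]" for part in genome)

def same_slot_rate(n_genes, trials, rng):
    # Fraction of trials in which two independent infections pick the same gap.
    gaps = n_genes + 1
    hits = sum(rng.randrange(gaps) == rng.randrange(gaps) for _ in range(trials))
    return hits / trials

rng = random.Random(1)
genes = [f"Gene {i}" for i in range(1, 6)]
bob = insert_erv(genes, rng)
ryan = insert_erv(genes, rng)
print(show(bob))
print(show(ryan))
print(same_slot_rate(5, 60_000, rng))       # close to 1/6 for a 5-gene toy genome
print(same_slot_rate(30_000, 60_000, rng))  # close to 1/30,001: a handful of hits, if any
```

With only five genes, two infections collide about one time in six; with thirty thousand genes, collisions become rare, which is the point the story relies on.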
I want to tell a story about this that will make it easy to understand, so let’s call the individual with the ERV between genes 3 and 4 “Bob” and the individual with the ERV between genes 1 and 2 “Ryan.” Any of Bob’s kids, grandkids, and great-grandkids who inherit his ERV will carry it between genes 3 and 4, and any of Ryan’s descendants who inherit his ERV will carry it between genes 1 and 2. If we look at future generations of the species that Bob and Ryan belong to (whether we imagine them as humans, kangaroos, crocodiles, or whatever), we will be able to tell which ones are descended from Bob and which ones are descended from Ryan based on whether they have the ERV and where it sits in the genome (between genes 3 and 4 = related to Bob; between genes 1 and 2 = related to Ryan). In fact, in the real world we can identify relationships with surgical precision this way, because ERV insertion into sperm or egg cells doesn’t happen every day: it’s a very rare event. The human genome has tens of thousands of genes (early estimates put the figure at thirty to thirty-five thousand; current counts of protein-coding genes are closer to twenty thousand), and most other plants and animals have similarly large genomes, containing many thousands of genes at the least, so the odds of two different individuals ending up with the same ERV inserted into the same place in their genomes are very low, to say the least. That extremely low probability is what makes shared ERVs such a good way to tell when two individuals descended from a common ancestor.
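Bob and Ryan’s story can be sketched in a few lines of Python. The gap numbers, the names, and the fifty-fifty transmission rule (a new insertion sits on one of the parent’s two chromosome copies) are simplifying assumptions for illustration, not a model of any real population. What matters is that the *position* of the insertion, not just its presence, identifies the founder.

```python
import random

# Hypothetical encoding of the post's diagrams as gap indices:
# gap 3 = between Gene 3 and Gene 4 (Bob), gap 1 = between Gene 1 and Gene 2 (Ryan).
FOUNDERS = {3: "Bob", 1: "Ryan"}

def make_child(parent_slots, rng):
    # Each insertion is on one chromosome copy of the parent, so a child
    # receives each one independently with probability 1/2.
    return {slot for slot in parent_slots if rng.random() < 0.5}

def lineage(slots):
    # Name the founder(s) whose insertion position(s) an individual carries.
    names = sorted(FOUNDERS[s] for s in slots if s in FOUNDERS)
    return " + ".join(names) if names else "no marker"

rng = random.Random(3)
bob_kids = [make_child({3}, rng) for _ in range(4)]
ryan_kids = [make_child({1}, rng) for _ in range(4)]
for kid in bob_kids + ryan_kids:
    print(sorted(kid), lineage(kid))
```

Note that “no marker” doesn’t mean “not descended”: a child can simply fail to inherit the insertion. A carrier, however, can only have gotten that insertion at that position from the founder who first acquired it.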
I must emphasize that this story is not just a story: ERVs really do work this way. Direct observation has shown that retroviruses insert themselves into the genome at effectively random sites and that ERVs are inherited. Some creationists claim otherwise, but a careful reading of the peer-reviewed research on this topic shows they are mistaken (the papers cited by blogger Abbie Smith are especially worth looking at, and she masterfully summarizes what these papers say in plain English).
Various breeds of sheep are thought to have been bred from a common ancestor long ago, and there is plenty of archaeological evidence that helps show the family relationships among these sheep: sheep breeding started out in southwest Asia, then people took some of the Asian sheep to Africa and Europe, and then to the rest of Asia. The modern-day descendants of these ancient sheep, then, are related to greater or lesser degrees depending on when their ancestors were separated from one another. If ERVs really are a good way to tell family relationships, then the family tree we construct from the sheep’s ERVs ought to match the family tree implied by the archaeological evidence of ancient sheep herders and their migrations into various parts of the world. Guess what? That’s exactly what researchers have found (HIV researcher Abbie Smith blogged about these findings here, and you can see the original peer-reviewed paper here).
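Why should shared ERVs reconstruct a family tree at all? Each insertion marks the descendants of whoever first carried it, so the groups defined by different insertions should be nested inside one another or not overlap at all, never partially overlap. The sketch below checks that condition, a standard pairwise compatibility test for presence/absence characters. The breed labels A to E and the carrier sets are invented for illustration; this is not the sheep data, and not necessarily the method the researchers used. (In real data, occasional conflicts can also arise from incomplete lineage sorting.)

```python
from itertools import combinations

def compatible(a, b):
    # Two carrier groups fit on one family tree only if they are nested
    # or disjoint; a partial overlap means no single tree explains both.
    return a <= b or b <= a or not (a & b)

def fits_one_tree(carrier_sets):
    # Every pair of insertions must be compatible for a single tree to exist.
    return all(compatible(a, b) for a, b in combinations(carrier_sets, 2))

# Hypothetical data for five made-up breeds (A-E); each set lists the
# breeds that carry one particular insertion at one particular position.
nested = [frozenset("AB"), frozenset("ABC"), frozenset("DE"), frozenset("ABCDE")]
conflicting = nested + [frozenset("CD")]
print(fits_one_tree(nested))       # True
print(fits_one_tree(conflicting))  # False: {C, D} cuts across {A, B, C}
```

The first data set describes the tree (((A, B), C), (D, E)); the extra insertion in the second set would require C to be closer to D than to A and B, contradicting the other markers.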
Humans and chimps have seven known ERVs in common: the same virus inserted in exactly the same place in the genome, seven times over. This is just what we’d expect if humans and chimps share a common ancestor; evidence like this is close to 100% likely if they do. After all, it would be really weird if humans and chimps came from a common ancestor, but somehow that ancestor (and all of its ancestors, stretching tens of millions of years into the past) avoided all contact with the retroviruses that are so prevalent today (and apparently have been for many thousands of years, as the sheep studies have shown us).
On the other hand, if human beings don’t share a common ancestor with chimps, how likely is the ERV evidence? Humans have about thirty thousand ERVs in their genomes (and presumably chimps have a similar number), and they share at least seven of these with chimps (there may be more that have not been identified yet, but I will assume that these are the only ones, just to be generous towards the creationists, because more than seven would be even deadlier evidence of common ancestry). Let’s assume that all of these ERVs have a ‘preference’ for inserting into some particular part of a gene, like the promoter, but that which gene they insert into is random. Research has found that some, but not all, ERVs have such a ‘preference,’ and if the ERVs shared by humans and chimps did not have one, separate ancestry would be even more unlikely: without a preference, two insertions must match both the gene and the exact spot within it, and the probability of that is necessarily lower than the probability of merely landing in the same gene. In other words, the probability of two ERVs both landing exactly in the center of ‘gene 5’ is much lower than the probability of both landing somewhere in ‘gene 5.’ Assuming a preference for a part of the gene, but not for any particular gene, is fair: no ERV studied so far has shown a ‘preference’ for any particular gene, and research has repeatedly found the opposite (just check a library database or the papers I cited previously).
Anyway, if humans and chimps don’t share a common ancestor, what would we expect? If humans and chimps both contracted the same retrovirus today, the probability of that ERV inserting into the same gene in both is one in thirty thousand, because (taking thirty thousand as a round figure) there are thirty thousand genes and the gene the ERV inserts itself into is random. That is to say: if humans and chimps were exposed to the same virus thirty thousand times, we’d expect them to share about one insertion due to chance rather than ancestry. The human genome has about thirty thousand ERV insertions in it (see references here), so if common ancestry weren’t true we’d predict that humans and chimps might share one ERV. Two would be somewhat unlikely, but possible. But humans and chimps share seven. It is obviously a big stretch to say that this could’ve happened without common ancestry, but exactly how big a stretch is it? The probability of one particular ERV landing in the same gene in both species is one in thirty thousand, so the probability of seven particular ERVs all doing so is one in thirty thousand to the seventh power. But we aren’t asking about seven particular ERVs; we’re asking how likely it is that at least seven of the thirty thousand turn out to match. That count follows a binomial distribution (thirty thousand tries, each with a one-in-thirty-thousand chance), which is very close to a Poisson distribution averaging one match. Under that model, the chance of seven or more matches is about 0.00008, or roughly 1 in 12,000. (Simply multiplying by the thirty thousand chances, 30,000 out of 30,000^7, gives 1 in 729,000,000,000,000,000,000,000,000, but that shortcut doesn’t count the possibilities correctly.) Even the 1 in 12,000 figure leans on the generous assumption that each ERV’s preferred spot within a gene is fixed, so that landing in the same gene means landing in the same place; without such a preference, a chance match would have to hit the same position in a genome about three billion DNA letters long, which is vastly less likely still. And they say evolutionists believe in blind chance!
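The arithmetic in this paragraph is easy to check. The sketch below uses the post’s round figures (thirty thousand genes, thirty thousand ERVs, a one-in-thirty-thousand chance per match) and computes two things: the “seven particular matches times thirty thousand chances” figure, and the probability of seven or more matches counted as a binomial distribution, alongside its Poisson approximation. The model is deliberately crude, and its numbers are only as good as those round figures.

```python
from math import comb, exp, factorial

N = 30_000  # the post's round figure for genes, and for ERV insertions considered
p = 1 / N   # chance that one insertion lands in the same gene in both species

# "Seven particular matches" (p**7) times N chances is N / N**7 = 1 / N**6.
# Integer arithmetic keeps every digit of the denominator.
shortcut_denominator = N**7 // N
print(shortcut_denominator)  # 729 followed by 24 zeros, i.e. 30,000**6

def prob_at_least(k, n=N, q=p):
    """P(at least k matches) when each of n insertions matches with probability q."""
    below = sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k))
    return 1.0 - below

def poisson_at_least(k, mean=N * p):
    """The same tail from the Poisson approximation (mean n * q = 1 here)."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(k))

print(prob_at_least(1))     # about 0.63: one chance match is unremarkable
print(prob_at_least(7))     # about 8.3e-05
print(poisson_at_least(7))  # about 8.3e-05, roughly 1 in 12,000
```

The binomial sum counts every way of choosing which seven (or more) of the thirty thousand insertions match, which is why it comes out so much larger than the shortcut.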
How do creationists deal with evidence like this? Very poorly. Abbie Smith has already taken care of most of their desperate attempts to deal with this evidence, so I won’t repeat anything she says here. Go read her post. I will take care of two claims that she missed. First, one intelligent design proponent, Cornelius Hunter, has said this:
“[Retroviruses] occasionally violate the evolutionary pattern. Apparently they are not quite such ‘perfect tracers of genealogy.’ To be sure, such outliers are unusual, but if they can be explained [without inheritance] then so can the others…”
This is very revealing. Hunter claims that some ERVs and other genetic markers of ancestry ‘occasionally violate’ evolutionary predictions, but acknowledges that these are ‘outliers’ and ‘unusual.’ Even if Hunter were right about this much, it’d be cold comfort to creationists like him. After all, when the majority of a theory’s predictions are confirmed, it’s much more parsimonious to assume that apparently conflicting evidence is just that: apparent, with some reasonable explanation. Think of it like this: suppose we want to know whether a student, Johnny B, has studied for a multiple-choice test. We look at the grade he got on the test to confirm or disconfirm the hypothesis that Johnny studied. Each correct answer adds a little weight to the hypothesis that Johnny B studied, and each wrong answer adds a little weight to the hypothesis that he did not. If Johnny B comes out with a 90% score, then it is likely that he studied, simply because most of the evidence we have (his answers) is better predicted by that hypothesis than by the alternative (that he didn’t study). The 10% of his answers that are incorrect are most likely the result of Johnny forgetting something or misunderstanding the question. To argue the reverse, that those wrong answers prove he didn’t study and that the other 90% are the result of chance, is perverted reasoning that goes against common sense and even basic logic. Yet Hunter wants us to do exactly this.
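Johnny’s test can be made concrete with a likelihood ratio: each answer shifts the evidence toward whichever hypothesis predicted it better. The probabilities below (0.9 per question if he studied, 0.3 if he didn’t, 100 questions) are hypothetical numbers chosen for illustration, not anything from the post.

```python
from math import log

def log_likelihood_ratio(correct, wrong, p_studied, p_not):
    """Natural-log weight of evidence for 'studied' over 'did not study'.

    Each correct answer adds log(p_studied / p_not); each wrong answer
    adds log((1 - p_studied) / (1 - p_not)), which is negative here.
    """
    per_right = log(p_studied / p_not)
    per_wrong = log((1 - p_studied) / (1 - p_not))
    return correct * per_right + wrong * per_wrong

# Hypothetical numbers: 100 questions; a student who studied gets each one
# right with probability 0.9, one who did not with probability 0.3.
llr = log_likelihood_ratio(90, 10, 0.9, 0.3)
print(round(llr, 1))  # 79.4, i.e. odds of roughly e**79 in favour of 'studied'
```

The ten wrong answers really do count against “studied” (about −19.5 on this log scale), but the ninety right ones count for it by about +98.9, so the net verdict still overwhelmingly favors studying. Treating the ten misses as decisive means ignoring most of the evidence.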
Worse than that, the one piece of ERV evidence that Hunter claims runs counter to common ancestry is actually completely consistent with it. The apparent conflict results from a phenomenon known as incomplete lineage sorting: when an ERV insertion was still present in only some members of an ancestral population, the descendant species can end up keeping or losing it in a pattern that doesn’t match the species tree. If you’re interested, there’s a video explaining Hunter’s claim and what’s wrong with it (the video’s author describes incomplete lineage sorting but does not name it). A result that could not be explained by incomplete lineage sorting would be an ERV stuck in the same place in widely diverged species but absent from more closely related species: for example, an ERV in the same place in the human and zebrafish genomes, but absent from all other mammalian genomes.
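Incomplete lineage sorting can be sketched with the standard three-species coalescent model. Take the species tree ((human, chimp), gorilla) and trace one stretch of DNA back from each species. If the human and chimp copies don’t merge within their shared ancestral population, all three copies reach the older ancestral population, where any pair is equally likely to merge first, so some stretches of DNA (including any ERV insertion that was still polymorphic at the time) end up grouping human with gorilla. The branch length `t`, measured in coalescent units (generations divided by twice the effective population size), is a hypothetical value, not an estimate for real apes.

```python
import math
import random

def first_pair_to_merge(t, rng):
    """Which two species' copies of one DNA stretch merge first, going back in time.

    Species tree ((H, C), G); t is the length of the H-C ancestral branch in
    coalescent units, where two lineages merge at rate 1 per unit.
    """
    if rng.expovariate(1.0) < t:
        # Human and chimp copies merged inside their shared ancestral population.
        return ("C", "H")
    # Otherwise all three copies reach the older ancestral population, where
    # each of the three pairs is equally likely to merge first.
    return rng.choice([("C", "H"), ("G", "H"), ("C", "G")])

def expected_share(pair, t):
    """Analytic probability of each outcome under this model."""
    discordant = math.exp(-t) / 3
    return 1 - 2 * discordant if pair == ("C", "H") else discordant

t = 0.5  # hypothetical branch length
rng = random.Random(2024)
loci = 100_000
counts = {}
for _ in range(loci):
    pair = first_pair_to_merge(t, rng)
    counts[pair] = counts.get(pair, 0) + 1
for pair in [("C", "H"), ("G", "H"), ("C", "G")]:
    print(pair, counts.get(pair, 0) / loci, round(expected_share(pair, t), 3))
```

With `t = 0.5`, about 20% of loci group human with gorilla and another 20% group chimp with gorilla, even though the species tree is never in doubt. That is why a few “misplaced” markers are expected under common ancestry, while a marker shared by humans and zebrafish alone would not be.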
Another way that creationists deal with evidence like this is to admit that it is evidence of common ancestry between chimps and humans, but to object that “It doesn’t prove universal common ancestry!” (that is, it doesn’t prove all species are related, just these two). The truth is, though, that ERVs have been used to establish evolutionary relationships among a broad variety of groups (Douglas Theobald mentions that every member of the feline family has been shown to share at least one ERV, not counting the ERVs they share with other groups of animals), and mammals have multiple ERVs in common. In fact, biologist Sean Carroll has written a wonderful book, The Making of the Fittest, detailing how many genomic elements serve as a “fingerprint” of common ancestry in the same way that ERVs do. While evidence like this does not (to my knowledge) prove the relatedness of every single living species (though other evidence does support universal common descent), it can be used to unite several large groups of plants and animals (see The Making of the Fittest).
An intelligent design theorist might allow that all species within those groups (such as mammals) descended naturally from a common ancestor, but hold back on concluding that all species are related. This is a logical possibility (as long as we ignore the other genetic evidence that blows it out of the water), but it only serves to highlight the weakness of separate ancestry theories. Universal Common Ancestry is easy to test, since by definition it states that all species are descended from a common ancestor and therefore necessarily entails a broad array of predictions about many relationships. Separate Ancestry, though, is much more difficult to test, simply because it can be adjusted whenever two species (or families, or any other groups) are found to be related. Here’s a humorous imaginary conversation with a separate ancestry theorist: “Chimps, humans, and gorillas share a common ancestor, you say? Well, OK, but universal common ancestry isn’t true; I don’t think African apes could share a common ancestor with South American monkeys. You haven’t proven my belief in separate ancestry wrong. New World and Old World primates share a common ancestor? Well, OK, but not all mammals share a common ancestor. You haven’t proven my belief in separate ancestry wrong.” The disproof of any one of those proposed relationships would have falsified universal common ancestry, but the verification of all of them doesn’t disprove separate ancestry, and for that reason separate ancestry is less open to testing and cannot claim to predict the evidence for common ancestry of African apes, mammals, or any other group with as much predictive force as universal common ancestry does. Either it doesn’t predict the evidence as well as universal common ancestry, or it can only be made to fit that evidence if it is specified in advance to allow certain groups to be related in exactly the cases where we happen to have extremely strong evidence that they are.
In other words, evidence like what I have been talking about either lowers the prior probability that separate ancestry is true or lowers how well separate ancestry predicts the evidence.
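This closing dilemma can be illustrated with Bayes’ rule in odds form. In this toy model (all numbers are assumptions for illustration), universal common ancestry predicts that each of `k` tested groupings shows shared markers, while a separate-ancestry view either leaves each grouping as a hypothetical coin flip or is tailored in advance to match the observed groupings, making it just one of `2**k` equally weighted ways to draw the boundaries.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form (universal common ancestry vs separate ancestry)."""
    return prior_odds * likelihood_ratio

def odds_vs_vague(k, even_prior=Fraction(1)):
    # Separate ancestry left open: each of k groupings is a coin flip
    # (hypothetical 1/2), while common ancestry predicts all k (probability 1).
    return posterior_odds(even_prior, 1 / Fraction(1, 2) ** k)

def odds_vs_tailored(k, even_prior=Fraction(1)):
    # Separate ancestry fitted to the observed groupings: it predicts the data
    # as well as common ancestry (ratio 1), but it is only one of 2**k equally
    # weighted ways to draw the boundaries, so its prior shrinks by 2**k.
    return posterior_odds(even_prior * 2**k, Fraction(1))

for k in (1, 4, 10):
    print(k, odds_vs_vague(k), odds_vs_tailored(k))
```

Starting from even odds, both horns land in the same place: four groupings (roughly the number in the imaginary conversation) give odds of 16 to 1 for common ancestry, and ten give 1024 to 1. The vague version pays in how well it predicts the evidence; the tailored version pays in prior probability.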