A junkyard for natural selection?
A major paper was just published in Nature. "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project" was written by a consortium involving contributions from many scientists. I will discuss a few of their more interesting findings, related to questions like "how much of our DNA is doing something useful?"
They found that "the majority of its [DNA] bases are associated with at least one primary [RNA] transcript." We know that only a small fraction of our DNA gets translated into protein (by ribosomes using information from messenger RNA), but this result shows that most of it gets turned into something like messenger RNA anyway. Much of the DNA that doesn't code for protein consists of repetitive sequences. These could be considered "extra selfish" genes, as their main activity seems to be making copies of themselves and inserting the copies randomly. (By "randomly", I mean random with respect to the effects on the health of the individual affected. They may not be random in some biochemical sense.)
Self-replicating DNA sequences may sometimes have beneficial side effects for the species, if not for individuals. For example, a repetitive sequence may insert itself somewhere in a way that changes gene regulation, such as which proteins are made when. Usually, a random change will have a negative effect on survival and reproduction (bad for the individual), but the occasional positive changes will increase by natural selection (good for the species). So repetitive DNA can be useful in the same sense that mutations caused by radiation or chemicals are useful. See this discussion at vwxynot.
What do RNA transcripts do if they aren't translated into protein but aren't limited to self-replication either? This is an active area of research. Some serve important functions, including regulation of other genes. In some cases, RNA that doesn't code for protein but serves other functions may be a remnant from the hypothetical "RNA world", where RNA once served both as genetic material (a role now played by DNA) and as enzymes speeding chemical reactions (a role now played mainly by proteins). For example, much of a ribosome is RNA.
Of course, if some DNA that doesn't code for protein turns out to serve an important function, that doesn't disprove the hypothesis that most of it is "junk." By junk, I mean that you can change most of its sequence without adverse effects. If, on the other hand, it turns out that very little of our DNA is junk, then one of my favorite examples of "stupid design" would fall, although there are many others. (We don't expect perfection from natural selection, in contrast to a hypothetical omniscient designer. It might be harder to distinguish the products of natural selection from those of a busy committee with a lot of other projects on its agenda, and with members varying in competence!)
This paper doesn't show that junk DNA is rare, however. In fact, the authors say that only "5% of the bases in the genome can be confidently identified as being under evolutionary constraint in mammals." In other words, there is "a large pool of neutral elements that are biochemically active but provide no specific benefit to the organism." They go on to say that these "may serve as a 'warehouse' for natural selection." Any DNA sequence can serve as raw material for natural selection, I guess, just as a junkyard can yield useful materials for a tinkerer. But duplication and modification of existing functional genes (either protein-coding or regulatory) is more likely to yield something useful than a bunch of self-replicating junk DNA is. See this related discussion from Sandwalk.
Another finding: "Regulatory sequences that surround transcription start sites are symmetrically distributed, with no bias towards upstream regions." I would have thought that DNA upstream of where transcription into RNA starts would be more likely to have a role in regulation than DNA downstream, but no. I should have known better, because my wife (who is always right, supposedly) is working on an interesting example of downstream regulation.