Discoveries in DNA: What's New Since You Went to High School?
Joel Eissenberg, Ph.D., associate dean for research and professor of biochemistry
and molecular biology at Saint Louis University School of Medicine, shares highlights
from recent decades in the field of genetics.
Scientific and technological advances in the last 50 years have led to extraordinary
progress in the field of genetics, with the sequencing of the human genome as both
a high point and starting point for more breakthroughs to come.
Advances in molecular genetics have propelled progress in fields that deal with inherited
diseases, cancer, personalized medicine, genetic counseling, the microbiome, diagnosis
and discovery of viruses, taxonomy of species, genealogy, forensic science, epigenetics,
junk DNA, gene therapy and gene editing.
For many non-scientists, a recap may be in order. With such rapid progress, the field
has moved well beyond the knowledge covered in many biology classes over the years.
If you took high school biology in the 1960s, you probably learned about DNA as the genetic material and the structure of the
DNA double helix (published in 1953 by Watson and Crick). You may also have learned
about the genetic code, by which the sequences of DNA encode amino acids (worked out
by Nirenberg, Khorana and colleagues during the 1960s).
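As a rough illustration of what the genetic code means in practice, the sketch below reads a DNA coding sequence three bases at a time and looks up the amino acid for each triplet (codon). It is a simplification: only a handful of the 64 codons are included, and the function name `translate` and the example sequence are invented for this illustration.

```python
# A small, deliberately incomplete excerpt of the standard genetic code.
# Keys are DNA codons (coding strand); values are one-letter amino acid codes.
# "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M",                                       # methionine (usual start)
    "TTT": "F", "TTC": "F",                           # phenylalanine
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",   # glycine
    "AAA": "K", "AAG": "K",                           # lysine
    "GAA": "E", "GAG": "E",                           # glutamate
    "GTG": "V",                                       # valine
    "TAA": "*", "TAG": "*", "TGA": "*",               # stop
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        # "?" flags a codon that is missing from this partial table.
        amino_acid = CODON_TABLE.get(codon, "?")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTTTGGCAAAGAGTAA"))  # prints MFGKE
```

Each three-letter word in the DNA maps to one amino acid, and a stop codon ends the protein. A full table would cover all 64 codons.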
If you took high school biology in the 1970s, you probably also learned about cloning (worked out by Herb Boyer, Stanley Cohen
and Paul Berg by 1972) and the potential for recombinant DNA technology to provide
gene therapy, create novel drugs and improve agriculture.
If you took high school biology in the 1980s, you may have learned about the clinical use of recombinant human insulin for diabetes
treatment (approved for the Eli Lilly products in the US by the FDA in 1982). In agriculture,
the use of Agrobacterium tumefaciens as a bacteria-mediated delivery system to transfer
recombinant DNA to crops (developed by Mary-Dell Chilton and colleagues in the 1970s)
marked the advent of GMO foods and other commercial plant products.
Image courtesy of the National Institutes of Health, adapted by Ellen Hutti.
If you took high school biology in the 1990s, you probably learned about the molecular basis for human genetic disorders such
as cystic fibrosis (1989), Huntington’s disease (1993), Duchenne and Becker muscular dystrophy
(1987), and a rapidly growing list of other single-gene disorders, and about the correspondingly
rapid growth in clinical diagnostic technology based on DNA sequence information,
enabling definitive diagnosis, sometimes before the advent of overt symptoms.
If you took high school biology in the ‘00s, you probably heard about the completion of the human genome sequence. The completion
of a “rough draft” was announced by President Bill Clinton and British Prime Minister
Tony Blair in 2000, although a more-or-less complete sequence was only finalized in
2006. You may also have learned that this achievement heralded the arrival of the
age of personalized medicine.
The big breakthrough in decoding the human genome was the invention of technology
to obtain large amounts of DNA sequence, which began in the 1970s with the work of
Ray Wu, Walter Gilbert, Fred Sanger and their colleagues to establish the core strategies
for obtaining continuous sequence information for DNA chains. This included advances
in recombinant DNA technology—for example, the creation of recombinant artificial
chromosomes—combined with semi-automated (invented in the Leroy Hood lab in 1986)
and later automated DNA sequencing. Today, the goal of obtaining the complete sequence
of an individual genome for $1,000 is nearly within reach.
Although it isn’t biology, it must be acknowledged that the human genome sequencing
project also required parallel advances in computer speed and storage to acquire,
store and manipulate billions of nucleotides of DNA sequence. The assembly and analysis
of human tumor cell genomes, many of which contain chromosome deletions, duplications
and insertions, as well as single nucleotide changes, requires immense data storage
capacity and high-speed computation.
The invention of the Polymerase Chain Reaction (PCR) technology by Kary Mullis and
colleagues in 1985 transformed molecular genetics. This had immediate application
for DNA diagnostics, because once a gene implicated in an inherited disorder has been
identified and sequenced in its normal form, PCR could be used to amplify the corresponding
sequences from patient DNA in a matter of hours, with sequencing of the PCR products
to identify the exact molecular mutation in a matter of days at that time, and today
in a matter of hours. PCR has also seen application in the identification of emerging
The findings made by scientists during these decades have led to advances in many
different fields. Explore the following topics to learn more about how breakthroughs
in molecular genetics are being applied in the world.
Thus far, the impact of molecular genetics on human disease has been primarily to
identify which genes are implicated in specific diseases. For single-gene
disorders, mutations have been discovered in hundreds of genes. The current challenge
is to identify which genes contribute to multifactorial conditions like obesity, heart
disease, alcohol dependency, schizophrenia and autism. So far, “genome-wide association
studies” have identified variant DNA sequences showing statistical association with
these and other complex diseases, but demonstrating a mechanistic role for these variants
has proven elusive.
Rapid and inexpensive genome sequencing, together with high-speed informatics and
a large and expanding database of annotated human DNA sequence variants associated
with disease risk, has made possible personalized medicine and customized genetic
counseling. In a particularly famous case, actress Angelina Jolie elected to have
a double mastectomy and her ovaries and fallopian tubes surgically removed when she
learned that she carried a mutation in the BRCA1 gene that predicted an 87 percent
risk of breast cancer and a 50 percent risk of ovarian cancer. Unfortunately, for
the vast majority of genetic cancer associations, removing nonessential tissue is
not an option. However, prior knowledge of an increased risk can result in increased
surveillance, and cancer is most curable when caught early. For example, persons at
increased risk for hereditary non-polyposis colorectal cancer should undergo frequent
colonoscopies to identify and remove pre-cancerous colon polyps before they turn into
full-blown colon cancer. Since 23andMe began offering direct-to-consumer marketing
of genomic sequencing, over a dozen companies now offer various forms of genome sequencing
A healthy intestine carries at least as many microorganisms as there are human cells
in the entire body (older estimates put the ratio as high as 10 to 1, while more recent
ones are closer to 1 to 1). The metabolic activity of these microorganisms can significantly
impact health. For example, their metabolic activity is an important source of biotin
(vitamin B7), and the composition of gut microflora can shape the immune response,
leading to sensitivity or resistance to allergies and autoimmunity. The availability
of a large number of complete microbial genomes and the technology of high-volume
DNA sequencing have enabled the genotyping of gut microbiomes under different dietary
and health conditions, leading to new, detailed understanding of the differences between
healthy and unhealthy gut microflora. Direct-to-consumer gut microflora sequencing
services are currently available, although the benefits of this knowledge for otherwise
healthy people are currently limited.
Rapid DNA sequencing has displaced more time-consuming, costly and less specific culture
or antibody assays in the detection, classification and diagnosis of infectious disease.
In 2003, DNA sequencing at the Michael Smith Genome Sciences Centre in Vancouver and
at the Centers for Disease Control and Prevention in Atlanta showed that the virus causing an outbreak
of severe acute respiratory syndrome (SARS) in China was a previously unknown coronavirus.
This made the epidemic easier to track and contain. More recently, sequences of 99
Ebola virus genomes taken from 78 confirmed patients at different points in their
infections helped to map the origin and spread of the virus during the 2014 outbreak
in Sierra Leone. This epidemiological information should aid in containment efforts
in future outbreaks.
Most high school biology students learn some basic animal and plant taxonomy. The
foundation of classical taxonomy is morphology. With the availability of genome sequences
from representative species in whole phyla, rigorous quantitative measurements of
genetic distance based on DNA sequence divergence have been used to test existing evolutionary
trees and to re-classify organisms in all kingdoms. For example, large-scale taxonomic
DNA sequence comparisons have established more rigorous relationship trees and taxonomic
distances for the large and diverse class Aves (birds) and the phylum Arthropoda.
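One of the simplest measures of sequence divergence is the proportion of aligned positions at which two sequences differ, often called the p-distance. The sketch below assumes the sequences are already aligned and of equal length. The three short sequences and their labels are made up for illustration, and real taxonomic studies use far longer alignments and more elaborate substitution models.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical aligned fragments of the same gene from three species.
species = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAA",
    "species_C": "TCGAACCTAA",
}

# Print the distance for every pair; smaller values suggest closer relatives.
for (name_1, seq_1), (name_2, seq_2) in combinations(species.items(), 2):
    print(name_1, name_2, p_distance(seq_1, seq_2))
```

In this toy data set, species A and B differ at one position in ten while A and C differ at four, so a tree built from these distances would group A with B.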
Human genome sequencing provides much more detailed and specific genealogical information.
Several commercial services will provide information on likely ancestry based on combinations
of DNA sequence variants known to be rare or prevalent among people originating from
different regions of the world. However, it should be noted that one sometimes unwelcome
outcome of genome sequencing for pedigree or genealogical purposes is the discovery
of non-paternity. While rates vary widely between different populations, they have
ranged between 2 and 30 percent in specific studies.
Forensic science is increasingly turning to DNA sequencing to implicate or exonerate
potential culprits and to identify remains. In such cases, it is usually sufficient
to sequence only a subset of genomic DNA representing regions found to be most variable
among individuals. This approach circumvents the much higher cost of sequencing and
data management for whole genome sequencing, while providing sufficient specificity
for forensic purposes.
High school biology students are taught Mendel’s laws of genetic inheritance. The
first law states that a genetic trait is transmitted from one generation to the next
without modification, even when it is recessive and not evident in carriers. The idea
that a genetic trait could be modified by the life experience of its carrier is a
violation of this law and is typically dismissed as a Lamarckian fallacy in high school biology.
Biochemists have long known that human DNA (as well as the DNAs of many microbes, plants
and animals) contains other bases besides the canonical adenine, cytosine, guanine
and thymine (ACGT). In human chromosomes, 3 to 5 percent of the cytosine bases are
actually a modified form of cytosine called 5-methyl cytosine. This modification is
usually associated with repression of genes in humans. Importantly, the extent of
modifications can differ at the same gene in the sperm and egg, such that the expression
of dad’s copy will be different from mom’s copy in the child that inherits them. This
phenomenon is called “imprinting.” Parental imprinting is important for genetic health,
as failure of this imprinting underlies disorders such as Prader-Willi, Angelman,
Beckwith-Wiedemann and Silver-Russell syndromes.
The transmission of different states of gene expression through multiple cell divisions
and across generations has been termed “epigenetic,” since the underlying DNA sequence
is identical in both states. While the sequence of the four bases of DNA (adenine, cytosine, guanine
and thymine) cannot be altered by a parent’s life experiences, scientists have discovered
that one of the bases, cytosine, can be chemically modified to differing extents
as a result of environmental factors. Techniques have been developed over the last
decade to allow for “epigenomics,” the genome-wide characterization of methylation
patterns. Epigenomics is presently being used to identify inherited predisposition
to obesity, diabetes, cardiovascular disease, addiction and psychiatric disorders,
as well as markers for aging and cancer progression.
Drugs that inhibit or stabilize epigenetic marks are in clinical use for cancer and
are being tested for other indications like sickle cell disease.
For over 50 years, it has been known that among multicellular animals and plants,
the size of genomes can vary over orders of magnitude in ways that are not explained
by the apparent complexity of the organisms specified by those genomes. With the availability
of whole genome sequencing, it has become evident that, for example, the number of
genes in humans is not much greater than that of the fruit fly. Much of our genome
consists of repetitive DNA sequences, transposable elements and relicts
of genes and transposons with no discernible function. This DNA has been termed “junk
DNA” to convey this apparent lack of function.
In the past 15 years, however, detailed analysis of which regions of DNA are transcribed
into RNA copies has uncovered a significant amount of non-protein-coding RNAs that
serve regulatory functions. So-called microRNAs have been shown to target specific
protein-coding RNAs for destruction or inhibition of protein synthesis. Other RNAs,
termed long noncoding RNAs, also appear to regulate expression of protein-coding RNAs,
but the mechanisms are just now being worked out. That said, this still leaves a majority
of our genome with no apparent function. It seems likely that the mechanisms by which
DNA accumulates in genomes through transposon jumping and genome duplications are not
balanced with a similar rate of sequence elimination, and that the burden of this
unpurged DNA has little or no evolutionary cost.
A major goal for gene sequencing has always been as a platform for the design and
implementation of gene therapy. The first gene therapy was bone marrow engraftment
for the treatment of leukemia and other blood cancers. The first human bone marrow
transplant was performed in 1956 by E. Donnall Thomas. In these therapies, the patient’s
own blood-producing bone marrow cells are destroyed with radiation and chemotherapy,
and bone marrow from a twin or closely matched donor is infused. The idea is that the
ablation will destroy any remnant of the cancer together with the healthy hematopoietic
stem cells, and that the donor’s blood stem cells will populate the patient’s marrow and
regenerate the entire red and white blood cell repertoire from healthy cells. In effect,
this is gene therapy, since the donor’s genes are replacing the patient’s genes in
the blood cell lineages.
Targeted gene therapies, however, had to wait for (1) the identification of the genes
to target, (2) the cloning and/or sequencing of the relevant genes and in some cases,
the specific disease-causing variant, (3) a full understanding of the normal gene
function and regulation, and (4) the development of efficient ways to deliver genes
to the relevant tissues at therapeutic levels. The advent of molecular cloning, DNA
sequencing and the many tools of molecular genetics and cell biology has given us
sufficient knowledge of the basis for disease and the genes to target, but what has
limited the application of gene therapy has been efficient gene delivery systems.
Scientists realized viruses could be the perfect tool to do the work of gene delivery.
They are already designed by nature to insert themselves into our DNA. The first successful
targeted human gene therapy was reported in 2000. It used a virus to deliver a therapeutic
gene to treat X-linked severe combined immunodeficiency. The deck was stacked in favor
of success in the choice of this particular disease since it was known that the therapeutic
gene just had to be expressed in a modest number of blood cells to achieve therapeutic
benefit. The therapeutic gene was carried by an engineered retrovirus that was used
to infect the patient’s blood cells before they were re-injected into the patient.
Thus far, targeted gene therapy successes have been very limited. Other blood disorders
that have shown significant benefit from targeted gene therapy in small trials include
hemophilia (specifically, factor IX deficiency), severe beta-thalassemia (deficiency
for the adult beta-globin gene) and leukemia, where the patient’s immune cells were
treated to enable them to recognize cancer cells and destroy them. Targeted gene therapy
for degenerative blindness caused by Leber congenital amaurosis improved vision for
a few years but failed to arrest the degeneration process.
The first approved commercial targeted gene therapy is alipogene tiparvovec (trade
name Glybera), a virus-mediated therapeutic delivery of human lipoprotein lipase to
the muscle cells of lipoprotein lipase deficiency patients. It was approved for clinical
use in Europe in 2012.
For many genetic disorders, the disease results from the expression of a defective
gene product, not the complete absence of the product. For example, sickle cell disease
and the major form of cystic fibrosis are both associated with abnormal proteins.
In such cases, editing a patient’s own gene to the normal form should provide greater
benefit than merely expressing the normal protein in the presence of the abnormal protein.
For gene editing to work, it is essential to uniquely target a single site among the
3 billion nucleotides in the haploid (single set of unpaired chromosomes) human genome.
In other words, therapeutic gene editing must be able to efficiently edit
the intended target without introducing off-target editing at sites that fortuitously
resemble the intended target.
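A back-of-the-envelope calculation shows why the target sequence has to be fairly long. The sketch below assumes, unrealistically, that the genome is a random string in which each of the four bases is equally likely at every position. Under that assumption a given sequence of length n is expected to occur about G / 4^n times by chance in a genome of G nucleotides. The function names are invented for this illustration.

```python
GENOME_SIZE = 3_000_000_000  # haploid human genome in nucleotides, as stated above


def expected_random_matches(target_length, genome_size=GENOME_SIZE):
    """Expected chance occurrences of one specific sequence of the given length.

    Assumes a random genome with equal base frequencies (a simplification).
    """
    return genome_size / 4 ** target_length


def min_unique_length(genome_size=GENOME_SIZE):
    """Shortest target length expected to occur less than once by chance."""
    n = 1
    while 4 ** n <= genome_size:
        n += 1
    return n


print(min_unique_length())  # prints 16
for n in (12, 16, 20):
    print(n, expected_random_matches(n))
```

Under this simplified model, 16 nucleotides is the shortest target expected to occur less than once by chance. Real genomes are highly repetitive, so practical designs use longer targets and still have to be checked for near matches elsewhere in the genome.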
Two targeting strategies are currently under development—protein-based targeting and
RNA-based targeting. In both cases, the idea is to target an enzyme that cuts both
strands of the double helix at a specific site. If the goal is to inactivate the target
gene, the creation of the break is sufficient to trigger cellular mechanisms that
lead to error-prone repair and inactivating mutations. True editing—the replacement
of bad sequence with good sequence—requires the simultaneous introduction of DNA fragments
containing good sequence into the same cells.
Protein-based targeting strategies rely on custom-engineered modular proteins that
recognize and bind to specific DNA sequences. One approach builds on the so-called
zinc finger protein fold first described in sequence-specific transcription factors.
The publicly traded biotech company Sangamo BioSciences was founded in 1995 to exploit
zinc finger protein engineering for gene therapy and agricultural genetic engineering.
In this approach, a series of zinc finger modules, each chosen to recognize a specific
3-nucleotide motif, are fused in tandem to one another and to a nuclease subunit.
Another protein-based targeting strategy, TALENs, is based on transcription activator-like
effectors (proteins secreted by Xanthomonas bacteria). In this case, repeating modules of
33 to 34 amino acids, each specific for one of the four bases in DNA, are fused in
tandem to create the targeting peptide.
More recently, the clustered regularly-interspaced short palindromic repeats (CRISPR)
prokaryotic immunity mechanism has been exploited for gene editing. In this case, specific DNA sequences are targeted by RNA-DNA hybridization, which
directs the Cas9 enzyme to cleave DNA at the target site. This mechanism was only
worked out in 2007, but has already emerged as the front-runner technology for gene
editing, due to its relative simplicity and high efficiency. In human cells,
zinc-finger- and TALEN-mediated editing achieves efficiencies of 1 to 50 percent,
while CRISPR-Cas9 editing has been reported to have efficiencies of up to 78 percent
in single-cell mouse embryos. Exciting clinical applications of gene editing include
correcting the mutation in the bone marrow stem cells of patients with sickle cell
disease or hemophilia.
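The targeting step can be sketched as a simple string search. The example below makes several assumptions that go beyond the text above: it models the widely used Streptococcus pyogenes Cas9, which needs a short "NGG" motif (the PAM) immediately after a 20-nucleotide target; it writes the guide as DNA letters rather than RNA; and it scans only one strand. The sequences and the function name `find_cas9_sites` are invented for illustration.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_cas9_sites(dna, guide, max_mismatches=0):
    """Return (position, mismatches) for each candidate Cas9 target site.

    A site qualifies if it matches the guide with at most max_mismatches
    differences and is immediately followed by an NGG PAM.
    Only the strand given in `dna` is scanned (a simplification).
    """
    dna, guide = dna.upper(), guide.upper()
    n = len(guide)
    sites = []
    for i in range(len(dna) - n - 2):
        pam = dna[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue  # no PAM here, so Cas9 would not cut
        mismatches = count_mismatches(dna[i:i + n], guide)
        if mismatches <= max_mismatches:
            sites.append((i, mismatches))
    return sites


# Hypothetical 20-nucleotide guide and a toy "genome" containing the intended
# target plus a look-alike site that differs at two positions.
guide = "GACGTTACCGGATCAAGCTT"
look_alike = "GATGTTACCGCATCAAGCTT"
genome = "TTT" + guide + "TGG" + "AAAC" + look_alike + "AGG" + "CCC"

print(find_cas9_sites(genome, guide))                    # prints [(3, 0)]
print(find_cas9_sites(genome, guide, max_mismatches=2))  # prints [(3, 0), (30, 2)]
```

Allowing a couple of mismatches turns up the look-alike site, which is the kind of off-target location that guide design tools try to avoid.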
One of the most exotic applications of genetic engineering to be proposed is the resurrection of woolly mammoths. A project underway at Harvard University, under the direction of geneticist Dr.
George Church, seeks to edit the elephant genome to create a cold-tolerant mammoth-like
chimera using CRISPR/Cas9 technology. This strategy is enabled by the complete sequencing
of the mammoth and elephant genomes, and by gene annotations that suggest which
target genes are most likely to confer the mammoth adaptations to extreme cold when
edited into elephant DNA. The stated goal is to repopulate the tundra and boreal forest in Eurasia and
North America, to protect endangered Asian elephants and to revive an ancient grassland
in the tundra, with the hope of forestalling the melting of Siberian permafrost.