Teacher resources and professional development across the curriculum

Rediscovering Biology
Sequencing a Genome

Sequencing a genome is an enormous task. It requires not only finding the nucleotide sequence of small pieces of the genome, but also ordering those small pieces together into the whole genome. A useful analogy is a puzzle, where you must first put together the pieces of a smaller puzzle and then assemble those pieces into a much larger picture. Two general strategies have been used in the sequencing of large genomes: clone-based sequencing and whole genome shotgun sequencing (Fig. 1).

Figure 1. Sequencing
In clone-based sequencing (also known as hierarchical shotgun sequencing) the first step is mapping. One first constructs a map of the chromosomes, marking them at regular intervals of about 100 kilobases (kb). Then, known segments of the marked chromosomes are cloned in plasmids (ordinary plasmids can hold only small fragments of DNA). One special type of cloning vector used for genome sequencing is a BAC (bacterial artificial chromosome), which can contain DNA fragments of about 150 kb. Each BAC's insert is then further broken into small, random, overlapping fragments of about 0.5 to 1.0 kb. Finally, automated sequencing machines determine the nucleotide sequence of each of the many small fragments.
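The sizes quoted above give a feel for the scale of the work on a single clone. The following is a rough arithmetic sketch, not a description of any project's actual protocol: the function name `reads_needed` is hypothetical, and the coverage depth of 8 (each base read about eight times on average) is an assumption chosen only for illustration.

```python
import math

def reads_needed(insert_bp, read_bp, coverage):
    """Estimate how many fragments of length read_bp must be sequenced
    so that an insert of insert_bp bases is read `coverage` times on average."""
    return math.ceil(coverage * insert_bp / read_bp)

bac_insert_bp = 150_000  # approximate BAC insert size from the text
for read_bp in (500, 1_000):  # fragment sizes of 0.5 kb and 1.0 kb
    # coverage=8 is an illustrative assumption, not a figure from the text
    print(read_bp, reads_needed(bac_insert_bp, read_bp, coverage=8))
```

Under these assumptions, one BAC calls for a few thousand small fragments, which is why the process depends on automated machines.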

Data management and analysis are critical parts of the process, as these sequencing machines generate vast amounts of data. As the data are generated, computer programs align and join the sequences of thousands of small fragments. By repeating this process with the thousands of clones that span each chromosome, researchers can determine the sequences of all the larger clones. Once they know the order of all the larger clones, the researchers can join the clones and determine the sequence of each chromosome.
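The idea of aligning and joining overlapping fragments can be illustrated with a toy example. This is a minimal sketch of a greedy overlap-merge strategy, assuming error-free fragments that all come from the same strand; it is not the software the sequencing projects used, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b.
    Returns 0 if the best match is shorter than min_len."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments that lie entirely inside another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the pieces stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Three overlapping toy fragments of one short sequence.
contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])
print(contigs)
```

Real assemblers must also cope with sequencing errors, repeated sequences, and reads from both strands, which is a large part of why assembly is the hard step.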

Finding the sequence of the smaller clone fragments is relatively easy. The challenge is assembling all the pieces. The public consortium, led by the National Human Genome Research Institute under Francis Collins, used clone-based sequencing for the human genome. In doing so, it relied heavily on the work of computer scientists to assemble the final sequence.

Whole genome shotgun sequencing skips the mapping step of clone-based sequencing. Instead, it (1) clones millions of the genome's small fragments in plasmids, (2) sequences all of these small overlapping fragments, and then (3) uses computers to find matches and join them together.
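Step (1) can be mimicked on a toy scale by cutting random fragments from a known string and asking how much of it they cover. This is a simulation sketch under simplifying assumptions: the function names, fragment counts, and lengths are hypothetical, and each fragment's start position is recorded only so that coverage can be checked (a real shotgun read carries no such position, which is why step (3) is needed).

```python
import random

def shotgun_fragments(genome, n_fragments, min_len, max_len, seed=0):
    """Cut n_fragments random pieces from genome.
    Returns (start, sequence) pairs; requires max_len <= len(genome)."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    frags = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - length)
        frags.append((start, genome[start:start + length]))
    return frags

def covered_fraction(genome_len, frags):
    """Fraction of genome positions that lie inside at least one fragment."""
    covered = [False] * genome_len
    for start, seq in frags:
        for pos in range(start, start + len(seq)):
            covered[pos] = True
    return sum(covered) / genome_len

toy_genome = "ACGT" * 250  # a 1,000-base stand-in for a genome
frags = shotgun_fragments(toy_genome, n_fragments=50, min_len=50, max_len=100, seed=1)
print(covered_fraction(len(toy_genome), frags))
```

Raising `n_fragments` pushes the covered fraction toward 1, which illustrates why shotgun projects sequence many more fragments than would be needed to span the genome once.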

Celera Genomics, a private company headed by J. Craig Venter, used this approach to sequence the human genome. Although it started much later than the public consortium, Celera completed its draft sequence at about the same time. It did, however, have the advantage of access to all the consortium's maps.

Genome sequencing projects now generally combine chromosome mapping, clone-based sequencing, and whole genome shotgun sequencing of smaller fragments. The technology developed for sequencing the human genome, both the DNA sequencing methods and the software and hardware used to assemble the sequences into a genome, has made it possible to sequence many other genomes rapidly.



© Annenberg Foundation 2017. All rights reserved. Legal Policy