Sequencing a Genome
Sequencing a genome is an enormous task. It requires not only determining the nucleotide sequence of small pieces of the genome, but also assembling those pieces in the correct order to reconstruct the whole genome. A useful analogy is a jigsaw puzzle made of smaller puzzles: you must first complete each small puzzle and then fit the completed sections together into a much larger picture. Two general strategies have been used to sequence large genomes: clone-based sequencing and whole genome shotgun sequencing (Fig. 1).
In clone-based sequencing (also known as hierarchical shotgun sequencing), the first step is mapping. Researchers first construct a map of each chromosome, placing markers at regular intervals of about 100 kilobases (kb). Then, known segments of the marked chromosomes are cloned in plasmids (which typically carry only small fragments of DNA). One special type of plasmid used for genome sequencing is a BAC (bacterial artificial chromosome), which can carry DNA fragments of about 150 kb. The DNA fragment in each BAC is then broken into small, random, overlapping fragments of about 0.5 to 1.0 kb. Finally, automated sequencing machines determine the order of the nucleotides in each of these small fragments.
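To get a feel for the scale involved, the fragment sizes above can be turned into a rough count of sequencing reads per BAC. The sketch below is only back-of-the-envelope arithmetic: the 150 kb insert and the 0.5 to 1.0 kb fragment range come from the text, while the coverage value (how many times each base is sequenced on average) and the function name `reads_needed` are assumptions chosen for illustration.

```python
import math

def reads_needed(insert_bp: int, read_bp: int, coverage: float) -> int:
    """Reads required so that total sequenced bases = coverage x insert length."""
    return math.ceil(insert_bp * coverage / read_bp)

BAC_INSERT_BP = 150_000  # from the text: a BAC carries about 150 kb
READ_BP = 750            # midpoint of the 0.5 to 1.0 kb fragment range
COVERAGE = 8.0           # ASSUMPTION: each base is sequenced about 8 times on average

print(reads_needed(BAC_INSERT_BP, READ_BP, COVERAGE))  # prints 1600
```

Under these assumptions a single BAC needs on the order of a thousand or more reads, which is why the process depends on automated machines.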
Data management and analysis are critical parts of the process, because the sequencing machines generate vast amounts of data. As the data are generated, computer programs align the sequences of thousands of small fragments and join them wherever they overlap. By repeating this process with the thousands of clones that span each chromosome, researchers can determine the sequences of all the larger clones. Because the map shows the order of the larger clones along the chromosome, the researchers can then join the clone sequences and determine the sequence of each chromosome.
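The idea of aligning and joining overlapping fragments can be illustrated with a toy example. The following is a minimal sketch of a greedy overlap merge, not the software used by any genome project: it assumes error-free reads from a single strand with no repeated regions, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no overlaps left; each entry is a separate contig
        # Join the pair, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping toy reads taken from the sequence ATGCGTACGTTAGC
reads = ["ATGCGTA", "CGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))  # prints ['ATGCGTACGTTAGC']
```

Real data contain sequencing errors, reads from both DNA strands, and repeated sequences, so actual assembly programs are far more elaborate than this sketch.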
Finding the sequence of the smaller clone fragments is relatively easy. The challenge is assembling all the pieces. The public consortium, led in the United States by the National Human Genome Research Institute under Francis Collins, used clone-based sequencing for the human genome. In doing so, the consortium relied heavily on the work of computer scientists to assemble the final sequence.
Whole genome shotgun sequencing skips the mapping step of clone-based sequencing. Instead, it (1) clones millions of the genome's small fragments in plasmids, (2) sequences all of these small overlapping fragments, and then (3) uses computers to find matches and join them together.
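Because the fragments are sampled at random, some parts of the genome are covered many times and others may be missed, which is one reason so many fragments are needed. The simulation below is a rough sketch of that effect on a made-up 2,000-base sequence. All sizes, the random seed, and the function names are assumptions for illustration. The start positions are recorded only so that coverage can be measured; in a real shotgun project they are unknown.

```python
import random

def shotgun_fragments(genome: str, n_reads: int, min_len: int, max_len: int,
                      rng: random.Random) -> list[tuple[int, str]]:
    """Sample random fragments; returns (start position, fragment sequence) pairs."""
    reads = []
    for _ in range(n_reads):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - length)
        reads.append((start, genome[start:start + length]))
    return reads

def per_base_coverage(genome_len: int, reads: list[tuple[int, str]]) -> list[int]:
    """Count how many fragments cover each position of the genome."""
    depth = [0] * genome_len
    for start, seq in reads:
        for pos in range(start, start + len(seq)):
            depth[pos] += 1
    return depth

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(2000))  # toy random "genome"
reads = shotgun_fragments(genome, n_reads=200, min_len=50, max_len=100, rng=rng)
depth = per_base_coverage(len(genome), reads)

print("mean depth:", sum(depth) / len(depth))
print("uncovered bases:", depth.count(0))
```

Raising `n_reads` increases the mean depth and shrinks the uncovered regions, at the cost of more sequencing and more computation to join the fragments.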
Celera Genomics, a private company headed by J. Craig Venter, used the whole genome shotgun approach to sequence the human genome. Although Celera started much later than the public consortium, it completed its draft sequence at about the same time; however, it had the advantage of access to all of the consortium's maps.
Genome sequencing projects now generally combine chromosome mapping with both clone-based and whole genome shotgun sequencing of smaller fragments. The technology developed for sequencing the human genome, including both the DNA sequencing methods and the software and hardware used to assemble the sequences into a genome, has enabled the rapid sequencing of many other genomes.