Genomics-Based Predictions of Cellular Proteins
We now have large databases of gene sequences, predicted protein sequences, and known 3D protein structures, yet we still do not know the complete protein composition of a cell. Determining a cell's proteome, its full set of proteins, is a complicated task. There are two broad approaches to obtaining this information: computer-based and experimental.
The computer-based approach uses an organism's genome sequence to predict genes from known features of protein-coding regions. (See the Genomics unit for a discussion of computer-based methods for gene identification and of microarrays for identifying expressed genes.) However, even when we know that a particular sequence is a gene, we do not necessarily know every protein it can produce.
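One of the simplest features of a protein-coding region is an open reading frame (ORF): a start codon (ATG) followed, in the same frame, by a run of codons ending at a stop codon. The sketch below scans only the three forward reading frames of a DNA string for such ORFs. It is a simplified illustration, not one of the gene-finding methods the Genomics unit describes: the function name `find_orfs` and the `min_codons` cutoff are hypothetical, and real gene finders also scan the reverse strand, use statistics such as codon usage, and (in eukaryotes) must model introns.

```python
# Simplified ORF scan: forward strand only, standard start/stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon (stop included).
    Later ATGs inside an open ORF are treated as internal codons, not new starts.
    ORFs shorter than `min_codons` codons (a hypothetical cutoff) are discarded.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; i < len(dna) - 2 guarantees a full 3-base codon.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))  # → [(2, 14)]
```

Here the ORF `ATGAAATTTTAG` lies in the third reading frame; a two-codon candidate such as `ATGTAA` would be dropped by the default cutoff.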
One reason is that a single gene may produce more than one mRNA. RNA splicing is the normal process that removes intron sequences from the pre-mRNA, producing a mature mRNA made up of the joined exons. However, some transcripts can be spliced in alternative ways (alternative splicing) that join different combinations of exons (Fig. 3), yielding two or more different mRNA molecules from one gene. Protein variants produced by alternative splicing may have a similar physiological activity, a different and unrelated activity, or no activity at all. According to one early estimate, about forty percent of human genes are alternatively spliced; later transcriptome studies put the figure above ninety percent of multi-exon genes. Alternative splicing is one mechanism that helps explain how a relatively large number of proteins can be produced from only about 35,000 human genes (an early figure; current counts of human protein-coding genes are closer to 20,000).
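A quick way to see why alternative splicing multiplies the number of possible products is to enumerate the mRNAs that one splicing pattern, exon skipping, can generate. The sketch below assumes a hypothetical gene whose exons keep their genomic order and whose "cassette" exons can each be independently included or skipped; with n such exons there are up to 2^n exon combinations. The function name `cassette_isoforms` and the exon labels are illustrative only, and the model ignores other splicing patterns (alternative splice sites, mutually exclusive exons, intron retention) as well as whether each combination is actually made or encodes a functional protein.

```python
from itertools import combinations


def cassette_isoforms(exons: list[str], optional: set[int]) -> list[tuple[str, ...]]:
    """Enumerate exon combinations produced by exon skipping.

    `exons` lists exon labels in genomic order. Exons whose indices are in
    `optional` may each be included or skipped; all others are always kept.
    Each returned tuple is one mRNA, i.e. the exons that are joined together.
    """
    opt = sorted(optional)
    isoforms = []
    # Skip 0, 1, ..., len(opt) of the optional exons, in every possible way.
    for k in range(len(opt) + 1):
        for skipped in combinations(opt, k):
            kept = tuple(e for i, e in enumerate(exons) if i not in skipped)
            isoforms.append(kept)
    return isoforms


if __name__ == "__main__":
    # Hypothetical four-exon gene in which exons E2 and E3 are cassette exons.
    for iso in cassette_isoforms(["E1", "E2", "E3", "E4"], {1, 2}):
        print("-".join(iso))
    # Prints E1-E2-E3-E4, E1-E3-E4, E1-E2-E4, E1-E4 (one per line)
```

Two optional exons already give four distinct mRNAs from one gene, which is the combinatorial effect the paragraph above describes.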
A more direct, experimental approach is to identify proteins in a cell by measuring enzyme activities and other functions for which biochemical assays exist. In some cases, we can also assign functions to newly predicted proteins by combining our knowledge of metabolic pathways in many organisms with the functions predicted from genome analysis. With this type of information, researchers can often identify new enzymes. To do this, they look for predicted proteins whose sequences resemble those of known enzymes, and they check whether the same genome also encodes the proteins required for the other steps of the metabolic pathway.
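The two lines of evidence in this paragraph, sequence similarity to a known enzyme and the presence of the rest of the pathway in the same genome, can be combined in a small "gap-filling" sketch. Everything here is an assumption for illustration: the function names, the input layout, and the thresholds `min_present` and `min_sim` are hypothetical, and the shared k-mer (Jaccard) score is a crude stand-in for the alignment-based similarity searches (such as BLAST) used in practice.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard index of shared k-mers: a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def fill_pathway_gaps(pathway, annotated, unannotated, reference,
                      min_present=0.5, min_sim=0.3):
    """Propose candidate proteins for missing steps of a metabolic pathway.

    pathway:     list of step names (enzyme functions) in the pathway
    annotated:   set of steps already assigned to a protein in this genome
    unannotated: dict protein_id -> sequence for proteins of unknown function
    reference:   dict step -> sequence of a known enzyme from another organism
    Returns dict step -> best-scoring protein_id for each missing step.
    """
    present = [s for s in pathway if s in annotated]
    # If most other steps are missing, the pathway may simply be absent.
    if len(present) / len(pathway) < min_present:
        return {}
    candidates = {}
    for step in pathway:
        if step in annotated or step not in reference:
            continue
        scored = [(similarity(seq, reference[step]), pid)
                  for pid, seq in unannotated.items()]
        if scored:
            best_score, best_pid = max(scored)
            if best_score >= min_sim:
                candidates[step] = best_pid
    return candidates
```

With steps 1 and 3 of a three-step pathway already annotated, an unannotated protein closely resembling a known step-2 enzyme would be proposed for step 2; if no steps were annotated, the sketch would propose nothing, reflecting the idea that the surrounding pathway supports the assignment.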