Hunter Fraser
Fraser is a UC Berkeley doctoral candidate. His research on the evolution of protein interactions, in collaboration with Aaron Hirsch, was published in the journals Science and Nature.
What question were you trying to answer in your research?
Our study was motivated by our observation that when you look at different proteins, they tend to evolve at very different rates: some evolve very quickly and some evolve very slowly. So we wanted to answer the question: What causes that difference in rates of evolution for different proteins?
Was this theory already out there?
The theory was that more important proteins will tend to evolve more slowly because their evolution will be more constrained. That's because when you get a mutation in an important protein, selection will act to get rid of that mutation from the population if it's affecting the organism. On the other hand, in a less important protein, you're not going to see such a strong effect of selection because the organism doesn't really care when you get a mutation in a not-so-important protein.
The theory had been out there for about 20 years, but it only became possible to test it in the last couple of years with the large amount of functional genomic data that has become available.
So what did you do in your research?
We decided to take a look at several different variables that you can measure with functional genomic data sets and see how they correlate with the rates of evolution of proteins.
One thing we did was to look at the number of protein-protein interactions that each protein has, to see if that has an effect on the rate at which each protein evolves. Our theory was that more protein-protein interactions would cause a protein to evolve more slowly, for two reasons. One might be that it's more important to the organism, so mutations will tend to accumulate less in it because of selection. The other would be a structural reason, which is just that protein-protein interactions tend to impose a structural constraint on proteins. The more interactions you have, the more constraint there is, and structural constraint translates to evolutionary constraint, so we're looking for either one of those.
What were the specific steps in your study?
First we went out and tried to find as much data pertinent to the study as we could. To do that we compiled a list of protein-protein interactions in yeast. A method that has recently become available is the yeast two-hybrid system, which is just a test for determining whether any two proteins interact inside the cell. Others have applied that test on a very large scale to find thousands of interactions, and we can take their data and try to correlate it with the rates of evolution of different proteins.
So we're trying to bring functional genomics, which is a brand new area of science, into a very old area of study, that of evolution. By bringing those two together we can ask questions that have been around for a long time in the field of evolution but couldn't be addressed because the data just hasn't been out there. Using this proteomic data from interactions found by the yeast two-hybrid system, we can ask questions like: Do protein-protein interactions affect the rate at which proteins evolve?
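As a rough illustration of what "correlating" interaction counts with evolutionary rates can look like, here is a minimal sketch using a rank (Spearman) correlation. The interview does not say which statistic was used, so the choice of Spearman is an assumption, and the numbers below are invented toy values, not data from the study.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: number of interactions and evolutionary rate
# for five imaginary proteins (not real measurements).
interaction_counts = [1, 2, 4, 7, 12]
evolutionary_rates = [0.40, 0.35, 0.20, 0.10, 0.05]

rho = spearman(interaction_counts, evolutionary_rates)
print(rho)
```

A negative value of `rho` would be consistent with the idea that proteins with more interactions evolve more slowly; a value near zero would suggest no monotonic relationship.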
Can you more thoroughly describe the theory that you were trying to test in your research?
There are two arguments, which are related but distinct. One of them is the structural constraint argument. When you have two proteins (let's say my hand is a protein and this hand is another protein) and they interact with each other, they have to have a surface in which one can fit into the other, because it's a physical interaction just like a lock and a key.
So you can imagine that because one protein fits into the other, that is going to cause a structural constraint on both proteins, but only in the area where they're interacting, so this side of my hand which is not interacting would not be constrained with this interaction. But if this protein is interacting with a different protein on this side then that's going to be another structural constraint.
The more protein-protein interactions, the more constraint there is going to be overall on that protein. Structural constraint translates to sequence constraint because the sequence determines the structure of proteins. When we look at the sequence constraints, we infer structural constraints from that. That's the structural argument.
The second argument is the dispensability of proteins, which we looked at in a separate study but which is also related to this idea of structural constraints. The idea is that different proteins have different dispensabilities to an organism. Certain proteins you can do without just fine. If you have a mutation in them, you won't even know the difference because there'll be no difference in the organism. But mutations in other proteins can have very serious consequences or can even lead to death. So there's a very wide range of protein dispensabilities, as we call it.
This determines the protein evolutionary rate in large part because you can imagine if you have a mutation in a protein that's very important to an organism then that organism with the mutation is not going to survive, it's not going to have offspring and so that mutation will not spread throughout the population. The protein will have very slow evolution. Any changes that come about in it will not propagate, whereas in contrast if you look at a less important protein the opposite will be true because selection will not act on it since the organisms will be just fine with any mutations in it.
What were some of the challenges in your research?
A big problem when doing comparative genomics (comparing one genome to another genome) is that if they've been evolving separately for a long time, then the proteins in them can look very different, so it's hard to find the protein in one organism that corresponds to a protein in another organism. We tried to overcome that with a method for finding out which proteins were homologous (sharing sequence similarity) and which ones had not arisen by gene duplication.
The way we found these similar genes in the different organisms was to use a common algorithm that's used in biology called BLAST. What BLAST does is it searches for regions of similarity between two proteins. What we do is we take the entire set of genes from one proteome and we compare it to the other set of genes from another organism. And we ask for each one of these genes in the first organism, does it show significant similarity to any of these genes in a second organism? Then we just flip the problem around and do it the reverse way. We say in the second organism we look at each gene individually and we say, does it show similarity to any of these genes in this first organism?
By taking those two lists and putting them together, we get enough information to give us a pretty good idea of which genes might be related by common descent.
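One common way to combine a forward and a reverse similarity search is to keep only "reciprocal best hits": pairs where each gene is the other's top-scoring match. The interview does not spell out exactly how the two lists were combined, so treating it as reciprocal best hits is an assumption. The sketch below works on already-parsed hit tuples; the gene names and scores are hypothetical, and no actual BLAST call is made.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject gene.

    `hits` is an iterable of (query, subject, score) tuples, such as
    might be parsed from a similarity-search report.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted(
        (a, b) for a, b in forward.items() if reverse.get(b) == a
    )


# Hypothetical hits: organism 1 genes (y*) searched against organism 2 (w*),
# then the reverse search.
forward_hits = [
    ("yA", "wA", 250.0),
    ("yA", "wB", 90.0),
    ("yB", "wB", 180.0),
    ("yC", "wB", 60.0),
]
reverse_hits = [
    ("wA", "yA", 245.0),
    ("wB", "yB", 175.0),
    ("wB", "yC", 58.0),
]

pairs = reciprocal_best_hits(forward_hits, reverse_hits)
print(pairs)
```

In this toy input, `yC` matches `wB` in the forward direction, but `wB` prefers `yB` in the reverse direction, so `yC` is left without a partner. That kind of one-sided match is what you might see when a gene duplication has produced extra copies.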
Any other challenges?
There are a lot of complications that come along with any study. In this one, for example, we had to deal with different sources of protein-protein interaction data and work out how to combine them and compare them with each other. One data set might be much more reliable than another, and one might overlap quite a bit with another, so the problem is how to combine them optimally.
In the end we decided that there were lots of complicated ways to go about it, but we went with the simplest one. That was to take all these lists, combine them, and throw out any duplicate interactions that we found without adding any extra weight to those interactions, even though those interactions are more likely to actually occur in living cells, given that we saw them in two separate lists.
What do you want people to know about your study?
It's interesting to see how all these new functional genomic and proteomic approaches can be applied to very old problems that have been around for a long time. We think evolutionary biology might be a very interesting field to do that in because it has a lot of old questions that haven't been able to be answered for quite a while. But now we can start to look in more detail using the global approaches that genomics and proteomics allow us to use, finding things like correlations between evolutionary rates and co-evolution between different proteins that really hadn't been possible before.
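The combining step described above can be sketched as a set union over unordered protein pairs, so that an interaction reported in two lists (or in both orientations) is kept only once and gets no extra weight. The function names and protein identifiers below are hypothetical; this is an illustration, not the study's code.

```python
def merge_interactions(*datasets):
    """Combine interaction lists, keeping each unordered pair once."""
    merged = set()
    for dataset in datasets:
        for protein_a, protein_b in dataset:
            # Sort the pair so (A, B) and (B, A) count as the same interaction.
            merged.add(tuple(sorted((protein_a, protein_b))))
    return merged


def interaction_counts(interactions):
    """Count how many distinct interactions each protein takes part in."""
    counts = {}
    for protein_a, protein_b in interactions:
        counts[protein_a] = counts.get(protein_a, 0) + 1
        if protein_b != protein_a:  # a self-interaction is counted once
            counts[protein_b] = counts.get(protein_b, 0) + 1
    return counts


# Hypothetical data sets from two separate screens.
screen_one = [("P1", "P2"), ("P2", "P3")]
screen_two = [("P2", "P1"), ("P3", "P4")]

merged = merge_interactions(screen_one, screen_two)
counts = interaction_counts(merged)
print(sorted(merged))
print(counts)
```

The P1-P2 interaction appears in both screens but contributes only one entry to `merged`, which matches the unweighted approach described in the answer.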
So your study was about proteins but also an evolution study, right?
There are very different levels at which you can look at evolution. There's one level that many people traditionally think of as evolution, the ape-going-to-man kind of level, and that's certainly one large area of study, and there's a lot of interesting work going on there.
But what I'm concentrating on is a different, more molecular level, which is called "molecular evolution." What we do is ask whether we can find patterns and correlates of gene evolution on a gene-by-gene basis, not looking at entire organisms and seeing how they change over time, just at individual proteins.
How is molecular evolution related to organismal evolution?
Molecular evolution underlies the more complicated kind of evolution that you think of when you think of apes to man. What's going on in evolution is that you see changes accumulate when you look at the large scale, but those changes are coming from the DNA level. So by looking at the DNA level we're really getting down to the most detailed level that we can get to, and we can perhaps some day apply those findings to larger scale evolution, but for now we're just concentrating on looking at these individual proteins.
One interesting application that has already been done is, for instance, in human language. There's a gene called the FOXP2 gene, and people have found that they can correlate the rate of evolution of that gene with language development in humans. They found that actual amino acid changes correspond to different levels of language ability. Humans that have a mutation in that gene have actually lost the ability to speak to a large extent. So when we compare the human gene to that of chimps and other primates, we see differences that might be applicable in studying language on a very large scale.
What were the results of your study?
Once we had found this correlation between the number of protein-protein interactions and the rate of evolution, we wanted to look in more detail and start to tie up the loose ends. Another thing we looked at was how different proteins that interact with each other tend to evolve with respect to one another. Just as with any two objects that need to fit together, for instance a lock and a key, if one changes, the other is going to need to change in a reciprocal kind of way in order for the interaction to be maintained.
We thought this might apply to these proteins as well, so we compared the evolution of interacting proteins. If Protein A interacts with Protein B, are their evolutionary rates similar to one another? We found that they were, which is what you'd expect given that they have to co-evolve in order for these interactions to be maintained.
One difficulty in looking at these correlations on a genome-wide scale is that the correlations don't always apply to all proteins. Sometimes you need to restrict your attention to a certain part of the data. It is still a very large part, since we're still talking about thousands of proteins, but it might need to be restricted in order to see the correlation in its strongest form.
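One simple way to check whether interacting partners have similar evolutionary rates is to compare the average rate difference within interacting pairs against the average across pairs that do not interact. The interview does not describe the actual statistical test, so this comparison is an assumption, and the rates below are invented toy values.

```python
from itertools import combinations


def mean_rate_difference(pairs, rates):
    """Average absolute difference in evolutionary rate across protein pairs."""
    differences = [abs(rates[a] - rates[b]) for a, b in pairs]
    return sum(differences) / len(differences)


# Hypothetical evolutionary rates for four imaginary proteins.
rates = {"A": 0.10, "B": 0.12, "C": 0.50, "D": 0.45}

# Hypothetical interaction list: A binds B, and C binds D.
interacting = {("A", "B"), ("C", "D")}

# Every other possible pairing serves as a non-interacting background.
all_pairs = set(combinations(sorted(rates), 2))
non_interacting = all_pairs - interacting

interacting_diff = mean_rate_difference(interacting, rates)
background_diff = mean_rate_difference(non_interacting, rates)
print(interacting_diff, background_diff)
```

If interacting proteins co-evolve, `interacting_diff` should come out smaller than `background_diff`, as it does for this toy input. With real data one would also want a significance test, for example by shuffling the partner assignments.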
What we tried to do was to say, well, maybe we can see this relationship being even stronger in the more well conserved proteins. That could be for a number of reasons. We decided to look just in these well-conserved proteins and when we did, we saw that the correlation was extremely strong, much stronger than we'd seen it before so that was quite a "booyah" moment.
What are the unanswered questions in your research?
There are a lot of questions that we still need to address. Even though we found several correlates of the rate of evolution of proteins, we still haven't come close to solving the problem completely. There's still a lot of unexplained variance in these rates of evolution, and we don't have any idea what's causing it.
I'm looking forward to the next few years when more genomic data will continue to pour out of many labs across the world and we can use that and apply it to the study of evolution to try to pin down exactly all the factors that tend to influence these rates of evolution.