From genomes to protein models and back
The ultimate goal of computational biology in the genomic era is to use information on the parts list of an organism to develop a system-wide understanding of its functioning. But how complete is our knowledge of the parts list at our disposal?
The number of genes in the human genome, for example, is estimated at between 20,000 and 25,000, a figure that seems much lower than the number of required biochemical functions. Several explanations can be invoked. For example, alternative splicing can yield more than one protein per gene, and the same protein can have different functions or play a different role when in complex with different partners. Structure modelling techniques can be used to evaluate the likelihood of these hypotheses by assessing the probability that an alternative transcript does indeed fold as a protein, by analysing a protein structure to identify functional sites, and by predicting which complexes the protein can form in vivo. I will discuss examples of the results of these approaches applied to different genomes.