Under the supervision of Olivier Tenaillon at IAME (Faculty of Medicine, Bichat Hospital) and Ivan Matic's team, Robustesse et évolvabilité de la vie.
To further analyse the mutational patterns, I used Direct-Coupling Analysis (DCA). This statistical-physics approach makes it possible to predict the effect of a mutation that occurs in a gene and induces an amino-acid change in the corresponding protein. By modelling the interactions between pairs of amino acids within the protein, DCA takes into account the genetic context in which the mutation occurs.
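To make the role of pairwise interactions concrete, here is a minimal sketch of how a DCA-style model can score a mutation. It assumes the standard Potts-model form of DCA, with single-site fields `h` and pairwise couplings `J`; the abstract does not give the exact formulation used in this work, so the function names, array layout, and toy parameters below are all illustrative assumptions.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Statistical energy of an integer-encoded sequence.

    Assumed layout (hypothetical, for illustration only):
      h[i, a]       : field for amino acid a at site i, shape (L, q)
      J[i, j, a, b] : coupling between a at site i and b at site j,
                      shape (L, L, q, q); only entries with i < j are read.
    Lower energy means a more favourable sequence under the model.
    """
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy

def mutation_delta_energy(seq, pos, new_aa, h, J):
    """Energy change caused by replacing the residue at `pos` with `new_aa`.

    A positive value means the mutant is predicted to be less favourable
    than the original sequence in this particular genetic context.
    """
    mutant = list(seq)
    mutant[pos] = new_aa
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)

# Toy model: 3 sites, 2 possible "amino acids", one favourable pair.
h = np.zeros((3, 2))
J = np.zeros((3, 3, 2, 2))
J[0, 1, 0, 0] = 1.0  # state 0 at site 0 interacts favourably with state 0 at site 1

# The same mutation (site 0: 0 -> 1) scored in two different contexts.
delta_in_context_a = float(mutation_delta_energy([0, 0, 0], 0, 1, h, J))
delta_in_context_b = float(mutation_delta_energy([0, 1, 0], 0, 1, h, J))
print(delta_in_context_a, delta_in_context_b)
```

In this toy model the same substitution costs energy when site 1 carries the interacting partner and is neutral when it does not, which is the sense in which the coupling terms encode genetic context.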
Genetic context was found to be key to the quality of the predictions made by DCA. This context is built up over long time scales by the accumulation of many weak interactions between amino acids. These interactions do not affect all residues of a protein in the same way, and DCA can predict the variability of each residue. In particular, between 30% and 50% of the sites in a protein are highly constrained by the genetic background of E. coli. A mutation at one of these sites will generally be deleterious if it occurs alone, so these sites do not tolerate polymorphisms. However, they can co-evolve over long time scales, so that the amino acids observed there vary widely between species.
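One way to quantify how strongly a site is constrained by its background is the entropy of the amino-acid distribution at that site, conditioned on the rest of the sequence. The sketch below assumes a Potts-style model with fields `h` and couplings `J` stored for both orderings of each site pair; whether this is the exact variability measure used in the work is not stated in the abstract, and all names and toy values are hypothetical.

```python
import numpy as np

def site_conditional_entropy(seq, pos, h, J):
    """Entropy (in nats) of the model's distribution at `pos` given all other sites.

    Assumed layout (hypothetical, for illustration only):
      h[i, a]       : field, shape (L, q)
      J[i, j, a, b] : coupling, shape (L, L, q, q), filled for both i < j
                      and i > j so that J[i, j, a, b] == J[j, i, b, a].
    Low entropy: the background leaves little freedom at this site.
    High entropy: many amino acids are tolerated in this background.
    """
    L, q = h.shape
    logits = h[pos].astype(float).copy()
    for j in range(L):
        if j != pos:
            # Contribution of the residue at site j to each candidate at `pos`.
            logits += J[pos, j, :, seq[j]]
    logits -= logits.max()  # stabilise the exponentials
    p = np.exp(logits)
    p /= p.sum()
    nonzero = p > 0
    return float(-(p[nonzero] * np.log(p[nonzero])).sum())

# Toy model: 2 sites, 2 states, one strong coupling stored symmetrically.
h = np.zeros((2, 2))
J = np.zeros((2, 2, 2, 2))
J[0, 1, 0, 0] = 10.0
J[1, 0, 0, 0] = 10.0

constrained = site_conditional_entropy([0, 0], 0, h, J)  # partner present at site 1
free = site_conditional_entropy([0, 1], 0, h, J)         # partner absent at site 1
print(constrained, free)
```

Here site 0 is nearly fixed when site 1 carries its interacting partner and unconstrained otherwise, a toy version of a site that tolerates no polymorphism in one background yet varies across backgrounds.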
Just as individual residues of a protein can evolve at different rates, so can proteins. I developed a selection test, based on DCA, that allows genes to be compared with each other. In the short term, essential genes are those under the strongest purifying selection, while expression level determines the long-term rate of evolution. This test also detects inactivations of transcription factors, which appear to be selected in the short term but counter-selected in the longer term.
This work demonstrates the value of combining the study of large genome databases with modelling approaches to understand the evolution of a species over different time scales.