Jinfeng Zhang

Department of Statistics

Harvard University


Development of Monte Carlo Methods for Characterizing the Sequence and Structure Relationship of Proteins

Proteins are the machinery of life and are involved in almost all biological processes. A protein functions through a well-defined three-dimensional structure. The next step after the genome sequencing projects is to understand the sequence-structure relationship of the proteins encoded in the genomes. Such understanding will fundamentally advance biology and public health. The protein structure modeling problem has been formulated in two parts: first, define a free energy function for protein structures, which are assumed to follow a Boltzmann distribution; second, design a sampling method that draws protein structures from that distribution. Both defining the energy function and designing the sampling method are very challenging. I developed new Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) methods for studying protein structures. The MCMC method was tested on a simplified protein folding model, the HP model. Finding minimum energies for HP sequences is still an open problem with a 20-year history. The new method significantly outperformed all previously reported methods. SMC was applied to estimate the entropy and free energy of ensembles of protein structures, one of the most challenging problems in computational chemistry. I will present some interesting findings obtained from structures sampled with this method. I will also discuss possible applications of these Monte Carlo methods in other settings.
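To make the two-part formulation concrete, here is a minimal illustrative sketch, not the method presented in the talk. It assumes a 2D square-lattice HP model in which each pair of H residues that are lattice neighbours but not chain neighbours contributes an energy of -1, and it samples conformations with a textbook Metropolis rule targeting a Boltzmann distribution proportional to exp(-E/T), with the temperature T in units where the Boltzmann constant is 1. The names `hp_energy`, `pivot_move`, and `metropolis`, along with the default parameters, are hypothetical choices made for this illustration.

```python
import math
import random


def hp_energy(seq, coords):
    """Return -1 for every H-H lattice contact between residues not adjacent in the chain."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Looking only in the +x and +y directions counts each contact once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def pivot_move(coords, rng):
    """Rotate the chain tail about a random interior residue; return None on overlap."""
    k = rng.randrange(1, len(coords) - 1)
    px, py = coords[k]
    # Rotation matrices (a, b, c, d) for +90, -90 and 180 degrees.
    a, b, c, d = rng.choice(((0, -1, 1, 0), (0, 1, -1, 0), (-1, 0, 0, -1)))
    new = list(coords[: k + 1])
    for x, y in coords[k + 1:]:
        dx, dy = x - px, y - py
        new.append((px + a * dx + b * dy, py + c * dx + d * dy))
    return new if len(set(new)) == len(new) else None


def metropolis(seq, steps=20000, temperature=1.0, seed=0):
    """Sample conformations and return the lowest (energy, coords) visited."""
    rng = random.Random(seed)
    coords = [(i, 0) for i in range(len(seq))]  # start from a straight chain
    energy = hp_energy(seq, coords)
    best = (energy, coords)
    for _ in range(steps):
        proposal = pivot_move(coords, rng)
        if proposal is None:  # self-overlapping proposal: stay in place
            continue
        new_energy = hp_energy(seq, proposal)
        delta = new_energy - energy
        # Metropolis acceptance rule for a symmetric proposal.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            coords, energy = proposal, new_energy
            if energy < best[0]:
                best = (energy, coords)
    return best


if __name__ == "__main__":
    best_energy, best_coords = metropolis("HPHPPHHPHPPHPHHPPHPH")
    print(best_energy, best_coords)
```

A plain sampler like this gives no guarantee of reaching the minimum energy, and on longer HP sequences it tends to get trapped in compact conformations, which is part of why better sampling methods are needed.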

