Jinfeng Zhang
Department of Statistics
Harvard University
Development of Monte Carlo Methods for Characterizing the Sequence
and Structure Relationship of Proteins
Proteins are the machinery of life and are involved in almost all biological
processes. A protein functions through a well-defined three-dimensional
structure. The next step after the genome sequencing projects is to understand
the relationship between the sequence and structure of the proteins encoded in
those genomes. Such understanding will fundamentally advance biology and public
health. The protein structure modeling problem has been formulated in two
steps: first, define a free energy function under which protein structures
follow a Boltzmann distribution; second, design a sampling method that draws
structures from that distribution. Both steps are very challenging. I developed
new Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) methods
for studying protein structures. The MCMC method was tested on a simplified
protein folding model, the HP (hydrophobic-polar) model. Finding minimum
energies for HP sequences
is still an open problem with a 20-year history. The new method
significantly outperformed all previously reported methods. The SMC method was
applied to estimate the entropy and free energy of ensembles of protein
structures, one of the most challenging problems in computational chemistry.
I will present some findings obtained from structures sampled with this
method. I will also discuss possible applications of these Monte Carlo methods
in other settings.
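
To make the two-step formulation concrete, here is a minimal sketch of an
energy function and a Boltzmann-based acceptance rule for the HP model. It is
an illustration under stated assumptions, not the method presented in the
talk: it assumes a 2D square lattice, an energy of -1 for each pair of H
residues that are lattice neighbors but not adjacent in the chain, and the
standard Metropolis rule. The names hp_energy and
metropolis_accept_probability and the temperature parameter are hypothetical.

```python
import math


def hp_energy(sequence, coords):
    """Energy of a 2D lattice HP conformation (assumed convention:
    -1 for every H-H pair that touches on the lattice without being
    consecutive in the chain)."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def metropolis_accept_probability(e_old, e_new, temperature=1.0):
    """Metropolis acceptance probability for a symmetric proposal when the
    target is the Boltzmann distribution p(s) proportional to exp(-E(s)/T)."""
    return min(1.0, math.exp(-(e_new - e_old) / temperature))


# A four-residue chain: folded into a square versus fully extended.
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
e_folded = hp_energy("HPPH", folded)
e_extended = hp_energy("HPPH", extended)
```

In this sketch a move from the extended chain to the folded one lowers the
energy and is always accepted, while the reverse move is accepted with
probability exp(-1/T). A full sampler would also need a move set that keeps
the chain connected and self-avoiding, which is omitted here.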