EuroSciPy 2014


Cambridge, UK - 27-30 August 2014

Markov Model Analysis Of Proteins With Python

Max Linke


Molecular dynamics (MD) simulations currently reach simulation times of up to a millisecond [1], but many biologically interesting processes happen on timescales of milliseconds to seconds. Markov state models are a way to bridge this timescale gap between simulations and biologically relevant processes [2].

To get the simulation data into Python, we have started developing a general tool for loading and analyzing MD simulations. We use Cython to wrap the data-loading functions of GROMACS [3] and load the trajectories into custom data models, which make it possible to interactively select, for example, the protein backbone or a specific amino acid sequence.
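As an illustration of what such a data model could look like, the sketch below stores coordinates and atom labels in NumPy arrays and selects the backbone with a boolean mask. The class `Trajectory`, its methods, and the choice of N, CA and C as backbone atoms are assumptions made for this example; they are not the API of the tool described here, and the coordinates are random numbers rather than data read from GROMACS files.

```python
import numpy as np


class Trajectory:
    """Hypothetical in-memory data model (not the actual tool's API)."""

    def __init__(self, coords, atom_names, residue_names):
        self.coords = np.asarray(coords, dtype=float)  # (n_frames, n_atoms, 3)
        self.atom_names = np.asarray(atom_names)       # (n_atoms,)
        self.residue_names = np.asarray(residue_names)  # (n_atoms,)
        if self.coords.ndim != 3 or self.coords.shape[2] != 3:
            raise ValueError("coords must have shape (n_frames, n_atoms, 3)")
        if not (self.coords.shape[1] == len(self.atom_names)
                == len(self.residue_names)):
            raise ValueError("one atom name and residue name per atom required")

    def select(self, mask):
        """Return a new Trajectory containing only the atoms where mask is True."""
        mask = np.asarray(mask, dtype=bool)
        return Trajectory(self.coords[:, mask],
                          self.atom_names[mask],
                          self.residue_names[mask])

    def backbone(self):
        """Select backbone atoms, assumed here to be named N, CA and C."""
        return self.select(np.isin(self.atom_names, ["N", "CA", "C"]))

    def residues(self, name):
        """Select all atoms belonging to residues with the given name."""
        return self.select(self.residue_names == name)


# Toy system: one alanine and one glycine, 5 frames of random coordinates.
atom_names = ["N", "CA", "CB", "C", "O", "N", "CA", "C", "O"]
residue_names = ["ALA"] * 5 + ["GLY"] * 4
rng = np.random.default_rng(0)
traj = Trajectory(rng.normal(size=(5, 9, 3)), atom_names, residue_names)

bb = traj.backbone()
gly = traj.residues("GLY")
print(bb.coords.shape, gly.coords.shape)
```

Keeping selections as boolean masks over NumPy arrays means they can be combined with `&` and `|`, which suits interactive use in IPython.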

To build the Markov model, the phase space has to be clustered into discrete states. Because clustering in high dimensions is computationally hard and requires a lot of sampling, we first apply dimension reduction methods such as Principal Component Analysis (PCA) and Time-lagged Independent Component Analysis (TICA) [4], and then perform k-means clustering in the low-dimensional subspace. The Markov state model is then built with the algorithms from [2].
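The pipeline described above can be sketched in a few lines with NumPy, SciPy and scikit-learn. This is a minimal illustration under stated assumptions, not the implementation used in this work: the input is synthetic two-state data, the helper names `tica` and `transition_matrix` are made up for the example, TICA is written as a symmetrized generalized eigenvalue problem, and the transition matrix is a plain row-normalized count matrix rather than the estimators discussed in [2]. PCA could be swapped in for the reduction step via `sklearn.decomposition.PCA`.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans


def tica(X, lag, n_components):
    """Project X (n_frames, n_features) onto its slowest linear components."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    # Symmetrized instantaneous and time-lagged covariance matrices.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    # Generalized eigenvalue problem Ct v = lambda C0 v.
    eigvals, eigvecs = eigh(Ct, C0)
    order = np.argsort(eigvals)[::-1][:n_components]
    return X @ eigvecs[:, order], eigvals[order]


def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag time.

    States that are never left within the data keep an all-zero row.
    """
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)


# Synthetic data (assumption): a slow two-state switch hidden in 10-d noise.
rng = np.random.default_rng(0)
n_frames, lag, n_states = 2000, 5, 4
state = np.cumsum(rng.random(n_frames) < 0.01) % 2
X = rng.normal(size=(n_frames, 10))
X[:, 0] += 4.0 * state

# 1) dimension reduction, 2) k-means clustering, 3) transition matrix
Y, tica_eigvals = tica(X, lag=lag, n_components=2)
kmeans = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit(Y)
dtraj = kmeans.labels_
T = transition_matrix(dtraj, n_states, lag=lag)

print("TICA eigenvalues:", tica_eigvals)
print("Transition matrix:\n", T)
```

The lag time and the number of clusters are free parameters here; in practice they have to be chosen and validated for the system at hand.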

The Python tools employed are:

  • Cython
  • NumPy
  • SciPy
  • scikit-learn
  • statsmodels
  • matplotlib
  • IPython

[1] Shaw, D. E., et al. "Millisecond-scale molecular dynamics simulations on Anton." Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. IEEE, 2009.

[2] Bowman, G., Pande, V., and Noé, F. "An Introduction to Markov State Models and their application to long timescale molecular simulations." Advances in Experimental Medicine and Biology, vol. 797.

[3] Pronk, S., et al. "GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit." Bioinformatics 29.7 (2013): 845-854.

[4] Molgedey, L., and Schuster, H. G. "Separation of a mixture of independent signals using time delayed correlations." Physical Review Letters 72 (1994): 3634–3637.