ExTASY: A Python Extensible Toolkit for Advanced Sampling and Analysis in Biomolecular Simulation

Dr. Ardita Shkurti


This poster presents a Python-based toolkit for advanced sampling and analysis in biomolecular simulation, with details of the scientific Python libraries used to implement it. We will also show, for a variety of biomolecular simulation use cases, examples of the enhanced sampling that one of the framework's workflows provides compared with conventional molecular dynamics techniques.


A wide range of tools for biomolecular simulation, and for postprocessing the data it generates, is available to the user community. However, such tools are often specific to a particular Molecular Dynamics (MD) software code or offer very focussed functionality, so biomolecular simulation scientists must become familiar with a range of only partially compatible tools to satisfy their requirements. ExTASY[1] (Extensible Tools for Advanced Sampling and analYsis) has been designed as a framework to address this issue. It provides, firstly, a uniform interface that integrates the capabilities of a variety of existing tools and, secondly, enhancements to them in two particular directions: (i) extended advanced techniques for large-scale data analysis; and (ii) workflows that steer the molecular simulation process on the fly, based on analysis of intermediate data produced during the simulations.

One of these workflows, CoCo-MD, automates the interleaving of MD simulations and trajectory data analysis using the CoCo method[2] in order to direct the dynamics towards a wider exploration of the conformational space of the biomolecular system. The logic behind CoCo-MD relies on two scientific Python packages: scipy.linalg (linear algebra) and scipy.ndimage (multi-dimensional image processing). The linear algebra functions are used to determine a dimensionality-reduced space that describes the conformational dynamics of the biomolecular system. Each snapshot of the system's coordinates is then projected into this new space. The histogram of the projections in the reduced space approximates the volume of conformational space sampled so far. Treating this histogram as a multi-dimensional image, we use the scipy.ndimage package to identify the regions most distant from any sampled so far.

After these points are mapped back into the original full-dimensional space, they become the start points for the subsequent simulation step, with the aim of promoting the simulation of rare conformations of the biomolecular system. We will show examples of the enhanced sampling provided by CoCo-MD with respect to traditional molecular dynamics approaches for a variety of types of biomolecular simulation problems.
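
The analysis step described above can be illustrated with a minimal sketch. It is not the ExTASY or CoCo implementation: the function name coco_like_points, its parameters, the use of a singular value decomposition for the dimensionality reduction, and the choice of a Euclidean distance transform over histogram bins are all assumptions made for illustration. The input is assumed to be an array of already aligned, flattened coordinates with one row per snapshot.

```python
import numpy as np
from scipy import linalg, ndimage


def coco_like_points(coords, n_dims=2, n_bins=10, n_points=3):
    """Suggest new start points in unsampled regions (illustrative sketch only).

    coords: (n_frames, n_features) array of aligned, flattened coordinates.
    Returns an (m, n_features) array with m <= n_points.
    """
    # Dimensionality reduction: principal axes from an SVD of the centred data.
    mean = coords.mean(axis=0)
    centred = coords - mean
    _, _, vt = linalg.svd(centred, full_matrices=False)
    axes = vt[:n_dims]                # (n_dims, n_features), orthonormal rows
    projections = centred @ axes.T    # (n_frames, n_dims)

    # Histogram of the projections approximates the sampled volume.
    hist, edges = np.histogramdd(projections, bins=n_bins)
    occupied = hist > 0

    new_points = []
    for _ in range(n_points):
        # Distance (in bin units) from each empty bin to the nearest occupied bin.
        dist = ndimage.distance_transform_edt(~occupied)
        idx = np.unravel_index(np.argmax(dist), dist.shape)
        if dist[idx] == 0:
            break  # every bin is already occupied
        # Centre of the chosen bin in the reduced space.
        centre = np.array(
            [0.5 * (edges[d][i] + edges[d][i + 1]) for d, i in enumerate(idx)]
        )
        # Map back to the original coordinate space.
        new_points.append(mean + centre @ axes)
        # Treat the bin as sampled so the next point goes elsewhere.
        occupied[idx] = True
    return np.array(new_points)
```

Calling coco_like_points on a trajectory array returns one coordinate vector per suggested start point. In a real workflow these vectors would still need to be turned into physically reasonable structures (for example by energy minimisation) before being used to start new simulations.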

[1] http://extasy-project.org/

[2] Laughton CA, Orozco M, Vranken W. CoCo: a simple tool to enrich the representation of conformational variability in NMR structures. Proteins 2009;75:206–216.