
EuroScipy

Cambridge, UK - 27-30 August 2014

Single Pythonic Molecules

Rebecca Murphy

Sat 30 12:10 p.m.–12:20 p.m. in William Gates Building

Abstract

The Science

Single-molecule Förster resonance energy transfer (smFRET) is an experimental biophysical technique for studying the structure and dynamic behaviour of individual biological molecules. However, accurate analysis of smFRET data remains a problem: to learn about the behaviour of biological systems, we must extract photon statistics from data recorded under challenging, artifact-prone conditions.

To date, smFRET researchers have not established common methods for data analysis. Each research group maintains its own codebase and develops its own analysis tools. This is becoming a significant barrier to research progress, as it is difficult to verify the performance of published analyses or to reproduce the work of other research groups.

Here, we present a series of fully open source tools, written in Python, that address the challenges of smFRET data analysis.

The Library

First, we present pyFRET, the first open source library for analysis of smFRET data. pyFRET is an extremely small library: although just a few hundred lines of Python code, it provides all the necessary tools for simple end-to-end processing of smFRET data, including file parsing, photon burst selection and denoising, data analysis and data fitting.
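
To illustrate what burst selection and analysis can involve, the sketch below applies a simple sum threshold to binned photon counts and computes a proximity ratio for each selected bin. It is a minimal example under stated assumptions, not pyFRET's actual API: the function names `select_bursts` and `fret_efficiency`, the threshold value and the `gamma` correction factor are all hypothetical choices made for illustration.

```python
import numpy as np


def select_bursts(donor, acceptor, threshold=30):
    """Keep time bins whose total photon count exceeds a threshold.

    Hypothetical helper: a plain sum threshold, one of several
    possible burst selection criteria.
    """
    donor = np.asarray(donor, dtype=float)
    acceptor = np.asarray(acceptor, dtype=float)
    keep = (donor + acceptor) > threshold
    return donor[keep], acceptor[keep]


def fret_efficiency(donor, acceptor, gamma=1.0):
    """Proximity ratio E = A / (A + gamma * D) for each selected bin."""
    return acceptor / (acceptor + gamma * donor)


# Toy photon counts per time bin (made-up numbers, not experimental data).
donor = np.array([2, 40, 1, 10, 25])
acceptor = np.array([1, 10, 3, 40, 25])

d, a = select_bursts(donor, acceptor, threshold=30)
efficiency = fret_efficiency(d, a)
```

In practice the selected efficiencies would be histogrammed and fitted, for example with one or more Gaussian peaks, to characterise the populations present.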

The Model

Secondly, we present a physical model of an smFRET experiment. This model, constructed in a few hundred lines of Python, allows rapid simulation of smFRET data. Using this model to generate datasets with known parameters, we evaluate the reliability of different smFRET analysis methods, as well as their robustness to experimental conditions such as signal-to-noise ratio, dataset size and biomolecule concentration.
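
The idea of simulating datasets with known parameters can be sketched as follows. This is a deliberately simplified toy model, not the model presented in the talk: it assumes that each time bin is occupied by a molecule with a fixed probability, that emitted photons are Poisson distributed and split between channels binomially according to the FRET efficiency, and that both channels carry Poisson background noise. The function name `simulate_smfret` and all parameter names and default values are assumptions made for illustration.

```python
import numpy as np


def simulate_smfret(n_bins, efficiency, occupancy=0.05, burst_rate=60.0,
                    background=(1.0, 1.0), seed=0):
    """Simulate binned donor and acceptor counts (toy model, see assumptions above)."""
    rng = np.random.default_rng(seed)
    # Which bins contain a molecule; occupancy stands in for concentration.
    occupied = rng.random(n_bins) < occupancy
    # Photons emitted by the molecule in each occupied bin.
    emitted = rng.poisson(burst_rate, n_bins) * occupied
    # Each emitted photon reaches the acceptor channel with probability E.
    acceptor_signal = rng.binomial(emitted, efficiency)
    donor_signal = emitted - acceptor_signal
    # Add independent Poisson background to both channels.
    donor = donor_signal + rng.poisson(background[0], n_bins)
    acceptor = acceptor_signal + rng.poisson(background[1], n_bins)
    return donor, acceptor, occupied


donor, acceptor, occupied = simulate_smfret(10_000, efficiency=0.7)

# Because the true efficiency is known, an analysis method can be scored against it.
observed = acceptor[occupied] / (acceptor[occupied] + donor[occupied])
mean_observed = observed.mean()
```

Varying `occupancy`, `background` and `n_bins` in a sketch like this mimics changes in concentration, signal-to-noise ratio and dataset size.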

The Statistics

Finally, we use our physical model as the basis for a novel analysis using model-based Bayesian statistics. We have implemented a custom-built Metropolis sampler to infer intramolecular distances and molecular concentrations directly from a raw smFRET dataset with no intermediate data selection steps. We show that this Bayesian method systematically outperforms earlier techniques in both accuracy and robustness. Furthermore, our implementation deliberately separates the sampling process from the parametric model, providing a general, extensible tool for Bayesian model fitting.
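
The separation of sampler and model described above can be sketched as a generic Metropolis routine that knows nothing about smFRET and receives the model only as a log-posterior function. The code below is an illustrative assumption, not the authors' implementation: the names `metropolis` and `make_log_posterior` are hypothetical, and the example model infers only a single FRET efficiency from a binomial likelihood with a flat prior, whereas the method in the talk infers distances and concentrations from raw data.

```python
import numpy as np


def metropolis(log_posterior, initial, n_samples, step=0.05, seed=0):
    """Random-walk Metropolis sampler; the model enters only via log_posterior."""
    rng = np.random.default_rng(seed)
    current = np.atleast_1d(np.asarray(initial, dtype=float))
    current_lp = log_posterior(current)
    samples = np.empty((n_samples, current.size))
    for i in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        proposal = current + rng.normal(0.0, step, current.size)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples


def make_log_posterior(donor, acceptor):
    """Toy model: binomial likelihood for efficiency E with a flat prior on (0, 1)."""
    donor = np.asarray(donor, dtype=float)
    acceptor = np.asarray(acceptor, dtype=float)

    def log_posterior(theta):
        e = theta[0]
        if not 0.0 < e < 1.0:
            return -np.inf
        return np.sum(acceptor * np.log(e) + donor * np.log1p(-e))

    return log_posterior


# Synthetic counts with a known efficiency of 0.7 (made-up data).
data_rng = np.random.default_rng(1)
acceptor = data_rng.binomial(50, 0.7, 200)
donor = 50 - acceptor

samples = metropolis(make_log_posterior(donor, acceptor), [0.5], 5000, step=0.01)
posterior_mean = samples[1000:, 0].mean()  # discard the first 1000 samples as burn-in
```

Swapping in a different `log_posterior` changes the model without touching the sampler, which is the design choice the paragraph above describes.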

Together, our Python analysis tools make a significant contribution to the smFRET research community. They provide a framework of both novel and well-established analysis methods, as well as a straightforward way to simulate datasets with known parameters and evaluate analytical techniques.
