We are using information theory and hidden Markov models to improve the design and interpretation of single-molecule fluorescence measurements. Single-molecule (SM) measurements are rapidly becoming commonplace in research laboratories around the world and are contributing to many areas of investigation because they provide insight into phenomena that were previously intractable owing to the ensemble averaging present in bulk measurements. In particular, the dynamics of conformationally heterogeneous systems are benefiting from single-molecule studies. Protein folding and conformational dynamics, enzymology, ribozyme function, bacterial light harvesting, and protein-nucleic acid interactions are just a few examples of complex systems that have benefited from the application of SM techniques. However, the impact of SM results has been limited by the lack of uniform data analysis and interpretation. This research focuses on SM fluorescence measurements and how to place the experimental design, analysis, and expectations onto solid statistical and theoretical ground.
We use information theory to determine the fundamental limits of SM experiments. This provides a theoretical framework that can be used for experimental design as it provides the limit of the measurement’s ability to make inferences about the properties of the system. It will also provide the benchmark by which to judge data reduction methods.
We are developing the algorithms and core codes to implement statistically rigorous data analysis methods based on hidden Markov models. These methods allow unbiased estimation of system parameters, with accuracy approaching the information-theoretic limit, and include meaningful uncertainty estimates.
We are implementing these methods as user-oriented additions to common data analysis packages so as to provide useable tools for experimental design and analysis to allow other investigators to exploit these methods for their own research.
Intervalence electron transfer spectra in mixed-valence molecules are frequently modeled by an interacting pair of adiabatic potential energy surfaces. The presence or absence of a double minimum in the lower surface is correlated with trapped or delocalized charges, respectively. The coordinate involved in this interpretation is the asymmetric normal coordinate representing the nuclear motions taking the molecule from one extreme to the other. In this paper, a model is developed involving both a symmetric and an asymmetric coordinate on an equal footing. The time dependent theory of electronic spectroscopy is used to calculate both absorption and resonance Raman spectra. The model uses physically meaningful interactions in the mixed-valence molecule including the electronic coupling, vibrational coupling, vibrational force constants, and bond length changes as a result of the electron transfer. The effects of these interactions on the relative intensities of symmetric and asymmetric modes in both the absorption and resonance Raman spectra are examined. The quantitative calculations are discussed in parallel with the physical meaning. The calculations show how the spectra can smoothly go from domination by one type of mode to the other. The most important effects are caused by the bond length changes, the electronic coupling, and the force constant changes.
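The link between a double minimum and charge trapping can be illustrated with the textbook one-coordinate picture, which is simpler than the two-coordinate model of the paper. The sketch below assumes two identical harmonic diabatic wells displaced to -q0 and +q0 along the asymmetric coordinate and a constant electronic coupling; the function and parameter names are hypothetical and all quantities are in arbitrary units.

```python
import math


def lower_surface(q, k, q0, coupling):
    """Lower adiabatic energy for two coupled, displaced harmonic diabatic wells.

    Diabatic wells (assumed): V1 = k/2 (q + q0)^2 and V2 = k/2 (q - q0)^2.
    """
    mean_diabatic = 0.5 * k * (q * q + q0 * q0)   # (V1 + V2) / 2
    half_gap = k * q * q0                         # (V1 - V2) / 2
    return mean_diabatic - math.sqrt(half_gap ** 2 + coupling ** 2)


def has_double_minimum(k, q0, coupling):
    """True when q = 0 is a local maximum of the lower surface.

    The curvature at q = 0 is k - (k*q0)^2 / |coupling|, which is negative
    when |coupling| < k * q0^2.
    """
    return abs(coupling) < k * q0 * q0


if __name__ == "__main__":
    for coupling in (0.2, 2.0):
        kind = "double minimum (trapped)" if has_double_minimum(1.0, 1.0, coupling) \
            else "single minimum (delocalized)"
        print(f"coupling = {coupling}: {kind}")
```

In this simplified picture, increasing the coupling relative to k*q0^2 merges the two wells into one. The symmetric coordinate and the force-constant changes treated in the paper are not represented.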
The measurement of fluorescence from single protein molecules has become an important new tool in the study of dynamic processes, allowing for the direct visualization of the motions experienced by individual proteins and macromolecular complexes. The data from such single-molecule experiments are in the form of photon trajectories, consisting of arrival times and wavelength information on individual photons. The analysis of photon trajectories can be difficult, particularly if the motions are occurring at rates comparable to the photon arrival rate or in the presence of noise. In this paper, we introduce the use of hidden Markov models (HMMs) for the analysis of photon trajectory data that operate using the photon data directly, without the need for ensemble averaging of the data as implied by correlation function analysis. Using a simple kinetic model, we examine the relationship between the uncertainty in the estimates of the motional rate and the photon detection rate. Remarkably, we obtain relative uncertainties in the rate constants of as little as 3% even when the interconversion rate is equal to the photon detection rate, and the uncertainty increases to only 10% when the interconversion rate is 10 times the photon detection rate. This suggests that useful information can be obtained for much faster kinetic regimes than have typically been studied. We also examine the impact of background photons on the determination of the rate and demonstrate that the HMM-based approach is robust, displaying small uncertainties for background photon arrival rates approaching that of the signal. These results not only are relevant in establishing the theoretical limits on precision, but are also useful in the context of experimental design. 
Finally, to demonstrate how the methodology can be extended to more complex kinetic models and how it can allow one to make use of the full power of statistics for purposes of model evaluation and selection, we consider a four-state kinetic model for protein conformational transitions previously studied by Schenter et al. (J. Phys. Chem. A 1999, 103, 10477). We show how an HMM can be used as an alternative to higher-order correlation function analysis for the detection of “conformational memory” and apparent non-Markovian dynamics arising from such temporally inhomogeneous kinetic schemes.
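As an illustration of a likelihood that uses the photon data directly, the following minimal sketch evaluates the log-likelihood of a list of inter-photon times for a two-state system treated as a Markov-modulated Poisson process. It is not the implementation used in the paper. The two-state scheme, the choice of the stationary distribution as the starting state, and all function and parameter names are assumptions made for the example.

```python
import math


def expm_2x2(a, b, c, d):
    """Closed-form exponential of [[a, b], [c, d]]; valid when b * c >= 0."""
    s = 0.5 * (a + d)
    h = 0.5 * (a - d)
    q = math.sqrt(h * h + b * c)
    ch = math.cosh(q)
    sh = math.sinh(q) / q if q > 0.0 else 1.0
    e = math.exp(s)
    return (e * (ch + sh * h), e * sh * b, e * sh * c, e * (ch - sh * h))


def photon_log_likelihood(dts, k12, k21, lam1, lam2):
    """Log-likelihood of inter-photon times `dts` for a two-state model.

    k12, k21: interconversion rates (assumed positive).
    lam1, lam2: photon detection rates in states 1 and 2.
    """
    # Assumption: the molecule starts in the stationary state distribution.
    a1 = k21 / (k12 + k21)
    a2 = k12 / (k12 + k21)
    loglik = 0.0
    for dt in dts:
        # Propagate with no photon for a time dt: exp((Q - diag(lam)) * dt).
        m11, m12, m21, m22 = expm_2x2(
            -(k12 + lam1) * dt, k12 * dt, k21 * dt, -(k21 + lam2) * dt
        )
        # Then weight by the emission rate of the state the photon came from.
        b1 = (a1 * m11 + a2 * m21) * lam1
        b2 = (a1 * m12 + a2 * m22) * lam2
        norm = b1 + b2
        loglik += math.log(norm)   # rescale each step to avoid underflow
        a1, a2 = b1 / norm, b2 / norm
    return loglik
```

Rate constants could then be estimated by maximizing this function over k12 and k21, for example with a simple grid scan or a general-purpose optimizer. When lam1 equals lam2 the photons carry no state information and the result reduces to the ordinary Poisson log-likelihood, which is a convenient sanity check.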
The top panel (a) shows a Monte Carlo simulated trajectory of a molecule diffusing through a spherically symmetric Gaussian collection volume. The next panel (b) shows the reciprocal of the inter-photon time. The delay between excitation and emission of the photon is the lifetime and is shown in panel (c). Panels (b) and (c) are used with three dimensional diffusion of the molecule through the focus of the instrument as the hidden Markov model to reconstruct the Brownian motion trajectory in panel (d). The trajectory was reconstructed by using Monte Carlo sampling of the HMM parameters to maximize the likelihood of the data in (b) and (c).
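A trajectory of the kind shown in panel (a) can be generated along the following lines. This is a minimal sketch under stated assumptions, not the simulation used for the figure: the detection rate is taken to fall off as exp(-2 r^2 / w^2) with distance r from the focus, at most one photon is registered per time step (which requires peak_rate * dt to be much less than 1), and fluorescence lifetimes are not simulated. All names are hypothetical.

```python
import math
import random


def simulate_diffusion_photons(n_steps, dt, diff_coeff, waist, peak_rate, seed=0):
    """Brownian motion through a spherically symmetric Gaussian collection volume."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diff_coeff * dt)   # per-axis step standard deviation
    pos = [0.0, 0.0, 0.0]                      # start at the centre of the focus
    trajectory = [tuple(pos)]
    photon_times = []
    for step in range(1, n_steps + 1):
        pos = [x + rng.gauss(0.0, sigma) for x in pos]
        trajectory.append(tuple(pos))
        r2 = sum(x * x for x in pos)
        rate = peak_rate * math.exp(-2.0 * r2 / waist ** 2)
        if rng.random() < rate * dt:           # Bernoulli approximation to Poisson emission
            photon_times.append(step * dt)
    return trajectory, photon_times


def inverse_interphoton_times(photon_times):
    """Reciprocal of each inter-photon time, as plotted in panel (b)."""
    return [1.0 / (t2 - t1) for t1, t2 in zip(photon_times, photon_times[1:])]


if __name__ == "__main__":
    traj, photons = simulate_diffusion_photons(2000, 1e-3, 1.0, 1.0, 100.0)
    print(len(traj), "positions,", len(photons), "photons")
```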
We use Shannon’s definition of information to develop a theory that predicts how well a photon-counting single-molecule experiment can measure a desired property. We treat several phenomena that are commonly measured on single molecules: spectral fluctuations of a solvatochromic dye, assignment of the azimuthal dipole angle, and determination of a distance by fluorescence resonance energy transfer using Förster’s theory. We consider the effect of background and other “imperfections” on the measurement through the decrease in information. We have implemented the information-theoretical results in cross-platform commercial analysis programs and have made them available for download at http://www.singlemolecule.net.
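For the distance measurement, Förster’s theory relates the transfer efficiency E to the donor-acceptor distance r through E = 1 / (1 + (r / R0)^6), where R0 is the Förster radius. The sketch below inverts that relation. It assumes an idealized efficiency estimate from raw photon counts, with no background, no spectral crosstalk and equal detection efficiency in both channels; the function names are hypothetical and are not taken from the downloadable programs.

```python
def fret_efficiency(n_donor, n_acceptor):
    """Idealized transfer efficiency from donor and acceptor photon counts."""
    total = n_donor + n_acceptor
    if total <= 0:
        raise ValueError("need at least one detected photon")
    return n_acceptor / total


def fret_distance(efficiency, forster_radius):
    """Invert E = 1 / (1 + (r / R0)**6) for the distance r."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def efficiency_from_distance(distance, forster_radius):
    """Forward Förster relation, useful for checking the inversion."""
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)
```

Because of the sixth-power dependence, the distance estimate is most sensitive near r = R0 (E = 0.5) and degrades quickly as E approaches 0 or 1.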
The interpretation of single-molecule measurements is greatly complicated by the presence of multiple fluorescent labels. However, many molecular systems of interest consist of multiple interacting components. We address this issue using multiply-labeled dextran polymers that we intentionally photobleach to the background on a single molecule basis. Hidden Markov models allow unsupervised analysis of the data to determine the number of fluorescent subunits involved in the fluorescence intermittency of the 6-carboxy-tetramethylrhodamine labels by counting the discrete steps in fluorescence intensity. The Bayes information criterion allows us to distinguish between hidden Markov models that differ by number of states, i.e., number of fluorescent molecules. We determine information-theoretical limits and show via Monte Carlo simulations that the hidden Markov model analysis approaches these theoretical limits. This technique has resolving power of one fluorescing unit up to as many as 30 fluorescent dyes with the appropriate choice of dye and adequate detection capability. We discuss the general utility of this method for determining aggregation-state distributions as could appear in many biologically important systems and its adaptability to general photometric experiments.
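The model-selection step can be sketched as follows, using the standard definition BIC = p ln(n) - 2 ln(L), where p is the number of free parameters, n the number of observations and L the maximized likelihood. The parameter count assumed here for an N-state hidden Markov model (N(N-1) transition probabilities, N-1 initial probabilities and one intensity per state) is an illustrative choice, not necessarily the one used in the paper, and the function names are hypothetical.

```python
import math


def hmm_free_parameters(n_states):
    """Assumed count: transitions + initial distribution + one intensity per state."""
    return n_states * (n_states - 1) + (n_states - 1) + n_states


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_n_states(log_likelihoods, n_obs):
    """Pick the state count with the lowest BIC.

    log_likelihoods maps each candidate number of states to the maximized
    log-likelihood of the fitted model with that many states.
    """
    scores = {
        n: bic(ll, hmm_free_parameters(n), n_obs)
        for n, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Placeholder log-likelihoods for illustration only, not measured values.
    best, scores = select_n_states({1: -1500.0, 2: -1200.0, 3: -1195.0}, n_obs=1000)
    print("preferred number of states:", best)
```

The penalty term grows with the number of states, so an extra state is accepted only when it improves the log-likelihood by more than the cost of its additional parameters.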
The probability-normalized (∑P(O|S)=1) fluorescence spectra of C153 in hexane (blue) and in methanol (green), and the mutual information in bits between the state (polar versus nonpolar) and each photon emitted (red).
From an information theory point of view, the molecule encodes information into photons using the dyes as a transducer. The photons are converted into raw data by the detection apparatus and then decoded into a useful form by some data analysis procedure. From the reduced data we draw inferences about the molecule based on the data and our prior knowledge of the system. Two-color experiments represent one of the most common types of single-molecule fluorescence measurements. We developed a formulation of Shannon’s information theory to treat two-color problems and showed how it can be used to design experiments based on the number of photons required to deliver a particular amount of information.
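The information carried by each photon in a two-color measurement can be computed directly from the channel probabilities P(O|S) and the prior state probabilities P(S). The sketch below is an illustration with hypothetical function names and made-up channel probabilities, not the published formulation. The photon estimate is a lower bound: photons that are independent given the state deliver at most n times the single-photon information.

```python
import math


def mutual_information_bits(prior, likelihood):
    """I(S; O) in bits, with prior[s] = P(S=s) and likelihood[s][o] = P(O=o | S=s)."""
    n_obs = len(likelihood[0])
    p_obs = [
        sum(prior[s] * likelihood[s][o] for s in range(len(prior)))
        for o in range(n_obs)
    ]
    info = 0.0
    for s, p_s in enumerate(prior):
        for o in range(n_obs):
            joint = p_s * likelihood[s][o]
            if joint > 0.0:
                info += joint * math.log2(likelihood[s][o] / p_obs[o])
    return info


def min_photons(target_bits, bits_per_photon):
    """Lower bound on the photons needed to deliver `target_bits` of information."""
    return math.ceil(target_bits / bits_per_photon)


if __name__ == "__main__":
    # Made-up example: each state sends 80% of its photons to "its" color channel.
    bits = mutual_information_bits([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]])
    print(f"{bits:.3f} bits per photon; at least {min_photons(1.0, bits)} photons for 1 bit")
```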
Glucose/galactose binding protein (GGBP) functions in two different larger systems of proteins used by enteric bacteria for molecular recognition and signaling. Here we report on the thermodynamics of conformational equilibrium distributions of GGBP. Three fluorescence components appear at zero glucose concentration and systematically transition to three components at high glucose concentration. Fluorescence anisotropy correlations, fluorescent lifetimes, thermodynamics, computational structure minimization, and literature work were used to assign the three components as open, closed, and twisted conformations of the protein. The existence of three states at all glucose concentrations indicates that the protein continuously fluctuates about its conformational state space via thermally driven state transitions; glucose biases the populations by reorganizing the free energy profile. These results and their implications are discussed in terms of the two types of specific and nonspecific interactions GGBP has with cytoplasmic membrane proteins.
David S. Talaga, Current Opinion in Colloid & Interface Science 12:6 (2007) 285-296
This article examines the current status of Markov processes in single molecule fluorescence. For molecular dynamics to be described by a Markov process, the Markov process must include all states involved in the dynamics and the FPT distributions out of those states must be describable by a simple exponential law. The observation of non-exponential first-passage time distributions or other evidence of non-Markovian dynamics is common in single molecule studies and offers an opportunity to expand the Markov model to include new dynamics or states that improve understanding of the system.
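One simple diagnostic for the exponential requirement is the squared coefficient of variation of the measured dwell (first-passage) times, which equals 1 for a single-exponential distribution. The sketch below uses simulated dwell times to show how an unresolved intermediate lowers that value. It is an illustrative check under the stated assumptions rather than a method taken from the article, and the function names are hypothetical.

```python
import random
import statistics


def randomness_parameter(times):
    """Variance divided by squared mean; equals 1 for an exponential distribution."""
    mean = statistics.fmean(times)
    return statistics.pvariance(times, mu=mean) / mean ** 2


def simulate_dwell_times(n, rate, n_hidden_steps=1, seed=0):
    """Dwell times for a state left through `n_hidden_steps` sequential steps.

    Each step is exponential with the same rate (an assumption for the example);
    n_hidden_steps=1 corresponds to a simple Markov state.
    """
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(rate) for _ in range(n_hidden_steps))
        for _ in range(n)
    ]


if __name__ == "__main__":
    single = simulate_dwell_times(20000, 2.0, n_hidden_steps=1)
    hidden = simulate_dwell_times(20000, 2.0, n_hidden_steps=2)
    print("single state:", round(randomness_parameter(single), 2))
    print("hidden intermediate:", round(randomness_parameter(hidden), 2))
```

Values well below 1 point to sequential hidden steps, and values above 1 point to a mixture of exponentials, that is, to heterogeneity. In either case the observation suggests expanding the Markov model with additional states.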