Chinese Physics B, Volume. 29, Issue 10, (2020)

Find slow dynamic modes via analyzing molecular dynamics simulation trajectories

Chuanbiao Zhang1 and Xin Zhou2、†
Author Affiliations
  • 1College of Physics and Electronic Engineering, Heze University, Heze 27405, China
  • 2School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China
  • show less

    It is a central issue to find the slow dynamic modes of biological macromolecules via analyzing the large-scale data of molecular dynamics simulation (MD). While the MD data are high-dimensional time-successive series involving all-atomic details and sub-picosecond time resolution, a few collective variables which characterizing the motions in longer than nanoseconds are needed to be chosen for an intuitive understanding of the dynamics of the system. The trajectory map (TM) was presented in our previous works to provide an efficient method to find the low-dimensional slow dynamic collective-motion modes from high-dimensional time series. In this paper, we present a more straight understanding about the principle of TM via the slow-mode linear space of the conformational probability distribution functions of MD trajectories and more clearly discuss the relation between the TM and the current other similar methods in finding slow modes.

    Keywords

    1. Introduction

    Molecular dynamics (MD) simulation has been widely used in recent a few decades in very various fields, such as detecting the relationship between chemical structures and functions of biological macromolecules.[1,2] The raw data of MD simulations are high-dimensional time series with the total time length usually in the order of microsecond or even longer and containing the femtosecond resolution and all-atomic coordinates and velocities. The data are too complicated to provide an intuitive understanding of the simulated system; thus we usually need to find much less (collective) variables to characterize the motion of the system in a larger timescale to understand its main features.[3] For simpler systems, there may exist some direct ways to select some collective variables to (approximately) represent the main features via our experiences or intuitions. For example, for small proteins or peptides, the dihedral angles in the backbone can describe well the large (and usually slow) motions of molecules. Thus they can be used as the key collective variables. However, in more complex systems, it is usually needed to develop more systematical data analysis methods to capture the main features from big MD data.

    Many methods were presented to simplify large data to take out the main features. For example, lots of clustering algorithms[46] and reduction dimensionality techniques[711] focused on finding the low-dimensional structure of data point set, and were applied in the analysis of MD simulation data of biomolecules.[1223] The methods are based on the geometric distances between data points (conformations of molecules) can get the static structure of the whole data set without applying any dynamics information. A recent technique, the diffusion map, constructed an artificial diffusion dynamics among the static data points based on the distances then tried to find the slow modes of the diffusion dynamics, which achieved the static structure robustly under the help of the artificial dynamics information.[11] On the other hand, some methods, such as the Markov states model (MSM),[2431] and its improvement, tICA,[32,33] the trajectory map (TM),[3440] applied the dynamics information of MD trajectories to find the slow-dynamic structure of data. The TM gives a simple and efficient way to take into account the time successiveness along the MD trajectory to construct the structure of slow dynamics robustly. It promises a general method in analyzing various complicated time series. The implementation and application of TM were presented in previous works.[34,3740] Here we revisit the mathematical principle behind the TM and the relationship between the TM and other related techniques; thus we can more clearly discuss the advantages and disadvantages of these methods.

    2. Theory and method

    2.1. Slow dynamics modes

    Generally, let us consider a stochastic process, as the molecular dynamics (MD) simulation, e.g., following the (overdamped) Langevin equation

    $$ \begin{eqnarray}\gamma {\rm{d}}{\boldsymbol{q}}=-\displaystyle \frac{\partial U}{\partial {\boldsymbol{q}}}{\rm{d}}t+\sqrt{2{k}_{{\rm{B}}}T\gamma {\rm{d}}t}{\rm{d}}{\boldsymbol{G}},\end{eqnarray}$$ (1)

    where γ is the friction coefficient, q is a simple denotation of the (high dimensional) conformational coordinate, U(q) is the potential energy surface, kB is the Boltzmann constant, T is the temperature, and d G is the normal Gaussian white noise. Equivalently, we can described the process by the Fokker–Planck equation based on the probability density function,

    $$ \begin{eqnarray}\displaystyle \frac{\partial }{\partial t}P(q,t)={{\boldsymbol{L}}}_{FP}P(q,t),\end{eqnarray}$$ (2)

    with the operator
     1
    Simply, we set γ to be a constant, (the unit if resetting time d τ = d t/γ). The equilibrium solution is the Boltzmann distribution, Peq(q) = (1/Z) eβ U(q). Here Z = ∫ d q eβ U(q), named as the partition function, is the center quality in statistical physics, and β = 1/kBT. In this paper, we will set kBT as unit without explicit mention again.

    Defining P(q,t) = [Peq(q)]1/2Ψ(q,t), we have

    $$ \begin{eqnarray}-\displaystyle \frac{\partial }{\partial t}\varPsi (q,t)={\boldsymbol{H}}\varPsi (q,t),\end{eqnarray}$$ (3)

    the imaginary time Schödinger equation, where H = – 2 / ∂ q2 + V(q) is a real symmetric operator, the same as the Hamilton operator in quantum mechanism. The effective potential
     1
    Therefore, we have the spectrum expansion

    $$ \begin{eqnarray}P(q,t)={P}_{{\rm{eq}}}(q)\displaystyle \sum _{n=0}^{\infty }{a}_{n}(0){{\rm{e}}}^{-{\lambda }_{n}t}{\hat{\varPhi }}_{n}(q),\end{eqnarray}$$ (4)

    where Φ^n(q)=Φn(q)/Φ0(q), and Φn(q) is the eigenfunction of the operator H with the eigenvalue λn, i.e., HΦn(q) = λn Φn(q). We have 0 = λ0 < λ1 < … < λn < …, and Φ0(q) = [Peq(q)]1/2. The orthogonal condition Φ^n(q)Φ^m(q)eq=Φn(q)Φm(q)dq=δnm, and the completeness Peq(q)n=0Φ^n(q)Φ^n(q)=δ(qq). Here 〈 ⋅ 〉eq means the expectation under the distribution Peq(q). The expansion coefficient an(0)=Φ^n(q)0=dqΦ^n(q)P(q,0), the expectation under the initial distribution P(q,t = 0).

    Due to the fast decay of motion modes with large eigenvalues as time, we can consider only the equilibrium mode Φ0(q) and the first N eigenfunctions, {Φ1(q), …, ΦN(q)}, named as slow modes, to describe the main features of the dynamics of system at long time scale. For example, if U(q)=12kq2 in one dimension space, we have V(q)=14k2q2k/2. It is easy to know the eigenvalues and eigenfunctions of the operator H^ from that of the quantum harmonic oscillator, λn = n k, and the Φ^n(q) is the n-order Hermite polynomial. In this case, Φ0(q) gives the equilibrium information, and λ1 = k corresponds to the time scale to reach the equilibrium.

    2.2. Linear space of slow modes and metastable states

    For any probability function P(q,t), we can suppose it already evolved a not-short time, thus

    $$ \begin{eqnarray}\displaystyle \frac{P(q,t)}{{P}_{{\rm{eq}}}(q)}\approx 1+\displaystyle \sum _{n=1}^{N}{a}_{n}(t){\hat{\varPhi }}_{n}(q),\end{eqnarray}$$ (5)

    i.e., it approximately belongs to the linear space spanned by the slow modes. For example, considering an MD trajectory q(t), t ∈ [0,τ], without losing generality, the corresponding probability function P(q)=(1/τ)0τdtδ(qq(t)) is a linear combination of the slow modes (if not, reset a later time as t = 0 and discarding the earlier conformations). Therefore, we can generate lots of MD trajectories from different initial conformations, the corresponding probability functions (or more exactly, the finite-size samples), denoted as {Pi(q)}, i = 1, …, m, can be applied to linearly combine to span the slow dynamic modes of the system

    $$ \begin{eqnarray}{\hat{\varPhi }}_{n}(q)=\displaystyle \frac{\sum _{i}{b}_{n,i}{P}_{i}(q)}{{P}_{{\rm{eq}}}(q)},\end{eqnarray}$$ (6)

    where a linear uncorrelated subset of {Pi(q)} is sufficient to expand any slow mode.

    Usually, we can split the whole conformational space into some conformational regions which are basins or super-basins of the potential energy surface and separated by high energy barriers; each is named as a metastable state in slow dynamics. In other words, each metastable state is a conformational region where the system often reaches local equilibrium inside before leaving out. The slow modes correspond to transitions among metastable states, and the inner local equilibrium within each (small) metastable state is corresponds to faster modes. Therefore, all the slow-mode functions {Φ^n(q)} provide a splitting of the whole conformational space into metastable states, which are approximately constants inside each state but vary obviously only at boundaries of metastable states. We have

    $$ \begin{eqnarray}{\hat{\varPhi }}_{n}(q)=\displaystyle \sum _{\alpha }{c}_{n\alpha }{\varTheta }_{\alpha }(q).\end{eqnarray}$$ (7)

    Here Θα(q) is the characteristic function of the α metastable state, whose value is unit inside the state but zero otherwise. It means that conformations inside the same state are equal in the viewpoint of slow dynamics. As involving more (shorter) dynamic modes, more metastable states are split and more refined description about the conformational space are obtained.

    MD trajectories similarly give the splitting of metastable states, since

    $$ \begin{eqnarray}\displaystyle \frac{{P}_{i}(q)}{{P}_{{\rm{eq}}}(q)}=\displaystyle \sum _{\alpha }{c}_{i\alpha }^{^{\prime} }{\varTheta }_{\alpha }(q),\end{eqnarray}$$ (8)

    i.e., the local equilibrium distribution is proportional to the global one inside each metastable state. It means that there is the same linear structure among slow-mode functions, the characteristic functions of metastable states, and the conformational probability functions from MD trajectories. The central idea of the trajectory map (TM)[34,38] is to achieve slow modes or metastable states from MD trajectories.

    2.3. Trajectory map

    We apply a set of known conformational functions, denoted as {Aα(q)}, μ = 1, 2, …, N0, as basis functions to coarsely represent the linear space spanned by the slow-mode functions,

    $$ \begin{eqnarray}\displaystyle \frac{{P}_{i}(q)}{{P}_{{\rm{eq}}}(q)}\approx \displaystyle \sum _{\mu }{p}_{i,\mu }{A}^{\mu }(q).\end{eqnarray}$$ (9)

    Here, the approximation which comes from the incompleteness of the set of basis functions less affects the construction of the slow-mode space. The coefficient pi,μ = 〈 Aμ(q) 〉i is estimated from the finite-size conformational sample corresponded to Pi(q) (a segment of MD trajectory length of τ), where the conjugated basis function Aμ(q) satisfying

    $$ \begin{eqnarray}{\langle {A}_{\mu }(q){A}^{\nu }(q)\rangle }_{{\rm{eq}}}={\delta }_{\mu \nu },\end{eqnarray}$$ (10)

    is estimated in the conformational sample corresponded to Peq(q). Usually, the equilibrium sample is absence, we can apply a reference distribution Pref(q), for example, the mean of all MD trajectories, to replace Peq(q), which less affects the construction of linear space of slow modes, since Pref(q) is also proportional to Peq(q) in each metastable state, thus Pi(q)/Pref(q) still belongs to the slow-mode linear space.

    In this paper, we do not distinguish the conformation sample of an MD trajectory and its corresponding probability function Pi(q), unless avoiding some possible confusions. In addition, for simplification, we linearly combine these selected physical variables Aν(q) to get a set of orthonormalize basis functions {A^μ(q)=A^μ(q)}. Thus, except A^0(q)1, we have A^μ(q)ref=0. We use the Einstein summation convention of repeat subscript and superscript indexes below without explicit mentioning.

    Thus, each MD trajectory or segment Pi(q) is mapped as a vector

    $$ \begin{eqnarray}{{\boldsymbol{p}}}_{i}={p}_{i,\mu }{\hat{A}}^{\mu }(q),\end{eqnarray}$$ (11)

    and all these vectors {pi} span the slow-mode linear space (more exactly, its projection in the applied basis functions {A^μ(q)}). A larger set of basis functionss is helpful for providing more complete information about slow modes, but even when the basis functions are not sufficient so that some of the slow modes are not able to distinguish completely, the linear projection does not bring any additional biased results. In practice, the size of sample corresponded to Pref(q) also limits the number of linear uncorrelated basis functions {A^μ(q)}. More discussion about the basis functions were shown in the previous literatures about TM.[34,3740]

    We apply the principle component analysis (PCA) to the trajectory-mapped vectors {pi} to get the first a few principle components (PCs), denoted as {B^α(q)},α=1,,m, as a set of orthonormalized basis functions of the linear space of slow dynamics modes,

    $$ \begin{eqnarray}{\hat{B}}_{\alpha }(q)={b}_{\alpha \mu }{\hat{A}}^{\mu }(q),\end{eqnarray}$$ (12)

    here bαμ is the α-th eigenvector of the variance–covariance matrix of these mapped points, ipp. We have B^α(q)B^β(q)ref=δαβ. A hint for choosing the number of principal components is provided by the plot of eigenvalues sorted in decreasing order. The first few eigenvalues which are significantly greater than zero are usually correspond to slow processes.

    In the low-dimensional space of {B^α(q)}, many common analyzing or visualizing techniques can be applied to achieve the slow dynamics modes or identify metastable states and transition among states. For example, we can project the original MD trajectory q(t) to get a smoothed slow-dynamic-dominate trajectory B(t) with the components

    $$ \begin{eqnarray}{\hat{B}}_{\alpha }(t)=\displaystyle \frac{1}{\Delta t}\displaystyle {\int }_{t}^{t+\Delta t}{\hat{B}}_{\alpha }(q({t}^{^{\prime} })){\rm{d}}{t}^{^{\prime} }.\end{eqnarray}$$ (13)

    Here we applied a time-window smoothing (with length Δ t ) to further filter fast dynamics modes of MD trajectory. Usually, Δ t is set about two to three orders of magnitude smaller than the length of trajectory segment (τ), and the results of TM is insensitive to the specific value of Δ t. Then we can identify the obvious change along the trajectories B(t) as transition events between metastable states, or calculate the two-point similar matrix in the B space, C(t2,t1) = B(t2) ⋅ B(t1) to get the transition events.[38]

    3. Application

    In this section, we use the Trp-cage protein to illustrate the basic application of the TM. The Trp-cage contains 20 residues, includes three secondary structures in its native structure: an α helix, a 3–10 helix, and a polyproline II segment.[4143] This protein can fold in microseconds, and the stability of native structure originates from the hydrophobic core around the Trp residue.[43,44] Due to its small size and various meta-stable states, Trp-cage becomes an ideal protein for testing both sampling algorithms and force fields of simulations, thus it has been extensively studied by MD and experiments.[4554] However, its folding kinetics and folding pathways are still not fully understood.[andryushchenko2016hydrodynamic] Recently, Lindorff–Larsen et al. have performed a 208-μs equilibrium MD simulation of the Trp-cage at 290 K.[43] They applied the CHARMM22* force field[56] and the modified TIP3P water model compatible with the CHARMM force field.[57,58] The generated MD trajectory involves about 2.08 × 105 snapshots, with a time interval of 200 ps. We downloaded the trajectory file from D. E. Shaw Research.

    We choose the dihedral angles of protein backbone, ϕi and ψi with i = 1, …, 18, as collective variables to describe the large-scale motions. Due to the periodicity of dihedral angles, we transform the angles into their cosine and sine functions, to get 72 basis functions in total,[59] {Aμ(q)}, μ = 1, …, 72 in the TM. The time evolution of the root-mean-square deviation (RMSD) clearly distinguishes the folded state (native structure) and unfolded structure, but more details inside the unfolded structure is not so clear (Fig. 1(a)).

    (a) Time evolution of the RMSD of the Trp-cage to its native structure, the slow variables B1 and B2 obtained in the TM. Red line is time-window-smoothed one (Δ t = 200 ns). (b) Eigenvalue of the variance–covariance matrix of the trajectory-mapped points. The inset is the contribution of each basis function to the slow variables B1 and B2. (c) The free-energy landscape (in units of kBT) in the slow-variable space (B1, B2).

    Figure 1.(a) Time evolution of the RMSD of the Trp-cage to its native structure, the slow variables B1 and B2 obtained in the TM. Red line is time-window-smoothed one (Δ t = 200 ns). (b) Eigenvalue of the variance–covariance matrix of the trajectory-mapped points. The inset is the contribution of each basis function to the slow variables B1 and B2. (c) The free-energy landscape (in units of kBT) in the slow-variable space (B1, B2).

    We apply the TM with the free parameter τ = 2 μs to detect the slow modes in the τ time scale, which hiding in the total 208-μs-long all-atomic MD trajectory. As shown in Fig. 1(b), two eigenvalues of variance-covariance matrix are significantly greater than zero, which indicates that this system involves two slow dynamics processes. These two slow variables (denoted as B1 and B2) are shown in Fig. 1(a). We show the free energy surface in the (B1, B2) space (Fig. 1(c)), Δ G = –kBT ln P, where P is the probability distribution of the molecular system along the slow variables. In this two-dimensional space, the entire region includes three minima that correspond to three metastable states, i.e., the folded state (Sf), unfolded state (Su), and extended state (Se). The B1 primarily distinguishes state Sf from the other two states. In other words, the B1 which combined by 72 angle-based basis functions has similar physical meaning with the RMSD which is common used to identify folded state of protein. The B2 distinguishes state Se from the other states.

    Figure 2(a) shows the time-ordered similarity matrix of the MD trajectory after projecting in the slow dynamics space, C(t2,t1) = B(t2) ⋅ B(t1). The matrix is found to divide into some blocks. Inside each block (each time segment), the element of similarity matrix is almost equal to unity, it indicates that all the conformations in each time segment are almost identical in the B space, i.e., they belong to the same metastable state without occuring transition in the time segment. At the time points which separated different blocks, the MD trajectory occurs transitions from one to another metastable states, thus the similarity of conformations in the B space is obvious. Thus, this matrix can help us to identify transition events.

    Time-ordered similarity matrix of the MD trajectory. The similarity between two samples C(t2, t1) = B(t2) ⋅ B(t1). (b) The time-rearranged similarity matrix, suggesting three metastable states. (c) Kinetic transition network. Numbers near the arrows are the corresponding transition rates. The population of each state in the 208-μs MD trajectory is listed in bracket (which approaches to the equilibrium one, in consistent with the fact the folding and unfolding transitions occur more than ten times during the MD simulation). Residue TRP6 and PRO17 are shown in blue, GLY11 in red.

    Figure 2.Time-ordered similarity matrix of the MD trajectory. The similarity between two samples C(t2, t1) = B(t2) ⋅ B(t1). (b) The time-rearranged similarity matrix, suggesting three metastable states. (c) Kinetic transition network. Numbers near the arrows are the corresponding transition rates. The population of each state in the 208-μs MD trajectory is listed in bracket (which approaches to the equilibrium one, in consistent with the fact the folding and unfolding transitions occur more than ten times during the MD simulation). Residue TRP6 and PRO17 are shown in blue, GLY11 in red.

    From the similarity matrix, the PCCA+ algorithm[23] is implied to cluster the similar segments together. As shown in Fig. 2(b), the rearranged similarity matrix clearly show three blocks which correspond to three metastable states. A transition network was constructed from the similarity matrix (Fig. 2(c)). The folding path was SeSuSf, without the direct transition between Se and the nature structure Sf. The typical protein structure of each state (Fig. 2(c)) and the components of slow variables help us to understand the main distinction between each state. As mentioned in Eq. (12), these variables are actually linear combinations of basis functions. The combination coefficient bαμ is illustrated in Fig. 1(b) which can used to obtain the understanding about the physical meaning of slow variables. The main components of B1 is ψTRP6 (ψ angle of TRP6) and ϕPRO17. Those two residues correspond to the hydrophobic core. The folding/unfolding process of the protein is closely related to the formation of hydrophobic core. The main components of B2 is ψGLY17, which correspond to the middle part of the protein. The twist of ψGLY17 make the structure of state Se looser than Su (Fig. 2(c)). The three-state model constructed by TM provides us with more details about the folding dynamics of the Trp-cage protein.

    4. Discussion

    The propagator, G(q, t|q0,0), also named as Green function, is the condition probability of the system to be q at time t while it was q0 at t0, which describes the dynamics of system. For large t, we have

    $$ \begin{eqnarray}G(q,t|{q}_{0},0)={P}_{{\rm{eq}}}(q)\left[1+\displaystyle \sum _{n=1}^{N}{{\rm{e}}}^{-{\lambda }_{n}t}{\hat{\varPhi }}_{n}(q){\hat{\varPhi }}_{n}({q}_{0})\right],\end{eqnarray}$$ (14)

    and P(q,t) = ∫ G(q,t|q0,0) P(q0,0) d q0.

    It is easy to know that the slow-mode functions are the first a few eigenfunctions of G(q,t|q0,0). In principle, we can represent the propagator as a time correlation matrix by using the basis functions {Aμ(q)},

    $$ \begin{eqnarray}\begin{array}{lll}{G}_{\,\nu }^{\mu }(t,0) & = & \displaystyle \int {\rm{d}}q{\rm{d}}{q}^{^{\prime} }{A}^{\mu }(q)G(q,t|{q}^{^{\prime} },0){A}_{\nu }({q}^{^{\prime} })P({q}^{^{\prime} },0)\\ & \equiv & {\langle {A}^{\mu }(q(t)){A}_{\nu }(q(0))\rangle }_{0},\end{array}\end{eqnarray}$$ (15)

    then diagonalize the matrix to get the slow-mode functions.

    For well constructing the propagator to get the slow dynamics modes, it is natural to use a great number of (thousands or much larger) cell functions θμ(q) as an approximately complete set of basis functions. Here {θμ(q)} is the character function of the cells, whose value is unit while q is in the μ cell, otherwise zero. It means that {θμ(q)} splits whole the conformational space into lots of small cells, then any conformational function can use these cell function to expand by neglecting the fast-spatial varying of function inside each of cell (In practice, we can only split the sample of all conformations and group the sample into many clusters according to their neighboring, then {θμ(q)} which are the character functions of these clusters, describe the belonging of any sampled conformation q). These cells are required to so small that we can suppose the local equilibrium inside each cell is easy to reach in a short time. Thus the cells provide a coarse description of conformations, we do not distinguish conformations inside the same cell and think they are identity in slow-dynamics viewpoint. Thus, {θμ(q)} is approximately complete in describing the slow dynamics modes. We can apply {θμ(q)} to represent the detailed varying of slow-mode functions Φ^(q) in whole the conformational space (in whole the finite-size set of all sampled conformations in practice). In the cases, Aμ(q) = θμ(q), and Aμ(q)=(1/Z^μ)θμ(q), where Z^μ=θμ(q)eq is the equilibrium probability inside the μ cell. The propagator is the transition probability matrix among these cells,

    $$ \begin{eqnarray}\begin{array}{lll}{G}_{\,\nu }^{\mu }(t) & = & \displaystyle \frac{1}{{\hat{Z}}_{\mu }}{\langle {\theta }_{\mu }(q(t)){\theta }_{\nu }(q(0))\rangle }_{{\rm{eq}}}\\ & = & \displaystyle \frac{{n}_{\mu \nu }(t,0)}{{n}_{\nu }(0)},\end{array}\end{eqnarray}$$ (16)

    provides a very detailed representation of the propagator. Here nμν(t,0) is the probability that q locates in the μ cell at t and starts from the ν cell at t = 0, and nν(0) is the probability in the ν cell at t = 0. It is worth to mention, due to the character of the cell functions, we actually can use any initial probability P(q,0) to calculate the matrix without altering results. It is easy to know that Gνμ(t)0, and μGνμ(t)1. Thus, the eigenvalues of the matrix are between 0 and 1 for any t, and the largest a few eigenvalues and eigenvectors give the slow modes of motion.

    It is actually the main idea of MSM which was widely applied for analyzing metastable states of slow motions in biological molecular systems from ensemble dynamics simulations.[2431] In MSM, the cells are defined in the sample set of sampled conformations. All sampled conformations are clustered into thousands of groups based on the pair distance of conformations, then each group of conformations corresponds into a cell function. Along the MD trajectories, the transition probability from one cell to another after a time interval t is estimated as the corresponding matrix element. Due to the completeness of basis functions and not requiring the initial distribution be equilibrium, MSM provides a complete construction of slow-dynamics modes in principle, and its eigenfunctions are one-to-one corresponding to the slow dynamics modes. However, in practice, very large data set of simulations is often required to get good estimate of each matrix element with sufficient statistical accuracy, which requiring sufficient events between each pair of cells.

    Rather than applying a great number of cell functions as the basis functions, a recent improvement of the original MSM, named as tICA, was presented to identify slow dynamics by constructing the time correlation matrix of some physical variables, Gμν(t,0)=A^μ(q(t))A^ν(q(0)). Here {A^μ(q)} is same as the basis functions applied in the TM. As we already mentioned, the time correlation matrix is a finite-dimensional approximate representation of the dynamics propagator. Thus, in principle, the tICA gives slow dynamical modes by diagonalizing the time correlation matrix to achieving its first a few slowest eigen-modes, while sufficient basis functions are applied and the initial distribution P(q,0) already reached the equilibrium one. Anyway, in practice, usually not too many basis functions can be applied, and the initial distribution P(q,0) may be deviated obviously from Peq(q), thus the calculated time correlation matrix may loss the character of the original propagator more or less. For example, since the time correlation matrix is not symmetric, some of its eigenvalues may be not real, thus do not correspond to the rates of dynamical modes then we cannot directly get slow modes from the eigenvalues directly.

    As a comparison, TM calculates and diagonalizes the variance-covariance matrix of trajectory-mapped points,

     1
    It is easy to know, Σμν is symmetric and a kind of average of Gμν(t2,t1),

    $$ \begin{eqnarray}{\varSigma }^{\mu \nu }=\displaystyle \frac{1}{\tau }\displaystyle {\int }_{0}^{\tau }{\rm{d}}t\left(1-\displaystyle \frac{t}{\tau }\right)[{C}_{\mu \nu }(t)+{C}_{\nu \mu }(t)],\end{eqnarray}$$ (17)

    where

    $$ \begin{eqnarray}{C}_{\mu \nu }(t)=\displaystyle \frac{1}{\tau -t}\displaystyle {\int }_{0}^{\tau -t}{\rm{d}}{t}_{1}{G}_{\mu \nu }(t+{t}_{1},{t}_{1}).\end{eqnarray}$$ (18)

    The TM focuses more on the difference of MD trajectories {Pi(q)}, but less on that of conformations in the same trajectory, since the latter is mainly related to the short-time correlation. Therefore, the average of the time correlation matrix in the TM provides a suitable way to more efficiently extract the slow dynamics modes by filtering fast motions.

    5. Summary

    The TM can extract the slow dynamic processes from time-series data. The key of TM is to make use of the time continuity between conformations, rather than only based on the geometric similarity of single conformation to build the slow variables of the system. Compared with the other methods, the TM is to apply the probability function of trajectories to combine the slow-dynamics functions directly linearly. It makes the TM robust and straightforward, less affecting by the incompleteness of basis functions in representing these slow dynamics functions. Besides, the TM gives slow-dynamics related analyzed collective variables, which not only provides a simple understanding of the slow dynamics of systems from MD trajectories but also is applied to extend the time scale of further MD simulations[39] by combining with some enhanced sampling techniques, such as metadynamics,[60] umbrella sampling,[61] and forward flux methods,[62]etc.

    [1] S Piana, K Lindorff-Larsen, D E Shaw. Proc. Natl. Acad. Sci. USA, 109(2012).

    [2] S V Lyulin, A A Gurtovenko, S V Larin, V M Nazarychev, A V Lyulin. Macromolecules, 46, 6357(2013).

    [3] T J Lane, D Shukla, A B Kyle, S P Vijay. Curr. Opin. Struct. Biol., 23, 58(2013).

    [4] A K Jain. Machine Learning and Knowledge Discovery, 3-4(2008).

    [5] E Schubert, J Sander, M Ester, H P Kriegel, X Xu. ACM Trans. Database Syst., 42, 19(2017).

    [6] R Alex, A Laio. Science, 344, 1492(2014).

    [7] H Hotelling. J. Educ. Psychol., 24, 417(1933).

    [8] A Hyvrinen, E Oja. Neural Netw., 13, 411(2000).

    [9] C R Schwantes, V S Pande. J. Chem. Theory. Comput., 9, 2000(2013).

    [10] J B Tenenbaum, V de Silva, J C Langford. Science, 290, 2319(2000).

    [11] B Nadler, S Lafon, R R Coifman, I G Kevrekidis. Appl. Comput. Harmon. Anal., 21, 113(2006).

    [12] J E Shea, C L Brooks. Ann. Rev. Phys. Chem., 52, 499(2001).

    [13] Y G Mu, P H Nguyen, G Stock. Proteins, 58, 45(2005).

    [14] G E Sims, I G Choi, S H Kim. Proc. Natl. Acad. Sci. USA, 102, 618(2005).

    [15] F Rao, M Karplus. Proc. Natl. Acad. Sci. USA, 107, 9152(2010).

    [16] P Das, M Moll, H Stamati, L E Kavraki, C Clementi. Proc. Natl. Acad. Sci. USA, 103, 9885(2006).

    [17] B Nadler, S Lafon, R R Coifman, I G Kevrekidis. Appl. Comput. Harmon. Anal., 21, 113(2006).

    [18] S V Krivov, M Karplus. Proc. Natl. Acad. Sci. USA, 101(2004).

    [19] G G Maisuradze, A Liwo, H A Scheraga. Phys. Rev. Lett., 102(2009).

    [20] A E Torda, W F Gunsteren. J. Comput. Chem., 15, 1331(1994).

    [21] J Y Shao, S W Tanner, N Thompson, T E Cheatham. J. Chem. Theory. Comput., 3, 2312(2007).

    [22] P Deuflhard, W Huisinga, A Fischer, C Schutte. Linear Algebra Appl., 315, 39(2000).

    [23] P Deuflhard, M Weber. Numer Linear Algebra Appl., 398, 161(2005).

    [24] D Gfeller, P De Los Rios, A Caflisch, F Rao. Proc. Natl. Acad. Sci. USA, 104, 1817(2007).

    [25] F Noe, I Horenko, C Schutte, J C Smith. J. Chem. Phys., 126(2007).

    [26] J D Chodera, N Singhal, V S Pande, K A Dill, W C Swope. J. Chem. Phys., 126(2007).

    [27] G R Bowman, V S Pande. Proc. Natl. Acad. Sci. USA, 107(2010).

    [28] G R Bowman, L Meng, X Huang. J. Chem. Phys., 139(2013).

    [29] J K Weber, R L Jack, V S Pande. J. Am. Chem. Soc., 135, 5501(2013).

    [30] V S Pande, K Beauchamp, G R Bowman. Methods, 52, 99(2010).

    [31] N J Deng, W Dai, R M Levy. J. Phys. Chem. B, 117(2013).

    [32] Y Naritomi, S Fuchigami. J. Chem. Phys., 134(2011).

    [33] F Nuske, B G Keller, G Perez-Hernandez, A S J S Mey, F Noe. J. Chem. Theory. Comput., 10, 1739(2014).

    [34] L C Gong, X Zhou. J. Phys. Chem. B, 114(2010).

    [35] L C Gong, X Zhou. Phys. Rev. E, 80(2009).

    [36] C B Zhang, M Li, X Zhou. Chin. Phys. B, 24(2015).

    [37] L C Gong, X Zhou, Z C Ouyang. PloS One, 10(2015).

    [38] C B Zhang, J Yu, X Zhou. J. Phys. Chem. B, 121, 4678(2017).

    [39] C B Zhang, F F Ye, M Li, X Zhou. Sci. China: Phys. Mech., 62(2019).

    [40] C B Zhang, S Xu, X Zhou. Phys. Rev. E, 100(2019).

    [41] J W Neidigh, R M Fesinmeyer, N H Andersen. Nat. Struct. Biol., 9, 425(2002).

    [42] B Bipasha, J C Lin, V D Williams, P Kummler, J W Neidigh, N H Andersen. Protein Eng. Des. Sel., 21, 171(2008).

    [43] K Lindorff-Larsen, S Piana, R O Dror, D E Shaw. Science, 334, 517(2011).

    [44] R Day, D Paschek, A E Garcia. Proteins, 78, 1889(2010).

    [45] V Spiwok, P Oborsky, J Pazurikova, A Krenek, B Kralova. J. Chem. Phys., 142(2015).

    [46] S B Kim, C J Dsilva, I G Kevrekidis, P G Debenedetti. J. Chem. Phys., 142(2015).

    [47] V A Andryushchenko, S F Chekmarev. Eur. Biophys. J., 45, 229(2016).

    [48] T W Zang, L L Yu, C Zhang, J P Ma. J. Chem. Phys., 141(2014).

    [49] L X Zhan, J Z Y Chen, W K Liu. Proteins, 66, 436(2007).

    [50] X H Huang, M Hagen, B Kim, R A Friesner, R H Zhou, B J Berne. J. Phys. Chem. B, 111, 5405(2007).

    [51] J W Pitera, W C Swope. Proc. Natl. Acad. Sci. USA, 100, 7587(2003).

    [52] V Hornak, R Abel, A Okur, B Strockbine, A Roitberg, C Simmerling. Proteins, 65, 712(2006).

    [53] Z Z Lai, N K Preketes, S Mukamel, J Wang. J. Phys. Chem. B, 117, 4661(2013).

    [54] R M Abaskharon, R M Culik, G A Woolley, F Gai. J. Phys. Chem. Lett., 6, 521(2015).

    [55] V A Andryushchenko, S F Chekmarev. Eur. Biophys. J., 45, 229(2016).

    [56] S Piana, K Lindorff-Larsen, D E Shaw. Biophys. J., 100, L47(2011).

    [57] W L Jorgensen, J Chandrasekhar, J D Madura, R W Impey, M L Klein. J. Chem. Phys., 79, 926(1983).

    [58] A D MacKerell, D Bashford, M Bellott et al. J. Phys. Chem. B, 102, 3586(1998).

    [59] A Altis, M Otten, P H Nguyen, R Hegger, G Stock. J. Chem. Phys., 128(2008).

    [60] A Laio, F L Gervasio. Rep. Prog. Phys., 71(2008).

    [61] G M Torrie, J P Valleau. J. Comput. Phys., 23, 187(1977).

    [62] R J Allen, C Valeriani, P R Wolde. J. Phys.: Condens. Matter, 21(2009).

    Tools

    Get Citation

    Copy Citation Text

    Chuanbiao Zhang, Xin Zhou. Find slow dynamic modes via analyzing molecular dynamics simulation trajectories[J]. Chinese Physics B, 2020, 29(10):

    Download Citation

    EndNote(RIS)BibTexPlain Text
    Save article for my favorites
    Paper Information

    Received: Jun. 18, 2020

    Accepted: --

    Published Online: Apr. 21, 2021

    The Author Email: Xin Zhou (xzhou@ucas.ac.cn)

    DOI:10.1088/1674-1056/abad24

    Topics