Molecular dynamics (MD) simulation has been widely used in recent a few decades in very various fields, such as detecting the relationship between chemical structures and functions of biological macromolecules.[
Chinese Physics B, Volume. 29, Issue 10, (2020)
Find slow dynamic modes via analyzing molecular dynamics simulation trajectories
It is a central issue to find the slow dynamic modes of biological macromolecules via analyzing the large-scale data of molecular dynamics simulation (MD). While the MD data are high-dimensional time-successive series involving all-atomic details and sub-picosecond time resolution, a few collective variables which characterizing the motions in longer than nanoseconds are needed to be chosen for an intuitive understanding of the dynamics of the system. The trajectory map (TM) was presented in our previous works to provide an efficient method to find the low-dimensional slow dynamic collective-motion modes from high-dimensional time series. In this paper, we present a more straight understanding about the principle of TM via the slow-mode linear space of the conformational probability distribution functions of MD trajectories and more clearly discuss the relation between the TM and the current other similar methods in finding slow modes.
1. Introduction
Molecular dynamics (MD) simulation has been widely used in recent a few decades in very various fields, such as detecting the relationship between chemical structures and functions of biological macromolecules.[
Many methods were presented to simplify large data to take out the main features. For example, lots of clustering algorithms[
2. Theory and method
2.1. Slow dynamics modes
Generally, let us consider a stochastic process, as the molecular dynamics (MD) simulation, e.g., following the (overdamped) Langevin equation
Defining P(q,t) = [Peq(q)]1/2Ψ(q,t), we have
Due to the fast decay of motion modes with large eigenvalues as time, we can consider only the equilibrium mode Φ0(q) and the first N eigenfunctions, {Φ1(q), …, ΦN(q)}, named as slow modes, to describe the main features of the dynamics of system at long time scale. For example, if
2.2. Linear space of slow modes and metastable states
For any probability function P(q,t), we can suppose it already evolved a not-short time, thus
Usually, we can split the whole conformational space into some conformational regions which are basins or super-basins of the potential energy surface and separated by high energy barriers; each is named as a metastable state in slow dynamics. In other words, each metastable state is a conformational region where the system often reaches local equilibrium inside before leaving out. The slow modes correspond to transitions among metastable states, and the inner local equilibrium within each (small) metastable state is corresponds to faster modes. Therefore, all the slow-mode functions
MD trajectories similarly give the splitting of metastable states, since
2.3. Trajectory map
We apply a set of known conformational functions, denoted as {Aα(q)}, μ = 1, 2, …, N0, as basis functions to coarsely represent the linear space spanned by the slow-mode functions,
In this paper, we do not distinguish the conformation sample of an MD trajectory and its corresponding probability function Pi(q), unless avoiding some possible confusions. In addition, for simplification, we linearly combine these selected physical variables Aν(q) to get a set of orthonormalize basis functions
Thus, each MD trajectory or segment Pi(q) is mapped as a vector
We apply the principle component analysis (PCA) to the trajectory-mapped vectors {
In the low-dimensional space of
3. Application
In this section, we use the Trp-cage protein to illustrate the basic application of the TM. The Trp-cage contains 20 residues, includes three secondary structures in its native structure: an α helix, a 3–10 helix, and a polyproline II segment.[
We choose the dihedral angles of protein backbone, ϕi and ψi with i = 1, …, 18, as collective variables to describe the large-scale motions. Due to the periodicity of dihedral angles, we transform the angles into their cosine and sine functions, to get 72 basis functions in total,[
Figure 1.(a) Time evolution of the RMSD of the Trp-cage to its native structure, the slow variables
We apply the TM with the free parameter τ = 2 μs to detect the slow modes in the τ time scale, which hiding in the total 208-μs-long all-atomic MD trajectory. As shown in Fig. 1(b), two eigenvalues of variance-covariance matrix are significantly greater than zero, which indicates that this system involves two slow dynamics processes. These two slow variables (denoted as B1 and B2) are shown in Fig. 1(a). We show the free energy surface in the (B1, B2) space (Fig. 1(c)), Δ G = –kBT ln P, where P is the probability distribution of the molecular system along the slow variables. In this two-dimensional space, the entire region includes three minima that correspond to three metastable states, i.e., the folded state (Sf), unfolded state (Su), and extended state (Se). The B1 primarily distinguishes state Sf from the other two states. In other words, the B1 which combined by 72 angle-based basis functions has similar physical meaning with the RMSD which is common used to identify folded state of protein. The B2 distinguishes state Se from the other states.
Figure 2(a) shows the time-ordered similarity matrix of the MD trajectory after projecting in the slow dynamics space, C(t2,t1) =
Figure 2.Time-ordered similarity matrix of the MD trajectory. The similarity between two samples
From the similarity matrix, the PCCA+ algorithm[
4. Discussion
The propagator, G(q, t|q0,0), also named as Green function, is the condition probability of the system to be q at time t while it was q0 at t0, which describes the dynamics of system. For large t, we have
It is easy to know that the slow-mode functions are the first a few eigenfunctions of G(q,t|q0,0). In principle, we can represent the propagator as a time correlation matrix by using the basis functions {Aμ(q)},
For well constructing the propagator to get the slow dynamics modes, it is natural to use a great number of (thousands or much larger) cell functions θμ(q) as an approximately complete set of basis functions. Here {θμ(q)} is the character function of the cells, whose value is unit while q is in the μ cell, otherwise zero. It means that {θμ(q)} splits whole the conformational space into lots of small cells, then any conformational function can use these cell function to expand by neglecting the fast-spatial varying of function inside each of cell (In practice, we can only split the sample of all conformations and group the sample into many clusters according to their neighboring, then {θμ(q)} which are the character functions of these clusters, describe the belonging of any sampled conformation q). These cells are required to so small that we can suppose the local equilibrium inside each cell is easy to reach in a short time. Thus the cells provide a coarse description of conformations, we do not distinguish conformations inside the same cell and think they are identity in slow-dynamics viewpoint. Thus, {θμ(q)} is approximately complete in describing the slow dynamics modes. We can apply {θμ(q)} to represent the detailed varying of slow-mode functions
It is actually the main idea of MSM which was widely applied for analyzing metastable states of slow motions in biological molecular systems from ensemble dynamics simulations.[
Rather than applying a great number of cell functions as the basis functions, a recent improvement of the original MSM, named as tICA, was presented to identify slow dynamics by constructing the time correlation matrix of some physical variables,
As a comparison, TM calculates and diagonalizes the variance-covariance matrix of trajectory-mapped points,
5. Summary
The TM can extract the slow dynamic processes from time-series data. The key of TM is to make use of the time continuity between conformations, rather than only based on the geometric similarity of single conformation to build the slow variables of the system. Compared with the other methods, the TM is to apply the probability function of trajectories to combine the slow-dynamics functions directly linearly. It makes the TM robust and straightforward, less affecting by the incompleteness of basis functions in representing these slow dynamics functions. Besides, the TM gives slow-dynamics related analyzed collective variables, which not only provides a simple understanding of the slow dynamics of systems from MD trajectories but also is applied to extend the time scale of further MD simulations[
[1] S Piana, K Lindorff-Larsen, D E Shaw. Proc. Natl. Acad. Sci. USA, 109(2012).
[2] S V Lyulin, A A Gurtovenko, S V Larin, V M Nazarychev, A V Lyulin. Macromolecules, 46, 6357(2013).
[3] T J Lane, D Shukla, A B Kyle, S P Vijay. Curr. Opin. Struct. Biol., 23, 58(2013).
[4] A K Jain. Machine Learning and Knowledge Discovery, 3-4(2008).
[5] E Schubert, J Sander, M Ester, H P Kriegel, X Xu. ACM Trans. Database Syst., 42, 19(2017).
[6] R Alex, A Laio. Science, 344, 1492(2014).
[7] H Hotelling. J. Educ. Psychol., 24, 417(1933).
[8] A Hyvrinen, E Oja. Neural Netw., 13, 411(2000).
[9] C R Schwantes, V S Pande. J. Chem. Theory. Comput., 9, 2000(2013).
[10] J B Tenenbaum, V de Silva, J C Langford. Science, 290, 2319(2000).
[11] B Nadler, S Lafon, R R Coifman, I G Kevrekidis. Appl. Comput. Harmon. Anal., 21, 113(2006).
[12] J E Shea, C L Brooks. Ann. Rev. Phys. Chem., 52, 499(2001).
[13] Y G Mu, P H Nguyen, G Stock. Proteins, 58, 45(2005).
[14] G E Sims, I G Choi, S H Kim. Proc. Natl. Acad. Sci. USA, 102, 618(2005).
[15] F Rao, M Karplus. Proc. Natl. Acad. Sci. USA, 107, 9152(2010).
[16] P Das, M Moll, H Stamati, L E Kavraki, C Clementi. Proc. Natl. Acad. Sci. USA, 103, 9885(2006).
[17] B Nadler, S Lafon, R R Coifman, I G Kevrekidis. Appl. Comput. Harmon. Anal., 21, 113(2006).
[18] S V Krivov, M Karplus. Proc. Natl. Acad. Sci. USA, 101(2004).
[19] G G Maisuradze, A Liwo, H A Scheraga. Phys. Rev. Lett., 102(2009).
[20] A E Torda, W F Gunsteren. J. Comput. Chem., 15, 1331(1994).
[21] J Y Shao, S W Tanner, N Thompson, T E Cheatham. J. Chem. Theory. Comput., 3, 2312(2007).
[22] P Deuflhard, W Huisinga, A Fischer, C Schutte. Linear Algebra Appl., 315, 39(2000).
[23] P Deuflhard, M Weber. Numer Linear Algebra Appl., 398, 161(2005).
[24] D Gfeller, P De Los Rios, A Caflisch, F Rao. Proc. Natl. Acad. Sci. USA, 104, 1817(2007).
[25] F Noe, I Horenko, C Schutte, J C Smith. J. Chem. Phys., 126(2007).
[26] J D Chodera, N Singhal, V S Pande, K A Dill, W C Swope. J. Chem. Phys., 126(2007).
[27] G R Bowman, V S Pande. Proc. Natl. Acad. Sci. USA, 107(2010).
[28] G R Bowman, L Meng, X Huang. J. Chem. Phys., 139(2013).
[29] J K Weber, R L Jack, V S Pande. J. Am. Chem. Soc., 135, 5501(2013).
[30] V S Pande, K Beauchamp, G R Bowman. Methods, 52, 99(2010).
[31] N J Deng, W Dai, R M Levy. J. Phys. Chem. B, 117(2013).
[32] Y Naritomi, S Fuchigami. J. Chem. Phys., 134(2011).
[33] F Nuske, B G Keller, G Perez-Hernandez, A S J S Mey, F Noe. J. Chem. Theory. Comput., 10, 1739(2014).
[34] L C Gong, X Zhou. J. Phys. Chem. B, 114(2010).
[35] L C Gong, X Zhou. Phys. Rev. E, 80(2009).
[36] C B Zhang, M Li, X Zhou. Chin. Phys. B, 24(2015).
[37] L C Gong, X Zhou, Z C Ouyang. PloS One, 10(2015).
[38] C B Zhang, J Yu, X Zhou. J. Phys. Chem. B, 121, 4678(2017).
[39] C B Zhang, F F Ye, M Li, X Zhou. Sci. China: Phys. Mech., 62(2019).
[40] C B Zhang, S Xu, X Zhou. Phys. Rev. E, 100(2019).
[41] J W Neidigh, R M Fesinmeyer, N H Andersen. Nat. Struct. Biol., 9, 425(2002).
[42] B Bipasha, J C Lin, V D Williams, P Kummler, J W Neidigh, N H Andersen. Protein Eng. Des. Sel., 21, 171(2008).
[43] K Lindorff-Larsen, S Piana, R O Dror, D E Shaw. Science, 334, 517(2011).
[44] R Day, D Paschek, A E Garcia. Proteins, 78, 1889(2010).
[45] V Spiwok, P Oborsky, J Pazurikova, A Krenek, B Kralova. J. Chem. Phys., 142(2015).
[46] S B Kim, C J Dsilva, I G Kevrekidis, P G Debenedetti. J. Chem. Phys., 142(2015).
[47] V A Andryushchenko, S F Chekmarev. Eur. Biophys. J., 45, 229(2016).
[48] T W Zang, L L Yu, C Zhang, J P Ma. J. Chem. Phys., 141(2014).
[49] L X Zhan, J Z Y Chen, W K Liu. Proteins, 66, 436(2007).
[50] X H Huang, M Hagen, B Kim, R A Friesner, R H Zhou, B J Berne. J. Phys. Chem. B, 111, 5405(2007).
[51] J W Pitera, W C Swope. Proc. Natl. Acad. Sci. USA, 100, 7587(2003).
[52] V Hornak, R Abel, A Okur, B Strockbine, A Roitberg, C Simmerling. Proteins, 65, 712(2006).
[53] Z Z Lai, N K Preketes, S Mukamel, J Wang. J. Phys. Chem. B, 117, 4661(2013).
[54] R M Abaskharon, R M Culik, G A Woolley, F Gai. J. Phys. Chem. Lett., 6, 521(2015).
[55] V A Andryushchenko, S F Chekmarev. Eur. Biophys. J., 45, 229(2016).
[56] S Piana, K Lindorff-Larsen, D E Shaw. Biophys. J., 100, L47(2011).
[57] W L Jorgensen, J Chandrasekhar, J D Madura, R W Impey, M L Klein. J. Chem. Phys., 79, 926(1983).
[58] A D MacKerell, D Bashford, M Bellott et al. J. Phys. Chem. B, 102, 3586(1998).
[59] A Altis, M Otten, P H Nguyen, R Hegger, G Stock. J. Chem. Phys., 128(2008).
[60] A Laio, F L Gervasio. Rep. Prog. Phys., 71(2008).
[61] G M Torrie, J P Valleau. J. Comput. Phys., 23, 187(1977).
[62] R J Allen, C Valeriani, P R Wolde. J. Phys.: Condens. Matter, 21(2009).
Get Citation
Copy Citation Text
Chuanbiao Zhang, Xin Zhou. Find slow dynamic modes via analyzing molecular dynamics simulation trajectories[J]. Chinese Physics B, 2020, 29(10):
Received: Jun. 18, 2020
Accepted: --
Published Online: Apr. 21, 2021
The Author Email: Xin Zhou (xzhou@ucas.ac.cn)