Solvation

Solvation frequently has a large influence on the properties of a molecule. To better understand this influence, accurate models of explicit solvents are necessary. With the Autosolvate software, described in the Software section, a solvent shell can be accurately equilibrated and sampled. To reduce the computational cost of calculating quantum mechanical properties with this solvent shell, the shell size has to be selected carefully. Spherical solvent shells (top) do not cover the molecule as consistently as aspherical solvent shells (bottom). [Hruska et al., Autosolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules, 2022]

Solvent shells can be detected with radial distribution functions (RDFs), but when comparing solvent shells of molecules with different sizes, an RDF shifted by the average molecule radius can be beneficial (top). For aspherical molecules, which are very common, the minimum distance distribution function (MDDF) shows the solvent shells much more clearly than an RDF (bottom). MDDFs measure the minimum distance between the molecule and the solvent instead of the distance to the center of the molecule. [Hruska et al., Autosolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules, 2022]
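The difference between the two distance definitions can be illustrated with a minimal sketch. It assumes a hypothetical elongated solute made of three atoms and two single-atom solvent molecules. The function names and coordinates are illustrative only and are not taken from Autosolvate.

```python
import math

def centroid(atoms):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))

def center_distance(solute_atoms, solvent_atoms):
    """RDF-style distance: between the centers of solute and solvent molecule."""
    return math.dist(centroid(solute_atoms), centroid(solvent_atoms))

def minimum_distance(solute_atoms, solvent_atoms):
    """MDDF-style distance: smallest atom-atom distance between the two molecules."""
    return min(math.dist(a, b) for a in solute_atoms for b in solvent_atoms)

# Hypothetical elongated solute: three atoms along x, 4 Å apart.
solute = [(-4.0, 0.0, 0.0), (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
# Two hypothetical single-atom solvent molecules, each 3 Å from the nearest solute atom.
solvent_end = [(7.0, 0.0, 0.0)]   # beyond the tip of the solute
solvent_side = [(0.0, 3.0, 0.0)]  # beside the middle of the solute

for name, solvent in [("end", solvent_end), ("side", solvent_side)]:
    print(name, center_distance(solute, solvent), minimum_distance(solute, solvent))
```

Both solvent molecules are in direct contact with the solute, yet their center distances differ by more than a factor of two, while their minimum distances are identical. Histogramming the minimum distances over many solvent molecules and frames, with a suitable normalization, is the idea behind the MDDF.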

The MDDF makes it possible to measure how the closeness between the molecule and the solvent depends on the molecule. With machine learning, this closeness can be predicted from the molecular structure.

The closeness also depends on the solvent and decreases with the solvent's dielectric constant.


Reorganization energies

The solvent at room temperature is not static. Chemical changes to the central molecule lead to a reorganization of the solvent, which is measured as the reorganization energy. Previously, reorganization energies had been estimated computationally only for a limited number of systems, but high-throughput simulation with the Autosolvate software made this possible for 100+ molecules. [Hruska et al., Bridging the experiment-calculation divide: Machine learning corrections to redox potential calculations in implicit and explicit solvent models, 2022]

The reorganization energy can be split into the contribution of the central molecule and the contribution of the solvent. The fraction caused by the solvent (outer-sphere), shown above, differs between individual molecules. [Hruska et al., Autosolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules, 2022]
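Written as a formula, this split could look as follows. The symbols are assumptions for illustration: $\lambda_{\mathrm{in}}$ denotes the contribution of the central molecule (commonly called inner-sphere), $\lambda_{\mathrm{out}}$ the solvent contribution, and $f_{\mathrm{out}}$ the solvent fraction.

```latex
\lambda = \lambda_{\mathrm{in}} + \lambda_{\mathrm{out}},
\qquad
f_{\mathrm{out}} = \frac{\lambda_{\mathrm{out}}}{\lambda_{\mathrm{in}} + \lambda_{\mathrm{out}}}
```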


Machine learning corrected redox potentials

Redox potentials are a fundamental chemical property, but the accuracy of DFT calculations relative to experiment is limited. To improve the accuracy, I ran both explicit and implicit solvation approaches, used machine learning to decrease the errors between calculations and experimental data, and analysed the error contributions. [Hruska et al., Bridging the experiment-calculation divide: Machine learning corrections to redox potential calculations in implicit and explicit solvent models, 2022]

Because only a small number of redox potential data points is available, machine learning features related to the solute and the solvent were used.

Machine learning could both correct systematic biases and reduce the size of outliers.
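How a learned correction removes a systematic bias can be sketched with the simplest possible model: a one-feature linear fit from calculated to experimental values. This is an assumption made for illustration. The model and features used in the cited work are not reproduced here, and the numbers below are synthetic.

```python
def fit_linear(x, y):
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(predicted, reference):
    """Mean absolute deviation between two equally long sequences."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

# Synthetic, purely illustrative redox potentials in volts (not data from the cited work).
calculated = [-1.0, -0.5, 0.0, 0.5, 1.0]
experimental = [-0.70, -0.28, 0.22, 0.61, 1.15]

slope, intercept = fit_linear(calculated, experimental)
corrected = [slope * c + intercept for c in calculated]

print("MAE before correction:", mean_abs_error(calculated, experimental))
print("MAE after correction: ", mean_abs_error(corrected, experimental))
```

The sketch evaluates the error on the same points used for fitting. With few data points, a held-out evaluation such as cross-validation would be needed to judge whether the correction generalizes.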

To optimize the solvent shell size for redox potential calculations, the convergence of the redox potential with shell size is shown. When an explicit solvent shell is combined with implicit solvent (C-PCM), convergence is reached much faster, at around 4 Å. This reduces the computational cost of calculating redox potentials with explicit solvent.


Dynamics of small molecules

Even for small molecules the thermal motion is significant, and the single structure frequently used for calculations cannot represent the whole ensemble of conformations that the molecule samples. On the left, the free energy landscape shows several local minima for one system in explicit solvent. The right plot shows that the investigated property (redox potential) varies across the ensemble.
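A free energy landscape of this kind is commonly estimated from the populations of sampled conformations via F = -kT ln p. The following minimal sketch does this along one hypothetical coordinate. The function name, the binning, and the sample values are assumptions for illustration and are not taken from the original analysis.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def free_energy_profile(samples, lower, upper, n_bins, temperature=300.0):
    """Free energy per bin in kcal/mol, relative to the most populated bin.

    Bins without samples get math.inf.
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if lower <= s < upper:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((s - lower) / width), n_bins - 1)] += 1
    max_count = max(counts)
    if max_count == 0:
        raise ValueError("no samples inside [lower, upper)")
    kt = K_B * temperature
    return [kt * math.log(max_count / c) if c > 0 else math.inf for c in counts]

# Hypothetical coordinate values: 8 frames in one basin, 2 frames in another.
samples = [0.1] * 8 + [0.6] * 2
profile = free_energy_profile(samples, lower=0.0, upper=1.0, n_bins=2)
print(profile)
```

The less populated basin lies higher by kT ln 4, because it is visited four times less often than the dominant one.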


Sampling of rare events

[Figure: ExTASY schema]

For larger molecules such as proteins, sampling all conformations that the molecule reaches is challenging. To sample this ensemble efficiently, adaptive sampling stores the already sampled conformations and iteratively adds data points at the optimal positions. Here, machine learning of the latent space of protein dynamics helps. The open-source software ExTASY, used to achieve adaptive sampling for 70+ residue proteins, is described in the Software section. [Hruska et al., Extensible and scalable adaptive sampling on supercomputers, 2020]
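The iterative loop described above can be sketched on a toy system. The sketch makes strong simplifying assumptions: the "molecule" is an unbiased random walk on a 1D chain of states, and restart points are chosen by a simple count-based criterion (the least-visited states) rather than a learned latent space. All names are hypothetical, and this is not the ExTASY API.

```python
import random
from collections import Counter

def short_trajectory(start, n_steps, n_states, rng):
    """Toy dynamics (assumption): unbiased random walk on states 0..n_states-1."""
    state, path = start, [start]
    for _ in range(n_steps):
        state = min(max(state + rng.choice((-1, 1)), 0), n_states - 1)
        path.append(state)
    return path

def adaptive_sampling(n_rounds, n_walkers, n_steps, n_states, seed=0):
    """Each round, restart the walkers from the least-visited states seen so far."""
    rng = random.Random(seed)
    counts = Counter({0: 1})  # the only known conformation at the beginning
    for _ in range(n_rounds):
        # Rank the stored states by visit count; ties are broken by state index.
        ranked = sorted(counts, key=lambda s: (counts[s], s))
        starts = [ranked[i % len(ranked)] for i in range(n_walkers)]
        for start in starts:
            counts.update(short_trajectory(start, n_steps, n_states, rng))
    return counts

visited = adaptive_sampling(n_rounds=20, n_walkers=4, n_steps=10, n_states=50)
print("distinct states visited:", len(visited))
```

Restarting from rarely visited states pushes the walkers toward the frontier of the explored region instead of resampling well-known states. Replacing the count-based ranking with a different selection rule changes the sampling strategy without changing the loop.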

[Figure: free energy landscape]

To validate that adaptive sampling reaches the correct ensemble of conformations, adaptive sampling (black diagonal lines) is compared with plain molecular dynamics (colored background).

[Figure: adaptive sampling]

Adaptive sampling (red) reaches the folded state (vertical lines) faster than plain molecular dynamics (blue). Note that the x-axis is logarithmic, so the gap between the curves corresponds to a significant speedup.


Understanding adaptive sampling strategies

[Figure: adaptive sampling scaling]

To understand adaptive sampling strategies better, the statistics of these simulations have to be improved, which is challenging because folding even a single protein requires a large amount of computational resources. Instead, a Markov chain surrogate can be used. Here the scalability of adaptive sampling is investigated. The results indicate that different adaptive sampling strategies scale better (steeper curve) than plain molecular dynamics (MD), and that adaptive sampling scales to about 100-1000 concurrent molecular dynamics trajectories. [Hruska et al., Quantitative comparison of adaptive sampling methods for protein dynamics, 2018]
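The surrogate idea can be sketched as follows: instead of running molecular dynamics, trajectories are drawn from a transition matrix, which is cheap enough to repeat many times and collect statistics. The three-state matrix and the function names below are hypothetical. A surrogate for a real protein would use a much larger matrix estimated from simulation data.

```python
import random

def next_state(state, transition_matrix, rng):
    """Draw the next state from the row of the current state."""
    row = transition_matrix[state]
    return rng.choices(range(len(row)), weights=row)[0]

def first_passage_time(transition_matrix, start, target, rng, max_steps=1_000_000):
    """Number of steps until the target is first reached, or None if never reached."""
    state = start
    for t in range(max_steps + 1):
        if state == target:
            return t
        state = next_state(state, transition_matrix, rng)
    return None

# Hypothetical 3-state chain: state 0 = unfolded, state 2 = folded.
T = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]

rng = random.Random(0)
times = [first_passage_time(T, start=0, target=2, rng=rng) for _ in range(1000)]
print("mean first passage time (steps):", sum(times) / len(times))
```

Repeating such runs for plain sampling and for different restart strategies gives folding-time distributions with far better statistics than direct protein simulations would allow.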

[Figure: extasy schema]

The maximum speedup of adaptive sampling compared to molecular dynamics was previously unclear. Here I developed the first upper limit estimate, using a greedy algorithm approach. [Hruska et al., Quantitative comparison of adaptive sampling methods for protein dynamics, 2018]

[Figure: adaptive sampling scaling]

The upper limit of the speedup with adaptive sampling reaches up to 20-fold for the small (<80 residues) proteins investigated. The speedup increases with the folding time, indicating that an even larger speedup is reachable for larger proteins.