Research

SandCV: Smooth and nonlinear data-driven collective variable

The time-scale disparity between Molecular Dynamics (MD) simulations and many rare events of interest prevents sufficient sampling of such events and thus hampers the connection between MD simulations and experiments. Enhanced sampling methods are therefore in great demand; however, their effectiveness relies on having a good set of collective variables (CVs) that govern the essential dynamics of the system. The choice of CVs is far from obvious for complex systems. We present here a general method, Smooth And Nonlinear Data-driven Collective Variables (SandCV), based on machine learning techniques to define such CVs. SandCV is versatile and can be combined non-intrusively with available molecular dynamics implementations.

Topological obstructions in data-driven techniques

Nonlinear dimensionality reduction (NLDR) techniques are increasingly used to visualize molecular trajectories and to build data-driven collective variables for enhanced sampling simulations. The success of these methods relies on their ability to identify the essential degrees of freedom characterizing conformational changes. We show that NLDR methods face serious obstacles when the underlying collective variables present periodicities, e.g. arising from proper dihedral angles, leading to misinterpretations and inefficiencies in enhanced sampling.
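
As a minimal illustration of the periodicity problem (a toy sketch, not the analysis from the paper), consider a single dihedral angle. Two conformations at +179° and -179° are nearly identical, yet a single non-periodic coordinate places them at opposite ends of its range. Embedding the angle on a circle in two dimensions restores their proximity. All function names below are hypothetical and chosen only for this example.

```python
import numpy as np

def naive_distance(a, b):
    """Distance when the angle is treated as an ordinary real number."""
    return abs(a - b)

def periodic_distance(a, b):
    """Shortest angular separation, respecting the 2*pi periodicity."""
    d = (a - b + np.pi) % (2.0 * np.pi) - np.pi
    return abs(d)

def circle_embedding(phi):
    """Embed an angle as a point on the unit circle in two dimensions."""
    return np.array([np.cos(phi), np.sin(phi)])

# Two nearly identical conformations on either side of the +/-180 degree cut.
phi_1 = np.deg2rad(179.0)
phi_2 = np.deg2rad(-179.0)

d_naive = naive_distance(phi_1, phi_2)        # large: the cut separates them
d_periodic = periodic_distance(phi_1, phi_2)  # small: they are 2 degrees apart
d_embedded = np.linalg.norm(circle_embedding(phi_1) - circle_embedding(phi_2))

print(f"naive: {d_naive:.4f}  periodic: {d_periodic:.4f}  embedded: {d_embedded:.4f}")
```

The one-dimensional coordinate fails here because a circle cannot be mapped continuously and one-to-one onto an interval; the extra dimension is the price of a faithful embedding.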

http://scitation.aip.org/content/aip/journal/jcp/139/21/10.1063/1.4830403

Customer Matching

Nowadays, we are generating digital traces of various aspects of urban life, including records of when we use public transportation or our cell phones. Much of this data is recorded by different companies and is anonymized to protect people's privacy. Each dataset tells a story about a different aspect of human life and can be used to improve cities in specific dimensions, such as optimizing mobility systems, predicting epidemic outbreaks, and planning city infrastructure. Yet these efforts are confined to the limits of each individual dataset's information. Here, we aim to combine multiple anonymized datasets, allowing information from multiple sources to complement and enrich each other, in order to establish a comprehensive predictive platform for building solutions and making informed decisions that address multidimensional aspects of urban life.

From Shops to Economy

Predicting socio-economic indices from big data on individual bank card transactions.

http://scitation.aip.org/content/aip/journal/jcp/142/4/10.1063/1.4906425
https://urban-lens.herokuapp.com

Alanine dipeptide molecular flexibility

We characterized the geometry and topology of conformational changes of alanine dipeptide, a benchmark system for testing new methods to identify collective variables.

Bike-sharing mobility patterns

http://scitation.aip.org/content/aip/journal/jcp/142/4/10.1063/1.4906425

Enhanced sampling with atlas of collective variables

Nonlinear dimensionality reduction techniques, a fundamental element of SandCV, fail to embed manifolds with nontrivial topologies in their lowest-dimensional space. To overcome this issue, we developed a systematic way of tearing the manifold apart and embedding each part separately. As a consequence, we no longer have a unified parameterization of the slow manifold and instead obtain a different set of collective variables for each part. This atlas of collective variables will be used to perform enhanced sampling simulations, and the results will then be glued together to present a unified map.
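
The idea of an atlas can be sketched on the simplest manifold with nontrivial topology, a circle parameterized by one periodic angle. The toy example below is an assumption-laden illustration, not the tearing procedure developed in this project: it covers the circle with two overlapping charts, each with its own continuous one-dimensional coordinate. The chart functions, their names, and the 3π/4 half-width are hypothetical choices made only for this sketch.

```python
import numpy as np

HALF_WIDTH = 0.75 * np.pi  # each chart covers 3/4 of the circle (hypothetical choice)

def chart_a(phi):
    """Local coordinate centered at phi = 0; NaN outside the chart."""
    u = (phi + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)
    return np.where(np.abs(u) < HALF_WIDTH, u, np.nan)

def chart_b(phi):
    """Local coordinate centered at phi = pi; NaN outside the chart."""
    v = phi % (2.0 * np.pi) - np.pi  # phi = pi maps to 0
    return np.where(np.abs(v) < HALF_WIDTH, v, np.nan)

# Every sampled angle is covered by at least one chart.
phi = np.linspace(-np.pi, np.pi, 360, endpoint=False)
covered = ~np.isnan(chart_a(phi)) | ~np.isnan(chart_b(phi))

# Two conformations on either side of the +/-pi cut: chart A does not
# contain them, while chart B assigns them nearby coordinates.
pair = np.array([3.1, -3.1])
in_a = chart_a(pair)
in_b = chart_b(pair)
print(covered.all(), in_a, in_b)
```

Neither chart alone parameterizes the whole circle, but inside each chart the coordinate has no jump, and in the overlap regions the two coordinates differ by a shift of π, which is what allows results computed chart by chart to be glued together.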

Machine learning in smart cities

I have been collaborating with the company World Sensing to apply machine learning techniques to the real-time data analysis of Fastprk, "a smart city system helping drivers to find a parking spot quicker and allowing cities to manage their parking spaces more efficiently".

Fastprk
World Sensing

B-spline free energy reconstruction

Methods for calculating Free Energy Surfaces (FES) from MD simulations often provide only the derivatives of the free energy at specific points in CV space (the centers of bins). A post-processing step is therefore needed to compute the FES over the entire domain. I have implemented a free energy reconstruction method based on B-spline basis functions.
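
A minimal one-dimensional sketch of this kind of reconstruction is given below. It is not the implementation referred to above: it assumes a single CV, a clamped cubic B-spline basis, and an ordinary least-squares fit of the spline derivative to the binned derivative data. The function name `reconstruct_fes`, its parameters, and the double-well test profile are hypothetical; only `scipy.interpolate.BSpline` and `numpy.linalg.lstsq` are existing library calls.

```python
import numpy as np
from scipy.interpolate import BSpline

def reconstruct_fes(centers, dF_dx, lo, hi, n_basis=12, degree=3):
    """Fit a B-spline F(x) whose derivative matches dF_dx at the bin centers."""
    # Clamped knot vector on [lo, hi]; its length is n_basis + degree + 1.
    inner = np.linspace(lo, hi, n_basis - degree + 1)
    knots = np.concatenate([np.full(degree, lo), inner, np.full(degree, hi)])
    # Column j holds the derivative of the j-th basis function at the centers.
    unit = np.eye(n_basis)
    A = np.column_stack([
        BSpline(knots, unit[j], degree).derivative()(centers)
        for j in range(n_basis)
    ])
    # The additive constant of F is undetermined; lstsq returns the
    # minimum-norm solution, which is sufficient for free energy differences.
    coef, *_ = np.linalg.lstsq(A, dF_dx, rcond=None)
    return BSpline(knots, coef, degree)

# Synthetic derivative data for a double well F(x) = x**4 - 2*x**2.
edges = np.linspace(-1.5, 1.5, 41)
centers = 0.5 * (edges[:-1] + edges[1:])
dF_dx = 4.0 * centers**3 - 4.0 * centers

fes = reconstruct_fes(centers, dF_dx, lo=-1.5, hi=1.5)
barrier = float(fes(0.0) - fes(1.0))  # exact value for this profile is 1
print(f"reconstructed barrier height: {barrier:.3f}")
```

Because the unknowns are spline coefficients rather than grid values, the reconstructed surface is smooth and can be evaluated anywhere in the domain, not only at the bin centers.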

Collective refolding of Titin