Advised by Marino Arroyo
Thesis Committee: Timon Rabczuk, Fehmi Cirak, Antonio Huerta
Special doctoral award from the Universitat Politècnica de Catalunya for the 2012-2013 academic year. Research area: Sciences. [Facultat de Matemàtiques i Estadística]
SEMNI award for the best Ph.D. thesis in 2012 on Numerical Methods in Engineering.
Ph.D. Thesis dissertation (pdf) at the electronic library of the Universitat Politècnica de Catalunya. [Tesis Doctorals en Xarxa]
UPCommons. Global access to UPC knowledge: https://upcommons.upc.edu/handle/2117/94814
In many applications, one would like to perform calculations on smooth manifolds of low dimension d embedded in a high-dimensional space of dimension D. Often, a continuous description of such a manifold is not known; instead, the manifold is sampled by a set of scattered points in high dimensions. This poses a serious challenge. In this thesis, we approximate the point-set manifold as an overlapping set of smooth parametric descriptions, whose geometric structure is revealed by statistical learning methods, and then parametrized by meshfree methods. This approach avoids any global parametrization, and hence is applicable to manifolds of any genus and complex geometry. It combines four ingredients:
partitioning of the point set into sub-regions of trivial topology,
the automatic detection of the geometric structure of the manifold by nonlinear dimensionality reduction techniques,
the local parametrization of the manifold using smooth meshfree (here local maximum-entropy) approximants, and
patching together the local representations by means of a partition of unity.
We show the generality, flexibility, and accuracy of the method in four different problems. First, we exercise it in the context of Kirchhoff-Love thin shells (d=2, D=3). We test our methodology against classical linear and nonlinear benchmarks in thin-shell analysis, and highlight its ability to handle point-set surfaces of complex topology and geometry. We then tackle problems of much higher dimensionality. We perform reduced order modeling in the context of finite deformation elastodynamics, considering a nonlinear reduced configuration space, in contrast with classical linear approaches based on Principal Component Analysis (d=2, D=10,000's). We further unveil quantitatively the geometric structure of the motility strategy of a family of micro-organisms called Euglenids from experimental videos (d=1, D~30,000's). Finally, in the context of enhanced sampling in molecular dynamics, we automatically construct collective variables that characterize molecular conformations (d=1,...,6, D~30 to 1,000's).
Figure 1. Point-set manifold processing for computational mechanics.
This thesis addresses apparently unrelated topics: dimensionality reduction, meshfree analysis of thin shells, nonlinear model reduction of mechanical systems, quantitative analysis of a motility mode exhibited by the Euglenids (a family of protists), and the automatic detection of collective variables from biomolecular ensembles. However, all of them rely on the processing of manifolds represented by a set of scattered points in high dimensions.
Over the last decades, a myriad of techniques in the area of statistical learning have emerged to extract the hidden variables from high-dimensional data sets, commonly as post-processing or data-exploration tools. While this has proven extremely useful in applications such as automatic perception [1,2], climate science [3], the study of the conformation dynamics of molecules [4,5], or galaxy spectra classification [6], one would often like to go further and compute on the underlying low-dimensional (slow) manifold itself, for instance to find the low-dimensional representation of new out-of-sample data points, or to find tangent vectors to the manifold that characterize changes of the system.
We propose a general methodology to perform calculations on smooth manifolds of low dimension d embedded in a high-dimensional space of dimension D, and sampled by scattered points. We approximate the point-set manifold as an atlas of overlapping smooth parametric descriptions, whose geometric structure is revealed by statistical learning methods, and then parametrized by meshfree methods. This construction closely mimics the notion of an atlas of charts in the theory of smooth manifolds. The approach avoids any global parametrization; it is applicable to manifolds of any dimension and genus, and of complex geometry.
We proceed in four steps:
[1] Automatic partitioning of the point-set data into sub-regions of trivial topology by recursive application of graph-based methods (METIS).
Figure 2. Visualization of a coarse partition of unity and overlapping regions on a point-set surface of the Stanford bunny. The automatic partitioning of a point-set surface representing the Stanford bunny, done with METIS, can create patches of complex geometry and topology, e.g. a tubular patch in an ear. We recursively partition such patches until we obtain regions of trivial topology.
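The thesis relies on METIS for this step. As a rough stand-in that needs only NumPy, the sketch below recursively bisects a k-nearest-neighbour graph of the points by spectral (Fiedler-vector) bisection. This is a swapped-in technique, not METIS's multilevel algorithm, and the stopping rule is an assumed patch-size cap rather than a topology check; the names `recursive_partition`, `k` and `max_size` are hypothetical.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbour graph of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    A = np.zeros_like(d2)
    A[np.arange(len(X))[:, None], nbrs] = 1.0
    return np.maximum(A, A.T)  # symmetrise


def spectral_bisect(A):
    """Split graph nodes into two halves by the median of the Fiedler vector."""
    L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1])  # second eigenvector = Fiedler vector
    half = len(order) // 2
    return order[:half], order[half:]


def recursive_partition(X, k=4, max_size=16):
    """Hypothetical stand-in for METIS: bisect until each patch has <= max_size points."""
    patches, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        if len(idx) <= max_size:
            patches.append(idx)
            continue
        left, right = spectral_bisect(knn_adjacency(X[idx], k))
        stack.extend([idx[left], idx[right]])
    return patches


# A closed curve (d = 1) sampled in the plane (D = 2) cannot be covered by a
# single chart, so it must be split into several patches.
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
patches = recursive_partition(X)
print([len(p) for p in patches])  # [16, 16, 16, 16]
```

Splitting at the median guarantees that both halves are non-empty, so the recursion always terminates; a faithful implementation would instead stop when each patch has trivial topology.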
[2] Automatic detection of the geometric structure of the manifold patches by either linear or nonlinear dimensionality reduction methods. These methods embed the high-dimensional training set in low dimensions, maintaining some of the geometric structure of the underlying manifold.
Figure 3. (A) Automatic partitioning of a point-set surface representing the Stanford bunny, done with METIS. (B) Two views of the patch (*). Low-dimensional embedding of the patch (*) by principal component analysis (PCA) (C), and by a nonlinear dimensionality reduction method (NLDR) (D). The point colors are provided to guide the visual inspection of the low-dimensional embeddings. PCA collapses large regions of the patch, while the NLDR method successfully “irons” the curved patch into a moderately distorted low-dimensional embedding.
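To make the contrast of Figure 3 concrete, the sketch below embeds a curved one-dimensional patch (three quarters of a circle sampled in the plane) with PCA and with Isomap [1], the NLDR method named in Figure 9. It is a minimal NumPy implementation under our own assumptions (k-nearest-neighbour graph, Floyd-Warshall shortest paths, classical multidimensional scaling), not the thesis code, and the function names are illustrative.

```python
import numpy as np


def pca_embed(X, d):
    """Project the centred data onto its d leading principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T


def isomap_embed(X, d, k=4):
    """Isomap: kNN graph -> graph geodesic distances -> classical MDS."""
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    G = np.full((n, n), np.inf)
    rows = np.arange(n)[:, None]
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]
    G[rows, nbrs] = dist[rows, nbrs]
    G = np.minimum(G, G.T)  # undirected graph
    np.fill_diagonal(G, 0.0)
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * H @ (G ** 2) @ H  # double-centred squared geodesic distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]  # d largest eigenvalues
    return vecs[:, top] * np.sqrt(vals[top])


def is_monotone(y):
    """True if the 1D coordinates preserve the ordering along the curve."""
    dy = np.diff(y)
    return bool(np.all(dy > 0) or np.all(dy < 0))


# A curved patch: three quarters of the unit circle, ordered by arc length.
t = np.linspace(0.0, 1.5 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t)])
print(is_monotone(pca_embed(X, 1)[:, 0]))     # False: PCA folds the arc
print(is_monotone(isomap_embed(X, 1)[:, 0]))  # True: Isomap unrolls it
```

Because the arc spans more than half a circle, its projection onto any straight line folds back on itself, whereas graph geodesic distances recover (approximately) arc length.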
[3] A smooth local parametrization is defined in the low-dimensional embedding of each patch. This can be realized with a variety of methods, from meshfree methods such as moving least squares approximants [7] to mesh-based methods such as subdivision finite elements in the case of surfaces [8]. Here, local maximum-entropy (max-ent) approximants [9,10] are chosen because of their smoothness, robustness, and applicability in any space dimension.
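For reference, Arroyo and Ortiz [9] define the local max-ent basis functions at a point x as p_a(x) = exp(−β|x − x_a|² + λ*·(x − x_a)) / Z(x, λ*), where Z normalizes the weights and λ* minimizes log Z(x, λ), which enforces first-order consistency Σ_a p_a(x) x_a = x. The sketch below evaluates them in one dimension with a plain Newton iteration on λ. It is a minimal illustration, not the thesis implementation; the nodes and the locality parameter β = γ/h² with γ = 1.8 are arbitrary illustrative choices.

```python
import math


def lme_basis_1d(x, nodes, beta, tol=1e-12, max_iter=50):
    """Local max-ent basis functions at x in 1D, following Arroyo & Ortiz [9].

    x must lie strictly inside the convex hull of the nodes.
    """
    lam = 0.0
    for _ in range(max_iter):
        f = [-beta * (x - xa) ** 2 + lam * (x - xa) for xa in nodes]
        fmax = max(f)  # shift the exponents to avoid overflow
        w = [math.exp(fa - fmax) for fa in f]
        Z = sum(w)
        p = [wa / Z for wa in w]
        # Gradient r and Hessian J of log Z with respect to lambda.
        r = sum(pa * (x - xa) for pa, xa in zip(p, nodes))
        J = sum(pa * (x - xa) ** 2 for pa, xa in zip(p, nodes)) - r * r
        if abs(r) < tol:
            return p
        lam -= r / J  # Newton step on the convex function log Z
    raise RuntimeError("Newton iteration for lambda did not converge")


nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
h = 0.25             # nodal spacing
beta = 1.8 / h ** 2  # locality parameter, gamma = 1.8 (illustrative)
p = lme_basis_1d(0.3, nodes, beta)
print(round(sum(p), 12))                                    # 1.0 (partition of unity)
print(round(sum(pa * xa for pa, xa in zip(p, nodes)), 10))  # 0.3 (reproduces x)
```

Larger β makes the basis functions more local (approaching linear interpolation), smaller β makes them smoother and wider.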
Figure 4. Sketch of the method to smoothly parametrize point-set manifolds. The set of high-dimensional snapshots X (left, dark blue points) is embedded in low dimensions (right, purple points) by manifold learning methods, possibly applied to pruned snapshots for computational convenience. With the low-dimensional embedding at hand, the slow manifold M hidden in the training set X is represented with a smooth data-driven parametrization φ(ξ). It is also possible to find the low-dimensional representation ξ(x) ∈ ℝ^d of a new out-of-sample configuration x ∈ ℝ^D, where ξ(x) is the pre-image of the closest-point projection of x on M, found by solving a nonlinear least-squares problem [12].
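The out-of-sample mapping ξ(x) solves min_ξ ½‖φ(ξ) − x‖². A minimal sketch, assuming only that φ can be evaluated: Gauss-Newton with a finite-difference Jacobian and a simple step-halving safeguard. The helper names and tolerances are hypothetical, and the solver actually used in [12] may differ.

```python
import numpy as np


def fd_jacobian(phi, xi, h=1e-7):
    """Central finite-difference Jacobian of phi: R^d -> R^D, shape (D, d)."""
    cols = []
    for j in range(len(xi)):
        e = np.zeros_like(xi)
        e[j] = h
        cols.append((phi(xi + e) - phi(xi - e)) / (2.0 * h))
    return np.column_stack(cols)


def closest_point_coords(phi, x, xi0, tol=1e-10, max_iter=100):
    """Gauss-Newton for min_xi 0.5*|phi(xi) - x|^2 with step halving."""
    xi = np.atleast_1d(np.asarray(xi0, dtype=float))

    def obj(z):
        return 0.5 * np.sum((phi(z) - x) ** 2)

    for _ in range(max_iter):
        r = phi(xi) - x
        J = fd_jacobian(phi, xi)
        step = -np.linalg.lstsq(J, r, rcond=None)[0]  # Gauss-Newton direction
        t = 1.0
        while obj(xi + t * step) >= obj(xi) and t > 1e-8:
            t *= 0.5  # backtrack until the squared distance decreases
        xi = xi + t * step
        if np.linalg.norm(t * step) < tol:
            break
    return xi


# Toy chart of the unit circle (d = 1, D = 2); project an off-curve point.
def phi(xi):
    return np.array([np.cos(xi[0]), np.sin(xi[0])])


xi = closest_point_coords(phi, np.array([2.0, 2.0]), xi0=0.5)
print(xi)  # approximately [0.785], i.e. pi/4
```

The step halving matters for points far from the manifold: there the plain Gauss-Newton step overshoots, as it would here from ξ = 0.5.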
[4] The local parametrizations are then glued together, if needed, with a partition of unity (PU) defined in the ambient space, which consequently is also a PU on the embedded manifold.
Figure 5. Illustration of the proposed method for a curve, d = 1, in the plane, D = 2. (A) Illustration of a function ψκ(x) of the coarse partition of unity tied to the patches. (B) Visualization of the overlapping regions of the coarse partition of unity. The partition of the geometric markers is color-coded.
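One standard way to build a partition of unity in the ambient space is Shepard normalization of compactly supported weights, ψκ(x) = wκ(x) / Σ_μ wμ(x). The sketch below assumes a bump weight w(r) = (1 − r²/R²)³ centred at hypothetical patch centres with a common radius R, and uses the ψκ to blend patch-wise quantities. It illustrates the idea only and is not necessarily the construction used in the thesis.

```python
import math


def bump(x, centre, radius):
    """Compactly supported C^2 weight (1 - r^2/R^2)^3, zero outside the radius."""
    s2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre)) / radius ** 2
    return (1.0 - s2) ** 3 if s2 < 1.0 else 0.0


def partition_of_unity(x, centres, radius):
    """Shepard normalisation psi_k = w_k / sum_m w_m (x must be covered)."""
    w = [bump(x, c, radius) for c in centres]
    total = sum(w)
    if total == 0.0:
        raise ValueError("point not covered by any patch")
    return [wk / total for wk in w]


def blend(x, centres, radius, local_fields):
    """Glue patch-wise quantities f_k(x) into the field sum_k psi_k(x) f_k(x)."""
    psi = partition_of_unity(x, centres, radius)
    return sum(pk * f(x) for pk, f in zip(psi, local_fields) if pk > 0.0)


# Four overlapping patches on the unit circle (d = 1, D = 2), as in Figure 5.
angles = (0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi)
centres = [(math.cos(a), math.sin(a)) for a in angles]
radius = 1.2
x = (math.cos(math.pi / 4), math.sin(math.pi / 4))  # midway between two patches
psi = partition_of_unity(x, centres, radius)
print([round(p, 3) for p in psi])  # [0.5, 0.5, 0.0, 0.0]
```

Because the ψκ are defined in the ambient space, their restriction to points on the manifold is automatically a partition of unity there as well.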
We have applied the methodology to the geometrically exact theory of Kirchhoff-Love thin shells. In thin shells, the intrinsic dimension is known a priori, d=2, and the shell deforms in three dimensions, D=3. Unlike previous meshfree methods, which are limited to very simple surfaces admitting a single parametric space, the proposed method has proven to be robust and general: it easily handles shells of very complex geometry and topology. Full details can be found in [11,12]. Currently, we are applying this approach to the modeling of fracture in brittle materials; see a preliminary result in Fig. 6, and full details in [17].
Figure 6. Selected snapshots of the deformation process of a brittle thin shell with complex topology. The boundary curve of the bottom pipe is clamped and the top boundary curve is incrementally displaced in the upward (0,0,1) direction. The process has been performed without an initial crack. (A,C) Phase field as a colormap in the reference configuration for two selected imposed displacements of the top boundary curve: just before fracture (d = 0.0055) and at the final imposed displacement, d = 0.01. (B,D) Deformed configurations for two selected instants; the deformation field has been magnified by a factor of 20.
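Kirchhoff-Love kinematics are expressed through the first and second fundamental forms of the mid-surface, a_αβ = φ,α · φ,β and b_αβ = φ,αβ · n, which is why the parametrization must provide second derivatives. As an illustration independent of the thesis code, the sketch evaluates both forms of a surface chart φ(u, v) by central finite differences (step size h is an assumed value) and checks them on a sphere of radius R, whose Gaussian curvature det(b)/det(a) equals 1/R².

```python
import numpy as np


def fundamental_forms(phi, u, v, h=1e-4):
    """First (a) and second (b) fundamental forms of a chart phi(u, v) in R^3."""
    def p(du, dv):
        return np.asarray(phi(u + du, v + dv), dtype=float)

    phi_u = (p(h, 0) - p(-h, 0)) / (2 * h)
    phi_v = (p(0, h) - p(0, -h)) / (2 * h)
    phi_uu = (p(h, 0) - 2 * p(0, 0) + p(-h, 0)) / h ** 2
    phi_vv = (p(0, h) - 2 * p(0, 0) + p(0, -h)) / h ** 2
    phi_uv = (p(h, h) - p(h, -h) - p(-h, h) + p(-h, -h)) / (4 * h ** 2)
    n = np.cross(phi_u, phi_v)
    n /= np.linalg.norm(n)  # unit normal
    a = np.array([[phi_u @ phi_u, phi_u @ phi_v],
                  [phi_v @ phi_u, phi_v @ phi_v]])
    b = np.array([[phi_uu @ n, phi_uv @ n],
                  [phi_uv @ n, phi_vv @ n]])
    return a, b


def gaussian_curvature(a, b):
    """K = det(b) / det(a); independent of the orientation of the normal."""
    return np.linalg.det(b) / np.linalg.det(a)


R = 2.0


def sphere(u, v):
    return R * np.array([np.sin(u) * np.cos(v), np.sin(u) * np.sin(v), np.cos(u)])


a, b = fundamental_forms(sphere, 1.0, 0.5)
print(gaussian_curvature(a, b))  # approximately 1/R**2 = 0.25
```

In a shell computation, the membrane strains follow from the change of a_αβ and the bending strains from the change of b_αβ between the reference and deformed configurations.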
By taking advantage of the slow-manifold parametrization, we developed nonlinearly reduced dynamics of mechanical systems in a variational framework. We have exemplified the method in finite deformation elastodynamics. In these problems, D ~ 10^3 to 10^6, while the slow manifold on which the dynamics take place is expected to be rather low-dimensional and nonlinear. In our proof-of-concept tests, D = 10^4 and d = 2, 3. We have explored this topic in [13].
Figure 7. Two-dimensional embedding. (A) Effective reduced potential energy. (B) Total reduced mass, position dependent. (C) Reduced trajectory in the two-dimensional embedding from NLIE (after fitting), over seven large-amplitude oscillations of the system. The initial, final, and selected snapshot positions are highlighted.
Figure 8. Snapshots during the time evolution of a high-dimensional trajectory x(t), and the reduced trajectories X(q(t)) obtained from PCA and NLIE.
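The kinematic core of such a reduction can be stated briefly: restricting x = φ(q) gives ẋ = ∇φ(q) q̇, so the reduced Lagrangian is L(q, q̇) = ½ q̇ᵀ M̃(q) q̇ − V(φ(q)), with the position-dependent reduced mass M̃(q) = ∇φ(q)ᵀ M ∇φ(q) (compare panels (A) and (B) of Figure 7). Below is a NumPy sketch under these assumptions, with a finite-difference Jacobian and a toy chart; it is not the discretization used in [13].

```python
import numpy as np


def jacobian(phi, q, h=1e-6):
    """Central finite-difference Jacobian dphi/dq, shape (D, d)."""
    q = np.asarray(q, dtype=float)
    cols = []
    for j in range(q.size):
        e = np.zeros_like(q)
        e[j] = h
        cols.append((phi(q + e) - phi(q - e)) / (2.0 * h))
    return np.column_stack(cols)


def reduced_mass(phi, M, q):
    """Position-dependent reduced mass matrix J(q)^T M J(q), shape (d, d)."""
    J = jacobian(phi, q)
    return J.T @ M @ J


def reduced_lagrangian(phi, M, V, q, qdot):
    """L(q, qdot) = 0.5 qdot^T Mred(q) qdot - V(phi(q))."""
    qdot = np.asarray(qdot, dtype=float)
    return 0.5 * qdot @ reduced_mass(phi, M, q) @ qdot - V(phi(q))


# Toy check: a point mass m constrained to a circle of radius r (D = 2, d = 1)
# is a pendulum, whose reduced mass is m r^2.
m, r, g = 3.0, 0.5, 9.81
M = m * np.eye(2)


def phi(q):
    return r * np.array([np.sin(q[0]), -np.cos(q[0])])


def V(x):
    return m * g * x[1]  # gravitational potential energy


print(reduced_mass(phi, M, [0.7]))  # approximately [[0.75]]
print(reduced_lagrangian(phi, M, V, [0.7], [2.0]))
```

The Euler-Lagrange equations of this reduced Lagrangian then govern the d reduced coordinates q(t) instead of the D original degrees of freedom.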
Swimming is generally accomplished by the repetitive execution of a path (d=1) in shape space. By Purcell's scallop theorem [14], at low Reynolds number such a path needs to be non-reciprocal. We studied the particular motility strategy of Euglenids, a family of unicellular eukaryotes, consisting of large-amplitude, highly concerted deformations of the entire body (euglenoid movement or metaboly). Unlike ciliary or flagellar motility, this mode is not well understood. We quantitatively examined video recordings of four Euglenids executing such motions with nonlinear dimensionality reduction methods. The low-dimensional embedding allows us to parametrize the geometric stroke smoothly in time and space, filtering the noise of the data while retaining a sharp description of the stroke, a one-dimensional curve in D ~ 10^4. This in turn allows us to analyze the low-Reynolds-number hydrodynamics of this swimming strategy and its efficiency, to discuss its role in the evolutionary history of this group of protists, and to draw inspiration for new technologies based on soft active surfaces. A paper detailing this work has been published in PNAS [15].
Figure 9. Method for the quantitative analysis of video recordings. (A) The video frames are segmented and aligned to obtain images Fa containing information about the shape alone, devoid of translation, rotation, or textures produced by internal organelles. A B-Spline curve, given by its control polygon Pa (black circles), is fitted to the boundary of Fa, and is a generating curve of the axisymmetric representation of the pellicle. (B) The segmented and aligned frames, representative of the shapes adopted by the swimmer, are embedded in low dimensions by a nonlinear dimensionality reduction technique (here, Isomap [1]). The embedding in 2D shows that the stroke is a closed non-reciprocal path in shape space. The ability of the algorithm to identify similar shapes from different strokes and define a single geometric stroke is shown by plotting the 2D embedding against the frame number, here for a movie capturing about 4 strokes. The metaboly movement, a non-reciprocal path, can be most compactly described by embedding the frames in a periodic 1D segment, from which we parametrize the stroke as a function of τ by interpolation with smooth basis functions wa(τ). At any given τ, the synthetic stroke is a weighted average of the curves fitting video frames whose 1D embedding is in the vicinity of τ. The parameter τ is not proportional to physical time, which the manifold learning algorithm ignores, but rather to arc length in shape space.
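Two ingredients of this analysis can be sketched in a few lines of NumPy, under assumptions that are ours rather than the paper's. First, the synthetic stroke at τ is computed as a normalized weighted average of the control polygons Pa, using hypothetical Gaussian weights in the periodic distance between τ and each frame's 1D embedding coordinate τa (the basis functions wa(τ) of the paper may differ). Second, the signed area enclosed by a closed path in a 2D embedding serves as a simple diagnostic: a reciprocal stroke retraces its own path and encloses zero signed area, so a clearly nonzero area indicates a non-reciprocal path (the converse does not hold). The polygon data are made up for illustration.

```python
import numpy as np


def periodic_weights(tau, taus, width):
    """Normalised Gaussian weights w_a(tau) on the periodic interval [0, 1)."""
    d = np.abs(tau - np.asarray(taus)) % 1.0
    d = np.minimum(d, 1.0 - d)  # periodic distance
    w = np.exp(-(d / width) ** 2)
    return w / w.sum()


def synthetic_stroke(tau, taus, polygons, width=0.05):
    """Weighted average of the control polygons P_a of frames near tau."""
    w = periodic_weights(tau, taus, width)
    return np.tensordot(w, np.asarray(polygons), axes=1)


def signed_area(path):
    """Shoelace formula for a closed 2D path given as an (n, 2) array."""
    x, y = path[:, 0], path[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)


# Hypothetical data: 40 frames whose 5-point control polygons stretch periodically.
taus = np.linspace(0.0, 1.0, 40, endpoint=False)
base = np.column_stack([np.linspace(0.0, 1.0, 5), np.zeros(5)])
polygons = [base * (1.0 + 0.2 * np.sin(2 * np.pi * t)) for t in taus]
P0 = synthetic_stroke(0.0, taus, polygons)
P1 = synthetic_stroke(1.0, taus, polygons)
print(np.allclose(P0, P1))  # True: the synthetic stroke closes on itself

# A loop versus a reciprocal back-and-forth path in a 2D embedding.
s = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
loop = np.column_stack([np.cos(s), np.sin(s)])
forth = np.column_stack([np.linspace(0, 1, 50), np.linspace(0, 1, 50) ** 2])
back_and_forth = np.vstack([forth, forth[::-1]])
print(signed_area(loop), signed_area(back_and_forth))  # about 3.14, and 0 up to round-off
```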
We have automatically identified and smoothly described collective variables that best explain the conformational flexibility of molecules, such as proteins. We applied the proposed methodology to in silico conformations from molecular simulations, although the method is general enough to be applied also to in vitro conformations from nuclear magnetic resonance spectroscopy. The dimension D of the space describing these proteins is commonly around 10^2 to 10^3, whereas we found that the collective variables have dimensionality ranging from 1 to 6. This work is reported in [16].
Figure 10. Sketch of the slow manifold parametrization (left), and an illustration of this procedure for the alanine dipeptide (right). (A) High-dimensional ensemble of conformations that populate the configurational space as much as possible. (B) Selected conformations representative of the molecular variability are simplified and the rigid-body motions are removed. (C) The conformations are embedded in low dimension by an NLDR technique. (D) With the low-dimensional embedding at hand, the slow manifold hidden in the conformations can be revealed through a smooth parametrization φ(ξ) ∈ ℝ^D. It is also possible to define the embedded sample position ξ(r) ∈ ℝ^d of the closest-point projection π(A(r)) for a new out-of-sample conformation r ∈ ℝ^D, where ξ(r) is found by solving a nonlinear minimization problem.
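Removing rigid-body motions, as in step (B), is commonly done by optimally superimposing each conformation onto a reference. The sketch below uses the Kabsch algorithm (SVD of the 3×3 covariance matrix), a standard choice that we assume here; the text does not state which alignment procedure was used. Conformations are (n_atoms, 3) arrays, and the function names and test data are illustrative.

```python
import numpy as np


def kabsch_align(P, Q):
    """Superimpose conformation P onto reference Q (both (n, 3)) using the
    rotation and translation that minimise the RMSD (Kabsch algorithm)."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p0, Q - q0
    H = Pc.T @ Qc  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    s = np.sign(np.linalg.det(Vt.T @ U.T))  # exclude improper rotations (reflections)
    Rot = Vt.T @ np.diag([1.0, 1.0, s]) @ U.T
    return Pc @ Rot.T + q0


def rmsd(A, B):
    """Root-mean-square deviation between two (n, 3) coordinate arrays."""
    return np.sqrt(np.mean(np.sum((A - B) ** 2, axis=1)))


# A random reference "molecule" and a rigidly rotated and translated copy.
rng = np.random.default_rng(0)
Q = rng.normal(size=(10, 3))
angle = 0.8
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T + np.array([1.0, -2.0, 0.5])
print(rmsd(P, Q), rmsd(kabsch_align(P, Q), Q))  # large, then essentially zero
```

After alignment, the remaining differences between conformations reflect internal deformations only, which is what the dimensionality reduction in step (C) should see.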
[1] Tenenbaum J., V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, Vol. 290, Nro. 5500, pp. 2319–2323, 2000.
[2] Roweis S. and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, Vol. 290, Nro. 5500, pp. 2323–2326, 2000.
[3] Gámez A., C. Zhou, A. Timmermann, and J. Kurths. Nonlinear dimensionality reduction in climate data. Nonlinear Processes in Geophysics, Vol. 11, pp. 393–398, 2004.
[4] Das P., M. Moll, H. Stamati, L. Kavraki, and C. Clementi. Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction. Proceedings of the National Academy of Sciences of USA, Vol. 103, Nro. 26, pp. 9885–9890, 2006.
[5] Brown W., S. Martin, S. Pollock, E. Coutsias, and J.-P. Watson. Algorithmic dimensionality reduction for molecular structure analysis. The Journal of Chemical Physics, Vol. 129, Nro. 6, pp. 064118, 2008.
[6] Vanderplas J. and A. Connolly. Reducing the dimensionality of data: Locally linear embedding of Sloan galaxy spectra. The Astronomical Journal, Vol. 138, Nro. 5, pp. 1365–1379, 2009.
[7] Lancaster P. and K. Salkauskas. Surfaces generated by moving least squares methods. Mathematics of Computation, Vol. 37, Nro. 155, pp. 141–158, 1981.
[8] Cirak F., M. Ortiz, and P. Schröder. Subdivision surfaces: a new paradigm for thin-shell finite-element analysis. International Journal for Numerical Methods in Engineering, Vol. 47, Nro. 12, pp. 2039–2072, 2000.
[9] Arroyo M. and M. Ortiz. Local maximum-entropy approximation schemes: a seamless bridge between finite elements and meshfree methods. International Journal for Numerical Methods in Engineering, Vol. 65, Nro. 13, pp. 2167–2202, 2006.
[10] Rosolen A., D. Millán, and M. Arroyo. On the optimum support size in meshfree methods: a variational adaptivity approach with maximum entropy approximants. International Journal for Numerical Methods in Engineering, Vol. 82, Nro. 7, pp. 868–895, 2010.
[11] Millán D., A. Rosolen, and M. Arroyo. Thin shell analysis from scattered points with maximum-entropy approximants. International Journal for Numerical Methods in Engineering, Vol. 85, Nro. 6, pp. 723–751, 2011.
[12] Millán D., A. Rosolen, and M. Arroyo. Nonlinear manifold learning for meshfree finite deformation thin-shell analysis. International Journal for Numerical Methods in Engineering, Vol. 93, Nro. 7, pp. 685–713, 2013.
[13] Millán D. and M. Arroyo. Nonlinear manifold learning for model reduction in finite elastodynamics. Computer Methods in Applied Mechanics and Engineering, Vol. 261–262, pp. 118–131, 2013.
[14] Purcell E. Life at low Reynolds number. American Journal of Physics, Vol. 45, Nro. 1, pp. 3–11, 1977.
[15] Arroyo M., L. Heltai, D. Millán, and A. DeSimone. Reverse engineering the euglenoid movement. Proceedings of the National Academy of Sciences of USA, Vol. 109, Nro. 44, pp. 17874–17879, 2012.
[16] Hashemian B., D. Millán, and M. Arroyo. Modeling and enhanced sampling of molecular systems with smooth and nonlinear data-driven collective variables. The Journal of Chemical Physics, Vol. 139, Nro. 21, pp. 214101, 2013.
[17] Amiri F., D. Millán, Y. Shen, T. Rabczuk, and M. Arroyo. Phase-field modeling of fracture in linear thin shells. Theoretical and Applied Fracture Mechanics, Vol. 69, pp. 102–109, 2014.