Nowadays, one of the most challenging problems in statistics is how to deal with Big Data. All modern sciences, from genomics to astrophysics and from medicine to finance, need to handle high-dimensional data, which raises serious issues about how to store, analyze and process them. In statistical analysis, one main problem is the so-called curse of dimensionality: roughly speaking, the volume of the space where the data lie grows exponentially with the dimension, leading to a sparsification of the points. As a consequence, the sample size necessary to achieve any kind of statistical significance also grows exponentially with the dimension. For these reasons, modern sciences have a pressing need for statistical methods that perform their tasks with provably good accuracy while maintaining low sample complexity and affordable computational costs.
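As a purely illustrative aside (not part of any method described here), the sparsification effect can be seen in a few lines of Python: as the dimension grows, the contrast between the nearest and the farthest point in a uniform sample vanishes, so neighbourhoods stop being informative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(n, d))                      # n uniform points in [0, 1]^d
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T    # squared pairwise distances
    dist = np.sqrt(np.maximum(d2[np.triu_indices(n, k=1)], 0.0))
    # the ratio tends to 1 as d grows: all points become roughly equidistant
    print(f"d={d:4d}   nearest/farthest distance ratio = {dist.min() / dist.max():.3f}")
```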
High-dimensional data often lie on a non-linear low-dimensional structure: the ambient dimension is large while the intrinsic dimension is low. Learning a low-dimensional manifold that describes the data without significant loss of information is useful both for statistical inference and for data compression.
We developed a multi-scale manifold learning technique that provides finite-sample rates independent of the ambient dimension, as well as multiple scales of precision. The method is strongly data-adaptive and fast, with computational complexity O(n log n).
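The exact construction is given in the papers; as a rough, hedged sketch of the multiscale flavour of the approach, one can compute local principal components over neighbourhoods of growing radius and read the intrinsic dimension off how the singular values scale with the radius. The code below is only an illustration of this idea, with all names, parameters and the toy data chosen for the example.

```python
import numpy as np

def multiscale_local_pca(X, center, radii):
    """For each radius, return the local singular values around `center`."""
    spectra = []
    for r in radii:
        nbrs = X[np.linalg.norm(X - center, axis=1) <= r]
        if len(nbrs) < 2:
            spectra.append(None)
            continue
        Z = nbrs - nbrs.mean(axis=0)
        # scale by sqrt(#points) so values behave like standard deviations
        spectra.append(np.linalg.svd(Z, compute_uv=False) / np.sqrt(len(nbrs)))
    return spectra

# toy data: a 1-d curve (a circle) embedded in R^10 with small noise
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.zeros((2000, 10))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.01 * rng.standard_normal(X.shape)

# one singular value grows linearly with r (tangent direction), the others
# stay at curvature/noise level: the curve is locally 1-dimensional
radii = [0.1, 0.3, 0.6]
for r, s in zip(radii, multiscale_local_pca(X, X[0], radii)):
    print(f"r={r}: top singular values {np.round(s[:3], 3)}")
```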
In several applications, such as computer vision, image processing and systems theory, it is reasonable to assume that the data are sampled from a collection of an unknown number of unknown low-dimensional subspaces, randomly placed in a very high-dimensional ambient space and corrupted by noise.
We developed an algorithm capable of correctly estimating all the model parameters, namely the number of linear subspaces, their dimension and how they are arranged in space, and thus of correctly clustering the points. We proved that the method has finite-sample convergence rates independent of the ambient dimension. The method is fast, with computational complexity O(n log n).
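For illustration only, the snippet below sketches the classical K-subspaces baseline, which assumes that the number of subspaces and their dimension are already known; the method described above additionally estimates these quantities, so the code is not a reproduction of it.

```python
import numpy as np

def k_subspaces(X, k, dim, n_iter=30, n_restart=10, seed=0):
    """K-subspaces with random restarts: alternately assign each point to the
    nearest affine subspace and re-fit each subspace by PCA; keep the run
    with the lowest total residual."""
    rng = np.random.default_rng(seed)
    n = len(X)

    def fit_once():
        labels = rng.integers(k, size=n)
        for _ in range(n_iter):
            bases = []
            for j in range(k):
                Xj = X[labels == j]
                if len(Xj) <= dim:                  # re-seed a degenerate cluster
                    Xj = X[rng.choice(n, dim + 1, replace=False)]
                m = Xj.mean(axis=0)
                _, _, Vt = np.linalg.svd(Xj - m, full_matrices=False)
                bases.append((m, Vt[:dim]))
            # distance of every point to every fitted affine subspace
            res = np.stack([np.linalg.norm((X - m) - (X - m) @ V.T @ V, axis=1)
                            for m, V in bases])
            labels = res.argmin(axis=0)
        return res.min(axis=0).sum(), labels, bases

    return min((fit_once() for _ in range(n_restart)), key=lambda t: t[0])

# toy example: two 2-d planes in R^20, mildly corrupted by noise
rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal((20, 2)), rng.standard_normal((20, 2))
X = np.vstack([rng.standard_normal((300, 2)) @ B1.T,
               rng.standard_normal((300, 2)) @ B2.T])
X += 0.01 * rng.standard_normal(X.shape)
_, labels, _ = k_subspaces(X, k=2, dim=2)
truth = np.repeat([0, 1], 300)
# agreement up to label swap; typically close to 1 for well-separated planes
print("clustering agreement:",
      round(max(np.mean(labels == truth), np.mean(labels != truth)), 3))
```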
There are instances where the domain of a function is high-dimensional but can be reduced to a low-dimensional space. This is the case of single-index models: a ridge function may have a high-dimensional domain, but its value varies only along one direction. The problem becomes harder if the domain of the function is assumed to be a d-dimensional manifold embedded in a high-dimensional space.
We developed a multi-scale method which relies on the fact that the function values depend only on the direction given by a one-dimensional gradient; the gradient is therefore learned with an inverse approach. We can prove that the finite-sample convergence rate is optimal, as if we were performing a one-dimensional regression, even though the data have a non-linear high-dimensional structure. This amounts to a complete defeat of the curse of dimensionality. The method is also fast, since it relies only on procedures with cost O(n log n).
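As a hedged illustration of the general single-index pipeline (in the plain linear-domain setting, not the manifold setting treated above), one can estimate the index direction with an ordinary-least-squares surrogate and then run a one-dimensional kernel smoother along it. Everything below, including the bandwidth and the toy model, is an assumption made for the example, not the actual estimator.

```python
import numpy as np

def fit_single_index(X, y, bandwidth=0.1):
    """Crude single-index fit: OLS direction + 1-d Nadaraya-Watson smoother."""
    # OLS recovers the index direction up to scale for, e.g., Gaussian designs
    a = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
    a /= np.linalg.norm(a)                       # estimated index direction
    t = X @ a                                    # 1-d projected covariate

    def predict(X_new):
        s = X_new @ a
        w = np.exp(-0.5 * ((s[:, None] - t[None, :]) / bandwidth) ** 2)
        return (w @ y) / w.sum(axis=1)           # kernel regression in 1-d

    return a, predict

# toy model: y = sin(<a*, x>) + noise with x in R^50
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 50))
a_star = np.ones(50) / np.sqrt(50)
y = np.sin(X @ a_star) + 0.1 * rng.standard_normal(2000)

a_hat, predict = fit_single_index(X, y)
X_test = rng.standard_normal((500, 50))
rmse = np.sqrt(np.mean((predict(X_test) - np.sin(X_test @ a_star)) ** 2))
print("alignment |<a_hat, a*>| =", round(abs(a_hat @ a_star), 3),
      " test RMSE =", round(rmse, 3))
```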
The Kaczmarz method is an iterative algorithm for solving overdetermined linear systems by consecutive projections onto the hyperplanes defined by the equations. The method has a wide range of applications in signal processing, notably for biomedical imaging in X-ray tomography.
We propose a new implementation of the Kaczmarz method for clustered equations. When the hyperplanes are grouped into directional clusters, we draw the projections so as to promote sparse, high-variance clusters, which leads to an improvement in performance.
Figure: true image, standard Kaczmarz reconstruction, and the proposed Kaczmarz reconstruction.
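The projection step itself is classical; the sketch below shows a randomized Kaczmarz loop where a cluster is drawn first and then a row within it. The cluster weighting used here (favouring small clusters of high directional variance) is only a plausible illustration of the idea, not the exact sampling rule of the proposed method.

```python
import numpy as np

def clustered_kaczmarz(A, b, clusters, n_iter=5000, seed=0):
    """Solve Ax = b by Kaczmarz projections; `clusters` gives each row's cluster id."""
    rng = np.random.default_rng(seed)
    ids = np.unique(clusters)
    # illustrative weights: directional variance of a cluster's normalised rows,
    # divided by its size, so small high-variance clusters are drawn more often
    weights = []
    for c in ids:
        R = A[clusters == c]
        R = R / np.linalg.norm(R, axis=1, keepdims=True)
        weights.append(R.var(axis=0).sum() / len(R))
    weights = np.array(weights)
    weights /= weights.sum()

    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        c = rng.choice(ids, p=weights)               # pick a cluster
        i = rng.choice(np.flatnonzero(clusters == c))  # pick a row within it
        a_i = A[i]
        x += (b[i] - a_i @ x) / (a_i @ a_i) * a_i    # project onto the hyperplane
    return x

# toy consistent system whose rows fall into two directional clusters
rng = np.random.default_rng(1)
x_true = rng.standard_normal(20)
A = np.vstack([rng.standard_normal((80, 20)),
               3.0 * rng.standard_normal((20, 20))])
b = A @ x_true
clusters = np.array([0] * 80 + [1] * 20)
x_hat = clustered_kaczmarz(A, b, clusters)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```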
The application of Geometric Morphometrics has increased remarkably since 3D imaging techniques such as high-resolution computerised tomography, laser scanning and photogrammetry became widespread. Acquisition, 3D rendering and simplification of virtual objects produce faceting and topological artifacts, which can be counteracted by applying decimation and smoothing algorithms. Nevertheless, smoothing algorithms can also have detrimental effects.
We developed a method to assess the amount of information loss or recovery after the application of 3D surface smoothing. This tool helps the researcher find the optimal smoothing procedure without biasing the analysis.
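As a purely illustrative sketch (not the published tool), one can quantify the geometric effect of a smoothing pass by measuring per-vertex displacement relative to the mesh scale; the uniform Laplacian step and the displacement measure below are generic choices made for the example.

```python
import numpy as np

def laplacian_smooth(vertices, faces, lam=0.5):
    """One uniform-weight Laplacian smoothing step on a triangle mesh."""
    n = len(vertices)
    neigh_sum = np.zeros_like(vertices)
    neigh_cnt = np.zeros(n)
    for tri in faces:                      # accumulate 1-ring neighbour positions
        for a, b in ((0, 1), (1, 2), (2, 0)):
            i, j = tri[a], tri[b]
            neigh_sum[i] += vertices[j]; neigh_cnt[i] += 1
            neigh_sum[j] += vertices[i]; neigh_cnt[j] += 1
    centroid = neigh_sum / np.maximum(neigh_cnt, 1)[:, None]
    return vertices + lam * (centroid - vertices)

def smoothing_displacement(vertices, smoothed):
    """Per-vertex displacement, normalised by the bounding-box diagonal."""
    diag = np.linalg.norm(vertices.max(axis=0) - vertices.min(axis=0))
    return np.linalg.norm(smoothed - vertices, axis=1) / diag

# toy usage on a regular tetrahedron
V = np.array([[1., 1., 1.], [1., -1., -1.], [-1., 1., -1.], [-1., -1., 1.]])
F = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
disp = smoothing_displacement(V, laplacian_smooth(V, F))
print("mean normalised displacement:", disp.mean().round(3))
```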