Manifold Fitting, Embedding of Data, and Dimension Estimation
Manifold Fitting, Embedding of Data, and Dimension Estimation
Much of modern data analysis in high dimensions relies on the premise that data, while embedded in a high-dimensional space, lie on or near a manifold of lower dimension. Sometimes data has this structure for obvious reasons; it may be the set of images taken of a single object from varying angles, for instance. At other times, the supposed manifold structure may be a proxy for various, partially understood correlations between different coordinates. In any case, lower-dimensional structure of the data may allow one to embed it in a space of lower dimension while preserving much of this essential structure, with benefits including faster computation and data visualization. My current research studies this phenomenon in multiple ways.
I have developed a new approach to estimate the dimension of the underlying manifold from a finite sample of points. Dimension estimation of data can be useful for determining the parameters of a dimension reduction method or as a simple measure of complexity of the data. Existing dimension estimators work computing a quantity from the data and comparing it to what one gets from a vector subspace or ball of varying dimension. However, this approximation breaks down at larger scales. In fact, the reason to use a manifold as a model for data is that one expects it to not lie in a linear subspace. In (12), Anna Gilbert and I derive an ID estimator calibrated by a quadratic embedding. It works by computing the eigenvalues from principal components analysis applied to the nearest neighbors of a given data point (as in the usual local PCA) and comparing them to the eigenvalues one expects from a second-order Taylor approximation to the manifold.
I have also developed a novel method for embedding high-dimensional data in lower dimensions (joint work with Anna Gilbert in (14)). The method has demonstrable benefits for clustering data and robustness to outliers. It works by constructing a Gaussian process on the data and using independent realizations of this Gaussian process for coordinates of the embedding. The straight-line distance in the embedding serves as an approximation to the commonly-used diffusion distance. See forthcoming paper for details.
Whitney-Type Problems
Whitney's extension problem (1934): Suppose one is given a function f on a compact subset E of R^n. How can we tell if it extends to a C^m function on all of R^n? This problem, falling in the class of smooth extension problems, was first solved in a 2006 paper of Charles Fefferman. The solution, along with related works, has motivated a series of more recent papers on the topic.
If the set E is finite, then an extension always exists. Instead, we ask how small can the norm of such an extension--or interpolant-- be? How can we compute it? These are problems in the area of smooth interpolation.
My research has focused on
a) The case of set-valued mappings and nonnegative functions in the case of infinite E (see (9) below) and applications to systems of linear inequalities (11).
b) Improvement of an important finiteness constant in some recent results, particularly involving set-valued mappings .(10).
c) A variant of the Whitney Extension Problem in which we ask if a subset of R^n is contained in a smooth, d-dimensional manifold (13).
As a follow-up to (13), I hope to make progress on connections involving topology, and perhaps differential geometry.
For more information on the subject, see this expository article by Charles Fefferman.
Harmonic Analysis and Extremizers
The focus of my training in graduate school was Euclidean harmonic analysis, and I am interested in many of the modern research areas within the subject. My personal work has been primarily on oscillatory integrals (6, 8) and extremizers of multilinear inequalities (3, 4, 5, 7).
The Riemann-Lebesgue lemma states that the integral of a function against e^{inx} goes to 0 as n->infinity, but at different rates for different functions. Under different settings, increasing the oscillation gives an integral which decays at a uniform rate for any function. My work on oscillatory integrals has focused on quantifying that rate.
Classically, one often asks if a linear operator is bounded, that is, if the norm of the output is at most a constant times the norm of the input. However, it is often valuable to ask
a) What is the optimal value of that constant?
b) Are there input functions for which this optimal constant is attained? If so, what are these extremizers? Are they unique up to symmetry?
c) What is the behavior of extremizing sequences which whose ratio of norm of their output to their norm converges to the optimal constant?
My research has focused on these questions for multilinear inequalities, focusing on quantitative answers to question c), even when extremizers may not exist.
Publications
(15) José-Antonio Espín-Sánchez and Kevin O’Neill. Identification of number of bidders in auctions using order statistics and finite mixtures. In preparation.
(14) Anna Gilbert and Kevin O'Neill, Sketching the Heat Kernel: Using Gaussian Processes to Embed Data. preprint, cs.LG https://arxiv.org/abs/2403.07929.
(13) Kevin O’Neill, A Whitney Extension Problem for Manifolds. preprint, math.FA arXiv:2310.13115.
(12) Anna Gilbert and Kevin O’Neill, CA-PCA: Manifold Dimension Estimation, Adapted for Curvature. preprint, stat.ML arXiv:2309.13478. Submitted.
(11) Garving K. Luli and Kevin O'Neill, On C^m Solutions to Systems of Linear Inequalities. Advances in Mathematics 422 (2023), Paper No. 109025, 17 pp.
(10) Jiang, Fushuai, Luli, Garving K., and O'Neill, Kevin. On the shape fields finiteness principle. International Mathematics Research Notices 2021;, rnab242, https://doi.org/10.1093/imrn/rnab242
(9) Jiang, Fushuai, Luli, Garving K., and O'Neill, Kevin. Smooth selection for infinite sets. Advances in Mathematics 407 (2022), Paper No. 108566, 62 pp.
(8) Gilula, Maxim, O'Neill, Kevin, and Xiao, Lechao. Oscillatory Loomis-Whitney and Projections of Sublevel Sets. Journal d'Analyse Math\' ematique 145 (2021), no. 1, 307–333.
(7) O'Neill, Kevin. A quantitative stability theorem for convolution on the Heisenberg group. Revista Matem\'{a}tica Iberoamericana 37 (2021), no. 5, 1861–1884.
(6) Niepla, Aleksandra, O'Neill, Kevin, and Zeng, Zhen. Decay rate of n-linear oscillatory integral operators in R^2. Proc. Amer. Math. Soc. 148 (2020), no. 4, 1689–1695.
(5) O'Neill, Kevin. A sharpened inequality for twisted convolution. Proc. Amer. Math. Soc. 150 (2022), no. 11, 4685-4697.
(4) Christ, Michael and O'Neill, Kevin. Maximizers of Rogers-Brascamp -Lieb-Luttinger functionals in higher dimensions. preprint, math.CA arXiv:1712.00109.
(3) O'Neill, Kevin. A variation on H\"older-Brascamp-Lieb inequalities. Trans. Amer. Math. Soc. 373 (2020), no. 8, 5467–5489.
(2) Carl, Uri; O'Neill, Kevin, and Ryder, Nicholas. Establishing a metric in max-plus geometry. Rose-Hulman Undergraduate Mathematics Journal: Vol. 13 : Iss. 2 , Article 10. 2012.
(1) Carlson, Rosalie; Flood, Stephen; O'Neill, Kevin, and Su, Francis Edward. A Tur'an-type problem for circular arc graphs. 2011 arXiv:1110.4205