Author: Marco Cuturi
Affiliation: Kyoto University, Japan
Title: Distances and Kernels on Discrete Structures
Distances and positive definite kernels lie at the core of many machine learning algorithms. When comparing vectors, these two concepts form well-matched pairs that are almost interchangeable: trivial operations such as changing signs, adding renormalization factors, taking logarithms or exponentials are usually sufficient to recover one from the other (e.g. Euclidean distances & Laplace kernels). However, when comparing discrete structures, this harmonious symmetry falls apart. The culprit lies in the introduction of combinatorial optimization to compute distances (e.g. edit distances for strings / time series / trees; minimum cost matching distances for sets of points; transportation distances for histograms etc.). Simple counterexamples show that such considerations -- finding a minimal cost matching or a maximal alignment to compare two objects -- tend to destroy any hope of recovering a positive definite kernel from such distances. We present a review of several results in the recent literature that have overcome this limitation. We provide a unified framework for these approaches by highlighting the fact that they all rely on generating functions to achieve positive definiteness.
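The Euclidean-distance/Laplace-kernel pairing mentioned above is easy to check numerically. A minimal sketch (point count, dimension, and tolerance are our own illustrative choices): exponentiate a negated Euclidean distance matrix and verify that the resulting Gram matrix is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))                    # 20 random points in R^3
D = np.linalg.norm(X[:, None] - X[None], axis=-1)   # Euclidean distance matrix
K = np.exp(-D)                                      # Laplace kernel: a "trivial" exponential
min_eig = np.linalg.eigvalsh(K).min()               # >= 0 (up to round-off) iff K is PSD
```

Replacing `D` with a combinatorial distance such as an edit distance gives no such guarantee, which is precisely the failure the abstract describes.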
Author: Saburou Saitoh
Affiliation: University of Aveiro, Portugal
Title: Theory of reproducing kernels and its general applications
In Part I, I state what reproducing kernel theory is and present its basic theory: inversion of linear mappings, identification of linear systems, ill-posed problems, the Kolmogorov factorization theorem, nonlinear mappings, and identification of nonlinear systems. Furthermore, I will refer to stochastic theory, support vector machines, and feature spaces. In Part II, I present its basic general applications to bounded linear operator equations on Hilbert spaces, combined with the Tikhonov regularization method. Here we introduce a historically difficult problem, the numerical and real inversion formula of the Laplace transform, as a typical concrete result. In Part III, we will introduce a very recent new idea and method, with very concrete results. Let S be a linear mapping from some linear space into the linear space of all complex-valued functions on a set E. Given observed data at a finite number of points of E, we look for a good input object whose output takes the given values at those points.
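The Tikhonov regularization mentioned in Part II can be illustrated on a small ill-conditioned linear system; the sketch below (forward operator, noise level, and regularization parameter are illustrative choices of ours, not from the talk) solves the regularized normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.vander(np.linspace(0.0, 1.0, 30), 10)        # ill-conditioned forward operator
x_true = rng.standard_normal(10)
b = A @ x_true + 1e-3 * rng.standard_normal(30)     # noisy observations

lam = 1e-4                                           # Tikhonov regularization parameter
# Regularized normal equations: (A^T A + lam I) x = A^T b
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ b)
residual = np.linalg.norm(A @ x_tik - b)
```

The regularization term stabilizes the inversion at the cost of a small bias, the standard trade-off for ill-posed operator equations.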
Author: Robert Schaback
Affiliation: University of Goettingen, Germany
Title: Kernels for learning solutions of PDEs
Adaptive methods for solving PDEs by trial spaces formed by translates of kernels can be viewed as "learning the solution of a PDE". This is explained in some detail, and a few examples are given. As in all other branches of machine learning, kernels have to be adapted to the problem. This calls for special kernels that have connections to special PDEs, and a few of them are provided. This will include fundamental solutions and particular solutions, and harmonic kernels in 2D and 3D.
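"Learning the solution of a PDE" from kernel translates can be sketched in one dimension. The following toy collocation scheme (all parameter choices, and the Gaussian kernel itself, are our own illustration, not a kernel from the talk) solves u'' = f on [0, 1] with zero boundary values using a trial space of Gaussian translates.

```python
import numpy as np

n = 20
x = np.linspace(0.0, 1.0, n)            # collocation points, also used as kernel centers
eps = 5.0                                # Gaussian shape parameter (ad hoc choice)

phi  = lambda t, c: np.exp(-(eps * (t - c)) ** 2)
phi2 = lambda t, c: (4 * eps**4 * (t - c)**2 - 2 * eps**2) * phi(t, c)  # d^2/dt^2

# Exact solution u(x) = sin(pi x) solves u'' = -pi^2 sin(pi x), u(0) = u(1) = 0.
f = lambda t: -np.pi**2 * np.sin(np.pi * t)

# PDE rows at interior points, Dirichlet rows at the two boundary points.
M = np.vstack([phi2(x[1:-1, None], x[None, :]),
               phi(x[[0, -1], None], x[None, :])])
rhs = np.concatenate([f(x[1:-1]), [0.0, 0.0]])
coef, *_ = np.linalg.lstsq(M, rhs, rcond=None)       # lstsq copes with ill-conditioning

xt = np.linspace(0.0, 1.0, 101)
u_num = phi(xt[:, None], x[None, :]) @ coef          # learned solution on a fine grid
err = np.max(np.abs(u_num - np.sin(np.pi * xt)))
```

Adapting the kernel to the operator, e.g. using harmonic or fundamental-solution kernels as in the talk, is what turns this generic recipe into an effective method.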
Author: Paul Eggermont
Affiliation: University of Delaware, USA
Title: Reproducing kernel Hilbert spaces and smoothing splines
We discuss reproducing kernels and reproducing kernel Hilbert spaces as they arise in the smoothing spline approach to nonparametric regression problems on a nice domain. The standard assumption in spline smoothing is that the function is smooth, i.e., that it belongs to a Sobolev space of square integrable functions with square integrable (distributional) derivatives up to a given order. There are many reproducing kernels, depending on the choice of the inner product. The “natural” inner product in this context depends on the regularization parameter used in the smoothing spline problems. The consequences of this will be discussed in full detail.
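The role of the regularization parameter can be seen in any RKHS smoothing problem. A minimal sketch, using a Gaussian kernel as a stand-in for a Sobolev reproducing kernel (bandwidth, noise level, and data are illustrative assumptions of ours): the penalized fit is k(., X)(K + lam I)^{-1} y, and the training residual grows with lam.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

# Gaussian kernel Gram matrix (stand-in for a Sobolev reproducing kernel).
K = np.exp(-((x[:, None] - x[None, :]) / 0.1) ** 2)

def smooth(lam):
    """RKHS-penalized fit evaluated at the data points."""
    alpha = np.linalg.solve(K + lam * np.eye(40), y)
    return K @ alpha

rss = lambda lam: np.sum((smooth(lam) - y) ** 2)     # training residual
```

In the smoothing spline setting the talk considers, lam additionally enters the natural inner product itself, so the reproducing kernel changes with lam.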
Authors: Ferenc Huszar, David Duvenaud
Affiliation: University of Cambridge
Title: Bayesian Quadrature minimizes Maximum Mean Discrepancy
We show that the Maximum Mean Discrepancy (MMD) criterion, used to choose samples in kernel herding, is identical to the expected error in the estimate of the integral of a function f(x) over distribution P(x), under a Gaussian process prior for f. This expected error is the criterion being minimized when choosing samples for Bayesian quadrature (BQ). Because Bayesian quadrature assigns different weights to each of the observed function values f(x), we can view Bayesian quadrature as a weighted version of kernel herding. We show that these weights are optimal in a minimax sense over all functions in the Hilbert space defined by our kernel. This implies that Bayesian quadrature dominates kernel herding and other non-optimally weighted herding in rate of convergence. We further show that the MMD, when using BQ weights, is submodular in the samples chosen, which implies that sequential BQ achieves the optimal rate of convergence for any sampling method.
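The weighted-herding view can be checked directly: for fixed sample locations, the squared MMD is a quadratic in the weights, and the BQ weights K^{-1} z minimize it. A sketch in one dimension (kernel, distribution, and Monte Carlo approximation of the kernel mean embedding z are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2)   # Gaussian kernel in 1-D

X = rng.standard_normal(10)       # sample locations (e.g. herding samples)
P = rng.standard_normal(20000)    # Monte Carlo draws from P(x)

z = k(X, P).mean(axis=1)          # kernel mean embedding evaluated at the samples
K = k(X, X)

def mmd2(w):
    """Squared MMD of the weighted estimate, up to a w-independent constant."""
    return w @ K @ w - 2 * w @ z

w_uniform = np.full(10, 0.1)      # kernel-herding-style uniform weights
w_bq = np.linalg.solve(K, z)      # BQ weights: exact minimizer of the quadratic
```

Since K is positive definite for distinct points, the BQ weights can never do worse than the uniform ones, which is the domination result in miniature.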
Authors: Urun Dogan, Tobias Glasmachers, Christian Igel
Affiliations: Institut für Mathematik, Universität Potsdam; Institut für Neuroinformatik, Ruhr-Universität Bochum; Department of Computer Science, University of Copenhagen
Title: Turning Binary Large-margin Bounds into Multi-class Bounds
The theory of generalization bounds for multi-class support vector machines follows the route already paved by the analysis of binary large-margin classifiers. We link the analysis of binary and multi-class large margin classifiers explicitly by presenting a straightforward technique to generalize bounds for binary learning machines to the multi-class case.
Authors: Rob Hall, Alessandro Rinaldo, Larry Wasserman
Title: (α, β)-Differential Privacy in Reproducing Kernel Hilbert Spaces
Differential privacy is a framework for releasing summaries of a dataset in a way that makes rigorous the notion of privacy afforded to the data elements (e.g., when they are measurements of individuals). Previous work has focused on the release of finite dimensional statistics where privacy is achieved via randomization. We extend these methods to allow the release of infinite dimensional quantities in an RKHS, which leads to a new technique for the privacy-preserving release of functions such as kernel machines.
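The flavor of releasing a function privately can be sketched as "fit a kernel machine, then add a Gaussian-process noise path with the same kernel". The noise scale `sigma` below is purely illustrative, not the paper's privacy calibration, and the data and kernel are our own toy choices.

```python
import numpy as np

rng = np.random.default_rng(4)
k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2)

# A kernel ridge machine fit on (pretend-sensitive) data.
X = rng.uniform(-2.0, 2.0, 15)
y = np.tanh(X) + 0.05 * rng.standard_normal(15)
alpha = np.linalg.solve(k(X, X) + 1e-2 * np.eye(15), y)

grid = np.linspace(-2.0, 2.0, 50)
f = k(grid, X) @ alpha                        # non-private function values on a grid

# Release step: add a Gaussian-process sample path with the same covariance kernel,
# scaled by an illustrative noise level sigma (NOT the paper's exact calibration).
sigma = 0.1
C = k(grid, grid) + 1e-8 * np.eye(50)         # jitter so the Cholesky factor exists
g = np.linalg.cholesky(C) @ rng.standard_normal(50)
f_private = f + sigma * g
```

Because the noise lives in (a version of) the same RKHS geometry as the released function, the perturbation is an infinite-dimensional analogue of the familiar finite-dimensional randomization.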
Author: Lorenzo Rosasco
Affiliation: MIT, USA; Istituto Italiano di Tecnologia, Italy
Title: Learning geometry with kernels
We are interested in the problem of learning geometric properties of a distribution from random samples using kernel methods. Towards this end we discuss how relevant geometric and topological properties of a set can be studied analytically using concepts from the theory of reproducing kernel Hilbert spaces. A new kind of reproducing kernel, which we call a separating kernel, plays a crucial role in our study and is analyzed in detail. We prove a new analytic characterization of the support of a distribution, which naturally leads to a family of provably consistent regularized learning algorithms. If time permits, we will discuss the stability of these methods with respect to random sampling, and numerical experiments showing that the approach is competitive with, and often better than, other state-of-the-art techniques.
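A schematic version of kernel-based support estimation (our own simplified sketch, not the talk's algorithm): score a point by the regularized squared RKHS distance from its feature map to the span of the training features, which is small on the support and close to k(x, x) far from it. Kernel, bandwidth, and regularization are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2)

X = rng.uniform(-1.0, 1.0, 30)              # samples supported on [-1, 1]
K = k(X, X)
lam = 1e-3
Kinv = np.linalg.inv(K + lam * np.eye(30))  # regularized inverse Gram matrix

def score(x):
    """Regularized squared RKHS distance from phi(x) to the span of the data."""
    kx = k(np.atleast_1d(x), X)[0]
    return 1.0 - kx @ Kinv @ kx             # k(x, x) = 1 for this Gaussian kernel

inside, outside = score(0.0), score(4.0)    # low on the support, high off it
```

Thresholding such a score gives a plug-in estimate of the support; the separating-kernel machinery in the talk is what makes this kind of construction provably consistent.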