### RMML2013 Schedule

Pre-prints are available here!
The recordings from RMML2013 are available here!

NIPS 2013 Randomized Methods for Machine Learning Workshop took place on Monday 9th of December at Emerald Bay 5, Harvey's Hotel, Lake Tahoe, USA.
• 07:45-08:00: Opening
• 08:00-08:30: Michael Mahoney on Recent Results in Randomized Numerical Linear Algebra
• 08:30-09:00: Dimitris Achlioptas on Randomness in graphs: less is more
• 09:00-09:30: Coffee Break
• 09:30-10:45: Contributed Talks
• Randomized co-training: from cortical neurons to machine learning and back again, David Balduzzi. Best paper award, congratulations!
• A Statistical Perspective on Algorithmic Leveraging, Michael Mahoney, Bin Yu and Ping Ma.
• Ensemble of Circular Discrepancy for Efficient Two-Sample Tests, Ji Zhao and Deyu Meng.
• Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch, Partha Pratim Talukdar and William Cohen.
• 10:45-15:30: Posters, Lunch & Ski
• 15:30-16:00: Francis Bach on Beyond stochastic gradient descent for large-scale machine learning
• 16:00-16:30: Le Song on Scalable Influence Estimation in Continuous-Time Diffusion Networks
• 16:30-17:00: Coffee Break
• 17:00-18:30: Contributed Talks
• The Dropout Learning Algorithm, Pierre Baldi and Peter Sadowski.
• One Permutation and Random Rotation: An Efficient Replacement for Minwise Hashing, Anshumali Shrivastava and Ping Li.
• Combining Structured and Unstructured Randomness in Large Scale PCA, Nikos Karampatziakis, Paul Mineiro.
• Sketching Sparse Covariance Matrices and Graphs, Gautam Dasarathy, Pariskhit Shah, Badri Narayan Bhaskar, Robert Nowak.
• Monochromatic Bi-Clustering, Sharon Wulﬀ, Ruth Urner, Shai Ben-David.
• 18:30-20:00: Posters & Discussion

# Invited Talks

## Dimitris Achlioptas on Randomness in graphs: less is more

The study of random graphs has been dominated for over fifty years by the Erdos-Renyi (ER) model, in which edges appear independently of one another with equal probability. It is by now well-understood that the graphs generated by the ER model bear very little resemblance to real networks, rendering analytical results for ER graphs mostly uninformative. At the same time, the need for viable alternatives is ever more pressing as networks become central to more aspects of human activity.

After a quick overview of the ER model, I will point out its main shortcomings and go over a number of attempts to overcome them. We will see how all these attempts run afoul the inherent tension between low-dimensionality, needed for realism, and probabilistic independence, needed for mathematical tractability. I will also describe some recent work aimed at resolving this tension by avoiding to model random graphs as explicit probability distributions over networks. Instead, random graphs will arise as maximally random solutions to constrained optimization problems.

## Francis Bach on Beyond stochastic gradient descent for large-scale machine learning

Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are O(1/\sqrt{n}) for general convex functions and reaches O(1/n) for strongly-convex functions. In this talk, I will show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate randomized combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. (joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt)

## Michael Mahoney on Recent Results in Randomized Numerical Linear Algebra

Matrix algorithms and numerical linear algebra provide the foundation for many methods in scientific computing, engineering, machine learning, and data analysis, and in recent years randomization has proven to be a powerful if unexpected resource in the development of qualitatively improved matrix algorithms that are designed to address increasingly-common large-scale statistical data analysis problems. Here, we will review several recent developments in this area of Randomized Numerical Linear Algebra that are of particular interest to machine learners: first, an algorithm to compute a low-precision solution to an arbitrary over-constrained least-squares problem in time that is proportional to the number of nonzero elements in the input, plus lower order terms; second, recent improvements in Nystrom-based approximation of symmetric positive semidefinite matrices; and third, a framework to begin to address the statistical properties of random sampling and random projection algorithms. In each case, we will see that these developments depend crucially on understanding and exploiting the complementary algorithmic and statistical properties of the empirical statistical leverage scores of the input matrices. Extensions of the ideas underlying leverage-based randomized algorithms to address problems beyond the traditional purview of Randomized Numerical Linear Algebra may also be briefly described.

Le Song on Scalable Influence Estimation in Continuous-Time Diffusion Networks

If a piece of information is released from a media site, can it spread, in 1 month, to a million web pages? This influence estimation problem is very challenging since both the time-sensitive nature of the problem and the issue of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influenceestimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with $|\Vcal|$ nodes and $|\Ecal|$ edges to an accuracy of $\epsilon$ using $n=O(1/\epsilon^2)$ randomizations and up to logarithmic factors $O(n|\Ecal|+n|\Vcal|)$ computations. When used as a subroutine in a greedy influence maximization algorithm, our proposed algorithm is guaranteed to find a set of $C$ nodes with an influence of at least $(1 - 1/e)\operatorname{OPT} - 2C\epsilon$, where $\operatorname{OPT}$ is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence.

# Contributed Talks

• Randomized co-training: from cortical neurons to machine learning and back again, David Balduzzi.
• A Statistical Perspective on Algorithmic Leveraging, Michael Mahoney, Bin Yu and Ping Ma.
• Ensemble the Margins on Circle for Efficient Kernel Selection, Ji Zhao and Deyu Meng.
• Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch, Partha Pratim Talukdar and William Cohen.
• The Dropout Learning Algorithm, Pierre Baldi and Peter Sadowski.
• One Permutation and Random Rotation: An Efficient Replacement for Minwise Hashing, Anshumali Shrivastava and Ping Li.
• Combining Structured and Unstructured Randomness in Large Scale PCA, Nikos Karampatziakis, Paul Mineiro.
• Sketching Sparse Covariance Matrices and Graphs, Gautam Dasarathy, Pariskhit Shah, Badri Narayan Bhaskar, Robert Nowak.
• Monochromatic Bi-Clustering, Sharon Wulﬀ, Ruth Urner, Shai Ben-David.

# Contributed Posters

• Positive Random Projections, Amit Deshwar.
• Random Projections for Anchor-based Topic Inference, David Mimno.
• Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices, Jose Miguel Hernandez Lobato, Neil Houlsby, Zoubin Ghahramani.
• Using a Stochastic Map View of Sequential Monte Carlo for Memory and Network Efficiency, Seong-Hwan Jun and Alexandre Bouchard-Cote.
• Embarrassingly Parallel MCMC via Density Product Estimation, Willie Neiswanger, Chong Wang and Eric Xing.
• Rapid Distance-Based Outlier Detection via Sampling, Mahito Sugiyama and Karsten M. Borgwardt.
Organizers