While we have 60+ in the audience (researchers, students, NSF CISE folks) from many different areas, we want to leave ample time for breakout discussions, so we are keeping the number of talks small. The current list of talks and abstracts:
Deepak Agarwal: Recommender Systems: The Art and Science of Matching Items to Users
Nancy Amato: Sampling-Based Motion Planning: From Intelligent CAD to Group Behaviors to Protein Folding
Kristin Bennett: Challenges in Machine Learning and Data Mining
Scalable Topic Modeling
Some Sketchy Results
Scaling data mining techniques
Minimizing Communication in Linear Algebra
John Doyle: Universal laws and architectures
Martin Farach-Colton: From FOCS to Sand Hill Road
Stephen Fienberg: Statistical Methods and Algorithms for the Analysis of Social Network Data
Eugene Fiume: Reflections on Algorithm Engineering in Computer Graphics
Phil Gibbons: Specialization as key to Cloud efficiency
Alon Halevy: Principles of Fact and Ontology Mining from the Web
Kamal Jain: Online Matching of Ads with Advertisers
David Karger: Cheap Wins: Applications from Elementary Algorithms
David Kempe: Algorithms and Social Networks: How Close are our Ties?
Enriching Education Through Data Mining
Gary Miller: Algorithm Design using Spectral Graph Theory
Aloysius Mok: Algorithmic Aspects of Real-Time System Design
It pays to do the right thing: Incentive mechanisms for Societal Networks
Insights into Routing in Real-world Networks
Dealing with Massive Data: Parallelism to the Rescue
Recommender Systems: The Art and Science of Matching Items to Users
Speaker: Deepak Agarwal
Algorithmically matching items to users in a given context is essential
for the success and profitability of large-scale recommender systems in
applications such as content optimization, computational advertising,
search, shopping, and movie recommendation. The objective is to
maximize some utility (e.g. total revenue, total engagement) of
interest over a long time horizon, hence this is a bandit problem
since there is positive utility in displaying items that may have low
mean but high variance. A key challenge in such bandit problems is the
curse of dimensionality --- data is obtained via interactions among
several heavy-tailed categorical variables and the goal is to estimate
response rates of some rare event like click, buy, and so on. Other
than statistical challenges due to high dimensionality and data
sparsity, bandit problems are difficult to work with for responses that
are observed with considerable delay (e.g. return visits, confirmation
of a buy). To mitigate this issue, one approach is to optimize multiple
competing objectives in the short-term to achieve the best long-term
performance. For instance, in serving content to users on a website, one
may want to optimize some combination of clicks and downstream
advertising revenue in the short-term to maximize revenue in the
long-run. This combination of bandits, high dimensional modeling, and
constrained optimization in an environment where one has access to a
persistent stream of data arriving at high frequency, poses unique
algorithmic challenges. In this talk, I will attempt to capture some of
the technical issues by focusing on a concrete application - content
optimization on the Yahoo! front page. I will briefly discuss our own
research, mention some specific open problems, and outline some broad
technical challenges in this area.
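The explore/exploit tension described above can be sketched with a toy epsilon-greedy bandit simulation; the article names, click probabilities, and parameter values below are invented for illustration (production systems use far more sophisticated bandit and modeling machinery):

```python
import random

def epsilon_greedy(true_ctr, steps=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over items with fixed click rates."""
    rng = random.Random(seed)
    items = list(true_ctr)
    shows = {i: 0 for i in items}
    clicks = {i: 0 for i in items}
    for _ in range(steps):
        if rng.random() < epsilon:
            item = rng.choice(items)  # explore: try a random item
        else:
            # exploit: pick the item with the best empirical click rate
            # (unseen items score 1.0, so everything gets tried at least once)
            item = max(items, key=lambda i: clicks[i] / shows[i] if shows[i] else 1.0)
        shows[item] += 1
        if rng.random() < true_ctr[item]:  # simulate whether the user clicks
            clicks[item] += 1
    return shows, clicks

# invented click rates for three hypothetical articles
shows, clicks = epsilon_greedy({"articleA": 0.05, "articleB": 0.02, "articleC": 0.08})
```

Even this toy highlights the point in the abstract: items with few displays have high-variance estimates, so some displays must go to apparently inferior items.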
About the speaker: Deepak Agarwal is a statistician at Yahoo! who is
interested in developing statistical and machine learning methods to
enhance the performance of large scale recommender systems. Deepak and
his collaborators significantly improved article recommendation on
several Yahoo! websites, most notably on the Yahoo! front page. He also
works closely with teams in computational advertising to deploy
elaborate statistical models on the RightMedia Ad Exchange, yet another
large scale recommender system. He currently serves as associate editor
for the Journal of the American Statistical Association.
Sampling-Based Motion Planning: From Intelligent CAD to Group Behaviors to Protein Folding
Nancy M. Amato
Department of Computer Science and Engineering
Texas A&M University
Motion planning arises in many application domains such as computer animation (digital actors), mixed reality systems and intelligent CAD (virtual prototyping and training), and even computational biology and chemistry (protein folding and drug design). Surprisingly, a single class of planners, called sampling-based planners, has proven effective on problems from all these domains. In addition to versatility, these methods are simple and efficient, even in high-dimensional configuration spaces.
In this talk, we describe two major classes of sampling-based planners, the graph-based probabilistic roadmap method (PRM) and the tree-based rapidly-exploring random tree (RRT). We give an overview of some PRM variants developed in our group and present work related to virtual prototyping, multi-agent systems, and protein folding. For virtual prototyping, we show that in some cases a hybrid system incorporating both an automatic planner and haptic user input leads to superior results. For multi-agent systems, we describe PRM-based techniques for herding, pursuit/evasion and evacuation planning. Finally, we describe our application of PRMs to simulate molecular motions, such as protein and RNA folding. More information regarding our work, including movies, can be found at http://parasol.tamu.edu/~amato/
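As a rough illustration of the tree-based planners mentioned above, here is a bare-bones RRT sketch in an obstacle-free 2D square; the bounds, step size, and goal tolerance are arbitrary, and a real planner would add collision checking against the environment:

```python
import math
import random

def rrt(start, goal, steps=800, step_size=0.5, goal_radius=1.0, seed=0):
    """Bare-bones RRT in an obstacle-free 10x10 square (illustrative only)."""
    rng = random.Random(seed)
    parent = {start: None}                       # tree stored as child -> parent
    for _ in range(steps):
        sample = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
        nearest = min(parent, key=lambda p: math.dist(p, sample))
        d = math.dist(nearest, sample)
        if d == 0.0:
            continue
        step = min(step_size, d)                 # extend a bounded step toward sample
        new = (nearest[0] + step * (sample[0] - nearest[0]) / d,
               nearest[1] + step * (sample[1] - nearest[1]) / d)
        parent[new] = nearest
        if math.dist(new, goal) <= goal_radius:  # close enough: recover the path
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None

path = None
for s in range(5):                               # retry a few seeds for the demo
    path = rrt((1.0, 1.0), (9.0, 9.0), seed=s)
    if path is not None:
        break
```

The random sampling is what lets the same skeleton scale to high-dimensional configuration spaces, where grid-based methods break down.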
Bio: Nancy M. Amato is a professor of computer science and engineering at Texas A&M University where she co-directs the Parasol Lab, is Deputy Director of the Institute for Applied Math and Computational Science, is past chair of the Council of Principal Investigators and chair of the Alliance for Bioinformatics, Computational Biology, and Systems Biology. She received undergraduate degrees in Mathematical Sciences and Economics from Stanford University, and master's and Ph.D. degrees in Computer Science from UC Berkeley and UIUC. She received an NSF CAREER Award, is an IEEE Fellow, is an ACM Distinguished Speaker, and was a Distinguished Lecturer for the IEEE Robotics and Automation Society. She was an Associate Editor of the IEEE Transactions on Robotics and Automation and the IEEE Transactions on Parallel and Distributed Systems. She is a member of the Computing Research Association's Committee on the Status of Women in Computing Research (CRA-W) and of the Coalition to Diversify Computing (CDC); she co-directs the CDC/CRA-W Distributed Research Experiences for Undergraduates (DREU) program and the CDC/CRA-W Distinguished Lecture Series. She is Co-Chair of the NCWIT Academic Alliance. Her main areas of research focus are motion planning and robotics, computational biology and geometry, and parallel and distributed computing.
Challenges in Machine Learning and Data Mining
Kristin P. Bennett
Rensselaer Polytechnic Institute
The areas of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. Machine learning researchers have embraced the advances in convex mathematical programming, contributing to rapid advances in machine learning models and algorithms. In turn, the particular nature of machine learning models has stimulated new advances in convex optimization. Nonsmooth optimization methods have led to massively scalable algorithms for the solution of convex machine learning models such as Support Vector Machines. However, convex machine learning models are not enough. Using problems in drug discovery, we will illustrate how nonconvex models elegantly capture learning tasks and how novel nonsmooth nonconvex optimization methods can be analogously used to create scalable algorithms for solving these nonconvex machine learning models. Advances in nonsmooth and nonconvex optimization theory and algorithms to meet the challenges of machine learning and data mining could contribute to another surge of synergistic advances in both learning and mathematical programming, with far-reaching impact in many domains.
Scalable Topic Modeling
Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems.
In this talk, I will review the state-of-the-art in probabilistic topic models. I will describe the ideas behind latent Dirichlet allocation and its extensions, and I'll point to some of the software
available for topic modeling.
I will then describe our online strategy for fitting topic models, which lets us analyze massive document collections and document collections arriving in a stream. Our algorithm can fit models to millions of articles in a matter of hours, and I will present a study of 3.3M articles from Wikipedia. These results show that the online approach finds topic models that are as good as or better than those found with traditional inference algorithms.
(Joint work with Matthew Hoffman, Francis Bach, and Chong Wang)
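For readers unfamiliar with topic models, the following toy collapsed Gibbs sampler for LDA shows the flavor of the inference problem. Note this is the classic batch sampling approach, not the online variational method the talk describes, and the corpus and hyperparameters are invented:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (batch inference, for illustration)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    z = [[rng.randrange(K) for _ in d] for d in docs]   # topic of each token
    ndk = [[0] * K for _ in docs]                       # doc -> topic counts
    nkw = [defaultdict(int) for _ in range(K)]          # topic -> word counts
    nk = [0] * K                                        # tokens per topic
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]   # remove this token's current assignment
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # resample: P(topic) ~ (doc-topic count) * (topic-word rate)
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in range(K)]
                r = rng.random() * sum(weights)
                t = 0
                while t < K - 1 and r > weights[t]:
                    r -= weights[t]
                    t += 1
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return ndk, nkw

# tiny invented corpus: two "fruit" documents and two "sports" documents
docs = [["apple", "banana", "fruit"] * 3, ["goal", "team", "soccer"] * 3,
        ["fruit", "apple", "apple"] * 3, ["team", "goal", "match"] * 3]
ndk, nkw = lda_gibbs(docs)
```

The online approach in the talk replaces these repeated full passes over the corpus with stochastic updates on mini-batches, which is what makes millions of documents feasible.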
Some Sketchy Results
AT&T Labs - Research
'Sketch' data structures are simple, efficient, compact summaries of
large data sets. Sketches enable approximate computations like
similarity comparison and summarization of distributions. Sketches have
been applied successfully in the field a number of times, most notably
when dealing with very large data sets. In this talk, I'll present some
key sketches and their properties, and describe some successes and some
non-successes in their application. I'll conclude by outlining emerging
applications for sketches.
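As a concrete example of a sketch, here is a minimal count-min sketch for approximate frequency counting. The width and depth here are far smaller than one would use in practice, and the hashing scheme is just one simple choice:

```python
import hashlib

class CountMinSketch:
    """Toy count-min sketch: approximate frequency counts in fixed space."""
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
    def _hashes(self, item):
        # one independent-ish hash per row, derived from a salted SHA-256
        for row in range(self.depth):
            h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.width
    def add(self, item, count=1):
        for row, col in enumerate(self._hashes(item)):
            self.table[row][col] += count
    def estimate(self, item):
        # never underestimates; taking the min over rows limits collision error
        return min(self.table[row][col] for row, col in enumerate(self._hashes(item)))

cms = CountMinSketch()
for word in ["cat"] * 50 + ["dog"] * 10 + ["bird"]:
    cms.add(word)
```

The space used is fixed regardless of how many distinct items stream past, which is exactly the property that makes sketches attractive for very large data sets.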
Scaling data mining techniques
In this talk, we will discuss problems that arise from scaling data mining and machine learning techniques for large,
sparse, high dimensional datasets. We will discuss this in the context
of two theoretical constructs -- random projection and locality
sensitive hashing. Random projection has been much studied as a
primitive in various algorithms to control dependence on input
dimension. We will show a machine learning application that motivates,
in addition, the preservation of sparsity of input vectors, and show a
corresponding new construction. We will also discuss improvements to
locality sensitive hashing techniques that enable a theoretically
justifiable and empirically observed lift in performance.
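A minimal sketch of the random projection primitive, using an Achlioptas-style sparse matrix whose entries are +sqrt(s), 0, or -sqrt(s); the dimensions and the sparsity parameter below are arbitrary choices for illustration:

```python
import math
import random

def sparse_projection_matrix(d, k, s=3, seed=0):
    """k-by-d sparse random projection: entries are +sqrt(s), 0, -sqrt(s)
    with probabilities 1/(2s), 1 - 1/s, 1/(2s) (Achlioptas-style)."""
    rng = random.Random(seed)
    mat = []
    for _ in range(k):
        row = []
        for _ in range(d):
            r = rng.random()
            if r < 1 / (2 * s):
                row.append(math.sqrt(s))
            elif r < 1 / s:
                row.append(-math.sqrt(s))
            else:
                row.append(0.0)      # most entries are zero: cheap to apply
        mat.append(row)
    return mat

def project(mat, x):
    k = len(mat)
    return [sum(m * xi for m, xi in zip(row, x)) / math.sqrt(k) for row in mat]

d, k = 1000, 100
mat = sparse_projection_matrix(d, k)
x = [1.0 if i % 7 == 0 else 0.0 for i in range(d)]   # two sparse test vectors
y = [1.0 if i % 5 == 0 else 0.0 for i in range(d)]
px, py = project(mat, x), project(mat, y)
dist_orig = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
dist_proj = math.sqrt(sum((a - b) ** 2 for a, b in zip(px, py)))
```

Note that the projected vectors px and py are dense even though x and y are sparse; preserving sparsity under projection, as mentioned in the abstract, requires different constructions.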
Minimizing Communication in Linear Algebra
Algorithms have two kinds of costs: arithmetic and communication, by which we mean moving data either between levels of a memory hierarchy (in the sequential case) or between processors over a network (in the parallel case). Communication costs can already exceed arithmetic costs by orders of magnitude, and the gap is growing exponentially over time following technological trends, so our goal is to design linear algebra algorithms that minimize communication. First, we show how to extend known communication lower bounds for O(n^3) dense matrix multiplication to all direct linear algebra, i.e. for solving linear systems, least squares problems, eigenproblems and the SVD, for dense or sparse matrices, and for sequential or parallel machines. We also describe new algorithms that attain these lower bounds; some implementations attain large speedups over conventional algorithms. Second, we show how to minimize communication in Krylov-subspace methods for solving sparse linear systems and eigenproblems, and again demonstrate new algorithms with significant speedups.
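The core idea behind communication-avoiding dense algorithms, reusing each block of data many times once it has been moved, can be illustrated with a classic tiled matrix multiply. This is only a sketch of cache blocking, not the talk's algorithms, and the block size is arbitrary:

```python
import random

def blocked_matmul(A, B, b=8):
    """Tiled matrix multiply: work proceeds on b-by-b blocks so each block,
    once 'loaded', is reused about b times (the cache-blocking idea)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                # multiply one pair of blocks into the C block
                for i in range(ii, min(ii + b, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + b, n)):
                        a = Ai[k]
                        Bk = B[k]
                        for j in range(jj, min(jj + b, n)):
                            Ci[j] += a * Bk[j]
    return C

rng = random.Random(0)
n = 20
A = [[rng.random() for _ in range(n)] for _ in range(n)]
B = [[rng.random() for _ in range(n)] for _ in range(n)]
C = blocked_matmul(A, B)
```

With block size b chosen so three blocks fit in fast memory, each word moved supports O(b) arithmetic operations, which is the flavor of the lower bounds and optimal algorithms the talk develops rigorously.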
Universal laws and architectures
Essential dimensions of complex systems research are the hard limits on what is achievable (laws), the organizing principles that succeed or fail in achieving them (architectures and protocols), the resulting behavior observed in real systems (behavior, data), and the processes by which systems evolve (variation, selection, design). Hard limits on measurement, prediction, communication, computation, decision, and control, as well as the underlying physical energy and material conversion mechanisms necessary to implement these abstract processes, are at the heart of modern mathematical theories of systems in engineering and science. Unfortunately, these subjects remain largely fragmented and incompatible, even as the tradeoffs between these limits are of growing importance. The principal aim of this talk is to motivate an integrated theory based on optimization and robustness.
Insights can be drawn from three converging research themes. First, with detailed descriptions of components and growing attention in systems biology and neuroscience, the organizational principles of organisms and evolution are becoming increasingly apparent. Second, while the components differ and the system processes are far less integrated, advanced technology's complexity is now approaching biology's, and there are striking convergences at the level of organization and architecture. Determining what is essential about this convergence and what is merely historical accident requires a deeper understanding of architecture (the most universal, high-level, persistent elements of organization) and protocols. Protocols define how diverse modules interact, and architecture defines how sets of protocols are organized. Finally, new mathematical frameworks for the study of complex networks suggest that this apparent network-level evolutionary convergence within and between biology and technology is not accidental, but follows necessarily from their universal system requirements to be fast, efficient, adaptive, evolvable, and most critically, robust to perturbations in their environment and component parts.
We will sketch the underlying mathematical
ideas and illustrate the key results with case studies drawn from statistical mechanics,
turbulence, cell biology, human physiology and medicine, wildfire ecology,
earthquakes, the Internet, market volatility, toys, fashion, and
neuroscience. If there is time we
will also review (i.e. poke fun at) a remarkably popular and persistent source
of error and confusion, namely the many past and ongoing attempts to find “new
sciences” of complex systems and networks. This highlights another apparent tradeoff we must overcome
in complex systems, between the popularity and accessibility of ideas versus
their mathematical rigor and practical relevance.
From FOCS to Sand Hill Road
In this talk, I'll be presenting the steps of a multi-phase transformation:
1) My theory work on cache-oblivious algorithms
2) Collaborative systems/algorithmic work on high-performance file systems and how that changed the theory I do
3) The breakthrough that led to my company, Tokutek
4) What I've learned about how people use software and how that changed the theory I do
5) What it means to do algorithms work in a wide context.
Statistical Methods and Algorithms for the Analysis of
Social Network Data
Stephen E. Fienberg
Department of Statistics, Heinz College, and Machine Learning Department
Carnegie Mellon University
There are three broad classes of statistical models used for
the analysis of (social) network data:
exponential random graph models, latent space models, and mixed
membership stochastic blockmodels. They have been used in both static (cross-sectional) and dynamic
(longitudinal) network situations. “Exact” computation for such models often relies on iterative or
simulation methods, such as Markov chain Monte Carlo methods, which do not
scale especially well to large sparse network settings. For some models good approximations are
available. We give an
overview of the models and computational approaches for estimating their
parameters. We also discuss model
degeneracies that pose special problems of both computational and inferential
sorts. Fast and scalable
computations of network motifs have been of special interest in the machine
learning literature but they do not appear to help enough for problems of model
estimation and assessing model goodness-of-fit.
Reflections on Algorithm Engineering in Computer Graphics
University of Toronto
Computer graphics is both a mathematical science and an engineering
discipline. Mathematical modelling of biological, geometric and
physical phenomena is a common first step; these models then require
careful transformation into simulation and control algorithms that are
aware of the constraints imposed by numerical precision, large
heterogeneous data requirements, and idiosyncrasies of different
computing platforms, run-time systems, operating systems and languages.
In this, computer graphics is not much different from other numerically
rich application areas. The differentiation is that error and
complexity measures are at best a starting point for determining a
"good" implementation, for the human visual system is the final arbiter,
both of image quality and of responsiveness. I will survey some of
these issues and propose that we enhance the education of our
undergraduate and particularly graduate students in applied computation.
Specialization as key to Cloud efficiency
As an increasingly large fraction of computing takes place in the Cloud,
the growing scale of data centers is leading to new trends in
computing. A traditional goal of data centers is to have a homogeneous
server architecture in order to (i) reduce administration costs for
maintenance, diagnosis, and repair and (ii) simplify load balancing. However, any one architecture is suboptimal (in node costs, energy
costs, and performance) given the variety of cloud
applications/algorithms, including compute-intensive algorithms, random
I/O algorithms, streaming algorithms, memory-bound algorithms, and
algorithms exploiting hardware assists such as GPUs.
Specialization is the key to efficiency, in societies and in clouds.
The massive scale of data centers makes it cost effective for a single
data center to have a mix of server architectures, each specialized to
an important class of applications. Algorithm designers should take
this new trend into consideration, both in algorithms to be run in the
cloud and in algorithms that support cloud infrastructure (such as load balancing).
Principles of Fact and Ontology Mining from the Web
The World-Wide Web can be viewed as the biggest repository of factual information available to mankind. Facts on the Web are typically embedded in text and sometimes presented in more structured form such as tables or lists. As a result, since the WWW emerged, it has been mined for knowledge bases that can be used for query answering and higher-level inference. I will describe some of the challenges we face when trying to extract knowledge bases from the Web. In particular, I will focus on two issues that emerged in our work on the WebTables Project, a collection of over 100 million tables extracted from the Web. The first issue is the need for a principled theory for assigning confidence for facts mined from the web that considers the uneven popularity of different facts on the Web (e.g., "capital of France" vs. "capital of Malawi") and limitations of the extractors we employ. The second issue is the need to develop more flexible knowledge representation mechanisms (schema languages) that can deal with the inherent uncertainty and lack of central data modeling on the Web.
Online Matching of Ads with Advertisers
Online matching was introduced by Karp, Vazirani, and Vazirani in the 1990s. In the last decade this setup was used to study the problem of allocating ad slots to advertisers. Online algorithms were proposed achieving a worst-case ratio of 63% (1 - 1/e), but they work only for hard budgets.
One criticism of such algorithms and analysis for practical use is their pessimistic viewpoint. We suggest a flexible online algorithm. Our algorithm is flexible in many ways.
1. It allows soft budgets.
2. It allows the optimal performance to be computed on a computer.
3. It allows accommodating the known stochastic information.
4. It allows tuning the algorithm at run time, for instance making it more aggressive or conservative, depending on whether one is doing better or worse than expected.
The analysis of the algorithm shows that a tight example and an optimal algorithm are primal-dual pairs, or in other words satisfy a min-max relationship. In doing so we introduce a new kind of duality for convex programs and adapt the primal-dual algorithmic technique to convex programs. This technique usually results in fast combinatorial algorithms.
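The classic tradeoff function from the Mehta-Saberi-Vazirani-Vazirani line of work gives a feel for budgeted online allocation. This is the standard textbook rule with hard budgets, not the speaker's flexible algorithm, and the bids and budgets are invented:

```python
import math

def msvv_allocate(queries, bids, budgets):
    """Budgeted online allocation with the classic MSVV tradeoff: send each
    arriving query to the advertiser maximizing bid * (1 - e^(spent_fraction - 1))."""
    spent = {a: 0.0 for a in budgets}
    revenue = 0.0
    assignment = []
    for q in queries:
        best, best_score = None, 0.0
        for a, bid in bids.get(q, {}).items():
            if spent[a] + bid > budgets[a]:
                continue  # hard budget: this advertiser is effectively exhausted
            # discount the bid by how much of the budget is already spent
            score = bid * (1 - math.exp(spent[a] / budgets[a] - 1))
            if score > best_score:
                best, best_score = a, score
        if best is not None:
            spent[best] += bids[q][best]
            revenue += bids[q][best]
        assignment.append((q, best))
    return revenue, assignment, spent

# invented toy instance: two advertisers, two query types
bids = {"shoes": {"A": 1.0, "B": 0.9}, "books": {"B": 1.0}}
budgets = {"A": 3.0, "B": 2.0}
queries = ["shoes", "shoes", "books", "shoes", "books", "shoes"]
revenue, assignment, spent = msvv_allocate(queries, bids, budgets)
```

The exponential discount spreads spend across advertisers instead of greedily exhausting the highest bidder, which is what yields the 1 - 1/e guarantee in the hard-budget setting.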
Cheap Wins: Applications from Elementary Algorithms
While theoretical computer science often develops deep and rich
techniques by abstracting important problems from specific application
domains, there are many cases where the standard algorithmic toolbox
already contains the answers to important problems. In such cases the
key contribution of the theoretician may be simply to recognize and
apply the appropriate solutions. I'll run through some examples from
natural language processing, peer to peer systems, networking, coding,
and computational biology. And I'll explore the place of this kind of
algorithmic "service" work in our theoretical field.
Algorithms and Social Networks: How Close are our Ties?
Social Networks have been one of the application areas that have inspired quite a lot of algorithms work lately. This is only natural: on the one hand, they are a very tangible concept that laymen can relate to (friendships, Facebook); on the other hand, they are naturally modeled as graphs (one of our favorite classes of objects), and give rise to novel algorithmic questions on them.
A large fraction of the work has focused on what we can do with information on social networks. Algorithmic tasks include identifying social communities (densely connected groups of individuals), tracking their evolution, locating influential individuals, or devising viral marketing strategies. A complementary approach views the networks themselves as executing algorithms, such as routing messages, spreading information, facilitating trades, or allocating social stature or power.
Several of these directions have shown appeal to multiple communities with different tool sets and approaches, including theoretical computer science, physics, machine learning, and data mining. Often, the structural results and algorithmic techniques are intriguing, and long sequences of papers explore ever-refined versions of the models or definitions.
At the same time, our point of departure - treating social networks simply as graphs, or perhaps graphs augmented with a simplistic dynamic of our own design - frequently creates a disconnect from the actual
application area (social networks) we would like to have an impact on. I would like to argue that as a community, we should perhaps try to understand more closely what questions matter to sociologists or
researchers who really deal with social networks. More importantly, we should gain more understanding of actual human behavior. This may help push our contribution from intriguing papers to impact in the field.
Enriching Education Through Data Mining
Education is acknowledged to be the primary vehicle for improving the economic well-being of people. Textbooks have a direct bearing on the quality of education imparted to the students as they are the primary conduits for delivering content knowledge. They are also indispensable for fostering teacher learning and constitute a key component of the ongoing professional development of the teachers.
Many textbooks, particularly from emerging countries, lack clear and adequate coverage of important concepts. In this talk, we present our early explorations into developing a data mining based approach for enhancing the quality of textbooks. We discuss techniques for algorithmically augmenting different sections of a book with links to selective content mined from the Web. For finding authoritative articles, we first identify the set of key concept phrases contained in a section. Using these phrases, we find web (Wikipedia) articles that represent the central concepts presented in the section and augment the section with links to them. We also describe a framework for finding images that are most relevant to a section of the textbook, while respecting global relevancy to the entire chapter to which the section belongs. We pose this problem of matching images to sections in a textbook chapter as an optimization problem and present an efficient algorithm for solving it.
We also present a diagnostic tool for identifying those sections of a book that are not well-written and hence should be candidates for enrichment. We propose a probabilistic decision model for this purpose, which is based on syntactic complexity of the writing and the newly introduced notion of the dispersion of key concepts mentioned in the section. The model is learned using a tune set which is automatically generated in a novel way. This procedure maps sampled text book sections to the closest versions of Wikipedia articles having similar content and uses the maturity of those versions to assign need-for-enrichment labels. The maturity of a version is computed by considering the revision history of the corresponding Wikipedia article and convolving the changes in size with a smoothing filter.
We also provide the results of applying the proposed techniques to a corpus of widely-used, high school textbooks published by the National Council of Educational Research and Training (NCERT), India. We consider books from grades IX–XII, covering four broad subject areas, namely, Sciences, Social Sciences, Commerce, and Mathematics. The preliminary results are encouraging and indicate that developing technological approaches to enhancing the quality of textbooks could be a promising direction for research for our field.
Algorithm Design using Spectral Graph Theory
Gary L Miller
Computer Science Department, Carnegie Mellon
Spectral Graph Theory is the interplay between linear algebra and combinatorial graph theory. One application of this interplay is a nearly linear time solver for Symmetric Diagonally Dominant (SDD) systems. This seemingly restrictive class of systems has received substantial interest, and in the last 15 years both algorithm design theory and practical implementations have made substantial progress. There is also a growing number of problems that can be efficiently solved using SDD solvers, including: image segmentation, image denoising, finding solutions to elliptic equations, computing maximum flow in a graph, graph sparsification, and graphics.
Theoretically, we have recently shown that if $A$ is an $n \times n$ SDD matrix with $m$ entries, the linear system $Ax = b$ can be solved to constant precision in $O(m \log n)$ time, ignoring lower order factors. On the implementation side, the latest code runs in linear time experimentally for large sparse systems. In fact the problem sizes are now large enough that on modern multicore workstations the memory bandwidth is a limiting factor. Compression techniques are now used to substantially speed up the solver.
This represents joint work with: Guy Blelloch, Ioannis Koutis, Richard Peng, Kanat Tangwongsan,
and David Tolliver
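A pure-Python conjugate gradient on a (shifted) graph Laplacian shows concretely what "solving an SDD system" means; real nearly-linear-time solvers add sophisticated preconditioning, and the small diagonal shift below just makes the toy Laplacian nonsingular:

```python
def matvec(x, edges, n, shift=1.0):
    """Apply (L + shift*I) to x, where L is the weighted graph Laplacian."""
    y = [shift * xi for xi in x]
    for u, v, w in edges:
        y[u] += w * (x[u] - x[v])
        y[v] += w * (x[v] - x[u])
    return y

def conjugate_gradient(edges, n, b, iters=100, tol=1e-12, shift=1.0):
    """Plain conjugate gradient for the SPD system (L + shift*I) x = b."""
    x = [0.0] * n
    r = list(b)                 # residual; x starts at zero so r = b
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        Ap = matvec(p, edges, n, shift)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# path graph on 5 nodes with unit weights; right-hand side chosen arbitrarily
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
b = [1.0, 0.0, 0.0, 0.0, -1.0]
x = conjugate_gradient(edges, 5, b)
```

Note that the matrix is only ever touched through matvec over the edge list, so each iteration costs O(m); the hard part, and the subject of the talk, is bounding how many iterations are needed.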
Algorithmic Aspects of Real-Time System Design
Aloysius K. Mok
Department of Computer Science, University of Texas at Austin
Real-time systems have timeliness properties that must be satisfied.
Because the real-time behavior of programs depends on implementation
details, the traditional software design approach that abstracts away
implementation details and focuses on functional correctness does not
work well for real-time system design. To address this issue, we
describe an annotation scheme that is used to superimpose timeliness
assertions on top of the functional specification of programs. The first
part of this talk will discuss some formal aspects of the annotation
scheme, including the logical basis, the algorithmic analysis and
synthesis problems and the related issues such as implementability and
robustness in real-time scheduler design.
The second part of this talk will address the design problems of
real-time systems in an open system environment. In the open system
environment, a set of resources is shared by multiple applications each
of which has its own resource scheduling policy, and the resource
scheduling policy of each individual application is not accessible by
other applications. Because of the resource scheduler isolation, meeting
the timeliness requirements requires new resource sharing models and
analysis techniques. We shall discuss the concept of real-time resource
virtualization and the related resource allocation problems. If time
permits, we shall touch on real-time database issues and speculate about
real-time intelligent agent design.
It pays to do the right thing: Incentive mechanisms for Societal Networks
Why did kamikaze pilots wear helmets?
Why does glue not stick to the inside of the bottle?
Why is lemonade made with artificial flavor but dishwashing liquid made with real lemons?
How can I avoid traffic jams and be paid for it?
While the first three are some of life's enduring questions, the fourth
is the subject of decongestion research projects at Stanford University
and in Singapore. In this talk, I will briefly describe these projects
and, more generally, discuss incentive mechanisms for Societal Networks;
for example, transportation, energy, healthcare and waste management
networks. I will talk about incentive mechanisms and experiments for
reducing road congestion, pollution and energy use, and for improving
"wellness" and good driving habits. Some salient themes are: using
low-cost sensing technology to make Societal Networks much more
efficient, using price as a signal to co-ordinate individual behavior,
and intelligently "throwing money at problems".
Insights into Routing in Real-world Networks
Sociologist Stanley Milgram's experiments on the "small-world" phenomenon have
led to a variety of fascinating questions at the intersection of sociology and computer
science. In this work, we seek to understand what makes routing algorithms similar
to those employed by Milgram work as effectively as they do. To this end, we develop
network models and tools, and provide theoretical and empirical answers to this question.
While affirming the importance of the role of weak ties (as pioneered by Granovetter in
sociology and modeled crisply by Kleinberg in CS), our work highlights the role of
two phenomena -- the existence of a dual space of affiliations of the members of the
network, and the emergence of central nodes in the network.
Dealing with Massive Data: Parallelism to the Rescue
The present availability of data has led to data analysis
problems whose size would have been unimaginable just 10 years ago. With
datasets routinely measured in gigabytes and terabytes, one challenge
that has emerged is scaling algorithms to handle this deluge of
information. With chip designers sustaining Moore’s law by turning to
parallelism, so have algorithmicists turned to parallelism to speed up
algorithms and improve performance. In this talk, we will focus on the
MapReduce model of parallel computing, highlight some of the recent
algorithmic advances and present current challenges.
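The MapReduce model itself can be sketched in a few lines: a mapper emits key-value pairs, the framework groups them by key (the shuffle), and a reducer aggregates each group. Below is a single-process simulation using the canonical word-count example:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(key, values):
    """Reducer: combine all counts emitted for one key."""
    return key, sum(values)

def mapreduce(docs, mapper, reducer):
    # "shuffle" step: group intermediate pairs by key, as the framework would
    pairs = sorted(kv for doc in docs for kv in mapper(doc))
    return dict(reducer(key, [v for _, v in group])
                for key, group in groupby(pairs, key=itemgetter(0)))

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
counts = mapreduce(docs, map_phase, reduce_phase)
```

In a real deployment the mappers and reducers run on many machines in parallel; the algorithmic challenges the talk discusses concern what can be computed efficiently within this restricted pattern of rounds and communication.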