ExploreDB 2015
2nd International Workshop on Exploratory Search in Databases and the Web
Co-located with SIGMOD/PODS 2015
Melbourne, Australia
May 31, 2015


         Title: Explore-By-Example: A New Database Service for Interactive Data Exploration

Abstract: Traditional DBMSs are suited for applications in which the structure, meaning and contents of the database, as well as the questions (queries) to be asked, are all well understood. However, this is no longer true when the volume and diversity of data grow at an unprecedented rate, while users' ability to comprehend data remains as limited as before. To address the increasing disparity in the “big data - same humans” problem, our project explores a new approach that combines system-aided exploration of a big data space with automatic learning of the user interest, in order to retrieve all objects that match that interest. We call this new service “interactive data exploration”; it complements the traditional querying interface of a database system.

In this talk, I introduce a new framework for interactive data exploration, called “Explore-by-Example”, which iteratively seeks user relevance feedback on database samples and uses that feedback to predict a final query that retrieves all objects of interest to the user. The goal is to make such exploration converge quickly to the true user interest model, while minimizing the user labeling effort and providing interactive performance in each iteration. I discuss a range of techniques and optimizations to do so for linear patterns and complex non-linear patterns. Our user study indicates that our approach can significantly reduce the user effort and the total exploration time, compared with the common practice of manual exploration. I conclude the talk by pointing out a host of new challenges, ranging from the application of active learning theory, to database optimizations, to visualization.

Bio: Yanlei Diao is an Associate Professor of Computer Science at the University of Massachusetts Amherst. Her research interests are in information architectures and data management systems, with a focus on big data analytics, scientific analytics, data streams, uncertain data management, and RFID and sensor data management. She received her PhD in Computer Science from the University of California, Berkeley in 2005, her M.S. in Computer Science from the Hong Kong University of Science and Technology in 2000, and her B.S. in Computer Science from Fudan University in 1998.

Yanlei Diao was a recipient of the 2013 CRA-W Borg Early Career Award (one female computer scientist selected each year), the IBM Scalable Innovation Faculty Award, and the NSF CAREER Award, and she was a finalist for the Microsoft Research New Faculty Award. She spoke at the Distinguished Faculty Lecture Series at the University of Texas at Austin. Her PhD dissertation “Query Processing for Large-Scale XML Message Brokering” won the 2006 ACM-SIGMOD Dissertation Award Honorable Mention. She is currently Editor-in-Chief of the ACM SIGMOD Record, Associate Editor of ACM TODS, Associate Editor of VLDB 2016, and a member of the SIGMOD Executive Committee and the SIGMOD Software Systems Award Committee. In the past, she has served as Associate Editor of PVLDB, Area Chair of SIGMOD, and organizing committee member of SIGMOD, CIDR, DMSN, and the New England Database Summit, as well as on the program committees of many international conferences and workshops.

         Title: Principled Optimization Frameworks for Query Reformulation of Database Queries

Abstract: Databases have traditionally supported the Boolean retrieval model, where a query returns all tuples that match the selection conditions specified – no more and no less. Such a query model is often inconvenient for naïve users whose searches are exploratory in nature, since the user may not have a complete idea or a firm opinion of what she is looking for. This is especially relevant in the context of the Deep Web, which offers a plethora of searchable data sources such as electronic products, transportation choices, apparel, investment options, etc. Users often encounter two types of problems: (a) they may under-specify the items of interest, and find too many items satisfying the given conditions (the many answers problem), or (b) they may over-specify the items of interest, and find no item in the source satisfying all the provided conditions (the empty answer problem).

In this talk I will discuss our recent efforts in developing techniques for iterative “query reformulation” by which the system guides the user in a systematic way through several small steps, where each step suggests slight query modifications, until the query reaches a form that generates desirable answers. Our proposed approaches for suggesting query reformulations are driven by novel probabilistic frameworks based on optimizing a wide variety of application-dependent objective functions.

Bio: Gautam Das is Professor of Computer Science and Engineering and Head of the Database Exploration Laboratory (DBXLab) at the University of Texas at Arlington. He has also held positions at Microsoft Research and the University of Memphis. He graduated with a B.Tech. from IIT Kanpur, India, and with a Ph.D. from the University of Wisconsin, Madison. His research interests span data mining, information retrieval, big data management, social media analytics, and theoretical algorithms. His research has resulted in over 150 papers, many of which have appeared in premier conferences and journals such as SIGMOD, VLDB, ICDE, KDD, TODS, and TKDE. His work has received several awards, including the IEEE ICDE 2012 Influential Paper award, Best Student Paper Award of CIKM 2013, VLDB Journal special issues on Best Papers of VLDB 2012 and 2007, Best Paper of ECML/PKDD 2006, and Best Paper (runner up) of ACM SIGKDD 1998. Dr. Das is on the Editorial Boards of ACM TODS and IEEE TKDE, and has served on the organization and program committees of numerous premier conferences. His research has been supported by the US National Science Foundation, US Army Research Office, US Office of Naval Research, US Department of Education, Texas Higher Education Coordinating Board, Microsoft Research, Nokia Research, and Cadence Design Systems.