Program
The 1st Workshop on Data Science with Human in the Loop (DaSH)
8:00 – 8:05am: Workshop introduction
Session Chair: Lucian Popa, IBM Almaden
8:05 – 8:50am: Invited talk 1 (Marti Hearst)
Title: Human-in-the-Loop from the Human Perspective (slides)
Abstract: As stated in this workshop’s call for participation, in order to unleash the full potential of data science, we need to improve our understanding of the best modalities of human and computer cooperation along the data science pipeline. The accepted papers in this workshop advance the future of human-machine interaction in data analysis, and include new algorithms for active learning, new user interfaces for allowing analysts to augment algorithms, algorithms to automate parts of the analysis, and trenchant forecasts of the future of work in the field of data science.
My contribution to this conversation will be twofold. I will first share results of a survey of professional information analysts, relating their views about the role of machine automation in the process of exploratory data analysis. I will then discuss results in peer learning in online education, and how these ideas might be applicable to advanced human-machine analysis tasks.
Session Chair: Yunyao Li, IBM Almaden
8:50 – 9:20am: Session 1 (Human-in-the-Loop Techniques)
Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang and Klaus Mueller. Active Learning++: Incorporating Annotator’s Rationale using Local Model Explanation
Kun Qian, Lucian Popa and Yunyao Li. An Intuitive User Interface for Human-in-the-loop Entity Name Parsing and Entity Variant Generation
Eric Bunch, Qian You and Glenn Fung. Human-In-The-Loop Topic Discovery with Embedded Text Representations
Teja Kanchinadam, Keith Westpfahl, Qian You and Glenn Fung. Rationale-based Human-in-the-Loop via Supervised Attention
Session Chair: Slobodan Vucetic, Temple University
9:25 – 9:55am: Session 2 (Model Analysis and Applications)
Maeda Hanafi, Azza Abouzied, Marina Danilevsky and Yunyao Li. WhyFlow: Explaining Errors in Data Flows Interactively
Nikolaos Lagos, Jose Miguel Perez, Michel Langlais and Adrian Mos. Towards a What-If System for Point-Of-Interest Categorisation
Subhajit Das and Florina Dutt. InMacs: Interactive modeling and comparison of sentiments from sequence data
Hamsa Shwetha Venkataram, Ian Colwell, Steven Liu, Philip Southam, Chris Mattmann and Tomas Soderstrom. Names Don't Fly: Smart Filters for Profanity Detection and Classification in User-Generated Content
Session Chair: Eduard Dragut, Temple University
10:00 – 10:30am: Session 3 (Impact of Data Science and Automation)
Josh Andres, Christine Wolf, Michael Muller, Justin Weisz, Narendra Nath Joshi, Aabhas Sharma, Krissy Brimijoin, Michael Desmond, Zahra Ashktorab, Qian Pan, Evelyn Duesterwald and Casey Dugan. Cultivating Human Expertise Through AI-Assisted Data Science
Justin Weisz and Michael Muller. The Next Decade of Data Science
Diarmuid Cahalane, Patrick Connolly, Andrew Dalton, Bogdan Saceleanu and Medb Corcoran. Data Scientists and Designers Can Create The Jobs of The Future
Dakuo Wang, Josh Andres, Justin Weisz, Erick Oduor, Udayan Khurana, Horst Samulowitz, Arunima Chaudhary, Abel Valente, Dustin Torres and Casey Dugan. IBM AutoAI: Human-in-the-Loop Automated Machine Learning Supports Data Scientists to Build Better Models
Session Chair: Lucian Popa, IBM Almaden
10:30 – 11:15am: Invited talk 2 (AnHai Doan)
Title: Human-in-the-Loop Challenges for Entity Matching: A Report from the Trenches
Abstract: Entity matching (EM) is a fundamental problem in data science. Many data science projects must integrate multiple data sources before analysis can be carried out to extract insights, and such integration often requires EM. In the past five years, we have been building Magellan, a general platform that uses machine learning, big data processing, and effective user interaction to solve EM problems. Magellan has been deployed at 12 companies and domain science groups, recently commercialized by GreenBay Technologies, and pushed into commercial EM platforms at Informatica, the world-leading data integration company. In this talk, I will discuss human-in-the-loop (HIL) challenges we faced in Magellan, and how we designed Magellan from scratch using HIL principles. Specifically, I will discuss how we identify the end-to-end process that a user must follow to perform EM, then develop semi-automatic tools to support the various steps in the process. I will also discuss why we designed tools to be atomic, highly interoperable, and built into popular ecosystems of data science tools. Finally, I will discuss lessons learned which can potentially be applied to other problem settings in data science. It is my hope that more researchers will investigate EM, as it can be a rich “playground” for HIL research.
Session Chair: Eduard Dragut, Temple University
11:15am – 12:00pm: Panel on Open challenges in human-computer cooperation in data science
Azza Abouzied, NYU Abu Dhabi
AnHai Doan, University of Wisconsin
Marti Hearst, University of California, Berkeley
Xiang Ren, University of Southern California
Session Chair: Yunyao Li, IBM Almaden