We are pleased to welcome three invited speakers covering a range of perspectives on data science and journalism.
Nick Diakopoulos (University of Maryland):
Advanced data mining and machine learning techniques are now being used throughout the news industry to extract editorial value from massive stores of data and documents. Computation provides a sixth sense for journalists to scan for news at scale: streams of data are monitored and mined in real time, triggering alerts to reporters and editors. In investigative journalism, machine-learned classifiers sift through thousands of documents to identify the most newsworthy, similarity algorithms help connect the dots and orient journalists towards the most promising leads, and modeling techniques contribute to algorithmic accountability. Curation algorithms rank and surface the most salient and high-quality social media content so that journalists can vet, verify, and refine it editorially before publication. This talk will explore journalistic use cases and tasks that stand to benefit from data mining, suggesting opportunities for researchers seeking to apply these techniques in journalistic scenarios. In addition, the risks, caveats, and ethics of these methods, and the ways in which human values are reflected in data mining algorithms, will be discussed with respect to their impact on news coverage for society.
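As one hedged illustration of the similarity-based triage mentioned above (a minimal sketch, not from the talk itself; the documents, the seed query, and the use of scikit-learn are all assumptions), a journalist might rank a document dump against a known lead:

```python
# Minimal sketch of similarity-based lead finding in a document collection.
# Hypothetical example; assumes scikit-learn is installed and the texts below
# stand in for a much larger leak.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Offshore account opened for the minister's family trust.",
    "Quarterly earnings report for a regional grocery chain.",
    "Wire transfer routed through a shell company in Panama.",
]
seed_lead = "minister offshore shell company transfer"

# Vectorize the corpus and the seed with TF-IDF, then rank by cosine similarity.
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)
seed_vector = vectorizer.transform([seed_lead])

scores = cosine_similarity(seed_vector, doc_vectors).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. (score {scores[idx]:.2f}) {documents[idx]}")
```

In keeping with the talk's framing, such a ranking only orients the reporter toward promising leads; the newsworthiness judgment remains editorial.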
Hannes Munzinger (Süddeutsche Zeitung):
This talk will give insights into how the small German newspaper Süddeutsche Zeitung managed to publish the biggest investigation in its history, and perhaps the scoop of the decade. The Panama Papers opened up a secretive world of offshore companies used by the rich and powerful.
11.5 million documents containing information about more than 200,000 shell companies had to be organized, made machine readable, searchable, and cross-matched with other datasets: a daunting task for a team of six who had never done anything like this before. But Süddeutsche Zeitung, together with the International Consortium of Investigative Journalists (ICIJ), set a new standard for global collaboration in investigative journalism. Now that the buzz has subsided, it is time to rethink how a small paper can use open-source intelligence tools and data mining techniques, both to work with the data it has obtained and in future investigations.
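As a hedged illustration of the cross-matching mentioned above (a minimal sketch, not ICIJ's actual pipeline; the names, the threshold, and the use of Python's standard library are assumptions), fuzzy name matching can flag candidate links between a leak and another dataset for human verification:

```python
# Minimal sketch of cross-matching names across datasets, in the spirit of
# matching leaked company and officer names against a public list.
# Hypothetical example; uses only the Python standard library.
from difflib import SequenceMatcher

leak_names = ["Jon A. Smithe Ltd.", "Acme Holdings S.A.", "Maria Gonzales"]
watchlist = ["John A. Smith Ltd", "Maria Gonzalez", "Globex Corporation"]

def similarity(a: str, b: str) -> float:
    """Normalized character-level similarity between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Flag pairs above a threshold for a journalist to verify by hand.
THRESHOLD = 0.8
for leak_name in leak_names:
    for listed in watchlist:
        score = similarity(leak_name, listed)
        if score >= THRESHOLD:
            print(f"possible match ({score:.2f}): {leak_name!r} ~ {listed!r}")
```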
Clay Eltzroth (Bloomberg):
Five years ago, the idea of discovering breaking, market-moving news on social media seemed far-fetched and out of the ordinary for most newsrooms, including Bloomberg's. Fast forward, and we are now able to discover, verify, and disseminate breaking news faster than ever before, using data analysis, natural language processing, and machine learning. Speed and efficiency aren't the only things that have changed over the years; the behavior of journalists, public figures, and consumers around the world has changed too, and we are all faced with the challenges of "fake news" and the rising influence of bots. This presentation will look at how Bloomberg brought together data scientists and journalists to develop better tools and workflow processes, delivering the news to our readers faster without sacrificing accuracy. We are also developing and expanding best practices, building a better real-time understanding of opinions, events, and individual people. The next logical step is quickly turning breaking news into video and using the data to help visualize an event, or part of one, to tell the story.
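As a hedged illustration of the discovery step described above (a minimal sketch, not Bloomberg's actual system; the window size, the spike threshold, and the mention counts are assumptions), a simple burst detector over per-interval term counts can flag candidates for a journalist to verify:

```python
# Minimal sketch of spike detection over a social stream, in the spirit of
# surfacing potential breaking news for human verification.
# Hypothetical example; uses only the Python standard library.
from collections import Counter, deque
from statistics import mean, stdev

WINDOW = 6          # number of past intervals forming the baseline
SPIKE_SIGMA = 3.0   # how many standard deviations count as a spike

history: dict[str, deque] = {}

def check_interval(term_counts: Counter) -> list[str]:
    """Return terms whose current count spikes above their recent baseline."""
    alerts = []
    for term, count in term_counts.items():
        past = history.setdefault(term, deque(maxlen=WINDOW))
        if len(past) >= 3:
            baseline, spread = mean(past), stdev(past)
            if count > baseline + SPIKE_SIGMA * max(spread, 1.0):
                alerts.append(term)
        past.append(count)
    return alerts

# Illustrative per-minute mention counts for a handful of company names.
stream = [Counter(acme=4, globex=3), Counter(acme=5, globex=4),
          Counter(acme=4, globex=3), Counter(acme=40, globex=4)]
for minute, counts in enumerate(stream):
    for term in check_interval(counts):
        print(f"minute {minute}: possible breaking news around {term!r}")
```

The point of such a detector is speed without false authority: it narrows the stream, and the verification before dissemination stays with the newsroom.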