Long-tailed distributions are very common in information retrieval (IR) problems. A good example is the distribution of queries submitted to a search engine. In this talk, I will highlight cases where IR research has focused its efforts on performing well on the head of the distribution and ignored ‘difficult’ long-tail cases. In addition, I will give examples of problems and challenges we face at Signal, where it is crucial to invest effort in improving and evaluating performance on long-tail cases.
I am a researcher in Information Retrieval (IR). In September 2015 I joined Signal AI, where I currently lead a team of data scientists and engineers focused on researching and developing text analytics and retrieval components for a large-scale media monitoring platform. Prior to that, I was a post-doctoral researcher in the Terrier IR research group at the University of Glasgow, after obtaining a PhD from the University of Essex in 2012.
Intranet Focus Ltd and University of Sheffield
IR evaluation (and hopefully optimisation!) depends heavily on curated test collections and curated cohorts of users. In this paper I outline how employees use enterprise search applications, question the relevance of classic IR metrics (especially precision at n) to this setting, and outline the challenges of creating test collections and user cohorts. To close, I will present the concept of search satisfaction as an enterprise, and potentially wider, search metric.
Martin White started using computer IR applications in 1975 (that is not a misprint!) and has authored four out of the five books on enterprise search. He set up Intranet Focus Ltd in 1999 and quickly realised that search was suffering from benign neglect. Over the last decade most of his consulting engagements have focused on enterprise search optimisation. Martin has been a Visiting Professor in the Information School at the University of Sheffield since 2002.
Drawing on 20 years' experience working with open source search engines, Charlie will describe the history of the search company he co-founded, Flax, and the lessons he has learned building relevant, powerful and scalable search solutions for clients including Reed Specialist Recruitment, Historic England and Cambridge University Press. He'll talk about the open source software his company used, built and contributed to, and how it was applied in several client projects. He'll also discuss how students and academics moving into industry can learn more about the software widely used for search applications, and suggest next steps for those interested.
Charlie Hull keeps a strategic view on developments in the search industry and is in demand as a speaker at conferences across the world. Charlie writes regularly on search issues and runs the London Lucene/Solr Meetup. In 2001 he co-founded Flax, the UK's leading open source search company, now part of OpenSource Connections, where he acts as a Managing Consultant working with clients across the world to empower their search teams.
Charlie co-authored Searching the Enterprise with Professor Udo Kruschwitz of the University of Essex, part of the Foundations and Trends® in Information Retrieval series published by Now Publishers. The book was reviewed by Martin White of Intranet Focus, who said “There are of course many books on information retrieval… but Searching the Enterprise is in a class of its own”. Charlie is also a member of the Search Network, an independent group of search professionals who release regular free reports on search issues.
Microsoft Research
Interpretability for ML systems has been the subject of research for decades, largely from the point of view of debuggability. More recently, the proliferation of ML into diverse aspects of everyday life, as well as increased awareness of existing or possible biases, has prompted calls for ML systems to explain their decisions or even their decision-making processes. Following this, a range of ongoing initiatives has been set up with an agenda of fairness, accountability and transparency in ML.
Web search engines are examples of complex ML systems that, given a high-dimensional input space, produce rankings of resources as a result of implicit comparisons and decisions among candidate pages. Traditionally, little explanation has been provided, or indeed required, of why search results appear on a search engine result page (SERP); at best, the explanation was simply equated with the notion of relevance, which has been studied at great depth. In this talk, I will reflect on what explainability means in search and how it relates to users' mental models of relevance, SEO best practices and internal policy decisions on page quality.