BIR 2021

11th International Workshop on Bibliometric-enhanced Information Retrieval

The Bibliometric-enhanced Information Retrieval (BIR) workshop series at ECIR enters its 11th iteration. In this workshop, held online in conjunction with the 43rd European Conference on Information Retrieval (ECIR 2021), we will tackle issues related to academic search, at the intersection of Information Retrieval and Bibliometrics. We strive to bring together the ‘retrievalists’ and ‘citationists’, active in both academia and industry, who develop search engines and recommender systems for scholarly search. An overview of the BIR/BIRNDL workshop series can be found at https://sites.google.com/view/bir-ws/home.


BIR 2021 took place online on 1 April 2021. The BIR 2021 proceedings were published by CEUR. Past BIR(NDL) proceedings are available online as open access.


Important Dates

  • Submissions: 25 January 2021, midnight AoE (extended from 19 January 2021)

  • Notifications: 22 February 2021

  • Camera-ready contributions: 12 March 2021

  • Workshop: 1 April 2021 (online)


Keynotes

  • Lucy Lu Wang (Allen Institute for AI, USA): Text mining insights from the COVID-19 pandemic (slides, video recording) Over the last year, the novel coronavirus SARS-CoV-2 generated significant upheaval in the world and in scientific publishing, yet it provided a unique sandbox in which to test and innovate upon the latest text mining and information retrieval technologies. We saw the emergence of novel information retrieval and NLP tasks with the potential to change the way information from scientific literature is communicated to healthcare providers and public health researchers. In this talk, I will discuss some of the ways the computing community came together to tackle this challenge, with the release of open data resources like CORD-19 and the introduction of various shared tasks for evaluation. I will also present our work on scientific fact checking, a novel NLP task that looks to address issues around scientific misinformation, and its practical uses in managing conflicting information arising from COVID-19 pandemic publishing.

  • Ludo Waltman (CWTS, the Netherlands): Openness, transparency, and inclusivity in science: What does it mean for information retrieval? (slides, video recording) The research system is moving in a direction of increased openness, transparency, and inclusivity. This offers exciting new possibilities for scholarly literature search. At the same time, it also changes the expectations users have of search systems for scholarly literature. New opportunities are provided by the increased openness of the metadata of scientific outputs, and sometimes also of the outputs themselves. Calls for increased transparency and inclusivity raise complex questions about the responsibilities of those who manage search systems for scholarly literature and about the benefits as well as the risks of new AI-based approaches to scholarly literature search. While acknowledging that there are no easy answers, I will share some of my thoughts on the various issues that the BIR community may need to reflect on.

  • Jimmy Lin (University of Waterloo, Canada): Domain Adaptation to Scientific Texts and the Limits of Scale? (slides, video recording) A fundamental assumption behind bibliometric-enhanced information retrieval is that ranking models need to be adapted to handle scientific text, which is very different from the typical corpora (Wikipedia, books, web crawls, etc.) used to pretrain large-scale transformers. One common approach is to take a large "general-domain" model and then apply domain adaptation techniques to "customize" it for a specific (scientific) domain. While we have pursued this research direction, it appears that the far less satisfying approach of "just throwing more data at the problem" with increasingly larger pretrained transformers is more effective. In fact, over the last year, my group has "won" multiple community-wide shared evaluations focused on texts related to the novel coronavirus SARS-CoV-2 using exactly this approach: document ranking (TREC-COVID, TREC Health Misinformation), question answering (EPIC-QA), and fact verification (SciFact). We have been deeply frustrated by our own inability to make "smarter" beat "bigger". In this talk, I will share our efforts to grapple with these issues and perhaps to understand... why?

Program

All times are Central European Summer Time (CEST, UTC+02:00, daylight saving time). Please also have a look at the BIR agenda on ECIR's Whova page.

Note to presenters: Long papers: 15 minutes + 5 minutes for questions; short papers: 7 minutes + 3 minutes for questions.

Accepted papers

  • Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš and Shigeo Morishima: Self-Supervised Learning for Visual Summary Identification in Scientific Publications (long paper)

  • Ken Voskuil and Suzan Verberne: Improving reference mining in patents with BERT (long paper)

  • Manajit Chakraborty, David Zimmermann and Fabio Crestani: PatentQuest: A User-Oriented Tool for Integrated Patent Search (long paper)

  • Pablo Accuosto, Mariana Neves and Horacio Saggion: Argumentation mining in scientific literature: From computational linguistics to biomedicine (long paper)

  • Ahmed Abura'Ed and Horacio Saggion: Automatic Generation of Related Work Reports (long paper)


  • Jacqueline Sachse: Bibliometric Indicators and Relevance Criteria – An Online Experiment (short paper)

  • Frederique Bordignon, Liana Ermakova and Marianne Noel: Preprint abstracts in times of crisis: a comparative study with the pre-pandemic period (short paper)

  • Hiran H Lathabai, Abhirup Nandy and Vivek Kumar Singh: Expertise based institutional recommendation in different thematic areas (short paper)

  • Daria Alexander and Arjen P. de Vries: "This research is funded by...": Named Entity Recognition of financial information in research papers (short paper)


Workshop Topics


We welcome submissions on (but not limited to) the following aspects of academic search:


  • Information seeking & searching with scientific information, such as:

    • Finding relevant papers/authors for a literature review.

    • Measuring the degree of plagiarism in a paper.

    • Identifying expert reviewers for a given submission.

    • Flagging predatory conferences and journals.

    • Information seeking behaviour and human-computer interaction in academic search.


  • Mining the scientific literature, such as:

    • Information extraction, text mining and parsing of scholarly literature.

    • Natural language processing (e.g., citation contexts).

    • Discourse modelling and argument mining.


  • Academic search/recommender systems, such as:

    • Modelling the multifaceted nature of scientific information.

    • Building test collections for reproducible BIR.

    • System support for literature search and recommendation.


  • Dataset development for bibliographic research


We especially invite descriptions of running projects and ongoing work, as well as contributions from industry. Papers that investigate multiple of these themes are particularly welcome.



Submission Details


All submissions must be written in English, follow the CEURART one-column paper style (6 to 12 pages, see below), and be submitted as PDF files via EasyChair. All submissions will be reviewed by at least two independent reviewers. Please note that at least one author per paper must register for and attend the workshop to present the work. In case of a no-show, the paper (even if accepted) will be removed from both the proceedings and the program.




Page limits

  • Full paper: 12 pages excluding references

  • Short paper: 6 pages excluding references


Workshop proceedings will be deposited online in the CEUR Workshop Proceedings publication service (ISSN 1613-0073); this way, the proceedings will be permanently available and citable (digital persistent identifiers and long-term preservation).



Aim of the Workshop


Searching for scientific information is a long-standing information need. In the early 1960s, Salton (1963) was already striving to enhance information retrieval by including clues inferred from bibliographic citations. The development of citation indexes pioneered by Garfield (1955) proved decisive for this research endeavour at the crossroads of the nascent fields of Information Retrieval (IR) and Bibliometrics [Bibliometrics refers to the statistical analysis of the academic literature (Pritchard, 1969) and plays a key role in scientometrics: the quantitative analysis of science and innovation (Leydesdorff & Milojević, 2015)]. The pioneers who established these fields in Information Science, such as Salton and Garfield, were followed by scientists who specialised in one field or the other (White & McCain, 1998), leading to the two loosely connected fields we know today.


The BIR workshop series, founded in 2014, aims to tighten the link between IR and Bibliometrics. We strive to bring together the ‘retrievalists’ and ‘citationists’ (White & McCain, 1998), active in both academia and industry, who develop search engines and recommender systems such as ArnetMiner, CiteSeerX, Google Scholar, Microsoft Academic Search, and Semantic Scholar, to name a few.


These bibliometric-enhanced IR systems must deal with the multifaceted nature of scientific information by searching for or recommending academic papers, patents, venues (i.e., conferences or journals), authors, experts (e.g., peer reviewers), references (to be cited to support an argument), and datasets. The underlying models harness relevance signals from keywords provided by authors, topics extracted from the full texts, co-authorship networks, citation networks, and various classification schemes of science.
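
To make this concrete, below is a minimal sketch, in plain Python, of one common pattern for combining such signals: linearly fusing a text relevance score with a log-damped citation count to re-rank candidates. The candidate list, scores, and fusion weight are invented for illustration; this is not the method of any particular system.

```python
import math

def fused_score(text_score: float, citations: int, alpha: float = 0.8) -> float:
    """Linearly fuse a text relevance score with a log-damped citation prior.

    The weight alpha is a hypothetical value for illustration; a real system
    would tune it on a test collection.
    """
    citation_prior = math.log1p(citations)  # damp heavy-tailed citation counts
    return alpha * text_score + (1 - alpha) * citation_prior

# Toy candidates: (doc_id, text relevance score, citation count), all invented.
candidates = [
    ("paper-a", 2.1, 5),
    ("paper-b", 1.9, 450),
    ("paper-c", 2.3, 0),
]

# Re-rank by the fused score, highest first.
for doc_id, text_score, cites in sorted(
        candidates, key=lambda c: fused_score(c[1], c[2]), reverse=True):
    print(doc_id, round(fused_score(text_score, cites), 3))
```

In this toy example, the heavily cited "paper-b" overtakes the slightly more text-relevant "paper-c"; in practice, the citation term could be complemented by richer signals such as the co-authorship or citation-network features mentioned above.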

Bibliometric-enhanced IR is a hot topic whose recent developments made the news: see, for instance, the Initiative for Open Citations (Shotton, 2018) and Google Dataset Search (Castelvecchi, 2018), launched on September 4, 2018. We believe that BIR@ECIR is a much-needed scientific event for the ‘retrievalists’ and ‘citationists’ to meet and join forces in pushing the knowledge boundaries of IR applied to literature search and recommendation.



Castelvecchi, D.: Google unveils search engine for open data [News & Comment]. Nature (2018). doi:10.1038/d41586-018-06201-x


Garfield, E.: Citation indexes for science: A new dimension in documentation through association of ideas. Science 122(3159), 108–111 (1955). doi:10.1126/science.122.3159.108


Leydesdorff, L., Milojević, S.: Scientometrics. In: Wright, J.D. (ed.) International Encyclopedia of the Social & Behavioral Sciences, vol. 21, pp. 322–327. Elsevier, 2nd edn. (2015). doi:10.1016/b978-0-08-097086-8.85030-8


Pritchard, A.: Statistical bibliography or bibliometrics? [Documentation notes]. Journal of Documentation 25(4), 348–349 (1969). doi:10.1108/eb026482


Salton, G.: Associative document retrieval techniques using bibliographic information. Journal of the ACM 10(4), 440–457 (1963). doi:10.1145/321186.321188


Shotton, D.: Funders should mandate open citations. Nature 553(7687), 129 (2018). doi:10.1038/d41586-018-00104-7


White, H.D., McCain, K.W.: Visualizing a discipline: An author co-citation analysis of Information Science, 1972–1995. Journal of the American Society for Information Science 49(4), 327–355 (1998). doi:b57vc7


Program Committee

  • Muhammad Kamran Abbasi, University of Sindh, Pakistan

  • Karam Abdulahhad, GESIS - Leibniz Institute for the Social Sciences, Germany

  • Iana Atanassova, Université de Bourgogne Franche-Comté, France

  • Joeran Beel, Trinity College Dublin, Ireland

  • Patrice Bellot, Aix-Marseille Université - CNRS (LSIS), France

  • Marc Bertin, Université Claude Bernard Lyon 1, France

  • José Borbinha, Universidade de Lisboa, Portugal

  • Norbert Fuhr, University of Duisburg-Essen, Germany

  • Bela Gipp, UC Berkeley, USA, and University of Konstanz/Wuppertal, Germany

  • Nazli Goharian, Georgetown University, USA

  • Gilles Hubert, Université de Toulouse, France

  • Najko Jahn, State and University Library Göttingen, Germany

  • Kokil Jaidka, University of Pennsylvania, USA

  • Roman Kern, Know-Center GmbH, Austria

  • Marijn Koolen, Royal Netherlands Academy of Arts and Sciences, The Netherlands

  • Rob Koopman, OCLC Research, The Netherlands

  • Cyril Labbé, Univ. Grenoble Alpes, France

  • Anne Lauscher, University of Mannheim, Germany

  • Haiko Lietz, GESIS - Leibniz Institute for the Social Sciences, Germany

  • Haiming Liu, University of Bedfordshire, UK

  • Peter Mutschke, GESIS - Leibniz Institute for the Social Sciences, Germany

  • Rajesh Piryani, South Asian University, India

  • Jason Portenoy, University of Washington, USA

  • Philipp Schaer, TH Köln (University of Applied Sciences), Germany

  • Ralf Schenkel, Trier University, Germany

  • Vivek Kumar Singh, Banaras Hindu University, India

  • Cassidy Sugimoto, Indiana University Bloomington, USA

  • Lynda Tamine, IRIT, France

  • Mike Thelwall, University of Wolverhampton, UK

  • Ulrich Thiel, Fraunhofer IPA-PAMB, Germany

  • Nees Jan van Eck, Centre for Science and Technology Studies, Leiden University, The Netherlands

  • Cyril Verluise, Paris School of Economics, France

  • Dietmar Wolfram, University of Wisconsin-Milwaukee, USA


Organisers

Ingo Frommholz, University of Wolverhampton, UK

Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany

Guillaume Cabanac, University of Toulouse, France

Suzan Verberne, Leiden University, the Netherlands


Contact

For any enquiries please send an email to bir2021@easychair.org.