Lead Researcher at the Language Technologies Unit

Barcelona Supercomputing Center (BSC)

Universitat Pompeu Fabra (UPF)

My Google Scholar page

My ORCID page

I am an applied data scientist with over 14 years of experience in machine learning with an emphasis on deep learning for natural language processing.

I lead the Data team curating data for the creation of foundational multilingual language models.

My research is focused on multilingual concept extraction, detection of lexical collocations, corpus analysis, dialogue management, aspect-oriented sentiment analysis, and natural language generation.

Positions held

Since 2024

Barcelona Supercomputing Center, Language Technologies Unit, Barcelona

Head of Data for AI

Since 2018

Pompeu Fabra University, The Natural Language Processing Group, Barcelona

Lead Researcher (information extraction and natural language generation)

2016-2018

Pompeu Fabra University, Department of Economics and Business, Barcelona

Postdoctoral Researcher (NLP in economics of mass media)

2012-2016

Technologies for Systems Analysis LLC, Moscow

Applied Scientist (computational linguistics and information retrieval)

2011-2016

Russian Academy of Sciences, Computational Linguistics Lab, Moscow

Researcher, Ph.D. candidate, holding a national scholarship for excellent research

2010

Ulm University, Ulm, Germany

Visiting Researcher

2009-2011

Siberian Federal University (SFU), Krasnoyarsk, Russia

Junior Researcher

Education

2015

Ph.D., Computer Science, Russian Academy of Sciences, Moscow

2011

M.S., Applied Mathematics and Computer Science, cum laude, SFU, Krasnoyarsk

Technical skills

Programming Languages: Python, C++

Tools: Transformers, LangChain, Torch, TensorFlow, OpenNMT, Turi Create, NetworkX, Hive, Redash, HPC

Academic record

Author of 50 publications, h-index 9, citations 250, conference ranks A, B

Participant in 20 R&D projects including dialogue agents, opinion-driven design, disaster management

Reviewer at top-tier conferences and journals in CL and AI (ACL, CoNLL, COLING, AAAI, IJCAI, EMNLP, Computational Intelligence)

Participant in writing research proposals and dissemination of results at conferences and invited seminars

Co-organizer of shared tasks (CheckThat! Lab on Checkworthiness, Subjectivity, Persuasion, Roles, Authorities and Adversarial Robustness at CLEF 2024)

Awards

Runner-up team in feedback comment generation for writing learning, RIKEN, Tokyo (2022)

Winning team at the Flow.ai hackathon on language generation at the INLG conference, Tilburg (2018)

Runner-up at a €15K public competition in AI poem generation held by Sberbank of Russia (2018)

Selected papers

The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora

Shvets A, Wanner L.

Mathematics, 10(20), 3831, 1-21. (2022)

Mathematics – one of the leading journals in mathematics and computer science

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers

Espinosa-Anke L, Shvets A, Mohammadshahi A, Henderson J, Wanner L.

The 11th Joint Conference on Lexical and Computational Semantics (*SEM 2022), 89-100. (2022)

*SEM – one of the leading conferences in the semantics of natural language and its computational modelling

Targets and Aspects in Social Media Hate Speech

Shvets A, Fortuna P, Soler J, Wanner L.

The 5th Workshop on Online Abuse and Harms (WOAH 2021), the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), 179-190. (2021)

WOAH – leading workshop in computational methods for detecting and modelling online abuse

Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation

Shvets A, Wanner L.

International Conference on Knowledge Engineering and Knowledge Management (EKAW 2020), 120-135. Springer, Cham. (2020)

EKAW – one of the leading conferences in knowledge engineering and knowledge management

Automatic Related Work Section Generation: Experiments in Scientific Document Abstracting

AbuRa’ed A, Saggion H, Shvets A, Bravo À.

Scientometrics, 125(3), 3159-3185. (2020)

Scientometrics – leading international journal for quantitative aspects of scientific research, communication in science, and science policy

Sentence Packaging in Text Generation from Semantic Graphs as a Community Detection Problem

Shvets A, Mille S, Wanner L.

The 11th International Conference on Natural Language Generation (INLG 2018), 350-359. (2018)

INLG – one of the leading conferences in the area of natural language generation

Other publications are listed here.

Funded research projects

1) Discourse Planning in Natural Language Text Generation (Russian Foundation for Basic Research, "Starting Grant", € 20.000, 2018-2020). Role: Principal Investigator, research within a team of 3 members.

Project officer's conclusion:

The results obtained are of significant practical importance; they can be applied and will increase the accuracy and coherence of the generated texts in a wide range of applied tasks, such as question answering systems, text annotation, and machine translation. The project executors are well aware of the current state of affairs in this area and use the most advanced methods and approaches.

All research was carried out in accordance with the standards accepted in computational linguistics. The report is very detailed; in quality, it surpasses the reports of projects in more "senior" competitions.

2) ReSilence: Retune the Soundscape of future cities through art and science collaboration (HORIZON-RIA, € 2.826.500, 2022-2025). Role: UPF team coordinator, research and development in emotion-oriented entity extraction and multilingual personalized report generation.

3) MindSpaces: art-driven adaptive outdoors and indoors design (H2020, € 4.182.625, 2019-2022). Role: UPF team coordinator, research and development in concept extraction, concept relation detection, and sentiment analysis for modelling the language of art and architecture and populating multimodal knowledge bases.

4) WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals in the EU (H2020, € 4.272.870, 2020-2023). Role: coordination, research, and development in dialogue planning, slot filling, and learning of clarification dialogue strategies.

5) V4Design: visual and textual content re-purposing FOR architecture, Design and video virtual reality games (H2020, € 3.937.850, 2018-2021). Role: research and development in multilingual information extraction for ontology population and analysis of opinions in social media reviews.

6) xR4DRAMA: Extended Reality For DisasteR management And Media planning (H2020, € 2.318.500, 2020-2022). Role: research and development in geolocation identification in transcribed speech during disaster events and extraction of key concepts for report generation.

7) beAWARE: Enhancing decision support and management services in extreme weather climate events (H2020, € 6.725.209, 2017-2019). Role: research and development in geolocation identification in crisis-related tweets.

8) CONNEXIONs: InterCONnected NEXt-Generation Immersive IoT Platform of Crime and Terrorism DetectiON, PredictiON, InvestigatiON, and PreventiON Services (H2020, € 4.999.390, 2018-2022). Role: research and development in detecting concepts for modelling the profiles of the authors in specific multilingual forums.

9) TENSOR: Retrieval and Analysis of Heterogeneous Online Content for Terrorist Activity Recognition (H2020, € 5.579.894, 2016-2019). Role: research and development in detecting concepts in terrorist-generated multilingual textual content for modelling the author profiles.

10) Social Media, Political Participation, and Accountability (H2020, ERC Starting Grant Project, € 1.170.625, 2016-2021). Role: research and development in multimodal social media data analysis, graph clustering, topic modelling, and text classification for comparative analysis of online and offline behaviour of large communities.

Support in PhD supervision

1) Tomara Gotkova (Université de Lorraine): “The Lexicon of the Environment and Green Chemistry in Ordinary Discourse. Using Social Networks as Corpora”

Contribution: Assistance with the semi-automatic compilation of a lexicon using a concept extraction model

2) Ahmed AbuRa’ed (UPF): “Automatic Generation of Descriptive Related Work Reports”

Contribution: Transferring knowledge about pointer-generator neural networks for text generation and assistance with the creation of the dataset

3) Paula Fortuna (UPF): “Re-thinking Large Scale Hate Speech Identification: Beyond Common NLP Conventions and Supervised Machine Learning”

Contribution: Assistance with designing deep learning models for information extraction and clustering, help with computational experiments

Teaching

UPF, 2023-2024, Erasmus Mundus joint Master in Artificial Intelligence (EMAI), "Natural Language Interaction", Lecturer

UPF, 2023-2024, master's program in Intelligent Interactive Systems (IIS), "Natural Language Interaction", Lecturer

UPF, 2022-2023, master's program in Intelligent Interactive Systems (IIS), "Natural Language Interaction", Lecturer

UPF, 2022-2023, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant

UPF, 2021-2022, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant

UPF, 2020-2021, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant

UPF, 2019-2020, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant

UPF, 2018-2019, undergraduate course, "Natural Language Processing", Teaching Assistant

Contacts

Universitat Pompeu Fabra

Tànger, 122-140

08018 Barcelona, Spain

e-mail: Alexander.Shvets (at) upf.edu

Phone: (+34) 935 42 2569