Lead Researcher at the Language Technologies Unit
Barcelona Supercomputing Center (BSC)
Universitat Pompeu Fabra (UPF)
I am an applied data scientist with over 14 years of experience in machine learning with an emphasis on deep learning for natural language processing.
I lead the Data team curating data for the creation of foundational multilingual language models.
My research is focused on multilingual concept extraction, detection of lexical collocations, corpus analysis, dialogue management, aspect-oriented sentiment analysis, and natural language generation.
Positions held
Since 2024
Barcelona Supercomputing Center, Language Technologies Unit, Barcelona
Head of Data for AI
Since 2018
Pompeu Fabra University, The Natural Language Processing Group, Barcelona
Lead Researcher (information extraction and natural language generation)
2016-2018
Pompeu Fabra University, Department of Economics and Business, Barcelona
Postdoctoral Researcher (NLP in economics of mass media)
2012-2016
Technologies for Systems Analysis LLC, Moscow
Applied Scientist (computational linguistics and information retrieval)
2011-2016
Russian Academy of Sciences, Computational Linguistics Lab, Moscow
Researcher, Ph.D. candidate, holding a national scholarship for excellent research
2010
Ulm University, Ulm, Germany
Visiting Researcher
2009-2011
Siberian Federal University (SFU), Krasnoyarsk, Russia
Junior Researcher
Education
2015
Ph.D., Computer Science, Russian Academy of Sciences, Moscow
2011
M.S., Applied Mathematics and Computer Science, cum laude, SFU, Krasnoyarsk
Technical skills
Programming Languages: Python, C++
Tools: Transformers, LangChain, Torch, TensorFlow, OpenNMT, Turi Create, NetworkX, Hive, Redash, HPC
Academic record
Author of 50 publications, h-index 9, citations 250, conference ranks A, B
Participant in 20 R&D projects including dialogue agents, opinion-driven design, disaster management
Reviewer at top-tier conferences and journals in CL and AI (ACL, CoNLL, COLING, AAAI, IJCAI, EMNLP, Computational Intelligence)
Participant in writing research proposals and dissemination of results at conferences and invited seminars
Co-organizer of shared tasks (CheckThat! Lab on Checkworthiness, Subjectivity, Persuasion, Roles, Authorities and Adversarial Robustness at CLEF 2024)
Awards
Runner-up team in feedback comment generation for writing learning, RIKEN, Tokyo (2022)
Winner team at Flow.ai hackathon in language generation within INLG conference, Tilburg (2018)
Runner-up at a €15K public competition in AI poem generation held by Sberbank of Russia (2018)
Selected papers
Shvets A, Wanner L.
Mathematics, 10(20), 3831, 1-21. (2022)
Mathematics – one of the leading journals in mathematics and computer science
Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers
Espinosa-Anke L, Shvets A, Mohammadshahi A, Henderson J, Wanner L.
The 11th Joint Conference on Lexical and Computational Semantics (*SEM 2022), 89-100. (2022)
*SEM – one of the leading conferences in the semantics of natural language and its computational modelling
Targets and Aspects in Social Media Hate Speech
Shvets A, Fortuna P, Soler J, Wanner L.
The 5th Workshop on Online Abuse and Harms (WOAH 2021), the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), 179-190. (2021)
WOAH – leading workshop in computational methods for detecting and modelling online abuse
Concept Extraction Using Pointer–Generator Networks and Distant Supervision for Data Augmentation
Shvets A, Wanner L.
International Conference on Knowledge Engineering and Knowledge Management (EKAW 2020), 120-135. Springer, Cham. (2020)
EKAW – one of the leading conferences in knowledge engineering and knowledge management
Automatic Related Work Section Generation: Experiments in Scientific Document Abstracting
AbuRa’ed A, Saggion H, Shvets A, Bravo À.
Scientometrics, 125(3), 3159-3185. (2020)
Scientometrics – leading international journal for quantitative aspects of scientific research, communication in science, and science policy
Sentence Packaging in Text Generation from Semantic Graphs as a Community Detection Problem
Shvets A, Mille S, Wanner L.
The 11th International Conference on Natural Language Generation (INLG 2018), 350-359. (2018)
INLG – one of the leading conferences in the area of natural language generation
Other publications are listed here.
Funded research projects
1) Discourse Planning in Natural Language Text Generation (Russian Foundation for Basic Research, "Starting Grant", € 20.000, 2018-2020). Role: Principal Investigator, research within a team of 3 members.
Project officer's conclusion:
The results obtained are of significant practical importance – they can be applied and will provide an increase in the accuracy and coherence of the generated texts in a wide range of applied tasks: question-answer systems, text annotation, machine translation, etc. The project executors are well aware of the current state of affairs in this area, use the most advanced methods and approaches.
All research was carried out in accordance with the standards accepted in computational linguistics. The report is very detailed, it surpasses the reports of the projects of more "senior" competitions in quality.
2) ReSilence: Retune the Soundscape of future cities through art and science collaboration (HORIZON-RIA, € 2.826.500, 2022-2025). Role: UPF team coordinator, research and development in emotion-oriented entity extraction and multilingual personalized report generation.
3) MindSpaces: art-driven adaptive outdoors and indoors design (H2020, € 4.182.625, 2019-2022). Role: UPF team coordinator, research and development in concept extraction, concept relation detection, and sentiment analysis for modelling the language of art and architecture and populating multimodal knowledge bases.
4) WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals in the EU (H2020, € 4.272.870, 2020-2023). Role: coordination, research, and development in dialogue planning, slot filling, and learning of clarification dialogue strategies.
5) V4Design: visual and textual content re-purposing FOR architecture, Design and video virtual reality games (H2020, € 3.937.850, 2018-2021). Role: research and development in multilingual information extraction for ontology population and analysis of opinions in social media reviews.
6) xR4DRAMA: Extended Reality For DisasteR management And Media planning (H2020, € 2.318.500, 2020-2022). Role: research and development in geolocation identification in transcribed speech during disaster events and extraction of key concepts for report generation.
7) beAWARE: Enhancing decision support and management services in extreme weather climate events (H2020, € 6.725.209, 2017-2019). Role: research and development in geolocation identification in crisis-related tweets.
8) CONNEXIONs: InterCONnected NEXt-Generation Immersive IoT Platform of Crime and Terrorism DetectiON, PredictiON, InvestigatiON, and PreventiON Services (H2020, € 4.999.390, 2018-2022). Role: research and development in detecting concepts for modelling the profiles of the authors in specific multilingual forums.
9) TENSOR: Retrieval and Analysis of Heterogeneous Online Content for Terrorist Activity Recognition (H2020, € 5.579.894, 2016-2019). Role: research and development in detecting concepts in terrorist-generated multilingual textual content for modelling the author profiles.
10) Social Media, Political Participation, and Accountability (H2020, ERC Starting Grant Project, € 1.170.625, 2016-2021). Role: research and development in multimodal social media data analysis, graph clustering, topic modelling, and text classification for comparative analysis of online and offline behaviour of large communities.
Support in PhD supervision
1) Tomara Gotkova (Université de Lorraine): “The Lexicon of the Environment and Green Chemistry in Ordinary Discourse. Using Social Networks as Corpora”
Contribution: Assistance in the semi-automatic compilation of a lexicon using a concept extraction model
2) Ahmed AbuRa’ed (UPF): “Automatic Generation of Descriptive Related Work Reports”
Contribution: Transferring knowledge about pointer-generator neural networks for text generation and assistance with the creation of the dataset
3) Paula Fortuna (UPF): “Re-thinking Large Scale Hate Speech Identification: Beyond Common NLP Conventions and Supervised Machine Learning”
Contribution: Assistance with designing deep learning models for information extraction and clustering, help with computational experiments
Teaching
UPF, 2023-2024, Erasmus Mundus joint Master in Artificial Intelligence (EMAI), "Natural Language Interaction", Lecturer
UPF, 2023-2024, master's program in Intelligent Interactive Systems (IIS), "Natural Language Interaction", Lecturer
UPF, 2022-2023, master's program in Intelligent Interactive Systems (IIS), "Natural Language Interaction", Lecturer
UPF, 2022-2023, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant
UPF, 2021-2022, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant
UPF, 2020-2021, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant
UPF, 2019-2020, undergraduate course, "Introduction to Natural Language Processing Techniques for Everyday Applications", Teaching Assistant
UPF, 2018-2019, undergraduate course, "Natural Language Processing", Teaching Assistant
Contacts
Universitat Pompeu Fabra
Tànger, 122-140
08018 Barcelona, Spain
e-mail: Alexander.Shvets (at) upf.edu
Phone: (+34) 935 42 2569