Projects

Current projects

TRADISAN - Against Disinformation in Health Care through fake news detection on social media

The project aims to tackle health care misinformation, particularly among young people, who are more exposed to fake news through social media and tend to believe it, as shown in the experiment by Greškovičová et al. (2022). Using NLP techniques and a purpose-built, domain-specific multimodal dataset for Italian, TRADISAN aims to develop an automatic fine-grained classification system for health and medical news in Italian, in order to offer users an assessment of the reliability of information on social media.

Efficient access to the constantly growing quantities of data, especially of language data, largely relies on advances in data science. This domain includes natural language processing (NLP), which is currently booming, to the benefit of many end users. However, this optimization-based technological progress poses an important challenge: accounting for and fostering language diversity. UniDive builds upon previous experience of European networks and projects which provided a proof of concept for language modelling and processing, unified across many languages but preserving their diversity. The main benefits of the Action will include, on the theoretical side, a better understanding of language universals, and on the practical side, language resources and tools covering, in a unified framework, a bigger variety of language phenomena in a large number of languages, including low-resourced and endangered ones.

SMACH - Semantic Multilingual Access to Cultural Heritage (Rappresentazione semantica cross-linguistica per applicazione di accesso multilingue ad archivi digitali dei patrimoni museali)

SMACH has been funded under PON Ricerca e Innovazione 2014-2020, Action I.2, D.D. n. 407 of 27 February 2018, "Attraction and International Mobility" (MIUR code: AIM- 1_1825887-1). The project, which is part of the National Smart Specialisation Strategy 2014-2020 (SNSI), Area 6, aims to encourage the development of an innovative, high-added-value service for cultural heritage, namely multilingual access to cultural heritage, thus expanding the potential audience of visitors.

Crowd for the Environment: Monitoring illegal dumping through the synergistic use of advanced technologies and spontaneous citizen reports - PON Ricerca e Innovazione 2014-2020 - Smart, Secure and Inclusive Communities

Innovative technologies for processing heterogeneous and incomplete information sources and integrating them into the monitoring of environmental problems of anthropogenic origin, with particular reference to illegal waste dumping. Partners: Analist Group s.r.l., CIRA S.c.p.A., Università degli Studi di Cassino e del Lazio Meridionale, Expert Systems S.p.A., Major Bit Consulting s.r.l., AI Tech s.r.l., MapSat s.r.l.

COST Action CA16204 - Distant Reading for European Literary History 

This Action’s challenge is to create a vibrant and diverse network of researchers jointly developing the resources and methods necessary to change the way European literary history is written. Grounded in the Distant Reading paradigm (i.e. using computational methods of analysis for large collections of literary texts), the Action will create a shared theoretical and practical framework to enable innovative, sophisticated, data-driven, computational methods of literary text analysis across at least 10 European languages. Fostering insight into cross-national, large-scale patterns and evolutions across European literary traditions, the Action will facilitate the creation of a broader, more inclusive and better-grounded account of European literary history and cultural identity. 

COST Action CA18209 - Nexus Linguarum: European network for Web-centred linguistic data science

The main aim of this Action is to promote synergies across Europe between linguists, computer scientists, terminologists, and other stakeholders in industry and society, in order to investigate and extend the area of linguistic data science. Supporting the study of linguistic data science in the most efficient and productive way requires the construction, at Web scale, of a mature holistic ecosystem of multilingual and semantically interoperable linguistic data. Such an ecosystem, unavailable today, is needed to foster the systematic cross-lingual discovery, exploration, exploitation, extension, curation and quality control of linguistic data. The combination of linked data (LD) technologies, natural language processing (NLP) techniques and multilingual language resources (LRs), such as bilingual dictionaries, multilingual corpora and terminologies, is seen as having the potential to enable such an ecosystem, which will allow for transparent information flow across linguistic data sources in multiple languages by addressing the semantic interoperability problem.

COST Action CA16105 - enetCollect: European Network for Combining Language Learning with Crowdsourcing Techniques

EnetCollect aims at performing the groundwork to set into motion a Research and Innovation trend combining the well-established domain of Language Learning with recent and successful crowdsourcing approaches. EnetCollect aims at unlocking a crowdsourcing potential available for all languages and at triggering an innovation breakthrough for the production of language learning material, such as lesson or exercise content, and language-related datasets such as, among others, NLP language resources.

COST Action CA17124 - Digital forensics: evidence analysis via intelligent systems and practices

The challenge of the proposed COST Action is to create a network for exploring the potential of applying Artificial Intelligence and Automated Reasoning in the Digital Forensics field, and to create synergies between these fields. Specifically, the challenge is to address the Evidence Analysis phase, where evidence about possible crimes and their perpetrators, collected from various electronic devices (by means of specialized software, and according to specific regulations), must be exploited so as to reconstruct possible events, event sequences and scenarios related to a crime. Evidence Analysis results are then made available to law enforcement, investigators, public prosecutors, lawyers and judges: it is therefore crucial that the adopted techniques guarantee reliability and verifiability, and that their results can be explained to the human actors.

The main aim of the project is to bridge the gap between linguistic precision and computational efficiency in NLP applications by investigating the syntactic and semantic representation of multi-word expressions (MWEs) in language resources and the integration of MWE analysis in syntactic parsing and translation technology. Expected deliverables mainly include enhanced monolingual language resources (lexicons, grammars and annotated corpora) for Italian, or multilingual linguistic resources that include Italian. This project is a spin-off of the European IC1207 COST action, PARSEME.

Multiword Units in Machine Translation and Translation Technology

Multi-word units are word combinations which range from compounds such as ‘credit card’ to idiomatic expressions such as ‘it is raining cats and dogs’, and are acknowledged as one of the major challenges in natural language processing (NLP) because of their lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies.

In spite of the relative progress achieved in translation technology with the adoption of neural approaches, and in the processing of particular types of units such as verb-particle constructions, the identification, interpretation and translation of multi-word units in general still represent open challenges, both from a theoretical and a practical point of view. The idiosyncratic morpho-syntactic, semantic and translational properties of multi-word units pose many obstacles even to human translators, mainly because of intrinsic ambiguities, structural and lexical asymmetries between languages, and cultural differences.

In recent years, growing attention has been paid to integrating multi-word units (MWUs) in machine translation and translation technology tools, as it has been acknowledged that it is not possible to create large-scale language solutions without properly handling MWUs of all types. Researchers are now addressing the problems posed by MWU processing and translation using different formalisms and techniques, such as automatic recognition of MWUs in a monolingual or bilingual setting; alignment and paraphrasing techniques; development and use of (handcrafted) monolingual and bilingual language resources; creation of annotated monolingual and parallel corpora; development of strategies for handling syntactically flexible units in language analysis and translation modules; and development of evaluation projects. Partners: Research Group in Computational Linguistics (University of Wolverhampton), Grupo de investigación "Lexicografía y Traducción" (Universidad de Málaga)


Past projects

COST Action IC1207, PARSEME (PARSing and Multi-word Expressions) - Towards linguistic precision and computational efficiency in natural language processing

This Action aims at increasing and enhancing the support of the European multilingual heritage from Information and Communication Technologies (ICT). The Action focuses on the major bottleneck of these applications: Multi-Word Expressions (MWEs), i.e. sequences of words with unpredictable properties such as "to count somebody in" or "to take a haircut". A breakthrough in their modelling and processing can only result from a coordinated effort of multidisciplinary experts in different languages. Fourteen European languages will be addressed from a cross-theoretical and cross-methodological perspective, necessary for coping with current fragmentation issues. Expected deliverables include enhanced language resources and tools, as well as recommendations of best practices for cutting-edge MWE-aware language models. The Action will lead to a better understanding of the nature of MWEs. It will establish a long-lasting collaboration within a multilingual network of MWE specialists. It will pave the way towards competitive next generation text processing tools which will pay greater attention to language phenomena.