Resources and software

This is the GitHub page of the UNIOR NLP Research Group: here you will find all the linguistic resources and software developed by the group.

EU Datathon - Final event

Maggie wins the third prize at EUDATATHON'22

After two rounds of pre-selection among 156 proposals from 38 countries, the UNIOR NLP Research Group of the University of Naples L'Orientale took third place in Challenge 4, "A Europe fit for the digital age", at EUDATATHON'22. This prestigious annual competition on the reuse of open data is organized by the Publications Office of the European Union and took place in Brussels on October 20, 2022.

Taking full advantage of the potential offered by open data and applying natural language processing techniques, the Group proposed "Across Europe with MAGGIE", a virtual assistant that guides users in discovering the beauties of European cultural heritage, offering personalized recommendations based on user preferences, real-time chat and geolocation services.

For more information, an interview with the research group is available on:

Here you can find the presentation of "Across Europe with MAGGIE"


UNIOR4NLP is a system that took part in the NLP4FUN task of the Evalita 2018 evaluation campaign (Basile et al., 2018). The goal of this task is to design a solver for “La Ghigliottina”, the final game of the popular Italian TV quiz show “L’Eredità”. The game involves a single player, who is given a set of five words (clues), each linked to an unknown sixth word that is the solution to the game. For example, given the set of clues [fighting, gun, roof, eater, set], the solution is fire, because “the roof is on fire” is the title of a famous song, while “fire fighting”, “fire a gun”, “fire-eater”, and “set something on fire” are fixed word constructions. UNIOR4NLP relies on the assumption that Multiword Expressions (MWEs) play an important role in solving the game: given a set of clues, the system outputs the solution word that forms the strongest connections with all of the clues. UNIOR4NLP is available on Twitter (@UNIOR4NLP) and on Telegram (
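The idea of picking the word with the strongest connections to all clues can be illustrated with a minimal sketch. This is not the UNIOR4NLP implementation: the `ASSOCIATIONS` table, its weights, and the `solve` function are hypothetical, and a real system would presumably derive association strengths from MWE lexicons or corpus statistics.

```python
from collections import defaultdict

# Invented (candidate, clue) -> association strength pairs, for illustration only.
ASSOCIATIONS = {
    ("fire", "fighting"): 0.9,
    ("fire", "gun"): 0.7,
    ("fire", "roof"): 0.6,
    ("fire", "eater"): 0.8,
    ("fire", "set"): 0.7,
    ("machine", "gun"): 0.95,
    ("tv", "set"): 0.8,
}


def solve(clues, associations):
    """Return the candidate linked to the most clues, breaking ties by total strength."""
    clue_set = set(clues)
    strength = defaultdict(float)  # summed association weight per candidate
    covered = defaultdict(int)     # number of clues each candidate connects to
    for (candidate, clue), weight in associations.items():
        if clue in clue_set:
            strength[candidate] += weight
            covered[candidate] += 1
    if not strength:
        return None  # no candidate is associated with any clue
    return max(strength, key=lambda w: (covered[w], strength[w]))


print(solve(["fighting", "gun", "roof", "eater", "set"], ASSOCIATIONS))
```

With this toy table, "machine" has the single strongest link (to "gun"), but "fire" is chosen because it connects to all five clues. Ranking by clue coverage first is one possible design choice, assumed here for simplicity.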

PlagioBot is a multi-player game to promote creative writing: participants have to come up with a plausible continuation of the beginning of a sentence taken from a book, and then guess the original continuation among those suggested.

The game has been extended to fit a classroom setting where the teacher enters the beginning of a sentence and the students propose plausible continuations. In addition to the “continuation” exercise, the teacher can use the same game paradigm to run “fill-the-gap” and “translation” exercises.

EmojitalianoBot is an open tool to build an Italian emoji dictionary. Search for translations from and into emoji.

The bot currently features:

  • inline queries: Type @emojitalianobot and a word in any Telegram conversation, and it will suggest a set of emojis you can send with that word.

  • The Emoji Column: A game of memory and intuition. Type /start in a conversation with @emojitalianobot and click on Gioca!

@emojitalianobot is a project by: Francesca Chiusaroli, Johanna Monti, Federico Sangati

Please help us spread the word about this bot by inviting friends and voting for it on Telegramitalia, here!

@EmojiWorldBot is a multilingual dictionary that uses Emoji as a pivot for contributors among dozens of diverse languages.

The bot currently features:

  • emoji-to-word and word-to-emoji conversion for more than 70 languages imported from the Unicode tables (see

  • a tagging game for people to contribute to the expansion of these dictionaries or the creation of new ones for any additional language.

  • inline queries: type @EmojiWorldBot and an emoji tag in any Telegram conversation, and it will suggest a set of emojis you can send with that tag.
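Using emoji as a pivot between languages can be sketched as two lookups: word to emoji in the source language, then emoji to words in the target language. The snippet below is only an illustration under stated assumptions: the `EMOJI_TAGS` table is invented sample data, and the function names are hypothetical rather than part of the bot.

```python
# Invented sample tables: language code -> emoji -> list of tags in that language.
EMOJI_TAGS = {
    "en": {"🐶": ["dog", "puppy"], "🍎": ["apple"]},
    "it": {"🐶": ["cane"], "🍎": ["mela"]},
}


def word_to_emoji(word, lang, tables):
    """Return every emoji whose tag list in `lang` contains `word` (case-insensitive)."""
    needle = word.casefold()
    return [
        emoji
        for emoji, tags in tables.get(lang, {}).items()
        if needle in (tag.casefold() for tag in tags)
    ]


def emoji_to_words(emoji, lang, tables):
    """Return the tags recorded for `emoji` in `lang`, or an empty list."""
    return list(tables.get(lang, {}).get(emoji, []))


def translate_via_emoji(word, source, target, tables):
    """Translate `word` by pivoting through emoji shared by both languages."""
    results = []
    for emoji in word_to_emoji(word, source, tables):
        for tag in emoji_to_words(emoji, target, tables):
            if tag not in results:  # keep first-seen order, drop duplicates
                results.append(tag)
    return results


print(translate_via_emoji("puppy", "en", "it", EMOJI_TAGS))
```

Because every language is linked only to the shared emoji set, adding a new language in this sketch means adding one table instead of one dictionary per language pair.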

Please help us expand your language with new tags by playing the tagging game and inviting new friends to use the bot.

Rate and Review @EmojiWorldBot on Telegram Bot Store.

@EmojiWorldBot is a free public service produced by Francesca Chiusaroli, Johanna Monti, Federico Sangati, Martin Benjamin and Sina Mansour at Kamusi Project International and EPFL (Switzerland).

The PILLAR (Parallel ItaLian engLish ARchaeological) Corpus¹ is an Italian-English parallel corpus composed of bi-texts from the archaeological domain, drawn from the brochures, leaflets, guides and websites of museums and archaeological sites.

The texts were crawled from the web and date from 2006 to 2020.

It consists of 200k tokens in Italian and 200k tokens in English.

Click here to access the PILLAR Corpus

¹ It was collected within the scope of Giulia Speranza's PhD thesis, "From Unstructured Data to Terminological Resources in the Domain of Archaeology: Translation Quality and Formal Representation" (submitted in 2022), University of Naples "L'Orientale", UNIOR NLP Research Group.

Idiomatica: a dictionary app of Italian idiomatic expressions for foreign learners

Several scientific studies have underlined both the complexity of idiomatic expressions and their centrality in spontaneous speech. To support users in various communication situations, such as language classes or daily interaction, Idiomatica, a prototype dictionary of Italian idiomatic expressions for non-native speakers, has been developed for smartphones.

Idiomatica combines descriptive richness with a simple but innovative layout. Instead of the 'scrollable' formats commonly used on smartphones, the user is offered a 'clickable' interaction. If you want to try the mock-up: or use the QR Code