Data Mining &
Information Retrieval Lab.
Research Topics
a non-exhaustive list ...
Efficient and Effective Learning to Rank
Web search engines, and other ranking systems, use very complex models to estimate the relevance of a document w.r.t. a given user query. These models consist of thousands of regression trees, and their evaluation is computationally expensive. We aim to develop new Machine Learning (ML) algorithms for building high-quality models that are also efficient at scoring time.
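To see where the scoring cost comes from, here is a minimal sketch (hypothetical data structures, not the lab's code) of how an additive ensemble of regression trees scores a query-document feature vector: each tree is traversed root-to-leaf, and the leaf outputs are summed.

```python
def traverse(tree, x):
    """Walk one regression tree; each internal node tests a single feature."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feat"]] <= tree["thresh"] else tree["right"]
    return tree["leaf"]

def score(ensemble, x):
    """Relevance score = sum of the leaf predictions of every tree."""
    return sum(traverse(t, x) for t in ensemble)

# Two toy trees; production models contain thousands of them, so the
# per-document scoring cost grows linearly with the ensemble size.
t1 = {"feat": 0, "thresh": 0.5,
      "left": {"leaf": 0.1}, "right": {"leaf": 0.7}}
t2 = {"feat": 1, "thresh": 2.0,
      "left": {"leaf": -0.2}, "right": {"leaf": 0.4}}

print(score([t1, t2], [0.8, 3.0]))  # 0.7 + 0.4
```

Algorithms such as QuickScorer reorganize exactly this traversal into cache-friendly, branch-free bitwise operations over the whole ensemble.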
Major Publications:
Federico Marcuzzi, Claudio Lucchese, Salvatore Orlando. LambdaRank Gradients are Incoherent. CIKM 2023.
Federico Marcuzzi, Claudio Lucchese, Salvatore Orlando. Filtering out Outliers in Learning to Rank. ICTIR 2022.
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, Rossano Venturini: QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees. SIGIR 2015: 73-82. (Best Paper) (ACM Notable Article)
(Tutorial) Claudio Lucchese, Franco Maria Nardini, Rama Kumar Pasumarthi, Sebastian Bruch, Michael Bendersky, Xuanhui Wang, Harrie Oosterhuis, Rolf Jagerman, Maarten de Rijke: Learning to Rank in Theory and Practice: From Gradient Boosting to Neural Networks and Unbiased Learning. SIGIR 2019: 1419-1420.
Adversarial Machine Learning
Machine Learning (ML) is increasingly used across many applications and contexts. When ML is leveraged to ensure system security, as in spam filtering and intrusion detection, it is widely acknowledged that models must be trained to be resilient to adversarial manipulations. To date, research on adversarial ML has mostly focused on deep neural networks. Despite their effectiveness in so-called non-perceptual scenarios, decision tree ensembles have received only limited attention from the security and machine learning communities. We aim to fill this gap by investigating novel algorithms for robust ensemble learning, as well as novel solutions for evaluating model robustness.
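A toy illustration of the threat model (not the Treant or AMEBA algorithms themselves): a single decision stump flags an input as malicious when one feature exceeds a split threshold, and an evasion attacker with a small perturbation budget shifts that feature just below the threshold to flip the prediction. All names and numbers here are invented for the sketch.

```python
THRESH = 0.5  # split threshold of the toy decision stump

def stump_predict(x):
    """Return 1 (malicious) if feature 0 exceeds the threshold, else 0."""
    return 1 if x[0] > THRESH else 0

def evade(x, eps=0.1):
    """Adversary: move feature 0 just below the split, if budget eps allows."""
    delta = x[0] - THRESH
    if 0 < delta <= eps:
        y = list(x)
        y[0] = THRESH - 1e-6
        return y
    return x  # attack infeasible within budget

x = [0.55, 1.0]            # originally classified as malicious
x_adv = evade(x)
print(stump_predict(x), stump_predict(x_adv))  # prediction flips: 1 -> 0
```

Robust ensemble learning aims to place splits (and combine trees) so that no perturbation within the attacker's budget can cross a decision boundary this cheaply.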
Major Publications:
Stefano Calzavara, Claudio Lucchese, Federico Marcuzzi, Salvatore Orlando: Feature partitioning for robust tree ensembles and their certification in adversarial scenarios. EURASIP J. Inf. Secur. 2021(1): 12 (2021)
Stefano Calzavara, Lorenzo Cazzaro, Claudio Lucchese: AMEBA: An Adaptive Approach to the Black-Box Evasion of Machine Learning Models. AsiaCCS 2021: 292-306
Stefano Calzavara, Claudio Lucchese, Gabriele Tolomei, Seyum Assefa Abebe, Salvatore Orlando: Treant: training evasion-aware decision trees. Data Min. Knowl. Discov. 34(5): 1390-1420 (2020)
Explainable AI
EXplainable AI (XAI) research addresses the inescapable need for AI systems to be trustworthy, fair, and understandable. Among the most effective models, we focus on forests of decision trees such as Gradient Boosted Decision Trees (GBDTs). These are highly accurate in several application scenarios, but their large size (up to thousands of decision trees) makes them a black box that is impossible for a human to interpret. We aim to build models that are explainable and fair in both classification and ranking.
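One simple transparency tool for such forests (a generic sketch, not the ILMART method) is to count how often each feature is used in a split across the ensemble, a crude split-based importance proxy that a human can inspect directly. The tree representation below is a hypothetical nested-dict encoding.

```python
from collections import Counter

def split_counts(tree, counts):
    """Recursively count the feature tested at every internal node."""
    if "leaf" in tree:
        return
    counts[tree["feat"]] += 1
    split_counts(tree["left"], counts)
    split_counts(tree["right"], counts)

def feature_usage(ensemble):
    """Split-based importance: how many splits in the forest use each feature."""
    counts = Counter()
    for t in ensemble:
        split_counts(t, counts)
    return counts

toy = [{"feat": 0, "thresh": 0.5,
        "left": {"leaf": 0.0},
        "right": {"feat": 1, "thresh": 1.0,
                  "left": {"leaf": 0.3}, "right": {"leaf": 0.8}}},
       {"feat": 0, "thresh": 0.2,
        "left": {"leaf": -0.1}, "right": {"leaf": 0.2}}]

print(feature_usage(toy))  # feature 0 used in two splits, feature 1 in one
```

Constraining which features a model may split on, and how they may interact, is one route from such post-hoc summaries toward models that are interpretable by construction.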
Major Publications:
Claudio Lucchese, Giorgia Minello, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Alberto Veneri. Can Embeddings Analysis Explain Large Language Model Ranking? CIKM 2023.
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Alberto Veneri. ILMART: Interpretable Ranking with Constrained LambdaMART. ACM SIGIR 2022.
Seyum Assefa Abebe, Claudio Lucchese, Salvatore Orlando: EiFFFeL: Enforcing Fairness in Forests by Flipping Leaves. ACM SAC 2021.