Master's Degree Thesis
Computer Vision for Comics Understanding
Comics are an art form and a medium of extraordinary cultural richness, capable of documenting social, stylistic, and narrative trends across different eras. Despite their global reach and cultural value, the automatic analysis of comic content remains an open and challenging problem for the computer vision community. Unlike photographs or text documents, comics are an inherently multimodal medium in which text, drawing, color, and narrative structure combine according to complex visual conventions (panels, speech bubbles, sound effects, action sequences) whose interpretation requires a joint understanding of graphical and semantic elements. Tasks such as panel detection and segmentation, character recognition, text localization and recognition in speech bubbles, and narrative sequence comprehension pose challenges that models developed for natural scenes struggle to address effectively. This thesis aims to study and develop deep learning and computer vision models for one or more aspects of comic understanding, exploring modern architectures such as convolutional networks, vision transformers, and multimodal models, in order to contribute to the automatic understanding of this unique medium.
Computer Vision for Document Image Restoration
The preservation and promotion of historical documentary heritage is a challenge of paramount cultural and scientific importance. Manuscripts, notarial archives, antique printed texts, and administrative documents are irreplaceable testimonies of collective memory, but they are often affected by severe forms of deterioration, such as yellowed paper, faded ink, stains, tears, and bleed-through, which compromise their legibility and hinder their digital use. Manual restoration of these materials is slow, costly, and difficult to scale, while classical image processing techniques prove inadequate given the heterogeneity and complexity of real-world degradation. This thesis aims to address this problem through the study and development of deep learning models for tasks such as denoising, binarization, super-resolution, and inpainting of document images, with a particular focus on architectures based on convolutional networks and transformers.
Deep Learning Models for Assisted Diagnosis in Ophthalmology
Ophthalmic diseases are among the leading causes of visual impairment and blindness worldwide. Early diagnosis is crucial for effective treatment, but it requires a high level of specialized expertise and the analysis of complex clinical images, which makes computer-aided diagnosis a research field of great clinical and technological significance. Unlike other computer vision application domains, ophthalmic images present unique challenges: small but clinically significant anatomical structures, inter- and intra-patient variability, acquisition artifacts, and a marked imbalance between pathological cases and healthy subjects in available datasets. Deep learning models developed for natural images do not transfer directly to this context and require specific adaptation strategies. This thesis aims to study and develop deep learning and computer vision models for the assisted diagnosis of one or more ophthalmological conditions, addressing tasks such as disease severity classification, segmentation of retinal structures and lesions, and anomaly detection in optical coherence tomography (OCT) images. Modern architectures will be explored, including deep convolutional networks, vision transformers, and foundation models adapted to the medical domain, with a focus on explainability techniques that make predictions interpretable and clinically useful. The ultimate goal is to contribute to the development of reliable decision-support tools that can be integrated into real-world clinical workflows.