Publications
Selected Papers
(A complete list of papers is available on the Google Scholar profile)
Multimodal early fusion operators for temporal video scene segmentation tasks
@ Multimedia Tools and Applications journal (2023)
Antonio A. R. Beserra, Rudinei Goularte
The Temporal Video Scene Segmentation (TVSS) task is still an open problem that presents challenges in the Multimedia Analysis area. Current approaches employ multimodality, fusing features from different video data modalities to improve segmentation accuracy. Generally, the progress reported in the literature improves accuracy at the cost of increasing the complexity of the fusion process. In this paper, we propose the application of early multimodal fusion operators in the mid-level feature space. We show that, at the mid-level, the operators’ accuracy presents no statistically significant difference compared with complex techniques. The results lead to two contributions. First, the operators make fusion a separable step in the TVSS pipeline, easing technique development. Second, we show that, in the mid-level feature space, TVSS researchers can direct their efforts to pipeline steps other than fusion, since simple operators provide results statistically similar to those of more complex fusion techniques.
On the Use of Early Fusion Operators on Heterogeneous Graph Neural Networks for One-Class Learning
@ WebMedia'23 (Brazilian Symposium on Multimedia and Web - 2023)
Marcos P. S. Gôlo, Marcelo I. de Moraes, Rudinei Goularte, Ricardo M. Marcacini
Multimodal data fusion generates robust and unified representations considering supplementary and complementary information from different modalities, such as audio, image, and text. However, existing studies do not investigate multimodal fusion operators for heterogeneous graphs, which are powerful representations for modeling real-world data through a structure that considers the different relations between different node types. Those representations are suited for important multimedia-related tasks, such as classification, recommendation, summarization, web sensing, and content-based retrieval. This work presents a graph neural network (GNN) method for heterogeneous graphs that explores different types of early fusion operators to deal with multiple modalities. Moreover, we evaluated the proposal’s performance with different early fusion operators considering one-class learning, a popular learning approach for real-world applications. A statistical analysis of the experimental results shows that early fusion operators improve the F1-Score when considering GNNs for heterogeneous graphs. Thus, we argue that our proposal of early fusion operators in heterogeneous graph neural networks leads to improved performance and is also a competitive alternative to the widely used concatenation technique or to costly hand-crafted approaches for combining different modalities.
Video Summarization using Text Subjectivity Classification
@ WebMedia'22 (Brazilian Symposium on Multimedia and Web - 2022)
Leonardo G. Moraes, Ricardo M. Marcacini, Rudinei Goularte
Video summarization has attracted researchers’ attention because it provides a compact and informative version of a video, helping users and systems save effort in searching for and understanding content of interest. The presence or absence of subjectivity can be explored as a relevance clue, helping to bring video summaries closer to the final user’s expectations. However, despite this potential, there is a gap in how to capture subjectivity information from videos. This paper investigates video summarization through subjectivity classification of video transcripts. We propose a multilingual machine learning model trained to deal with subjectivity classification in multiple domains. Such a model can be used to provide subjectivity as a content selection criterion in the video summarization task, filtering out segments that are not relevant to a video domain of interest.
SIRA - An efficient method for retrieving stereo images from anaglyphs
@ Signal Processing: Image Communication journal (2020)
Lucas F. Kunze, Rudinei Goularte, Elaine P. M. de Sousa
Anaglyph reversion aims to recover the best possible approximation of a stereo pair of images from an anaglyph. Possible applications include a range of practical situations, such as enabling visualization of legacy anaglyphs on the Web, saving storage and transmission bandwidth by encoding stereo pairs as anaglyphs before stereo visualization, or enabling users to enjoy stereo visualization using any available device. The recovery process faces a challenging issue: anaglyphic stereo matching. Unlike in regular stereo images, corresponding pixels in the left and right views of an anaglyph have dissimilar intensity values, which lowers photometric consistency and makes the usual stereo matching algorithms unsuitable. In this work we propose SIRA, an efficient method for anaglyph reversion, introducing a novel approach to finding stereo correspondences based on a pixel descriptor developed to deal with anaglyphic photometric differences. The descriptor's core idea is to model stereo pairs as time series, extracted from both views of an anaglyph. The results show that SIRA achieves image quality equivalent to the state of the art while consuming, on average, 26 times fewer computational resources.
Content selection criteria for news multi-video summarization based on human strategies
@ International Journal on Digital Libraries (2020)
Tamires T. S. Barbieri, Rudinei Goularte
In recent years, the volume of multimedia data produced and available for access, notably video content, has increased continuously and quickly. This context has also aggravated the information overload problem: finding content of interest among the huge number of available options. Efficient schemes for content access are therefore needed. Automatic video summarization is a research field that deals with this problem. Furthermore, current multimedia systems make available several videos related to the same topic, each one carrying a piece of unique complementary information. This fact highlights the need for multi-video summarization to serve users who want to be informed about a subject from a set of videos without being obliged to watch the whole set. However, an analysis of the literature shows that human strategies are not considered when defining the criteria used to automatically select the video segments that will compose a summary, and that the focus of existing techniques has been the identification of common information in different videos. In this work, we investigate human strategies for news multi-video summarization. The results of the study with real users uncover relevant criteria for developing summaries, with the potential to increase their semantics and bring them closer to users’ perception.