Selected Publications

Disclaimer: This site contains PDF files of articles that are covered by copyright. You may browse the articles at your convenience in the same spirit as you may read a journal or conference proceedings in a public library. Retrieving, copying, or distributing these files, in whole or in part, may violate copyright protection laws.

2023

Supporting Decision-Making Process on Higher Education Dropout by Analyzing Academic, Socioeconomic, and Equity Factors through Machine Learning and Survival Analysis Methods in the Latin American Context

D. A. Gutierrez-Pachas, G. García-Zanabria, E. Cuadros-Vargas, G. Cámara-Chávez and E. Gomez-Nieto

Education Sciences

Abstract: The prediction of university dropout is a complex problem, given the number and diversity of variables involved. Therefore, different strategies are applied to understand this educational phenomenon, although the most outstanding derive from the joint application of statistical approaches and computational techniques based on machine learning. Student Dropout Prediction (SDP) is a challenging problem that can be addressed following various strategies. On the one hand, machine learning approaches formulate it as a classification task whose objective is to compute the probability of belonging to a class based on a specific feature vector that will help us to predict who will drop out. Alternatively, survival analysis techniques are applied in a time-varying context to predict when abandonment will occur. This work considered analytical mechanisms for supporting the decision-making process on higher education dropout. We evaluated different computational methods from both approaches for predicting who will drop out and when, and sought those with the most consistent results. Moreover, our research employed a longitudinal dataset including demographic, socioeconomic, and academic information from six academic departments of a Latin American university over thirteen years. Finally, this study carried out an in-depth analysis, discussed how such variables influence the estimated level of dropout risk, and questioned whether that influence has the same magnitude across academic departments, genders, socioeconomic groups, and other variables.

2022

SDA-Vis: A Visualization System for Student Dropout Analysis Based on Counterfactual Exploration

G. García-Zanabria, D. A. Gutierrez-Pachas, G. Cámara-Chávez, J. Poco and E. Gomez-Nieto

Applied Sciences

Abstract: High and persistent dropout rates represent one of the biggest challenges for improving the efficiency of the educational system, particularly in underdeveloped countries. A range of features influence college dropouts, with some belonging to the educational field and others to non-educational fields. Understanding the interplay of these variables to identify a student as a potential dropout could help decision makers interpret the situation and decide what they should do next to reduce student dropout rates based on corrective actions. This paper presents SDA-Vis, a visualization system that supports counterfactual explanations for student dropout dynamics, considering various academic, social, and economic variables. In contrast to conventional systems, our approach provides information about feature-perturbed versions of a student using counterfactual explanations. SDA-Vis comprises a set of linked views that allow users to identify which variable alterations would change a given student's predicted situation. This involves perturbing the variables of a dropout student to obtain synthetic non-dropout students. SDA-Vis has been developed under the guidance and supervision of domain experts, in line with some analytical objectives. We demonstrate the usefulness of SDA-Vis through case studies run in collaboration with domain experts, using a real data set from a Latin American university. The analysis reveals the effectiveness of SDA-Vis in identifying students at risk of dropping out and proposing corrective actions, even for particular cases that have not been shown to be at risk with the traditional tools that experts use.

VGGFace-Ear: An Extended Dataset for Unconstrained Ear Recognition

S. Ramos-Cooper, E. Gomez-Nieto and G. Camara-Chavez

Sensors

Abstract: Recognition using ear images has been an active field of research in recent years. Besides faces and fingerprints, ears have a unique structure to identify people and can be captured from a distance, contactless, and without the subject's cooperation. Therefore, it represents an appealing choice for building surveillance, forensic, and security applications. However, many techniques used in those applications, e.g., convolutional neural networks (CNN), usually demand large-scale datasets for training. This research work introduces a new dataset of ear images taken under uncontrolled conditions that present high inter-class and intra-class variability. We built this dataset using an existing face dataset called the VGGFace, which gathers more than 3.3 million images. In addition, we perform ear recognition using transfer learning with CNN pretrained on image and face recognition. Finally, we performed two experiments on two unconstrained datasets and reported our results using Rank-based metrics.

Exploring scientific literature by textual and image content using DRIFT

X. Pocco, T. da Silva, J. Poco, L. G. Nonato and E. Gomez-Nieto

Computers & Graphics Journal


Extended version of SIBGRAPI 2021 paper

Abstract: Digital libraries represent the most valuable resource for storing, querying, and retrieving scientific literature. Traditionally, the reader/analyst aims to compose a set of articles based on keywords, according to his/her preferences, and manually inspect the resulting list of documents. Except for the articles which share citations or common keywords, the results retrieved will be limited to those which fulfill a syntactic match. Besides, if instead of having an article as a reference, the user has an image, the process of finding and exploring articles with similar content becomes infeasible. This paper proposes a visual analytic methodology for exploring and analyzing scientific document collections that consider both textual and image content. The proposed technique relies on combining multiple Content-Based Image Retrieval (CBIR) components and multidimensional projection to map the documents to a visual space based on their similarity, thus enabling an interactive exploration. Moreover, we extend its analytical capabilities with visual resources to display complementary information on selected documents that uncover hidden patterns and semantic relations. We evidence the effectiveness of our methodology through three case studies and a user evaluation, which attest to its usefulness during the process of scientific collections exploration.

2021

DRIFT: An interactive visualization method for scientific literature exploration based on textual and image content

X. Pocco, J. Poco Medina, M. Viana, R. de Abreu, L. G. Nonato and E. Gomez-Nieto

Conference on Graphics, Patterns and Images


Honorable Mention in Visualization 

Abstract: Exploring digital libraries of scientific articles is an essential task for any research community. The typical approach is to query the articles' data based on keywords and manually inspect the resulting list of documents to identify which papers are of interest. Besides being time-consuming, such a manual inspection is quite limited, as it can hardly provide an overview of articles with similar topics or subjects. Moreover, accomplishing queries based on content other than keywords is rarely doable, impairing finding documents with similar images. In this paper, we propose a visual analytic methodology for exploring and analyzing scientific document collections that consider the content of scientific documents, including images. The proposed approach relies on a combination of Content-Based Image Retrieval (CBIR) and multidimensional projection to map the documents to a visual space based on their similarity, thus enabling an interactive exploration. Additionally, we enable visual resources to display complementary information on selected documents that uncover hidden patterns and semantic relations. We show the effectiveness of our methodology through two case studies and a user evaluation, which attest to the usefulness of the proposed framework in exploring scientific document collections.

A comparative study of WHO and WHEN prediction approaches for early identification of university students at dropout risk

D. A. Gutierrez-Pachas, G. García-Zanabria, A. J. Cuadros-Vargas, G. Cámara-Chávez, J. Poco and E. Gomez-Nieto

Latin American Computing Conference

Abstract: Reducing student dropout is one of the biggest challenges faced by educational institutions, especially in underdeveloped countries. Identification of the students with the highest risk of dropping out (WHO) is generally used to apply corrective actions. However, it is also important to determine WHEN a student will drop out, which is fundamental to planning preventive actions. In this work, we perform a study to quantitatively compare several approaches to address the early identification of dropout students in universities. We categorize our study into three main method families, i.e., analytical methods, traditional classification methods, and probabilistic methods. The first is exploited at the preprocessing step for selecting significant variables for the dropout identification task. The second uses machine learning models to classify students into dropout-prone or non-dropout-prone classes. The third family uses survival models to determine when the student would desert. To evaluate the predictive capacity of the classification models, the Kappa coefficient was added to the usual machine learning metrics, and the results show that Kappa is handy for evaluating performance on unbalanced data. Similarly, in the survival models, the concordance index was applied to evaluate the predictive capacity. Our approach was applied to a real data set of Peruvian university graduate students to identify when and who will drop out.

Political Discourses, Ideologies and Online Coalitions in the Brazilian Congress on Twitter during 2019

E. Garcia, E. Gomez-Nieto, P. Benettii, G. Higa and M. Alvarez

New Media and Society Journal 

Abstract: The aim of this research is to describe the pattern of interactions of Brazilian legislators on Twitter during 2019 in the construction of political discourses. Based on 20,076 replies posted on Twitter during 2019 by 514 Brazilian legislators, we conducted a descriptive analysis of legislators’ Twitter profiles, social network analyses of their interactions, and a content analysis of the messages. We found that 1) there are large disparities between legislators in the use of Twitter; 2) the pattern of interactions depicted five clusters defined by political affinities; 3) each cluster had different features regarding its composition and impact; 4) the centrality of the legislators within the network was positively associated with public endorsement on Twitter; and 5) the topics of messages within the clusters reinforce discourses aligned to political ideologies. We argue that the pattern of interactions on Twitter allows us to identify online coalitions that reinforce particular discourses within the Brazilian parliament.

ICE: A visual analytic tool for interactive clustering ensembles generation

J. Castro, G. Camara-Chavez and E. Gomez-Nieto

The 36th ACM/SIGAPP Symposium On Applied Computing (SAC'21 - DM Track)

Abstract: Clustering methods are the most used algorithms for unsupervised learning. However, there is no unique optimal approach for all datasets since different clustering algorithms produce different partitions. To overcome this issue of selecting an appropriate technique and its corresponding parameters, cluster ensemble strategies are used for improving accuracy and robustness by a weighted combination of two or more approaches. However, this process is often carried out almost blindly, testing different combinations of methods and assessing whether their performance is beneficial for the defined purpose. Thus, the procedure for selecting the best combination tests many clustering ensembles until the desired result is achieved. This paper proposes a novel analytic tool for clustering ensemble generation, based on quantitative metrics and interactive visual resources. Our approach allows analysts to display different results from state-of-the-art clustering methods and analyze their performance based on specific metrics and visual inspection. Based on their requirements and experience, the analysts can interactively assign weights to the different methods to set their contributions and manage (create, store, compare, and merge) such ensembles. Our approach's effectiveness is shown through a set of experiments and case studies, attesting to its usefulness in practical applications.

2020

Mirante: A visualization tool for analyzing urban crimes

G. Garcia-Zanabria, E. Gomez-Nieto, J. Silveira, J. Poco, M. Nery, S. Adorno, and L. G. Nonato

Conference on Graphics, Patterns and Images, 33. (SIBGRAPI) - Nov/2020

Best Paper Award in Visualization 

Abstract: Visualization-assisted crime analysis tools used by public security agencies are usually designed to explore large urban areas, relying on grid-based heatmaps to reveal spatial crime distribution in whole districts, regions, and neighborhoods. Therefore, those tools can hardly identify micro-scale patterns closely related to crime opportunity, whose understanding is fundamental to the planning of preventive actions. Enabling a combined analysis of spatial patterns and their evolution over time is another challenge faced by most crime analysis tools. In this paper, we present Mirante, a crime mapping visualization system that allows spatiotemporal analysis of crime patterns at a street-level scale. In contrast to conventional tools, Mirante builds upon street-level heatmaps and other visualization resources that enable spatial and temporal pattern analysis, uncovering fine-scale crime hotspots, seasonality, and dynamics over time. Mirante has been developed in close collaboration with domain experts, following strict requirements such as scalability and the versatility to be deployed in large and medium-sized cities. We demonstrate the usefulness of Mirante through case studies run by domain experts using real data sets from cities with different characteristics. With the help of Mirante, the experts were capable of diagnosing how crime evolves in specific regions of the cities while still being able to raise hypotheses about why certain types of crime show up.

2019

Generating audiovisual summaries from literary works using emotion analysis

D. F. Milon-Flores, J. Ochoa-Luna and E. Gomez-Nieto

Conference on Graphics, Patterns and Images, 32. (SIBGRAPI) - Oct/2019

Abstract: Reading literary works is an essential activity for human communication and learning. However, several relevant tasks, such as selecting, filtering, or analyzing a large number of such works, become complex. To deal with this requirement, several strategies have been proposed to rapidly inspect substantial amounts of text, or retrieve information previously read, exploiting graphical, textual, or auditory resources. In this paper, we propose a methodology to generate audiovisual summaries by combining emotion-based music composition and graph-based animation. We applied natural language processing algorithms to extract the emotions and characters involved in a literary work. Then, we use the extracted information to compose a musical piece to accompany the visual narration of the story, aiming to convey the extracted emotion. For that, we set important musical features such as chord progressions, tempo, scale, and octaves, and we assign a set of suitable instruments. Moreover, we animate a graph to sum up the dialogues between the characters in the literary work. Finally, to assess the quality of our methodology, we conducted two user studies that reveal that our proposal provides a high level of understanding of the content of the literary work, besides bringing a pleasant experience to the user.

Similarity-based visual exploration of very large georeferenced multidimensional datasets

R. Peralta-Aranibar, C. A. L. Pahins, J. L. D. Comba and E. Gomez-Nieto

The 34th ACM/SIGAPP Symposium On Applied Computing (SAC'19 - GIA Track) 

Abstract: Big data visualization is a main task for data analysis. Due to their complexity in terms of volume and variety, very large datasets cannot be queried for similarities among entries in traditional Database Management Systems. In this paper, we propose an effective approach for indexing millions of elements with the purpose of performing single and multiple visual similarity queries on multidimensional data associated with geographical locations. Our approach makes use of the Z-Curve algorithm to map the data into a 1D space, considering similarities between data. We support our proposal by comparisons with state-of-the-art methods in the literature. Additionally, we present a set of results using real data from different sources and we analyze the insights obtained from the interactive exploration.

2018

Extracting Visual Encodings from Map Chart Images with Color-encoded Scalar Values

G. A. Mayhua, E. Gomez-Nieto, J. Heer and J. Poco

Conference on Graphics, Patterns and Images, 31. (SIBGRAPI) - Oct/2018

Abstract: Map charts are used in diverse domains to show geographic data (e.g., climate research, oceanography, and business analysis). These charts can be found in news articles, scientific papers, and on the Web. However, many map charts are available only as bitmap images, hindering machine interpretation of the visualized data for indexing and reuse. We propose a pipeline to recover both the visual encodings and underlying data from bitmap images of geographic maps with color-encoded scalar values. We evaluate our results using map images from scientific documents, achieving high accuracy along each step of our proposal. In addition, we present two applications: data extraction and map reprojection to enable improved visual representations of map charts.

2016

iStar (i*): An Interactive Star Coordinates Approach for High-Dimensional Data Exploration

G. Garcia Zanabria, L.G. Nonato and E. Gomez-Nieto

Computers & Graphics Journal, Volume 60, November 2016, Pages 107–118

Presented at SIBGRAPI 2016

Abstract: Star Coordinates is an important visualization method able to reveal patterns and groups from multidimensional data while still showing the impact of data attributes in the formation of such patterns and groups. Despite its usefulness, Star Coordinates bears limitations that impair its use in several scenarios. For instance, when the number of data dimensions is high, the resulting visualization becomes cluttered, hampering the joint analysis of attribute importance and group/pattern formation. In this paper, we propose a novel method that renders Star Coordinates a feasible alternative to analyze high-dimensional data. The proposed method relies on a clustering mechanism to group attributes in order to mitigate visual clutter. Clustering can be performed automatically as well as interactively, allowing the analysis of how particular groups of attributes impact the radial layout, thus assisting users in the understanding of data. The effectiveness of our approach is shown through a set of experiments and case studies, which attest to its usefulness in practical applications.

Dealing with Multiple Requirements in Geometric Arrangements

E. Gomez-Nieto, W. Casaca, D. Motta, I. Hartmann, G. Taubin and L.G. Nonato

IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 3, pp. 1223-1235, March 1 2016.

Invited TVCG paper presented at IEEE Vis 2016

Abstract: Existing algorithms for building layouts from geometric primitives are typically designed to cope with requirements such as orthogonal alignment, overlap removal, optimal area usage, hierarchical organization, among others. However, most techniques are able to tackle just a few of those requirements simultaneously, impairing their use and flexibility. In this work we propose a novel methodology for building layouts from geometric primitives that concurrently addresses a wider range of requirements. Relying on multidimensional projection and mixed integer optimization, our approach arranges geometric objects in the visual space so as to generate well structured layouts that preserve the semantic relation among objects while still making an efficient use of display area. Moreover, scalability is handled through a hierarchical representation scheme combined with navigation tools. A comprehensive set of quantitative comparisons against existing geometry-based layouts and applications on text, image, and video data set visualization prove the effectiveness of our approach. 

2015

Understanding Legal Large Datasets through Visual Analytics

E. Gomez-Nieto, W. Casaca, I. Hartmann and L.G. Nonato

WVis at SIBGRAPI 2015, IEEE Press, 2015.

Abstract: Existing databases containing more than one million legal documents, each with dozens of variables, pose special issues as to the detection of patterns of interest for judges and prosecutors. Most state-of-the-art methods rely on automatic schemes to classify/group data according to their similarity in the hope of uncovering useful information, neglecting the knowledge and skill of specialists in the information extraction process. In this work we propose a visual analytics tool made up of a set of linked views to explore 25 years of data from the Brazilian Supreme Court, all with the purpose of extracting information that traditional methods could not reveal.

Layout arrangement for Data Visualization

E. Gomez-Nieto

Doctoral Colloquium at IEEE Vis 2015

Abstract: Existing algorithms for building layouts from geometric primitives to visualize data are typically designed to cope with requirements such as orthogonal alignment, overlap removal, optimal area usage, hierarchical organization, among others. However, most techniques are able to tackle just a few of those requirements simultaneously, impairing their use and flexibility. This proposal summarizes some works developed during these two years of PhD studies, showing a contribution to the state of the art in terms of semantics-preserving layout arrangement and dynamic updating of data representations. These projects were accomplished (and some of them are still under investigation) in collaboration with scientists from Brown University (USA), Fundação Getúlio Vargas in Rio de Janeiro (Brazil) and University of São Paulo (Brazil).

2014

Semantically Aware Dynamic Layouts

E. Gomez-Nieto, D. Motta and L.G. Nonato

SIBGRAPI 2014, IEEE Press, 2014.

Abstract: Arranging geometric primitives in a two-dimensional layout is a typical problem in graphics and visualization applications. Most existing approaches are either not flexible enough to allow users to modify the layout according to their interest or unable to maintain the semantic relation between geometric instances during user manipulation. The few interactive semantic-aware layout construction methods rely on computationally costly energy functionals that demand intricate GPU implementations to enable real-time user interventions. In this work we propose a novel semantic-aware layout construction technique that relies on a simple mathematical formulation that does not require intricate computational implementations. Moreover, users are allowed to freely interact with the layout. Our approach is supported by interactive multidimensional projection methods, which enforce that similar instances remain close to each other during layout updates. The presented results show the versatility, effectiveness, and simplicity of our approach when building semantic-aware user-tailored layouts.

Similarity Preserving Snippet-Based Visualization of Web Search Results

E. Gomez-Nieto, F. San Roman, P. Pagliosa, W. Casaca, E. S. Helou, M. C. F. Oliveira and L.G. Nonato

IEEE Transactions on Visualization and Computer Graphics, IEEE Press, Vol. 20, No. 3, pp. 457-470, 2014.

Invited TVCG Paper presented at IEEE Vis 2014

Abstract: Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document and a link to it. This display has many advantages, e.g., it affords easy navigation and is straightforward to interpret. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content-wise; providing such an overview is the main goal of the visualization method proposed in this work. Called ProjSnippet, the proposed method combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation. The multidimensional projection ensures that similar snippets are neighbors in the visual space, but it does not avoid overlaps. Overlapping is handled by defining an energy functional that considers both the amount by which snippets overlap each other and the preservation of the neighborhood structure as given in the projected layout. The resulting visualization conveys a global view of the query results while highlighting visual groupings of related results.

2013

Mixed Integer Optimization for Layout Arrangement

E. Gomez-Nieto, W. Casaca, L.G. Nonato and G. Taubin

SIBGRAPI 2013, IEEE Press, 2013.

Best Paper Award in Graphics and Visualization

Abstract: Arranging geometric entities in a two-dimensional layout is a common task for most information visualization applications, where existing algorithms typically rely on heuristics to position shapes such as boxes or discs in a visual space. Geometric entities are used as a visual resource to convey information contained in data such as textual documents or videos, and the challenge is to place objects with similar content close to each other while still avoiding overlap. In this work we present a novel mechanism to arrange rectangular boxes in a two-dimensional layout which copes with the two properties above, that is, it keeps similar objects close and prevents overlap. In contrast to heuristic techniques, our approach relies on mixed integer quadratic programming, resulting in well structured arrangements which can easily be tuned to take different forms. We show the effectiveness of our methodology through a comprehensive set of comparisons against state-of-the-art methods. Moreover, we employ the proposed technique in video data visualization, attesting to its usefulness in a practical application.

Spectral image segmentation using image decomposition and inner product-based metric

W. Casaca, A. Paiva, E. Gomez-Nieto, P. Joia, and L.G. Nonato

Journal of Mathematical Imaging and Vision, 2013.

Abstract: Image segmentation is an indispensable tool in computer vision applications, such as recognition, detection and tracking. In this work, we introduce a novel user-assisted image segmentation technique which combines image decomposition, inner product-based similarity metric, and spectral graph theory into a concise and unified framework. First, we perform an image decomposition to split the image into texture and cartoon components. Then, an affinity graph is generated and the weights are assigned to its edges according to a gradient-based inner-product function. From the eigenstructure of the affinity graph, the image is partitioned through the spectral cut of the underlying graph. The computational effort of our framework is alleviated by an image coarsening process, which reduces the graph size considerably. Moreover, the image partitioning can be improved by interactively changing the graph weights by sketching. Finally, a coarse-to-fine interpolation is applied in order to assemble the partition back onto the original image. The efficiency of the proposed methodology is attested by comparisons with state-of-the-art spectral segmentation methods through a qualitative and quantitative analysis of the results.

2012

Colorization by Multidimensional Projection

W. Casaca, E. Gomez-Nieto, C. Ferreira, G. Tavares, P. Pagliosa, F. Paulovich, L.G. Nonato and A. Paiva

SIBGRAPI 2012, IEEE Press, 2012.

Abstract: Most image colorization techniques assign colors to grayscale images by embedding image pixels into a high-dimensional feature space and applying a color pattern to each cluster of high-dimensional data. A main drawback of such an approach is that, depending on texture patterns and image complexity, clusters of similar pixels can hardly be defined automatically, rendering existing methods prone to fail. In this work we present a novel approach to colorize grayscale images that allows for user intervention. Our methodology makes use of multidimensional projection to map high-dimensional data to a visual space. Users can manipulate projected data in the visual space so as to further improve clusters and thus the colorization result. Different from other methods, our interactive tool is easy to use while still being flexible enough to enable local color modification. We show the effectiveness of our approach through a set of examples and comparisons against existing colorization methods.

Class-specific metrics for multidimensional data projection applied to CBIR

P. Joia, E. Gomez-Nieto, J.B. Neto, W. Casaca, G. Botelho, A. Paiva and L.G. Nonato

The Visual Computer, 2012

Abstract: Content-based image retrieval is still a challenging issue due to the inherent complexity of images and the choice of the most discriminant descriptors. Recent developments in the field have introduced multidimensional projections to boost accuracy in the retrieval process, but many issues, such as the introduction of pattern recognition tasks and deeper user intervention to assist the process of choosing the most discriminant features, still remain unaddressed. In this paper we present a novel framework for CBIR that combines pattern recognition tasks, class-specific metrics and multidimensional projection to devise an effective and interactive image retrieval system. User interaction plays an essential role in the computation of the final multidimensional projection from which image retrieval will be attained. Results have shown that the proposed approach outperforms existing methods, turning out to be a very attractive alternative for managing image data sets.

2011

Projection-based image retrieval from class-specific metrics

P. Joia, E. Gomez-Nieto, G. Botelho, J.B. Neto, A. Paiva and L.G. Nonato

SIBGRAPI 2011, IEEE Press, pp. 125-132, 2011.

Best Papers Award in Computer Graphics and Visualization

Abstract: Content-based image classification/retrieval based on image descriptors has become an essential component in most database systems. However, most existing systems do not provide mechanisms that enable interactive multi-objective queries, hampering the user experience. In this paper we present a novel methodology capable of accomplishing multi-objective searches while still being interactive. Our approach relies on a combination of class-specific metrics and multidimensional projection to devise an effective and interactive image retrieval system. Besides allowing visual exploration of image data sets, the provided results and comparisons show that the proposed approach outperforms existing methods, turning out to be a very attractive alternative for managing image data sets.