Research

Background

In many applications, data contain structured (or unstructured) information that is multi-dimensional and multi-level in nature, as in e-commerce, telecommunications, retail, stocks, and healthcare. Several research efforts have explored better computational methods for handling high-dimensional databases. The most successful approach so far combines OLAP and data mining strategies. OLAP provides a data-driven way to explore and test what-if business and scientific scenarios by employing intrinsically multi-dimensional algorithms. Data mining provides a discovery-driven way to explore databases, revealing interesting patterns by applying a mixture of computational models and algorithms from areas such as machine learning, information retrieval, artificial intelligence, and databases. In fact, data mining and OLAP complement each other, yielding powerful techniques for practical and efficient multi-dimensional data analysis (MDA).

My major research interest lies in the development of computational methods and strategies for MDA and its applications in both scientific and business domains. During my PhD studies I devised several Aggregation-based Mining Methods for MDA across a broad range of applications. These methods provide an efficient computational model and have been applied successfully to large real-world data applications, particularly in the telecom and retail industries. Before my thesis defense I had the opportunity to be introduced to the genomics/bioinformatics area while I was a visiting scholar at the Pablo de Olavide University (Seville, Spain). I was quite excited about exploring machine learning for biological data analysis, and since that time I have focused on data mining and machine learning algorithms for scientific data streams. Currently, I am focusing on ensemble learning methods and their applications to scientific data analysis, particularly environmental and health analytics. I am also interested in exploring utility-based patterns in these domains, as well as sampling assumptions for such Big Data.

I am looking forward to pursuing a research career on the computational intelligence frontier, exploring data mining and machine learning algorithms for Data Analytics applications. I also plan to conduct research in this direction as a long-term goal.

Next I present research contributions to applied data analytics in i) environmental analytics, ii) health analytics and iii) business analytics.

Environmental Analytics

Environmental analytics can be defined as the challenge of devising robust machine learning and data mining methods for better exploration, understanding and awareness of "potential changes" in the environment which may have a strong impact on all living things (including us!). Potential changes might be detected from i) statistical analysis of abnormal events across a broad range of geological, hydrological, and meteorological stream data; and ii) genomic data analysis, which allows us to investigate such abnormal events at a fine-grained, molecular level. Microbial communities can be explored as interesting “bio-sensors” to detect perturbations in the whole environmental system.

We have been developing data analytics methods for multivariate data analysis of geo-oriented data, and network-based clustering approaches for the identification of essential microbial consortia by means of functional metagenomics analysis over high throughput sequencing data.

-SANTOS, V.; Corrêa, L. H. S.; MEIGUINS, B.; OLIVEIRA, G.; ALVES, RONNIE Metagenomics-based signature clustering and interactive visualization analysis In: International Joint Conference on Neural Networks (IJCNN), 2018, Rio de Janeiro. 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 2018.

-NASCIMENTO, SIDNEY VASCONCELOS DO; MAGALHÃES, MARCELO MURAD; CUNHA, ROBERTO LISBOA; COSTA, PAULO HENRIQUE DE OLIVEIRA; ALVES, RONNIE CLEY DE OLIVEIRA; OLIVEIRA, GUILHERME CORRÊA DE; VALADARES, RAFAEL BORGES DA SILVA. Differential accumulation of proteins in oil palms affected by fatal yellowing disease. PLoS One, v.13, p.e0195538 - , 2018.

-LANES, ÉDER C.; POPE, NATHANIEL S.; Alves, Ronnie; CARVALHO FILHO, NELSON M.; GIANNINI, TEREZA C.; GIULIETTI, ANA M.; IMPERATRIZ-FONSECA, VERA L.; MONTEIRO, WALÉRIA; OLIVEIRA, GUILHERME; SILVA, AMANDA R.; SIQUEIRA, JOSÉ O.; SOUZA-FILHO, PEDRO W.; VASCONCELOS, SANTELMO; JAFFÉ, RODOLFO Landscape Genomic Conservation Assessment of a Narrow-Endemic and a Widespread Morning Glory From Amazonian Savannas. Frontiers in Plant Science, v.9, p.532 - , 2018.

-GUIMARÃES, JOSÉ TASSO FELIX ; CARREIRA, LÉA MARIA MEDEIROS ; ALVES, R ; MARTINS E SOUZA FILHO, PEDRO WALFIR ; GIANNINI, TEREZA CRISTINA ; MACAMBIRA, HIGOR JARDIM ; DA SILVA, EDILSON FREITAS ; DIAS, ANNA CHRISTINA RIO ; DA SILVA, CARLA BASTISTA ; ROMEIRO, LUIZA DE ARAÚJO ; RODRIGUES, TARCÍSIO MAGEVSKI . Pollen morphology of the Poaceae: implications of the palynological and paleoecological records of the southeastern Amazon in Brazil. PALYNOLOGY, v. 1, p. 1-13, 2017.

-GUIMARAES, J. T. F. ; REIS, L. ; FIGUEIREDO, M. ; RODRIGUES, T. ; Alves, Ronnie ; SOUZA, P. M. ; SILVA, M. S. ; SAHOO, PRAFULLA KUMAR ; GIANNINI, T. C. ; CARREIRA, L. . Modern pollen rain as a background for palaeoenvironmental studies in the Serra dos Carajás, southeastern Amazonia. HOLOCENE, p. 095968361668326, 2017.

-GIANNINI, T. C. ; GIULIETTI, A. ; HARLEY, R. ; VIANA, P. ; JAFFE, R. ; Alves, R. ; PINTO, C. ; MOTA, N. ; CALDEIRA, C. ; IMPERATRIZ-FONSECA, V. ; FURTINI, A. ; SIQUEIRA, J. . Selecting plant species for practical restoration of degraded lands using a multiple-trait approach. Austral Ecology (Print), 2016.

-GUIMARAES, J. T. F. ; REIS, L. ; FIGUEIREDO, M. ; RODRIGUES, T. ; Alves, Ronnie ; SOUZA, P. M. ; SILVA, M. S. ; SAHOO, PRAFULLA KUMAR ; GIANNINI, T. C. ; CARREIRA, L. . Modern pollen rain as a background for palaeoenvironmental studies of the Serra dos Carajás, southeastern Amazonia. The Holocene, 2016.

-CORREA, L., GOES, F.; ALVES, A.; ALVES, R. Functional network-oriented analysis of environmental metagenomics data. In the 3ème Colloque de Génomique Environnementale Montpellier (GE2015). Montpellier, October 26-28, 2015.

-GOES, F.; ALVES, R.; CORREA, L.; CHAPARRO, C. and THOM, L. A comparison of classification methods for gene prediction in metagenomics. In the International Workshop on New Frontiers in Mining Complex Patterns (NFmcp). The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Nancy, France, September 15-19, 2014.

-GOES, F.; ALVES, R.; CORREA, L.; CHAPARRO, C. and THOM, L. Towards an ensemble learning strategy for metagenomic gene prediction. In the Brazilian Symposium on Bioinformatics (BSB), Belo Horizonte, MG, Brazil, October 28-30, 2014. To appear in the Lecture Notes in Bioinformatics (LNBI), vol. 8826 Springer, pp.17-24.

-CORREA, L.; ALVES, R.; GOES, F.; CHAPARRO, C. and THOM, L. A pipeline for functional and visual analytics of microbial genetic networks. In the 2nd International Workshop on Dynamic Networks and Knowledge Discovery (DyNaK). The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Nancy, France, September 15-19, 2014.

-CORREA, L.; ALVES, R.; GOES, F.; CHAPARRO, C. and THOM, L. FUNN-MG: A metagenomic systems biology computational framework. In the Brazilian Symposium on Bioinformatics (BSB), Belo Horizonte, MG, Brazil, October 28-30, 2014. To appear in the Lecture Notes in Bioinformatics (LNBI), vol. 8826 Springer, pp.25-32.

-GUIMARAES, J.T.F; SOUZA-FILHO, P.W.M ; ALVES, R; DE SOUZA, E.B; DA COSTA, F.R; REIS, L. S.; SAHOO, P.K.; DE OLIVEIRA MANES, C-L.; SILVA JUNIOR, R.O.; OTI, D.; DALL'AGNOL, R. Source and distribution of pollen and spores in surface sediments of a plateau lake in southeastern Amazonia. Quaternary International, v. xx, p. 1-16, 2014.

-GUIMARAES, J.T.F; NOGUEIRA, A.C.R.; BANDEIRA, J.; SOARES, J.; ALVES, R.; KERN, A. Palynology of the Middle Miocene-Pliocene Novo Remanso Formation, Central Amazonia, Brazil. Ameghiniana, v. 52, p. 107-134, 2014.

-GUIMARAES, J.T.F. ; COHEN, M.C.L. ; FRANCA, M. C. ; PESSENDA, L.C.R ; SOUZA, E.J. ; NOGUEIRA, A.C.R. ; ALVES, R. . Recent effects of tidal and hydro-meteorological changes on coastal plains near the mouth of the Amazon River. Earth Surface Processes and Landforms (Print), v. 1, p. n/a-n/a, 2013.

Health Analytics

With the advances in high-throughput technologies and storage facilities, biology has become an enormously data-rich subject. Data are generated in many flavors, shaped by the particular OMICs perspective adopted in each experimental study. For instance, genomics is the field of study dealing with genomes, and it is mostly associated with the “static” view (the genes and where they are placed along the genome). The dynamic view comes from the transcriptomics perspective, that is, gene expression and its regulation profiles. Finally, interactomics is usually associated with gene products, proteins, and other complex molecular interactions. Additionally, these interactions can be seen as a huge graph network with multiple layers of links integrating distinct OMICs perspectives. The challenge of integrating and analyzing OMICs data can drive many interesting applications in health care systems, such as personalized medicine. We have been developing machine learning methods for ranking, clustering and prediction of gene markers across OMICs data.

-RUFFLE, FLORENCE ; AUDOUX, JEROME ; BOUREUX, ANTHONY ; BEAUMEUNIER, SACHA ; GAILLARD, JEAN-BAPTISTE ; BOU SAMRA, ELIAS ; MEGARBANE, ANDRE ; CASSINAT, BRUNO ; CHOMIENNE, CHRISTINE ; ALVES, R ; RIQUIER, SEBASTIEN ; GILBERT, NICOLAS ; LEMAITRE, JEAN-MARC ; BACQ-DAIAN, DELPHINE ; BOUGÉ, ANNE LAURE ; PHILIPPE, NICOLAS ; COMMES, THERESE . New chimeric RNAs in acute myeloid leukemia. F1000RESEARCH, v. 6, p. 1302, 2017.

-BEAUMEUNIER, S.; AUDOUX, J.; BOUREUX, A.; COMMES, T.; RUFFLE, F.; ALVES, R. On the evaluation of the fidelity of supervised classifiers in the prediction of chimeric RNAs. BioData Mining, 2016.

-RICHARD, F.D.; ALVES, R.; KAJAVA, A.V. A scoring tool for boundary determination between repetitive and non-repetitive protein sequences. Bioinformatics. 2016.

-BEAUMEUNIER, S.; AUDOUX, J.; BOUREUX, A.; COMMES, T.; PHILIPPE, N.; ALVES, R. The Role of Machine Learning in Finding Chimeric RNAs. 6th International Workshop on Biological Knowledge Discovery and Data Mining (BIOKDD 2015), Valencia, Spain, September 1-4, 2015. Best Paper Award at BIOKDD-DEXA'15.

-LICHTNOW, D.; ALVES, R.; PASTOR, O.; BURRIEL, V.; OLIVEIRA, J.P.M.: BION2SEL: An Ontology-Based Approach for the Selection of Molecular Biology Databases. In the Brazilian Symposium on Bioinformatics (BSB), Belo Horizonte, MG, Brazil, October 28-30, 2014. To appear in the Lecture Notes in Bioinformatics (LNBI), vol. 8826 Springer. 83-90.

-ALVES, R. Gene Expression Biomarkers, Ranking. Encyclopedia of System Biology, Editors-in-chief: Werner Dubitzky; Olaf Wolkenhauer; Kwang-Hyun Cho; Hiroki Yokota, Springer (print and online), v. 2, p. 791-795, 2013.

-ALVES, R. Biomarkers, Ranking. Encyclopedia of System Biology, Editors-in-chief: Werner Dubitzky; Olaf Wolkenhauer; Kwang-Hyun Cho; Hiroki Yokota, Springer (print and online), v. 2, p. 760-761, 2013.

-MENDOZA, MARIANA R. ; DA FONSECA, GUILHERME C. ; LOSS-MORAIS, GUILHERME ; ALVES, R; MARGIS, ROGERIO ; BAZZAN, ANA L. C. . RFMirTarget: Predicting Human MicroRNA Target Genes with a Random Forest Classifier. Plos One, v. 8, p. e70153, 2013.

-SIMAO, E.M.; SINIGAGLIA, M.; BUGS, C.A., CASTRO, M.A.A.; LIBRELOTTO, G. R.; ALVES, R.; MOMBACH, J. C. M. Induced genome maintenance pathways in pre-cancer tissues describe an anti-cancer barrier in tumor development. Molecular Biosystems (Print), 2012.

-ALVES, R. ; RODRIGUEZ-BAENA, D. S. ; AGUILAR-RUIZ, J. S. Gene Association Analysis: A Survey of Frequent Pattern Mining from Gene Expression Data. BRIEFINGS IN BIOINFORMATICS, p. 210-224, 2010.

-ALVES, R.; MENDES, MARCUS; BONNATO, DIEGO. A Network-Based Meta-analysis Strategy for the Selection of Potential Gene Modules in Type 2 Diabetes. In: 8th Brazilian Symposium on Bioinformatics, 2013, Recife. Advances in Bioinformatics and Computational Biology - Lecture Notes in Computer Science. New York: Springer International Publishing, 2013. v. 8213. p. 160-169.

-MENDOZA, M.; FONSECA, G.; MORAIS, G.; ALVES, R.; BAZZAN, A.; MARGIS, R. RFMirTarget: A Random Forest classifier for Human miRNA target gene prediction. In: Brazilian Symposium on Bioinformatics, 2012, Campo Grande.

-LICHTNOW, D.; LEVIN, A.; ALVES, R.; CASTELLO, I. M.; PULIDO, L.; DOPAZO, J.; PASTOR, O.; OLIVEIRA, J.P.M. Using Metadata and Web Metrics to Create a Ranking of Genomic Databases. In: IADIS WWW/Internet 2011, 2011, Rio de Janeiro. IADIS WWW/Internet 2011.

-LICHTNOW, D.; ALVES, R. ; OLIVEIRA, J.P.M.; LEVIN, A.; PASTOR, O.; CASTELLO, I. M.; DOPAZO, J. Using Papers Citations for Selecting The Best Genomic Databases. In: 30th International Conference of the Chilean Computer Science Society, 2011, Curico. The Jornadas Chilenas de Computación (JCC), 2011.

-FERREIRA, P.; LIBRELOTTO, G.; ALVES, R. . Discovering Co-Relations on Research Topics and Authors from the PubMed Database. In: 13th Portuguese Conference on Artificial Intelligence - EPIA'07, 2007, Guimaraes. 2nd Workshop on Text Mining and Applications, 2007.

Business Analytics

Business analytics relies on the exploration of MDA to highlight interesting patterns that drive business performance improvements. Companies generate large volumes of data, and discovering point events and changes across the business affects their competitiveness and survival. Data analytics is, in some sense, pervasive across business applications. MDA relies on effective and efficient computation of aggregating functions on high-dimensional databases. MDA on higher-dimensional databases is not a trivial task, given the limitations of the best-known aggregation-based algorithms. So, how can this data-driven search be enhanced with discovery-driven features that ease the curse-of-dimensionality problem? Besides, data as well as patterns evolve over time. Thus, how can those significant changes be highlighted? Aggregation-based Mining Methods (AMM) help to address these issues. In summary, AMM attempts to combine ideas from aggregating and mining functions to develop practical, effective and efficient data analytics methods. Furthermore, visualization plays an important role in highlighting the utility of these patterns within the business domain. We have been exploring visual data analytics in a broad range of industrial applications.
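To make the idea of combining aggregating and mining functions more concrete, here is a minimal sketch in Python. It is not one of the AMM algorithms from the publications below. It only assumes a naive full-cube computation (one group-by per subset of dimensions) followed by a simple top-k ranking of the aggregated cells. The function names `compute_cuboids` and `top_k_cells` and the toy sales records are hypothetical, introduced for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def compute_cuboids(rows, dims, measure):
    """Aggregating step: sum `measure` for every subset of `dims`.

    With d dimensions this builds 2**d cuboids, which is why naive
    aggregation becomes impractical as dimensionality grows.
    """
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            cells = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                cells[key] += row[measure]
            cube[group] = dict(cells)
    return cube


def top_k_cells(cube, k):
    """Mining step: rank all aggregated cells and keep the k largest."""
    flat = [
        (value, group, key)
        for group, cells in cube.items()
        for key, value in cells.items()
    ]
    flat.sort(key=lambda item: item[0], reverse=True)
    return flat[:k]


# Hypothetical toy records, not taken from any real application.
rows = [
    {"region": "north", "product": "a", "year": 2006, "sales": 10.0},
    {"region": "north", "product": "b", "year": 2006, "sales": 5.0},
    {"region": "south", "product": "a", "year": 2007, "sales": 7.0},
    {"region": "south", "product": "b", "year": 2007, "sales": 3.0},
]
dims = ("region", "product", "year")
cube = compute_cuboids(rows, dims, "sales")
best = top_k_cells(cube, 3)
```

Even with three dimensions the sketch materializes eight cuboids. Practical methods avoid computing all of them, for example by pruning cells that cannot contribute to the patterns of interest.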

-LIRA, W.P.; GAMA, F.; BARBOSA, H.; ALVES, R.; SOUZA, C.R.B.: VCloud: adding interactiveness to word clouds for knowledge exploration in large unstructured texts. ACM SAC 2016: 193-198

-LIRA, W. P. ; ALVES, R.; COSTA, J.M.R. ; PESSIN, G. ; GALVAO, L.; CARDOSO, A.C. ; DE SOUZA, C.R.B. A Visual-Analytics System for Railway Safety Management. IEEE Computer Graphics and Applications, v. 34, p. 52-57, 2014.

-ALVES, R. ; RIBEIRO, J.; BELO, O.; HAN, J. Ranking Gradients in Multidimensional Spaces. In: Tho Manh Nguyen. (Org.). Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development: Innovative Methods and Applications. Hershey, PA, EUA: IGI Global, 2009, v. , p. 251-269.

-ALVES, R. ; BELO, O.; RIBEIRO, J. Mining Significant Change Patterns in Multidimensional Spaces. International Journal of Business Intelligence and Data Mining, v. 4, p. 219-241, 2009.

-ALVES, R. ; BELO, O.; RIBEIRO, J. Mining Top-k Multidimensional Gradients. Lecture Notes in Computer Science, v. 4654, p. 375-384, 2007.

-ALVES, R.; FERREIRA, P.; RIBEIRO, J.; BELO, O. Detecting Abnormal Patterns in Call Graphs based on the Aggregation of Relevant Vertex Measures In: 12th Industrial Conference on Data Mining, 2012, Berlin.

-ALVES, R.; BELO, O. Multidimensional Data Mining. In: IV Simposio Doutoral do Departamento de Informatica, 2007, Braga. SDDI'2007, 2007.

-ALVES, R.; BELO, O. Effective OLAP Mining of Evolving Data Marts. In: 11th International Database Engineering and Applications Symposium (IDEAS 2007), 2007, Banff, Canada. International Database Engineering and Applications Symposium, 2007. p. 120-128.

-FERREIRA, P.; ALVES, R.; BELO, O.; RIBEIRO, J. Detecting Telecommunications Fraud based on Signature Clustering Analysis. In: 13th Portuguese Conference in Artificial Intelligence - EPIA'07, 2007, Guimaraes. Workshop on Business Intelligence, 2007.

-ALVES, R.; BELO, O. Analytical Data Mining for Stream Data Analysis. In: III Simposio Doutoral do Departamento de Informatica, 2006, Braga. SDDI'2006, 2006.

-ALVES, R.; FERREIRA, P.; BELO, O.; RIBEIRO, J.; LOPES, J.; CORTESAO, L . Discovering Telecom Fraud Situations through Mining Anomalous Behavior Patterns. In: ACM SIGKDD 2006, 2006, Philadelphia. 1st Workshop on Data Mining for Business Applications, 2006.

-ALVES, R. ; BELO, O. On the Computation of Maximal-Correlated Cuboids Cells. Lecture Notes in Computer Science, v. 4081, p. 165-174, 2006.

-FERREIRA, P.; ALVES, R. ; BELO, O.; CORTESAO, L. Establishing Fraud Detection Patterns Based on Signatures. Lecture Notes in Computer Science, v. 4065, p. 526-538, 2006.

-ALVES, R. ; BELO, O. Programming Relational Databases for itemset Mining over Large Transactional Tables. Lecture Notes in Computer Science, v. 3808, p. 314-324, 2005.

-FERREIRA, P.; ALVES, R.; AZEVEDO, P.; BELO, O. A Hybrid Method to Discover Inter-Transactional Rules. In: Jornadas de Ingenieria del Software y Bases de Datos (JISBD), 2005, Granada. JISBD 2005, 2005. p. 131-138.

-ALVES, R.; BELO, O. Mining Clickstream-based Data Cubes. In: International Conference On Enterprise Information Systems, 2004, Porto. ICEIS'2004, 2004. p. 583-586.

-ALVES, R.; CAVALCANTI, F.T.; FERREIRA, P.; BELO, O. Clickstreams, the basis to establish user navigation patterns on web sites. In: International Conference On Data Mining 2004, 2004, Malaga. Data Mining'2004, 2004. p. 87-132.

-ALVES, R.; LOURENCO, A.; BELO, O. When the Hunter Becomes the Prey: Tracking Down Web Crawlers in Clickstreams. In: Data Gadgets 2005, Bringing Up Emerging Solutions for Data Warehousing Systems, 2004, Malaga. Data Gadgets 2004, 2004.

-ALVES, R.; BELO, O. An OLAM Approach to Analize e-commerce Clickstreams. In: I Simposio Doutoral do Departamento de Informatica, 2003, Braga. SDDI'2003, 2003.