Workshop 3

Place: Recife, Brazil

Date: November 26 - December 01, 2018

Attendees:

  • George DC Cavalcanti

  • Tsang Ing Ren

  • Thiago José Marques Moura

  • Rene Nobrega de Sousa Gadelha

  • Luiz Eduardo S. Oliveira

  • Laurent Heutte

  • Hongliu Cao

Activities

Monday, November 26, 2018

Tuesday, November 27, 2018

  • Mariana Araujo de Souza (PhD Student)

Title: Online local pool generation for dynamic classifier selection

Abstract: Dynamic Classifier Selection (DCS) techniques have difficulty in selecting the most competent classifier in a pool, even when its presence is assured. Since the DCS techniques rely only on local data to estimate a classifier's competence, the manner in which the pool is generated could affect the choice of the best classifier for a given instance. That is, the global perspective in which pools are generated may not help the DCS techniques in selecting a competent classifier for instances that are likely to be misclassified. Thus, this work proposes an online pool generation method that produces a locally accurate pool for test samples in difficult regions of the feature space. The difficulty of a given area is determined by the estimated classification difficulty of the instances in it. That way, by using classifiers that were generated in a local scope, it could be easier for the DCS techniques to select the best one for those instances they would most probably misclassify. For the query samples surrounded by easy instances, a simple nearest neighbors rule is used in the proposed method. In order to identify the cases in which the local pool is used in the proposed scheme, an analysis of the correlation between instance hardness and DCS techniques is performed, and an instance hardness measure that conveys the degree of local class overlap near a given sample is proposed. Experimental results show that the DCS techniques were better able to select the most competent classifier for difficult instances when using the proposed local pool than when using a globally generated pool. Moreover, the proposed technique yielded significantly greater recognition rates in comparison to a Bagging-generated pool and two other global generation schemes for all DCS techniques evaluated. The performance of the proposed technique was also significantly superior to three state-of-the-art classification models and was statistically equivalent to five of them.
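The routing idea in this abstract (a plain nearest neighbors rule in easy regions, a locally generated pool in hard ones) can be illustrated with a minimal sketch. The abstract does not name the hardness measure, so the k-Disagreeing Neighbors (kDN) score used here is an assumption, as are the function names, the toy data, and the "any neighbor with hardness above zero" rule. The generation of the local pool itself is not shown.

```python
import math


def kdn_hardness(X, y, k=3):
    """k-Disagreeing Neighbors (assumed measure): fraction of each sample's
    k nearest neighbors, excluding itself, that carry a different label."""
    scores = []
    for i, xi in enumerate(X):
        # Rank every other sample by Euclidean distance to sample i.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        neighbors = others[:k]
        disagree = sum(1 for j in neighbors if y[j] != y[i])
        scores.append(disagree / len(neighbors))
    return scores


def needs_local_pool(query, X, hardness, k=3):
    """Hypothetical routing rule: True if any of the query's k nearest
    training samples is hard (hardness > 0), i.e. classes overlap nearby."""
    nearest = sorted(range(len(X)), key=lambda j: math.dist(query, X[j]))[:k]
    return any(hardness[j] > 0 for j in nearest)


# Toy data: two well-separated groups, plus one class-1 sample placed
# inside the class-0 group to create local class overlap.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (1, 1)]
y = [0, 0, 0, 1, 1, 1, 1]
hardness = kdn_hardness(X, y, k=3)

easy_query = (5.5, 5.5)  # surrounded by easy samples: nearest neighbors rule
hard_query = (0.5, 0.5)  # near the overlap: a local pool would be generated
print(needs_local_pool(easy_query, X, hardness))
print(needs_local_pool(hard_query, X, hardness))
```

The brute-force neighbor search keeps the sketch dependency-free; a real implementation would likely use an indexed nearest-neighbor structure.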

  • Felipe Nunes Walmsley (MSc Student)

Title: An Ensemble Generation Method Based on Instance Hardness

Abstract: In Machine Learning, ensemble methods have been receiving a great deal of attention. Techniques such as Bagging and Boosting have been successfully applied to a variety of problems. Nevertheless, such techniques are still susceptible to the effects of noise and outliers in the training data. We propose a new method for the generation of pools of classifiers based on Bagging, in which the probability of an instance being selected during the resampling process is inversely proportional to its instance hardness, which can be understood as the likelihood of an instance being misclassified, regardless of the choice of classifier. The goal of the proposed method is to remove noisy data without sacrificing the hard instances that are likely to be found on class boundaries. We evaluate the performance of the method on nineteen public data sets, and compare it to the performance of the Bagging and Random Subspace algorithms. Our experiments show that in high-noise scenarios the accuracy of our method is significantly better than that of Bagging.
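The resampling step described in this abstract can be sketched as a weighted bootstrap. This is only an illustration under stated assumptions: the hardness values are hypothetical inputs in [0, 1], and the weight 1 - hardness is a simple stand-in for the inverse relationship between hardness and selection probability, not necessarily the exact weighting used in the talk. Function names are hypothetical as well.

```python
import random


def hardness_weighted_bootstrap(hardness, rng):
    """Draw one bootstrap sample of indices, favoring easy instances."""
    # Assumption: weight = 1 - hardness, so an instance with hardness 1.0
    # (most likely noise) is never drawn, while borderline instances
    # keep a reduced but non-zero chance of being selected.
    weights = [1.0 - h for h in hardness]
    n = len(hardness)
    # random.choices samples with replacement, as Bagging does.
    return rng.choices(range(n), weights=weights, k=n)


def generate_bags(hardness, n_bags, seed=0):
    """Build the index sets on which each base classifier would be trained."""
    rng = random.Random(seed)
    return [hardness_weighted_bootstrap(hardness, rng) for _ in range(n_bags)]


# Hypothetical hardness scores for five training instances.
hardness = [0.0, 0.2, 0.2, 1.0, 0.6]
bags = generate_bags(hardness, n_bags=10)
```

Each bag would then be used to train one base classifier. Note that `random.choices` raises `ValueError` if every weight is zero, so a data set in which all instances have hardness 1.0 would need special handling.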

Wednesday, November 28, 2018

  • Hongliu Cao

Title: Random Forest Dissimilarity Based Multi-View Learning for Radiomics Application

Abstract: Radiomics is a medical imaging pattern recognition task that aims at extracting a large number of features from standard-of-care images, to help diagnose and treat cancers. Many recent studies have shown that Radiomics can offer a lot of useful information that physicians cannot extract from these images and can be efficiently associated with other information like gene or protein data. However, most of the classification studies in Radiomics report the use of feature selection methods without identifying the underlying machine learning challenges. In this paper, we first show that the Radiomics problem should be viewed as a high dimensional, low sample size, multi-view learning problem. Then, we propose a dissimilarity-based method for merging the information from the different views, based on Random Forest classifiers. The proposed approach is compared to different state-of-the-art Radiomics and multi-view solutions, on different public multi-view datasets as well as on Radiomics datasets. In particular, our experiments show that the proposed approach works better than the state-of-the-art methods from both the Radiomics and the multi-view learning literature.
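A common way to derive a dissimilarity from a Random Forest is to count how often two samples fall into the same leaf. The sketch below assumes that definition and assumes that views are merged by averaging their dissimilarity matrices; the abstract specifies neither detail, and the function names and leaf indices are hypothetical. In practice the leaf indices could come from a forest trained on each view (for example via scikit-learn's `RandomForestClassifier.apply`).

```python
def rf_dissimilarity(leaves):
    """leaves[i][t] is the leaf reached by sample i in tree t.
    Returns an n x n matrix: 1 - fraction of trees where two samples
    share a leaf (assumed definition of the forest dissimilarity)."""
    n = len(leaves)
    n_trees = len(leaves[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            matrix[i][j] = 1.0 - shared / n_trees
    return matrix


def merge_views(view_matrices):
    """Assumed merging rule: element-wise average over the views."""
    n = len(view_matrices[0])
    m = len(view_matrices)
    return [
        [sum(v[i][j] for v in view_matrices) / m for j in range(n)]
        for i in range(n)
    ]


# Hypothetical leaf indices for 3 samples in two views.
view_a = [[1, 4, 2, 7], [1, 4, 3, 7], [5, 6, 3, 8]]  # forest of 4 trees
view_b = [[2, 2], [3, 2], [3, 9]]                    # forest of 2 trees
joint = merge_views([rf_dissimilarity(view_a), rf_dissimilarity(view_b)])
```

The joint matrix can then be fed to any classifier that accepts precomputed dissimilarities, which sidesteps feature selection across views of very different dimensionality.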

  • Prof. Laurent Heutte

Title: Pattern spotting in historical document images

Abstract: Information retrieval in historical document images has long consisted of spotting words. Apart from words, a manuscript can contain various graphical elements that could also be interesting to retrieve, and recently, interest has grown towards graphical object retrieval or pattern spotting. Pattern spotting consists of searching a collection of document images for occurrences of a graphical object (medieval dropped initial capital letters, decorative objects, coats of arms, etc.) that may present some differences in terms of color, shape, or context. Contrary to object detection and classification, where models of the object of interest may be trained, pattern spotting does not rely on any prior information regarding the query, nor on any predefined class of graphical objects. An offline sliding window approach may be suitable, provided that the challenge raised by high computational and storage costs is handled. In this talk, we will present an unsupervised, segmentation-free approach that takes advantage of recent developments in computer vision to overcome these issues. Results obtained on medieval manuscripts from the DocExplore project show that our approach achieves better retrieval results, with better efficiency in terms of time and memory, compared to standard approaches.

  • Prof. Luiz Oliveira

Title: Handwriting Recognition Revisited

Abstract: Handwriting recognition has been a subject of research over the past few decades. However, some important applications such as signature verification and string digit recognition have reached their upper limits in terms of performance in the last decade. With the advent of deep learning and, more specifically, cheap hardware, we saw an opportunity to address some of the handwriting recognition bottlenecks, e.g., representation and segmentation. In this presentation, we will discuss how we improved the performance of signature verification and string digit recognition systems.

Thursday, November 29, 2018

  • Hongliu Cao

Title: Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images

Abstract: Breast cancer is one of the most common types of cancer and one of the leading causes of cancer-related death among women. In the context of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images, we compare one handcrafted feature extractor and five transfer learning feature extractors based on deep learning. We find that the deep learning networks pretrained on ImageNet perform better than the popular handcrafted features used for breast cancer histology images. The best feature extractor achieves an average accuracy of 79.30%. To improve the classification performance, a random forest dissimilarity based integration method is used to combine different feature groups. When the five deep learning feature groups are combined, the average accuracy is improved to 82.90% (best accuracy 85.00%). When handcrafted features are combined with the five deep learning feature groups, the average accuracy is improved to 87.10% (best accuracy 93.00%).

  • Pedro D. Marrero Fernandez

Title: Representation learning and attention models

  • Hector Pinheiro

Title:

Friday, November 30, 2018

  • Organisation of our special session on Ensemble Learning and Applications at IJCNN 2019, approved by the IJCNN committee.

  • Definition of the schedule and program for the next meeting, which will take place in Rouen, France (tentatively July 2019).

Saturday, December 01, 2018

  • Trip back home