Mini Challenges 2018

Challenges (M2 students) Abstract Task Videos (L2 student solutions) 
 VISION
https://codalab.lri.fr/competitions/108
Autonomous vehicles will become a common means of transportation very soon. However, obstacles remain to be overcome, in particular obstacle avoidance, which requires powerful computer vision algorithms. In this challenge you will help solve the problem of recognizing animals and vehicles. To illustrate this problem, we propose to study the CIFAR-10 image dataset, which groups entities that can interact with the vehicle's environment, such as animals (cat, horse, dog, ...) and vehicles (bike, car, truck, ...). We preprocessed the images so that you get to solve a multi-class classification problem from pre-computed features.

Your score is the balanced accuracy, or BAC. It is the average of the per-class accuracies (for each class, the fraction of its samples that are correctly classified). Make predictions that are vectors [0 0 ... 1 ... 0 0] with a 1 at the ith position if you want to predict that your sample belongs to class i.
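As a rough illustration, balanced accuracy can be computed as the mean of the per-class recall over one-hot prediction vectors. The sketch below is plain Python with made-up labels; the function names are hypothetical, and the challenge's own scoring program may differ in details such as normalization.

```python
def one_hot(label, n_classes):
    """Encode a class index as a vector [0 0 ... 1 ... 0 0]."""
    return [1 if k == label else 0 for k in range(n_classes)]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for one-hot encoded rows (illustrative only)."""
    n_classes = len(y_true[0])
    recalls = []
    for c in range(n_classes):
        # Indices of the samples whose true class is c
        members = [i for i, row in enumerate(y_true) if row[c] == 1]
        if not members:
            continue  # class absent from the ground truth: skip it
        hits = sum(1 for i in members if y_pred[i][c] == 1)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


# Toy data (assumed): 4 samples of class 0, 1 of class 1, 1 of class 2
y_true = [one_hot(k, 3) for k in [0, 0, 0, 0, 1, 2]]
y_pred = [one_hot(k, 3) for k in [0, 0, 0, 1, 1, 0]]
bac = balanced_accuracy(y_true, y_pred)
print(round(bac, 4))
```

Here plain accuracy would be 4/6, while the balanced accuracy is lower because the single sample of class 2 is missed and each class carries the same weight.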
CAMERA
REGARD
IMAGE
  BIOMED
 Over-prescription of opioid medicines presents a new public health problem because many people have become addicted. This challenge asks you to help predict which doctors tend to over-prescribe such medicines.
 The task is binary classification. The target indicates, for each medical prescription, whether an opioid has been prescribed or not. The features include, among others, the specialty of the doctor who made the prescription and the names of the non-opioid drugs present in the prescription.

Your score is the Gini or "normalized AUC": 2 AUC - 1. AUC stands for Area under ROC curve. Make numerical predictions for test samples that are larger for the positive class and smaller for the negative class (discriminant values). Random guesses give a score close to 0 while perfect predictions give a score of 1.
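To make the formula concrete, here is a minimal sketch of the Gini score computed from the AUC, where the AUC is taken as the fraction of (positive, negative) pairs ranked in the right order, with ties counted as one half. The function names and toy values are assumptions for illustration; the official scoring program may be implemented differently.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparisons (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini or "normalized AUC": 2 AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


# Toy labels and discriminant values (assumed)
y_true = [0, 0, 1, 1]
perfect = gini(y_true, [0.1, 0.2, 0.8, 0.9])    # positives all ranked higher
constant = gini(y_true, [0.5, 0.5, 0.5, 0.5])   # uninformative predictions
print(perfect, constant)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small check but too slow for large test sets; a rank-based computation would be preferable there.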
SANTE
MEDECINE
SECOURS
  FRIEND
https://codalab.lri.fr/competitions/112
 Predicting the price at which a house will sell helps people sell their property at a fair price. This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. This is a regression problem. The dataset contains 19 house features plus the price and the id columns, along with 21613 observations.

Your score is the R-square = 1 - MSE / var(Y). It is 0 for the baseline method that predicts the average target value. It is 1 for perfect guesses. It can be negative if your predictions are worse than the average target value!
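The three cases mentioned above (0 for the mean predictor, 1 for perfect guesses, negative for worse predictions) can be checked with a short sketch of the formula. The function name and the toy prices are assumptions for illustration, and var(Y) is taken here as the variance of the true targets in the evaluated set.

```python
def r_square(y_true, y_pred):
    """R-square = 1 - MSE / var(Y), with var(Y) computed on y_true."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n
    return 1.0 - mse / var_y


# Toy sale prices in thousands (assumed values)
prices = [200.0, 300.0, 400.0, 500.0]
baseline = r_square(prices, [350.0] * 4)                  # predicts the mean
perfect = r_square(prices, prices)                        # exact predictions
worse = r_square(prices, [500.0, 400.0, 300.0, 200.0])    # reversed order
print(baseline, perfect, worse)
```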
SOLIDARITE
AMITIE
FRATERNITE
EGALITE
  ECOLO
https://codalab.lri.fr/competitions/100
 Pollution, or the introduction of different forms of waste materials into our environment, has negative effects on the ecosystem we rely on. With modernization and development in our lives, pollution has reached its peak, giving rise to global warming and human illness. This is a regression problem. The goal of this challenge is to predict the NOx levels in the air in Northern Taiwan, which are an indicator of pollution. The dataset was initially provided by the Environmental Protection Administration, Executive Yuan, R.O.C.

Your score is the R-square = 1 - MSE / var(Y). It is 0 for the baseline method that predicts the average target value. It is 1 for perfect guesses. It can be negative if your predictions are worse than the average target value!
VERDURE
NATURE
  CREDIT
 This challenge deals with a fundamental task in the financial industry: credit scoring. In plain English, it means deciding whether or not to grant credit to someone, depending on her/his historical financial record. This is a binary classification problem. The data set contains 150000 instances split into 2 classes, where each class refers to the seriousness of a client over two years.

Your score is the Gini or "normalized AUC": 2 AUC - 1. AUC stands for Area under ROC curve. Make numerical predictions for test samples that are larger for the positive class and smaller for the negative class (discriminant values). Random guesses give a score close to 0 while perfect predictions give a score of 1.
CROISSANCE
HONETETE
AUDACE

Acknowledgements: These challenges were generated with ChaLab and are hosted by CodaLab. We received a grant from the FCS Paris-Saclay and sponsorship from Microsoft Azure for Research.
Auto-sklearn performances

Challenge   Score (validation set)
VISION      0.8153
BIOMED      0.7098
FRIEND      0.8507
ECOLO       0.8546
CREDIT      0.4499

Sample competition Abstract Task
 Iris
 This is the well-known Iris dataset from Fisher's classic paper (Fisher, 1936). The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

The problem is a multi-class classification problem. Each sample (an Iris) is characterized by its sepal and petal width and length (4 features). You must predict the Iris categories: setosa, virginica, or versicolor. 


Mini Challenges 2017

 Challenges (M2 students)  Abstract  Task  Solution (Video L2 students)
 Blue


 Activity of molecules against HIV

The problem is to relate molecular structure to activity in order to screen new compounds before actually testing them with High Throughput Screening (HTS) in vitro experiments. HTS is a method for massive scientific experimentation used in drug discovery, linking the fields of biology and chemistry. This method remains a very costly process despite many recent technological advances in the field of biotechnology. This is why applying machine learning methods would be of great benefit to the pharmaceutical industry, by reducing the number of compounds that need to be tested.
 The objective is to predict which compounds are active against HIV infection. The dataset has two classes: active or inactive (binary classification). The variables represent properties of the molecule inferred from its structure.
Note: this project is running on the LRI server. In case of problems, a previous version on the main Codalab instance is available.
 Marine

 Cobalt

 Cyan
 Lothlorien
This challenge addresses the issue of age-based access to resources (websites, drug purchases, violent movies, etc.). Indeed, a lot of violent content is accessible on the internet and 45% of children under 12 are not monitored by parental control. To this end, we rely on a person's real-time image to estimate their age category. Facial aging effects are mainly correlated with bone movement and growth, skin wrinkles and reduction of muscle strength. Since human observation lacks accuracy, we want to find an automatic algorithm to make this distinction.
 A computer vision challenge is proposed for undergraduate students in which the challenger must predict the class of a person (adult or minor) based on a picture of his/her face.
Note: the main Codalab instance of this challenge has been tested.
Note: this project is running on the LRI server. In case of problems, a previous version on the main Codalab instance is available.
 Cerulean

Turquoise
 Green
 Ecocity
Help SimCity's mayor fight pollution and traffic jams by optimizing the city's bike rental system!
SimCity's mayor has invested a lot of money to fight pollution and reduce traffic jams. Her first action was the purchase of a bike rental system. To improve the system, she wishes to predict the number of bikes rented at each station at any moment of the day using weather data.
 The challenge is to use weather data (temperature, humidity, cloud cover) to predict the number of bikes rented at a given station on a given day. To make the challenge more interesting, predictions are requested for either the morning or the afternoon.
Note: this project is running on the LRI server. In case of problems, a previous version on the main Codalab instance is available.
 Grass-Pistachio
 Yellow
 Movie recommendation

Currently, there is more and more music to listen to, and there are more and more movies to watch and things to buy on the Internet. Therefore, developing systems that help users find items they may like is crucial. Recommending items is different from "classical" machine learning, where you only have to predict a class given several features. Recommendation implies using predictions to recommend suitable items (in this case movies) to the right people. In addition, these preferences can sometimes evolve over time.
 In this challenge, you will work on the famous MovieLens dataset. The goal of this challenge is to predict, for a given user and a given film, the score that the user is most likely to award.
Note: There is also an LRI version. Warning: the two versions used different scores. They should now both use a_metric = 1 - MAE/MAD.
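For reference, a minimal sketch of a score of the form 1 - MAE/MAD follows. It assumes that MAE is the mean absolute error of the predictions and that MAD is the mean absolute deviation of the true ratings around their mean; that reading of MAD, the function signature and the toy ratings are assumptions, not taken from the challenge's scoring program.

```python
def a_metric(y_true, y_pred):
    """1 - MAE / MAD (MAD assumed to be the mean absolute deviation of y_true)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n   # mean absolute error
    mad = sum(abs(t - mean_y) for t in y_true) / n              # mean absolute deviation
    return 1.0 - mae / mad


# Toy movie ratings on a 1-5 scale (assumed values)
ratings = [1.0, 2.0, 4.0, 5.0]
baseline = a_metric(ratings, [3.0] * 4)              # always predicts the mean rating
good = a_metric(ratings, [1.5, 2.0, 4.0, 4.5])       # small errors on two ratings
print(baseline, round(good, 4))
```

Under these assumptions the score behaves like the R-square with absolute values in place of squares: predicting the mean rating scores 0 and perfect predictions score 1.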
 Gold

 Lemon

 Vanilla
 Orange
 Pick The Sneak Peek
In 2000, 60,234 titles (movies and TV shows) were released, according to IMDb. In 2010 there were 165,830 titles, and in 2016, 190,275. The number of releases keeps increasing, and the databases that aggregate this data need more information in order to expand.
 This is a text processing challenge.
The idea is to facilitate the genre labeling of movies from their summaries and thus to help with categorization of the movies database.
Note: this project is running on the LRI server. In case of problems, a previous version on the main Codalab instance is available.
 Salmon

Tangerine


 Red

 The Godfather returns!
After last year's purge accomplished by Batman, the Godfather has returned and is looking for new talent, the best criminals in SF, so that crime organizations can prosper again and return to their golden age. To assess the recruits' abilities, records of their previous crimes in the San Francisco Bay Area are being investigated, background checks are being conducted on the candidates' résumés, and software is being developed to highlight criminals' potential.
 The goal is to design software to predict, for each criminal record, the category of crime. If the candidate's crime falls into the category that the Godfather needs, he will be recruited!
Note: No LRI implementation so far.
 Magenta

 Cherry

 Coral




Auto-sklearn performances

Challenge   Score
Blue        0.4020
Cyan        0.5863
Green       0.5118
Yellow      0.6747
Orange      0.3509
Red         0.4850


Sample competition  Abstract  Task
 Iris
 This is the well-known Iris dataset from Fisher's classic paper (Fisher, 1936). The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
You may download the Codalab bundle of this challenge, which serves as competition template (uploadable to Codalab). This example can also be created with the ChaLab wizard.
The problem is a multi-class classification problem. Each sample (an Iris) is characterized by its sepal and petal width and length (4 features). You must predict the Iris categories: setosa, virginica, or versicolor. 


Mini Challenges 2016
 Challenges (M2 students)  Abstract  Task  Solution (L2 students)

https://competitions-test.codalab.org/competitions/1104?secret_key=d2ee4b0c-c41a-4071-a9ed-fc93fc0c054e
Less time in hospital
Diabetes will be the seventh most common cause of death in 2030 according to the World Health Organization. In 2014, global prevalence of diabetes was estimated to be more than 9% among adults aged 18+ years. While most hospitals have the necessary medical equipment to treat this disease, some do not. The task is a binary classification problem. Using the train set, it consists in predicting the length of stay of a patient given their diagnosis and medications. The label has two categories: a stay shorter than 7 days or a stay of 7 days or more.
 Video Microbes 1
 Video Microbes 2
Restaurants
We propose a restaurant recommendation challenge: predict the rating a particular user would give to any restaurant. We have very detailed information about the restaurants, such as geographical information, number of stars, reviews, etc., and for each person a list of some restaurants he visited and his personal ratings. The participants will work on two main tasks:
Task 1: Select the most prevalent features in the three datasets:
Task 2: Improve the prediction results using other methods and enrich the training dataset with Yelp data.
 Video Fin Gourmets
 Eye robot

Robots are taking up more space in society every day, and soon they may be walking in the streets among us. Many problems need to be solved before that, and one of them is adaptation. An AI needs to adapt its vision of the world: when it sees an entity for the first time, it should be able to tell whether it is a domestic animal, a predator, a vehicle or maybe another robot. That is where transfer learning comes in: extracting general features from specific examples of a group makes it possible to efficiently classify unknown entities. The idea of the challenge is to learn how to separate distinct classes of images. Precisely, we consider different superclasses, like "aquatic animals", each containing several classes, like "dolphin", and the goal is to tell these superclasses apart.
 Video EyeRobot 1
 Video EyeRobot 2
 
Crimes in Gotham City
Batman is fighting on the front line to deliver Gotham City from evil crimes. Now he and his team want to create a system to increase their working efficiency. They have crime data for Gotham City from recent years, collected from the GCPD and Batman's database. The data include the location, the time and some other information about each crime. Some crimes have been solved, others have not. The main goal of this project is to help Batman develop this system, in other words, to classify crimes. You can treat it as a binary classification problem: predict whether a crime can be solved or not. You can also first use logistic regression to compute how likely a crime is to be solved. Batman can then use this system to prioritize the crimes.
 Video Batman 1
 Video Batman 2
 Video Batman 3
 Textasie
In this project you will tackle the problem of opinion mining in movie reviews with a basic set of techniques used in text classification. Many sentiment-analysis methods for the classification of reviews use training and test data based on star ratings provided by reviewers. However, when reading reviews it appears that the reviewer's rating does not always give an accurate measure of the sentiment of the review. The objective of the challenge is to determine the polarity of an opinion from raw text. Since this is a challenge for beginners, you will only focus on classifying opinions as positive or negative. You could go further and detail sentiments like happiness, sadness or satisfaction, but this is not our goal in this contest.
 Video Textasie