Home

This web site gathers ideas about creating challenges in Machine Learning following the principles of "coopetition".

What is a coopetition?

According to Wikipedia, "coopetition" is a neologism coined to describe cooperative competition. Coopetition is a portmanteau of cooperation and competition. The notion of coopetition has been used in game theory, econometrics, and political science to describe systems in which agents have a partial congruence of interest and cooperate with each other to reach a higher value than by merely competing.

We want to use the notion of coopetition in Machine Learning by organizing challenges in which participants collaborate to some extent, with the objective of improving the challenge outcome (the best solutions to the proposed task). Challenges in Machine Learning are typically built around a solely competitive, winner-take-all model in which mutually exclusive teams try to independently solve a problem to win a prize. In some challenges in which team mergers occurred or were encouraged (e.g. the Netflix challenge), performance leaps were obtained. There is value in coopetitions to encourage a broader base of participants with complementary skills to contribute.

Two means of implementing a notion of coopetition in Machine Learning challenges have been used in the past:
- Running recurring challenges: By organizing a competition on the same theme, typically every year, the organizers alternate competition and cooperation. Cooperation is achieved by disseminating the results of each competition by means of workshops and proceedings. The organizers may require the winners to make their code available as "open source" to qualify for their prizes. Examples of recurring challenges include TREC organized by NIST (since 1992, originally focused on text information retrieval, later expanding to other areas such as video processing), CASP (since 1994, focused on protein structure prediction), RoboCup (since 1997, a robotics competition), DREAM (since 2007, on the theme of DNA microarray analysis), VOC (2005-2012) and ImageNet challenges (since 2010), both focusing on image recognition and sometimes run jointly, and the ChaLearn gesture challenges (since 2011, on multi-modal gesture recognition).
- Encouraging coalitions/team mergers: The organizers provide on-line feed-back on progress made and a discussion forum to let competitors freely exchange ideas, and they authorize the merger of teams. In the Netflix prize, the challenge protocol imposed an absolute performance threshold to be exceeded to win the grand prize. This could only be achieved through a team merger, after a long period of performance stagnation.

However, to our knowledge, there has not yet been any formal attempt by competition organizers to encourage collaboration between competitors by structuring the rules of the challenge in a particular way and/or facilitating information exchange. Additionally, "classical ways" of encouraging collaboration (with recurring challenges and team mergers) do not provide a means of rewarding participants for partial contributions: each team still has to solve the task(s) of the challenge end-to-end, and only the participants ranking at the top at the end of the challenge win a prize.

We are interested in encouraging two types of partial contributions:
- Sub-task contribution: A team may contribute a module or a key idea, which, alone, is insufficient to get good overall performance. In a complex challenge, the complementarity of domain-specific skills is welcome.
- Snowball effect: Teams that break the ice by entering early in the competition or that make a sudden leap in performance make an important contribution:  they attract interest and push all other teams to match the new best result (which often happens within hours).
To reward participants for partial contributions, we need to work both on changing the typical rules of machine learning challenges and on adding more flexibility to challenge platforms.

Insights from game theory

We found that it is useful to use concepts developed in game theory to put a formal framework around the design of coopetitions. Game theory provides a framework to reason about problems in economics, political science, psychology, and biology that goes well beyond the study of recreational games. It can be thought of as an extension of decision theory to systems of intelligent rational decision-makers.

The PAPI game setting
According to Eric B. Rasmusen in his book "Games and Information":
"The essential elements of a game are players, actions, payoffs, and information– PAPI, for short. These are collectively known as the rules of the game, and the modeller’s objective is to describe a situation in terms of the rules of a game so as to explain what will happen in that situation. Trying to maximize their payoffs, the players will devise plans known as strategies that pick actions depending on the information that has arrived at each moment. The combination of strategies chosen by each player is known as the equilibrium. Given an equilibrium, the modeller can see what actions come out of the conjunction of all the players’ plans, and this tells him the outcome of the game."

From our point of view, challenges are games and challenge participants are players. The task of the organizers is to devise appropriate rules and facilitate game playing. To benefit from the insights of game theory, we first need to fit the challenge settings into the game-theoretic framework. Below we list the "basic" PAPI challenge setting, with more elaborate considerations in square brackets:
- [P]layers = challenge participants [the organizers may be players to the extent that they make entries in the game to stimulate it (benchmark or baseline entries) but they are not entitled to prizes; other actions of the organizers such as changing the data or the rules in the middle of the challenge can be construed as making them players, but these are cases that should generally be avoided, if possible]
- [A]ctions = challenge entries [other actions may be possible, including sharing knowledge, data, and software]
- [P]ayoffs = prizes [other possible payoffs include: learning about new problems and solutions, gaining visibility (academic credit, jobs), getting work dissemination opportunities (workshops, publications), having fun, meeting new people]
- [I]nformation = results rated on a leaderboard [other information may include additional data acquired during the challenge, knowledge gained from the literature or other sources, software available]

Example of basic PAPI challenge setting
The simplest types of Machine Learning challenges adhere exactly to the basic PAPI mapping. Here is a typical example forming the basis of some popular Kaggle challenges:
- Consider a classification problem e.g. credit scoring. The data consists of a table in which rows are samples (e.g. individuals or companies needing access to credit) and columns are attributes/features (e.g. socio-economic information such as age, gender, zipcode, revenue, etc.). One column needs to be predicted (e.g. whether the person will experience financial distress within the next 2 years).
- Divide the data table into a training set for which the values of the column to be predicted are provided (ground truth) and a test set for which the values of that column are withheld (the goal of the challenge is to predict them accurately).
- Define a scoring function (e.g. the prediction success rate or the Area under the ROC curve -- AUC).
- Take a random fraction of the test data to provide immediate performance feed-back to the participants on a "public leaderboard" during a development phase [note: rather than using a fraction of the test set to give feed-back, most challenge organizers prefer using a separate validation set to avoid biasing the final results].
- Base the prize attribution on performances computed with the entire test set, concealed on a "private leaderboard" revealed only at the end of the final test phase [note: the final test phase gives time to the participants to submit results on test data, but it may coincide with the end of the development phase if the test data are available from the start of the challenge].
- Give prizes of decreasing values to the 3 top ranking participants (e.g. $3,000 for first, $1,500 for second and $500 for third).
This follows the basic PAPI setting: the players are the challenge participants (the organizers do not intervene during the game; they may only provide a baseline method and associated baseline performances to bootstrap submissions); the only actions of the players are the challenge entries; the only (explicit) payoffs are the prizes (not considering other possible types of rewards, such as Kaggle points, see below); the only information available to the participants during the challenge is the participants' performance on the public leaderboard.
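As an illustration of this setting, the sketch below scores a hypothetical submission against a public/private split of the test set. The column semantics, the split sizes, and the use of scikit-learn's AUC are our own assumptions for the example, not a prescription of any particular platform:

```python
import numpy as np
from sklearn.metrics import roc_auc_score  # the AUC scoring function mentioned above

# Hypothetical data: y_test is the withheld ground truth of the test set
# (e.g. "financial distress within 2 years"), predictions is a submitted entry.
rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=1000)
predictions = rng.random(1000)

# Split the test set into a "public" part (leaderboard feedback during development)
# and a "private" part (revealed only at the end, used for prize attribution).
public_idx = rng.choice(1000, size=300, replace=False)
private_idx = np.setdiff1d(np.arange(1000), public_idx)

public_score = roc_auc_score(y_test[public_idx], predictions[public_idx])
private_score = roc_auc_score(y_test[private_idx], predictions[private_idx])
print(f"public leaderboard AUC:  {public_score:.3f}")
print(f"private leaderboard AUC: {private_score:.3f}")
```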

Rules as controllable factors to influence the challenge outcome
From a game-theoretic point of view, the purpose of modeling is to explain how a given set of circumstances or factors leads to a particular outcome. We consider that the rules of the game are factors that the organizers have control over, but there may also be other circumstances that the organizers do not have control over, some of which may be unpredictable or even unknown. In the case of a challenge, we list some possible factors and outcomes, with the most "basic" ones stated first and elaborations in square brackets:
- Controllable factors = challenge rules [including challenge start, challenge duration, number of submissions per day, number of final submissions, amount of prizes, conditions to qualify for prizes such as releasing source code, location and date of the possible workshop where the results will be discussed, proceedings venue]; besides defining the rules, the organizers can influence the outcome by providing help to the participants (collectively, not individually) through tutorial material, answers to questions on a forum, etc.; another controllable choice is the platform on which to run the challenge.
- Uncontrollable but predictable factors = known concurrent events quantitatively or qualitatively affecting participation [holidays, conferences, and other events that will happen during the course of the challenge; to some extent, the qualification of the participants can be predicted from the choice of platform, workshop venue, etc.].
- Unpredictable but known factors = availability of resources for participants entering the challenge [e.g. decrease or increase in level of Government funding or other economic factors; change in licensing policy or support for critically needed software].
- Unknown factors = personal factors affecting individual participants [sudden sickness, change of job, etc.].
- Outcome = the final solutions of the winners [or the software, algorithms, or principles, leading to such solutions]. 

Basic challenge rules
The primary role of the organizers is to optimize the outcome by appropriately defining the challenge rules (and possibly choosing other controllable factors). Basic challenge rules do not give a lot of leeway to challenge organizers. The choices typically include only:
- challenge start
- challenge duration
- number of submissions per day
- number of final submissions per team
- amount of prizes
- licensing conditions of the data and solution
- choice of workshop and proceedings venue
The participants are organized in mutually exclusive teams determined at the start of the challenge and private information exchange between teams is forbidden.
With the challenge start and end dates, one can indirectly control some of the uncontrollable participation factors due to scheduled calendar events. The number and type of participants can also be indirectly influenced by licensing conditions and the choice of workshop and proceedings venue.
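To make these choices concrete, the handful of basic rules listed above could be captured in a small configuration record, as in the hypothetical sketch below (field names and example values are ours, not those of any particular platform):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BasicChallengeRules:
    """Hypothetical container for the basic, low-leeway challenge rules."""
    start: date
    duration_days: int
    submissions_per_day: int
    final_submissions_per_team: int
    prizes_usd: tuple                 # e.g. (3000, 1500, 500) for the top three teams
    data_license: str
    solution_license: str
    workshop_venue: str
    proceedings_venue: str

rules = BasicChallengeRules(
    start=date(2014, 6, 1), duration_days=90,
    submissions_per_day=5, final_submissions_per_team=2,
    prizes_usd=(3000, 1500, 500),
    data_license="research use only",
    solution_license="open source required to qualify for a prize",
    workshop_venue="example workshop", proceedings_venue="example proceedings",
)
```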

Challenge design as a meta-game
Challenge organization can be thought of as a meta-game (the development of the rules of another game). 

One role of the organizers is to prevent cheating with an appropriate challenge design. Some participants think of challenges as adversarial games, participants against organizers, in which the goal is to defeat the organizers by finding a flaw in the challenge design that either ensures that they win or invalidates the outcome. Cheating prevention mechanisms include post-challenge cross-verifications: (1) the participants submit their software before the test data are released; (2) the participants and the organizers run the software on the test data and check that they obtain the same result.

It is interesting to study what possible gaming strategies are available to the challenge participants. For instance, it has been observed that some of the top ranking participants do NOT submit results to the public leaderboard until the very end of the challenge to withhold information from the rest of the participants on their current standings.

Our goal is to consider challenge design as an overall optimization process and provide challenge organizers with powerful design tools to increase the efficiency of challenges in producing successful outcomes. In particular, we want to address the inherent limitations imposed by rules preventing or discouraging collaboration between participants.

Game-theoretic vocabulary
To gain more insight from game theory, it is useful to characterize various challenge settings using the vocabulary used in the field. Challenges with basic rules (described above) are classified as non-cooperative games, population games, finite time horizon games, constant-sum games, and symmetric games. They are also imperfect information games and sequential games. We now explain why, what changes could be made, and what benefits could be derived, particularly in terms of fostering collaboration between participants.

Non-cooperative games: Challenges with basic rules (described above), in which pre-registered mutually exclusive teams participate and private exchanges between teams are forbidden, are usually non-cooperative games. We discuss below several deviations from the basic rules that turn challenges into cooperative games, including allowing teams to freely re-arrange or permitting private exchanges between teams.

Population games: Challenges are population games. The number of players is usually unlimited: the more the better (from the point of view of the organizers). Very popular challenges may put some stress on the computer server and require implementing some congestion control mechanism for entry submissions. Usually, the number of entries per day is limited. Because of the competitive nature of challenges, the participants are not encouraged to recruit others if this means that their expected reward will decrease (e.g. the probability of winning a prize). This is alleviated when a point system is implemented. For instance, Kaggle attributes points to participants using the formula:
Competition Points = (100000 / #Team Members) * (Team Rank)^(-0.75) * log10(#Teams) * (2 years - time elapsed since deadline) / (2 years)

Hence the participants of popular competitions earn more points, which gives them an incentive to recruit more participants.
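As a quick worked example, here is a Python transcription of the formula above. The function and argument names are ours, and we read the last factor as a linear decay to zero over two years:

```python
import math

def kaggle_competition_points(team_rank, n_team_members, n_teams, years_since_deadline):
    """Transcription of the points formula above: points grow with the number of
    competing teams and shrink with rank, team size, and elapsed time
    (linear decay to zero over a 2-year window)."""
    decay = max(0.0, (2.0 - years_since_deadline) / 2.0)
    return (100000.0 / n_team_members) * team_rank ** (-0.75) * math.log10(n_teams) * decay

# Example: a solo winner of a 1000-team competition, right at the deadline.
print(kaggle_competition_points(team_rank=1, n_team_members=1,
                                n_teams=1000, years_since_deadline=0.0))  # 300000.0
```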

Finite horizon games: Challenges having a pre-determined start and end, independent of the actions of the players (participants), are finite horizon games. Organizational logistics motivate finite horizon challenges, and organizers usually prefer running recurring challenges to leaving the challenge termination date open. This also allows improving the datasets, the definition of the tasks, and/or the rules. However, the "Netflix prize" is an interesting exception: the rules did not specify an end date. Rather, winning the grand prize was conditioned on achieving a root mean squared error or "RMSE" lower than a certain value. This rule favored collaboration towards the end of the challenge, when teams resorted to forming coalitions to win.

Constant sum games: Challenges usually have a pre-determined prize pool, hence participants compete for the same finite constant rewards (there are winners and losers). In contrast, real economies are often interpreted as non-constant-sum games in which players can find win-win strategies (yielding economic growth). Non-constant-sum games favor collaborations. For instance, the ChaLearn Active Learning challenge had a special prize scheme to encourage the participants to enter the challenge on more than one dataset: a team that won on N final test datasets would earn USD 100*2^(N-1) (if you won on 1, 2, 3, 4, 5, 6 datasets, you would earn USD 100, 200, 400, 800, 1600, 3200, respectively). Hence it was advantageous to team up with people having better results on specific datasets (but interestingly, the participants did not really take advantage of this, showing that prizes may not be as much of an incentive as getting full credit for winning).

Symmetric games: Challenges are usually symmetric games: all players are in the same situation and can adopt the same strategy to win. However, there are examples of challenges in which there are competing tracks, each track adopting a different strategy. For instance, in the ChaLearn Agnostic Learning vs. Prior Knowledge challenge, the participants competed on 5 datasets in 2 tracks: in the agnostic track, they used data preprocessed into a simple low-level feature representation with no knowledge of the meaning of the features; in the prior knowledge track, they had access to the raw data with all available information on the data representation. Challenges organized as non-symmetric games may encourage collaboration within each track by instilling a feeling of solidarity to beat the other track.

Imperfect information games: Challenges usually stimulate participation by providing feed-back to the participants during the development phase on a "public leaderboard" displaying the ranking on either a random subset of the final test set or a separate validation set. The actual performance on the test set used for the final ranking remains hidden (to prevent the participants from adjusting their methods to the test data and getting biased results).

Sequential games: Challenges providing feed-back on a "public leaderboard" are sequential games: during the development phase, the participants may submit results asynchronously whenever they want, often in response to the improvements made by other participants displayed on the "public leaderboard". Note however that the final phase (evaluation on test data) is simultaneous. Some challenges do not provide any feed-back to the participants during the development phase (such as the TREC challenges) and can be classified as simultaneous games.

The previous analysis indicates that, except for withholding information (by not making submissions during the development period), there is not much room for strategy in challenges that make use of basic rules. We propose to change this by introducing more flexible (yet simple) challenge rules.

Coopetitions 

As we have seen in the previous section, it is easy to introduce some form of cooperation between participants, either by allowing them to freely reconfigure their teams during the competition or by letting them privately exchange information. However:
- what incentive will the participants get to cooperate?
- how can we know that such changes in rules will improve the challenge outcome?

We want to introduce new designs that have not been tried yet, but would provably yield better outcomes. To that end we will change the basic PAPI settings:
- [P]layers = the challenge participants (no change), but asymmetries will be introduced (not all participants will necessarily have the same role).
- [A]ctions = challenge entries (no change), but the entries will not be limited to prediction results; they will include shared resources (data, knowledge, software).
- [P]ayoff = credit points earned, spent, and redeemable in various ways (instead of simple prizes for winning).
- [I]nformation = results rated on a leaderboard + (new) resource ratings.
To make the new designs most comparable to previous designs, the old designs will always be special cases of the new designs.

A new PAPI vocabulary
Existing challenge platforms are too restrictive to implement coopetitions: the participants' actions are generally limited to submitting prediction results.
We want to use the capabilities of Codalab, a new platform under development, to provide competition organizers with a broader PAPI vocabulary. The main benefits of Codalab are derived from the possibility of submitting generic "bundles" as challenge entries. A bundle is an archive that may contain data (prediction results being only a particular case) and code (executable or source code). Microsoft is providing access to its cloud computing facility Azure so that code bundles can be executed when submitted to the platform. Bundles can also be thought of as re-usable modules, which may be shared among participants. The present alpha version of Codalab is open source software written in Python, which can be run either under Windows or Linux. It supports executable bundles written in Python, but the plan is to extend support to any type of executable code (interpreted or compiled), isolating each user on a different virtual machine. A bundle's execution time is limited by a maximum value and there is a limit on the number of submissions per day.

We first describe a number of building blocks that will constitute the new PAPI vocabulary of challenge design, beginning with the new type of actions.

Codalab [A]ctions
The bundle system offers to the participants a flexible way to share data and code, and we hope that the participants will self-organize and propose new ways of exploiting the bundle concept that we have not anticipated. However, for concreteness and to help the participants get started, we propose a few generic types of bundles (a minimal code sketch of a model bundle follows this list):
- Data bundles: include either input data (training, validation, or test data) or prediction results.
- Model bundles: include executable code for predictive models, with, in particular, a training program and a test program, and data structures with hyperparameters and parameters (for trained models).
- Wrapper bundles: include groups of other bundles, but behave like model bundles (in particular, they include training and test programs).
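For illustration only, a model bundle could expose a train/test interface along these lines; the class and method names are our own assumptions, not something Codalab prescribes:

```python
import pickle

class ModelBundle:
    """Hypothetical skeleton of a model bundle: a training program, a test
    (prediction) program, and data structures for hyperparameters/parameters."""

    def __init__(self, hyperparameters=None):
        self.hyperparameters = hyperparameters or {}
        self.parameters = None                    # filled in by train()

    def train(self, X, y):
        """Training program: fit parameters on input data (trivial example here)."""
        self.parameters = {"mean_target": sum(y) / len(y)}
        return self

    def test(self, X):
        """Test program: produce predictions from the trained parameters."""
        return [self.parameters["mean_target"]] * len(X)

    def save(self, path):
        """Serialize hyperparameters and parameters, as a trained bundle would."""
        with open(path, "wb") as f:
            pickle.dump({"hyper": self.hyperparameters, "params": self.parameters}, f)
```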

The organizers will supply:
- input data bundles (for training and testing models), 
- some example model bundles and 
- some generic wrapper bundles. 
The participants will supply new models and wrapper bundles. Codalab can be thought of as an operating system allowing the execution of bundles.
The wrapper bundles (wrappers for short) supplied by the organizers will be sufficient to implement hyper-models consisting of a Directed Acyclic Graph (DAG) of models. The wrappers will include the following (a code sketch follows the list):
- Chains: a chain is a sequence of bundles to be executed one after the other, the results of one feeding inputs to the next one.
- Ensembles: an ensemble is a group of models that may be executed in parallel, using all the same input, and combining their results to produce a common output. The method of combination may include a voting scheme or a selection of the most promising result (hence ensemble wrappers have model selection wrappers as a special case).
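A minimal sketch of how chain and ensemble wrappers could compose model bundles of the kind sketched above; the class names and the use of simple averaging as the combination method are our own assumptions:

```python
class Chain:
    """Wrapper bundle: run member bundles in sequence, the output of one
    feeding the input of the next."""
    def __init__(self, members):
        self.members = members

    def train(self, X, y):
        data = X
        for m in self.members[:-1]:
            m.train(data, y)
            data = m.test(data)          # intermediate output feeds the next stage
        self.members[-1].train(data, y)
        return self

    def test(self, X):
        data = X
        for m in self.members:
            data = m.test(data)
        return data


class EnsembleVote:
    """Wrapper bundle: run member bundles in parallel on the same input and
    combine their outputs (simple averaging is used here for illustration)."""
    def __init__(self, members):
        self.members = members

    def train(self, X, y):
        for m in self.members:
            m.train(X, y)
        return self

    def test(self, X):
        outputs = [m.test(X) for m in self.members]
        return [sum(vals) / len(vals) for vals in zip(*outputs)]
```

With these two wrappers, the Left and Right submissions shown in the pseudo-code of Figure 1 below read as EnsembleVote([Chain([...]), Chain([...])]) and Chain([...]), respectively.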

Other user-defined wrapper bundles may be neither chains nor ensembles. For instance, a Hidden Markov Model may be implemented as a wrapper taking any predictive model (such as a neural network) as sub-routine to score the local "emission probabilities" (or the probability that a given frame "belongs" to a given state).

[A]ctions in the Codalab submission system will consist of submitting wrapper bundles solving the end-to-end task of the challenge. Such submissions are "hyper-models" consisting of combinations of chains and ensembles of models (that are themselves bundles). The challenge outcome can itself be thought of as a giant hyper-model including all the participants' submissions grouped in an ensemble selecting the winner. Entries in the challenge contribute new nodes in the challenge outcome DAG (see Figure 1). Although we do not want to enter into the details of a possible caching mechanism, we envision that already calculated intermediate data bundles may be re-used by participants. Likewise, re-used bundles may be referenced rather than copied.

To enable bundle sharing, the participants will have the choice of keeping new contributed bundles "private" or making them "public" and requesting payment (in the form of licenses or royalties using credit points, see [P]ayoff). Within a Codalab submission consisting of a wrapper bundle solving the end-to-end task of the challenge, any NEW contributed bundle may be tagged with a price, if the participant wants to share it for a fee.

Challenge outcome

Figure 1: Example of challenge outcome DAG. Each submission made by a participant is a bundle. Bundles may be wrappers containing other bundles. In this example, data bundles are represented as blue squares and model bundles as ovals. Model bundles are color coded: green for those supplied by the organizers, blue and orange for those supplied by the participants; orange models are private while blue models are public (may be used by other participants for a fee paid to the author in credit points). Two submissions were made in sequence: Left and Right (the left one first). They can be represented as pseudo-code: 
Left = Ensemble_vote( { Chain({Normalize, FB, PCA, ZARBI}), Chain({Normalize, PROD, RF}) } ) and 
Right = Chain({PREPRO, Chain({FS, ZARBI}) })
The challenge outcome is then obtained with the model Ensemble_select( {Left, Right} ). Left contains Normalize twice, as part of two chains both leading to "Preprocessed data 1". For simplicity, our scripting representation does not take this into account, but obviously "Preprocessed data 1" could be cached.
A typical usage would be to execute the training algorithms of Left and Right on training data, then execute the test algorithms of Left and Right on validation data and post the performance on validation data on the "public leaderboard". At the end of the challenge, if Left and Right are final entries, the organizers would run the test algorithms of Left and Right on test data and select the winner(s).
The participants may assign prices to new bundles that they contribute and make public. For instance, the bundle for model ZARBI contributed by the first participant in the Left entry was made public. The second participant uses it in the Right entry and therefore must pay a fee. The second participant made the bundle Chain({FS, ZARBI}) public. If someone else uses it, they must pay a fee to participant 2, who will in turn re-pay participant 1 for the use of ZARBI.
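The recursive re-payment described above (anyone using Chain({FS, ZARBI}) pays participant 2, who in turn re-pays participant 1 for ZARBI) could be implemented along these lines; the prices and the dictionary representation of public bundles are purely illustrative:

```python
from collections import defaultdict

def charge_for_use(bundle, accounts, payer):
    """Charge `payer` for using a public bundle, then propagate royalties to the
    authors of the public bundles that this bundle itself re-uses."""
    accounts[payer] -= bundle["price"]
    accounts[bundle["author"]] += bundle["price"]
    for sub in bundle["uses"]:
        # the author of the enclosing bundle pays, in turn, for the bundles it re-uses
        charge_for_use(sub, accounts, payer=bundle["author"])

# Figure 1 example: ZARBI is public (author: participant 1);
# Chain({FS, ZARBI}) is public (author: participant 2) and re-uses ZARBI.
zarbi = {"author": "participant 1", "price": 5, "uses": []}
fs_zarbi_chain = {"author": "participant 2", "price": 8, "uses": [zarbi]}

accounts = defaultdict(int)
charge_for_use(fs_zarbi_chain, accounts, payer="participant 3")
print(dict(accounts))
# {'participant 3': -8, 'participant 2': 3, 'participant 1': 5}
```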

Codalab [P]layers
As indicated before, the players are the challenge participants. New with Codalab, the players no longer need to be limited to "Humans": the players can be "Robots", i.e. software bundles making entries in the challenge automatically! One of our ambitions is to organize an Automatic Machine Learning challenge "Humans" vs. "Robots" in which Robots can submit challenge entries without any human supervision on tasks they have never seen before.

Codalab [P]ayoffs
Determining the most efficient reward scheme to obtain the best possible challenge outcome for a given prize pool will be one of the objects of research of this proposal. However, for concreteness, we make some preliminary suggestions.
We suggest rewarding participants both for participation (contribution to the final challenge outcome by actively working and making intermediary contributions during the development period) and for merit (best final performance on test data). The idea of credit for participation is not new: for instance, in the 2011-2012 ChaLearn gesture challenge, we offered a free Kinect camera to the first 10 entrants that outperformed the baseline method on the leaderboard. However, using credit points gives a lot more flexibility.

Credit points: a currency used during the development period
During the development period, the participants will receive credit points for:
- improving performance on the validation data (credit for improvement)
- usage of bundles that they authored and made public (credit for sharing).
The credit points may be used to purchase the right to use other participants' bundles. At the end of the challenge, credit points may be redeemable for cash or other prizes. 
One can think of several ways of computing credit points:
Credit for improvement: The amount of credit for improvement could be proportional to the increase in performance compared to the last best entry (performance computed on validation data). However, it may be preferable to make it proportional to the increase in performance of an ensemble of models based on the models submitted so far. The ensemble would average the predictions of the submitted models having yielded a performance improvement (following the award-winning method proposed by Richard Caruana et al.). This would reward participants who submit results early (when it is easier to make progress fast compared to the baseline method), reward the participants for submitting diverse results complementing each other, and reward participants for on-going efforts (keeping the participants interested in working on the problem).
We may want to authorize "private submissions", i.e. submissions whose score remains known only to the participant who made the submission and whose wrapper bundle architecture remains secret. Such submissions would not be granted any credit points.
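A possible implementation of credit for improvement along the ensemble-averaging lines described above; the choice of AUC as the score, the chance-level baseline, and the credit scale are our own assumptions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def credit_for_improvement(new_prediction, ensemble_predictions, y_valid, scale=100.0):
    """Credit a new submission in proportion to how much it improves the running
    ensemble (a plain average of accepted submissions) on the validation data.
    Returns the credit and the updated list of accepted predictions."""
    if ensemble_predictions:
        old_score = roc_auc_score(y_valid, np.mean(ensemble_predictions, axis=0))
    else:
        old_score = 0.5   # chance level for AUC; in practice the organizers' baseline
    candidate = ensemble_predictions + [new_prediction]
    new_score = roc_auc_score(y_valid, np.mean(candidate, axis=0))
    improvement = max(0.0, new_score - old_score)
    if improvement > 0:
        ensemble_predictions = candidate       # keep only submissions that helped
    return scale * improvement, ensemble_predictions
```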
Credit for sharing: We propose to let the authors of bundles set the price for bundle usage (in credit points). Several schemes can be envisioned: 
- royalties: pay x credit points for each run.
- non-exclusive license: pay x credit points for unlimited usage.
- exclusive license: the first participant to pay x can get this bundle for unlimited usage, to the exclusion of others.
- bidding: the author asks the other participants to make an offer.
There can also be different grades of information access to the bundles:
- black box: the bundle is only available for execution on the server and the only information provided is the input and output interface.
- gray box: same as black box, but a few hyper-parameters are available for tuning.
- clear box: the source code is provided.
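These sharing options could be summarized in a small listing record attached to each public bundle; the field names and example values below are ours:

```python
from dataclasses import dataclass

@dataclass
class BundleListing:
    """Hypothetical pricing/visibility record for a public bundle."""
    pricing_scheme: str                    # "royalties", "non-exclusive license",
                                           # "exclusive license", or "bidding"
    price_credits: int                     # x, the asking price in credit points
    access_level: str                      # "black box", "gray box", or "clear box"
    tunable_hyperparameters: tuple = ()    # only meaningful for "gray box"

zarbi_listing = BundleListing(pricing_scheme="royalties", price_credits=5,
                              access_level="black box")
```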

Merit prizes and credit assignment during the final phase
During the final phase, the organizers evaluate the submitted models on the final test data. Typically the top three ranking participants are awarded prizes. This poses the problem of fairly rewarding all the participants who contributed bundle modules that are part of the final winning models.
We propose to let the authors of bundles and their users negotiate the credit assignment a priori (that is, at the time of first use of the bundle during the development period). For instance, if a bundle by author A is used in a final submission by participant B and B wins a prize, A receives x% of the prize.

Codalab [I]nformation
In classical challenges, the only information provided during the development period is the scores on validation data published on the leaderboard.
With the system of credit points, more information can be provided to help the participants make strategic decisions (such as spending credit points to buy the right to use someone else's bundle). For instance, a profile may be available for each public bundle, including:
- the ID of the author;
- the date created;
- the distribution of performance of submitted models containing that bundle and some simple statistics of the distribution (median, best score);
- the price (in credit points) for: one run, an unlimited non-exclusive license, an unlimited exclusive license;
- the number of times the bundle was run so far;
- comments (description by the author, indication of whether this is a revised version of an older bundle);
- the architecture of the wrapper in which the bundle was submitted (e.g. the bundle ZARBI was submitted as part of the wrapper Left = Ensemble_vote( { Chain({Normalize, FB, PCA, ZARBI}), Chain({Normalize, PROD, RF}) } ) ) and the execution time of the bundle as part of that submission.
- the architecture of the wrapper of the best submission using that bundle and the execution time of the bundle as part of that submission.
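For illustration, such a bundle profile could be stored as a simple record (a sketch with field names of our own choosing):

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import median
from typing import List, Optional

@dataclass
class BundleProfile:
    """Hypothetical profile published for each public bundle."""
    author_id: str
    date_created: date
    scores_of_containing_models: List[float] = field(default_factory=list)
    price_per_run: Optional[int] = None
    price_nonexclusive: Optional[int] = None
    price_exclusive: Optional[int] = None
    times_run: int = 0
    comments: str = ""
    submission_architecture: str = ""      # e.g. the wrapper pseudo-code it was part of
    best_submission_architecture: str = ""

    def median_score(self):
        s = self.scores_of_containing_models
        return median(s) if s else None

    def best_score(self):
        return max(self.scores_of_containing_models, default=None)
```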
A profile may also be available for each team, including:
- the date the team first entered the challenge;
- the date of the last entry;
- the number of entries they have made;
- the distribution of performance of submitted (overall) models and some simple statistics of the distribution (median, best score);
- the architecture of their best overall model;
- the number of credit points they earned;
- the list of public bundles they provided (with pointers to the profiles of those bundles).
- the list of public bundles they have used.
The teams could be sorted on the leaderboard by credit points or by best scores.

Evaluating the advantages of coopetitions
What guarantee do we have that the new challenge designs will be effective in improving the competition outcomes? Qualitatively, there are four reasons why there will be improvements:
1) Search in hyper-model space: The participants will have the opportunity of trying modules proposed by others, hence they may come up with better overall solutions. In addition, the new setting encourages them to modularize their software, which will allow the organizers to conduct an even more systematic model selection after the challenge is over, searching the space of all possible model combinations.
2) Method complementarity: To obtain credit for improvement, the participants are encouraged to differentiate their methods because the amount of credit is proportional to the improvement brought to the ensemble of models submitted so far.
3) Increased participation: More participants will be encouraged to enter because (1) the barrier to entry is lower, since it suffices to contribute to the improvement of a single module; (2) prizes can be won for participating and contributing to the final solution, not just for winning.
4) Shared resources beyond the termination of the challenge: Every time a new challenge is run, the platform will be populated with new bundles, which will enrich the Machine Learning toolkit available on the platform, if the participants agree to make them available to new users (this may be a requirement for winning prizes).

Remark: the fact that the participants have to "pay" for using bundles they did not author themselves (except those provided by the organizers) discourages the use of unnecessarily large models.

Quantitatively:
- can we propose to conduct simulations to evaluate the benefits of various settings?
- are there theoretical results we could use to evaluate the benefits of various settings?


Theory of cooperative games:
http://www.math.ucla.edu/~tom/Game_Theory/coal.pdf

Miscellaneous ideas:
- Data labeling and crowdsourcing (Sergio Escalera):
- Multimedia problems:
    * Ambient intelligence
    * Spatio-temporal epidemiology
- Use of bundles for causality competitions

Comparison with other existing systems
- Model selection game (2006): Use of CLOP to submit hyper-model code executable in Matlab. Off-line verifications by the organizers.
- CEC competitions.
- Robocup.
- Causality workbench: submission of results or queries.

How to handle plagiarism problems if source code is distributed (for executable code, one may forbid downloading it; it will remain on the platform)?
How to handle bundle revision control?

John Langford's pointers:
1) The online learning against an adversary results are quite relevant for online (in the sense of an observe/predict/correct cycle) learning.
Avrim Blum's learning theory lectures, lecture 2: http://www.cs.cmu.edu/~avrim/ML07/  
2) The learning reductions theory provides guarantees about the quality of a complex solution in terms of the quality of individual subsolutions.
http://hunch.net/~reductions_tutorial/
3) There are some game-theoretic results on elicitability which are pretty relevant to answering questions of what is (theoretically) learnable.
Mark Reid's publications: http://mark.reid.name/work/pubs/ and Nicolas Lambert's publications: http://www.stanford.edu/~nlambert/research.html

The topic of coopetition is definitely interesting.  In many challenges where systems are creating scores, it's well-known that you can average results to get better performance.  Using this approach, you could create a system which automatically averages in results as they come in, assigning credit to each new result according to how much it improves the average.  This approach has the advantage that participants want to submit good results early, that participants want to submit diverse good results, and that credit for success is spread over many people rather than winner-take-all.  It would require that you start from a decent baseline as otherwise there is a gold-rush effect at the beginning.

Extra ideas of Sergio Escalera:
1) About the New game NSF project: I consider this project very interesting; as I said, it is a new level of dealing with challenges, with high impact for research and industry. I think a good idea would be to have a repository of registered teams with summaries of expertise so that the participants can contact other teams. This list would be very valuable. For instance, for European projects there are platforms where people publish multi-disciplinary project ideas and look for partners with complementary expertise to define strong consortia (e.g. http://www.idealist.org/). No doubt this could be a strong NSF project and useful for ChaLearn. I have one idea for a challenge scenario I would be happy to work on. See next.
2) About a challenge scenario for the New game NSF project: As I briefly explained in my previous email, ambient intelligence scenarios are very rich for research and have several potential real applications. For instance, in Europe we have the AAL program (Ambient Assisted Living, http://www.aal-europe.eu/), which funds millions of euros for ambient intelligence technologies. These kinds of scenarios are useful in health and tele-medicine; in particular, there is a current trend to help mentally and physically impaired people and to assist the elderly (the world population is getting older and not enough human resources are available to help them). In this sense, automatic monitoring environments to detect anomalies, dangerous situations, falls, or people with Alzheimer's who require reminders, or to alert the family if the Alzheimer's patient leaves the home, are potential applications. There are also applications in social signal processing: detecting group conversations, leadership, dominance, etc. in job interviews, measuring the quality of non-verbal communication, etc. I think we can define a very nice multimodal scenario that contains people and objects and labels social signals, objects, individual and collaborative gestures and behaviors, and user-object interaction, where context information will be very useful, drawing not only on visual but also audio, inertial, and other biometric sensors, even mobile signals (currently Barcelona is the SmartCities world capital and the Mobile World Capital). I think this can be a very useful scenario with several sub-modules for the NSF project. People from social signal processing, information fusion, computer vision, and machine learning can cooperate to solve different modules and provide a global solution to be tested in a very high impact challenge. Of course it is a very big project and many things require discussion. We now have a Kinect 2 device offered by Microsoft (Kinect 2 will be public in the middle of next year), so we may be the first to offer challenges with this new device. Just ideas… what do you think about that?
3) About the labeling task: This is not important for now, but it deserves to be discussed: labeling is one of the most arduous tasks in the organization of challenges. Antonio Torralba, a colleague from MIT (http://scholar.google.com/citations?user=8cxDHS4AAAAJ&hl=es), designed the LabelMe project (which already has more than 1000 citations) (http://labelme.csail.mit.edu/Release3.0/). I do not know if you are familiar with that project. The goal was to open visual RGB data to the community so that anyone can log into the project and provide labels for objects in different regions of the images. In this sense, big visual data is labeled by the whole CV community and then recognition analysis can be defined over this ground truth. I wonder if it would be possible to define a project with particular interfaces to open different kinds of data and define different ground truth categories that could be labeled by external people and used for ChaLearn challenges. This is a very complex project/idea, I know, since it requires dealing with different kinds of data, ground truth labels, maybe different kinds of interfaces; it needs to control the quality of annotations, the amount of data labeled, how to validate whether there exist real objects not yet labeled, etc. But Antonio's project had a big impact and labeling is a really hard task in challenge organization… just an idea :)

- how to convince them to enter: they will learn, they will collaborate with multi-disciplinary teams, they will solve relevant research and applied problems with high social impact, they will be proactive and can define their working team. 
- how to reward them: travel grants, economic prizes, certificates; allow them to publish and to participate in the future as challenge organizers.
- how to make sure they learn something: coordination within the multi-disciplinary team will ensure that they acquire new knowledge about different disciplines. We can ask each member of the team to briefly fill in some forms about the "design of their solution" from a Computer Engineering point of view, or even a description of their methods. Code from the different members of the team can be made available to the rest of the team. 
- how to make sure they like it and do it again: we will introduce feedback and satisfaction questionnaires, and we will use the mailing list to update participants about news on the platform. 

Extra ideas of Kristin Bennett:
The educational NSF grant to target is the February 4th one for NSF IUSE, which is a new program that no one quite knows what it is. In my experience, getting funding in the first round of new programs is easier, because then you get to define what they want, since they don't really know. But I am totally booked with the educational grant that I already got until then, and thus not too much help. STEM education is a national priority due to a looming shortage of needed STEM workers:
http://goals.performance.gov/node/38577

Lack of research experiences is one of the causes.
A contest is a research experience. Figuring out how to make a contest accessible to non-researchers is the tricky part, which is why I thought your proposal was brilliant. 
It's kind of like a golf scramble (teams of golfers where you only use the best shot of the team at each stroke). You can be a horrible golfer and still play and have fun. Sometimes they even use your shot.