Resources

Datasets

I prepared these datasets from real data for evaluating adaptive classifiers. Feel free to use them for research purposes. Note that ESS requires an acknowledgement of the data source.

The Luxembourg dataset is constructed from European Social Survey (ESS) data. Each instance is an individual. The attributes are formed from answers to the survey questionnaire. The labels indicate high or low internet usage. The dataset has time stamps; the questionnaires were collected over a 5-year period. Internet usage is expected to change over time (concept drift).
DATA description
Citation: please cite BOTH references: [1] as the data source and [2] for creating this dataset. Also, please acknowledge Norwegian Social Science Data Services (NSD), as required by ESS policy, and inform them when the paper is published.
[1] R. Jowell and the Central Coordinating Team. European social survey 2002/2003; 2004/2005; 2006/2007. Technical Reports, London: Centre for Comparative Social Surveys, City University, 2003, 2005, 2007.
[2] I. Žliobaitė. Combining time and space similarity for small size learning under concept drift. In Proc. of ISMIS 2009 - 18th International Symposium on Methodologies for Intelligent Systems, volume 5722 of LNCS, pages 412–421, 2009.

The Chess.com dataset is constructed from data obtained from the chess.com portal. The data consists of game records of one player over the period from December 2007 to March 2010. A player has a rating, which changes depending on his/her results (the higher the rating, the stronger the player). A player develops skills over time and also engages in different types of tournaments and competitions. The rating and the type of game determine how the system selects an opponent. This is where concept drift is expected. The task is to predict whether the player will win or lose based on the setting. There is a natural problem of delayed labeling: the winner is known only after the game is finished. In turn-based chess one game might last for several months.
DATA description
Citation: please cite reference [3] for creating this dataset.
[3] I. Žliobaitė. Change with Delayed Labeling: when is it detectable? In Proc. of 2010 IEEE Int. Conf. on Data Mining Workshops, the 5th Int. Workshop on Chance Discovery (IWCD10) at ICDM'10, 2010.
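
To make the delayed labeling problem concrete, here is a minimal Python sketch of test-then-train evaluation in which each label becomes available only a fixed number of steps after its instance. It is an illustration under stated assumptions, not the method from [3]: the function name prequential_with_delay, the sliding-window majority-class predictor, and the synthetic stream are all hypothetical, and the features are ignored for simplicity. The actual dataset is not used here.

```python
from collections import deque


def prequential_with_delay(stream, delay, window=10):
    """Test-then-train accuracy when labels arrive `delay` steps late.

    stream: iterable of (features, label) pairs in time order, labels in {0, 1}.
    The predictor is a majority vote over the `window` most recent labels
    that have already arrived (hypothetical baseline; features are ignored).
    """
    pending = deque()              # (arrival index, label) still waiting
    recent = deque(maxlen=window)  # labels that have become available
    correct = 0
    total = 0
    for t, (_features, label) in enumerate(stream):
        # Release every label whose delay has elapsed by time t.
        while pending and pending[0][0] + delay <= t:
            _, old_label = pending.popleft()
            recent.append(old_label)
        # Predict the majority class among the labels known so far (default 0).
        prediction = 1 if 2 * sum(recent) > len(recent) else 0
        correct += int(prediction == label)
        total += 1
        # The true label of this instance is only queued, not used yet.
        pending.append((t, label))
    return correct / total if total else 0.0


# Synthetic stream with one abrupt change: 100 losses followed by 100 wins.
stream = [(None, 0)] * 100 + [(None, 1)] * 100

print("no delay:      ", prequential_with_delay(stream, delay=0))
print("delay 20 steps:", prequential_with_delay(stream, delay=20))
```

With the delay, the predictor keeps voting for the old class for longer after the change, because the labels that would reveal the change have not arrived yet.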

Code

Controlled permutations for testing adaptive classifiers, DS'11 paper
Conditional non-discrimination, ICDM'11 paper