What is Item Response Theory?
- Item Response Theory (IRT) is a paradigm in psychometrics for estimating latent human abilities from tests, widely adopted for educational purposes. A student's response to a given item in a test depends both on the student's ability and on the item's characteristics, such as its difficulty. Difficult items are those solved correctly only by good students, while good students are those able to solve difficult items. IRT exploits this "chicken-and-egg" relationship to derive mathematical models of human abilities and item difficulties that are consistent with the responses collected in tests. For each item, IRT can provide an Item Characteristic Curve (ICC), which returns the expected response to the item as a function of respondent ability and the item's parameters. ICCs are usually parameterised logistic curves fitted to the observed responses.
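A common parameterisation of an ICC is the two-parameter logistic (2PL) model, in which each item has a difficulty and a discrimination parameter. A minimal sketch (the function name and parameter values are illustrative, not from a specific library):

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic (2PL) Item Characteristic Curve.

    theta: respondent ability
    a:     item discrimination (slope of the curve)
    b:     item difficulty (ability level at which P(correct) = 0.5)
    Returns the expected probability of a correct response.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# For an average respondent (theta = 0), an easy item (b = -1)
# yields a high success probability, a hard item (b = 2) a low one:
p_easy = icc_2pl(0.0, a=1.5, b=-1.0)
p_hard = icc_2pl(0.0, a=1.5, b=2.0)
```

Note that when `theta == b` the curve passes through 0.5 regardless of `a`; the discrimination parameter only controls how sharply the curve separates respondents around that point.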
Why IRT in AI?
- IRT is widely adopted to assess human abilities and skills on the basis of carefully designed exams. Artificial Intelligence (AI) techniques also have "skills", which are evaluated in task-specific experiments using datasets, benchmark problems, test scenarios, and so on. An AI system is usually evaluated by simply averaging its performance over a battery of chosen problems. We could hypothesise that some AI problems are more difficult and discriminative than others, and hence that AI systems performing well on hard problems should receive more attention. Additionally, the performance measure obtained by an AI system on a problem can be seen as a random variable depending on the system's latent skill and the problem's difficulty. In this sense, AI is a feasible domain in which IRT can be explored. In IRT for AI evaluation, respondents are AI systems or techniques, while items are problems in the AI domain. ICCs can be estimated from experiments to relate latent AI ability and AI problem difficulty.
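The "chicken-and-egg" estimation can be sketched as alternating gradient updates on a Rasch (1PL) model fitted to a binary response matrix of AI systems versus problems. This is an illustrative sketch under assumed names (`rasch_fit`, the toy systems and problems), not the exact procedure used in the papers below:

```python
import math

def rasch_fit(responses, n_iters=200, lr=0.05):
    """Joint maximum-likelihood sketch for the Rasch (1PL) model.

    responses: dict mapping (system, problem) -> 1 (solved) or 0 (failed).
    Returns (ability, difficulty) dicts on a shared latent scale.
    """
    systems = sorted({s for s, _ in responses})
    problems = sorted({p for _, p in responses})
    ability = {s: 0.0 for s in systems}
    difficulty = {p: 0.0 for p in problems}
    for _ in range(n_iters):
        grad_a = {s: 0.0 for s in systems}
        grad_d = {p: 0.0 for p in problems}
        for (s, p), y in responses.items():
            pred = 1.0 / (1.0 + math.exp(-(ability[s] - difficulty[p])))
            grad_a[s] += y - pred   # log-likelihood gradient w.r.t. ability
            grad_d[p] -= y - pred   # ... and w.r.t. difficulty
        for s in systems:
            ability[s] += lr * grad_a[s]
        for p in problems:
            difficulty[p] += lr * grad_d[p]
        # Anchor the latent scale: centre difficulties at zero.
        shift = sum(difficulty.values()) / len(difficulty)
        for p in problems:
            difficulty[p] -= shift
    return ability, difficulty

# Toy experiment: system A solves both problems, system B only the easy one.
resp = {("A", "easy"): 1, ("A", "hard"): 1,
        ("B", "easy"): 1, ("B", "hard"): 0}
abil, diff = rasch_fit(resp)
```

With this toy response matrix, the fitted latent values place system A above B and the "hard" problem above "easy", illustrating how abilities and difficulties are estimated jointly from the same responses.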
Where has IRT been applied in AI?
- Machine Learning
- F Martínez-Plumed, RBC Prudêncio, A Martínez-Usó, J Hernández-Orallo. Item response theory in AI: Analysing machine learning classifiers at the instance level. Artificial Intelligence, 2019.
    - Y Chen, RBC Prudêncio, T Diethe, P Flach. β³-IRT: A New Item Response Model and its Applications. AISTATS 2019.
- F Martínez-Plumed, RBC Prudêncio, A Martínez-Usó, J Hernández-Orallo. Making sense of item response theory in machine learning. ECAI 2016.
- NLP
- JP Lalor, H Wu, H Yu. Building an evaluation scale using item response theory. Empirical Methods in NLP 2016.
    - J Lalor, H Wu, T Munkhdalai, H Yu. Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study. Empirical Methods in NLP 2018.
- Games
    - F Martinez-Plumed, J Hernandez-Orallo. Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality. IEEE Transactions on Games 2018.