B. Settles. Active Learning, 2012 (Comprehensive introduction to active learning fundamentals).
T. Lattimore and C. Szepesvári. Bandit Algorithms, 2020 (Textbook on multi-armed bandits and theory of exploration-exploitation).
R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed.), 2018 (RL textbook covering MDPs, exploration, and reward learning).
D. Li et al., A Survey on Deep Active Learning: Recent Advances and New Frontiers, TNNLS 2024 (Overview of deep AL methods, including query strategies and use of representations).
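Several of these readings build on the classical uncertainty-sampling query strategy. As a quick reference point, a minimal entropy-based version is sketched below; the pool probabilities are a toy illustration, not from any of the papers:

```python
import numpy as np

def uncertainty_sample(probs, k):
    """Pick the k pool indices with the highest predictive entropy."""
    probs = np.asarray(probs, dtype=float)
    # Small epsilon guards against log(0) for confident predictions.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:k]

# Toy pool: a model's class probabilities for 4 unlabeled points.
pool_probs = [[0.50, 0.50],   # maximally uncertain
              [0.90, 0.10],
              [0.55, 0.45],
              [0.99, 0.01]]
print(uncertainty_sample(pool_probs, 2))  # indices of the 2 most uncertain points
```

In a full loop, the selected points would be labeled, added to the training set, and the model retrained before the next query round.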
Deep active learning under realistic label budgets
Uncertainty Herding: One Active Learning Method for All Label Budgets (ICLR 2025)
Navigating the Pitfalls of Active Learning Evaluation (NeurIPS 2025)
Training-Free Neural Active Learning with Initialization-Robustness Guarantees (ICML 2023)
Optional readings
SAAL: Sharpness-Aware Active Learning (ICML 2023)
Enhancing Cost Efficiency in Active Learning with Candidate Set Query (TMLR 2025)
Adaptive Batch Sizes for Active Learning (AISTATS 2024)
Model-aware synthetic data generation
Improving the Scaling Laws of Synthetic Data with Deliberate Practice (ICML 2025, oral)
Sample-Efficient Multi-Round Generative Data Augmentation for Long-Tail Instance Segmentation (NeurIPS 2025)
Dataset Distillation by Automatic Training Trajectories (ECCV 2024)
Query synthesis / active data acquisition
Iterative Teaching by Data Hallucination (AISTATS 2023)
Generative Active Learning for Image Synthesis Personalization (MM 2024)
Optional readings
Semantic-aligned Query Synthesis for Active Learning (2025)
Diffusion Active Learning: Towards Data-Driven Experimental Design in Computed Tomography (2025)
Sequence models for decision making
Decision Transformer: Reinforcement Learning via Sequence Modeling (NeurIPS 2021)
Offline Reinforcement Learning as One Big Sequence Modeling Problem (NeurIPS 2021)
Mastering Diverse Domains through World Models -- Hafner et al., 2023; later expanded in Nature 2025
Direct Regret Optimization in Bayesian Optimization (ExAI 2025)
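The Decision Transformer line of work in this block conditions action prediction on returns-to-go, i.e. suffix sums of the reward sequence. A one-function sketch of that preprocessing step:

```python
def returns_to_go(rewards):
    """Suffix sums of rewards, used to condition sequence models
    (e.g. Decision Transformer) on a desired future return."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

print(returns_to_go([1.0, 0.0, 2.0]))  # → [3.0, 2.0, 2.0]
```

At inference time, the first return-to-go token is set to a target return, and it is decremented by each observed reward as the trajectory unrolls.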
Representation learning and deep kernels in bandits
Contextual Gaussian Process Bandits with Neural Networks (NeurIPS 2023)
PFNs4BO: In-Context Learning for Bayesian Optimization (ICML 2023)
Neural contextual bandits and efficient exploration
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees (NeurIPS 2022)
Neural Contextual Bandits with Deep Representation and Shallow Exploration (ICLR 2022)
Provably and Practically Efficient Neural Contextual Bandits (ICML 2023)
Beyond Task Diversity: Provable Representation Transfer for Sequential Multi-Task Linear Bandits (NeurIPS 2024)
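Many papers in this block analyze "shallow" exploration (confidence-bound rules in the LinUCB style) on top of learned representations. A self-contained LinUCB loop on a synthetic linear-reward environment is sketched below; all dimensions, horizons, and noise levels are illustrative choices, not taken from any of the papers:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T, alpha = 3, 4, 500, 1.0

# Hidden per-arm linear reward parameters (synthetic environment).
theta_true = rng.normal(size=(n_arms, d))

# Per-arm ridge-regression statistics: A_a = I + sum x x^T, b_a = sum r x.
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

total = 0.0
for t in range(T):
    x = rng.normal(size=d)                        # context shared across arms
    ucb = np.empty(n_arms)
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]                  # ridge estimate for arm a
        ucb[a] = theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x)
    a = int(np.argmax(ucb))                       # optimism under uncertainty
    r = theta_true[a] @ x + 0.1 * rng.normal()    # noisy linear reward
    A[a] += np.outer(x, x)                        # update chosen arm's stats
    b[a] += r * x
    total += r

print(f"average reward over {T} rounds: {total / T:.2f}")
```

The neural variants replace the raw context x with a learned feature map while keeping a linear (or otherwise cheap) exploration rule on top, which is the "deep representation, shallow exploration" design the ICLR 2022 paper names.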
Active causal representation learning & RL
Amortized Active Causal Induction with Deep Reinforcement Learning -- Annadani et al., NeurIPS 2024
Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning -- Sontakke et al., ICML 2021
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning -- Lin et al., NeurIPS 2024
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations -- Yang et al., ICLR 2025
Meta-exploration & learning to explore
Learning to Rank for Active Learning via Multi-Task Bilevel Optimization (UAI 2024)
Algorithm Selection for Deep Active Learning with Imbalanced Datasets (NeurIPS 2023)
Meta-Learning with Neural Bandit Scheduler (NeurIPS 2023). A useful bridge from active selection to curriculum/task scheduling under uncertainty.
Curriculum learning and automatic goal generation
Goal-Conditioned On-Policy Reinforcement Learning (NeurIPS 2024). A representative paper on goal sampling and online self-curricula in RL.
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals (ICLR 2025).
Diffusion Curriculum Reinforcement Learning (NeurIPS 2024).
Optional readings
Prioritized Level Replay -- Jiang et al., ICML 2021
Automatic Goal Generation for Reinforcement Learning Agents (Goal GAN) -- Florensa et al., ICML 2018
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design -- Dennis et al., NeurIPS 2020
Unsupervised Environment Design for Task-Level Pairs -- Furelos-Blanco et al., AAAI 2026
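As a concrete anchor for goal-generation methods like Goal GAN, the "goals of intermediate difficulty" (GOID) filter can be written in a few lines; the 0.1/0.9 thresholds are the commonly cited defaults from the paper:

```python
def goals_of_intermediate_difficulty(success_rates, r_min=0.1, r_max=0.9):
    """Keep goals the current policy sometimes, but not always, achieves
    (the GOID criterion used by Goal GAN to drive the curriculum)."""
    return [g for g, p in success_rates.items() if r_min <= p <= r_max]

# Toy per-goal success rates under the current policy.
rates = {"g1": 0.0, "g2": 0.5, "g3": 1.0, "g4": 0.85}
print(goals_of_intermediate_difficulty(rates))  # → ['g2', 'g4']
```

In the full method, a GAN is trained to generate new goals resembling this filtered set, so the curriculum tracks the frontier of the agent's competence.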
Learning theory
Statistical Curriculum Learning: An Elimination Algorithm Achieving an Oracle Risk (COLT 2024)
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning (COLT 2024)
Active learning for LLM alignment
Sample Efficient Preference Alignment in LLMs via Active Exploration (COLM 2025)
Active Reward Modeling: Adaptive Preference Labeling for Large Language Model Alignment (ICML 2025)
Less is More: Improving LLM Alignment via Preference Data Selection (NeurIPS 2025, spotlight)
Optional readings
Efficient Process Reward Model Training via Active Learning (COLM 2025)
Larger or Smaller Reward Margins to Select Preferences for LLM Alignment? (ICML 2025)
Active in-context learning
Active Learning Principles for In-Context Learning with Large Language Models (EMNLP Findings, 2023)
CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage (EMNLP 2024)
In-Context Learning with Iterative Demonstration Selection (EMNLP Findings 2024)