B. Settles. Active Learning, 2012 (Comprehensive introduction to active learning fundamentals).
T. Lattimore and C. Szepesvári. Bandit Algorithms, 2020 (Textbook on multi-armed bandits and theory of exploration-exploitation).
R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed.), 2018 (RL textbook covering MDPs, exploration, and reward learning).
D. Li et al., A Survey on Deep Active Learning: Recent Advances and New Frontiers, TNNLS 2024 (Overview of deep AL methods, including query strategies and use of representations).
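Several of these readings build on the classical uncertainty-sampling query strategy. As a quick reference point, a minimal entropy-based version is sketched below; the pool probabilities are a toy illustration, not from any of the papers:

```python
import numpy as np

def uncertainty_sample(probs, k):
    """Pick the k pool indices with the highest predictive entropy."""
    probs = np.asarray(probs, dtype=float)
    # Small epsilon guards against log(0) for confident predictions.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:k]

# Toy pool: a model's class probabilities for 4 unlabeled points.
pool_probs = [[0.50, 0.50],   # maximally uncertain
              [0.90, 0.10],
              [0.55, 0.45],
              [0.99, 0.01]]
print(uncertainty_sample(pool_probs, 2))  # indices of the 2 most uncertain points
```

In a full loop, the selected points would be labeled, added to the training set, and the model retrained before the next query round.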
Deep active learning under realistic label budgets
Uncertainty Herding: One Active Learning Method for All Label Budgets (ICLR 2025)
Navigating the Pitfalls of Active Learning Evaluation (NeurIPS 2025)
Training-Free Neural Active Learning with Initialization-Robustness Guarantees (ICML 2023)
Optional readings
SAAL: Sharpness-Aware Active Learning (ICML 2023)
Enhancing Cost Efficiency in Active Learning with Candidate Set Query (TMLR 2025)
Adaptive Batch Sizes for Active Learning (AISTATS 2024)
Model-aware synthetic data generation
Improving the Scaling Laws of Synthetic Data with Deliberate Practice (ICML 2025, oral)
Sample-Efficient Multi-Round Generative Data Augmentation for Long-Tail Instance Segmentation (NeurIPS 2025)
Dataset Distillation by Automatic Training Trajectories (ECCV 2024)
Query synthesis / active data acquisition
Iterative Teaching by Data Hallucination (AISTATS 2023)
Generative Active Learning for Image Synthesis Personalization (MM 2024)
Optional readings
Semantic-aligned Query Synthesis for Active Learning (2025)
Diffusion Active Learning: Towards Data-Driven Experimental Design in Computed Tomography (2025)
Sequence models for decision making
Decision Transformer: Reinforcement Learning via Sequence Modeling (NeurIPS 2021)
Offline Reinforcement Learning as One Big Sequence Modeling Problem (NeurIPS 2021)
Mastering Diverse Domains through World Models -- Hafner et al., 2023; later expanded in Nature 2025
Direct Regret Optimization in Bayesian Optimization (ExAI 2025)
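The Decision Transformer line of work in this block conditions action prediction on returns-to-go, i.e. suffix sums of the reward sequence. A one-function sketch of that preprocessing step:

```python
def returns_to_go(rewards):
    """Suffix sums of rewards, used to condition sequence models
    (e.g. Decision Transformer) on a desired future return."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

print(returns_to_go([1.0, 0.0, 2.0]))  # → [3.0, 2.0, 2.0]
```

At inference time, the first return-to-go token is set to a target return, and it is decremented by each observed reward as the trajectory unrolls.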
Representation learning and deep kernels in bandits
Contextual Gaussian Process Bandits with Neural Networks (NeurIPS 2023)
PFNs4BO: In-Context Learning for Bayesian Optimization (ICML 2023)
Neural contextual bandits and efficient exploration
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees (NeurIPS 2022)
Neural Contextual Bandits with Deep Representation and Shallow Exploration (ICLR 2022)
Provably and Practically Efficient Neural Contextual Bandits (ICML 2023)
Beyond Task Diversity: Provable Representation Transfer for Sequential Multi-Task Linear Bandits (NeurIPS 2024)
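Many papers in this block analyze "shallow" exploration (confidence-bound rules in the LinUCB style) on top of learned representations. A self-contained LinUCB loop on a synthetic linear-reward environment is sketched below; all dimensions, horizons, and noise levels are illustrative choices, not taken from any of the papers:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T, alpha = 3, 4, 500, 1.0

# Hidden per-arm linear reward parameters (synthetic environment).
theta_true = rng.normal(size=(n_arms, d))

# Per-arm ridge-regression statistics: A_a = I + sum x x^T, b_a = sum r x.
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

total = 0.0
for t in range(T):
    x = rng.normal(size=d)                        # context shared across arms
    ucb = np.empty(n_arms)
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]                  # ridge estimate for arm a
        ucb[a] = theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x)
    a = int(np.argmax(ucb))                       # optimism under uncertainty
    r = theta_true[a] @ x + 0.1 * rng.normal()    # noisy linear reward
    A[a] += np.outer(x, x)                        # update chosen arm's stats
    b[a] += r * x
    total += r

print(f"average reward over {T} rounds: {total / T:.2f}")
```

The neural variants replace the raw context x with a learned feature map while keeping a linear (or otherwise cheap) exploration rule on top, which is the "deep representation, shallow exploration" design the ICLR 2022 paper names.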
Active causal representation learning & RL
Amortized Active Causal Induction with Deep Reinforcement Learning -- Annadani et al., NeurIPS 2024
Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning -- Sontakke et al., ICML 2021
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning -- Lin et al., NeurIPS 2024
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations -- Yang et al., ICLR 2025
Meta-exploration & learning to explore
Learning to Rank for Active Learning via Multi-Task Bilevel Optimization (UAI 2024)
Algorithm Selection for Deep Active Learning with Imbalanced Datasets (NeurIPS 2023)
Meta-Learning with Neural Bandit Scheduler (NeurIPS 2023). A useful bridge from active selection to curriculum/task scheduling under uncertainty.
Curriculum learning and automatic goal generation
Goal-Conditioned On-Policy Reinforcement Learning (NeurIPS 2024). A representative paper on goal sampling and online self-curricula in RL.
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals (ICLR 2025).
Diffusion Curriculum Reinforcement Learning (NeurIPS 2024).
Optional readings
Prioritized Level Replay -- Jiang et al., ICML 2021
Automatic Goal Generation for Reinforcement Learning Agents (Goal GAN) -- Florensa et al., ICML 2018
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design -- Dennis et al., NeurIPS 2020
Unsupervised Environment Design for Task-Level Pairs -- Furelos-Blanco et al., AAAI 2026
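As a concrete anchor for goal-generation methods like Goal GAN, the "goals of intermediate difficulty" (GOID) filter can be written in a few lines; the 0.1/0.9 thresholds are the commonly cited defaults from the paper:

```python
def goals_of_intermediate_difficulty(success_rates, r_min=0.1, r_max=0.9):
    """Keep goals the current policy sometimes, but not always, achieves
    (the GOID criterion used by Goal GAN to drive the curriculum)."""
    return [g for g, p in success_rates.items() if r_min <= p <= r_max]

# Toy per-goal success rates under the current policy.
rates = {"g1": 0.0, "g2": 0.5, "g3": 1.0, "g4": 0.85}
print(goals_of_intermediate_difficulty(rates))  # → ['g2', 'g4']
```

In the full method, a GAN is trained to generate new goals resembling this filtered set, so the curriculum tracks the frontier of the agent's competence.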
Learning theory
Statistical Curriculum Learning: An Elimination Algorithm Achieving an Oracle Risk (COLT 2024)
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning (COLT 2024)
Active learning for LLM alignment
Sample Efficient Preference Alignment in LLMs via Active Exploration (COLM 2025)
Active Reward Modeling: Adaptive Preference Labeling for Large Language Model Alignment (ICML 2025)
Less is More: Improving LLM Alignment via Preference Data Selection (NeurIPS 2025, spotlight)
Optional readings
Efficient Process Reward Model Training via Active Learning (COLM 2025)
Larger or Smaller Reward Margins to Select Preferences for LLM Alignment? (ICML 2025)
Active in-context learning
Active Learning Principles for In-Context Learning with Large Language Models (EMNLP Findings, 2023)
CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage (EMNLP 2024)
In-Context Learning with Iterative Demonstration Selection (EMNLP Findings 2024)