Published at ICML 2025.
Code is available.
Figure W1: AMF with neural policies in MT10 from state inputs. We evaluate both mismatched (center) and non-mismatched settings (right). In the former case, we only provide demonstrations for 5/10 tasks during pre-training; in the latter, we distribute demonstrations across all tasks. We report means and 90% simple bootstrap confidence intervals for the average success rate during fine-tuning over 10 seeds. Two demonstrations are collected at each fine-tuning iteration. Results are consistent with the existing evaluation on different Metaworld tasks (see Figure 4, second row, first two columns): AMF significantly improves data efficiency in mismatched settings, while all methods remain viable under no mismatch.
Abstract: Pre-trained generalist policies are rapidly gaining relevance in robot learning due to their promise of fast adaptation to novel, in-domain tasks. This adaptation often relies on collecting new demonstrations for a specific task of interest and applying imitation learning algorithms, such as behavioral cloning. However, as soon as several tasks need to be learned, we must decide which tasks should be demonstrated, and how often. We study this multi-task problem and explore an interactive framework in which the agent adaptively selects the tasks to be demonstrated. We propose AMF, an algorithm that maximizes multi-task policy performance under a limited demonstration budget by collecting demonstrations yielding the largest information gain on the expert policy. We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in efficiently fine-tuning neural policies in complex and high-dimensional environments.
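To illustrate the core idea behind the demonstration-selection criterion, the following is a minimal sketch (not the paper's implementation): tasks are greedily chosen to maximize the expected reduction in uncertainty about the expert policy. The `ExpertPosterior` class, its `entropy`/`expected_info_gain`/`update_with_demo` methods, and the Gaussian-mean uncertainty model are all hypothetical stand-ins introduced here for illustration only.

```python
# Illustrative sketch only: greedy task selection by expected information gain.
# All names (ExpertPosterior, entropy, update_with_demo) are hypothetical and
# stand in for an uncertainty model over the expert policy.
import numpy as np


class ExpertPosterior:
    """Toy per-task uncertainty model: posterior variance shrinks with demo count."""

    def __init__(self, num_tasks, prior_var=1.0, noise_var=0.1):
        self.counts = np.zeros(num_tasks)
        self.prior_var = prior_var
        self.noise_var = noise_var

    def entropy(self, task):
        # Entropy of a Gaussian posterior over the task's expert behavior
        # after observing counts[task] demonstrations.
        var = 1.0 / (1.0 / self.prior_var + self.counts[task] / self.noise_var)
        return 0.5 * np.log(2 * np.pi * np.e * var)

    def expected_info_gain(self, task):
        # Entropy reduction from one additional demonstration on this task.
        current = self.entropy(task)
        self.counts[task] += 1
        after = self.entropy(task)
        self.counts[task] -= 1
        return current - after

    def update_with_demo(self, task):
        self.counts[task] += 1


def select_tasks(posterior, num_tasks, budget):
    """Greedily request demonstrations for the most informative tasks."""
    schedule = []
    for _ in range(budget):
        gains = [posterior.expected_info_gain(t) for t in range(num_tasks)]
        task = int(np.argmax(gains))
        posterior.update_with_demo(task)  # simulate receiving the demo
        schedule.append(task)
    return schedule


if __name__ == "__main__":
    post = ExpertPosterior(num_tasks=10)
    print(select_tasks(post, num_tasks=10, budget=6))
```

Under this toy model, demonstrations naturally concentrate on the tasks whose expert behavior is most uncertain, which is the intuition behind allocating a limited budget via information gain.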