Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment
To build learning algorithms that transfer efficiently, we need independently modifiable components.
To get independently modifiable components, we need credit assignment mechanisms whose causal structure makes independent modification possible.
Modularity for Dynamic Systems
Modularity is algorithmic independence of mechanisms.
A dynamic system encompasses a sequence of modifications to the mechanisms.
Modularity in a dynamic system is the conditional algorithmic independence of its mechanisms, given the system's previous state.
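The three statements above can be written compactly using algorithmic mutual information. The block below is a sketch under stated assumptions rather than the authors' exact formalism: the symbols f^1, f^2 (two mechanisms) and the step index t are notation introduced here for illustration, only two mechanisms are shown for brevity, and the identities hold up to additive constants while glossing over the distinction between conditioning on a string and on its shortest program.

```latex
% K(.) is Kolmogorov complexity; \overset{+}{=} means "equal up to an additive constant".
% f^1, f^2 and the step index t are illustrative notation, not taken from the source.
\begin{align*}
I(x : y) &\overset{+}{=} K(x) + K(y) - K(x, y) \\
I(x : y \mid z) &\overset{+}{=} K(x \mid z) + K(y \mid z) - K(x, y \mid z) \\
\text{static modularity:} \quad & I(f^1 : f^2) \overset{+}{=} 0 \\
\text{dynamic modularity:} \quad & I(f^1_{t+1} : f^2_{t+1} \mid f^1_t, f^2_t) \overset{+}{=} 0
\end{align*}
```

Read this way, the dynamic condition says that once the previous mechanisms are known, describing one updated mechanism does not shorten the description of the other.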
Independent Credit Assignment
Learning algorithms are dynamic systems.
Modularity requires that each mechanism receive independent feedback (e.g., gradients).
By formally treating learning algorithms as algorithmic causal graphs, we can test directly, without any training, whether the causal structure of the credit assignment mechanism allows the learnable mechanisms to be modified independently. The test is to inspect whether the gradients it produces are d-separated by the learner's weights as they were before the credit assignment update.
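A d-separation check of this kind needs only the graph, not any training run. The sketch below is a minimal pure-Python version using the standard ancestral moral graph criterion. The two toy graphs and their node names (`w_prev`, `grad_1`, `grad_2`, `shared_term`) are hypothetical illustrations and are not derived from any specific reinforcement learning algorithm.

```python
from collections import deque


def ancestors_of(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(parents, xs, ys, zs):
    """True if every node in xs is d-separated from every node in ys given zs.

    `parents` maps each node to a list of its parents (missing key = no parents).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors_of(parents, xs | ys | zs)

    # Moralize the ancestral subgraph: link each child to its parents,
    # link co-parents to each other, and drop edge directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        child_parents = list(parents.get(child, ()))
        for parent in child_parents:
            adjacent[child].add(parent)
            adjacent[parent].add(child)
        for i, first in enumerate(child_parents):
            for second in child_parents[i + 1:]:
                adjacent[first].add(second)
                adjacent[second].add(first)

    # Remove the conditioning set and test reachability from xs to ys.
    frontier = deque(xs - zs)
    reached = set(frontier)
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacent[node]:
            if neighbor not in zs and neighbor not in reached:
                reached.add(neighbor)
                frontier.append(neighbor)
    return not (reached & ys)


# Hypothetical toy graph: each gradient depends only on the previous weights.
independent_feedback = {"grad_1": ["w_prev"], "grad_2": ["w_prev"]}

# Hypothetical toy graph: both gradients also read a shared quantity
# that is not part of the previous weights.
coupled_feedback = {
    "grad_1": ["w_prev", "shared_term"],
    "grad_2": ["w_prev", "shared_term"],
}

independent_ok = d_separated(independent_feedback, {"grad_1"}, {"grad_2"}, {"w_prev"})
coupled_ok = d_separated(coupled_feedback, {"grad_1"}, {"grad_2"}, {"w_prev"})
print(independent_ok, coupled_ok)  # prints: True False
```

In the first graph, conditioning on `w_prev` blocks the only path between the two gradients, so they can be modified independently in the sense described above. In the second, the path through `shared_term` stays open, so the check fails.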
Modularity of Reinforcement Learning Algorithms
Theoretical question: Which reinforcement learning algorithms produce independent gradients?
Empirical question: Does modularity improve transfer efficiency?
Modular algorithm - CVS:
Chang, Michael, et al. "Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions." ICML (2020).
Non-modular algorithm - PPO:
Schulman, John, et al. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017).