Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment

Michael Chang*, Sidhant Kaushik*, Sergey Levine, Tom Griffiths

To build learning algorithms that transfer efficiently, we need independently modifiable components.

To get independently modifiable components, we need credit assignment mechanisms whose causal structure makes independent modification possible.

Modularity for Dynamic Systems

Modularity is algorithmic independence of mechanisms.

A dynamic system encompasses a sequence of modifications to the mechanisms.

Modularity in a dynamic system is the conditional algorithmic independence of its mechanisms, conditioned on the system's previous state.
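One way to write the two notions of modularity above is with the standard definition of algorithmic mutual information from algorithmic information theory. The symbols are assumptions made for illustration and do not appear on this page: K is Kolmogorov complexity, f^1, ..., f^N are the mechanisms, \delta_t^k is the modification applied to mechanism k at step t, and \theta_{t-1} is the system's previous state. Equalities marked with + hold up to an additive constant.

```latex
% Algorithmic mutual information and its conditional form
I(x : y) \overset{+}{=} K(x) + K(y) - K(x, y)
I(x : y \mid z) \overset{+}{=} K(x \mid z) + K(y \mid z) - K(x, y \mid z)

% Static modularity: the mechanisms share no algorithmic information
I(f^j : f^k) \overset{+}{=} 0 \quad \text{for all } j \neq k

% Dynamic modularity: the modifications share no algorithmic information
% once the previous state is given
I(\delta_t^j : \delta_t^k \mid \theta_{t-1}) \overset{+}{=} 0 \quad \text{for all } j \neq k
```

The pairwise form is used here for readability; it is a sketch of the idea rather than the paper's exact statement.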

Independent Credit Assignment

Learning algorithms are dynamic systems.

Modularity requires independent feedback (e.g. gradients).

By formally treating learning algorithms as algorithmic causal graphs, we can test directly, without any training, whether the causal structure of the credit assignment mechanism makes it possible to modify the learnable mechanisms independently. The test is to inspect whether the gradients it produces are d-separated by the learner's weights as they were before the credit assignment update.
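The d-separation test described above can be sketched on a toy graph. The following is a minimal illustration, not the paper's implementation: the node names (w1, w2 for the previous weights, g1, g2 for the gradients, s for a shared hidden quantity) and both example graphs are hypothetical. It uses the standard ancestral moral graph criterion: restrict the graph to the ancestors of the queried nodes, connect co-parents, drop edge directions, delete the conditioning set, and check whether the two node sets are still connected.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """True if every node in xs is d-separated from every node in ys given zs.

    `parents` maps each node to a list of its parents (a DAG).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Build the moral graph of the ancestral subgraph (undirected).
    adj = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):  # "marry" co-parents
            adj[a].add(b)
            adj[b].add(a)

    # Delete the conditioning set and search for a path from xs to ys.
    frontier = list(xs - zs)
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for nxt in adj[node] - zs:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


# Hypothetical toy graphs: each gradient depends only on the previous weights...
modular = {"g1": ["w1", "w2"], "g2": ["w1", "w2"]}
# ...or additionally on a shared hidden quantity s outside the conditioning set.
coupled = {"g1": ["w1", "s"], "g2": ["w2", "s"]}

print(d_separated(modular, {"g1"}, {"g2"}, {"w1", "w2"}))  # True
print(d_separated(coupled, {"g1"}, {"g2"}, {"w1", "w2"}))  # False
```

In the first toy graph, conditioning on the previous weights blocks every path between the two gradients. In the second, the shared hidden node keeps them connected, so the gradients are not guaranteed to be independent.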

Modularity of Reinforcement Learning Algorithms

Theoretical question: Which reinforcement learning algorithms produce independent gradients?

Empirical question: Does modularity improve transfer efficiency?

Modular algorithm - CVS (cloned Vickrey society):

Chang, Michael, et al. "Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions." ICML (2020).

Non-modular algorithm - PPO:

Schulman, John, et al. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017).
