Adversarial Reinforcement Learning for Unsupervised Domain Adaptation (ARL)

In this paper, we propose to select the best feature pair across the source and target domains using reinforcement learning, and to align both the marginal and conditional distributions of the two domains.
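Aligning the two kinds of distributions is typically realized as an explicit discrepancy penalty on the extracted features. The sketch below is a minimal illustration using an RBF-kernel MMD for the marginal distributions and a per-class MMD (with target pseudo-labels) for the conditional ones; the kernel choice, the pseudo-labeling, and the weight `lam` are assumptions for illustration, not necessarily the paper's exact loss.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between two feature batches under an RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def alignment_loss(src_feat, src_y, tgt_feat, tgt_pseudo_y, num_classes, lam=1.0):
    """Marginal MMD plus class-conditional MMD (target classes come from pseudo-labels)."""
    marginal = rbf_mmd(src_feat, tgt_feat)
    conditional = src_feat.new_zeros(())
    for c in range(num_classes):
        s, t = src_feat[src_y == c], tgt_feat[tgt_pseudo_y == c]
        if len(s) > 1 and len(t) > 1:  # skip classes absent from the current batch
            conditional = conditional + rbf_mmd(s, t)
    return marginal + lam * conditional / num_classes
```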


  • We propose a novel framework, adversarial reinforcement learning for unsupervised domain adaptation (ARL). Reinforcement learning is employed as a feature selector to identify the closest feature pair between the source and target domains (see the sketch after this list).

  • We also develop a new reward across both the source and target domains. The proposed deep correlation reward on the target domain guides the agent to learn the best policy and select the closest feature pair across the two domains.

  • The proposed adversarial learning and domain distribution alignment together mitigate the discrepancy between the source and target domains.
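A minimal sketch of the reinforcement-learning feature selector: the choice of a (source extractor, target extractor) pair is treated as a bandit-style action, and a correlation-based reward stands in for the paper's deep correlation reward. The backbone lists, the `extract` hook, the CORAL-style reward, and all hyperparameters are hypothetical, and features are assumed to be projected to a common dimensionality.

```python
import itertools
import random
import numpy as np

# Hypothetical pools of pre-trained backbones (assumptions, not the paper's exact list).
SOURCE_MODELS = ["resnet50", "shufflenet", "densenet121"]
TARGET_MODELS = ["resnet50", "nasnetmobile", "mobilenet_v2"]
ACTIONS = list(itertools.product(SOURCE_MODELS, TARGET_MODELS))

def correlation_reward(src_feat, tgt_feat):
    """Correlation-style reward: negative CORAL distance between the two
    feature covariance matrices (higher reward = better-aligned pair)."""
    def cov(x):
        x = x - x.mean(0, keepdims=True)
        return (x.T @ x) / (len(x) - 1)
    d = src_feat.shape[1]
    return -np.sum((cov(src_feat) - cov(tgt_feat)) ** 2) / (4 * d * d)

def select_feature_pair(extract, episodes=200, eps=0.1, lr=0.1):
    """Epsilon-greedy value learning over extractor pairs.
    `extract(name, domain)` is a hypothetical hook returning an (n, d) feature
    matrix, with d shared across extractors (e.g. after a PCA projection)."""
    q = np.zeros(len(ACTIONS))
    for _ in range(episodes):
        a = random.randrange(len(ACTIONS)) if random.random() < eps else int(q.argmax())
        src_name, tgt_name = ACTIONS[a]
        r = correlation_reward(extract(src_name, "source"), extract(tgt_name, "target"))
        q[a] += lr * (r - q[a])  # incremental update toward the observed reward
    return ACTIONS[int(q.argmax())]
```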

t-SNE views of two images from the Office-Home dataset, one from the source domain (Art, panel a) and one from the target domain (Real World, panel c); panel (b) shows their combination. Different colors in (a) and (c) represent different feature extractors, while in (b) red points come from the source domain and blue points from the target domain. The distance between the source and target images can be minimized by selecting a proper pair of feature extractors: source features from ShuffleNet and target features from NASNetMobile have the shortest distance.
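The same observation can be reproduced without learning by exhaustively comparing every (source extractor, target extractor) pair. The helper below is a hypothetical illustration using the distance between mean feature vectors; the dictionary layout and the distance measure are assumptions.

```python
import numpy as np

def mean_feature_distance(src_feat, tgt_feat):
    """Euclidean distance between mean feature vectors (assumes equal feature dims)."""
    return float(np.linalg.norm(src_feat.mean(0) - tgt_feat.mean(0)))

def closest_pair(src_feats, tgt_feats):
    """src_feats / tgt_feats: dicts mapping extractor name -> (n, d) feature matrix."""
    return min(
        ((s, t) for s in src_feats for t in tgt_feats),
        key=lambda pair: mean_feature_distance(src_feats[pair[0]], tgt_feats[pair[1]]),
    )
```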

Model

The overall pipeline of the proposed model. We first employ reinforcement learning as a feature selector; the adversarial distribution alignment provides the reward for the feature selector, and the features from the best pair of pre-trained models are then chosen as the input for the distribution alignment model. ($S_{source}$ and $S_{target}$ are all possible states for the source and target domains; $S_{T_{(source, target)}}$ is the terminal state with the two selected sources of features. $\mathcal{L}_R$ is the reinforcement learning loss, $\mathcal{L}_S$ is the source classification loss, $\mathcal{L}_A$ is the adversarial domain loss, and $\mathcal{L}_{DA}$ is the domain alignment loss.)
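One plausible way to combine these losses during training (the weights $\lambda_A$ and $\lambda_{DA}$ are illustrative assumptions, not values from the paper) is

$$\mathcal{L} = \mathcal{L}_S + \lambda_A\,\mathcal{L}_A + \lambda_{DA}\,\mathcal{L}_{DA},$$

minimized over the selected feature representation and the source classifier, with $\mathcal{L}_A$ optimized adversarially against a domain discriminator (e.g. via a gradient-reversal layer), while $\mathcal{L}_R$ is used separately to update the feature-selection policy from the alignment-based reward.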

Results

Conclusion

  • Reinforcement learning is effective at identifying the most similar feature pair for the source and target domains.

  • Adversarial learning and domain alignment further improve classification accuracy.