Causality reading jul 20

Post date: Jul 27, 2016 3:19:25 PM

First paper

Domain Adaptation under Target and Conditional Shift

Only unlabeled data for the test domain

39: first slide

40: Traditionally people studied covariate shift

P(Y|X) fixed or f(X)+noise

P(X) changes

If X has a distribution with narrow support and that support shifts, and f(X) is complex (nonlinear), then a linear approximation fitted on one narrow support may be good on that support, but not once the support changes

People use a simple method, e.g. a kernel method with large kernel width
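
A minimal sketch of the narrow-support point above, under assumptions that are not from the talk: f is taken to be sin, X is Gaussian, and the source and target supports sit around 0 and 2. It fits a line on the source support and evaluates it after P(X) shifts, with P(Y|X) kept fixed.

```python
import numpy as np

def f(x):
    # Hypothetical nonlinear ground truth, chosen only for illustration.
    return np.sin(x)

def fit_linear(x, y):
    # Least-squares line y ~ a*x + b (polyfit returns the slope first).
    a, b = np.polyfit(x, y, deg=1)
    return a, b

def mse(params, x, y):
    a, b = params
    return float(np.mean((a * x + b - y) ** 2))

rng = np.random.default_rng(0)
noise = 0.05

# Source domain: X concentrated on a narrow support around 0.
x_src = rng.normal(0.0, 0.3, size=500)
y_src = f(x_src) + rng.normal(0.0, noise, size=500)

# Target domain: same f, so P(Y|X) is fixed, but P(X) has moved to around 2.
x_tgt = rng.normal(2.0, 0.3, size=500)
y_tgt = f(x_tgt) + rng.normal(0.0, noise, size=500)

params = fit_linear(x_src, y_src)
src_err = mse(params, x_src, y_src)
tgt_err = mse(params, x_tgt, y_tgt)
print("source MSE:", src_err)
print("target MSE:", tgt_err)
```

The line is a good local approximation near 0 and a poor one near 2, which is the failure the covariate-shift literature tries to correct.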

41: In many practical cases the data generating model is

P(Y) then P(X|Y)

42: The man (the subject) is the cause and the shadow is the effect. If we know the shadow, we can reconstruct the man and the light source (the cause and the mechanism that generated the effect)

but if you know the man, you cannot reconstruct the shadow and the light source

So: knowing the effect is a lot more informative than knowing the cause

--

43:

So if the data-generating model is X then Y|X, you cannot recover any information about a change in the mechanism (the f function)

44: Under mild assumptions, you can reconstruct the transformation from the marginal distribution of X

man = Y

mechanism = light

shadow = X

We see the shadow (a change in the shadow): the joint distribution of X in the target domain

We allow P(Y) and P(Y|X) to change (We also allow the mechanism to change)

we have all the information in the source domain

we have X in the target domain

The goal is to recover P(Y) and P(Y|X) in the target domain
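
One way to write down why the target X alone can carry this information, assuming the generating model is P(Y) then P(X|Y) and Y is discrete (an integral replaces the sum otherwise). The superscript t for the target domain is notation introduced here, not from the slides.

```latex
P^{t}(X) = \sum_{y} P^{t}(Y = y)\, P^{t}(X \mid Y = y),
\qquad
P^{t}(Y \mid X) = \frac{P^{t}(Y)\, P^{t}(X \mid Y)}{P^{t}(X)}
```

The left-hand side of the first equation is what we observe in the target domain; if P^t(Y) and P^t(X|Y) can be pinned down from it under suitable assumptions, the second equation gives the target predictor.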

Analogy:

in target domain, we see a new shadow (new X), from that we infer the man and light source

In this work: the transformation is assumed to be a location-scale transformation

--

45:

apply a transformation so that you can mimic the X distribution on the target domain
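
A small sketch of what a per-class location-scale transformation does to source data, assuming a binary Y and one-dimensional X. The scales `w` and locations `b` are set by hand here and are hypothetical; the paper estimates them from data, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_source(n):
    # Y is the cause and X is generated from Y (Y then X|Y).
    y = rng.integers(0, 2, size=n)
    class_means = np.array([-1.0, 1.0])
    x = class_means[y] + rng.normal(0.0, 0.5, size=n)
    return x, y

def location_scale(x, y, w, b):
    # x_new = w(y) * x + b(y): one (scale, location) pair per class.
    return w[y] * x + b[y]

x_src, y_src = sample_source(20000)
w = np.array([1.0, 2.0])   # hypothetical per-class scales
b = np.array([0.5, -1.0])  # hypothetical per-class locations
x_new = location_scale(x_src, y_src, w, b)

# Per-class mean and standard deviation of the transformed source data.
stats = {c: (x_new[y_src == c].mean(), x_new[y_src == c].std()) for c in (0, 1)}
for c, (m, s) in stats.items():
    print(f"class {c}: mean={m:.3f}, std={s:.3f}")
```

In the setting of the notes, the transformed source sample would then be compared with the unlabeled target X, and `w` and `b` adjusted until the two marginal distributions agree.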

The causal rules matter for domain adaptation

If you assume “covariate shift” it does not work as well (20% error) as assuming “conditional shift” (10% error)

=========

Second paper

Multi-Source Domain Adaptation: A Causal View

Distribution weighted decision rule can be used to model the optimal prediction
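
A sketch of one common form of a distribution-weighted combination, in which each source predictor is weighted at x by how likely x is under that source. The Gaussian source densities, the constant predictors, and the mixture weights `lam` are all assumptions made for illustration, not values from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))

def distribution_weighted(x, densities, predictors, lam):
    # h(x) = sum_i lam_i * p_i(x) * h_i(x) / sum_j lam_j * p_j(x)
    dens = np.stack([p(x) for p in densities])    # shape (k, n)
    preds = np.stack([h(x) for h in predictors])  # shape (k, n)
    weights = lam[:, None] * dens
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights * preds).sum(axis=0)

# Two hypothetical sources: one centred at -2, one at +2.
densities = [
    lambda x: gaussian_pdf(x, -2.0, 1.0),
    lambda x: gaussian_pdf(x, 2.0, 1.0),
]
# Hypothetical per-source predictors: source 1 always says 0, source 2 always says 1.
predictors = [
    lambda x: np.zeros_like(x),
    lambda x: np.ones_like(x),
]
lam = np.array([0.5, 0.5])

x = np.array([-2.0, 0.0, 2.0])
out = distribution_weighted(x, densities, predictors, lam)
print(out)
```

Near -2 the output follows the first source's predictor, near +2 the second, and at 0 the two are weighted equally.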

Consider all possible causal models and give the best solution for each

Case: Y then P(X|Y)

assume a mechanism of shift more complex than location and scale

Mixture model assumption

Vs and Ws are latent factors

====

Third paper

Domain Adaptation with Conditional Transferable Components

transfer component

the features have the same distribution across domains

there is a phi transform in X space

the conditional distribution is the same

Assumption is that if the marginal distributions are the same the conditional distributions are the same too.

Previous papers made this assumption without justification

With causal assumption this is justified
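
A toy construction, not taken from any of the papers, showing why equal marginals do not by themselves force equal conditionals. Both domains use P(Y=1) = 0.5 and unit-variance Gaussian class conditionals; the target simply swaps which class sits at -1 and which at +1.

```python
import numpy as np

def gaussian_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))

def marginal(x, mean0, mean1, prior1=0.5):
    # P(X) = P(Y=0) P(X|Y=0) + P(Y=1) P(X|Y=1)
    return (1.0 - prior1) * gaussian_pdf(x, mean0) + prior1 * gaussian_pdf(x, mean1)

def posterior_y1(x, mean0, mean1, prior1=0.5):
    # P(Y=1|X) by Bayes' rule.
    p1 = prior1 * gaussian_pdf(x, mean1)
    p0 = (1.0 - prior1) * gaussian_pdf(x, mean0)
    return p1 / (p0 + p1)

x = np.linspace(-3.0, 3.0, 7)

# Source: class 0 at -1, class 1 at +1. Target: the class means are swapped.
src_marg = marginal(x, -1.0, 1.0)
tgt_marg = marginal(x, 1.0, -1.0)
src_post = posterior_y1(x, -1.0, 1.0)
tgt_post = posterior_y1(x, 1.0, -1.0)

print("marginals equal:", np.allclose(src_marg, tgt_marg))
print("source P(Y=1|X):", np.round(src_post, 3))
print("target P(Y=1|X):", np.round(tgt_post, 3))
```

The marginal of X is identical in the two domains while P(Y|X) is flipped, so matching marginals needs an extra argument (here, the causal one) before it says anything about the conditional.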

Summary from Kun:

1. The first two papers assume all features are useful and transferable. This might not be the case, so we have to find a transformation (or subset) of the features for transfer.

2. Previous work finds the transformation of the features which has the same marginal distribution for transfer; here we argue that it is more natural to do transfer learning with the transformation of the features *whose conditional distribution given the target (i.e., causal mechanism)* remains invariant or changes in simple ways. That is, we do transfer learning with *conditional* transferable components.

- We gave learning guarantees.

- As explained today, from the causal view we can justify the assumption made previously in the classical transfer component analysis; furthermore, we show that *generally speaking, they fail to find the conditional transferable components*.