Causality reading, Jul 20
Post date: Jul 27, 2016 3:19:25 PM
First paper
Domain Adaptation under Target and Conditional Shift
Only unlabeled data for the test domain
39: first slide
40: Traditionally people studied covariate shift
P(Y|X) is fixed, e.g. Y = f(X) + noise
P(X) changes
If the X distribution has a narrow support that shifts, and f is complex/nonlinear, a linear approximation may be good on each narrow support, but not once the support changes
So people use a simple method, e.g. a kernel method with a large kernel width
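Under covariate shift, a standard correction (not shown on the slides; a minimal sketch on synthetic data) reweights the source points by the density ratio p_target(x) / p_source(x) before fitting:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic source/target inputs: same P(Y|X), shifted P(X).
x_src = rng.normal(0.0, 1.0, size=500)
x_tgt = rng.normal(1.0, 1.0, size=500)

# Importance weights w(x) = p_target(x) / p_source(x), estimated by KDE.
p_src = gaussian_kde(x_src)
p_tgt = gaussian_kde(x_tgt)
w = p_tgt(x_src) / p_src(x_src)

# Weighted least squares on the labeled source data; np.polyfit weights
# the unsquared residuals, so pass sqrt(w) to weight squared losses by w.
y_src = np.sin(x_src) + 0.1 * rng.normal(size=500)  # nonlinear f + noise
coeffs = np.polyfit(x_src, y_src, deg=1, w=np.sqrt(w))
```

The reweighting pulls the linear fit toward the region the target distribution actually covers, which is exactly where a simple model per support can still work.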
—
41: In many practical cases the data-generating model is
P(Y) then P(X|Y)
—
42: The man (the subject) is the cause and the shadow is the effect. If we know the shadow, we can reconstruct the man and the light source (the cause and the mechanism that generated the effect),
but if you know the man, you cannot reconstruct the shadow and the light source.
So: knowing the effect is a lot more informative than knowing the cause
--
43:
So if the data-generating model is P(X) then P(Y|X), you cannot recover any information about how the mechanism (the function f) changes
—
44: under mild assumptions, you can reconstruct the transformation from the marginal distribution of X
man = Y
mechanism = light
shadow = X
We see the shadow (a change in the shadow): the marginal distribution of X in the target domain
We allow P(Y) and P(X|Y) to change (i.e., we also allow the mechanism to change)
we have all the information in the source domain
we have X in the target domain
The goal is to recover P(Y) and P(Y|X) in the target domain
Analogy:
in the target domain, we see a new shadow (a new X distribution); from that we infer the man and the light source
In this work: a location-scale transformation is the assumed class of transformations
--
45:
apply a transformation so that you can mimic the X distribution of the target domain
The causal roles matter for domain adaptation
If you assume “covariate shift”, it does not work as well (20% error) as assuming “conditional shift” (10% error)
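The location-scale idea can be sketched as follows. This is a toy moment-matching version on synthetic data; the paper itself matches distributions via kernel mean embeddings, not raw moments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: target X is a location-scale change of source X.
x_src = rng.normal(0.0, 1.0, size=1000)
x_tgt = 2.0 * rng.normal(0.0, 1.0, size=1000) + 3.0

# Fit the location-scale transform by matching mean and std
# (a crude proxy for the paper's kernel-embedding matching).
scale = x_tgt.std() / x_src.std()
loc = x_tgt.mean() - scale * x_src.mean()

# After adaptation, the transformed source X mimics the target X.
x_src_adapted = scale * x_src + loc
```

Once the transformed source distribution matches the target X, the source labels can be reused through the same transformation.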
=========
Second paper
Multi-Source Domain Adaptation: A Causal View
A distribution-weighted decision rule can be used to model the optimal prediction
Consider all possible causal models and give the best solution
Case: P(Y) then P(X|Y)
Assume a shift mechanism more complex than location-scale
Mixture model assumption
Vs and Ws are latent factors
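A toy illustration of a distribution-weighted decision rule (the predictors and densities below are made up for the example, not from the paper): each source predictor is weighted by how much density that source's input distribution puts on the query point.

```python
import numpy as np
from scipy.stats import norm

# Two source domains with different P(X) and different learned predictors.
def predictor_a(x):  # hypothetical model trained on source A
    return 2.0 * x

def predictor_b(x):  # hypothetical model trained on source B
    return -x + 1.0

def dens_a(x):  # source A's input density (known Gaussian for simplicity)
    return norm.pdf(x, loc=-2.0, scale=1.0)

def dens_b(x):  # source B's input density
    return norm.pdf(x, loc=2.0, scale=1.0)

def combined(x):
    """Distribution-weighted rule: weight each source predictor by the
    relative density its domain assigns to the query point x."""
    wa, wb = dens_a(x), dens_b(x)
    return (wa * predictor_a(x) + wb * predictor_b(x)) / (wa + wb)
```

Near source A's region the combined rule follows predictor_a, near source B it follows predictor_b, and in between it interpolates.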
====
Third paper
Domain Adaptation with Conditional Transferable Components
Transfer components: features that have the same distribution across domains
There is a phi transform in X space such that the conditional distribution is the same
The assumption is that if the marginal distributions are the same, the conditional distributions are the same too.
Previous papers made this assumption without justification
With the causal assumption, this is justified
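A quick synthetic counterexample to why this assumption needs justification: two domains can have identical marginals P(X) while the conditionals P(X|Y) differ (here, the class labels are simply swapped), so matching marginals alone cannot detect the difference.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Domain 1: Y ~ Bernoulli(0.5), X | Y=0 ~ N(-1,1), X | Y=1 ~ N(+1,1).
y1 = rng.integers(0, 2, n)
x1 = rng.normal(2.0 * y1 - 1.0, 1.0)

# Domain 2: the same marginal P(X) (equal-weight mixture of N(-1,1)
# and N(+1,1)), but with the class labels flipped, so P(X|Y) differs.
y2 = rng.integers(0, 2, n)
x2 = rng.normal(1.0 - 2.0 * y2, 1.0)

# Marginal moments of X match across domains; the conditionals are opposite.
```

This is exactly the gap the causal (P(Y) then P(X|Y)) assumption closes for conditional transferable components.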
Summary from Kun:
1. The first two papers assume all features are useful and transferable.
This might not be the case, so we have to find a transformation (or
subset) of the features for transfer.
2. Previous work finds the transformation of the features which has the
same marginal distribution for transfer; here we argue that it is more
natural to do transfer learning with the transformation of the features
*whose conditional distribution given the target (i.e., causal mechanism)*
remains invariant or changes in simple ways. That is, we do transfer
learning with *conditional* transferable components.
- We gave learning guarantees.
- As explained today, from the causal view we can justify the assumption
made previously in the classical transfer component analysis; furthermore,
we show that *generally speaking, they fail to find the conditional
transferable components*.