Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation