Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation