My research investigates how to learn representations that simultaneously capture semantic content and remain invariant to distortions, enabling robust visual and multimodal intelligence and supporting applications such as watermarking and model attribution.