Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning