Users are shown the outputs from two different LLMs and vote for the one they think is better. The votes feed a leaderboard that is open to inspection.
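As a rough illustration of how pairwise votes could be turned into a leaderboard, the sketch below ranks models by win rate. The text above does not say how the ranking is actually computed, so the win-rate scoring, the function name `build_leaderboard`, and the model names are all assumptions made for the example; a rating scheme such as Elo would be an equally plausible choice.

```python
from collections import defaultdict


def build_leaderboard(votes):
    """Rank models by win rate from pairwise votes.

    `votes` is an iterable of (winner, loser) model-name pairs, one per
    user vote. Win-rate scoring is an assumption made for illustration,
    not the documented ranking method.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1

    # One row per model: (name, wins, comparisons, win rate).
    rows = [(m, wins[m], games[m], wins[m] / games[m]) for m in games]
    # Highest win rate first; ties broken by more comparisons, then by name.
    rows.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rows


# Hypothetical votes: each tuple is (preferred model, other model).
example_votes = [
    ("model-a", "model-b"),
    ("model-a", "model-c"),
    ("model-b", "model-c"),
]

for name, won, played, rate in build_leaderboard(example_votes):
    print(f"{name}: {won}/{played} wins ({rate:.0%})")
```

Win rate is the simplest possible score, but it ignores the strength of the opponent in each comparison, which is why rating systems are often preferred once models have faced very different sets of opponents.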