How do various classification performance metrics compare?