Living on the edge: productivity gain thresholds in machine translation evaluation metrics