I Can't Believe It's Not Better:

Where Large Language Models Need to Improve