On the Reliability and Explainability of Language Models for Program Generation