Image-to-speech synthesis results (Figure 6 in the main paper)