Supplementary Material: Together Recognizing, Localizing and Summarizing Actions in Egocentric Videos