Digitized comics suffer from limited automatic content understanding, which restricts online content search and reading applications. Hence, comic book image analysis has been studied by the Document Analysis community for more than 10 years. However, many challenges remain in this domain. While comic elements such as panels, balloons, narrative text boxes, text, and comic style are well detected or segmented, character detection, text recognition, and analysis of the relations between elements are still challenging. Moreover, complex tasks such as story understanding or scene analysis have not yet been well studied.


In this competition, we would like to tackle a problem of comic scene analysis: emotion recognition in comic scenes. The competition task aims at extracting the emotions of comic characters in comic scenes (panels). These emotions are conveyed by the visual information, the text in speech balloons or captions, and the onomatopoeia (comic drawings of words that phonetically imitate, resemble, or suggest the sounds they describe).

The task is hence a multimodal analysis task that can take advantage of two fields, computer vision and natural language processing, both of which are among the main interests of the ICDAR community.

Challenging to distinguish multiple emotions among multiple characters: (Angry, Surprise), (Angry, Fear). Note that it is not easy to detect the tails of the balloons (dialog texts); the state-of-the-art (SOTA) algorithm for this problem is far from perfect.
Challenging to detect multi-label emotions by incorporating multimodal features (visual, text, drawing). It is difficult to recognize the girl’s emotion from the drawing alone, but the whole context does show her fear.
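
To make the multi-label, multimodal setting more concrete, the following is a minimal sketch and not the competition's actual data format or baseline. It assumes a label set containing only the three emotions named above (the full label set is not given here), a hypothetical `to_multi_hot` helper that encodes a panel's emotions as a binary vector, and a hypothetical `late_fusion` helper that averages per-modality scores and keeps the emotions above a threshold. The modality names and score values are invented purely for illustration.

```python
from typing import Dict, List

# Assumption: an illustrative subset of labels; only these three emotions
# are named in the text, and the real label set may be larger.
EMOTIONS: List[str] = ["Angry", "Fear", "Surprise"]


def to_multi_hot(labels: List[str]) -> List[int]:
    """Encode a panel's emotions as a binary vector aligned with EMOTIONS."""
    unknown = set(labels) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"Unknown emotion labels: {sorted(unknown)}")
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]


def late_fusion(scores_by_modality: Dict[str, List[float]],
                threshold: float = 0.5) -> List[str]:
    """Average per-emotion scores over modalities and keep those >= threshold.

    Hypothetical fusion rule: each modality supplies one score in [0, 1]
    per emotion, in the same order as EMOTIONS.
    """
    n_modalities = len(scores_by_modality)
    averaged = [
        sum(scores[i] for scores in scores_by_modality.values()) / n_modalities
        for i in range(len(EMOTIONS))
    ]
    return [emotion for emotion, p in zip(EMOTIONS, averaged) if p >= threshold]


# A panel annotated with two emotions at once, as in (Angry, Fear).
target = to_multi_hot(["Angry", "Fear"])

# Invented scores: the drawing alone is ambiguous, but the text is not.
predicted = late_fusion({
    "visual": [0.2, 0.4, 0.1],
    "text": [0.3, 0.9, 0.2],
})
```

In this invented example the visual scores alone would not cross the threshold for any emotion, while the averaged scores select Fear, which mirrors the case where the drawing is ambiguous but the context is not. Averaging is only one possible fusion rule; other combinations of the modalities are equally plausible.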