Multimodal context-aware estimation of social signals