Open-Source

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

"Can Omni-MLLMs reason like humans from sight and sound?"

AVI-Bench will be continuously maintained as a long-term, open community resource to advance cutting-edge research and technological progress in human-like audio-visual intelligence.

Evolution: Toward Human-like Audio-Visual Intelligence

Task Adaptive

Models demonstrate effective overall performance across a wide range of audio-visual tasks.

Modal Adaptive

Models demonstrate strong performance on both audio and visual modalities.

Stage Adaptive

Models demonstrate strong performance in both perception and understanding, which supports better audio-visual reasoning.

Domain Adaptive

Models show human-like domain generalization.
