On Retrieval of Long Audios with Complex Text Queries
Interspeech 2025
News
Our paper has been accepted to Interspeech 2025 🎉
Code and benchmarks are available ✅
https://www.isca-archive.org/interspeech_2025/yang25n_interspeech.html
💻 A thunderstorm rumbles in the background. A passing truck amplifies faint traffic noise, while a band plays outdoors with people talking. A heavy footstep starts slowly, and suddenly someone shakes a chain-locked gate.
Audio Length: 121 seconds
Query Length: 232 characters
💻 A machine whines rhythmically as a car drives by a wet pavement and a boat horn bellows, someone walks in a beeping grocery store while another walks on a leafy path, with heavy rain pouring continuously.
Audio Length: 123 seconds
Query Length: 205 characters
💻 A person shreds paper and lights a match as a band transitions from a slow to fast tempo, someone is working on wood. Far away, ocean waves ebb and flow, and an ape bellows amid chirping birds.
Audio Length: 108 seconds
Query Length: 193 characters