Aren Jansen
Research Scientist @ Google
I am currently a Research Scientist at Google, working in the Sound Understanding Group on machine learning for speech, language, music and audio processing. Before joining Google, I was a Research Scientist at the Johns Hopkins University Human Language Technology Center of Excellence, an Assistant Research Professor in the Johns Hopkins Department of Electrical and Computer Engineering, and a faculty member of the Center for Language and Speech Processing. My research has explored a wide range of speech, language, and audio processing topics that involve unsupervised/semi-supervised representation learning, information retrieval, content-based recommendation, latent structure discovery, time series modeling and analysis, and scalable algorithms for big data applications.
Here is my Google Scholar page and my Google Research publications page.
Ph.D. in Computer Science, Univ. of Chicago, 2008
M.S. in Computer Science, Univ. of Chicago, 2005
M.S. in Physics, Univ. of Chicago, 2003
B.A. in Physics, Cornell University, 2001
Journal and Conference Papers
MusicLM: Generating Music from Text. Andrea Agostinelli, Timo I Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank. arXiv:2301.11325, 2023.
V2Meow: Meowing to the Visual Beat via Music Generation. Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo Denk. arXiv:2305.06594, 2023.
Dataset Balancing Can Hurt Model Performance. R. Channing Moore, Daniel PW Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal. Proceedings of ICASSP, 2023.
MAQA: A Multimodal QA Benchmark for Negation. Judith Yue Li, Aren Jansen, Qingqing Huang, Joonseok Lee, Ravi Ganti, Dima Kuzmin. arXiv:2301.03238, 2023.
MuLan: A Joint Embedding of Music Audio and Natural Language. Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel PW Ellis. Proceedings of ISMIR, 2022.
Universal Paralinguistic Speech Representations Using Self-Supervised Conformers. Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang. Proceedings of ICASSP, 2022.
Text-driven Separation of Arbitrary Sounds. K Kilgour, B Gfeller, Q Huang, A Jansen, S Wisdom, M Tagliasacchi. Proceedings of Interspeech, 2022.
A Machine Learning-Based Objective Measure for ALS Disease Severity. Fernando G Vieira, Subhashini Venugopalan, Alan S Premasiri, Maeve McNally, Aren Jansen, Kevin McCloskey, Michael P Brenner, Steven Perrin. npj Digital Medicine, Vol. 5 Iss. 1, 2022.
Shared Computational Principles for Language Processing in Humans and Deep Language Models. Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Bobbi Aubrey, Samuel A Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvardhan Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Lora Fanda, Werner Doyle, Daniel Friedman, Patricia Dugan, Lucia Melloni, Roi Reichart, Sasha Devore, Adeen Flinker, Liat Hasenfratz, Omer Levy, Avinatan Hassidim, Michael Brenner, Yossi Matias, Kenneth A Norman, Orrin Devinsky, Uri Hasson. Nature Neuroscience, Vol. 25, Iss. 3, 2022.
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition. Yu Zhang, Daniel S Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu. arXiv:2109.13226, 2021.
Attention Bottlenecks for Multimodal Fusion. Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun. Proceedings of NeurIPS, 2021.
The Benefit of Temporally-Strong Labels in Audio Event Classification. Shawn Hershey, Daniel P. W. Ellis, Eduardo Fonseca, Aren Jansen, Caroline Liu, R. Channing Moore, Manoj Plakal. Proceedings of ICASSP, 2021.
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation. Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey. Proceedings of WASPAA, 2021.
Self-Supervised Learning from Automatically Separated Sound Scenes. Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra. Proceedings of WASPAA, 2021. Student Paper Award Winner
A Convolutional Neural Network for Automated Detection of Humpback Whale Song in a Diverse, Long-term Passive Acoustic Dataset. Ann N. Allen, Matt Harvey, Lauren Harrell, Aren Jansen, Karlina P. Merkens, Carrie C. Wall, Julie Cattiau, Erin M. Oleson. Frontiers in Marine Science 8:165, 2021.
Thinking Ahead: Prediction in Context as a Keystone of Language in Humans and Machines. Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Bobbi Aubrey, Samuel A. Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvardhan Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Lora Fanda, Adeen Flinker, Sasha Devore, Werner Doyle, Patricia Dugan, Daniel Friedman, Avinatan Hassidim, Michael Brenner, Yossi Matias, Ken A. Norman, Orrin Devinsky, Uri Hasson. bioRxiv, 2020.12.02.403477, 2021.
Into the Wild with Audioscope: Unsupervised Audio-Visual Separation of On-Screen Sounds. Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey. Proceedings of ICLR, 2020.
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework with Loss Masking. Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore. IEEE Signal Processing Letters 27:1235-1239, 2020.
Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging. Qingqing Huang, Aren Jansen, Li Zhang, Daniel P. W. Ellis, Rif A. Saurous, John Anderson. Proceedings of ICASSP, 2020.
Improving Universal Sound Separation Using Sound Classification. Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis. Proceedings of ICASSP, 2020.
Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision. Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous. Proceedings of ICASSP, 2020.
Towards Learning a Universal Non-Semantic Representation of Speech. Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv. Proceedings of Interspeech, 2020.
Semantically Meaningful Attributes from Co-Listen Embeddings for Playlist Exploration and Expansion. Ayush Patwari, Nicholas Kong, Jun Wang, Ullas Gargi, Michele Covell, Aren Jansen. Proceedings of ISMIR, 2020.
Unsupervised Learning of Semantic Audio Representations. Aren Jansen, Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore. Proceedings of ICASSP, 2018.
Large-Scale Audio Event Discovery in One Million YouTube Videos. Aren Jansen, Jort Gemmeke, Daniel Ellis, Xiaofeng Liu, Wade Lawrence, Dylan Freedman. Proceedings of ICASSP, 2017.
Audio Set: An Ontology and Human-Labeled Dataset for Audio Events. Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, Marvin Ritter. Proceedings of ICASSP, 2017.
CNN Architectures for Large-Scale Audio Classification. Shawn Hershey, Sourish Chaudhuri, Daniel Ellis, Jort Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, Kevin Wilson. Proceedings of ICASSP, 2017.
A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition. Herman Kamper, Aren Jansen, Sharon Goldwater. Computer Speech & Language 46, 154-174, 2017.
Scalable Out-of-Sample Extension of Graph Embeddings Using Deep Neural Networks. Aren Jansen, Greg Sell, and Vince Lyzinski. Pattern Recognition Letters, 2017.
Evaluating Low-Level Speech Features against Human Perceptual Data. Caitlin Richter, Naomi H. Feldman, Harini Salgado, Aren Jansen. Transactions of the Association for Computational Linguistics, 2017.
Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings. Herman Kamper, Aren Jansen, Sharon Goldwater. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (4), 669-679, 2016.
Context-Dependent Point Process Models for Keyword Search and Detection-Based ASR. Chunxi Liu, Aren Jansen, Sanjeev Khudanpur. Proceedings of ICASSP, 2016.
The Zero Resource Speech Challenge 2015: Proposed Approaches and Results. Maarten Versteegh, Xavier Anguera, Aren Jansen, and Emmanuel Dupoux. Procedia Computer Science 81, 67-72, 2016.
A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition. Herman Kamper, Aren Jansen, and Sharon Goldwater. Computer Speech and Language, 2016. Best Computer Speech & Language Research Paper 2016-2020
A Framework for Evaluating Speech Representations. Caitlin Richter, Naomi H. Feldman, Harini Salgado, and Aren Jansen. Proceedings of CogSci, 2016.
Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model. Herman Kamper, Aren Jansen, and Sharon Goldwater. Proceedings of Interspeech, 2015. [pdf]
An Evaluation of Graph Clustering Methods for Unsupervised Term Discovery. Vince Lyzinski, Gregory Sell, and Aren Jansen. Proceedings of Interspeech, 2015. [pdf]
A Comparison of Neural Network Methods for Unsupervised Representation Learning on the Zero Resource Speech Challenge. Daniel Renshaw, Herman Kamper, Aren Jansen, and Sharon Goldwater. Proceedings of Interspeech, 2015. [pdf]
The Zero Resource Speech Challenge 2015. Maarten Versteegh, Roland Thiolliere, Thomas Schatz, Xuan Nga Cao, Xavier Anguera, Aren Jansen, and Emmanuel Dupoux. Proceedings of Interspeech, 2015. [pdf]
Segmental Acoustic Indexing for Zero Resource Keyword Search. Keith Levin, Aren Jansen, and Ben Van Durme. Proceedings of ICASSP, 2015. [pdf]
Unsupervised Neural Network Based Feature Extraction Using Weak Top-Down Constraints. Herman Kamper, Micha Elsner, Aren Jansen, and Sharon Goldwater. Proceedings of ICASSP, 2015. [pdf]
Content-Based Recommender Systems for Spoken Documents. Jonathan Wintrode, Greg Sell, Aren Jansen, Michelle Fox, Daniel Garcia-Romero, and Alan McCree. Proceedings of ICASSP, 2015. [pdf]
Using Zero-Resource Spoken Term Discovery for Ranked Retrieval. Jerome White, Douglas Oard, Aren Jansen, Jiaul Paik, and Rashmi Sankepally. Proceedings of NAACL HLT, 2015. [pdf]
A Test Collection for Spoken Gujarati Queries. Douglas Oard, Rashmi Sankepally, Jerome White, Aren Jansen, and Craig Harman. Proceedings of SIGIR, 2015. [pdf]
Unsupervised Lexical Clustering of Speech Segments Using Fixed Dimensional Acoustic Embeddings. Herman Kamper, Aren Jansen, Simon King, and Sharon Goldwater. Proceedings of SLT, 2014. [pdf]
A Keyword Search System Using Open Source Software. Jan Trmal, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur, Pegah Ghahremani, Xiaohui Zhang, Chunxi Liu, Aren Jansen, Dietrich Klakow, David Yarowsky, and Florian Metze. Proceedings of SLT, 2014. [pdf]
Low Resource Open Vocabulary Keyword Search Using Point Process Models. Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal and Sanjeev Khudanpur. Proceedings of Interspeech, 2014. [pdf]
Unsupervised Idiolect Discovery for Speaker Recognition. Aren Jansen, Daniel Garcia-Romero, Pascal Clark, and Jaime Hernandez-Cordero. Proceedings of ICASSP, 2014. [pdf]
Featherweight Phonetic Keyword Search for Conversational Speech. Keith Kintzley, Aren Jansen, and Hynek Hermansky. Proceedings of ICASSP, 2014. [pdf]
Bridging the Gap between Speech Technology and Natural Language Processing: An Evaluation Toolbox for Term Discovery Systems. Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson and Emmanuel Dupoux. Proceedings of LREC, 2014. [pdf]
Fixed-Dimensional Acoustic Embeddings of Variable-Length Segments in Low-Resource Settings. Keith Levin, Katharine Henry, Aren Jansen and Karen Livescu. Proceedings of ASRU, 2013. Student Paper Award Winner [pdf]
Text-to-Speech Inspired Duration Modeling for Improved Whole-Word Acoustic Models. Keith Kintzley, Aren Jansen and Hynek Hermansky. Proceedings of Interspeech, 2013. [pdf]
Semi-Supervised Manifold Learning Approaches for Spoken Term Verification. Atta Norouzian, Richard Rose, and Aren Jansen. Proceedings of Interspeech, 2013. [pdf]
Evaluating Speech Features with the Minimal-Pair ABX Task: Analysis of the Classical MFC/PLP Pipeline. Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky, and Emmanuel Dupoux. Proceedings of Interspeech, 2013. [pdf]
Weak Top-Down Constraints for Unsupervised Acoustic Model Training. Aren Jansen, Samuel Thomas, and Hynek Hermansky. Proceedings of ICASSP, 2013. [pdf]
A Summary of the 2012 CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition. Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard Rose, Michael Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Borschinger, Justin Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz, and Samuel Thomas. Proceedings of ICASSP, 2013. [pdf]
Frequency Offset Correction in Speech without Detecting Pitch. Pascal Clark, Harish Mallidi, Aren Jansen, and Hynek Hermansky. Proceedings of ICASSP, 2013. [pdf]
Zero Resource Graph-Based Confidence Estimation for Open Vocabulary Spoken Term Detection. Atta Norouzian, Richard Rose, Sina Hamidi Ghalehjegh, and Aren Jansen. Proceedings of ICASSP, 2013. [pdf]
Intrinsic Spectral Analysis. Aren Jansen and Partha Niyogi. IEEE Transactions on Signal Processing, 2013. [pdf]
The JHU-HLTCOE Spoken Web Search System for MediaEval 2012. Aren Jansen, Benjamin Van Durme, and Pascal Clark. Proceedings of the MediaEval 2012 Workshop, 2012. [pdf]
Indexing Raw Acoustic Features for Scalable Zero Resource Search. Aren Jansen and Benjamin Van Durme. Proceedings of Interspeech, 2012. [pdf]
Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. Aren Jansen, Samuel Thomas, and Hynek Hermansky. Proceedings of Interspeech, 2012. [pdf]
MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors. Keith Kintzley, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2012. Student Paper Award Winner [pdf]
Inverting the Point Process Model for Fast Phonetic Keyword Search. Keith Kintzley, Aren Jansen, Ken Church, and Hynek Hermansky. Proceedings of Interspeech, 2012. [pdf]
Exploiting Discriminative Point Process Models for Spoken Term Detection. Atta Norouzian, Aren Jansen, Richard Rose, and Samuel Thomas. Proceedings of Interspeech, 2012. [pdf]
Data-Driven Posterior Features for Low Resource Speech Recognition Applications. Samuel Thomas, Sriram Ganapathy, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2012. [pdf]
Efficient Spoken Term Discovery Using Randomized Algorithms. Aren Jansen and Benjamin Van Durme. Proceedings of ASRU, 2011. [pdf]
Estimating Document Frequencies in a Speech Corpus. Damianos Karakos, Mark Dredze, Kenneth Church, Aren Jansen, and Sanjeev Khudanpur. Proceedings of ASRU, 2011. [pdf]
Towards Unsupervised Training of Speaker Independent Acoustic Models. Aren Jansen and Ken Church. Proceedings of Interspeech, 2011. [pdf]
Rapid Evaluation of Speech Representations for Spoken Term Discovery. Michael A. Carlin, Samuel Thomas, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2011. [pdf]
Event Selection from Phone Posteriorgrams Using Matched Filters. Keith Kintzley, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2011. [pdf]
Whole Word Discriminative Point Process Models. Aren Jansen. Proceedings of ICASSP, 2011. [pdf]
Speech Recognition with Segmental Conditional Random Fields: A Summary of the JHU CLSP 2010 Summer Workshop. Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Les Atlas, Pascal Clark, Greg Sell, Meihong Wang, Fei Sha, Hynek Hermansky, Damianos Karakos, Aren Jansen, Samuel Thomas, Sivaram G.S.V.S., Sam Bowman, and Justine Kao. Proceedings of ICASSP, 2011. [pdf]
Point Process Models of Spectro-Temporal Modulation Events for Speech Recognition. Aren Jansen, Nima Mesgarani, and Partha Niyogi. Proceedings of the Asilomar Conference on Signals, Systems, and Computers, 2010. [pdf]
NLP on Spoken Documents without ASR. Mark Dredze, Aren Jansen, Glen Coppersmith and Ken Church. Proceedings of EMNLP, 2010. [pdf]
Towards Spoken Term Discovery at Scale with Zero Resources. Aren Jansen, Ken Church, and Hynek Hermansky. Proceedings of Interspeech, 2010. [pdf]
Detection-Based Speech Recognition with Sparse Point Process Models. Aren Jansen and Partha Niyogi. Proceedings of ICASSP, 2010. [pdf]
Robust Keyword Spotting with Rapidly Adapting Point Process Models. Aren Jansen and Partha Niyogi. Proceedings of Interspeech, 2009. [pdf]
Point Process Models for Spotting Keywords in Continuous Speech. Aren Jansen and Partha Niyogi. IEEE Transactions on Audio, Speech, and Language Processing, 2009. [pdf]
Point Process Models for Event-Based Speech Recognition. Aren Jansen and Partha Niyogi. Speech Communication, 2009. [pdf]
Modeling the Temporal Dynamics of Distinctive Feature Landmark Detectors for Speech Recognition. Aren Jansen and Partha Niyogi. Journal of the Acoustical Society of America, Volume 124, Issue 3, September 2008. [pdf]
A Hierarchical Point Process Model for Speech Recognition. Aren Jansen and Partha Niyogi. Proceedings of ICASSP, 2008. [pdf]
Semi-Supervised Learning of Speech Sounds. Aren Jansen and Partha Niyogi. Proceedings of Interspeech, 2007. [pdf]
Intrinsic Fourier Analysis on the Manifold of Speech Sounds. Aren Jansen and Partha Niyogi. Proceedings of ICASSP, 2006. Student Paper Award Winner [pdf]
Technical Reports and Theses
An Experimental Evaluation of Keyword-Filler Hidden Markov Models. Aren Jansen and Partha Niyogi. Technical Report TR-2009-02. Department of Computer Science, University of Chicago, April 2009. [link]
Point Process Models for Spotting Keywords in Continuous Speech. Aren Jansen and Partha Niyogi. Technical Report TR-2008-09. Department of Computer Science, University of Chicago, September 2008. [link]
Point Process Models for Event-Based Speech Recognition. Aren Jansen and Partha Niyogi. Technical Report TR-2008-04. Department of Computer Science, University of Chicago, February 2008. [link]
A Probabilistic Speech Recognition Framework Based on the Temporal Dynamics of Distinctive Feature Landmark Detectors. Aren Jansen and Partha Niyogi. Technical Report TR-2007-07. Department of Computer Science, University of Chicago, June 2007. [link]
A Geometric Perspective on Speech Sounds. Aren Jansen and Partha Niyogi. Technical Report TR-2005-08. Department of Computer Science, University of Chicago, June 2005. [link]
Geometric and Landmark-Based Approaches to Speech Representation and Recognition. Aren Jansen. Ph.D. Thesis, Department of Computer Science, University of Chicago, August 2008. [pdf]
The Manifold Nature of Vowel Sounds. Aren Jansen. Master's Paper, Department of Computer Science, University of Chicago, April 2005. [pdf]