Aren Jansen

Senior Research Scientist @ Google Research

I am currently a Senior Research Scientist at Google Research, working in the Machine Hearing Group on machine learning for speech and audio processing. Before joining Google, I was a Senior Research Scientist at the Johns Hopkins University Human Language Technology Center of Excellence, an Assistant Research Professor in the Johns Hopkins Department of Electrical and Computer Engineering, and a member of the Center for Language and Speech Processing. My research has explored a wide range of speech and audio processing topics that involve unsupervised/semi-supervised representation learning, speech retrieval, content-based recommendation, latent structure discovery, time series modeling and analysis, and scalable algorithms for big data applications. These days I am focused on inventing new multimedia processing technologies made possible by vast amounts of data.

Here are my full CV, my Google Scholar page, and my Google Research publications page.

Education

Ph.D. in Computer Science, Univ. of Chicago, 2008

M.S. in Computer Science, Univ. of Chicago, 2005

M.S. in Physics, Univ. of Chicago, 2003

B.A. in Physics, Cornell University, 2001

Publications

Journal and Conference Papers

2017

Large-Scale Audio Event Discovery in One Million YouTube Videos. Aren Jansen, Jort Gemmeke, Daniel Ellis, Xiaofeng Liu, Wade Lawrence, Dylan Freedman. Proceedings of ICASSP, 2017.

Audio Set: An Ontology and Human-Labeled Dataset for Audio Events. Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, Marvin Ritter. Proceedings of ICASSP, 2017.

CNN Architectures for Large-Scale Audio Classification. Shawn Hershey, Sourish Chaudhuri, Daniel Ellis, Jort Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, Kevin Wilson. Proceedings of ICASSP, 2017.

A Segmental Framework for Fully-Unsupervised Large-Vocabulary Speech Recognition. Herman Kamper, Aren Jansen, Sharon Goldwater. Computer Speech & Language 46, 154-174, 2017.

Scalable Out-of-Sample Extension of Graph Embeddings Using Deep Neural Networks. Aren Jansen, Greg Sell, and Vince Lyzinski. Pattern Recognition Letters, 2017.

Evaluating Low-Level Speech Features against Human Perceptual Data. Caitlin Richter, Naomi H. Feldman, Harini Salgado, Aren Jansen. Transactions of the Association for Computational Linguistics, 2017.

2016

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings. Herman Kamper, Aren Jansen, Sharon Goldwater. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (4), 669-679, 2016.

Context-dependent point process models for keyword search and detection-based ASR. Chunxi Liu, Aren Jansen, Sanjeev Khudanpur. Proceedings of ICASSP, 2016.

The Zero Resource Speech Challenge 2015: Proposed Approaches and Results. Maarten Versteegh, Xavier Anguera, Aren Jansen, and Emmanuel Dupoux. Procedia Computer Science 81, 67-72, 2016.

A segmental framework for fully-unsupervised large-vocabulary speech recognition. Herman Kamper, Aren Jansen, and Sharon Goldwater. arXiv preprint arXiv:1606.06950, 2016.

A Framework for Evaluating Speech Representations. Caitlin Richter, Naomi H. Feldman, Harini Salgado, and Aren Jansen. Proceedings of CogSci, 2016.

2015

Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model. Herman Kamper, Aren Jansen, and Sharon Goldwater. Proceedings of Interspeech, 2015. [pdf]

An Evaluation of Graph Clustering Methods for Unsupervised Term Discovery. Vince Lyzinski, Gregory Sell, and Aren Jansen. Proceedings of Interspeech, 2015. [pdf]

A Comparison of Neural Network Methods for Unsupervised Representation Learning on the Zero Resource Speech Challenge. Daniel Renshaw, Herman Kamper, Aren Jansen, and Sharon Goldwater. Proceedings of Interspeech, 2015. [pdf]

The Zero Resource Speech Challenge 2015. Maarten Versteegh, Roland Thiolliere, Thomas Schatz, Xuan Nga Cao, Xavier Anguera, Aren Jansen, and Emmanuel Dupoux. Proceedings of Interspeech, 2015. [pdf]

Segmental Acoustic Indexing for Zero Resource Keyword Search. Keith Levin, Aren Jansen, and Ben Van Durme. Proceedings of ICASSP, 2015. [pdf]

Unsupervised Neural Network Based Feature Extraction Using Weak Top-Down Constraints. Herman Kamper, Micha Elsner, Aren Jansen, and Sharon Goldwater. Proceedings of ICASSP, 2015. [pdf]

Content-Based Recommender Systems for Spoken Documents. Jonathan Wintrode, Greg Sell, Aren Jansen, Michelle Fox, Daniel Garcia-Romero, and Alan McCree. Proceedings of ICASSP, 2015. [pdf]

Using Zero-Resource Spoken Term Discovery for Ranked Retrieval. Jerome White, Douglas Oard, Aren Jansen, Jiaul Paik, and Rashmi Sankepally. Proceedings of NAACL HLT, 2015. [pdf]

A Test Collection for Spoken Gujarati Queries. Douglas Oard, Rashmi Sankepally, Jerome White, Aren Jansen, and Craig Harman. Proceedings of SIGIR, 2015. [pdf]

2014

Unsupervised Lexical Clustering of Speech Segments Using Fixed Dimensional Acoustic Embeddings. Herman Kamper, Aren Jansen, Simon King, and Sharon Goldwater. Proceedings of SLT, 2014. [pdf]

A Keyword Search System Using Open Source Software. Jan Trmal, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur, Pegah Ghahremani, Xiaohui Zhang, Chunxi Liu, Aren Jansen, Dietrich Klakow, David Yarowsky, and Florian Metze. Proceedings of SLT, 2014. [pdf]

Low Resource Open Vocabulary Keyword Search Using Point Process Models. Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal and Sanjeev Khudanpur. Proceedings of Interspeech, 2014. [pdf]

Unsupervised Idiolect Discovery for Speaker Recognition. Aren Jansen, Daniel Garcia-Romero, Pascal Clark, and Jaime Hernandez-Cordero. Proceedings of ICASSP, 2014. [pdf]

Featherweight Phonetic Keyword Search for Conversational Speech. Keith Kintzley, Aren Jansen, and Hynek Hermansky. Proceedings of ICASSP, 2014. [pdf]

Bridging the Gap between Speech Technology and Natural Language Processing: An Evaluation Toolbox for Term Discovery Systems. Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson and Emmanuel Dupoux. Proceedings of LREC, 2014. [pdf]

2013

Fixed-Dimensional Acoustic Embeddings of Variable-Length Segments in Low-Resource Settings. Keith Levin, Katharine Henry, Aren Jansen and Karen Livescu. Proceedings of ASRU, 2013. Student Paper Award Winner [pdf]

Text-to-Speech Inspired Duration Modeling for Improved Whole-Word Acoustic Models. Keith Kintzley, Aren Jansen and Hynek Hermansky. Proceedings of Interspeech, 2013. [pdf]

Semi-Supervised Manifold Learning Approaches for Spoken Term Verification. Atta Norouzian, Richard Rose, and Aren Jansen. Proceedings of Interspeech, 2013. [pdf]

Evaluating Speech Features with the Minimal-Pair ABX Task: Analysis of the Classical MFC/PLP Pipeline. Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky, and Emmanuel Dupoux. Proceedings of Interspeech, 2013. [pdf]

Weak Top-Down Constraints for Unsupervised Acoustic Model Training. Aren Jansen, Samuel Thomas, and Hynek Hermansky. Proceedings of ICASSP, 2013. [pdf]

A Summary of the 2012 CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition. Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard Rose, Michael Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Borschinger, Justin Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz, and Samuel Thomas. Proceedings of ICASSP, 2013. [pdf]

Frequency Offset Correction in Speech without Detecting Pitch. Pascal Clark, Harish Mallidi, Aren Jansen, and Hynek Hermansky. Proceedings of ICASSP, 2013. [pdf]

Zero Resource Graph-Based Confidence Estimation for Open Vocabulary Spoken Term Detection. Atta Norouzian, Richard Rose, Sina Hamidi Ghalehjegh, and Aren Jansen. Proceedings of ICASSP, 2013. [pdf]

Intrinsic Spectral Analysis. Aren Jansen and Partha Niyogi. IEEE Transactions on Signal Processing, 2013. [pdf]

2012

The JHU-HLTCOE Spoken Web Search System for MediaEval 2012. Aren Jansen, Benjamin Van Durme, and Pascal Clark. Proceedings of the MediaEval 2012 Workshop, 2012. [pdf]

Indexing Raw Acoustic Features for Scalable Zero Resource Search. Aren Jansen and Benjamin Van Durme. Proceedings of Interspeech, 2012. [pdf]

Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. Aren Jansen, Samuel Thomas, and Hynek Hermansky. Proceedings of Interspeech, 2012. [pdf]

MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors. Keith Kintzley, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2012. Student Paper Award Winner [pdf]

Inverting the Point Process Model for Fast Phonetic Keyword Search. Keith Kintzley, Aren Jansen, Ken Church, and Hynek Hermansky. Proceedings of Interspeech, 2012. [pdf]

Exploiting Discriminative Point Process Models for Spoken Term Detection. Atta Norouzian, Aren Jansen, Richard Rose, and Samuel Thomas. Proceedings of Interspeech, 2012. [pdf]

Data-Driven Posterior Features for Low Resource Speech Recognition Applications. Samuel Thomas, Sriram Ganapathy, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2012. [pdf]

2011

Efficient Spoken Term Discovery Using Randomized Algorithms. Aren Jansen and Benjamin Van Durme. Proceedings of ASRU, 2011. [pdf]

Estimating Document Frequencies in a Speech Corpus. Damianos Karakos, Mark Dredze, Kenneth Church, Aren Jansen, and Sanjeev Khudanpur. Proceedings of ASRU, 2011. [pdf]

Towards Unsupervised Training of Speaker Independent Acoustic Models. Aren Jansen and Ken Church. Proceedings of Interspeech, 2011. [pdf]

Rapid Evaluation of Speech Representations for Spoken Term Discovery. Michael A. Carlin, Samuel Thomas, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2011. [pdf]

Event Selection from Phone Posteriorgrams Using Matched Filters. Keith Kintzley, Aren Jansen, and Hynek Hermansky. Proceedings of Interspeech, 2011. [pdf]

Whole Word Discriminative Point Process Models. Aren Jansen. Proceedings of ICASSP, 2011. [pdf]

Speech Recognition with Segmental Conditional Random Fields: A Summary of the JHU CLSP 2010 Summer Workshop. Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Les Atlas, Pascal Clark, Greg Sell, Meihong Wang, Fei Sha, Hynek Hermansky, Damianos Karakos, Aren Jansen, Samuel Thomas, Sivaram G.S.V.S., Sam Bowman, and Justine Kao. Proceedings of ICASSP, 2011. [pdf]

2010

Point Process Models of Spectro-Temporal Modulation Events for Speech Recognition. Aren Jansen, Nima Mesgarani, and Partha Niyogi. Proceedings of the Asilomar Conference on Signals, Systems, and Computers, 2010. [pdf]

NLP on Spoken Documents without ASR. Mark Dredze, Aren Jansen, Glen Coppersmith and Ken Church. Proceedings of EMNLP, 2010. [pdf]

Towards Spoken Term Discovery at Scale with Zero Resources. Aren Jansen, Ken Church, and Hynek Hermansky. Proceedings of Interspeech, 2010. [pdf]

Detection-Based Speech Recognition with Sparse Point Process Models. Aren Jansen and Partha Niyogi. Proceedings of ICASSP, 2010. [pdf]

2006-2009

Robust Keyword Spotting with Rapidly Adapting Point Process Models. Aren Jansen and Partha Niyogi. Proceedings of Interspeech, 2009. [pdf]

Point Process Models for Spotting Keywords in Continuous Speech. Aren Jansen and Partha Niyogi. IEEE Transactions on Audio, Speech, and Language Processing, 2009. [pdf]

Point Process Models for Event-Based Speech Recognition. Aren Jansen and Partha Niyogi. Speech Communication, 2009. [pdf]

Modeling the Temporal Dynamics of Distinctive Feature Landmark Detectors for Speech Recognition. Aren Jansen and Partha Niyogi. Journal of the Acoustical Society of America, Volume 124, Issue 3, September 2008. [pdf]

A Hierarchical Point Process Model for Speech Recognition. Aren Jansen and Partha Niyogi. Proceedings of ICASSP, 2008. [pdf]

Semi-Supervised Learning of Speech Sounds. Aren Jansen and Partha Niyogi. Proceedings of Interspeech, 2007. [pdf]

Intrinsic Fourier Analysis on the Manifold of Speech Sounds. Aren Jansen and Partha Niyogi. Proceedings of ICASSP, 2006. Student Paper Award Winner [pdf]

Technical Reports and Theses

An Experimental Evaluation of Keyword-Filler Hidden Markov Models. Aren Jansen and Partha Niyogi. Technical Report TR-2009-02. Department of Computer Science, University of Chicago, April 2009. [link]

Point Process Models for Spotting Keywords in Continuous Speech. Aren Jansen and Partha Niyogi. Technical Report TR-2008-09. Department of Computer Science, University of Chicago, September 2008. [link]

Point Process Models for Event-Based Speech Recognition. Aren Jansen and Partha Niyogi. Technical Report TR-2008-04. Department of Computer Science, University of Chicago, February 2008. [link]

A Probabilistic Speech Recognition Framework Based on the Temporal Dynamics of Distinctive Feature Landmark Detectors. Aren Jansen and Partha Niyogi. Technical Report TR-2007-07. Department of Computer Science, University of Chicago, June 2007. [link]

A Geometric Perspective on Speech Sounds. Aren Jansen and Partha Niyogi. Technical Report TR-2005-08. Department of Computer Science, University of Chicago, June 2005. [pdf]

Geometric and Landmark-Based Approaches to Speech Representation and Recognition. Aren Jansen. Ph.D. Thesis, Department of Computer Science, University of Chicago, August 2008. [pdf]

The Manifold Nature of Vowel Sounds. Aren Jansen. Master's Paper, Department of Computer Science, University of Chicago, April 2005. [pdf]

Software