John R. Hershey, Ph.D.

Bio

I did my Ph.D. in the Department of Cognitive Science at the University of California, San Diego (UCSD), where I was a founding member of the Machine Perception Laboratory. My thesis explores the use of generative graphical models for speech enhancement, face tracking, and combinations of the two. During my time at UCSD, I interned in the Machine Learning and Applied Statistics Group at Microsoft Research in Seattle, and at Mitsubishi Electric Research Labs in Boston. In 2004, I spent a year as a visiting researcher in the Speech Group at Microsoft Research. From 2005 to 2010 I was at the IBM T. J. Watson Research Center in New York, where I was a research staff member in the Speech Algorithms and Engines group. I am currently at Mitsubishi Electric Research Labs (MERL) in Boston.

Publications 

MERL publications (since 2010)

Classic papers from IBM and before:

John Hershey, Steven Rennie, Peder A. Olsen, and Andy Aaron, "Audio Alchemy: getting computers to understand overlapping speech," Scientific American, April, 2011.

Peder A. Olsen, John Hershey, Steven Rennie and Vaibhava Goel, "A speech recognition solution to an ancient cryptography problem," IBM Technical Report RC25109, New York, USA, 2011.

Xin Chen, Xiaodong Cui, Jian Xue, Peder Olsen, John Hershey, Bowen Zhou and Yunxin Zhao, "Clustering of bootstrapped acoustic model with full covariance," ICASSP 2011, accepted, Prague, Czech Republic, 22-27 May 2011.

Steven Rennie, John R. Hershey, and Peder A. Olsen,
"Single Channel Multi-talker Speech Recognition: Graphical Modeling Approaches," 
IEEE Signal Processing Magazine, Special Issue on Graphical Models, Vol. 27:6, November 2010.
 
Pierre L. Dognin, John R. Hershey, Vaibhava Goel, and Peder A. Olsen, "Restructuring Exponential Family Mixture Models," Interspeech 2010, p. 62-65, Makuhari, Japan, 26-30 September 2010.

John R. Hershey, Peder A. Olsen and Steven J. Rennie, "Signal interaction and the devil function," Interspeech 2010, p. 334-337, Makuhari, Japan, 26-30 September 2010.


John R. Hershey, Steven Rennie, Peder A. Olsen and Trausti Kristjansson,
"Super-human multi-talker speech recognition: A graphical modeling approach,"
Computer Speech and Language, 2009, Special issue: Speech Separation and Recognition.

Martin Cooke, John R. Hershey, Steven Rennie,
"The Speech Separation Challenge,"
Computer Speech and Language, 2009, Special issue: Speech Separation and Recognition.

Pierre L. Dognin, John R. Hershey, Vaibhava Goel and Peder Olsen,
"Refactoring acoustic models using variational density approximation,"
ICASSP 2009, Taipei, Taiwan.   

Pierre L. Dognin, Vaibhava Goel, Peder A. Olsen and John R. Hershey,
"A fast, accurate approximation to log likelihood of Gaussian mixture models,"
ICASSP 2009, Taipei, Taiwan.

Steven J. Rennie, John R. Hershey and Peder A. Olsen,
"Single-channel speech separation and recognition using loopy belief propagation,"
ICASSP 2009, Taipei, Taiwan.

Tim K. Marks, John R. Hershey, and Javier R. Movellan,
"Tracking 3D Motion, Deformations, and Texture using a Conditionally Gaussian Generative Model,"
IEEE Transactions On Pattern Analysis And Machine Intelligence, 2008.

John Hershey, Peder Olsen, Steven Rennie, David Nahamoo, Michael Picheny
Coping with Data Addiction in the Quest for Robust Speech Recognition
Keynote speech at HSCMA, 2008.

Steven J. Rennie, John Hershey and Peder Olsen,
Efficient Model-based Speech Separation and Denoising using Non-negative Subspace Analysis
ICASSP 2008, p. 1833-1836, March 30 - April 4, Las Vegas, Nevada.

Binit Mohanty, John R. Hershey, Peder A. Olsen, Suleyman S. Kozat and Vaibhava Goel
Optimizing Speech Recognition Grammars using a Measure of Similarity Between Hidden Markov Models
ICASSP 2008, p. 4593-4596, March 30 - April 4, Las Vegas, Nevada.

John Hershey and Peder Olsen,
Variational Bhattacharyya Divergence for Hidden Markov Models,
ICASSP 2008, p. 4557-4560, March 30 - April 4, Las Vegas, Nevada.

Jia-Yu Chen, John Hershey, Peder Olsen and Emmanuel Yashchin,
Accelerated Monte Carlo for Kullback-Leibler Divergence between Gaussian Mixture Models,
ICASSP 2008, p. 4553-4556, March 30 - April 4, Las Vegas, Nevada.

John R. Hershey, Peder A. Olsen and Steven J. Rennie,
Variational Kullback-Leibler Divergence for Hidden Markov Models,
ASRU 2007, p. 323-328, December 9-13, Kyoto, Japan.

John Hershey and Peder Olsen,
Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models,
ICASSP 2007, IV, p. 317-320, April 15-20, 2007, Honolulu, Hawaii.

Jia-Yu Chen, Peder Olsen and John Hershey,
Word Confusability - Measuring Hidden Markov Model Similarity,
Interspeech 2007, p. 2089-2092, 27-31 August, 2007, Antwerp, Belgium.

Peder Olsen and John Hershey,
Bhattacharyya Error and Divergence using Variational Importance Sampling,
Interspeech 2007, p. 46-49, 27-31 August, 2007, Antwerp, Belgium.

John Hershey, Peder Olsen and Ramesh Gopinath,
Variational sampling approaches to word confusability,
Information Theory and Applications, February 2007, San Diego, USA.

John Hershey, Trausti Kristjansson, Steven Rennie and Peder Olsen,
Single Channel Speech Separation Using Layered Hidden Markov Models,
NIPS 2006.

Steven Rennie, Peder Olsen, John Hershey and Trausti Kristjansson,
The Iroquois Model: Using Temporal Dynamics to Separate Speakers,
Interspeech 2006 ICSLP, ISCA Tutorial and Research Workshop on Statistical And Perceptual Audition, p. 24-30, September 16 2006, Pittsburgh, Pennsylvania.

Trausti Kristjansson, John Hershey, Peder Olsen, Steven Rennie and Ramesh Gopinath,
Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system,
Interspeech 2006 ICSLP, p. 97-100, 17-21 September, 2006, Pittsburgh, Pennsylvania.

Tim K. Marks, John Hershey, J. Cooper Roddey, Javier R. Movellan
Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
in Advances in Neural Information Processing Systems 17, 2005

John Hershey, Trausti Kristjansson, Zhengyou Zhang
Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition,
ISCA Workshop on Statistical and Perceptual Audio Processing 2004

Javier Movellan, John Hershey, Tim Marks, and J. Cooper Roddey,
3D Tracking of Morphable Objects Using Conditionally Gaussian Nonlinear Filters,
CVPR Workshop on Generative Models for Vision 2004

Trausti Kristjansson, Hagai Attias, John Hershey,
Stereo Based 3D Tracking and Scene Learning, employing Particle Filtering within EM,
European Conference on Computer Vision (ECCV) 2004

Trausti Kristjansson, John Hershey, Hagai Attias,
Single Microphone Source Separation using High Resolution Signal Reconstruction,
IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Javier Movellan, Josh Susskind, John Hershey,
Large-Scale Convolutional HMMs for Real-Time Video Tracking,
Computer Vision and Pattern Recognition (CVPR) 2004

John Hershey, Hagai Attias, Nebojsa Jojic, Trausti Kristjansson,
Audio-Visual Graphical Models for Speech Processing,
IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Trausti Kristjansson, John Hershey,
High Resolution Signal Reconstruction,
Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, 2003

John Hershey and Mike Casey,
Audio-Visual Sound Separation Via Hidden Markov Models,
in Advances in Neural Information Processing Systems 14, 2002

John Hershey and Javier R. Movellan,
Audio Vision: Using Audio-Visual Synchrony to Locate Sounds,
in Advances in Neural Information Processing Systems 12, 2000