Publications
Book
Shinji Watanabe, Marc Delcroix, Florian Metze, and John R. Hershey, "New Era for Robust Speech Recognition -Exploiting Deep Learning-," Springer (2017)
Shinji Watanabe and Jen-Tzung Chien, "Bayesian Speech and Language Processing," Cambridge University Press (2015)
Book chapter
Takahiro Shinozaki, Shinji Watanabe, Kevin Duh, "Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms," in "Deep Neural Evolution: Deep Learning with Evolutionary Computation," eds. Hitoshi Iba and Nasimul Noman, Springer, pp. 97--129, (2020)
Shinji Watanabe, Tuomas Virtanen, Dorothea Kolossa, "Application of Source Separation to Robust Speech Analysis and Recognition," in "Audio Source Separation and Speech Enhancement," eds. Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot, John Wiley & Sons Ltd (2018)
Marc Delcroix, Shinji Watanabe, and Tomohiro Nakatani, "Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing," in "Robust Speech Recognition of Uncertain or Missing Data," eds. Dorothea Kolossa and Reinhold Haeb-Umbach, Springer, pp. 225--256, (2011)
PhD thesis
Speech recognition based on a Bayesian approach, Waseda University (2006)
Keynote talk
Shinji Watanabe, "Reproducing Large Speech Foundation Models," Eighth International Workshop on Symbolic-Neural Learning (SNL), Tokyo, Japan (2024)
Shinji Watanabe, "Unifying Speech Processing Applications with Speech Foundation Models," IEEE Automatic Speech Recognition and Understanding (ASRU), Taipei, Taiwan (2023)
Shinji Watanabe, "Explainable neural network for spoken language processing," iSpeech 2023, Jerusalem, Israel (2023)
Shinji Watanabe, "Simplifying Automatic Speech Recognition with Non-Autoregressive Neural End-to-End Modeling," 2020 Accented English Speech Recognition Challenge Workshop (2020 AESRC Workshop) (2020)
Shinji Watanabe, "Tackling Multispeaker Conversation Processing based on Speaker Diarization and Multispeaker Speech Recognition," The VoxSRC Workshop (2020)
Shinji Watanabe, "End-to-End Speech Processing: From Pipeline to Integrated Architecture," Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2018, Honolulu, Hawaii, USA (2018)
Shinji Watanabe, "Neural End-to-End Architectures for Speech Recognition in Adverse Environments," ITG Conference on Speech Communication, Oldenburg, Germany (2018)
Shinji Watanabe, "Neural End-to-End Architectures for Speech Recognition in Adverse Environments," 30th annual Conference on Computational Linguistics and Speech Processing (ROCLING), Hsinchu, Taiwan (2018)
Tutorial/Overview/Invited talk
Shi-Xiong Zhang, Yong Xu, Shinji Watanabe, and Dong Yu, "Recent Advances in Speech Processing, Multi-talker ASR and Diarization for Cocktail Party Problem," Interspeech'23 (2023)
Shinji Watanabe, "Explainable End-to-End Neural Networks for Far-Field Conversation Recognition," SANE 2022 - Speech and Audio in the Northeast (2022)
Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, and Katrin Kirchhoff, "Self-supervised Representation Learning for Speech Processing," Interspeech'22 (2022)
Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, and Katrin Kirchhoff, "Self-supervised Representation Learning for Speech Processing," NAACL'22 (2022)
Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, and Katrin Kirchhoff, "Self-supervised Representation Learning for Speech Processing," ICASSP'22 (2022)
Shinji Watanabe, "Machine listening with Explainable AI," IV 2021 Workshop on Explainable AI on Autonomous Driving (2021)
Shinji Watanabe, "Toward Unification of Various Speech Processing Applications based on End-to-End Neural Networks," Otogaku Symposium (2021)
Shinji Watanabe, Pengcheng Guo, and Sathvik Udupa, "Introduction of ESPnet, End-to-End Speech Processing Toolkit," MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages (MUCS) (2021)
Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, and Shinji Watanabe, "T-9 - Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization," ICASSP'21 (2021)
Takaaki Hori, Tomoki Hayashi, Shigeki Karita, and Shinji Watanabe, "Advanced Methods for Neural End-to-End Speech Processing - Unification, Integration, and Implementation," Interspeech'19 (2019)
Shinji Watanabe, "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition in Challenging Environments," INTERNATIONAL WORKSHOP ON SPEECH PROCESSING IN CHALLENGING ENVIRONMENTS (SPICE) (2018)
Shinji Watanabe, "Multi-microphone Speech Recognition --Recent Advances in Distant Speech Recognition," Speech and Language Symposium (2016)
Shinji Watanabe, Xiong Xiao, and Marc Delcroix, "Multi-microphone Speech Recognition," APSIPA ASC'16 (2016)
Shinji Watanabe, "Pushing the envelope at both ends: beamforming acoustic models and joint CTC/Attention schemes for end-to-end ASR," SANE 2016 - Speech and Audio in the Northeast (2016)
Marc Delcroix and Shinji Watanabe, "Recent Advances in Distant Speech Recognition," Interspeech'16 (2016)
Shinji Watanabe and Jen-Tzung Chien, "Bayesian Learning for Speech and Language Processing," ICASSP'12, Tutorial T-10, (2012)
Shinji Watanabe, "Bayesian approaches in speech recognition," APSIPA ASC 2011, Plenary Overview Sessions, (2011)
Shinji Watanabe, “Recent Topics in Acoustic Modeling for Speech recognition: A Machine Learning Perspective based on Generative and Discriminative Approaches,” IEICE Technical Report, SP2011-31, pp. 7–10 (2011.6)
Shinji Watanabe, “Recent topics in speech research,” Tutorial Session 1-2, the 2nd Young Researchers Forum on ALAGIN Speech Processing Session (2011.3)
Shinji Watanabe, “Bayesian training of acoustic models,” IPSJ SIG Technical Report, Vol.2009-SLP-77, No. 9, (2009. 7)
Shinji Watanabe, “Speech recognition based on a Bayesian approach and its application,” Japan Techno Center, Lecture on “Speech recognition technology –application and future trend” (2007.2.)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, “Speech recognition using variational Bayes,” 2005 Workshop on Information-Based Induction Sciences (IBIS2005), pp. 269--274, (2005.11)
Shinji Watanabe, “Selection of shared-state hidden Markov model structure using Bayesian criterion,” IEICE Technical Report, SP2004-149, pp. 25–30 (2005)
Shinji Watanabe, “Speech recognition based on the Bayesian approach,” Tutorial lecture on IEICE Technical Report, SP2004-74, pp. 13--20, (2004)
Shinji Watanabe, “VBEC: robust speech recognition based on Bayesian approach,” 5th young researcher workshop, ASJ Kansai section, I-2, (2003)
Review and overview paper
Rohit Prabhavalkar, Takaaki Hori, Tara N Sainath, Ralf Schlüter, and Shinji Watanabe, "End-to-End Speech Recognition: A Survey," IEEE Transactions on Audio, Speech and Language Processing (2023)
Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, and Shinji Watanabe, "Self-Supervised Speech Representation Learning: A Review," IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1179-1210, Oct. 2022
Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe, "Findings of the IWSLT 2022 Evaluation Campaign," Proc. IWSLT’22, pp. 98–157 (2022)
Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, and Yonatan Bisk, "HEAR 2021: Holistic Evaluation of Audio Representations," In NeurIPS 2021 Competitions and Demonstrations Track (pp. 125--145). PMLR (2021)
Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, and Wangyou Zhang, "The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans," Proc. 2021 IEEE Data Science and Learning Workshop (DSLW 2021), pp. 1--6 (2021)
Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, and Neville Ryant, "CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings," Proc. CHiME 2020, pp. 1--7 (2020)
Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, and Tomohiro Nakatani, "Far-Field Automatic Speech Recognition," Proceedings of the IEEE, vol. 109, no. 2, pp. 124--148 (2020)
Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, and Mehrez Souden, "Speech Processing for Digital Home Assistants," IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 111--124 (2019)
Takahiro Shinozaki and Shinji Watanabe, "Automatic speech recognition and black-box optimization," The Journal of the Acoustical Society of Japan, vol. 72, number 10, pp. 644--652, (2016). (in Japanese)
Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, and Bjoern Schuller, "Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR," Proc. Latent Variable Analysis and Signal Separation, pp. 91--99 (2015)
Shinji Watanabe and Atsushi Nakamura, "Bayesian Approaches to Acoustic Modeling: A Review," APSIPA Transactions on Signal and Information Processing, Volume 1, e5 (11pages), (2012)
Mark Gales, Shinji Watanabe, Eric Fosler-Lussier, “Structured Discriminative Models for Speech Recognition,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 70--82 (2012).
Shinji Watanabe and Atsushi Nakamura, "Tutorial: Discriminative Training in Speech Recognition," (in Japanese) The Journal of the Institute of Electronics, Information and Communication Engineers (IEICE), vol. 94(10), pp. 920--922, (2011)
Shinji Watanabe, "Acoustic models in speech recognition," The Journal of the Acoustical Society of Japan, vol. 66, number 1, pp. 18--22, (2010). (in Japanese)
Atsushi Nakamura, Shinji Watanabe, Takaaki Hori, Erik McDermott, and Shigeru Katagiri, "Advanced Computational Models and Learning Theories for Spoken Language Processing," IEEE Computational Intelligence Magazine, vol. 1, issue 2, pp. 5--9, (2006)
Shinji Watanabe, "Speech recognition based on a Bayesian approach," The Journal of the Acoustical Society of Japan, vol. 62, number 8, pp. 599--604, (2006). (in Japanese)
Journal (refereed)
Shih-Lun Wu, Chris Donahue, Shinji Watanabe, and Nicholas J. Bryan, "Music ControlNet: Multiple Time-varying Controls for Music Generation," IEEE Transactions on Audio, Speech and Language Processing (2024)
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis," IEEE Transactions on Audio, Speech and Language Processing (2024)
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe, "TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation," IEEE Transactions on Audio, Speech and Language Processing (2023)
Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, and Yu Tsao, "Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information," IEEE Transactions on Audio, Speech and Language Processing (2023)
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, and Abdelrahman Mohamed, "LegoNN: Building Modular Encoder-Decoder Models," IEEE Transactions on Audio, Speech and Language Processing (2023)
Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, and Yohei Kawaguchi, "Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors," IEEE Transactions on Audio, Speech and Language Processing (2023)
Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur, "A Dilemma of Ground Truth in Noisy Speech Separation and an Approach to Lessen the Impact of Imperfect Training Data," Computer Speech & Language, vol. 77, pages 101410 (2023)
Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, and Jonathan Le Roux, "STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency," IEEE Transactions on Audio, Speech and Language Processing, vol. 31, pp. 397--410 (2022).
Wangyou Zhang, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani, Shinji Watanabe, and Yanmin Qian, "End-to-End Dereverberation, Beamforming, and Speech Recognition in A Cocktail Party," IEEE Transactions on Audio, Speech and Language Processing, vol. 30, pp. 3173-3188 (2022)
Zhong-Qiu Wang and Shinji Watanabe, "Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction," IEEE Signal Processing Letters, vol. 29, pp. 1422-1426 (2022)
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, and Paola Garcia, "Encoder-Decoder Based Attractors for End-to-End Neural Diarization," IEEE Transactions on Audio, Speech and Language Processing, vol. 30, pp. 1493-1507 (2022)
Jing Shi, Xuankai Chang, Shinji Watanabe, and Bo Xu, "Train from scratch: single-stage joint training of speech separation and recognition," Computer Speech & Language, vol. 76, pages 101387 (2022)
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu, "Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition," Computer Speech & Language, vol. 75, pages 101360 (2022)
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu, "An Investigation of Neural Uncertainty Estimation for Target Speaker Extraction Equipped RNN Transducer," Computer Speech & Language, vol. 73, pages 101327 (2022)
Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, and Shrikanth Narayanan, "A Review of Speaker Diarization: Recent Advances with Deep Learning," Computer Speech & Language, vol. 72, pages 101317 (2022)
Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, and Sanjeev Khudanpur, "Joint Speaker Diarization and Speech Recognition Based on Region Proposal Networks," Computer Speech & Language, vol. 72, pages 101316 (2022)
Amir Hussein, Shinji Watanabe, and Ahmed Ali, "Arabic Speech Recognition by End-to-End, Modular Systems and Human," Computer Speech & Language, vol. 71, pages 101272 (2022)
Nanxin Chen, Shinji Watanabe, Jesus Villalba, Piotr Zelasko, and Najim Dehak, "Non-Autoregressive Transformer for Speech Recognition," IEEE Signal Processing Letters, vol. 28, pp. 121--125 (2020)
Wangyou Zhang, Xuankai Chang, Yanmin Qian, and Shinji Watanabe, "Improving End-to-End Single-Channel Multi-Talker Speech Recognition," IEEE Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1385--1394 (2020)
Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, and Hynek Hermansky, "Multi-Stream End-to-End Speech Recognition," IEEE Transactions on Audio, Speech and Language Processing, vol. 28, pp. 646--655 (2020)
Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, and John R. Hershey, "Phasebook and Friends: Leveraging discrete representations for source separation," IEEE Journal of Selected Topics in Signal Processing, pp. 370--382 (2019)
Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, and Kevin Duh, "Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition," IEEE Transactions on Audio, Speech and Language Processing, vol. 27, no. 1, pp. 77--88 (2019)
Chungwei Lin, Tim K. Marks, Milutin Pajovic, Shinji Watanabe, and Chih-kuan Tung, "Model parameter learning using Kullback–Leibler divergence," Physica A, vol. 491, pp. 549-559 (2018)
Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey, and Xiong Xiao, "A Unified Architecture for Multichannel End-to-End Speech Recognition with Neural Beamforming," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1274--1288 (2017)
Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, and Tomoki Hayashi, "Hybrid CTC/attention architecture for end-to-end speech recognition," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1240--1253 (2017)
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "Duration-Controlled LSTM for Polyphonic Sound Event Detection," IEEE Transactions on Audio, Speech and Language Processing, vol. 25, no. 11, pp. 2059--2070 (2017)
Takaaki Hori, Zhuo Chen, Hakan Erdogan, John Hershey, Jonathan Le Roux, Vikramjit Mitra, and Shinji Watanabe, "Multi-Microphone Speech Recognition Integrating Beamforming, Robust Feature Extraction, and Advanced DNN/RNN Backend," Computer Speech & Language, pp. 401--418 (2017)
Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe, "The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes," Computer Speech & Language, pp. 605--626 (2017)
Emmanuel Vincent, Shinji Watanabe, Aditya Nugraha, Jon Barker, Ricard Marxer, "An analysis of environment, microphone and data simulation mismatches in robust speech recognition," Computer Speech & Language, pp. 535--557 (2017) Best Review Paper Award
Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, and Tetsunori Kobayashi, "A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large scale data," APSIPA Transactions on Signal and Information Processing, vol. 4, e16 (2015).
Yuuki Tachioka, Tomohiro Narita, and Shinji Watanabe, "Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments," EURASIP Journal on Advances in Signal Processing, vol. 2015:52 (2015).
Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, Nobuaki Minematsu, and Keikichi Hirose, "Feature Enhancement with Joint Use of Consecutive Corrupted and Noise Feature Vectors with Discriminative Region Weighting," IEEE Transactions on Audio, Speech & Language Processing, vol. 21, no. 10, pp. 2172--2181 (2013).
Shinji Watanabe, Atsushi Nakamura, and Biing-Hwang (Fred) Juang, "Structural Bayesian linear regression for hidden Markov models," Journal of Signal Processing Systems, pp. 1--18 (2013).
Seong-Jun Hahm, Shinji Watanabe, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, and Atsushi Nakamura, "Prior-Shared Feature and Model Space Speaker Adaptation by Consistently Employing MAP Estimation," Speech Communication, vol. 55, Issue 3, pp. 415--431 (2013).
Tomoharu Iwata and Shinji Watanabe, "Influence Relation Estimation based on Lexical Entrainment in Conversation," Speech Communication, vol. 55, issue 2, pp. 329--339 (2013)
Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Atsunori Ogawa, Takaaki Hori, Shinji Watanabe, Masakiyo Fujimoto, Takuya Yoshioka, Takanobu Oba, Yotaro Kubo, Mehrez Souden, Seong-Jun Hahm, and Atsushi Nakamura, "Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds," Computer Speech & Language, vol. 27, Issue 3, pp. 851--873 (2013).
Marc Delcroix, Shinji Watanabe, Tomohiro Nakatani, and Atsushi Nakamura, "Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer," Computer Speech & Language, vol. 27, Issue 1, pp. 350--368 (2013)
Yotaro Kubo, Shinji Watanabe, Takaaki Hori, and Atsushi Nakamura, "Structural Classification Methods based on Weighted Finite-State Transducers for Automatic Speech Recognition," IEEE Transactions on Audio, Speech & Language Processing, vol. 20, issue 8, pp. 2240--2251 (2012).
Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, and Nobuaki Minematsu, "Statistical Voice Conversion Based on Noisy Channel Model," IEEE Transactions on Audio, Speech & Language Processing, vol. 20, issue 6, pp. 1784--1794 (2012). ASJ Itakura Award
Takuya Maekawa and Shinji Watanabe, “Training Data Selection with User's Physical Characteristics Data for Acceleration-based Activity Modeling,” Personal and Ubiquitous Computing, vol. 17, issue 3, pp. 451--463 (2013).
Masakiyo Fujimoto, Shinji Watanabe, and Tomohiro Nakatani, "Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection," Speech Communication, vol. 54(2), pp. 229--244 (2012)
Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, and Junji Yamato, "Low-latency Real-time Meeting Recognition and Understanding Using Distant Microphones and Omni-directional Camera," IEEE Transactions on Audio, Speech & Language Processing, vol. 20, issue 2, pp. 499--513 (2012).
Hideyuki Watanabe, Shigeru Katagiri, Kouta Yamada, Erik McDermott, Atsushi Nakamura, Shinji Watanabe, and Miho Ohsaki, "Minimum Classification Error Training Using Geometric-Margin-Based Misclassification Measure," (in Japanese) IEICE Transactions on Information and Systems, Vol.J94-D No.10 pp.1664--1675 (2011)
Hideyuki Watanabe, Shin'ichi Taniguchi, Shigeru Katagiri, Kouta Yamada, Atsushi Nakamura, Erik McDermott, Shinji Watanabe, and Miho Ohsaki, "Incremental Minimum Classification Error Training for Pattern Recognition," (in Japanese) IEICE Transactions on Information and Systems vol. J94-D, no. 4, pp. 702--711, (2011)
Shinji Watanabe, Tomoharu Iwata, Takaaki Hori, Atsushi Sako, and Yasuo Ariki, "Topic Tracking Language Model for Speech Recognition," Computer Speech & Language, vol. 25, issue 2, pp. 440--461, (2011)
Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Erik McDermott, and Tetsunori Kobayashi, "A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Recognition," IEEE Journal of Selected Topics in Signal Processing Volume 4, Issue 6, pages 974--984 (2010). IEEE Signal Processing Society Japan Chapter Student Paper Award
David Cournapeau, Shinji Watanabe, Atsushi Nakamura, and Tatsuya Kawahara, "Online Unsupervised Classification with Model Comparison in the Variational Bayes Framework for Voice Activity Detection," IEEE Journal of Selected Topics in Signal Processing, volume 4, issue 6, pp. 1071--1083 (2010)
Tomoharu Iwata, Shinji Watanabe, Takeshi Yamada and Naonori Ueda, "Topic Tracking Model for Purchase Behavior Analysis," (in Japanese) IEICE Transactions on Information and Systems, vol. J93-D, No. 6, pp. 978--987 (2010)
Kenta Nishiki, Yousuke Izumi, Shinji Watanabe, Takuya Nishimoto, Nobutaka Ono, and Shigeki Sagayama, "Stereo-input speech recognition using sparseness-based blind source separation," (in Japanese) IEICE Transactions on Information and Systems vol. J93-D, no. 3, pp. 303--311, (2010)
Shinji Watanabe and Atsushi Nakamura, "Predictor-Corrector Adaptation by using Time Evolution System with Macroscopic Time Scale," IEEE Transactions on Audio, Speech & Language Processing, vol. 18, issue 2, pp. 395--406 (2010)
Marc Delcroix, Tomohiro Nakatani, and Shinji Watanabe, "Static and dynamic variance compensation for recognition of reverberant speech with dereverberation pre-processing," IEEE Transactions on Audio, Speech & Language Processing, vol. 17, issue 2, pp. 324--334, (2009)
Shinji Watanabe and Atsushi Nakamura, "Speech recognition based on Student's t-distribution derived from total Bayesian framework," IEICE Transactions on Information and Systems, vol.E89-D, no. 3, pp. 970--980, (2006)
Shinji Watanabe, Atsushi Sako and Atsushi Nakamura, "Automatic Determination of Acoustic Model Topology using Variational Bayesian Estimation and Clustering for Large Vocabulary Continuous Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 14, issue 3, pp. 855--872, (2006). (received the TELECOM System Technology Award from the Telecommunications Advancement Foundation in 2006)
Shinji Watanabe and Atsushi Nakamura, "Acoustic Model Adaptation based on Coarse/Fine Training of Transfer Vectors," (in Japanese), Information Technology Letters (2004)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, "Variational Bayesian Estimation and Clustering for Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 12, pp. 365--381, (2004)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, "Selection of Shared-State Hidden Markov Model Structure Using Bayesian Criterion," IEICE D-II, vol. J86-D-II, no. 6, pp. 776--786, (2003), (received the Best Paper Award from IEICE. The English translation version is in IEICE Transactions on Information and Systems, vol.E88-D, no. 1, pp. 1--9, (2005))
Hisakazu Minakata and Shinji Watanabe, "Solar Neutrinos and Leptonic CP Violation," Phys. Lett. B, 468, p. 256, (1999).
International Conference and Workshop (refereed)
Darshan Prabhu, Yifan Peng, Preethi Jyothi, and Shinji Watanabe, "Multi-Convformer: Extending Conformer with Multiple Convolution Kernels," Proc. Interspeech'24 (accepted)
Hyung Yong Kim, Byeong-Yeol Kim, Yunkyu Lim, Jihwan Park, Shukjae Choi, Yooncheol Ju, Jinseok Park, Youshin Lim, Seung Woo Yu, Hanbin Lee, and Shinji Watanabe, "Self-training ASR Guided by Unsupervised ASR Teacher," Proc. Interspeech'24 (accepted)
Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, and Shinji Watanabe, "MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model," Proc. Interspeech'24 (accepted)
Jiatong Shi, Shi-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, and Shinji Watanabe, "ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets," Proc. Interspeech'24 (accepted)
Tejes Srivastava, Jiatong Shi, William Chen, and Shinji Watanabe, "EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios," Proc. Interspeech'24 (accepted)
Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, and Shinji Watanabe, "Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition," Proc. Interspeech'24 (accepted)
Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, and Shinji Watanabe, "On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models," Proc. Interspeech'24 (accepted)
Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, and Bhiksha Raj, "Towards Unified Evaluation of Continual Learning in Spoken Language Understanding," Proc. Interspeech'24 (accepted)
Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, and Qin Jin, "The Interspeech 2024 Challenge on Speech Processing Using Discrete Units," Proc. Interspeech'24 (accepted)
Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, and Hiroshi Saruwatari, "SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics," Proc. Interspeech'24 (accepted)
Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, and Joon Son Chung, "To what extent can ASV systems naturally defend against spoofing attacks?" Proc. Interspeech'24 (accepted)
Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Alex Gichamba, Barry-John Theobald, Ahmed Hussen Abdelaziz, and Shinji Watanabe, "ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models," Proc. Interspeech'24 (accepted)
Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, and Karen Livescu, "DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding," Proc. Interspeech'24 (accepted)
Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, and Yanmin Qian, "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement," Proc. Interspeech'24 (accepted)
Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe, "Contextualized End-to-End Automatic Speech Recognition with Intermediate Biasing Loss," Proc. Interspeech'24 (accepted)
Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, and Yanmin Qian, "URGENT Challenge: Universality, Robustness, and Generalizability for speech EnhancemeNT," Proc. Interspeech'24 (accepted)
Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, and Barry-John Theobald, "Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?" Proc. Interspeech'24 (accepted)
Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, and Shinji Watanabe, "OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer," Proc. Interspeech'24 (accepted)
Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe, "Self-Supervised Speech Representations are More Phonetic than Semantic," Proc. Interspeech'24 (accepted)
Yoshiaki Bando, Tomohiko Nakamura, and Shinji Watanabe, "Neural Blind Source Separation and Diarization for Distant Speech Recognition," Proc. Interspeech'24 (accepted)
Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe, "Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model," Proc. Interspeech'24 (accepted)
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe, "Decoder-only Architecture for Streaming End-to-end Speech Recognition," Proc. Interspeech'24 (accepted)
Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, and Shinji Watanabe, "Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting," Proc. Interspeech'24 (accepted)
Julius Richter, Yi-Chiao Wu, Steven Krenn, Alexander Richard, Simon Welker, Bunlong Lay, Shinji Watanabe, and Timo Gerkmann, "EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation," Proc. Interspeech'24 (accepted)
Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, and Shinji Watanabe, "Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing," Proc. Interspeech'24 (accepted)
Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Romney Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R Mortensen, and Lori Levin, "Wav2Gloss: Generating Interlinear Glossed Text from Speech," Proc. ACL'24 (accepted)
Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe, "OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification," Proc. ACL'24 (accepted)
Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, and Shinji Watanabe, "On the Evaluation of Speech Foundation Models for Spoken Language Understanding," Proc. Findings of ACL'24 (accepted)
Zhong-Qiu Wang, Anurag Kumar, and Shinji Watanabe, "Cross-Talk Reduction," Proc. IJCAI'24 (accepted)
Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, and Shinji Watanabe, "UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions," Proc. NAACL'24 (accepted)
Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-weon Jung, Shinji Watanabe, and Joon Son Chung, "VOXMM: RICH TRANSCRIPTION OF CONVERSATIONS IN THE WILD," Proc. ICASSP'24 (accepted)
Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, and Shinji Watanabe, "AUGSUMM: TOWARDS GENERALIZABLE SPEECH SUMMARIZATION USING SYNTHETIC LABELS FROM LARGE LANGUAGE MODELS," Proc. ICASSP'24 (accepted)
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, and Shinji Watanabe, "VOXTLM: UNIFIED DECODER-ONLY MODELS FOR CONSOLIDATING SPEECH RECOGNITION, SYNTHESIS AND SPEECH, TEXT CONTINUATION TASKS," Proc. ICASSP'24 (accepted)
Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, and Sanjeev Khudanpur, "SPEECH COLLAGE: CODE-SWITCHED AUDIO GENERATION BY COLLAGING MONOLINGUAL CORPORA," Proc. ICASSP'24 (accepted)
Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, and Sanjeev Khudanpur, "ENHANCING END-TO-END CONVERSATIONAL SPEECH TRANSLATION THROUGH TARGET LANGUAGE CONTEXT UTILIZATION," Proc. ICASSP'24 (accepted)
Salvador Medina, Sarah Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, and Iain Matthews, "PHISANET: PHONETICALLY INFORMED SPEECH ANIMATION NETWORK," Proc. ICASSP'24 (accepted)
Samuele Cornell, Jee-weon Jung, Shinji Watanabe, and Stefano Squartini, "ONE MODEL TO RULE THEM ALL? TOWARDS END-TO-END JOINT SPEAKER DIARIZATION AND SPEECH RECOGNITION," Proc. ICASSP'24 (accepted)
Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Shinji Watanabe, Daniel Povey, and Sanjeev Khudanpur, "LESS PEAKY AND MORE ACCURATE CTC FORCED ALIGNMENT BY PRUNED CTC LOSS AND LABEL PRIORS," Proc. ICASSP'24 (accepted)
Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, and Shinji Watanabe, "HUBERTOPIC: ENHANCING SEMANTIC REPRESENTATION OF HUBERT THROUGH SELF-SUPERVISION UTILIZING TOPIC MODEL," Proc. ICASSP'24 (accepted)
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang, "EXPLORING SPEECH RECOGNITION, TRANSLATION, AND UNDERSTANDING WITH DISCRETE SPEECH UNITS: A COMPARATIVE STUDY," Proc. ICASSP'24 (accepted)
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chun-Yi Kuan, Chi-Yuan Hsiao, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, and Hung-yi Lee, "DYNAMIC-SUPERB: TOWARDS A DYNAMIC, COLLABORATIVE, AND COMPREHENSIVE INSTRUCTION-TUNING BENCHMARK FOR SPEECH," Proc. ICASSP'24 (accepted)
Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, and Shinji Watanabe, "CROSS-MODAL MULTI-TASKING FOR SPEECH-TO-TEXT TRANSLATION VIA HARD PARAMETER SHARING," Proc. ICASSP'24 (accepted)
Siddhant Arora, George Saon, Shinji Watanabe, and Brian Kingsbury, "SEMI-AUTOREGRESSIVE STREAMING ASR WITH LABEL CONTEXT," Proc. ICASSP'24 (accepted)
Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, and Karen Livescu, "GENERATIVE CONTEXT-AWARE FINE-TUNING OF SELF-SUPERVISED SPEECH MODELS," Proc. ICASSP'24 (accepted)
Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, and Shinji Watanabe, "CONTEXTUALIZED AUTOMATIC SPEECH RECOGNITION WITH ATTENTION-BASED BIAS PHRASE BOOSTED BEAM SEARCH," Proc. ICASSP'24 (accepted)
William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe, "Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing," Proc. ICASSP'24 (accepted)
Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, and Shinji Watanabe, "PHONEME-AWARE ENCODING FOR PREFIX-TREE-BASED CONTEXTUAL ASR," Proc. ICASSP'24 (accepted)
Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, and Shinji Watanabe, "BOOSTING UNKNOWN-NUMBER SPEAKER SEPARATION WITH TRANSFORMER DECODER-BASED ATTRACTOR," Proc. ICASSP'24 (accepted)
Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, and Yong Man Ro, "VISUAL SPEECH RECOGNITION FOR LOW-RESOURCE LANGUAGES WITH AUTOMATIC LABELS FROM WHISPER MODEL," Proc. ICASSP'24 (accepted)
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro, "TOWARDS PRACTICAL AND EFFICIENT IMAGE-TO-SPEECH CAPTIONING WITH VISION-LANGUAGE PRE-TRAINING AND MULTI-MODAL TOKENS," Proc. ICASSP'24 (accepted)
Kwanghee Choi, Jee-weon Jung, and Shinji Watanabe, "UNDERSTANDING PROBE BEHAVIORS THROUGH VARIATIONAL BOUNDS OF MUTUAL INFORMATION," Proc. ICASSP'24 (accepted)
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, and Shinji Watanabe, "IMPROVING AUDIO CAPTIONING MODELS WITH FINE-GRAINED AUDIO FEATURES, TEXT EMBEDDING SUPERVISION, AND LLM MIX-UP AUGMENTATION," Proc. ICASSP'24 (accepted)
Yusuke Shinohara and Shinji Watanabe, "Domain Adaptation by Data Distribution Matching via Submodularity for Speech Recognition," Proc. ASRU'23 (accepted)
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, and Shinji Watanabe, "Summarize while Translating: Universal Model with Parallel Decoding for Summarization and Translation," Proc. ASRU'23 (accepted)
Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, and Shinji Watanabe, "YODAS: Youtube-Oriented Dataset for Audio and Speech," Proc. ASRU'23 (accepted)
Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, and Tetsuji Ogawa, "A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction," Proc. ASRU'23 (accepted)
Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, and Yumeng Tao, "TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch," Proc. ASRU'23 (accepted)
Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, and Yanmin Qian, "Toward Universal Speech Enhancement For Diverse Input Conditions," Proc. ASRU'23 (accepted)
Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei Ping Huang, En Pei Hu, Ho-Lam Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe, "Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond," Proc. ASRU'23 (accepted)
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, and Shinji Watanabe, "Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning," Proc. ASRU'23 (accepted)
Masao Someki, Nicholas Eng, Yosuke Higuchi, and Shinji Watanabe, "Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference," Proc. ASRU'23 (accepted)
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, and Shinji Watanabe, "Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data," Proc. ASRU'23 (accepted)
Roshan Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Atsunori Ogawa, Siddhant Arora, Marc Delcroix, Rita Singh, Shinji Watanabe, and Bhiksha Raj, "ESPNet-SUMM: Introducing a novel large dataset, toolkit, and a cross-corpora evaluation of speech summarization systems," Proc. ASRU'23 (accepted)
Yuya Fujita, Shinji Watanabe, Xuankai Chang, and Takashi Maekaku, "LV-CTC: Non-autoregressive ASR with CTC and latent variable models," Proc. ASRU'23 (accepted)
Zhong-Qiu Wang and Shinji Watanabe, "UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures," Proc. NeurIPS'23 (accepted)
Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, and Shinji Watanabe, "Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation," Proc. WASPAA'23 (accepted)
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, and Ondřej Bojar, "Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency," Proc. Interspeech'23 (accepted)
Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan Black, Louis Goldstein, Shinji Watanabe, and Gopala Krishna Anumanchipalli, "Deep Speech Synthesis from MRI-Based Articulatory Representations," Proc. Interspeech'23 (accepted)
Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, and Brian MacWhinney, "A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning," Proc. Interspeech'23 (accepted)
Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, and Shinji Watanabe, "Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning," Proc. Interspeech'23 (accepted)
Puyuan Peng, Brian Yan, Shinji Watanabe, and David Harwath, "Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization," Proc. Interspeech'23 (accepted)
Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, and Shinji Watanabe, "Integrating Pretrained ASR and LM to perform Sequence Generation for Spoken Language Understanding," Proc. Interspeech'23 (accepted)
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, and Shinji Watanabe, "Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder--decoder Speech Recognition," Proc. Interspeech'23 (accepted)
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, and Shinji Watanabe, "Bayes Risk Transducer: Transducer with Controllable Alignment Prediction," Proc. Interspeech'23 (accepted)
Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, and Shinji Watanabe, "Exploration on HuBERT with Multiple Resolution," Proc. Interspeech'23 (accepted)
Yui Sudo, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe, "Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training," Proc. Interspeech'23 (accepted)
Jiatong Shi, Dan Berrebbi, William Chen, En Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, and Shinji Watanabe, "ML-SUPERB: Multilingual Speech Universal PERformance Benchmark," Proc. Interspeech'23 (accepted)
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe, "Tensor Decomposition for Minimization of E2E SLU Model Toward On-Device Processing," Proc. Interspeech'23 (accepted)
Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, and Shinji Watanabe, "4D: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders," Proc. Interspeech'23 (accepted)
Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe, "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models," Proc. Interspeech'23 (accepted)
Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, and Shinji Watanabe, "A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks," Proc. Interspeech'23 (accepted)
William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, and Shinji Watanabe, "Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute," Proc. Interspeech'23 (accepted)
Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, and Bhiksha Raj, "BASS: Block-wise Adaptation for Speech Summarization," Proc. Interspeech'23 (accepted)
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe, "ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit," Proc. ACL'23 (demo paper) (accepted)
Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, and Juan Pino, "UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units," Proc. ACL'23 (accepted)
Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S Sharma, Wei-Lun Wu, Hung-yi Lee, Karen Livescu, and Shinji Watanabe, "SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks," Proc. ACL'23 (accepted)
Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, and Boris Ginsburg, "Efficient Sequence Transduction by Jointly Predicting Tokens and Durations," Proc. ICML'23 (accepted)
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining," Proc. IJCAI'23 (accepted)
Yifan Peng, Jaesong Lee, and Shinji Watanabe, "I3D: Transformer architectures with input-dependent dynamic depth for speech recognition," Proc. ICASSP'23 (accepted)
Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, and Shinji Watanabe, "FINDADAPTNET: Find and Insert Adapters by Learned Layer Importance," Proc. ICASSP'23 (accepted)
Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, and Cong Liu, "The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition," Proc. ICASSP'23 (accepted)
Takashi Maekaku, Yuya Fujita, Xuankai Chang, and Shinji Watanabe, "Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model," Proc. ICASSP'23 (accepted)
Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, and Shinji Watanabe, "Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding," Proc. ICASSP'23 (accepted)
Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, and Gopala K. Anumanchipalli, "Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization," Proc. ICASSP'23 (accepted)
Dan Berrebbi, Brian Yan, and Shinji Watanabe, "Avoid Overthinking in Self-Supervised Models for Speech Recognition," Proc. ICASSP'23 (accepted)
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe, "Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling," Proc. ICASSP'23 (accepted)
Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, and Sanjeev Khudanpur, "EURO: ESPnet Unsupervised ASR Open-Source Toolkit," Proc. ICASSP'23 (accepted)
Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, and Shinji Watanabe, "TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation," Proc. ICASSP'23 (accepted)
Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, and Shinji Watanabe, "Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History," Proc. ICASSP'23 (accepted)
Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, and Gopala K. Anumanchipalli, "Speaker-Independent Acoustic-to-Articulatory Speech Inversion," Proc. ICASSP'23 (accepted)
Soumi Maiti, Yifan Peng, Takaaki Saeki, and Shinji Watanabe, "SpeechLMScore: Evaluating Speech Generation Using Speech Language Model," Proc. ICASSP'23 (accepted)
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe, "Improving Massively Multilingual ASR With Auxiliary CTC Objectives," Proc. ICASSP'23 (accepted)
Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, and Shinji Watanabe, "Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation," Proc. ICASSP'23 (accepted)
Brian Yan, Matthew Wiesner, Ondřej Klejch, Preethi Jyothi, and Shinji Watanabe, "Towards Zero-Shot Code-Switched Speech Recognition," Proc. ICASSP'23 (accepted)
Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, and Shinji Watanabe, "Streaming Joint Speech Recognition and Disfluency Detection," Proc. ICASSP'23 (accepted)
Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, and Shinji Watanabe, "Enhancing Speech-To-Speech Translation with Multiple TTS Targets," Proc. ICASSP'23 (accepted)
Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, and Shinji Watanabe, "Context-Aware Fine-Tuning of Self-Supervised Speech Models," Proc. ICASSP'23 (accepted)
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, and Shinji Watanabe, "Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders," Proc. ICASSP'23 (accepted)
Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj, "TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement," Proc. ICASSP'23 (accepted)
Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, and Hung-yi Lee, "Bridging Speech and Text Pre-trained Models with Unsupervised ASR," Proc. ICASSP'23 (accepted)
Li-Wei Chen, Shinji Watanabe, and Alexander Rudnicky, "A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units," Proc. ICASSP'23 (accepted)
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe, "InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss," Proc. ICASSP'23 (accepted)
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe, "BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder," Proc. ICASSP'23 (accepted)
Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J. Han, Ryan McDonald, Kilian Q. Weinberger, and Yoav Artzi, "Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages," Proc. ICASSP'23 (accepted)
Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, and Joon Son Chung, "In search of strong embedding extractors for speaker diarisation," Proc. ICASSP'23 (accepted)
Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj, "PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement," Proc. ICASSP'23 (accepted)
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, and Boris Ginsburg, "Multi-blank Transducers for Speech Recognition," Proc. ICASSP'23 (accepted)
Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, and Shinji Watanabe, "CTC Alignments Improve Autoregressive Translation," Proc. EACL'23, pp. 1623--1639 (2023)
Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, and Shinji Watanabe, "BAYES RISK CTC: CONTROLLABLE CTC ALIGNMENT IN SEQUENCE-TO-SEQUENCE TASKS," Proc. ICLR'23 (accepted)
Li-Wei Chen, Alexander Rudnicky, and Shinji Watanabe, "A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech," Proc. AAAI'23 (accepted)
Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe, "BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model," Proc. Findings of EMNLP'22, pp. 5486--5503 (2022)
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, and Shinji Watanabe, "Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models," Proc. Findings of EMNLP'22, pp. 5448--5458 (2022)
Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, and Yong Xu, "EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers," Proc. SLT'22, pp. 480--487 (2022)
Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdel-rahman Mohamed, Shang-Wen Li, and Hung-yi Lee, "SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning," Proc. SLT'22, pp. 1096--1103 (2022)
Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, and Shinji Watanabe, "E-Branchformer: Branchformer with Enhanced merging for speech recognition," Proc. SLT'22, pp. 84--91 (2022)
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, and Shinji Watanabe, "A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding," Proc. SLT'22, pp. 406--413 (2022)
Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-yi Lee, and Hao Tang, "On Compressing Sequences for Self-Supervised Speech Models," Proc. SLT'22, pp. 1128--1135 (2022)
Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, and Nobutaka Ono, "End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation," Proc. SLT'22, pp. 260--265 (2022)
Shota Horiguchi, Yuki Takashima, Shinji Watanabe, and Paola Garcia, "Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization," Proc. SLT'22, pp. 620--625 (2022)
Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, and Yanmin Qian, "End-to-End Multi-speaker ASR with Independent Vector Analysis," Proc. SLT'22, pp. 496--501 (2022)
Shukjae Choi, Younglo Lee, Jihwan Park, Hyung Yong Kim, Byeong-Yeol Kim, Zhong-Qiu Wang, and Shinji Watanabe, "An Empirical Study of Training Mixture Generation Strategies on Speech Separation: Dynamic Mixing and Augmentation," Proc. APSIPA ASC'22, pp. 1070--1075 (2022)
Masao Someki, Yosuke Higuchi, Tomoki Hayashi, and Shinji Watanabe, "ESPnet-ONNX: Bridging a Gap Between Research and Production," Proc. APSIPA ASC'22, pp. 420--427 (2022)
Jiatong Shi, George Saon, David Haws, Shinji Watanabe and Brian Kingsbury, "VQ-T: RNN Transducers using Vector-Quantized Prediction Network States," Proc. Interspeech'22, pp. 1656--1660 (2022)
Jaesong Lee, Lukas Lee and Shinji Watanabe, "Memory-Efficient Training of RNN-Transducer with Sampled Softmax," Proc. Interspeech'22, pp. 4441--4445 (2022)
Keqi Deng, Shinji Watanabe, Jiatong Shi and Siddhant Arora, "Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation," Proc. Interspeech'22, pp. 1746--1750 (2022)
Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe and Qin Jin, "SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy," Proc. Interspeech'22, pp. 4272--4276 (2022)
Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe and Qin Jin, "Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis," Proc. Interspeech'22, pp. 4277--4281 (2022)
Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin and Jia Pan, "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis," Proc. Interspeech'22, pp. 1766--1770 (2022)
Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong and Jian-Qing Gao, "Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis," Proc. Interspeech'22, pp. 1111--1115 (2022)
Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black and Shinji Watanabe, "ASR2K: Speech Recognition for Around 2000 Languages without Audio," Proc. Interspeech'22, pp. 4885--4889 (2022)
Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian and Shinji Watanabe, "ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding," Proc. Interspeech'22, pp. 5458--5462 (2022)
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W Black and Shinji Watanabe, "Two-Pass Low Latency End-to-End Spoken Language Understanding," Proc. Interspeech'22, pp. 3478--3482 (2022)
Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black and Gopala Krishna Anumanchipalli, "Deep Speech Synthesis from Articulatory Representations," Proc. Interspeech'22, pp. 779--783 (2022)
Yusuke Shinohara and Shinji Watanabe, "Minimum latency training of sequence transducers for streaming end-to-end speech recognition," Proc. Interspeech'22, pp. 2098--2102 (2022)
Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi and Shinji Watanabe, "Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection," Proc. Interspeech'22, pp. 4641--4645 (2022)
Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe and Yusuke Kida, "Better Intermediates Improve CTC Inference," Proc. Interspeech'22, pp. 4965--4969 (2022)
Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera and Yohei Kawaguchi, "Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models," Proc. Interspeech'22, pp. 2218--2222 (2022)
Takashi Maekaku, Yuya Fujita, Yifan Peng and Shinji Watanabe, "Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR," Proc. Interspeech'22, pp. 1071--1075 (2022)
Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty and Shinji Watanabe, "Residual Language Model for End-to-end Speech Recognition," Proc. Interspeech'22, pp. 3899--3903 (2022)
Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen and Shinji Watanabe, "When Is TTS Augmentation Through a Pivot Language Useful?," Proc. Interspeech'22, pp. 3538--3542 (2022)
Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti and Shinji Watanabe, "TriniTTS: Pitch-controllable End-to-end TTS without External Aligner," Proc. Interspeech'22, pp. 16--20 (2022)
Muqiao Yang, Ian Lane and Shinji Watanabe, "Online Continual Learning of End-to-End Speech Recognition Models," Proc. Interspeech'22, pp. 2668--2672 (2022)
Xuankai Chang, Takashi Maekaku, Yuya Fujita and Shinji Watanabe, "End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation," Proc. Interspeech'22, pp. 3819--3823 (2022)
Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan Amith and Shinji Watanabe, "Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation," Proc. Interspeech'22, pp. 3533--3537 (2022)
Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe and Bhiksha Raj, "Improving Speech Enhancement through Fine-Grained Speech Characteristics," Proc. Interspeech'22, pp. 2953--2957 (2022)
Yifan Peng, Siddharth Dalmia, Ian Lane, and Shinji Watanabe, "Branchformer: Parallel MLP-Attention Architectures for Speech Recognition and Understanding," Proc. ICML'22, pp. 17627--17643 (2022)
Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig and Shinji Watanabe, "CMU's IWSLT 2022 Dialect Speech Translation System," Proc. IWSLT'22, pp. 298--307 (2022)
Xinjian Li, Florian Metze, David R. Mortensen, Alan W Black, and Shinji Watanabe, "Phone Inventories and Recognition for Every Language," Proc. LREC'22, pp. 1061--1067 (2022)
Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee, "SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities," Proc. ACL'22, pp. 8479--8492 (2022)
Xinjian Li, Florian Metze, David R Mortensen, Shinji Watanabe, and Alan Black, "Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble," Proc. Findings of ACL'22, pp. 2106--2115 (2022)
Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, and Shinji Watanabe, "TOWARDS LOW-DISTORTION MULTI-CHANNEL SPEECH ENHANCEMENT: THE ESPNET-SE SUBMISSION TO THE L3DAS22 CHALLENGE," Proc. ICASSP'22, pp. 9201--9205 (2022) Ranked 1st place at the L3DAS22 challenge task 1
Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Di-Yuan Liu, Bao-Cai Yin, Jia Pan, Jian-Qing Gao, and Cong Liu, "THE FIRST MULTIMODAL INFORMATION BASED SPEECH PROCESSING (MISP) CHALLENGE: DATA, TASKS, BASELINES AND RESULTS," Proc. ICASSP'22, pp. 9266--9270 (2022)
Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Tianzi Wang, "NON-AUTOREGRESSIVE END-TO-END AUTOMATIC SPEECH RECOGNITION INCORPORATING DOWNSTREAM NATURAL LANGUAGE PROCESSING," Proc. ICASSP'22, pp. 6772--6776 (2022)
Takashi Maekaku, Xuankai Chang, Yuya Fujita, and Shinji Watanabe, "AN EXPLORATION OF HUBERT WITH LARGE NUMBER OF CLUSTER UNITS AND MODEL ASSESSMENT USING BAYESIAN INFORMATION CRITERION," Proc. ICASSP'22, pp. 7107--7111 (2022)
Zili Huang, Shinji Watanabe, Shu-wen Yang, Paola Garcia, and Sanjeev Khudanpur, "INVESTIGATING SELF-SUPERVISED LEARNING FOR SPEECH ENHANCEMENT AND SEPARATION," Proc. ICASSP'22, pp. 6837--6841 (2022)
Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, and Yu Tsao, "CONDITIONAL DIFFUSION PROBABILISTIC MODEL FOR SPEECH ENHANCEMENT," Proc. ICASSP'22, pp. 7402--7406 (2022)
Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, and Pengyuan Zhang, "IMPROVING NON-AUTOREGRESSIVE END-TO-END SPEECH RECOGNITION WITH PRE-TRAINED ACOUSTIC AND LANGUAGE MODELS," Proc. ICASSP'22, pp. 8522--8526 (2022)
Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, and Shinji Watanabe, "SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION," Proc. ICASSP'22, pp. 7872--7876 (2022)
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe, "Integrating multiple ASR systems into NLP backend with attention fusion," Proc. ICASSP'22, pp. 6237--6241 (2022)
Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, and Shinji Watanabe, "ESPNET-SLU: ADVANCING SPOKEN LANGUAGE UNDERSTANDING THROUGH ESPNET," Proc. ICASSP'22, pp. 7167--7171 (2022)
Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, and Dong Yu, "JOINT MODELING OF CODE-SWITCHED AND MONOLINGUAL ASR VIA CONDITIONAL FACTORIZATION," Proc. ICASSP'22, pp. 6412--6416 (2022)
Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, and Jonathan Le Roux, "EXTENDED GRAPH TEMPORAL CLASSIFICATION FOR MULTI-SPEAKER END-TO-END ASR," Proc. ICASSP'22, pp. 7322--7326 (2022)
Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, "Sequence Transduction with Graph-based Supervision," Proc. ICASSP'22, pp. 7212--7216 (2022)
Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, and Shinji Watanabe, "RUN-AND-BACK STITCH SEARCH: NOVEL BLOCK SYNCHRONOUS DECODING FOR STREAMING ENCODER-DECODER ASR," Proc. ICASSP'22, pp. 8287--8291 (2022)
Wen-Chin Huang, Shu-wen Yang, Tomoki Hayashi, Hung-yi Lee, Shinji Watanabe, and Tomoki Toda, "S3PRL-VC: OPEN-SOURCE VOICE CONVERSION FRAMEWORK WITH SELF-SUPERVISED SPEECH REPRESENTATIONS," Proc. ICASSP'22, pp. 6552--6556 (2022)
Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, and Shinji Watanabe, "JOINT SPEECH RECOGNITION AND AUDIO CAPTIONING," Proc. ICASSP'22, pp. 7892--7896 (2022)
Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, and Yohei Kawaguchi, "MULTI-CHANNEL END-TO-END NEURAL DIARIZATION WITH DISTRIBUTED MICROPHONES," Proc. ICASSP'22, pp. 7332--7336 (2022)
Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, and Vincent Quenneville-Bélair, "TORCHAUDIO: BUILDING BLOCKS FOR AUDIO AND SPEECH PROCESSING," Proc. ICASSP'22, pp. 6982--6986 (2022)
Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, and Tomoki Toda, "ON PROSODY MODELING FOR ASR+TTS BASED VOICE CONVERSION," Proc. ASRU'21, pp. 642--649 (2021)
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, and Shinji Watanabe, "Attention-based Multi-hypothesis Fusion for Speech Summarization," Proc. ASRU'21, pp. 487--494 (2021)
Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, and Shinji Watanabe, "Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates," Proc. ASRU'21, pp. 922--929 (2021)
Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, and Yohei Kawaguchi, "TOWARDS NEURAL DIARIZATION FOR UNLIMITED NUMBERS OF SPEAKERS USING GLOBAL AND LOCAL ATTRACTORS," Proc. ASRU'21, pp. 98--105 (2021)
Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, and Shinji Watanabe, "A STUDY OF TRANSDUCER BASED END-TO-END ASR WITH ESPNET: ARCHITECTURE, AUXILIARY LOSS AND DECODING STRATEGIES," Proc. ASRU'21, pp. 16--23 (2021)
Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, and Shinji Watanabe, "A COMPARATIVE STUDY ON NON-AUTOREGRESSIVE MODELINGS FOR SPEECH-TO-TEXT GENERATION," Proc. ASRU'21, pp. 47--54 (2021)
Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, and Alan Black, "CROSS-LINGUAL TRANSFER FOR SPEECH PROCESSING USING ACOUSTIC LANGUAGE SIMILARITY," Proc. ASRU'21, pp. 1050--1057 (2021)
Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, and Shidong Shang, "CONFERENCINGSPEECH CHALLENGE: TOWARDS FAR-FIELD MULTI-CHANNEL SPEECH ENHANCEMENT FOR VIDEO CONFERENCING," Proc. ASRU'21, pp. 679--686 (2021)
Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, and Shinji Watanabe, "AN EXPLORATION OF SELF-SUPERVISED PRETRAINED REPRESENTATIONS FOR END-TO-END SPEECH RECOGNITION," Proc. ASRU'21, pp. 228--235 (2021)
Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda, "Self-Guided Curriculum Learning for Neural Machine Translation," Proc. IWSLT'21, pp. 206--214 (2021)
Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, and Shinji Watanabe, "ESPnet-ST IWSLT 2021 Offline Speech Translation System," Proc. IWSLT'21, pp. 100--109 (2021)
Yen-Ju Lu, Yu Tsao, and Shinji Watanabe, "A Study on Speech Enhancement based on Diffusion Probabilistic Model," Proc. APSIPA ASC'21, pp. 659--666 (2021)
Peter Wu, Paul Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, and Louis-Philippe Morency, "Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks," Proc. APSIPA ASC'21, pp. 841--848 (2021)
Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, and Yanmin Qian, "Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions," Proc. WASPAA'21, pp. 146--150 (2021)
Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan, "GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio," Proc. Interspeech'21, pp. 3670--3674 (2021)
Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki and Tomoki Hayashi, "Acoustic Event Detection with Classifier Chains," Proc. Interspeech'21, pp. 601--605 (2021)
Pengcheng Guo, Xuankai Chang, Shinji Watanabe and Lei Xie, "Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain," Proc. Interspeech'21, pp. 3720--3724 (2021)
Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Han and Shinji Watanabe, "Multi-mode Transformer Transducer with Stochastic Future Context," Proc. Interspeech'21, pp. 1827--1831 (2021)
Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze and Shinji Watanabe, "Differentiable Allophone Graphs for Language Universal Speech Recognition," Proc. Interspeech'21, pp. 2471--2475 (2021)
Matthew Maciejewski, Shinji Watanabe and Sanjeev Khudanpur, "Speaker Verification-Based Evaluation of Single-Channel Speech Separation," Proc. Interspeech'21, pp. 3520--3524 (2021)
Patrick O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael Shulman, Boris Ginsburg, Shinji Watanabe and Georg Kucsko, "SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition," Proc. Interspeech'21, pp. 1434--1438 (2021)
Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed and Hung-yi Lee, "SUPERB: Speech processing Universal PERformance Benchmark," Proc. Interspeech'21, pp. 1194--1198 (2021)
Suwon Shon, Pablo Brusco, Jing Pan, Kyu Han and Shinji Watanabe, "Leveraging Pre-trained Language Model for Speech Sentiment Analysis," Proc. Interspeech'21, pp. 3420--3424 (2021)
Tianzi Wang, Yuya Fujita, Xuankai Chang and Shinji Watanabe, "Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models," Proc. Interspeech'21, pp. 3755--3759 (2021)
Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe and Alan W Black, "Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding," Proc. Interspeech'21, pp. 1264--1268 (2021)
Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe and Alexander Rudnicky, "Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021," Proc. Interspeech'21, pp. 1564--1568 (2021)
Jaesong Lee, Jingu Kang and Shinji Watanabe, "Layer Pruning on Demand with Intermediate CTC," Proc. Interspeech'21, pp. 3745--3749 (2021)
Yuya Fujita, Tianzi Wang, Shinji Watanabe and Motoi Omachi, "Toward Streaming ASR with Non-autoregressive Insertion-based Model," Proc. Interspeech'21, pp. 3740--3744 (2021)
Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe and Jan “Honza” Černocký, "Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics," Proc. Interspeech'21, pp. 1464--1468 (2021)
Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi and Shinji Watanabe, "Data Augmentation Methods for End-to-end Speech Recognition on Distant-talk Scenarios," Proc. Interspeech'21, pp. 301--305 (2021)
Mao-Kui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen and Shinji Watanabe, "Target-Speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker," Proc. Interspeech'21, pp. 3555--3559 (2021)
Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola Garcia Perera and Kenji Nagamatsu, "Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers," Proc. Interspeech'21, pp. 3116--3120 (2021)
Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola Garcia Perera and Kenji Nagamatsu, "Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization," Proc. Interspeech'21, pp. 3096--3100 (2021)
Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John Hershey, Nima Mesgarani and Zhuo Chen, "Continuous speech separation using speaker inventory for long recording," Proc. Interspeech'21, pp. 3036--3040 (2021)
Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, and Shinji Watanabe, "Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation," Proc. AmericasNLP'21, pp. 53--63 (2021)
Hirofumi Inaguma, Tatsuya Kawahara, and Shinji Watanabe, "Bidirectional Source and Target Knowledge Distillation for End-to-end Speech Translation," Proc. NAACL'21, pp. 1872--1881 (2021)
Motoi Omachi, Yuya Fujita, Shinji Watanabe, and Matthew Wiesner, "End-to-end ASR to jointly predict transcriptions and linguistic annotations," Proc. NAACL'21, pp. 1861--1871 (2021)
Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, and Shinji Watanabe, "Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks," Proc. NAACL'21, pp. 1882--1896 (2021)
Jaesong Lee and Shinji Watanabe, "INTERMEDIATE LOSS REGULARIZATION FOR CTC-BASED SPEECH RECOGNITION," Proc. ICASSP'21, pp. 6224--6228 (2021)
Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, and Yanmin Qian, "Dual-Path Modeling for Long Recording Speech Separation in Meetings," Proc. ICASSP'21, pp. 5739--5743 (2021)
Murali Karthick Baskar, Lukas Burget, Shinji Watanabe, Ramon Astudillo, and Jan “Honza” Černocký, "EAT: ENHANCED ASR-TTS FOR SELF-SUPERVISED SPEECH RECOGNITION," Proc. ICASSP'21, pp. 6753--6757 (2021)
Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, and John Hershey, "END-TO-END DIARIZATION FOR VARIABLE NUMBER OF SPEAKERS WITH LOCAL-GLOBAL NETWORKS AND DISCRIMINATIVE SPEAKER EMBEDDINGS," Proc. ICASSP'21, pp. 7183--7187 (2021)
Matthew Maciejewski, Jing Shi, Shinji Watanabe, and Sanjeev Khudanpur, "TRAINING NOISY SINGLE-CHANNEL SPEECH SEPARATION WITH NOISY ORACLE SOURCES: A LARGE GAP AND A SMALL STEP," Proc. ICASSP'21, pp. 5774--5778 (2021)
Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, and Shinji Watanabe, "Non-autoregressive End-to-end Speech Translation with Dual-decoder," Proc. ICASSP'21, pp. 7503--7507 (2021)
Yosuke Kashiwagi, Emiru Tsunoo, and Shinji Watanabe, "Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition," Proc. ICASSP'21, pp. 6214--6218 (2021)
Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, and Dong Yu, "DIRECTIONAL ASR: A NEW PARADIGM FOR E2E MULTI-SPEAKER SPEECH RECOGNITION WITH SOURCE LOCALIZATION," Proc. ICASSP'21, pp. 8433--8437 (2021)
Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang, "RECENT DEVELOPMENTS ON ESPNET TOOLKIT BOOSTED BY CONFORMER," Proc. ICASSP'21, pp. 5874--5878 (2021)
Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi, "IMPROVED MASK-CTC FOR NON-AUTOREGRESSIVE END-TO-END ASR," Proc. ICASSP'21, pp. 8363--8367 (2021)
Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, and Yanmin Qian, "END-TO-END DEREVERBERATION, BEAMFORMING, AND SPEECH RECOGNITION WITH IMPROVED NUMERICAL STABILITY AND ADVANCED FRONTEND," Proc. ICASSP'21, pp. 6898--6902 (2021)
Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, and Dong Yu, "IMPROVING RNN TRANSDUCER WITH TARGET SPEAKER EXTRACTION AND NEURAL UNCERTAINTY ESTIMATION," Proc. ICASSP'21, pp. 6908--6912 (2021)
Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, and Kenji Nagamatsu, "END-TO-END SPEAKER DIARIZATION AS POST-PROCESSING," Proc. ICASSP'21, pp. 7188--7192 (2021)
Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe, "Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec," Proc. EACL'21, pp. 1134--1145 (2021)
Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, and John Hershey, "SEQUENTIAL MULTI-FRAME NEURAL BEAMFORMING FOR SPEECH SEPARATION AND ENHANCEMENT," Proc. SLT'21, pp. 905--911 (2021)
Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, and John R. Hershey, "INTEGRATION OF SPEECH SEPARATION, DIARIZATION, AND RECOGNITION FOR MULTI-SPEAKER MEETINGS: SYSTEM DESCRIPTION, COMPARISON, AND ANALYSIS," Proc. SLT'21, pp. 897--904 (2021)
Desh Raj, Leibny Paola Garcia Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, and Sanjeev Khudanpur, "DOVER-LAP: A METHOD FOR COMBINING OVERLAP-AWARE DIARIZATION OUTPUTS," Proc. SLT'21, pp. 881--888 (2021)
Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, and Shinji Watanabe, "ESPNET-SE: END-TO-END SPEECH ENHANCEMENT AND SEPARATION TOOLKIT DESIGNED FOR ASR INTEGRATION," Proc. SLT'21, pp. 785--792 (2021)
Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Boeddeker, Yanmin Qian, Shinji Watanabe, and Zhuo Chen, "DUAL-PATH RNN FOR LONG RECORDING SPEECH SEPARATION," Proc. SLT'21, pp. 865--872 (2021)
Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola Garcia, and Kenji Nagamatsu, "END-TO-END SPEAKER DIARIZATION CONDITIONED ON SPEECH ACTIVITY AND OVERLAP DETECTION," Proc. SLT'21, pp. 849--856 (2021)
Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola Garcia, and Kenji Nagamatsu, "ONLINE END-TO-END NEURAL DIARIZATION WITH SPEAKER-TRACING BUFFER," Proc. SLT'21, pp. 841--848 (2021)
Emiru Tsunoo, Yosuke Kashiwagi, and Shinji Watanabe, "STREAMING TRANSFORMER ASR WITH BLOCKWISE SYNCHRONOUS BEAM SEARCH," Proc. SLT'21, pp. 22--29 (2021)
Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung, "Augmentation adversarial training for self-supervised speaker recognition," Proc. Self-Supervised Learning for Speech and Audio Processing Workshop @ NeurIPS 2020 (NeurIPS SAS 2020) (2020)
Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe and Tomoki Toda, "The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS," Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, pp. 160--164 (2020)
Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, and Lei Xie, "Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals," Proc. NeurIPS'20, pp. 3735--3747 (2020)
Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda, "Conformer-based sound event detection with semi-supervised learning and data augmentation," Proc. DCASE'20 Workshop (2020), ranked 1st place at the DCASE 2020 challenge task 4
Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe, Yuya Fujita and Motoi Omachi, "End-to-End ASR with Adaptive Span Self-Attention," Proc. Interspeech'20, pp. 3595--3599 (2020)
Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe and Najim Dehak, "Learning Speaker Embedding from Text-to-Speech," Proc. Interspeech'20, pp. 3256--3260 (2020)
Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe and Bo Xu, "Speaker-conditional Chain Model for Speech Separation and Extraction," Proc. Interspeech'20, pp. 2707--2711 (2020)
Yuya Fujita, Shinji Watanabe, Motoi Omachi and Xuankai Chang, "Insertion Based Modelling for End-to-End Automatic Speech Recognition," Proc. Interspeech'20, pp. 3660--3664 (2020)
Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue and Kenji Nagamatsu, "End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors," Proc. Interspeech'20, pp. 269--273 (2020)
Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe and Yanmin Qian, "End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming," Proc. Interspeech'20, pp. 324--328 (2020)
Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa and Tetsunori Kobayashi, "Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict," Proc. Interspeech'20, pp. 3655--3659 (2020)
Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, and Shinji Watanabe, "ESPnet-ST: All-in-One Speech Translation Toolkit," Proc. ACL'20 (demo paper), pp. 302--311 (2020)
Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Zelasko, Paola Garcia, Shinji Watanabe, and Sanjeev Khudanpur, "The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge," Proc. CHiME 2020, pp. 48--54 (2020)
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, and Shinji Watanabe, "END-TO-END MULTI-SPEAKER SPEECH RECOGNITION WITH TRANSFORMER," Proc. ICASSP'20, pp. 6134--6138 (2020)
Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, and Shinji Watanabe, "SEMI-SUPERVISED SPEAKER ADAPTATION FOR END-TO-END SPEECH SYNTHESIS WITH PRETRAINED MODELS," Proc. ICASSP'20, pp. 7634--7638 (2020)
Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, and Shinji Watanabe, "END-TO-END AUTOMATIC SPEECH RECOGNITION INTEGRATED WITH CTC-BASED VOICE ACTIVITY DETECTION," Proc. ICASSP'20, pp. 6999--7003 (2020)
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan, "ESPNET-TTS: UNIFIED, REPRODUCIBLE, AND INTEGRATABLE OPEN SOURCE END-TO-END TEXT-TO-SPEECH TOOLKIT," Proc. ICASSP'20, pp. 7654--7658 (2020)
Yuya Fujita, Aswin Shanmugam Subramanian, Motoi Omachi, and Shinji Watanabe, "ATTENTION-BASED ASR WITH LIGHTWEIGHT AND DYNAMIC CONVOLUTIONS," Proc. ICASSP'20, pp. 7034--7038 (2020)
Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, and Hynek Hermansky, "A PRACTICAL TWO-STAGE TRAINING STRATEGY FOR MULTI-STREAM END-TO-END SPEECH RECOGNITION," Proc. ICASSP'20, pp. 7014--7018 (2020)
Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, and Sanjeev Khudanpur, "SPEAKER DIARIZATION WITH REGION PROPOSAL NETWORK," Proc. ICASSP'20, pp. 6514--6518 (2020)
Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and Kazuya Takeda, "WEAKLY-SUPERVISED SOUND EVENT DETECTION WITH SELF-ATTENTION," Proc. ICASSP'20, pp. 66--70 (2020)
Aswin Shanmugam Subramanian, Chao Weng, Meng Yu, Shi-Xiong Zhang, Yong Xu, Shinji Watanabe, and Dong Yu, "FAR-FIELD LOCATION GUIDED TARGET SPEECH EXTRACTION USING END-TO-END SPEECH RECOGNITION OBJECTIVES," Proc. ICASSP'20, pp. 7299--7303 (2020)
Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS," Proc. ASRU'19, pp. 31--38 (2019)
Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe, "MULTILINGUAL END-TO-END SPEECH TRANSLATION," Proc. ASRU'19, pp. 570--577 (2019)
Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, "A COMPARATIVE STUDY ON TRANSFORMER VS RNN IN SPEECH APPLICATIONS," Proc. ASRU'19, pp. 449--456 (2019)
Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe, "TRANSFORMER ASR WITH CONTEXTUAL BLOCK PROCESSING," Proc. ASRU'19, pp. 427--433 (2019)
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe, "MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition," Proc. ASRU'19, pp. 237--244 (2019), best paper award
Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur, "ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT," Proc. ASRU'19, pp. 136--143 (2019)
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION," Proc. ASRU'19, pp. 296--303 (2019), best paper candidate
Matthew K Maciejewski, Gregory Sell, Yusuke Fujita, Leibny Paola Garcia Perera, Shinji Watanabe, and Sanjeev Khudanpur, "Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed from Multiple Domains," Proc. WASPAA'19, pp. 165--169 (2019)
Toru Taniguchi, Aswin Shanmugam Subramanian, Xiaofei Wang, Dung Tran, Yuya Fujita, and Shinji Watanabe, "Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors for Reverberant Speech Recognition," Proc. WASPAA'19, pp. 293--297 (2019)
Aswin Shanmugam Subramanian, Xiaofei Wang, Murali Karthick Baskar, Shinji Watanabe, Toru Taniguchi, Dung Tran, and Yuya Fujita, "Speech Enhancement Using End-to-End Speech Recognition Objectives," Proc. WASPAA'19, pp. 1--6 (2019)
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux and John Hershey, "End-to-End Multilingual Multi-Speaker Speech Recognition," Proc. Interspeech'19, pp. 3755--3759 (2019)
Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak and Sanjeev Khudanpur, "Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings," Proc. Interspeech'19, pp. 4375--4379 (2019)
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal and Karen Livescu, "Pre-trained Text Embeddings for Enhanced Text-to-Speech Synthesis," Proc. Interspeech'19, pp. 4430--4434 (2019)
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu and Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-Free Objectives," Proc. Interspeech'19, pp. 4300--4304 (2019)
Martin Karafiat, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner and Jan Černocký, "Analysis of Multilingual Sequence-to-Sequence speech recognition systems," Proc. Interspeech'19, pp. 2220--2224 (2019)
Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa and Tomohiro Nakatani, "End-to-end SpeakerBeam for single channel target speech recognition," Proc. Interspeech'19, pp. 451--455 (2019)
Murali Karthick Baskar, Shinji Watanabe, Ramón Astudillo, Takaaki Hori, Lukas Burget and Jan Černocký, "Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text," Proc. Interspeech'19, pp. 3790--3794 (2019)
Laureano Moro Velazquez, Jaejin Cho, Shinji Watanabe, Mark Hasegawa-Johnson, Odette Scharenborg, Kim Heejin and Najim Dehak, "Study of the performance of automatic speech recognition systems in speakers with Parkinson's Disease," Proc. Interspeech'19, pp. 3875--3879 (2019)
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz and Jonathan Le Roux, "Vectorized Beam Search for CTC-Attention-based Speech Recognition," Proc. Interspeech'19, pp. 3825--3829 (2019)
Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Dan Povey and Sanjeev Khudanpur, "Speaker recognition benchmark using the CHiME-5 corpus," Proc. Interspeech'19, pp. 1506--1510 (2019)
Shigeki Karita, Nelson Yalta, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa and Tomohiro Nakatani, "Improving Transformer Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration," Proc. Interspeech'19, pp. 1408--1412 (2019)
Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu and Shinji Watanabe, "Interference Speaker Loss for Target-Speaker Speech Recognition," Proc. Interspeech'19, pp. 236--240 (2019)
Nelson Enrique Yalta Soplin, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, and Tetsuya Ogata, "CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments," Proc. EUSIPCO'19 (2019)
Ashish Arora, Chun-Chieh Chang, Babak Rekabdar, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal, Paola Garcia, Shinji Watanabe, Vimal Manohar, Yiwen Shao and Sanjeev Khudanpur, "Using ASR methods for OCR," Proc. ICDAR'19, pp. 663--668 (2019)
Nelson Enrique Yalta Soplin, Shinji Watanabe, Kazuhiro Nakadai and Tetsuya Ogata, "Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation," Proc. IJCNN'19 (2019)
Oliver Adams, Matthew Wiesner, Shinji Watanabe and David Yarowsky, "Massively Multilingual Adversarial Speech Recognition," Proc. NAACL-HLT'19, pp. 96--108 (2019)
Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan “Honza” Černocký, "PROMISING ACCURATE PREFIX BOOSTING FOR SEQUENCE-TO-SEQUENCE ASR," Proc. ICASSP'19, pp. 5646--5650 (2019)
Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe, "Transfer learning of language-independent end-to-end ASR with language model fusion," Proc. ICASSP'19, pp. 6096--6100 (2019)
Hainan Xu, Shuoyang Ding, Shinji Watanabe, "Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling," Proc. ICASSP'19, pp. 7110--7114 (2019)
Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesus Villalba, Najim Dehak, "LANGUAGE MODEL INTEGRATION BASED ON MEMORY CONTROL FOR SEQUENCE TO SEQUENCE SPEECH RECOGNITION," Proc. ICASSP'19, pp. 6191--6195 (2019)
Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky, "STREAM ATTENTION-BASED MULTI-ARRAY END-TO-END SPEECH RECOGNITION," Proc. ICASSP'19, pp. 7105--7109 (2019)
Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur, "ACOUSTIC MODELING FOR OVERLAPPING SPEECH RECOGNITION: JHU CHIME-5 CHALLENGE SYSTEM," Proc. ICASSP'19, pp. 6665--6669 (2019)
Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux, "CYCLE-CONSISTENCY TRAINING FOR END-TO-END SPEECH RECOGNITION," Proc. ICASSP'19, pp. 6271--6275 (2019)
Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali, "JOINT ACOUSTIC AND CLASS INFERENCE FOR WEAKLY SUPERVISED SOUND EVENT DETECTION," Proc. ICASSP'19, pp. 36--40 (2019)
Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy Sarroff, John R. Hershey, "THE PHASEBOOK: BUILDING COMPLEX MASKS VIA DISCRETE REPRESENTATIONS FOR SOURCE SEPARATION," Proc. ICASSP'19, pp. 66--70 (2019)
Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe, "End-to-End Monaural Multi-speaker ASR System without Pretraining," Proc. ICASSP'19, pp. 6256--6260 (2019)
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, "SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION USING TEXT-TO-SPEECH AND AUTOENCODERS," Proc. ICASSP'19, pp. 6166--6170 (2019)
Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe, "ACOUSTIC MODELING FOR DISTANT MULTI-TALKER SPEECH RECOGNITION WITH SINGLE- AND MULTI-CHANNEL BRANCHES," Proc. ICASSP'19, pp. 6630--6634 (2019)
Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, and Sanjeev Khudanpur, "Low-resource contextual topic identification on speech," Proc. IEEE SLT'18, pp. 656--663 (2018)
Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramon Astudillo, and Kazuya Takeda, "Back-translation-style data augmentation for end-to-end ASR," Proc. IEEE SLT'18, pp. 426--433 (2018)
Ruizhi Li, Jaejin Cho, Murali Karthick Baskar, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, and Takaaki Hori, "Multilingual sequence-to-sequence speech recognition: Architecture, transfer learning, and language modeling," Proc. IEEE SLT'18, pp. 521--527 (2018)
Takaaki Hori, Jaejin Cho, and Shinji Watanabe, "End-to-end speech recognition with word-based RNN language models," Proc. IEEE SLT'18, pp. 389--396 (2018)
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe, "Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline," Proc. Interspeech'18, pp. 1571--1575 (2018)
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai, "ESPnet: End-to-End Speech Processing Toolkit," Proc. Interspeech'18, pp. 2207--2211 (2018)
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda and Kazuya Takeda, "Multi-Head Decoder for End-to-End Speech Recognition," Proc. Interspeech'18, pp. 801--805 (2018)
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa and Marc Delcroix, "Semi-Supervised End-to-End Speech Recognition," Proc. Interspeech'18, pp. 2--6 (2018)
Jon Barker, Shinji Watanabe, Emmanuel Vincent and Jan Trmal, "The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines," Proc. Interspeech'18, pp. 1561--1565 (2018)
Aswin Shanmugam Subramanian, Szu-Jui Chen and Shinji Watanabe, "Student-Teacher Learning for BLSTM Mask-based Speech Enhancement," Proc. Interspeech'18, pp. 3249--3253 (2018)
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner and Shinji Watanabe, "Multi-Modal Data Augmentation for End-to-end ASR," Proc. Interspeech'18, pp. 2394--2398 (2018), best student paper award
Peter Frederiksen, Jesus Villalba, Shinji Watanabe, Zheng-Hua Tan and Najim Dehak, "Effectiveness of Single-Channel BLSTM Enhancement for Language Identification," Proc. Interspeech'18, pp. 1823--1827 (2018)
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesus Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur, "Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge," Proc. Interspeech'18, pp. 2808--2812 (2018)
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani, "Auxiliary feature based adaptation of end-to-end ASR systems," Proc. Interspeech'18, pp. 2444--2448 (2018)
Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux and John R Hershey, "A Purely End-to-End System for Multi-speaker Speech Recognition," Proc. ACL'18 (Volume 1: long paper), pp. 2620--2630 (2018)
Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, and John Hershey, "Speaker adaptation for multichannel end-to-end speech recognition," Proc. ICASSP'18, pp. 6707--6711 (2018)
Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, and John Hershey, "An end-to-end language-tracking speech recognizer for mixed-language speech," Proc. ICASSP'18, pp. 4919--4923 (2018)
Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, and John R. Hershey, "End-to-end multi-speaker speech recognition," Proc. ICASSP'18, pp. 4819--4823 (2018)
Hayato Shibata, Taku Kato, Takahiro Shinozaki, and Shinji Watanabe, "Composite embedding systems for ZEROSPEECH2017 track1," Proc. ASRU'17, pp. 747--751 (2017), student best paper candidate
Takaaki Hori, Shinji Watanabe, and John Hershey, "Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition," Proc. ASRU'17, pp. 287--291 (2017), best paper candidate
Shinji Watanabe, Takaaki Hori, and John Hershey, "Language independent end-to-end architecture for joint language identification and speech recognition," Proc. ASRU'17, pp. 265--269 (2017), best paper candidate
Tsubasa Ochiai, Shinji Watanabe, and Shigeru Katagiri, "Does Speech Enhancement Work With End-To-End ASR Objectives?: Experimental Analysis Of Multichannel End-To-End ASR," Proc. MLSP'17 (2017)
Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe and Jonathan Le Roux, "Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information," Proc. Interspeech'17, pp. 2461--2465 (2017)
Takaaki Hori, Shinji Watanabe, Yu Zhang and William Chan, "Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM," Proc. Interspeech'17, pp. 949--953 (2017)
Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi and Graham Neubig, "Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text," Proc. Interspeech'17, pp. 2546--2550 (2017)
Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, and John R. Hershey, "Multichannel End-to-End Speech Recognition," Proc. ICML'17, pp. 2632--2641 (2017)
Takaaki Hori, Shinji Watanabe, and John R. Hershey, "Joint CTC/attention decoding for end-to-end speech recognition," Proc. ACL'17 long paper, pp. 518--529 (2017)
Suyoun Kim, Takaaki Hori, and Shinji Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," Proc. ICASSP'17, pp. 4835--4839 (2017)
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda, "BLSTM-HMM Hybrid System Combined with Sound Activity Detection Network for Polyphonic Sound Event Detection," Proc. ICASSP'17, pp. 766--780 (2017)
Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, and John Hershey, "Student-Teacher Network Learning with Enhanced Features," Proc. ICASSP'17, pp. 5275--5279 (2017)
Zhong Meng, Shinji Watanabe, John R. Hershey, and Hakan Erdogan, "Deep Long Short-Term Memory Adaptive Beamforming Networks for Multichannel Robust Speech Recognition," Proc. ICASSP'17, pp. 271--275 (2017)
Xiong Xiao, Shinji Watanabe, Eng Siong Chng, and Haizhou Li, "Beamforming Networks Using Spatial Covariance Features for Far-field Speech Recognition," Proc. APSIPA ASC (2016)
Yuuki Tachioka, Shinji Watanabe, and Takaaki Hori, "The MELCO/MERL System Combination Approach for the Fourth CHiME Challenge," Proc. CHiME 2016 Workshop, pp. 1--3 (2016)
Xiong Xiao, Chenglin Xu, Zhaofeng Zhang, Shengkui Zhao, Sining Sun, and Shinji Watanabe, "A Study of Learning Based Beamforming Methods for Speech Recognition," Proc. CHiME 2016 Workshop, pp. 26--31 (2016)
Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, and Takaaki Hori, "Evolution Strategy Based Neural Network Optimization and LSTM Language Model for Robust Speech Recognition," Proc. CHiME 2016 Workshop, pp. 32--35 (2016)
Hakan Erdogan, Tomoki Hayashi, John R. Hershey, Takaaki Hori, Chiori Hori, Wei-Ning Hsu, Suyoun Kim, Jonathan Le Roux, Zhong Meng, and Shinji Watanabe, "Multi-Channel Speech Recognition: LSTMs All the Way Through," Proc. CHiME 2016 Workshop, pp. 45--48 (2016)
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, and Kazuya Takeda, "Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection," Proc. DCASE2016 Workshop (2016)
Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret A. Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, and Takeyuki Aikawa, “Dialog state tracking with attention-based sequence-to-sequence learning,” Proc. SLT’16, pp. 552--558 (2016). Ranked 2nd at DSTC5 challenge.
Tomohiro Tanaka, Takafumi Moriya, Takahiro Shinozaki, Shinji Watanabe, Takaaki Hori, and Kevin Duh, “Automated structure discovery and parameter tuning of neural network language model based on evolution strategy,” Proc. SLT’16, pp. 665--671 (2016)
Toshiaki Koike-Akino, Ruhi Mahajan, Tim K. Marks, Oncel C. Tuzel, Ye Wang, Shinji Watanabe, Philip V. Orlik, “High-Accuracy User Identification Using EEG Biometrics,” Proc. EMBC (IEEE Engineering in Medicine and Biology Society)’16, (2016)
Yusuf Işık, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe and John Hershey, “Single-channel multi-speaker separation using deep clustering,” Proc. Interspeech’16, pp. 545--549 (2016)
Chiori Hori, Takaaki Hori, Shinji Watanabe and John Hershey, “Context-sensitive and role-dependent spoken language understanding using bidirectional and attention LSTMs,” Proc. Interspeech’16, pp. 3236--3240 (2016)
Katerina Zmolikova, Martin Karafiat, Karel Vesely, Marc Delcroix, Shinji Watanabe, Lukas Burget and Jan Cernocky, “Data selection by sequence summarizing neural network in mismatch condition training,” Proc. Interspeech’16, pp. 2354--2358 (2016)
Hakan Erdogan, John Hershey, Shinji Watanabe, Michael Mandel and Jonathan Le Roux, “Improved MVDR beamforming using single-channel mask prediction networks,” Proc. Interspeech’16, pp. 1981--1985 (2016)
Chiori Hori, Shinji Watanabe, Takaaki Hori, Bret A. Harsham, John R. Hershey, Yusuke Koji, Youichi Fujii, and Yuki Furumoto, “Driver confusion status detection using recurrent neural networks,” Proc. ICME’16 (2016)
Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Michael Mandel, and Dong Yu, “Deep beamforming networks for multi-channel speech recognition,” Proc. ICASSP’16, pp. 5745--5749 (2016)
John R. Hershey, Zhuo Chen, Jonathan Le Roux, and Shinji Watanabe, “Deep clustering: discriminative embeddings for segmentation and separation,” Proc. ICASSP’16, pp. 31--35 (2016)
Scott Wisdom, John R. Hershey, Jonathan Le Roux, Shinji Watanabe, “Deep unfolding for multichannel source separation,” Proc. ICASSP’16, pp. 121--125 (2016)
Karel Vesely, Shinji Watanabe, Katerina Zmolikova, Martin Karafiat, Lukas Burget, Jan Cernocky, “Sequence summarizing neural network for speaker adaptation,” Proc. ICASSP’16, pp. 5315--5319 (2016)
Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey, “Minimum word error training of long short-term memory recurrent neural network language models for speech recognition,” Proc. ICASSP’16, pp. 5990--5994 (2016)
Takaaki Hori, Zhuo Chen, Hakan Erdogan, John Hershey, Jonathan Le Roux, Vikramjit Mitra, and Shinji Watanabe, “The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition,” Proc. ASRU’15, pp. 475--481 (2015). Ranked 2nd at 3rd CHiME Challenge among 25 submissions
Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, and Kevin Duh, “Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy,” Proc. ASRU’15, pp. 610--616 (2015)
Jon Barker, Ricard Marxer, Emmanuel Vincent, and Shinji Watanabe, “The third `CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines,” Proc. ASRU’15, pp. 504--511 (2015)
Roger Hsiao, Jeff Ma, William Hartmann, Martin Karafiat, Frantisek Grezl, Lukas Burget, Igor Szoke, Jan Honza Cernocky, Shinji Watanabe, Zhuo Chen, Sri Harish Mallidi, Hynek Hermansky, Stavros Tsakalidis, and Richard Schwartz, “Robust speech recognition in unknown reverberant and noisy conditions,” Proc. ASRU’15, pp. 533--538 (2015)
Hiroki Kanagawa, Yuuki Tachioka, Shinji Watanabe, and Jun Ishii, “Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN,” Proc. APSIPA ASC’15, pp. 86--92 (2015)
Chiori Hori, Takaaki Hori, Shinji Watanabe, and John R. Hershey, "Context Sensitive Spoken Language Understanding using Role Dependent LSTM layers," Proc. NIPS Workshop for Machine Learning for SLU & Interaction (2015)
Bret A. Harsham, Shinji Watanabe, Alan Esenther, John R. Hershey, Jonathan Le Roux, Yi Luan, Daniel N. Nikovski, and Vamsi K. Potluru, “Driver prediction to improve interaction with in-vehicle HMI,” Proc. DSP for In-Vehicle Workshop’15, (2015).
Yi Luan, Shinji Watanabe, and Bret Harsham, “Efficient learning for spoken language understanding tasks with word embedding based pre-training,” Proc. Interspeech’15, pp. 1398--1402 (2015).
Zhuo Chen, Shinji Watanabe, Hakan Erdogan, and John R. Hershey, “Integration of Speech Enhancement and Recognition using Long-Short Term Memory Recurrent Neural Network,” Proc. Interspeech’15, pp. 3274--3278 (2015).
Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, and Dorothea Kolossa, “Uncertainty Propagation through Deep Neural Networks,” Proc. Interspeech’15, pp. 3561--3565 (2015).
Yuuki Tachioka, Shinji Watanabe, “Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features,” Proc. Interspeech’15, pp. 3541--3545 (2015).
Yuuki Tachioka, Shinji Watanabe, “A Discriminative Method for Recurrent Neural Network Language Models,” Proc. ICASSP’15, pp. 5386--5390 (2015).
Takahiro Shinozaki, Shinji Watanabe, “Structure Discovery of Deep Neural Network Based on Evolutionary Algorithms,” Proc. ICASSP’15, pp. 4979--4983 (2015).
Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux, “Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks,” Proc. ICASSP’15, pp. 708--712 (2015).
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, and John Hershey, "Sequence Discriminative Training for Low-Rank Deep Neural Networks," Proc. GlobalSIP14-Machine Learning Applications in Speech Processing, pp. 735--739 (2014).
Felix Weninger, Jonathan Le Roux, John Hershey, and Shinji Watanabe, "Discriminative NMF and its Application to Single-Channel Source Separation," Proc. Interspeech'14, pp. 865--869 (2014)
Shinji Watanabe, John Hershey, Tim Marks, Youichi Fujii, and Yusuke Koji, "Cost-Level Integration of Statistical and Rule-Based Dialog Managers," Proc. Interspeech'14, pp. 323--327 (2014)
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, and John Hershey, "Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition," Proc. Interspeech'14, pp. 2415--2419 (2014)
Yuuki Tachioka, Tomohiro Narita, Shinji Watanabe, Jonathan Le Roux, "Ensemble Integration of Calibrated Speaker Localization and Statistical Speech Detection in Domestic Environments," Proc. HSCMA (2014)
Yuuki Tachioka, Tomohiro Narita, Felix J Weninger, and Shinji Watanabe, "Dual System Combination Approach for Various Reverberant Environments with Dereverberation Techniques," Proc. REVERB Workshop (2014) Ranked 1st place at REVERB Challenge Official Data Track (2nd overall)
Felix J Weninger, Shinji Watanabe, Jonathan Le Roux, John Hershey, Yuuki Tachioka, Jürgen T. Geiger, Björn W Schuller, and Gerhard Rigoll, "The MERL/MELCO/TUM system for the REVERB Challenge using Deep Recurrent Neural Network Feature Enhancement," Proc. REVERB Workshop (2014)
Hao Tang, Shinji Watanabe, Tim K. Marks, and John Hershey, "Log-Linear Dialog Manager," Proc. ICASSP'14, pp. 4120--4123 (2014)
Shinji Watanabe, Jonathan Le Roux, "Black Box Optimization for Automatic Speech Recognition," Proc. ICASSP'14, pp. 3280--3283 (2014)
Chao Weng, Dong Yu, Shinji Watanabe, and Biing-Hwang (Fred) Juang, "Recurrent Deep Neural Networks for Robust Speech Recognition," Proc. ICASSP'14, pp. 5569--5572 (2014)
Felix J Weninger, Shinji Watanabe, Yuuki Tachioka, and Björn W Schuller, "Deep Recurrent De-Noising Auto-Encoder and Blind De-Reverberation for Reverberated Speech Recognition," Proc. ICASSP'14, pp. 4656--4659 (2014)
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, and John Hershey, “A Generalized Discriminative Training Framework for System Combination,” Proc. ASRU’13, pp. 43--48 (2013)
Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, and Marco Matassoni, “The Second `CHiME' Speech Separation and Recognition Challenge: An Overview of Challenge Systems and Outcomes,” Proc. ASRU’13, pp. 162--167 (2013)
Jonathan Le Roux, Shinji Watanabe, and John R. Hershey, “Ensemble Learning for Speech Enhancement,” Proc. WASPAA’13 (2013)
Koichiro Yoshino, Shinji Watanabe, Jonathan Le Roux, and John R. Hershey, “Statistical dialogue management using intention dependency graph,” Proc. IJCNLP’13, pp. 962--966 (2013)
Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, and Tetsunori Kobayashi, “Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data,” Proc. MLSP’13 (2013)
Yuuki Tachioka and Shinji Watanabe, “Discriminative training of acoustic models for system combination,” Proc. Interspeech’13, pp. 2355--2359 (2013)
Shinji Watanabe and John R. Hershey, “Stereo based feature enhancement based on dictionary learning,” Proc. ICASSP’13, pp. 7073--7077 (2013)
Yuuki Tachioka, Shinji Watanabe, and John Hershey, “Effectiveness of discriminative training and feature transformation for reverberated and noisy speech,” Proc. ICASSP’13, pp. 6935--6939 (2013)
Emmanuel Vincent, Jon Barker, Shinji Watanabe, Jonathan Le Roux, Francesco Nesta, and Marco Matassoni, “The second ‘CHiME’ speech separation and recognition challenge: datasets, tasks and baselines,” Proc. ICASSP’13, pp. 126--130 (2013)
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, and John R. Hershey, “Discriminative methods for noise robust speech recognition: A CHiME Challenge Benchmark,” Proc. CHiME’13, pp. 19--24, (2013) Ranked 1st place at 2nd CHiME Challenge Track 2
Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi, “Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model,” Proc. Interspeech’12, pp. 2166--2169 (2012)
Seong-Jun Hahm, Shinji Watanabe, Masakiyo Fujimoto, Atsunori Ogawa, Takaaki Hori, and Atsushi Nakamura, “Normalization and Adaptation by Consistently Employing MAP Estimation,” in: Proc. of International Workshop on Statistical Machine Learning for Speech Processing (IWSML) (2012).
Shinji Watanabe, Yotaro Kubo, Takanobu Oba, Takaaki Hori, Atsushi Nakamura, “Bag of arcs: new representation of speech segment features based on finite state machines,” Proc. ICASSP’12, pp. 4201--4204 (2012).
Roland Roller, Shinji Watanabe, Tomoharu Iwata, “Effect of dialog acts on word use in polylogue,” Proc. ICASSP’12, pp. 4969--4972 (2012).
Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, Nobuaki Minematsu, Keikichi Hirose, “MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments,” Proc. ICASSP’12, pp. 4109--4112 (2012). ICASSP 2012 Student Paper Award
Ekapol Chuangsuwanich, Shinji Watanabe, Takaaki Hori, Tomoharu Iwata, James Glass, “Handling uncertain observations in unsupervised topic-mixture language model adaptation,” Proc. ICASSP’12, pp. 5033--5036 (2012).
Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi, “Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering,” Proc. ICASSP’12, pp. 5253--5256 (2012).
Marc Delcroix, Atsunori Ogawa, Shinji Watanabe, Tomohiro Nakatani, Atsushi Nakamura, “Discriminative feature transforms using differenced maximum mutual information,” Proc. ICASSP’12, pp. 4753--4756 (2012).
Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Simon Wiesler, Ralf Schlueter, Hermann Ney, “Basis vector orthogonalization for an improved kernel gradient matching pursuit method,” Proc. ICASSP’12, pp. 4753--4756 (2012).
Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, “Decoding network optimization using minimum transition error training,” Proc. ICASSP’12, pp. 4197--4200 (2012).
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani, “Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation,” Proc. ICASSP’12, pp. 4713--4716 (2012).
Shinji Watanabe, Atsushi Nakamura, and Biing-Hwang Juang, "Bayesian Linear Regression for Hidden Markov Model Based on Optimizing Variational Bounds," Proc. MLSP'11, pp. 1--6 (2011).
Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Atsunori Ogawa, Takaaki Hori, Shinji Watanabe, Masakiyo Fujimoto, Takuya Yoshioka, Takanobu Oba, Yotaro Kubo, Mehrez Souden, Seong-Jun Hahm, and Atsushi Nakamura, "Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation," Proc. CHiME'11, pp. 12--17 (2011) Ranked 1st place at 1st CHiME Challenge
Shinji Watanabe, Atsushi Nakamura, and Biing-Hwang Juang, "Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution," Proc. Interspeech'11, pp. 1081--1084 (2011)
Masakiyo Fujimoto, Shinji Watanabe, and Tomohiro Nakatani, "A Robust Estimation Method of Noise Mixture Model for Noise Suppression," Proc. Interspeech'11, pp. 697--700 (2011)
Tomoharu Iwata and Shinji Watanabe, "Learning Influences from Word Use in Polylogue," Proc. Interspeech'11, pp. 3089--3092 (2011)
Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi, "Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model," Proc. Interspeech'11, pp. 2905--2908 (2011)
Tomoharu Iwata, Shinji Watanabe and Hiroshi Sawada, “Fashion Coordinates Recommender System using Photographs from Fashion Magazines,” Proc. IJCAI’11, pp. 2262--2267 (2011)
Marc Delcroix, Shinji Watanabe, Tomohiro Nakatani, and Atsushi Nakamura, “Discriminative approach to dynamic variance adaptation for noisy speech recognition,” Proc. HSCMA’11, pp. 7--12 (2011)
Shoko Araki, Takaaki Hori, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato, “Low-latency meeting recognition and understanding using distant microphones,” Proc. HSCMA’11, pp. 151--152 (2011)
Takuya Maekawa, Shinji Watanabe, “Unsupervised Activity Recognition with User's Physical Characteristics Data,” Proc. ISWC’11, pp. 89--96 (2011). Best In-Category Nominee
Shinji Watanabe, Daichi Mochihashi, Takaaki Hori, and Atsushi Nakamura, “Gibbs Sampling Based Multi-Scale Mixture Model for Speaker Clustering,” Proc. ICASSP’11, pp. 4524--4527 (2011).
Masakiyo Fujimoto, Shinji Watanabe, and Tomohiro Nakatani, “Non-Stationary Noise Estimation Method Based on Bias-Residual Component Decomposition for Robust Speech Recognition,” Proc. ICASSP’11, pp. 4816--4819 (2011).
Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, and Nobuaki Minematsu, “High Accurate Model-Integration-Based Voice Conversion Using Dynamic Features and Model Structure Optimization,” Proc. ICASSP’11, pp. 4576--4579 (2011).
Yotaro Kubo, Simon Wiesler, Ralf Schlueter, Hermann Ney, Shinji Watanabe, Atsushi Nakamura, and Tetsunori Kobayashi, “Subspace Pursuit Method for Kernel-Log-Linear Models,” Proc. ICASSP’11, pp. 4500--4503 (2011).
Shinji Watanabe, Tomoharu Iwata, Takaaki Hori, Atsushi Sako, and Yasuo Ariki, "Application of Topic Tracking Model to Language Model Adaptation and Meeting Analysis," Proc. IEEE Workshop on Spoken Language Technology (SLT'10), pp. 366--371 (2010)
Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato, "Real-time Meeting Recognition and Understanding Using Distant Microphones and Omni-directional Camera," Proc. IEEE Workshop on Spoken Language Technology (SLT'10), pp. 412--417 (2010)
Shinji Watanabe, Takaaki Hori, and Atsushi Nakamura, "Large Vocabulary Continuous Speech Recognition Using WFST-based Linear Classifier for Structured Data," Proc. Interspeech'10, pp. 346--349, (2010)
Masakiyo Fujimoto, Shinji Watanabe, and Tomohiro Nakatani, "Voice Activity Detection Using Frame-Wise Model Re-Estimation Method Based on Gaussian Pruning with Weight Normalization," Proc. Interspeech'10, pp. 3102--3105, (2010)
Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, and Tetsunori Kobayashi, "A Regularized Discriminative Training Method of Acoustic Models Derived by Minimum Relative Entropy Discrimination," Proc. Interspeech'10, pp. 2954--2957, (2010)
Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, and Nobuaki Minematsu, "Probabilistic Integration of Joint Density Model and Speaker Model for Voice Conversion," Proc. Interspeech'10, pp. 1728--1731, (2010)
Takaaki Hori, Shinji Watanabe, and Atsushi Nakamura, "Improvements of Search Error Risk Minimization in Viterbi Beam Search for Speech Recognition," Proc. Interspeech'10, pp. 1962--1965, (2010)
Shinji Watanabe, Takaaki Hori, Erik McDermott, and Atsushi Nakamura, "A discriminative model for continuous speech recognition based on weighted finite state transducers," Proc. ICASSP'10, pp. 4922--4925, (2010)
David Cournapeau, Shinji Watanabe, Atsushi Nakamura, and Tatsuya Kawahara, "Using online model comparison in the variational Bayes framework for online unsupervised voice activity detection," Proc. ICASSP'10, pp. 4462--4465, (2010)
Takaaki Hori, Shinji Watanabe, and Atsushi Nakamura, "Search error risk minimization in Viterbi beam search for speech recognition," Proc. ICASSP'10, pp. 4934--4937, (2010)
Kazuo Aoyama, Shinji Watanabe, Hiroshi Sawada, Yasuhiro Minami, Naonori Ueda, and Kazumi Saito, "Fast similarity search on a large speech data set with neighborhood graph indexing," Proc. ICASSP'10, pp. 5358--5361, (2010)
Erik McDermott, Shinji Watanabe, and Atsushi Nakamura, "Discriminative training based on an integrated view of MPE and MMI in margin and error space," Proc. ICASSP'10, pp. 4894--4897, (2010)
Hideyuki Watanabe, Shigeru Katagiri, Kouta Yamada, Erik McDermott, Atsushi Nakamura, Shinji Watanabe, and Miho Ohsaki, "Minimum error classification with geometric margin control," Proc. ICASSP'10, pp. 4922--4925, (2010)
Erik McDermott, Shinji Watanabe, and Atsushi Nakamura, "Margin-Space Integration of MPE Loss via Differencing of MMI Functionals for Generalized Error-Weighted Discriminative Training," Proc. Interspeech'09, pp. 224--227, (2009)
Yosuke Izumi, Kenta Nishiki, Shinji Watanabe, Takuya Nishimoto, Nobutaka Ono, and Shigeki Sagayama, "Stereo-input Speech Recognition using Sparseness-based Time-frequency Masking in a Reverberant Environment," Proc. Interspeech'09, pp. 1955--1958, (2009)
Tomoharu Iwata, Shinji Watanabe, Takeshi Yamada and Naonori Ueda, "Topic tracking model for analyzing consumer purchase behavior," Proc. IJCAI'09, pp. 1427--1432, (2009)
Atsushi Nakamura, Erik McDermott, Shinji Watanabe, and Shigeru Katagiri, "A unified view for discriminative objective functions based on negative exponential of difference measure between strings," Proc. ICASSP'09, pp. 1633--1636, (2009)
Shinji Watanabe and Atsushi Nakamura, "Speech recognition with incremental tracking and detection of changing environments based on a macroscopic time evolution system," Proc. ICASSP'09, pp. 4373--4376, (2009)
Marc Delcroix, Tomohiro Nakatani, and Shinji Watanabe, "Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer," Proc. ICASSP'08, pp. 4073--4076, (2008)
Shinji Watanabe and Atsushi Nakamura, "A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches," Proc. ICASSP'08, pp. 4285--4288, (2008)
Shinji Watanabe and Atsushi Nakamura, "Incremental adaptation based on a macroscopic time evolution system," Proc. ICASSP'07, vol. 4, pp. 769--772, (2007)
Shinji Watanabe and Atsushi Nakamura, "Acoustic model adaptation based on coarse/fine training of transfer vectors using directional statistics," Proc. ICASSP'06, vol. 1, pp. 1005--1008, (2006)
Shinji Watanabe and Atsushi Nakamura, "Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition," Proc. Interspeech'05, pp. 1105--1109, (2005)
Shinji Watanabe and Atsushi Nakamura, "Robustness of acoustic model topology determined by VBEC (Variational Bayesian Estimation and Clustering for speech recognition) for different speech data sets," Proc. Workshop on statistical modeling approach for speech recognition - Beyond HMM, pp. 55--60, (2004)
Shinji Watanabe and Atsushi Nakamura, "Acoustic model adaptation based on coarse-fine training of transfer vectors and its application to speaker adaptation task," Proc. ICSLP'04, vol. 4, pp. 2933--2936, (2004)
Parham Zolfaghari, Shinji Watanabe, Atsushi Nakamura and Shigeru Katagiri, "Bayesian Modelling of the Speech Spectrum Using Mixture of Gaussians," Proc. ICASSP'04, vol. 1, pp. 553--556, (2004)
Shinji Watanabe, Atsushi Sako and Atsushi Nakamura, "Automatic Determination of Acoustic Model Topology using Variational Bayesian Estimation and Clustering," Proc. ICASSP'04, vol. 1, pp. 813--816, (2004)
Parham Zolfaghari, Hiroko Kato, Shinji Watanabe and Shigeru Katagiri, "Speech Spectral Modelling using Mixture of Gaussians," Proc. SWIM, (2004)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, "Bayesian Acoustic Modeling for Spontaneous Speech Recognition," Proc. SSPR'03, pp. 47--50, (2003)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, "Application of Variational Bayesian Estimation and Clustering to Acoustic Model Adaptation," Proc. ICASSP'03, vol. 1, pp. 568--571, (2003)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, "Application of Variational Bayesian Approach to Speech Recognition," NIPS15 MIT Press, (2002)
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, and Naonori Ueda, "Constructing Shared-State Hidden Markov Models Based on a Bayesian Approach," Proc. ICSLP'02, vol. 4, pp. 2669--2672, (2002).