Yamato Ohtani, ``Techniques for Improving Voice Conversion Based on Eigenvoices,'' Ph.D. thesis, Nara Institute of Science and Technology, March 2010.
Yamato Ohtani, ``High Quality One-to-Many Voice Conversion with Mixed Excitation and Eigenvoices,'' Master's thesis, Nara Institute of Science and Technology, March 2007 (in Japanese).
Noriyuki Matsunaga, Yamato Ohtani, Tatsuya Hirahara, ``Normalized Method of Linguistic Feature Suitable for Fundamental Frequency in Japanese Text to Speech Using Deep Learning,'' IEICE Trans. Information and Systems, Vol. J102-D, No. 10, pp. 721--729 (in Japanese).
Yamato Ohtani, Masatsune Tamura, Masahiro Morita and Masami Akamine, ``Statistical bandwidth extension based on Gaussian mixture model with sub-band basis spectrum model,'' IEICE Trans. Information and Systems, vol.E99-D, no.10, October 2016.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Improvements of the one-to-many eigenvoice conversion system,'' IEICE Trans. Information and Systems, vol.E93-D, no.9, pp.2491--2499, September 2010.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Adaptive training for voice conversion based on eigenvoices,'' IEICE Trans. Information and Systems, vol.E93-D, no.6, pp.1589--1598, June 2010.
Shin-ichi Kawamoto, Yoshihiro Adachi, Yamato Ohtani, Tatsuo Yotsukura, Shigeo Morishima and Satoshi Nakamura, ``Voice output system considering personal voice for instant casting movie,'' IPSJ Journal, vol. 51, no. 2, pp. 1234--1248, February 2010 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Maximum Likelihood Voice Conversion Based on Gaussian Mixture Model with STRAIGHT Mixed Excitation,'' IEICE Transactions in Japanese, Vol. J91-D, No. 4, pp. 1082--1091, April 2008.
Takuma Okamoto, Yamato Ohtani, Tomoki Toda and Hisashi Kawai, ``ConvNeXt-TTS and ConvNeXt-VC: ConvNeXt-based fast end-to-end sequence-to-sequence text-to-speech and voice conversion,'' Proc. ICASSP 2024, Apr. 2024. (accepted, to appear)
Yamato Ohtani, Takuma Okamoto, Tomoki Toda and Hisashi Kawai, ``FIRNet: Fundamental frequency controllable fast neural vocoder with trainable finite impulse response filter,'' Proc. ICASSP 2024, Apr. 2024. (accepted, to appear)
Takuma Okamoto, Haruki Yamashita, Yamato Ohtani, Tomoki Toda and Hisashi Kawai, ``WaveNeXt: ConvNeXt-based fast neural vocoder without iSTFT layer,'' Proc. ASRU 2023, Dec. 2023.
Daisuke Yoshioka, Yusuke Yasuda, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda, ``Spoken-text-style transfer with conditional variational autoencoder and content word storage,'' Proc. INTERSPEECH, pp. 4576-4580, Incheon, Korea, Sept. 2022.
Yi-Chiao Wu, P. L. Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda, ``A cyclical post-filtering approach to mismatch refinement of neural vocoder for text-to-speech systems,'' Proc. Interspeech 2020 (accepted).
Yamato Ohtani, Koichiro Mori and Masahiro Morita, ``Voice quality control using perceptual expressions for statistical parametric speech synthesis based on cluster adaptive training,'' Proc. Interspeech 2016, San Francisco, September 2016 (accepted).
Yamato Ohtani, Yu Nasu, Masahiro Morita and Masami Akamine, ``Emotional Transplant in Statistical Speech Synthesis Based on Emotion Additive Model,'' Proc. Interspeech 2015, Dresden, September 2015.
Yamato Ohtani, Masatsune Tamura, Masahiro Morita and Masami Akamine, ``GMM-based bandwidth extension using sub-band basis spectrum model,'' Proc. Interspeech 2014, pp. 2489--2493, Singapore, September 2014.
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima and Masami Akamine, ``HMM-based speech synthesis using sub-band basis spectrum model,'' Proc. Interspeech 2012 (accepted), Portland, September 2012.
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima and Masami Akamine, ``Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP,'' Proc. Interspeech 2012 (accepted), Portland, September 2012.
Javier Latorre, Mark J. F. Gales, Sabine Buchholz, Kate Knill, Masatsune Tamura, Yamato Ohtani and Masami Akamine, ``Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?,'' Proc. of ICASSP 2011, pp. 4724--4727, May 2011.
Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari and Kiyohiro Shikano, ``Adaptive voice-quality control based on one-to-many eigenvoice conversion,'' Proc. of INTERSPEECH, pp.2158-2161, Chiba, Japan, September 2010.
Chie Hayashida, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari and Kiyohiro Shikano, ``Linear transformation approaches to many-to-one voice conversion,'' Proc. of the 7th ISCA Speech Synthesis Workshop (SSW7), pp.74-79, Kyoto, Japan, September 2010.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Non-Parallel Training for Many-to-Many Eigenvoice Conversion,'' Proc. ICASSP 2010, pp. 4822--4825, Dallas, U.S.A., March 2010.
Shin-ichi Kawamoto, Yoshihiro Adachi, Yamato Ohtani, Tatsuo Yotsukura, Shigeo Morishima and Satoshi Nakamura, ``Scenario speech assignment technique for instant casting movie system,'' ACCV2009 Invited workshop on Vision Based Human Modeling and Synthesis, Xi'an, China, September 23-27, 2009.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Many-to-Many Eigenvoice Conversion with Reference Voice,'' INTERSPEECH, pp. 1623--1626, Brighton, UK, Sept. 2009.
Malorie Charlier, Yamato Ohtani, Tomoki Toda, Alexis Moinet and Thierry Dutoit, ``Cross-Language Voice Conversion Based on Eigenvoices,'' INTERSPEECH, pp. 1635-1638, Brighton, UK, Sept. 2009.
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Low-Delay Voice Conversion based on Maximum Likelihood Estimation of Spectral Parameter Trajectory,'' INTERSPEECH 2008, pp.1076--1079, September 2008.
Daisuke Tani, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari and Kiyohiro Shikano, ``Maximum A Posteriori Adaptation for Many-to-One Eigenvoice Conversion,'' INTERSPEECH 2008, pp.1461--1464, September 2008.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``An Improved One-to-Many Eigenvoice Conversion System,'' INTERSPEECH 2008, pp. 1080--1083, September 2008.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Speaker Adaptive Training for One-to-Many Eigenvoice Conversion Based on Gaussian Mixture Model,'' Proceedings of the 10th European Conference on Speech Communication and Technology (Interspeech 2007 - Eurospeech), pp. 1981--1984, August 2007.
Kumi Ohta, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Regression Approaches to Voice Quality Control Based on One-to-Many Eigenvoice Conversion,'' 6th ISCA Speech Synthesis Workshop (SSW6), pp. 101-106, August 2007.
Daisuke Tani, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``An Evaluation of Many-to-One Voice Conversion Algorithms with Pre-Stored Speaker Data Sets,'' 6th ISCA Speech Synthesis Workshop (SSW6), pp. 107-112, August 2007.
Tomoki Toda, Yamato Ohtani and Kiyohiro Shikano, ``One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices,'' International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 4, pp. 1249-1252, April 2007.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Evaluation of eigenvoice conversion based on Gaussian mixture model,'' ASA/ASJ Joint Meeting, November 2006.
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Maximum Likelihood Voice Conversion Based on GMM with STRAIGHT Mixed Excitation,'' the 9th International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), pp. 2266--2269, September 2006.
Tomoki Toda, Yamato Ohtani and Kiyohiro Shikano, ``Eigenvoice Conversion Based on Gaussian Mixture Model,'' the 9th International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), pp. 2446--2449, September 2006.
Yamato Ohtani, Noriyuki Matsunaga, Hiroyuki Hirai, ``Emotion manipulation for unit-selection-based speech synthesis using deep neural network,'' IPSJ Technical Report, 2019-SLP-127, 2019 (in Japanese).
Yamato Ohtani and Koichiro Mori, ``Text-to-speech technology to control speaker individuality with intuitive expressions,'' Toshiba Review, Vol. 71, No. 4, pp. 80--83, June 2016 (in Japanese).
Yamato Ohtani, Yu Nasu, Ryo Morinaka, Masatsune Tamura, Masahiro Morita, Masami Akamine, ``Shared emotion additive model for HMM-based emotional speech synthesis,'' IEICE Technical Report, SP2014-114, No. 303, pp. 13--18, November 2014 (in Japanese).
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Masami Akamine, ``Statistical bandwidth extension using sub-band basis spectrum model,'' IEICE Technical Report, SP2014-114, No. 52, pp. 303--308, May 2014 (in Japanese).
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and K. Shikano, ``Diagonalizing covariance matrices for reducing computation cost of voice conversion based on Gaussian mixture model,'' IPSJ SIG Notes, 2008-SLP-75, pp. 33--38, February 2009 (in Japanese).
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and K. Shikano, ``Low-delay voice conversion algorithm based on maximum likelihood estimation of spectral parameter trajectory,'' IEICE Technical Report, SP2008-141, pp. 91--96, January 2009 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Many-to-many eigenvoice conversion algorithms with a reference speaker,'' IEICE Technical Report, SP2008-140, pp. 85--90, January 2009 (in Japanese).
Kumi Ohta, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Evaluation of voice quality control based on one-to-many eigenvoice conversion,'' IEICE Technical Report, SP2007-82, no. 282, pp. 67--72, October 2007 (in Japanese).
Daisuke Tani, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Many-to-one voice conversion algorithms with pre-stored speaker data sets,'' IEICE Technical Report, SP2007-81, pp. 61--66, October 2007 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano, ``Speaker adaptive training for voice conversion based on eigenvoice,'' IEICE Technical Report, SP2006-40, pp. 31--36, August 2006 (in Japanese).
Tomoki Toda, Yamato Ohtani and Kiyohiro Shikano, ``A voice conversion algorithm based on eigenvoice,'' IEICE Technical Report, SP2006-39, pp. 25--30, August 2006 (in Japanese).
Kouichiro Mori, Yamato Ohtani and Masahiro Morita, ``Speaker generation by voice impression words,'' Proc. of 2016 Spring Meeting of Acoustic Society of Japan, 1-R-22, pp. 289--292, Kanagawa, Japan, March 2016 (in Japanese).
Yamato Ohtani, Kouichiro Mori and Masahiro Morita, ``Speaker individuality control by perceptual expressions using cluster adaptive training in statistical speech synthesis,'' Proc. of 2016 Spring Meeting of Acoustic Society of Japan, 1-R-21, pp. 287--288, Kanagawa, Japan, March 2016 (in Japanese).
Yamato Ohtani, Yu Nasu, Masahiro Morita and Masami Akamine, ``Statistical emotional speech synthesis based on emotion additive model predicted from target neutral speech,'' Proc. of 2015 Autumn Meeting of Acoustic Society of Japan, 2-1-12, pp. 1329--1332, Aizu, Japan, September 2015 (in Japanese).
Yamato Ohtani, Yu Nasu, Ryo Morinaka, Masatsune Tamura, Masahiro Morita and Masami Akamine, ``A study of emotion addition to arbitrary speakers based on additive acoustic model in HMM-based speech synthesis,'' Proc. of 2014 Autumn Meeting of Acoustic Society of Japan, 2-7-2, pp. 233-236, Sapporo, Japan, September 2014 (in Japanese).
Yamato Ohtani, Masatsune Tamura, Masahiro Morita and Masami Akamine, ``Bandwidth extension with sub-band spectrum model based on Gaussian mixture model,'' Proc. of 2014 Spring Meeting of Acoustic Society of Japan, 1-R5-8, pp. 395--396, Tokyo, Japan, March 2014 (in Japanese).
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima and Masami Akamine, ``HMM-based speech synthesis using sub-band basis spectrum model,'' Proc. of 2013 Spring Meeting of Acoustic Society of Japan, 3-P-22b, pp. 491-492, Tokyo, Japan, March 2013 (in Japanese).
Yamato Ohtani, Masatsune Tamura and Masahiro Morita, ``Parameter emphasis for HMM-based speech synthesis using histogram,'' Proc. of 2011 Autumn Meeting of Acoustic Society of Japan, 3-Q-1, pp. 349-450, Nagano, Japan, September 2011 (in Japanese).
Chie Hayashida, Yamato Ohtani, Tomoki Toda, H. Saruwatari and K. Shikano, ``An evaluation of method of model adaptation with linear regression for many-to-one voice conversion,'' Proc. 2010 Spring Meeting of Acoustic Society of Japan, Tokyo, Japan, 1-7-18, pp. 319--320, March 2010 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Canonical model training using non-parallel data sets for many-to-many eigenvoice conversion,'' Proc. 2010 Spring Meeting of Acoustic Society of Japan, 1-7-17, pp. 317--318, Tokyo, Japan, March 2010 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Many-to-many voice conversion based on eigenvoices with a reference voice,'' Proc. of 2009 Autumn Meeting of Acoustic Society of Japan, 2-2-1, pp. 285--286, Koriyama, Japan, September 2009 (in Japanese).
Chie Hayashida, Yamato Ohtani, Tomoki Toda, H. Saruwatari and K. Shikano, ``An evaluation of many-to-one voice conversion with linear regression-based adaptation,'' Proc. of 2009 Autumn Meeting of Acoustic Society of Japan, 1-2-13, pp. 261--262, Koriyama, Japan, September 2009 (in Japanese).
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Diagonalizing covariance matrices for reducing computation cost of voice conversion based on Gaussian mixture model,'' Proc. of 2009 Spring Meeting of Acoustic Society of Japan, 1-6-10, pp. 309--310, Tokyo, Japan, March 2009 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``A study of the initial models in speaker adaptive training for eigenvoice conversion,'' Proc. of 2008 Autumn Meeting of Acoustic Society of Japan, 2-P-23, pp. 409--410, Fukuoka, Japan, September 2008 (in Japanese).
Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory,'' Proc. of 2008 Autumn Meeting of Acoustic Society of Japan, 3-4-9, pp. 299--300, Fukuoka, Japan, September 2008 (in Japanese).
Daisuke Tani, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Many-to-one eigenvoice conversion method robust to the amount of adaptation data,'' Proc. of 2008 Spring Meeting of Acoustic Society of Japan, 3-Q-11, pp. 397--398, Chiba, Japan, March 2008 (in Japanese).
Kumi Ohta, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Enhanced voice quality control method based on one-to-many eigenvoice conversion,'' Proc. of 2008 Spring Meeting of Acoustic Society of Japan, 2-11-5, pp. 345--346, Chiba, Japan, March 2008 (in Japanese).
Yamato Ohtani, Shin-ichi Kawamoto, Tomoki Toda, Kiyohiro Shikano and Satoshi Nakamura, ``Specific speech generation based on STRAIGHT morphing,'' Proc. of 2008 Spring Meeting of Acoustic Society of Japan, 1-11-29, pp. 309--310, Chiba, Japan, March 2008 (in Japanese).
Kumi Ohta, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Preliminary evaluation of voice quality control based on one-to-many eigenvoice conversion,'' Proc. of 2007 Autumn Meeting of Acoustic Society of Japan, 1-4-13, pp. 317--318, Kofu, Japan, September 2007 (in Japanese).
Daisuke Tani, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``An evaluation of many-to-one voice conversion algorithms based on speaker selection and eigenvoice,'' Proc. of 2007 Autumn Meeting of Acoustic Society of Japan, 1-4-14, pp. 319--320, Kofu, Japan, September 2007 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Voice conversion based on eigenvoices considering source features and global variance,'' Proc. of 2007 Spring Meeting of Acoustic Society of Japan, 1-8-12, pp. 215--216, Tokyo, Japan, March 2007 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Applying speaker adaptive training to voice conversion based on eigenvoice,'' Proc. of 2006 Autumn Meeting of Acoustic Society of Japan, 1-6-14, pp. 181--182, Kanazawa, Japan, September 2006 (in Japanese).
Tomoki Toda, Yamato Ohtani and Kiyohiro Shikano, ``A voice conversion/control algorithm based on eigenvoice,'' Proc. of 2006 Autumn Meeting of Acoustic Society of Japan, 1-6-13, pp. 179--180, Kanazawa, Japan, September 2006 (in Japanese).
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari and Kiyohiro Shikano, ``Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation,'' Proc. of 2006 Spring Meeting of Acoustic Society of Japan, 1-4-11, pp. 233--234, Tokyo, Japan, March 2006 (in Japanese).
Yamato Ohtani, ``Voice conversion based on eigenvoices,'' Talk, GIPSA-Lab, France, July 2009.