Research topics

Dialogue-history-aware speech synthesis (2022)

To appear.

Automatic prediction of synthetic speech quality (2022)

To appear.

Self-supervised audio restoration (2022)

To appear.

Emotion prediction of nonverbal vocalization (2022)

To appear.

Speech pseudonymization (2022)

To appear.

Personalized filled-pause prediction (2022)

To appear.

Emotion-controllable speech synthesis (2021)

To appear.

Digital speech makeup (2021)

To appear.

Audiobook speech synthesis considering cross-sentence contexts (2021)

To appear.

Accent modeling for dialect speech synthesis (2021)

To appear.

Cross-lingual TTS as domain adaptation (2021)

To appear.

Computational speech chain (2021)

To appear.

Onoma-to-wave (2021)

To appear.

VAE-based voice conversion (2021)

To appear.

Inter-singer similarity in Japanese song (2020)

To appear.

Crowd-powered anonymization (2020)

To appear.

Stereo electrolarynx (2020)

To appear.

VAE-based accent retrieval (2020)

To appear.

manga2voice (2020)

To appear.

Lightweight voice anonymization (2020)

To appear.

Phoneme-balanced singing voice corpus (2020)

To appear.

Cross-lingual TTS via domain adaptation (2020)

To appear.

Real-time, full-band, light-weight voice conversion (2020)

To appear.

HumanGAN (2020)

To appear.

Mask-shaped voice conversion device (2019)

To appear.

Speaker V2S (verification-to-synthesis) attack (2019)

To appear.

Speaker representation using human perception (2019)

To appear.

Real-time DNN-based voice conversion (2019)

To appear.

Unsupervised subword tokenization for end-to-end speech synthesis (2019)

To appear.

Toward high-quality voice conversion that is comfortable to use, we work on real-time wideband voice conversion and on quantifying and mitigating conversion errors.

Toward personalized language education, we synthesize Japanese speech that is more fluent than a non-native learner's own, from that learner's disfluent Japanese speech and in that learner's own voice.

We learn how humans naturally deviate when singing and use this to give singing voices a natural double-tracked feel.

tamaru19icassp_ndt_poster.pdf

NMF/VAE-based anomaly detection under noisy environments (2018)

We propose VAE- and NMF-based methods for detecting anomalous sounds in noisy environments.

aiba18otogaku_noise-nmf.pdf
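
For a concrete feel of the NMF side of this idea, here is a minimal numpy sketch, not the paper's method: fit NMF bases on normal spectra only, then score new frames by how poorly the fixed bases reconstruct them. All names and hyperparameters are illustrative.

```python
import numpy as np

def fit_nmf(V, n_bases=8, n_iter=200, eps=1e-9):
    # Euclidean multiplicative-update NMF: V (freq x frames) ~= W @ H.
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], n_bases)) + eps
    H = rng.random((n_bases, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def anomaly_score(W, v, n_iter=100, eps=1e-9):
    # Reconstruct new spectra v with the fixed "normal" bases W;
    # a high relative residual means the bases cannot explain the sound.
    rng = np.random.default_rng(1)
    h = rng.random((W.shape[1], v.shape[1])) + eps
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return np.linalg.norm(v - W @ h) / np.linalg.norm(v)

# Toy usage with random "spectrograms" standing in for real features.
W, _ = fit_nmf(np.random.default_rng(2).random((257, 100)))
print(anomaly_score(W, np.random.default_rng(3).random((257, 10))))
```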

MMD-based data augmentation for speaker recognition (2018)

We introduce MMD (maximum mean discrepancy)-based data augmentation to improve speaker verification accuracy.

shiota18apsipa_asv-augmentation.pdf
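
A minimal sketch of the MMD statistic this entry builds on, using a Gaussian kernel; the variable names and random data are illustrative stand-ins, not the paper's setup.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) for all pairs.
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Biased estimate of the squared MMD between two sample sets;
    # small values mean the two feature distributions are close.
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 64))       # e.g. real speaker features
augmented = rng.normal(0.1, 1.0, size=(200, 64))  # e.g. augmented features
print(mmd2(real, augmented))
```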

Automated jazz harmonization (2018)

DNN-based automatic harmonization for jazz music.

Crowdworkers' credibility for sound localization evaluation (2018)

We introduce listeners' credibility into crowdsourced sound-localization evaluation.

takamichi18asja_crowd_poster.pdf
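
As a hypothetical illustration of how listener credibility can enter an evaluation, the sketch below weights each rating by a per-listener credibility score (e.g. estimated from accuracy on control questions with known answers). The weighting rule and all names are assumptions, not the poster's method.

```python
import numpy as np

def credibility_weighted_score(ratings, credibility):
    # ratings: per-listener scores for one stimulus;
    # credibility: per-listener weights in [0, 1].
    ratings = np.asarray(ratings, dtype=float)
    w = np.asarray(credibility, dtype=float)
    return np.sum(w * ratings) / np.sum(w)

# A careless listener (credibility 0.2) pulls the score less than
# two reliable ones.
print(credibility_weighted_score([4, 5, 1], [0.9, 0.8, 0.2]))
```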

Low-musical-noise speech enhancement (2018)

DNN-based speech enhancement that suppresses the unpleasant artificial noise, known as musical noise, produced by the enhancement processing itself.

mizoguchi18ea11_kurtosis-ratio_poster.pdf
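
The poster filename points at the kurtosis ratio, a known objective measure of musical noise: the kurtosis of the power spectral coefficients after enhancement divided by that before it, where a larger ratio indicates more musical noise. A minimal sketch; the epsilon is an implementation detail.

```python
import numpy as np

def kurtosis(power):
    # Fourth central moment over squared variance of power coefficients.
    mu = np.mean(power)
    return np.mean((power - mu) ** 4) / (np.var(power) ** 2 + 1e-12)

def kurtosis_ratio(power_before, power_after):
    # > 1 suggests the processing made the spectrum "spikier",
    # i.e. introduced musical noise.
    return kurtosis(power_after) / kurtosis(power_before)
```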

Speech-based automated dementia detection (2018)

Automated dementia detection using the long-term variation components of speech spectra.

sonobe18asjs_dementia_poster.pdf

Phase modeling with directional-statistics DNN (2018)

We model phase spectrograms of speech signals using directional-statistics DNNs.

takamichi18iwaenc_phase_poster.pdf
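
Directional statistics treat phase as an angle, where squared error fails because phases wrap around at 2π. A minimal sketch of the kind of von Mises negative log-likelihood such a DNN can be trained with; the fixed concentration kappa and all names are simplifying assumptions.

```python
import numpy as np

def von_mises_nll(pred_phase, true_phase, kappa=1.0):
    # -log p(x | mu, kappa) for the von Mises distribution:
    # p(x) = exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa)).
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    return np.mean(-kappa * np.cos(true_phase - pred_phase) + log_norm)

phases_true = np.array([0.1, 3.0, -3.0])
phases_pred = np.array([0.0, -3.1, 3.1])  # close on the circle despite wrap-around
print(von_mises_nll(phases_pred, phases_true, kappa=2.0))
```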

Multi-dialect speech synthesis (2018)

Speech synthesis that speaks many Japanese dialects in a single speaker's voice.

akiyama18apsipa_dialect_slide.pdf

Pre-processing for vocoder-free speech synthesis (2017)

A pre-processing method for improving the training accuracy of vocoder-free speech synthesis such as WaveNet.

Introducing generative adversarial networks (GANs), we train the speech synthesizer to deceive an anti-spoofing verifier, a security technology that distinguishes natural from synthetic speech.
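
A minimal sketch of this kind of combined objective, assuming a trained anti-spoofing discriminator is available as a callable returning P(natural); the adversarial weight and all names are illustrative, not the exact formulation used here.

```python
import numpy as np

def combined_loss(generated, natural, discriminator, adv_weight=1.0):
    # Conventional acoustic-model loss: mean squared error to natural features.
    mse = np.mean((generated - natural) ** 2)
    # Adversarial term: the synthesizer is updated so the anti-spoofing
    # discriminator judges its output to be natural.
    p_natural = discriminator(generated)
    adv = -np.mean(np.log(p_natural + 1e-9))
    return mse + adv_weight * adv

# Toy stand-in for a trained anti-spoofing discriminator.
toy_disc = lambda feats: 1.0 / (1.0 + np.exp(-np.mean(feats, axis=-1)))
print(combined_loss(np.zeros((4, 8)), np.ones((4, 8)), toy_disc))
```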

Computers generate speech that, like human speech, is subtly different every time it is uttered (one-time, never-repeated speech).

takamichi17interspeech_poster.pdf
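
One generic way to obtain such inter-utterance randomness, shown purely as an illustration (the generation method in this work may differ): sample from the predicted feature distribution instead of always emitting its mean.

```python
import numpy as np

def generate(mean, log_var, deterministic=False, rng=None):
    # Conventional generation returns the mean: identical speech every time.
    if deterministic:
        return mean
    if rng is None:
        rng = np.random.default_rng()
    # Sampling from N(mean, exp(log_var)) yields a slightly different
    # feature trajectory on every call.
    std = np.exp(0.5 * log_var)
    return mean + std * rng.standard_normal(mean.shape)

mu, lv = np.zeros(5), np.full(5, -2.0)
print(generate(mu, lv))                      # varies per call
print(generate(mu, lv, deterministic=True))  # always the mean
```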

We entered the Voice Conversion Challenge 2016, which compares the quality of voice conversion systems; our system achieved the world's best score in speaker similarity.

kobayashi2016interspeech.pdf

Speech technologies for language learning: from a learner's non-fluent (non-native) speech, we synthesize the fluent (native-like) speech that the learner may speak in the future, in the learner's own voice.

Report of the Blizzard Challenge 2015, which compares the performance of text-to-speech synthesizers; our system was rated the best in speech quality for three Indian languages.

High-quality speech synthesis incorporating the modulation spectrum, which represents the temporal fluctuation of speech parameters.
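
The modulation spectrum here is the power spectrum of a speech-parameter trajectory over frames (e.g. one mel-cepstral coefficient as a function of time). A minimal sketch of its computation; the FFT length is an arbitrary choice.

```python
import numpy as np

def modulation_spectrum(trajectory, n_fft=256):
    # Zero-mean the parameter sequence over frames, then take |FFT|^2:
    # this captures how quickly and strongly the parameter fluctuates.
    x = trajectory - np.mean(trajectory)
    spectrum = np.fft.rfft(x, n=n_fft)
    return np.abs(spectrum) ** 2

# Toy trajectory: a slow oscillation plus a little noise.
rng = np.random.default_rng(0)
traj = np.sin(np.linspace(0, 20, 400)) + 0.05 * rng.standard_normal(400)
print(modulation_spectrum(traj)[:5])
```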