Satoru Fukayama

Satoru Fukayama (he / him)

Research Team Leader

Intelligent Media Processing Research Team, Artificial Intelligence Research Center,

National Institute of Advanced Industrial Science and Technology (AIST), Japan.

CV: [pdf], Google Scholar: [Link], Web of Science: [Link]

Selected Publications

SingDistVis: Interactive Overview+Detail Visualization for F0 Trajectories of Numerous Singers Singing the Same Song, Takayuki Ito, Tomoyasu Nakano, Satoru Fukayama, Masahiro Hamasaki, Masataka Goto, Multimedia Tools and Applications, DOI 10.1007/s11042-024-18932-3, 2024
DanceUnisoner: A Parametric, Visual, and Interactive Simulation Interface for Choreographic Composition of Group Dance, Shuhei Tsuchida, Satoru Fukayama, Jun Kato, Hiromu Yakura, and Masataka Goto, IEICE Transactions on Information and Systems, vol. E107-D, no. 3, pp. 386-399, Mar. 2024.
Singer Diarization for Polyphonic Music with Unison Singing, Hitoshi Suda, Daisuke Saito, Satoru Fukayama, Tomoyasu Nakano, Masataka Goto, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1531-1545, 2022, doi: 10.1109/TASLP.2022.3166262.
Automatic melody harmonization with triad chords: A comparative study, Yin-Cheng Yeh, Wen-Yi Hsiao, Satoru Fukayama, Tetsuro Kitahara, Benjamin Genchel, Hao-Min Liu, Hao-Wen Dong, Yian Chen, Terence Leong, Yi-Hsuan Yang, Journal of New Music Research, vol. 50, issue 1, pp. 37-51, Jan. 2021
Melody harmonisation with interpolated probabilistic models, Stanislaw Raczynski, Satoru Fukayama, Emmanuel Vincent, Journal of New Music Research, vol. 42, issue 3, pp. 223-235, Oct. 2013

Research Projects

Speech / Multimedia Processing

Speech Recognition
Speech Emotion Recognition

Kentaro Onda, Satoru Fukayama, Daisuke Saito, and Nobuaki Minematsu, "Advanced Modeling of Interlanguage Speech Intelligibility Benefit with L1-L2 Multi-Task Learning Using Differentiable K-Means for Accent-Robust Discrete Token-Based ASR," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026.
Shinnosuke Takamichi, Tomohiko Nakamura, Hitoshi Suda, Satoru Fukayama, and Jun Ogata, "MangaVox: Dataset of acted voices aligned with manga images towards computer understanding of audio comics," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2026.
Yu Hayashizaki, Takashi Nose, Sumiharu Kobayashi, Satoru Fukayama, Akinori Ito, "PUNSER: Large-Scale Pre-trained and Unified Model for Practical Speech Emotion Recognition," in Proceedings of the 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Oct. 2025.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025, Oct. 2025.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "The CMU-AIST submission for the ICME 2025 Audio Encoder Challenge," in Proceedings of 2025 IEEE International Conference on Multimedia and Expo Workshops, Jun. 2025.
Kentaro Onda, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu, "Benchmarking Prosody Encoding in Discrete Speech Tokens," in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2025.
Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu, "Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora," in Proceedings of Interspeech 2025, Aug. 2025.
Kentaro Onda, Keisuke Imoto, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu, "Discrete Tokens Exhibit Interlanguage Speech Intelligibility Benefit: an Analytical Study Towards Accent-robust ASR Only with Native Speech Data," in Proceedings of Interspeech 2025, Aug. 2025.
Hitoshi Suda, Shinnosuke Takamichi, Satoru Fukayama, "Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora," in Proceedings of Interspeech 2025, Aug. 2025.
Tomohiko Nakamura, Kwanghee Choi, Keigo Hojo, Yoshiaki Bando, Satoru Fukayama, and Shinji Watanabe, "Discrete speech unit extraction via independent component analysis," in SALMA: Speech and Audio Language Models - Architectures, Data Sources, and Training Paradigms, IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Apr. 2025.
Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, and Shinji Watanabe, "Self-supervised speech representations are more phonetic than semantic," in Proceedings of INTERSPEECH, 2024.
Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition, Wei Yang, Satoru Fukayama, Panikos Heracleous, Jun Ogata, Interspeech, pp.1998-2002, Sep. 2022
Applying Generative Adversarial Networks and Vision Transformers in Speech Emotion Recognition, Panikos Heracleous, Satoru Fukayama, Jun Ogata, Yasser Mohammad, HCI International 2022 - Late Breaking Papers. Multimodality in Advanced Interaction Environments, LNCS, vol. 13519, pp. 67-75, 2022
Audio-Visual Object Removal in 360-Degree Videos, Ryo Shimamura, Feng Qi, Yuki Koyama, Takayuki Nakatsuka, Satoru Fukayama, Masahiro Hamasaki, Masataka Goto, Shigeo Morishima, The Visual Computer, 36, pp. 2117–2128, Jul. 2020

Music Information Retrieval

Singer diarization
Beat tracking
Recommendation
Transcription
Active music listening interfaces

Hitoshi Suda, Junya Koguchi, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, "IdolSongsJp corpus: A multi-singer song corpus in the style of Japanese idol groups," in Proceedings of the 26th International Society for Music Information Retrieval (ISMIR) Conference, Sep. 2025.
Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, and Jun Ogata, "FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs," in Proceedings of the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024.
SingDistVis: Interactive Overview+Detail Visualization for F0 Trajectories of Numerous Singers Singing the Same Song, Takayuki Ito, Tomoyasu Nakano, Satoru Fukayama, Masahiro Hamasaki, Masataka Goto, Multimedia Tools and Applications, DOI 10.1007/s11042-024-18932-3, 2024
jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus, Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari, Proceedings of the 48th IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE ICASSP2023), 2023
Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders, Futa Nakashima, Tomohiko Nakamura, Norihiro Takamune, Satoru Fukayama, Hiroshi Saruwatari, Asia Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 736-743, Nov. 2022
Singer Diarization for Polyphonic Music with Unison Singing, Hitoshi Suda, Daisuke Saito, Satoru Fukayama, Tomoyasu Nakano, Masataka Goto, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1531-1545, 2022, doi: 10.1109/TASLP.2022.3166262.
Joint Beat and Downbeat Tracking Based on CRNN Models and a Comparison of Using Different Context Ranges in Convolutional Layers, Tian Cheng, Satoru Fukayama, Masataka Goto, International Computer Music Conference, pp. 239-244, 2021 (paper published in 2020, presentation postponed to 2021).
ABCPRec: Adaptively Bridging Consumer and Producer Roles for User-Generated Content Recommendation, Kosetsu Tsukuda, Satoru Fukayama, Masataka Goto, 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR 2019), pp. 1197-1200, July 2019.

Automatic Singing Transcription based on Encoder-Decoder Recurrent Neural Networks with a Weakly-Supervised Attention Mechanism, Ryo Nishikimi, Eita Nakamura, Satoru Fukayama, Masataka Goto, Kazuyoshi Yoshii, 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2019), pp. 161-165, May 2019.
Joint Transcription of Lead, Bass, and Rhythm Guitars based on a Factorial Hidden Semi-Markov Model, Kentaro Shibata, Ryo Nishikimi, Satoru Fukayama, Masataka Goto, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2019), pp. 236-240, May 2019.
Listener Anonymizer: Camouflaging Play Logs to Preserve User’s Demographic Anonymity, Kosetsu Tsukuda, Satoru Fukayama, Masataka Goto, The 19th International Society for Music Information Retrieval Conference (ISMIR 2018), pp. 687-694, Sep. 2018.
Instrudive: A Music Visualization System Based on Automatically Recognized Instrumentation, Takumi Takahashi, Satoru Fukayama, Masataka Goto, The 19th International Society for Music Information Retrieval Conference (ISMIR 2018), pp. 561-568, Sep. 2018.
Comparing RNN Parameters for Melodic Similarity, Tian Cheng, Satoru Fukayama, Masataka Goto, The 19th International Society for Music Information Retrieval Conference (ISMIR 2018), pp. 763-770, Sep. 2018.
Convolving Gaussian Kernels for RNN-based Beat Tracking, Tian Cheng, Satoru Fukayama, Masataka Goto, The 26th European Signal Processing Conference (EUSIPCO 2018), pp. 1919-1923, Sep. 2018.
ChordScanner: Browsing Chord Progressions based on Musical Typicality and Intra-Composer Consistency, Hiromi Nakamura, Tomoyasu Nakano, Satoru Fukayama, Masataka Goto, The 43rd International Computer Music Conference (ICMC 2018), pp. 250-255, Aug. 2018.
The CrossSong Puzzle: Developing a Logic Puzzle for Musical Thinking, Jordan B. L. Smith, Jun Kato, Satoru Fukayama, Graham Percival, Masataka Goto, Journal of New Music Research, vol. 46, issue 3, pp.213-228, Mar. 2017
Music Emotion Recognition with adaptive aggregation of Gaussian Process Regressors, Satoru Fukayama, Masataka Goto, Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE ICASSP2016), pp.71-75, Mar. 2016
CrossSongPuzzle: Generating and Unscrambling Music Mashups with Real-time Interactivity, Jordan B. L. Smith, Graham Percival, Jun Kato, Masataka Goto, Satoru Fukayama, Proceedings of the 12th Sound and Music Computing Conference (SMC2015), pp.61-67, Jul. 2015

Vibration Data Processing

Seismic wave analysis
Anomaly detection of machines (wind turbines)

Yuta Amezawa, Tomohiko Nakamura, Takahiro Shiina, Satoru Fukayama, Jun Ogata, Hiroki Kuroda, and Takahiko Uchide, "Automatic detection and extraction of later phase in S coda using machine learning for crustal heterogeneity exploration," ACES (APEC Cooperation for Earthquake Science) International Workshop, Nov. 2025.

Music Generation

Generating Melodies, Chords, and Drum Tracks with Probabilistic Models
Melody Harmonization
Automatic Arrangement (Piano, Guitar, String Quartet, Chorus, Jazzification)
Expressive Performance Rendering
Composing Japanese Songs from Lyrics

Contour-Preserving Melody Conversion, Satoru Fukayama, Masataka Goto, International Computer Music Conference 2021 (ICMC2021), pp.172-177, 2021
Automatic melody harmonization with triad chords: A comparative study, Yin-Cheng Yeh, Wen-Yi Hsiao, Satoru Fukayama, Tetsuro Kitahara, Benjamin Genchel, Hao-Min Liu, Hao-Wen Dong, Yian Chen, Terence Leong, Yi-Hsuan Yang, Journal of New Music Research, vol. 50, issue 1, pp. 37-51, Jan. 2021
Chord Jazzification: Learning Jazz Interpretations of Chord Symbols, Tsung-Ping Chen, Satoru Fukayama, Masataka Goto, Li Su, 21st International Society for Music Information Retrieval Conference (ISMIR2020), pp. 360-367, Oct. 2020
Transdrums: A Drum Pattern Transfer System Preserving Global Pattern Structure, Shun Sawada, Satoru Fukayama, Masataka Goto, Keiji Hirata, 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2019), pp. 391-395, May 2019.
Audio-Based Automatic Generation of a Piano Reduction Score by Considering the Musical Structure, Hirofumi Takamori, Takayuki Nakatsuka, Satoru Fukayama, Masataka Goto, Shigeo Morishima, 25th International Conference on MultiMedia Modeling (MMM2019), pp. 169-181, Jan. 2019.
CTcomposer: An Interface for Music Composition Considering Intra-Composer Consistency and Musical Typicality, Hiromi Nakamura, Tomoyasu Nakano, Satoru Fukayama, Masataka Goto, The 15th Sound and Music Computing Conference (SMC 2018), pp. 500-507, Jul. 2018.
Song2Guitar: A Difficulty-Aware Arrangement System for Generating Guitar Solo Covers from Polyphonic Audio of Popular Music, Shunya Ariga, Satoru Fukayama, Masataka Goto, Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR2017), pp. 568-574, Oct. 2017.
A Singing Instrument for Real-time Vocal-part Arrangement of Music Audio Signals, Yuta Ojima, Tomoyasu Nakano, Satoru Fukayama, Jun Kato, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii, Proceedings of the 14th Sound and Music Computing Conference (SMC-17), pp. 443-449, Jul. 2017
Song2Quartet: A System for Generating String Quartet Cover Songs from Polyphonic Audio of Popular Music, Graham Percival, Satoru Fukayama, Masataka Goto, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR2015), pp.114-120, Oct. 2015
AutoGuitarTab: Computer-Aided Composition of Rhythm and Lead Guitar Parts in the Tablature Space, McVicar James Matthew, Satoru Fukayama, Masataka Goto, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 7, pp. 1105-1117, Apr. 2015
AutoLeadGuitar: Automatic generation of guitar solo phrases in the tablature space, McVicar James Matthew, Satoru Fukayama, Masataka Goto, Proceedings of the 12th IEEE International Conference on Signal Processing (ICSP2014), pp.599-604, Oct. 2014
HarmonyMixer: Mixing the Character of Chords among Polyphonic Audio, Satoru Fukayama, Masataka Goto, Proceedings of the 40th International Computer Music Conference and 11th Sound & Music Computing Conference (Joint ICMC|SMC 2014 Conference), pp.1513-1510, Sep. 2014
AutoRhythmGuitar: Computer-aided composition for Rhythm Guitar in the Tab Space, McVicar James Matthew, Satoru Fukayama, Masataka Goto, Proceedings of the 40th International Computer Music Conference and 11th Sound & Music Computing Conference (Joint ICMC|SMC 2014 Conference), pp.293-300, Sep. 2014
AutoChorusCreator: Four-Part Chorus Generator with Musical Feature Control Using Search Spaces Constructed from Rules of Music Theory, Benjamin Evans, Satoru Fukayama, Masataka Goto, Nagisa Munekata, Tetsuo Ono, Proceedings of the 40th International Computer Music Conference and 11th Sound & Music Computing Conference (Joint ICMC|SMC 2014 Conference), pp.1016-1023, Sep. 2014
Chord-Sequence-Factory: A Chord Arrangement System Modifying Factorized Chord Sequence Probabilities, Satoru Fukayama, Kazuyoshi Yoshii, Masataka Goto, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR2013), pp.457-462, Nov. 2013
Melody harmonisation with interpolated probabilistic models, Stanislaw Raczynski, Satoru Fukayama, Emmanuel Vincent, Journal of New Music Research, vol. 42, issue 3, pp. 223-235, Oct. 2013
Automatic Music Composition from Japanese Lyrics with Probabilistic Formulation, Satoru Fukayama, Doctoral Dissertation, the University of Tokyo, 甲29607, Mar. 2013.
Assistance for Novice Users on Creating Songs from Japanese Lyrics, Satoru Fukayama, Daisuke Saito, Shigeki Sagayama, Proceedings of ICMC, pp.441-446, Sep. 2012
Automatic Arrangement for Guitars using Hidden Markov Model, Gen Hori, Yuma Yoshinaga, Satoru Fukayama, Shigeki Sagayama, Proceedings of SMC, pp.450-456, Jul. 2012
An Interactive Music Composition System Based on Autonomous Maintenance of Musical Consistency, Tetsuro Kitahara, Satoru Fukayama, Shigeki Sagayama, Haruhiro Katayose, Noriko Nagata, Proceedings of SMC, Jul. 2011
Polyhymnia: An automatic piano performance system with statistical modeling of polyphonic expression and musical symbol interpretation, Tae Hun Kim, Satoru Fukayama, Takuya Nishimoto, Shigeki Sagayama, Proceedings of NIME, pp.96-99, May 2011
Lexical Tones Learning with Automatic Music Composition System Considering Prosody of Mandarin Chinese, Siwei Qin, Satoru Fukayama, Takuya Nishimoto, Shigeki Sagayama, Proceedings of L2WS, O4-2, 4 pages, Sep. 2010
Performance rendering for polyphonic piano music with a combination of probabilistic models for melody and harmony, Tae Hun Kim, Satoru Fukayama, Takuya Nishimoto, Shigeki Sagayama, Proceedings of SMC, pp.23-30, Jul. 2010
Automatic Song Composition from the Lyrics exploiting Prosody of the Japanese Language, Satoru Fukayama, Kei Nakatsuma, Shinji Sako, Takuya Nishimoto, Shigeki Sagayama, Proceedings of SMC, pp. 299-302, Jul. 2010
Orpheus: Automatic Composition System Considering Prosody of Japanese Lyrics (demo paper), Satoru Fukayama, Kei Nakatsuma, Shinji Sako, Yu-ichiro Yonebayashi, Tae Hun Kim, Qin Si Wei, Takuho Nakano, Takuya Nishimoto, Shigeki Sagayama, Proceedings of ICEC 2009, LNCS, Springer, Sep. 2009

Dance Motion Processing

Pose Estimation
Query-by-Dancing
Dance Motion Editing
Automated Choreography

DanceUnisoner: A Parametric, Visual, and Interactive Simulation Interface for Choreographic Composition of Group Dance, Shuhei Tsuchida, Satoru Fukayama, Jun Kato, Hiromu Yakura, and Masataka Goto, IEICE Transactions on Information and Systems, vol. E107-D, no. 3, pp. 386-399, Mar. 2024.
MirrorNet: A Deep Reflective Approach to 2D Pose Estimation for Single-Person Images, Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima, Journal of Information Processing, 2021
AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing, Shuhei Tsuchida, Satoru Fukayama, Masahiro Hamasaki, Masataka Goto, 20th International Society for Music Information Retrieval Conference (ISMIR 2019), pp. 501-510, Nov. 2019.
Query-by-Dancing: A Dance Music Retrieval System Based on Body-Motion Similarity, Shuhei Tsuchida, Satoru Fukayama, Masataka Goto, 25th International Conference on MultiMedia Modeling (MMM2019), pp. 251-263, Jan. 2019.
An Automatic System for Editing Dance Videos Recorded by Multiple Cameras, Shuhei Tsuchida, Satoru Fukayama, Masataka Goto, Proceedings of the 14th International Conference on Advances in Computer Entertainment Technology (ACE2017), 18 pages, Dec. 2017.
Authoring System for Choreography Using Dance Motion Retrieval and Synthesis, Ryo Kakitsuka, Kosetsu Tsukuda, Satoru Fukayama, Naoya Iwamoto, Masataka Goto, Shigeo Morishima, Proceedings of the 30th Conference on Computer Animation and Social Agents (ACM CASA2017), pp.122-131, May 2017
A choreographic authoring system for character dance animation reflecting a user's preference, Ryo Kakitsuka, Kosetsu Tsukuda, Satoru Fukayama, Naoya Iwamoto, Masataka Goto, Shigeo Morishima, Proceedings of SIGGRAPH/Eurographics Symposium on Computer Animation (ACM SCA2016), Article No. 5, Jul. 2016
Music Content Driven Automated Choreography with Beat-wise Motion Connectivity Constraints, Satoru Fukayama, Masataka Goto, Proceedings of the 12th Sound and Music Computing Conference (SMC2015), pp.177-183, Jul. 2015
Automated Choreography Synthesis Using a Gaussian Process Leveraging Consumer-Generated Dance Motions, Satoru Fukayama, Masataka Goto, Proceedings of ACE2014, Article No.23, Nov. 2014

Lyrics Generation

Lyrics Language Models
Lyrics Writing Support Interfaces

A Melody-conditioned Lyrics Language Model, Kento Watanabe, Yuichiroh Matsubayashi, Satoru Fukayama, Masataka Goto, Kentaro Inui, Tomoyasu Nakano, Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), pp. 163-172, June 2018.
Modeling Storylines in Lyrics, Kento Watanabe, Yuichiroh Matsubayashi, Kentaro Inui, Satoru Fukayama, Tomoyasu Nakano, Masataka Goto, IEICE Transactions on Information and Systems, Vol. E101.D, No. 4, pp. 1167-1179, 2018.
LyriSys: An Interactive Support System for Writing Lyrics Based on Topic Transition, Kento Watanabe, Yuichiro Matsubayashi, Kentaro Inui, Tomoyasu Nakano, Satoru Fukayama, Masataka Goto, Proceedings of the International Conference on Intelligent User Interfaces (ACM IUI2017), pp.559-563, Mar. 2017
Modeling Discourse Segments in Lyrics Using Repeated Patterns, Kento Watanabe, Yuichiro Matsubayashi, Naho Orita, Naoaki Okazaki, Kentaro Inui, Satoru Fukayama, Tomoyasu Nakano, Jordan B. L. Smith, Masataka Goto, Proceedings of the 26th International Conference on Computational Linguistics (COLING2016), pp.1959-1969, Dec. 2016
