Publications

You can also find my articles on my Google Scholar profile.

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Pin-Jui Ku, Alexander H. Liu, Roman Korostik, Sung-Feng Huang, Szu-Wei Fu, Ante Jukić

Published in IEEE ICASSP, 2025

Foundation flow-matching model for generative speech extraction and restoration tasks (a minimal sketch of the flow-matching objective follows the citation below)

Recommended citation: Ku, P. J., Liu, A. H., Korostik, R., Huang, S. F., Fu, S. W., & Jukić, A. (2024). Generative speech foundation model pretraining for high-quality speech extraction and restoration. arXiv preprint arXiv:2409.16117. https://arxiv.org/abs/2409.16117
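
As a rough, hedged illustration of the conditional flow-matching objective a generative speech foundation model of this kind is typically pretrained with, here is a minimal PyTorch sketch. The toy VelocityNet, the feature shapes, and the absence of conditioning are illustrative assumptions, not the paper's implementation.

```python
# Minimal conditional flow-matching training step (illustrative sketch only;
# the toy network, feature shapes, and lack of conditioning are assumptions,
# not the model described in the paper).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity estimator v_theta(x_t, t); a real foundation model is far larger."""
    def __init__(self, dim=80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        # Broadcast the per-utterance time step onto every frame and concatenate.
        t_feat = t.expand(x_t.shape[0], x_t.shape[1], 1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def flow_matching_loss(model, x1):
    """Conditional flow matching: interpolate noise -> data, regress the velocity."""
    x0 = torch.randn_like(x1)                 # noise endpoint of the probability path
    t = torch.rand(x1.shape[0], 1, 1)         # one random time per utterance
    x_t = (1.0 - t) * x0 + t * x1             # linear interpolation path
    target_v = x1 - x0                        # ground-truth velocity along that path
    return ((model(x_t, t) - target_v) ** 2).mean()

model = VelocityNet(dim=80)
mels = torch.randn(4, 100, 80)                # (batch, frames, mel bins), dummy data
flow_matching_loss(model, mels).backward()
```

At inference time, samples are generated by integrating the learned velocity field from noise toward data with an ODE solver.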

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu

Published in IEEE SLT, 2024

A seamless speech-editing dataset and an assessment of spoof (deepfake) detection on edited speech

Recommended citation: Huang, Sung-Feng, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, and Szu-Wei Fu. "Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits." In 2024 IEEE Spoken Language Technology Workshop (SLT), pp. 652-659. IEEE, 2024. https://ieeexplore.ieee.org/abstract/document/10832200/

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

Wei-Ping Huang, Sung-Feng Huang, Hung-yi Lee

Published in IEEE ASRU, 2023

Utilizes unlabeled speech data for few-shot cross-lingual TTS adaptation

Recommended citation: Huang, Wei-Ping, Sung-Feng Huang, and Hung-yi Lee. "Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization." In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1-8. IEEE, 2023. https://ieeexplore.ieee.org/abstract/document/10389665

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Sung-Feng Huang, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-yi Lee

Published in IEEE ICASSP, 2023

Learnable model pruning for TTS fine-tuning (see the learnable-mask sketch after the citation below)

Recommended citation: Huang, Sung-Feng, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, and Hung-yi Lee. "Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. https://ieeexplore.ieee.org/abstract/document/10097178
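
As a hedged sketch of learnable structured pruning in general (not the paper's exact adaptive pruning scheme), the following snippet gates a layer's output channels with learnable mask logits and adds a sparsity penalty so whole channels can be removed after fine-tuning.

```python
# Generic learnable channel-mask sketch (an assumption for illustration;
# not the paper's exact adaptive structured pruning method).
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose output channels are gated by learnable mask logits."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.mask_logits = nn.Parameter(torch.zeros(out_dim))  # learned with the weights

    def forward(self, x):
        gate = torch.sigmoid(self.mask_logits)      # soft per-channel gates in (0, 1)
        return self.linear(x) * gate

    def sparsity_penalty(self):
        # Pushes gates toward zero so entire channels can be pruned afterwards.
        return torch.sigmoid(self.mask_logits).sum()

layer = MaskedLinear(256, 256)
x = torch.randn(8, 256)
task_loss = layer(x).pow(2).mean()                  # stand-in for the TTS training loss
(task_loss + 1e-3 * layer.sparsity_penalty()).backward()
```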

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee

Published in IEEE/ACM TASLP, 2022

Meta-learning for few-shot speaker-adaptive text-to-speech (see the MAML-style sketch after the citation below)

Recommended citation: Huang, Sung-Feng, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, and Hung-yi Lee. "Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech." IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2022): 1558-1571. https://ieeexplore.ieee.org/abstract/document/9756900
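
For readers unfamiliar with the underlying idea, here is a minimal MAML-style inner/outer update in PyTorch. The toy linear model and random tensors stand in for a TTS model and per-speaker support/query batches; this is only a sketch of the meta-learning loop, not Meta-TTS itself.

```python
# Minimal MAML-style inner/outer update (illustrative; the toy model and data
# stand in for a TTS model and per-speaker adaptation batches).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 16)                # stand-in for a speaker-adaptive TTS model
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.01

def task_loss(params, x, y):
    # Functional forward pass using the given (possibly adapted) parameters.
    w, b = params
    return F.mse_loss(F.linear(x, w, b), y)

for step in range(10):                   # meta-training loop over sampled "speakers"
    x_sup, y_sup = torch.randn(4, 16), torch.randn(4, 16)   # support (adaptation) set
    x_qry, y_qry = torch.randn(4, 16), torch.randn(4, 16)   # query (evaluation) set

    params = [model.weight, model.bias]
    # Inner loop: one gradient step of speaker adaptation on the support set.
    grads = torch.autograd.grad(task_loss(params, x_sup, y_sup), params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loop: update the initialization so that adaptation works well on the query set.
    meta_opt.zero_grad()
    task_loss(adapted, x_qry, y_qry).backward()
    meta_opt.step()
```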

Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

Da-rong Liu, Po-chun Hsu, Yi-chen Chen, Sung-feng Huang, Shun-po Chuang, Da-yi Wu, Hung-yi Lee

Published in IEEE/ACM TASLP, 2021

Unsupervised ASR from unpaired audio and phone sequences via a GAN (see the sketch after the citation below)

Recommended citation: Liu, Da-rong, Po-chun Hsu, Yi-chen Chen, Sung-feng Huang, Shun-po Chuang, Da-yi Wu, and Hung-yi Lee. "Learning phone recognition from unpaired audio and phone sequences based on generative adversarial network." IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2021): 230-243. https://ieeexplore.ieee.org/abstract/document/9664381/
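
The core idea of learning from unpaired audio and phone sequences can be sketched as a small adversarial game: a generator maps acoustic features to phone distributions while a discriminator tells them apart from real phone sequences. The toy models, shapes, and random "unpaired" data below are placeholders, not the paper's system.

```python
# Toy adversarial setup for unpaired phone recognition (rough sketch of the idea;
# models, shapes, and the sampled "real" phone sequences are placeholders).
import torch
import torch.nn as nn

num_phones = 40
generator = nn.Sequential(nn.Linear(39, 128), nn.ReLU(),
                          nn.Linear(128, num_phones), nn.Softmax(dim=-1))
discriminator = nn.Sequential(nn.Linear(num_phones, 128), nn.ReLU(),
                              nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

feats = torch.randn(8, 50, 39)                                            # unpaired audio features
real_phones = torch.eye(num_phones)[torch.randint(num_phones, (8, 50))]   # unpaired phone sequences

# Discriminator step: real phone sequences vs. generator outputs.
d_opt.zero_grad()
fake = generator(feats).detach()
d_loss = bce(discriminator(real_phones), torch.ones(8, 50, 1)) + \
         bce(discriminator(fake), torch.zeros(8, 50, 1))
d_loss.backward()
d_opt.step()

# Generator step: fool the discriminator so predicted distributions look like real phones.
g_opt.zero_grad()
g_loss = bce(discriminator(generator(feats)), torch.ones(8, 50, 1))
g_loss.backward()
g_opt.step()
```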

Non-Autoregressive Mandarin-English Code-Switching Speech Recognition

Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-yi Lee

Published in IEEE ASRU, 2021

A Mask-CTC-based non-autoregressive (NAR) ASR framework for Mandarin-English code-switching speech recognition.

Recommended citation: Chuang, Shun-Po, Heng-Jui Chang, Sung-Feng Huang, and Hung-yi Lee. "Non-autoregressive Mandarin-English code-switching speech recognition." In 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 465-472. IEEE, 2021. https://ieeexplore.ieee.org/abstract/document/9688174

Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training

Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee

Published in Interspeech, 2021

Self-supervised pre-training to stabilize label assignment in speech separation (see the PIT sketch after the citation below).

Recommended citation: Huang, S.-F., Chuang, S.-P., Liu, D.-R., Chen, Y.-C., Yang, G.-P., Lee, H.-y. (2021) Stabilizing Label Assignment for Speech Separation by Self-Supervised Pre-Training. Proc. Interspeech 2021, 3056-3060, doi: 10.21437/Interspeech.2021-763 https://www.isca-archive.org/interspeech_2021/huang21h_interspeech.html
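
The label-assignment problem this paper stabilizes comes from permutation-invariant training (PIT), where each utterance is scored under its best speaker permutation. Below is a generic utterance-level PIT loss as an illustrative sketch, not the paper's training recipe.

```python
# Utterance-level permutation-invariant training (PIT) loss (generic sketch of
# the label-assignment problem, not the paper's training recipe).
import itertools
import torch

def pit_mse_loss(est, ref):
    """est, ref: (batch, num_spk, time). For each utterance, score every speaker
    permutation and keep the lowest-error assignment."""
    batch, num_spk, _ = est.shape
    perms = list(itertools.permutations(range(num_spk)))
    per_perm = torch.stack([
        torch.stack([((est[:, i] - ref[:, p]) ** 2).mean(dim=-1)
                     for i, p in enumerate(perm)]).mean(dim=0)   # (batch,) per permutation
        for perm in perms], dim=1)                               # (batch, num_perms)
    return per_perm.min(dim=1).values.mean()                     # best assignment per utterance

est = torch.randn(4, 2, 16000, requires_grad=True)   # separated estimates
ref = torch.randn(4, 2, 16000)                       # reference sources
pit_mse_loss(est, ref).backward()
```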

Pretrained Language Model Embryology: The Birth of ALBERT

Cheng-Han Chiang, Sung-Feng Huang, Hung-yi Lee

Published in EMNLP, 2020

The results show that ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different speeds during pretraining, and that linguistic knowledge and world knowledge do not generally improve as pretraining proceeds, nor does downstream task performance.

Recommended citation: Chiang, C., Huang, S., & Lee, H. (2020). Pretrained Language Model Embryology: The Birth of ALBERT. ArXiv, abs/2010.02480. https://arxiv.org/abs/2010.02480

Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation

Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Yu-Hsuan Wang, Chia-Hao Shen

Published in IEEE/ACM TASLP, 2019

Part of the Audio Word2Vec project (see the autoencoder sketch after the citation below).

Recommended citation: Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Yu-Hsuan Wang and Chia-Hao Shen, "Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation," IEEE/ACM Transactions on Audio, Speech, and Language Processing 27.9 (2019): 1481-1493. https://ieeexplore.ieee.org/abstract/document/8736337
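
As a minimal, hedged illustration of sequence-to-sequence autoencoding of audio, the following sketch encodes a feature sequence into a single vector and reconstructs it; the segmentation learning that the paper also addresses is omitted, and the model sizes and feature dimensions are placeholders.

```python
# Tiny sequence-to-sequence autoencoder sketch (illustrative only; segmentation
# learning from the paper is omitted, and all dimensions are placeholders).
import torch
import torch.nn as nn

class Seq2SeqAE(nn.Module):
    """Encode an acoustic feature sequence into one vector, then reconstruct it."""
    def __init__(self, feat_dim=39, emb_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.decoder = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.proj = nn.Linear(emb_dim, feat_dim)

    def forward(self, x):
        _, h = self.encoder(x)            # h: (1, batch, emb_dim), the "audio word vector"
        # Teacher-forced reconstruction conditioned on the fixed-length embedding.
        out, _ = self.decoder(x, h)
        return self.proj(out), h.squeeze(0)

model = Seq2SeqAE()
x = torch.randn(8, 50, 39)                # (batch, frames, feature dim), dummy data
recon, embedding = model(x)
((recon - x) ** 2).mean().backward()      # reconstruction loss
```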

Phonetic-and-Semantic Embedding of Spoken words with Applications in Spoken Content Retrieval

Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-shan Lee

Published in IEEE SLT, 2018

Part of the Audio Word2Vec project.

Recommended citation: Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee and Lin-shan Lee, "Phonetic-and-Semantic Embedding of Spoken words with Applications in Spoken Content Retrieval," 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018, pp. 941-948, doi: 10.1109/SLT.2018.8639553. https://ieeexplore.ieee.org/abstract/document/8639553