Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in IEEE SLT, 2018
Part of Audio Word2Vec Project.
Recommended citation: Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee and Lin-shan Lee, "Phonetic-and-Semantic Embedding of Spoken words with Applications in Spoken Content Retrieval," 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018, pp. 941-948, doi: 10.1109/SLT.2018.8639553. https://ieeexplore.ieee.org/abstract/document/8639553
Published in IEEE/ACM TASLP, 2019
Part of Audio Word2Vec Project.
Recommended citation: Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Yu-Hsuan Wang and Chia-Hao Shen, "Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation," IEEE/ACM Transactions on Audio, Speech, and Language Processing 27.9 (2019): 1481-1493. https://ieeexplore.ieee.org/abstract/document/8736337
Published in EMNLP, 2020
The results show that, during pretraining, ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different speeds, and that linguistic knowledge, world knowledge, and downstream-task performance do not generally improve as pretraining proceeds.
Recommended citation: Chiang, C., Huang, S., & Lee, H. (2020). Pretrained Language Model Embryology: The Birth of ALBERT. ArXiv, abs/2010.02480. https://arxiv.org/abs/2010.02480
Published in Interspeech, 2021
Self-supervised learning (SSL) for speech separation.
Recommended citation: Huang, S.-F., Chuang, S.-P., Liu, D.-R., Chen, Y.-C., Yang, G.-P., Lee, H.-y. (2021) Stabilizing Label Assignment for Speech Separation by Self-Supervised Pre-Training. Proc. Interspeech 2021, 3056-3060, doi: 10.21437/Interspeech.2021-763 https://www.isca-archive.org/interspeech_2021/huang21h_interspeech.html
Published in IEEE ASRU, 2021
A Mask-CTC-based non-autoregressive (NAR) ASR framework for Mandarin-English code-switching (CS) speech recognition.
Recommended citation: Chuang, Shun-Po, Heng-Jui Chang, Sung-Feng Huang, and Hung-yi Lee. "Non-autoregressive mandarin-english code-switching speech recognition." In 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 465-472. IEEE, 2021. https://ieeexplore.ieee.org/abstract/document/9688174
Published in IEEE/ACM TASLP, 2021
Unsupervised ASR: learning phone recognition from unpaired audio and phone sequences with a generative adversarial network (GAN).
Recommended citation: Liu, Da-rong, Po-chun Hsu, Yi-chen Chen, Sung-feng Huang, Shun-po Chuang, Da-yi Wu, and Hung-yi Lee. "Learning phone recognition from unpaired audio and phone sequences based on generative adversarial network." IEEE/ACM transactions on audio, speech, and language processing 30 (2021): 230-243. https://ieeexplore.ieee.org/abstract/document/9664381/
Published in IEEE/ACM TASLP, 2022
Meta-learning for few-shot speaker adaptive text-to-speech
Recommended citation: Huang, Sung-Feng, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, and Hung-yi Lee. "Meta-tts: Meta-learning for few-shot speaker adaptive text-to-speech." IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2022): 1558-1571. https://ieeexplore.ieee.org/abstract/document/9756900
Published in IEEE ICASSP, 2023
Learnable model pruning for TTS fine-tuning
Recommended citation: Huang, Sung-Feng, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, and Hung-yi Lee. "Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. https://ieeexplore.ieee.org/abstract/document/10097178
Published in IEEE ASRU, 2023
Utilize unlabeled speech data for few-shot cross-lingual TTS adaptation
Recommended citation: Huang, Wei-Ping, Sung-Feng Huang, and Hung-yi Lee. "Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization." In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1-8. IEEE, 2023. https://ieeexplore.ieee.org/abstract/document/10389665
Published in IEEE SLT, 2024
A speech-editing dataset and detection of edited-speech deepfakes.
Recommended citation: Huang, Sung-Feng, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, and Szu-Wei Fu. "Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits." In 2024 IEEE Spoken Language Technology Workshop (SLT), pp. 652-659. IEEE, 2024. https://ieeexplore.ieee.org/abstract/document/10832200/
Published in IEEE ICASSP, 2025
A foundation flow-matching generative model for speech extraction and restoration tasks.
Recommended citation: Ku, P. J., Liu, A. H., Korostik, R., Huang, S. F., Fu, S. W., & Jukić, A. (2024). Generative speech foundation model pretraining for high-quality speech extraction and restoration. arXiv preprint arXiv:2409.16117. https://arxiv.org/abs/2409.16117
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.