I am currently a Ph.D. student in the Department of Electrical and Electronic Engineering, Faculty of Engineering, The Hong Kong Polytechnic University (PolyU)🇭🇰. My chief supervisor is Prof. Kong Aik LEE and my co-supervisor is Prof. Man-Wai MAK. From 2023 to 2024, I was a visiting student at The Chinese University of Hong Kong (Shenzhen), under the supervision of Prof. Haizhou LI and Prof. Shuai WANG. I received my Master's degree from the College of Intelligence and Computing, Tianjin University, supervised by Prof. Longbiao WANG, and my Bachelor's degree from the School of Computer Science and Technology, Tiangong University. In my third year as an undergraduate, I was fortunate to study under Dr. Ding LIU, an excellent teacher who guided me onto the academic path.
My research interests include Speech Separation, Text-to-Speech, and speaker-related tasks such as Speaker Verification, Anti-spoofing, and Voice Anonymization.
🔥 News
- 2025.03: 🎉🎉 “MoMuSE: Momentum Multi-modal Target Speaker Extraction for Scenarios with Impaired Visual Cues” has been accepted by ICME 2025! [pdf]
- 2025.02: 🎉🎉 Xi-vector has been open-sourced in the Wespeaker toolkit [404].
- 2025.01: 🎉🎉 “Audio-Visual Target Speaker Extraction with Selective Auditory Attention” has been accepted by TASLP. [pdf]
📖 Education
- 2024.05 - present, Ph.D. candidate in the Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR.
- 2020.09 - 2023.03, M.E. in the College of Intelligence and Computing, Tianjin University, Tianjin.
- 2016.09 - 2020.06, B.E. in the School of Computer Science and Technology, Tiangong University (Tianjin Polytechnic University), Tianjin.
💻 Internship Experience
- 2023.04 - 2024.04, Research Assistant, supervised by Prof. Haizhou LI and Prof. Shuai WANG, The Chinese University of Hong Kong (Shenzhen). [Project Demo]
- 2022.06 - 2022.12, Research Intern, supervised by Dr. Shiliang Zhang, Alibaba DAMO Academy, Hangzhou.
- 2021.11 - 2022.01, ICT, Huawei, Dongguan.
📝 Publications
Speaker Verification
Speech Separation
- Li Junjie, Zhang Ke, Wang Shuai, et al. “MoMuSE: Momentum Multi-modal Target Speaker Extraction for Scenarios with Impaired Visual Cues”. [demo]
- Zhang Ke, Li Junjie, et al. “Multi-level Speaker Representation for Target Speaker Extraction”. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.
- Li, J., Zhang, K., Wang, S., Li, H., Mak, M. W., & Lee, K. A. (2024, December). On the Effectiveness of Enrollment Speech Augmentation for Target Speaker Extraction. In 2024 IEEE Spoken Language Technology Workshop (SLT) (pp. 325-332). IEEE.
- Wang, J., Wang, S., Li, J., Zhang, K., Qian, Y., & Li, H. (2024, December). Enhancing Speaker Extraction Through Rectifying Target Confusion. In 2024 IEEE Spoken Language Technology Workshop (SLT) (pp. 349-356). IEEE.
- Wang, S., Zhang, K., Lin, S., Li, J., Wang, X., Ge, M., … & Li, H. (2024). WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. In Proc. Interspeech 2024 (pp. 4273-4277).
- Yang, H., Chen, X., Li, J., Huang, H., Cai, S., & Li, H. (2024, August). Listen to the Speaker in Your Gaze. In 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 380-385). IEEE.
- R. Tao, X. Qian, Y. Jiang, J. Li, J. Wang and H. Li, “Audio-Visual Target Speaker Extraction with Selective Auditory Attention,” in IEEE Transactions on Audio, Speech and Language Processing, doi: 10.1109/TASLPRO.2025.3527766.
- Li Junjie, Tao Ruijie, Ge Meng, et al. “Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech,” ICASSP 2024, pp. 10666-10670. [Demo]
- Wang Honglong, Fu Yanjie, Li Junjie, et al. “Stream Attention Based U-Net for L3DAS23 Challenge,” ICASSP 2023, pp. 1-2, doi: 10.1109/ICASSP49357.2023.10095854.
- Li Junjie, Ge Meng, et al. “Rethinking the Visual Cues in Audio-Visual Speaker Extraction,” Proc. INTERSPEECH 2023, pp. 3754-3758, doi: 10.21437/Interspeech.2023-2545.
- Li Junjie, Ge Meng, et al. “Deep Multi-task Cascaded Acoustic Echo Cancellation and Noise Suppression,” 2022 13th ISCSLP, pp. 130-134, doi: 10.1109/ISCSLP57327.2022.10037852.
- Li Junjie, Ge Meng, et al. “VCSE: Time-Domain Visual-Contextual Speaker Extraction Network,” Proc. INTERSPEECH 2022, pp. 906-910, doi: 10.21437/Interspeech.2022-11183.
- Li Junjie and Liu Ding, “Information Bottleneck Theory on Convolutional Neural Networks,” Neural Processing Letters, vol. 53, no. 2, pp. 1385–1400, 2021. (JCR Q3)
💻 Open Source Toolkit
🎖 Honors and Awards
- 2016-2017 President’s Scholarship Second Class (top 7%)
- 2016-2017 Merit Student (top 5%)
- 2017-2018 President’s Scholarship Third Class (top 15%)
- 2018-2019 President’s Scholarship Third Class (top 15%)
- 2018-2019 Merit Student (top 5%)
- 2020 Outstanding Graduate Award (top 5%)
- 2021-2022 Honda Kiyoshi’s Advanced Speech Science Award
😄 Academic Activities
- 2024.09 - 2024.12 Teaching Assistant for EIE 3312 Linear Systems.
- 2024.12 SLT 2024, Macao, China [Image]
- 2024.11 Shenzhen Greater Bay Area Academic Forum, Shenzhen. [Image]
- 2024.08 PolyU Research Student Conference [Image]
- 2024.04 Attending ICASSP 2024, Korea. [Image]
- 2024.03 ICASSP 2024 preview, organised by Dr. Zhizheng WU, Shenzhen. [Image]
- 2023.12 International Doctoral Forum 2023, CUHK. [Image]
- 2023.12 International Workshop on Mathematical Issues in Information Sciences 2023, CUHK(SZ). [Image]
- 2023.12 CHINA HI-TECH Forum 2023, Shenzhen. [Image]
💬 Blog
Reviewer
- IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025
- IEEE International Conference on Multimedia & Expo (ICME) 2025