I am currently a Ph.D. student in the Department of Electrical and Electronic Engineering, Faculty of Engineering, The Hong Kong Polytechnic University (PolyU)🇭🇰. My chief supervisor is Prof. Kong Aik LEE and my co-supervisor is Prof. Man-Wai MAK. From 2023 to 2024, I was a visiting student at The Chinese University of Hong Kong (Shenzhen), under the supervision of Prof. Haizhou LI and Prof. Shuai WANG. I received my Master's degree from the College of Intelligence and Computing, Tianjin University, supervised by Prof. Longbiao WANG, and my Bachelor's degree from the School of Computer Science and Technology, Tiangong University. In my third year as an undergraduate, I was fortunate to study under Dr. Ding LIU, an excellent teacher who guided me onto the academic path.
My research interests include Speech Separation, Text-to-Speech, and speaker-related tasks such as Speaker Verification, Anti-spoofing, and Voice Anonymization.
🔥 News
- 2025.03: 🎉🎉 “MoMuSE: Momentum Multi-modal Target Speaker Extraction for Scenarios with Impaired Visual Cues” has been accepted by ICME 2025! [pdf]
- 2025.02: 🎉🎉 Xi-vector has been open-sourced in the Wespeaker toolkit [404].
- 2025.01: 🎉🎉 “Audio-Visual Target Speaker Extraction with Selective Auditory Attention” has been accepted by TASLP. [pdf]
📖 Education
- 2024.05 - present, Ph.D. candidate in the Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR.
- 2020.09 - 2023.03, M.E. in the College of Intelligence and Computing, Tianjin University, Tianjin.
- 2016.09 - 2020.06, B.E. in the School of Computer Science and Technology, Tiangong University (Tianjin Polytechnic University), Tianjin.
💻 Internship Experience
- 2023.04 - 2024.04, Research Assistant, supervised by Prof. Haizhou LI and Prof. Shuai WANG, The Chinese University of Hong Kong (Shenzhen). [Project Demo]
- 2022.06 - 2022.12, Research Intern, supervised by Dr. Shiliang Zhang, Alibaba DAMO Academy, Hangzhou.
- 2021.11 - 2022.01, ICT, Huawei, Dongguan.
📝 Publications
Speaker Verification
Speech Separation
- Li Junjie, Zhang Ke, Wang Shuai, et al. “MoMuSE: Momentum Multi-modal Target Speaker Extraction for Scenarios with Impaired Visual Cues”. [demo]
- Zhang Ke, Li Junjie, et al. “Multi-level Speaker Representation for Target Speaker Extraction”. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.
- Li, J., Zhang, K., Wang, S., Li, H., Mak, M. W., & Lee, K. A. (2024, December). On the Effectiveness of Enrollment Speech Augmentation for Target Speaker Extraction. In 2024 IEEE Spoken Language Technology Workshop (SLT) (pp. 325-332). IEEE.
- Wang, J., Wang, S., Li, J., Zhang, K., Qian, Y., & Li, H. (2024, December). Enhancing Speaker Extraction Through Rectifying Target Confusion. In 2024 IEEE Spoken Language Technology Workshop (SLT) (pp. 349-356). IEEE.
- Wang, S., Zhang, K., Lin, S., Li, J., Wang, X., Ge, M., … & Li, H. (2024). WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. In Proc. Interspeech 2024 (pp. 4273-4277).
- Yang, H., Chen, X., Li, J., Huang, H., Cai, S., & Li, H. (2024, August). Listen to the Speaker in Your Gaze. In 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 380-385). IEEE.
- R. Tao, X. Qian, Y. Jiang, J. Li, J. Wang and H. Li, “Audio-Visual Target Speaker Extraction with Selective Auditory Attention,” in IEEE Transactions on Audio, Speech and Language Processing, doi: 10.1109/TASLPRO.2025.3527766.
- Li Junjie, Tao Ruijie, Ge Meng, et al. “Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech,” ICASSP 2024, pp. 10666-10670. [Demo]
- Wang Honglong, Fu Yanjie, Li Junjie, et al. “Stream Attention Based U-Net for L3DAS23 Challenge,” ICASSP 2023, pp. 1-2, doi: 10.1109/ICASSP49357.2023.10095854.
- Li Junjie, Ge Meng, et al. “Rethinking the Visual Cues in Audio-Visual Speaker Extraction,” Proc. INTERSPEECH 2023, pp. 3754-3758, doi: 10.21437/Interspeech.2023-2545.
- Li Junjie, Ge Meng, et al. “Deep Multi-task Cascaded Acoustic Echo Cancellation and Noise Suppression,” 2022 13th ISCSLP, pp. 130-134, doi: 10.1109/ISCSLP57327.2022.10037852.
- Li Junjie, Ge Meng, et al. “VCSE: Time-Domain Visual-Contextual Speaker Extraction Network,” Proc. INTERSPEECH 2022, pp. 906-910, doi: 10.21437/Interspeech.2022-11183.
- Li Junjie and Liu Ding, “Information Bottleneck Theory on Convolutional Neural Networks,” Neural Processing Letters, vol. 53, no. 2, pp. 1385–1400, 2021. (JCR Q3)
💻 Open Source Toolkit
🎖 Honors and Awards
- 2016-2017 President’s Scholarship Second Class (top 7%)
- 2016-2017 Merit Student (top 5%)
- 2017-2018 President’s Scholarship Third Class (top 15%)
- 2018-2019 President’s Scholarship Third Class (top 15%)
- 2018-2019 Merit Student (top 5%)
- 2020 Outstanding Graduate Award (top 5%)
- 2021-2022 Honda Kiyoshi’s Advanced Speech Science Award
😄 Academic Activities
- 2024.09 - 2024.12 Teaching Assistant for EIE 3312 Linear Systems.
- 2024.12 SLT 2024, Macao, China [Image]
- 2024.11 Shenzhen Greater Bay Area Academic Forum, Shenzhen. [Image]
- 2024.08 PolyU Research Student Conference [Image]
- 2024.04 Attending ICASSP 2024, Korea. [Image]
- 2024.03 ICASSP 2024 preview, organised by Dr. Zhizheng WU, Shenzhen. [Image]
- 2023.12 International Doctoral Forum 2023, CUHK. [Image]
- 2023.12 International Workshop on Mathematical Issues in Information Sciences 2023, CUHK(SZ). [Image]
- 2023.12 CHINA HI-TECH Forum 2023, Shenzhen. [Image]
💬 Blog
Reviewer
- IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025
- IEEE International Conference on Multimedia & Expo (ICME) 2025