Qingkai Fang | 房庆凯

I am a fourth-year Ph.D. student at the Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS), fortunately advised by Prof. Yang Feng in the ICT Natural Language Processing (ICTNLP) group. Before that, I received my B.E. degree in Computer Science and Technology from Beijing University of Posts and Telecommunications (BUPT) in Jun. 2021.

Email: fangqingkai21b [at] ict.ac.cn / poeroz1204 [at] gmail.com

Google Scholar / ACL Anthology / DBLP / GitHub / CV

I expect to graduate with my Ph.D. in June 2026 and will be seeking job opportunities in industry. If you are interested, please feel free to reach out.

Research

My research interests lie mainly in natural language processing and multimodal learning. In particular, I am interested in:

  • Large language models (LLMs) and multimodal LLMs, including speech-language models and vision-language models.
  • (Simultaneous) speech-to-text and speech-to-speech translation.
  • Non-autoregressive sequence generation and streaming sequence generation.

News

[2025/01] Two papers were accepted to ICLR 2025!

[2024/11] I received the National Scholarship (30,000 RMB).

[2024/09] Our speech-language model LLaMA-Omni has been released! It is a powerful speech interaction model built upon Llama-3.1-8B-Instruct that achieves low-latency, high-quality speech interaction. Check out our paper, code, and model!

[2024/05] Four papers were accepted to ACL 2024 (3 main conference + 1 findings)!

[2023/10] One paper was accepted to the EMNLP 2023 main conference!

[2023/09] One paper was accepted to NeurIPS 2023!

[2023/06] Our multilingual LLM BayLing (百聆) has been released! BayLing is an instruction-following LLM with advanced language alignment and multi-turn interaction capabilities. Read our paper and try our online demo!

[2023/05] Three papers were accepted to the ACL 2023 main conference!

[2022/10] One paper was accepted to the EMNLP 2022 main conference!

[2022/07] We won first place in the Chinese-Thai track and second place in the Mongolian-Chinese track at CCMT 2022!

[2022/02] Two papers were accepted to the ACL 2022 main conference!

Preprints

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment
Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng
Paper / Code / Demo

BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models
Shaolei Zhang, Qingkai Fang, Zhuocheng Zhang, Zhengrui Ma, Yan Zhou, Langlin Huang, Mengyu Bu, Shangtong Gui, Yunji Chen, Xilin Chen, Yang Feng
Paper / Code / Demo

Publications

2025

LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng
ICLR 2025
Paper / Code / Model

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Zhe Yang, Yang Feng
ICLR 2025
Paper / Code / Model

2024

Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?
Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
Proceedings of ACL 2024 (CCF-A)
Paper / Code / Demo

CTC-based Non-autoregressive Textless Speech-to-Speech Translation
Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng
Findings of ACL 2024
Paper / Code

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng
Proceedings of ACL 2024 (CCF-A)
Paper / Code / Demo

A Non-autoregressive Generation Framework for Simultaneous Speech-to-x Translation
Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang
Proceedings of ACL 2024 (CCF-A)
Paper / Code / Demo

2023

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Qingkai Fang, Yan Zhou, Yang Feng
NeurIPS 2023 (CCF-A)
Paper / Code / Demo

Understanding and Bridging the Modality Gap for Speech Translation
Qingkai Fang, Yang Feng
Proceedings of ACL 2023 (CCF-A)
Paper / Code

Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang, Yang Feng
Proceedings of ACL 2023 (CCF-A)
Paper / Code

CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation
Yan Zhou, Qingkai Fang, Yang Feng
Proceedings of ACL 2023 (CCF-A)
Paper / Code

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation
Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng
Proceedings of EMNLP 2023 (CCF-B)
Paper / Code

2022

STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
Qingkai Fang, Rong Ye, Lei Li, Yang Feng, Mingxuan Wang
Proceedings of ACL 2022 (CCF-A)
Paper / Code

Neural Machine Translation with Phrase-Level Universal Visual Representations
Qingkai Fang, Yang Feng
Proceedings of ACL 2022 (CCF-A)
Paper / Code

Low-resource Neural Machine Translation with Cross-modal Alignment
Zhe Yang, Qingkai Fang, Yang Feng
Proceedings of EMNLP 2022 (CCF-B)
Paper / Code

Awards

National Scholarship, at ICT/CAS, Nov. 2024

ICT's Special Scholarship (highest award at ICT/CAS), at ICT/CAS, Jan. 2024

First Academic Scholarship, at ICT/CAS, Sep. 2023/2024

Merit Student, at ICT/CAS, May 2023/2024

Outstanding Graduate of Beijing, at BUPT, Jun. 2021

CCF Elite Collegiate Award, Aug. 2020

National Scholarship (Top 1%), at BUPT, Dec. 2019

Silver Medal, ACM-ICPC Asia Regional Contest, Shenyang Site, Oct. 2018

Silver Medal, China Collegiate Programming Contest (CCPC), Guilin Site, Oct. 2018

Bronze Medal, National Olympiad in Informatics (NOI), Jul. 2016

Services

Reviewer: ACL 2023–2024, EMNLP 2021–2024, TALLIP



Last updated: Jan. 2025