Principal Researcher and Team Manager @Tencent
I am a Principal Researcher and Team Manager at Tencent Youtu Lab. My research interests lie in deep learning and its applications to computer vision and natural language processing. Before joining Tencent, I received my M.S. degree from Xiamen University in 2018 under the supervision of Prof. Rongrong Ji. I received my B.S. degree from Zhengzhou University in 2015, advised by Prof. Mingliang Xu.
Below are some of the works that represent my main research interests. The full paper list (including preprints) can be found on Google Scholar.
(* denotes corresponding author)
| Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Yulei Qin, Gang Li, Zongyi Li, Zihan Xu, Yuchen Shi, Zhekai Lin, Xiao Cui, Ke Li, Xing Sun. Advances in Neural Information Processing Systems (NeurIPS), 2025. Paper, Code |
| VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model Zuwei Long, Yunhang Shen, Chaoyou Fu, Heting Gao, Lijiang Li, Peixian Chen, Mengdan Zhang, Hang Shao, Jian Li, Jinlong Peng, Haoyu Cao, Ke Li, Rongrong Ji, Xing Sun. Advances in Neural Information Processing Systems (NeurIPS), 2025. Paper, Code |
| VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Chaoyou Fu, Haojia Lin, Xiong Wang, YiFan Zhang, Yunhang Shen, Xiaoyu Liu, Haoyu Cao, Zuwei Long, Heting Gao, Ke Li, Long MA, Xiawu Zheng, Rongrong Ji, Xing Sun, Caifeng Shan, Ran He. Advances in Neural Information Processing Systems (NeurIPS, spotlight), 2025. Paper, Code |
| MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li*, Xing Sun, Rongrong Ji. Advances in Neural Information Processing Systems (NeurIPS, spotlight) Paper, Code |
| LTD-Bench: Evaluating Large Language Models by Letting Them Draw Liuhao Lin, Ke Li*, Zihan Xu, Yuchen Shi, Yulei Qin, Yan Zhang, Xing Sun, Rongrong Ji. Advances in Neural Information Processing Systems (NeurIPS), 2025. Paper, Code |
| Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM Xiong Wang, Yangze Li, Chaoyou Fu, Yike Zhang, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long MA. International Conference on Machine Learning (ICML), 2025 Paper, Code |
| Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Caifeng Shan, Ran He, Xing Sun. Computer Vision and Pattern Recognition (CVPR, highlight), 2025 Paper, Code |
| Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment Xudong Li, Wenjie Nie, Yan Zhang, Runze Hu, Ke Li, Xiawu Zheng, Liujuan Cao. Computer Vision and Pattern Recognition (CVPR), 2025 Paper |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Bo Tong, Bokai Lai, Yiyi Zhou, Gen Luo, Yunhang Shen, Ke Li, Xiaoshuai Sun, Rongrong Ji. Computer Vision and Pattern Recognition (CVPR), 2025 Paper, Code |
| Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models Xudong Li, Zihao Huang, Yan Zhang, Yunhang Shen, Ke Li, Xiawu Zheng, Liujuan Cao, Rongrong Ji. International Conference on Computer Vision (ICCV), 2025 Paper, Code |
| PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications Dingkang Yang, Jinjie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang. Advances in Neural Information Processing Systems (NeurIPS), 2024. Paper, Code |
| Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception Kun Yang, Dingkang Yang, Ke Li, Dongling Xiao, Zedian Shao, Peng Sun, Liang Song. European Conference on Computer Vision (ECCV, Oral), 2024 Paper |
| Towards Multimodal Sentiment Analysis Debiasing via Bias Purification Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang. European Conference on Computer Vision (ECCV), 2024 Paper |
| Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment Xudong Li, Runze Hu, Jingyuan Zheng, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Ke Li, Yunhang Shen, Yutao Liu, Pingyang Dai, Rongrong Ji. International Conference on Machine Learning (ICML, spotlight), 2024 Paper |
| Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity Xudong Li, Timin Gao, Runze Hu, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Jingyuan Zheng, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Rongrong Ji. International Conference on Machine Learning (ICML), 2024 Paper |
| Training-free Transformer Architecture Search with Zero-cost Proxy Guided Evolution Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Yonghong Tian, Jie Chen, Rongrong Ji. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. Paper, Code |
| Sinkhorn Distance Minimization for Knowledge Distillation Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li. International Conference on Computational Linguistics (COLING), 2024 Paper |
| Aligning and Prompting Everything All at Once for Universal Visual Perception Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji. Computer Vision and Pattern Recognition (CVPR), 2024 Paper, Code |
| A General and Efficient Training for Transformer via Token Expansion Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin. Computer Vision and Pattern Recognition (CVPR), 2024 Paper, Code |
| Solving the Catastrophic Forgetting Problem in Generalized Category Discovery Xinzi Cao, Xiawu Zheng, Guanhong Wang, Weijiang Yu, Yunhang Shen, Ke Li, Yutong Lu, Yonghong Tian. Computer Vision and Pattern Recognition (CVPR), 2024 Paper, Code |
| Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning Xialei Liu, Jiang-Tian Zhai, Andrew D. Bagdanov, Ke Li, Ming-Ming Cheng. Computer Vision and Pattern Recognition (CVPR), 2024 Paper, Code |
| Weakly Supervised Open-Vocabulary Object Detection Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024 Paper, Code |
| SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space Yunchen Li, Zhou Yu, Gaoqi He, Yunhang Shen, Ke Li, Xing Sun, Shaohui Lin. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024 Paper, Code |
| Semi-Supervised Blind Image Quality Assessment through Knowledge Distillation and Incremental Learning Wensheng Pan, Timin Gao, Yan Zhang, Xiawu Zheng, Yunhang Shen, Ke Li, Runze Hu, Yutao Liu, Pingyang Dai. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024 Paper |
| SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger Yuting Gao, Jinfeng Liu, Zihan Xu, Tong Wu, Enwei Zhang, Ke Li, Jie Yang, Wei Liu, Xing Sun. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024 Paper |
| CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes Yulei Qin, Xingyu Chen, Yunhang Shen, Chaoyou Fu, Yun Gu, Ke Li, Xing Sun, Rongrong Ji. Advances in Neural Information Processing Systems (NeurIPS), 2023. Paper, Code |
| Multi-modal Queried Object Detection in the Wild Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu. Advances in Neural Information Processing Systems (NeurIPS), 2023. Paper, Code |
| LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization Xinzi Cao, Xiawu Zheng, Yunhang Shen, Ke Li, Jie Chen, Yutong Lu, Yonghong Tian. ACM International Conference on Multimedia (ACM MM), 2023. Paper, Code |
| Masked Autoencoders are Efficient Class Incremental Learners Jiang-Tian Zhai, Xialei Liu, Andy Bagdanov, Ke Li, Ming-Ming Cheng. International Conference on Computer Vision (ICCV), 2023. Paper, Code |
| Woodpecker: Hallucination Correction for Multimodal Large Language Models Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen. arXiv, 2023 Paper, Code |
| A Survey on Multimodal Large Language Models Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen. arXiv, 2023 Paper, Code |
| CF-ViT: A General Coarse-to-Fine Method for Vision Transformer Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI, Oral), 2023 Paper, Code |
| Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation Linrui Gong, Shaohui Lin, Baochang Zhang, Yunhang Shen, Ke Li, Ruizhi Qiao, Bo Ren, Muqing Li, Zhou Yu, Lizhuang Ma. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2023 Paper |
| PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining Yuting Gao, Jinfeng Liu, Zihan Xu, Jun Zhang, Ke Li, Rongrong Ji, Chunhua Shen. Advances in Neural Information Processing Systems (NeurIPS, Oral), 2022 Paper, Code |
| Learning Best Combination for Efficient N:M Sparsity Yuxin Zhang, Mingbao Lin, Zhihang Lin, Yiting Luo, Ke Li, Fei Chao, Yongjian Wu, Rongrong Ji. Advances in Neural Information Processing Systems (NeurIPS), 2022 Paper, Code |
| Fine-grained Data Distribution Alignment for Post-Training Quantization Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji. European Conference on Computer Vision (ECCV), 2022 Paper, Code |
| Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks Yunshan Zhong, Mingbao Lin, Xunchao Li, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji. European Conference on Computer Vision (ECCV), 2022 Paper, Code |
| Long-Tailed Class Incremental Learning Xialei Liu, Yusong Hu, Xu-Sheng Cao, Andy Bagdanov, Ke Li, Ming-Ming Cheng. European Conference on Computer Vision (ECCV), 2022 Paper, Code |
| DisCo: Remedying Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li*, Chunhua Shen. European Conference on Computer Vision (ECCV, Oral), 2022 Paper, Code |
| Efficient Decoder-free Object Detection with Transformers Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li*, Chunhua Shen. European Conference on Computer Vision (ECCV), 2022 Paper, Code |
| ARM: Any-Time Super-Resolution Method Bohong Chen, Mingbao Lin, Kekai Sheng, Mengdan Zhang, Peixian Chen, Ke Li, Liujuan Cao, Rongrong Ji. European Conference on Computer Vision (ECCV), 2022 Paper, Code |
| Self-supervised Models are Good Teaching Assistants for Vision Transformers Haiyan Wu, Yuting Gao, Yinqi Zhang, Shaohui Lin, Yuan Xie, Xing Sun, Ke Li. International Conference on Machine Learning (ICML), 2022 Paper |
| Training-free Transformer Architecture Search Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji. Computer Vision and Pattern Recognition (CVPR, Oral), 2022 Paper, Code |
| Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, Weiming Dong, Liqing Zhang, Changsheng Xu, Xing Sun. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2022 Paper, Code |
| Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning Jinpeng Wang, Yuting Gao, Ke Li, Yiqi Lin, Andy J Ma, Xing Sun. Computer Vision and Pattern Recognition (CVPR), 2021 Paper, Code |
| Architecture Disentanglement for Deep Neural Networks Jie Hu, Liujuan Cao, Qixiang Ye, Tong Tong, Shengchuan Zhang, Ke Li, Feiyue Huang, Rongrong Ji, Ling Shao. International Conference on Computer Vision (ICCV, Oral), 2021 Paper, Code |
| Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion Jinpeng Wang, Yuting Gao, Ke Li, Xinyang Jiang, Xiaowei Guo, Rongrong Ji, Xing Sun. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021 Paper, Code |
| One for More: Selecting Generalizable Samples for Generalizable ReID Model Enwei Zhang, Xinyang Jiang, Hao Cheng, Ancong Wu, Fufu Yu, Ke Li, Xiaowei Guo, Feng Zheng, Wei-Shi Zheng, Xing Sun. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021 Paper |
| Pruning Filter in Filter Fanxu Meng, Hao Cheng, Ke Li, Huixiang Luo, Xiaowei Guo, Guangming Lu, Xing Sun. Advances in Neural Information Processing Systems (NeurIPS), 2020 Paper, Code |
| Filter Grafting for Deep Neural Networks Fanxu Meng, Hao Cheng, Ke Li, Zhixin Xu, Rongrong Ji, Xing Sun, Guangming Lu. Computer Vision and Pattern Recognition (CVPR), 2020 Paper, Code |
| Asymmetric Co-Teaching for Unsupervised Cross Domain Person Re-Identification Fengxiang Yang, Ke Li, Zhun Zhong, Zhiming Luo, Xing Sun, Hao Cheng, Xiaowei Guo, Feiyue Huang, Rongrong Ji, Shaozi Li. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2020. Paper, Code |
| Semi-Supervised Adversarial Monocular Depth Estimation Rongrong Ji, Ke Li*, Yan Wang, Feng Guo, Xiaowei Guo, Yongjian Wu, Feiyue Huang, and Jiebo Luo. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019. Paper |