About me
I am currently a second-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, where I am fortunate to be co-advised by Prof. Zhicheng Dou and Prof. Ji-Rong Wen. I earned my M.Eng (2024) and B.Eng (2021) degrees in Information and Communication Engineering from Beijing University of Posts and Telecommunications (BUPT), advised by Prof. Weiran Xu.
I have been a research intern at the Kuaishou Klear Team, the Alibaba Qwen Team, and the Meituan NLP Center. I have published over 40 papers in top-tier AI conferences and journals (10+ as first author), including NeurIPS, ICLR, ACL, WWW, EMNLP, NAACL, AAAI, and IP&M.
My long-term goal is to explore automated, scalable, and safe ways of fostering exceptional intelligence in pursuit of AGI.
Research Interests:
- Agentic Reinforcement Learning
- Deep Search & Research Agent
- Alignment for Large Language Models
🔥 News
- 2025.10: We introduce AEPO, designed to balance entropy for multi-turn LLM agent training! Featured as 🤗 HF Daily Paper #2!
- 2025.09: 🌐 WebThinker has been accepted by NeurIPS 2025! Feel free to check out our Demo!
- 2025.08: We introduce ARPO, an agentic RL algorithm for multi-turn LLM agents! Featured as 🤗 HF Weekly Paper #1!
- 2025.08: 🔍 Search-o1 has been accepted by EMNLP 2025 as Oral Presentation!
- 2025.05: Four papers have been accepted by ACL 2025!
- 2025.05: We propose 🌟Tool-Star, an LLM-brained multi-tool reasoner via RL! Check out our project!
- 2025.04: We introduce 🌐 WebThinker, a powerful open-sourced deep research agent! Feel free to check out our Demo!
- 2025.02: Our modular toolkit ⚡FlashRAG now supports a range of multimodal retrievers and generators. Please check it out!
- 2025.01: DPA-RAG, which is designed to align diverse preferences within RAG systems, has been accepted by WWW 2025.
- 2025.01: Two papers have been accepted by ICLR 2025! AUTOIF is the secret behind Qwen’s instruction-following alignment.
- 2024.12: Honored to be a contributor to Qwen2.5, a series of LLMs designed to meet diverse needs!
- 2024.09: Glad to be a Ph.D. student at GSAI, Renmin University of China.
- 2024.09: Two papers have been accepted by EMNLP 2024!
- 2024.07: We release our technical report of Qwen2, a large-scale language model developed by Alibaba Group.
- 2024.05: Four papers have been accepted by ACL 2024! Looking forward to seeing you in Bangkok!
📖 Education
- 2024.9 - Present, Ph.D, Gaoling School of Artificial Intelligence, Renmin University of China.
- 2021.9 - 2024.6, M.Eng, Artificial Intelligence, Beijing University of Posts and Telecommunications.
- 2017.9 - 2021.6, B.Eng, Information and Communication Engineering, Beijing University of Posts and Telecommunications.
- 2018.7 - 2018.8, Summer Exchange Internship, University of Oxford.
💻 Experiences
- 2025.4 - Present, Kuaishou, Foundation LLM Team
  - Research Intern on Deep Search & Research Agents
  - Mentors: Hangyu Mao, Fuzheng Zhang
- 2023.6 - 2024.8, Alibaba, Qwen Team
  - Research Intern on Alignment & Reasoning of Large Language Models
  - Mentors: Bowen Yu, Zheng Yuan, Wei Wang, Keming Lu
- 2022.9 - 2023.5, Meituan, NLP Center
  - Research Intern on Knowledge Augmented Generation
  - Mentor: Rumei Li
🏆 Honors & Awards
- National Scholarship for Ph.D. Students (Top 1%), 2025
- 1st Place in the PhD Entrance Exam (Preliminary) at the GSAI, Renmin University of China, 2024
- Outstanding Graduate of Beijing (Top 1%), 2024
- National Scholarship for Master Students (Top 1%), 2023
- Outstanding Graduate of Master Students (Top 5%), BUPT, 2023
- Excellent First-class Scholarship for Master Students, BUPT (two times), 2021, 2022
- 1st Award on Track 2 of the SereTOD Challenge, EMNLP 2022, 2022
- Gold Award for College Music Festival Instrumental Performance, Beijing, 2021
- The Mathematical Contest in Modeling, Honorable Mention, 2021
📝 Publications
* for corresponding author, # for equal contribution.
As the First Author
2025
- [arXiv] Agentic Entropy-Balanced Policy Optimization, Guanting Dong, Licheng Bao, Zhongyuan Wang, Kangzhi Zhao, Xiaoxi Li, Jiajie Jin, Jinghan Yang, Hangyu Mao, Fuzheng Zhang, Kun Gai, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou.
- [arXiv] Agentic Reinforced Policy Optimization, Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou.
- [arXiv] Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning, Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen.
- [ACL 2025] RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation, Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen.
- [ACL 2025] Progressive Multimodal Reasoning via Active Retrieval, Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen.
- [ACL 2025] We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?, Runqi Qiao, Qiuna Tan, Guanting Dong#, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang.
- [WWW 2025] Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation, Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, and Ji-Rong Wen.
- [AAAI 2025] Toward General Instruction-Following Alignment for Retrieval-Augmented Generation, Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, and Ji-Rong Wen.
- [ICLR 2025 (Spotlight)] Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models, Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou.
- [ICLR 2025] CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery, Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu. [homepage][dataset]
2024
- [ACL 2024] How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition, Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou.
2023
- [EMNLP 2023 Findings] DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task, Guanting Dong, Tingfeng Hui, Zhuoma GongQue, Jinxu Zhao, Daichi Guo, Gang Zhao, Keqing He, Weiran Xu.
- [CIKM 2023] A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NER, Guanting Dong, Zechen Wang, Jinxu Zhao, Gang Zhao, Daichi Guo, Dayuan Fu, Tingfeng Hui, Chen Zeng, Keqing He, Xuefeng Li, Liwen Wang, Xinyue Cui, Weiran Xu.
- [CIKM 2023] Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA, Guanting Dong, Rumei Li, Sirui Wang, Yupeng Zhang, Yunsen Xian, Weiran Xu.
- [NLPCC 2023] Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task, Guanting Dong, Jinxu Zhao, Tingfeng Hui, Daichi Guo, Wenlong Wang, Boqi Feng, Yueyan Qiu, Zhuoma Gongque, Keqing He, Zechen Wang, Weiran Xu.
- [ICASSP 2023] A Prototypical Semantic Decoupling Method via Joint Contrastive Learning for Few-Shot Named Entity Recognition, Guanting Dong, Zechen Wang, Liwen Wang, Daichi Guo, Dayuan Fu, Yuxiang Wu, Chen Zeng, Xuefeng Li, Tingfeng Hui, Keqing He, Xinyue Cui, Qixiang Gao, Weiran Xu.
2022
- [EMNLP 2022] Exploiting domain-slot related keywords description for Few-Shot Cross-Domain Dialogue State Tracking, Gao Qixiang, Guanting Dong#, Yutao Mou, Liwen Wang, Chen Zeng, Daichi Guo, Mingyang Sun, Weiran Xu.
- [EMNLP 2022 Findings] Entity-level Interaction via Heterogeneous Graph for Multimodal Named Entity Recognition, Gang Zhao, Guanting Dong, Yidong Shi, Haolong Yan, Weiran Xu, Si Li.
- [COLING 2022] PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling, Guanting Dong, Daichi Guo, Liwen Wang, Xuefeng Li, Zechen Wang, Chen Zeng, Keqing He, Jinzheng Zhao, Hao Lei, Xinyue Cui, Yi Huang, Junlan Feng, Weiran Xu.
As a Co-author
2025
- [NeurIPS 2025] WebThinker: Empowering Large Reasoning Models with Deep Research Capability, Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yongkang Wu, Ji-Rong Wen, Yutao Zhu, and Zhicheng Dou.
- [EMNLP 2025] Search-o1: Agentic Search-Enhanced Large Reasoning Models, Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou.
- [ACL 2025] Hierarchical Document Refinement for Long-context Retrieval-augmented Generation, Jiajie Jin, Xiaoxi Li, Guanting Dong, Yuyao Zhang, Yutao Zhu, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou.
- [NAACL 2025 Findings] CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation, Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou.
- [WWW 2025] ⚡FlashRAG: A Python Toolkit for Efficient RAG Research, Jiajie Jin, Yutao Zhu, Guanting Dong, Yuyao Zhang, Xinyu Yang, Chenghao Zhang, Tong Zhao, Zhao Yang, Zhicheng Dou, Ji-Rong Wen.
- [WWW 2025] Knowledge Editing on Black-box Large Language Models, Xiaoshuai Song, Zhengyang Wang, Keqing He, Guanting Dong, Jinxu Zhao, Weiran Xu.
- [Information Processing & Management] INSNER: A generative instruction-based prompting method for boosting performance in few-shot NER, Peiwen Zhao, Chong Feng, Peiguang Li, Guanting Dong, Sirui Wang.
2024
- [arXiv] Qwen2.5 Technical Report, Qwen Team (129 authors including Guanting Dong).
- [arXiv] Qwen2 Technical Report, Qwen Team (64 authors including Guanting Dong).
- [arXiv] DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning, Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu.
- [arXiv] Smaller Language Models Are Better Instruction Evolvers, Tingfeng Hui, Lulu Zhao, Guanting Dong, Yaqi Zhang, Hua Zhou, Sen Su.
- [EMNLP 2024] MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making, Dayuan Fu, Biqing Qi, Yihuai Gao, Che Jiang, Guanting Dong, Bowen Zhou.
- [ACL 2024] MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning, Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou.
- [ACL 2024 Findings] ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models, Haoran Luo, Haihong E, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin.
- [ACL 2024] DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning, Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu.
2023
- [arXiv] Scaling relationship on learning mathematical reasoning with large language models, Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, Chang Zhou.
- [arXiv] InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework, Shanglin Lei, Guanting Dong*, Xiaoping Wang, Keheng Wang, Sirui Wang.
- [EMNLP 2023] Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT, Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu.
- [ACL 2023 Findings] Pay Attention to Implicit Attribute Values: A Multi-modal Generative Framework for AVE Task, Yupeng Zhang, Shensi Wang, Peiguang Li, Guanting Dong, Sirui Wang, Yunsen Xian, Zhoujun Li, Hongzhi Zhang.
2022
- [ICASSP 2022] A Robust Contrastive Alignment Method for Multi-domain Text Classification, Xuefeng Li, Hao Lei, Liwen Wang, Guanting Dong, Jinzheng Zhao, Jiachi Liu, Weiran Xu, Chunyun Zhang.
🔍 Academic Services
- SPC Reviewer for: AAAI
- PC Reviewer for: ICLR, ACL, EMNLP, NAACL, COLING