About Me
I am currently a 2nd-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, fortunate to be co-advised by Prof. Zhicheng Dou and Prof. Ji-Rong Wen. I earned my M.Eng (2024) and B.Eng (2021) degrees in Information and Communication Engineering from Beijing University of Posts and Telecommunications (BUPT), advised by Prof. Weiran Xu.
I’m currently a Top Seed research intern focusing on general agent research at ByteDance Seed. Previously, I held research intern positions at the Alibaba Qwen Team, Kuaishou Klear Team, and Meituan NLP Center. I have published 40+ papers in top-tier AI conferences and journals (10+ first-author papers), including NeurIPS, ICLR, ACL, WWW, EMNLP, NAACL, AAAI, and IP&M.
Research Interests:
- Agentic Reinforcement Learning: Training general agents via fundamental RL-based optimization
- Deep Search Agents: Enhancing long-horizon reasoning with web-scale information seeking
- Alignment for Large Language Models: Improving multi-dimensional alignment for LLMs
My long-term goal is to develop automated, scalable, and safe methods that foster exceptional intelligence on the path toward AGI.
🔥 News
- 2025.10: AEPO featured as 🤗 HF Daily Paper #2! Our work on entropy-balanced policy optimization for multi-turn LLM agents.
- 2025.09: 🌐 WebThinker accepted by NeurIPS 2025! A powerful open-source deep research agent. Check out our demo!
- 2025.08: ARPO featured as 🤗 HF Weekly Paper #1! An agentic RL algorithm for multi-turn LLM agents.
- 2025.08: 🔍 Search-o1 accepted by EMNLP 2025 as Oral Presentation!
- 2025.05: Four papers accepted by ACL 2025!
- 2025.05: Released Tool-Star, an LLM-brained multi-tool reasoner via RL! Check out our project.
- 2025.02: ⚡ FlashRAG now supports multimodal retrievers and generators!
- 2025.01: DPA-RAG accepted by WWW 2025 — aligning diverse preferences in RAG systems.
- 2025.01: Two papers accepted by ICLR 2025! AutoIF is the secret behind Qwen’s instruction-following alignment.
📖 Education
- 2024.09 - Present | Ph.D. in Artificial Intelligence, Gaoling School of Artificial Intelligence, Renmin University of China
- 2021.09 - 2024.06 | M.Eng in Artificial Intelligence, Beijing University of Posts and Telecommunications
- 2017.09 - 2021.06 | B.Eng in Information and Communication Engineering, Beijing University of Posts and Telecommunications
- 2018.07 - 2018.08 | Summer Exchange Program, University of Oxford
💻 Research Experience
- 2025.11 - Present | ByteDance, Seed General Agent Team
  - Research Intern on RL for General Agent (Top Seed Program)
  - Mentor: Wanjun Zhong
- 2025.04 - 2025.11 | Kuaishou, Foundation LLM Team
  - Research Intern on Agentic RL & Deep Search Agent (K-Star Program)
  - Mentors: Hangyu Mao, Fuzheng Zhang
- 2023.06 - 2024.08 | Alibaba, Qwen Foundation LLM Team
  - Research Intern on Alignment & Reasoning of Large Language Models
  - Mentors: Bowen Yu, Zheng Yuan, Wei Wang, Keming Lu
- 2022.09 - 2023.05 | Meituan, NLP Center
  - Research Intern on Knowledge-Augmented Generation
  - Mentor: Rumei Li
🏆 Honors & Awards
Grants
- 2026: Young Talent Support Program for Doctoral Students, CAST (中国科协青年人才托举工程博士生专项计划)
- 2026.01-2027.12: Fundamental Research Project for PhD Students, NSFC (国家自然科学基金青年学生基础研究项目-博士生)
Scholarships
- 2025: National Scholarship for Ph.D. Students (博士生国家奖学金, Top 1%)
- 2024: Outstanding Graduates of Beijing (北京市优秀毕业生, Top 1%)
- 2024: 1st Place in PhD Entrance Exam (Preliminary), GSAI, Renmin University of China
- 2023: National Scholarship for Master Students (硕士生国家奖学金, Top 1%)
- 2021-2022: Excellent First-class Scholarship for Master Students, BUPT
Competitions
- 2025: 2nd Award on Track 1 of LiveRAG Workshop Challenge, SIGIR 2025
- 2022: 1st Award on Track 2 of SereTOD Workshop Challenge, EMNLP 2022
- 2021: Honorable Mention, The American Mathematical Contest in Modeling
🎤 Invited Talks
- 2025.11: “Agentic Reinforcement Policy Optimization”, MLNLP Community
- 2025.10: “Agentic Reinforcement Policy Optimization”, EvoAgentX Community
- 2025.09: “Agentic Reinforcement Policy Optimization”, HunYuan Team, Tencent
- 2025.08: “ARPO: Encouraging Agents to Explore at Critical Moments”, NICE Community
📝 Selected Preprints
* for corresponding author, # for equal contribution.
-
Qwen2.5 Technical Report

Qwen Team (129 authors including Guanting Dong).
-
Qwen2 Technical Report

Qwen Team (64 authors including Guanting Dong).
-
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, Chang Zhou.
-
InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework
Shanglin Lei, Guanting Dong*, Xiaoping Wang, Keheng Wang, Sirui Wang.
-
Agentic Entropy-Balanced Policy Optimization
Guanting Dong, Licheng Bao, Zhongyuan Wang, Kangzhi Zhao, Xiaoxi Li, Jiajie Jin, Jinghan Yang, Hangyu Mao, Fuzheng Zhang, Kun Gai, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou.
-
Agentic Reinforced Policy Optimization
Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou.
-
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen.
📝 Selected Publications (Full List)
-
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yongkang Wu, Ji-Rong Wen, Yutao Zhu, and Zhicheng Dou.
NeurIPS 2025 (CCF-A)
-
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou.
EMNLP 2025 (CCF-B)
-
RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation
Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen.
ACL 2025 (CCF-A)
-
Progressive Multimodal Reasoning via Active Retrieval
Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen.
ACL 2025 (CCF-A)
-
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao, Qiuna Tan, Guanting Dong#, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang.
ACL 2025 (CCF-A)
-
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, and Ji-Rong Wen.
WWW 2025 (CCF-A)
-
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, and Ji-Rong Wen.
AAAI 2025 (CCF-A)
-
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou.
ICLR 2025 Spotlight (CCF-A)
-
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu.
ICLR 2025 (CCF-A)
-
FlashRAG: A Python Toolkit for Efficient RAG Research ⚡
Jiajie Jin, Yutao Zhu, Guanting Dong, Yuyao Zhang, Xinyu Yang, Chenghao Zhang, Tong Zhao, Zhao Yang, Zhicheng Dou, Ji-Rong Wen.
WWW 2025 (CCF-A)
-
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou.
ACL 2024 (CCF-A)
-
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning
Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou.
ACL 2024 (CCF-A)
-
ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models
Haoran Luo, Haihong E, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin.
ACL 2024 Findings (CCF-A)
-
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning
Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu.
ACL 2024 (CCF-A)
-
A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NER
Guanting Dong, Zechen Wang, Jinxu Zhao, Gang Zhao, Daichi Guo, Dayuan Fu, Tingfeng Hui, Chen Zeng, Keqing He, Xuefeng Li, Liwen Wang, Xinyue Cui, Weiran Xu.
CIKM 2023 (CCF-B)
-
Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA
Guanting Dong, Rumei Li, Sirui Wang, Yupeng Zhang, Yunsen Xian, Weiran Xu.
CIKM 2023 (CCF-B)
🔍 Academic Services
Journal Reviewer
- Knowledge-Based Systems (KBS)
Senior Program Committee (SPC)
- AAAI: 2026
Program Committee Member / Reviewer
- Top-tier ML Conferences: NeurIPS (2024–2025), ICML (2025), ICLR (2023–2026)
- Top-tier AI/DM Conferences: KDD (2025), SIGIR (2025), WWW (2025–2026), CIKM (2024–2025), AAAI (2026)
- Top-tier NLP Conferences: ACL ARR (2024–2026)