About Me

I am currently a 2nd-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, fortunate to be co-advised by Prof. Zhicheng Dou and Prof. Ji-Rong Wen. I earned my M.Eng (2024) and B.Eng (2021) degrees in Information and Communication Engineering from Beijing University of Posts and Telecommunications (BUPT), advised by Prof. Weiran Xu.

I’m currently a Top Seed research intern focusing on general agent research at ByteDance Seed. Previously, I held research intern positions at the Alibaba Qwen Team, Kuaishou Klear Team, and Meituan NLP Center. I have published 40+ papers in top-tier AI conferences and journals (10+ first-author papers), including NeurIPS, ICLR, ACL, WWW, EMNLP, NAACL, AAAI, and IP&M.

Research Interests:

General Agent Training — Training long-horizon agents with scalable real-world interaction capabilities
Agent Harness Engineering — Building stronger scaffolds to fully unlock frontier agent capabilities of foundation models
Agentic Reinforcement Learning — Training general agent intelligence via fundamental RL-based optimization

My long-term goal is to develop automated, scalable, and safe approaches that foster exceptional intelligence toward achieving AGI. I am also a firm believer in The Bitter Lesson.

🔥 News

2026.04: 🚀 Released Agent-World, scaling real-world environment synthesis for evolving general agent intelligence! Check out our demo!
2026.03: 🎉 Tool-Star, ARPO, and AEPO accepted at SIGIR 2026, ICLR 2026, and WWW 2026, respectively! Feel free to follow our Agent RL family.
2026.02: 🎉 Honored to receive the 2026 Tencent Project Up Scholarship (腾讯青云奖学金, inaugural cohort, 15 recipients nationwide). Thanks to my teachers and co-authors for their support!
2026.02: 🚀 We released Seed2.0. As a core contributor, I am responsible for the core MCP tool-use agent capability (MCPMark 54.7, BFCL-V4 73.4, VitaBench 47).
2026.02: 🚀 Released OmniGAIA, building native omni-modal AI agents!
2025.12: 🚀 Released Seed1.8, towards generalized real-world agency! Honored to be a core contributor.
2025.10: AEPO featured as 🤗 HF Daily Paper #2! Our work on entropy-balanced policy optimization for multi-turn LLM agents.
2025.09: 🌐 WebThinker accepted by NeurIPS 2025! A powerful open-source deep research agent. Check out our demo!
2025.08: ARPO featured as 🤗 HF Weekly Paper #1! An agentic RL algorithm for multi-turn LLM agents.
2025.08: 🔍 Search-o1 accepted by EMNLP 2025 as an Oral Presentation!
2025.05: Four papers accepted by ACL 2025!
2025.05: Released Tool-Star, an LLM-brained multi-tool reasoner via RL! Check out our project.
2025.02: ⚡ FlashRAG now supports multimodal retrievers and generators!
2025.01: DPA-RAG accepted by WWW 2025 — aligning diverse preferences in RAG systems.
2025.01: Two papers accepted by ICLR 2025! AutoIF is the secret behind Qwen’s instruction-following alignment.

📖 Education

2024.09 - Present | Ph.D. in Artificial Intelligence
Gaoling School of Artificial Intelligence, Renmin University of China
2021.09 - 2024.06 | M.Eng in Artificial Intelligence
Beijing University of Posts and Telecommunications
2017.09 - 2021.06 | B.Eng in Information and Communication Engineering
Beijing University of Posts and Telecommunications
2018.07 - 2018.08 | Summer Exchange Program
University of Oxford

💻 Research Experience

2025.11 - Present | ByteDance, Seed General Agent Team
- Research Intern on RL for General Agent (Top Seed Program)
- Mentors: Wanjun Zhong, Yujia Qin
2025.04 - 2025.11 | Kuaishou, Foundation LLM Team
- Research Intern on Agentic RL & Deep Search Agent (K-Star Program)
- Mentors: Hangyu Mao, Fuzheng Zhang
2023.06 - 2024.08 | Alibaba, Qwen Foundation LLM Team
- Research Intern on Alignment & Reasoning of Large Language Models
- Mentors: Bowen Yu, Zheng Yuan, Wei Wang, Keming Lu
2022.09 - 2023.05 | Meituan, NLP Center
- Research Intern on Knowledge-Augmented Generation
- Mentor: Rumei Li

🏆 Honors & Awards

Grants

2026.01-2027.12: Fundamental Research Project for PhD Students, NSFC (国家自然科学基金青年学生基础研究项目-博士生)
2026: Young Talent Support Program for Doctoral Students, CAST (中国科协青年人才托举工程博士生专项计划)
2026: Top-Tier Innovative Talent Cultivation Support Program of RUC (中国人民大学拔尖创新人才培育资助计划)

Scholarships

2026: Tencent Project Up Scholarship (腾讯青云奖学金, inaugural cohort, 15 recipients nationwide)
2025: National Scholarship for Ph.D. Students (博士生国家奖学金, Top 1%)
2024: Outstanding Graduates of Beijing (北京市优秀毕业生, Top 1%), Link
2024: 1st Place in PhD Entrance Exam (Preliminary), GSAI, Renmin University of China, Link
2023: National Scholarship for Master Students (硕士生国家奖学金, Top 1%), Link
2021-2022: Excellent First-class Scholarship for Master Students, BUPT

Competitions

2025: 2nd Award on Track 1 of LiveRAG Workshop Challenge, SIGIR 2025, Link
2022: 1st Award on Track 2 of SereTOD Workshop Challenge, EMNLP 2022, Link
2021: Honorable Mention, The American Mathematical Contest in Modeling

📝 Selected Preprints

^* denotes corresponding author; ^# denotes equal contribution. Recent representative preprints and technical reports.

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Guanting Dong, Junting Lu, Junjie Huang, Wanjun Zhong, Longxiang Liu, Shijue Huang, Zhenyu Li, Yang Zhao, Xiaoshuai Song, Xiaoxi Li, Jiajie Jin, Yutao Zhu, Hanbin Wang, Fangyu Lei, Qinyu Luo, Mingyang Chen, Zehui Chen, Jiazhan Feng, Ji-Rong Wen, Zhicheng Dou.
Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity
ByteDance Seed Team (core contributors including Guanting Dong).
Seed1.8 Model Card: Towards Generalized Real-World Agency
ByteDance Seed Team (core contributors including Guanting Dong).
Qwen2.5 Technical Report
Qwen Team (129 authors including Guanting Dong).
Qwen2 Technical Report
Qwen Team (64 authors including Guanting Dong).
OmniGAIA: Towards Native Omni-Modal AI Agents
Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Shijian Wang, Guanting Dong, Jiajie Jin, Hao Wang, Yinuo Wang, Ji-Rong Wen, Yuan Lu, Zhicheng Dou.
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, Chang Zhou.
InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework
Shanglin Lei, Guanting Dong^*, Xiaoping Wang, Keheng Wang, Sirui Wang.

📝 Selected Publications (Full List)

Selected peer-reviewed publications.

2026

ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
Yifei Chen, Guanting Dong, Zhicheng Dou.
ACL 2026 (CCF-A)
ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
Mengjie Deng, Guanting Dong, Zhicheng Dou.
ACL 2026 Findings
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
Xiaoshuai Song, Haofei Chang, Guanting Dong, Yutao Zhu, Zhicheng Dou, Ji-Rong Wen.
ACL 2026 Findings
Tool-Star: Empowering Multi-Tool Collaborative Web Agent via Reinforcement Learning
Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen.
SIGIR 2026 (CCF-A)
SmartSearch: Process Reward-Guided Query Refinement for Search Agents
Tongyu Wen, Guanting Dong, Zhicheng Dou.
SIGIR 2026 (CCF-A)
Agentic Reinforced Policy Optimization
Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou.
ICLR 2026 (CCF-A)
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Yifei Chen, Guanting Dong, Zhicheng Dou.
ICLR 2026 (CCF-A)
Agentic Entropy-Balanced Policy Optimization
Guanting Dong, Licheng Bao, Zhongyuan Wang, Kangzhi Zhao, Xiaoxi Li, Jiajie Jin, Jinghan Yang, Hangyu Mao, Fuzheng Zhang, Kun Gai, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou.
WWW 2026 (CCF-A) (Oral)
DeepAgent: A General Reasoning Agent with Scalable Toolsets
Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Guanting Dong, Jiajie Jin, Yinuo Wang, Hao Wang, Yutao Zhu, Ji-Rong Wen, Yuan Lu, Zhicheng Dou.
WWW 2026 (CCF-A) (Oral)

2025

WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaoxi Li, Jiajie Jin, Guanting Dong^#, Hongjin Qian, Yongkang Wu, Ji-Rong Wen, Yutao Zhu, and Zhicheng Dou.
NeurIPS 2025 (CCF-A)
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou.
EMNLP 2025 (CCF-B) (Oral) (Most Influential arXiv AI Papers in 2025 – Top 5/All)
RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation
Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen.
ACL 2025 (CCF-A)
Progressive Multimodal Reasoning via Active Retrieval
Guanting Dong, Chenghao Zhang, Mengjie Deng, Yutao Zhu, Zhicheng Dou, and Ji-Rong Wen.
ACL 2025 (CCF-A)
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao, Qiuna Tan, Guanting Dong^#, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang.
ACL 2025 (CCF-A) (Most Influential ACL 2025 Papers – Top 6/All)
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, and Ji-Rong Wen.
WWW 2025 (CCF-A) (Most Influential WWW Papers in 2025 – Top 10/All)
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, and Ji-Rong Wen.
AAAI 2025 (CCF-A)
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou.
ICLR 2025 Spotlight (CCF-A)
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song, Muxi Diao, Guanting Dong^*, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu.
ICLR 2025 (CCF-A)
FlashRAG: A Python Toolkit for Efficient RAG Research ⚡
Jiajie Jin, Yutao Zhu, Guanting Dong, Yuyao Zhang, Xinyu Yang, Chenghao Zhang, Tong Zhao, Zhao Yang, Zhicheng Dou, Ji-Rong Wen.
WWW 2025 (CCF-A)

2024

How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou.
ACL 2024 (CCF-A) (Most Influential ACL 2024 Papers – Top 11/All)
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning
Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou.
ACL 2024 (CCF-A)
ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models
Haoran Luo, Haihong E, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin.
ACL 2024 Findings
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning
Yejie Wang, Keqing He, Guanting Dong, Pei Wang, Weihao Zeng, Muxi Diao, Yutao Mou, Mengdi Zhang, Jingang Wang, Xunliang Cai, Weiran Xu.
ACL 2024 (CCF-A)

2023

A Multi-Task Semantic Decomposition Framework with Task-specific Pre-training for Few-Shot NER
Guanting Dong, Zechen Wang, Jinxu Zhao, Gang Zhao, Daichi Guo, Dayuan Fu, Tingfeng Hui, Chen Zeng, Keqing He, Xuefeng Li, Liwen Wang, Xinyue Cui, Weiran Xu.
CIKM 2023 (CCF-B)
Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA
Guanting Dong, Rumei Li, Sirui Wang, Yupeng Zhang, Yunsen Xian, Weiran Xu.
CIKM 2023 (CCF-B)

🎤 Invited Talks

2025.11: “Agentic Reinforced Policy Optimization”, MLNLP Community, Slides
2025.10: “Agentic Reinforced Policy Optimization”, EvoAgentX Community, Slides
2025.09: “Agentic Reinforced Policy Optimization”, HunYuan Team, Tencent
2025.08: “ARPO: Encouraging Agents to Explore at Critical Moments”, NICE Community, Slides

🔍 Academic Services

Journal Reviewer

Knowledge-Based Systems (KBS)

Senior Program Committee (SPC)

AAAI: 2026

Program Committee Member / Reviewer

Top-tier ML Conferences: NeurIPS (2024–2025), ICML (2025), ICLR (2023–2026)
Top-tier AI/DM Conferences: KDD (2025), SIGIR (2025), WWW (2025–2026), CIKM (2024–2025), AAAI (2026)
Top-tier NLP Conferences: ACL ARR (2024–2026)

Guanting Dong (董冠霆)