About Me

I am currently a 2nd-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, where I am fortunate to be co-advised by Prof. Zhicheng Dou and Prof. Jirong Wen. I earned my M.Eng (2024) and B.Eng (2021) degrees in Information and Communication Engineering from Beijing University of Posts and Telecommunications (BUPT), advised by Prof. Weiran Xu.

I am also a Top Seed research intern at ByteDance Seed, focusing on general agent research. Previously, I held research intern positions at the Alibaba Qwen Team, Kuaishou Klear Team, and Meituan NLP Center. I have published 40+ papers (10+ as first author) in top-tier AI conferences and journals, including NeurIPS, ICLR, ACL, WWW, EMNLP, NAACL, AAAI, and IP&M.

Research Interests:

  • Agentic Reinforcement Learning: training general agents via fundamental RL-based optimization
  • Deep Search Agents: enhancing long-horizon reasoning with web-scale information seeking
  • Alignment for Large Language Models: improving multi-dimensional alignment of LLMs

My long-term goal is to develop automated, scalable, and safe methods that foster exceptional intelligence on the path toward AGI.

🔥 News

  • 2025.10: AEPO featured as 🤗 HF Daily Paper #2! Our work on entropy-balanced policy optimization for multi-turn LLM agents.
  • 2025.09: 🌐 WebThinker accepted by NeurIPS 2025! A powerful open-source deep research agent. Check out our demo!
  • 2025.08: ARPO featured as 🤗 HF Weekly Paper #1! An agentic RL algorithm for multi-turn LLM agents.
  • 2025.08: 🔍 Search-o1 accepted by EMNLP 2025 as Oral Presentation!
  • 2025.05: Four papers accepted by ACL 2025!
  • 2025.05: Released Tool-Star, an LLM-brained multi-tool reasoner via RL! Check out our project.
  • 2025.02: ⚡ FlashRAG now supports multimodal retrievers and generators!
  • 2025.01: DPA-RAG accepted by WWW 2025 — aligning diverse preferences in RAG systems.
  • 2025.01: Two papers accepted by ICLR 2025! AutoIF is the secret behind Qwen’s instruction-following alignment.

📖 Education

  • 2024.09 - Present | Ph.D. in Artificial Intelligence
    Gaoling School of Artificial Intelligence, Renmin University of China

  • 2021.09 - 2024.06 | M.Eng in Artificial Intelligence
    Beijing University of Posts and Telecommunications

  • 2017.09 - 2021.06 | B.Eng in Information and Communication Engineering
    Beijing University of Posts and Telecommunications

  • 2018.07 - 2018.08 | Summer Exchange Program
    University of Oxford

💻 Research Experience

  • 2025.11 - Present | ByteDance, Seed General Agent Team
    - Research Intern on RL for General Agents (Top Seed Program)
    - Mentor: Wanjun Zhong

  • 2025.04 - 2025.11 | Kuaishou, Foundation LLM Team
    - Research Intern on Agentic RL & Deep Search Agent (K-Star Program)
    - Mentors: Hangyu Mao, Fuzheng Zhang

  • 2023.06 - 2024.08 | Alibaba, Qwen Foundation LLM Team
    - Research Intern on Alignment & Reasoning of Large Language Models
    - Mentors: Bowen Yu, Zheng Yuan, Wei Wang, Keming Lu

  • 2022.09 - 2023.05 | Meituan, NLP Center
    - Research Intern on Knowledge-Augmented Generation
    - Mentor: Rumei Li

🏆 Honors & Awards

Grants

  • 2026: Young Talent Support Program for Doctoral Students, CAST (中国科协青年人才托举工程博士生专项计划)
  • 2026.01-2027.12: Fundamental Research Project for PhD Students, NSFC (国家自然科学基金青年学生基础研究项目-博士生)

Scholarships

  • 2025: National Scholarship for Ph.D. Students (博士生国家奖学金, Top 1%)
  • 2024: Outstanding Graduate of Beijing (北京市优秀毕业生, Top 1%)
  • 2024: 1st Place in PhD Entrance Exam (Preliminary), GSAI, Renmin University of China
  • 2023: National Scholarship for Master Students (硕士生国家奖学金, Top 1%)
  • 2021-2022: Excellent First-class Scholarship for Master Students, BUPT

Competitions

  • 2025: 2nd Award on Track 1 of LiveRAG Workshop Challenge, SIGIR 2025
  • 2022: 1st Award on Track 2 of SereTOD Workshop Challenge, EMNLP 2022
  • 2021: Honorable Mention, Mathematical Contest in Modeling (MCM)

🎤 Invited Talks

  • 2025.11: “Agentic Reinforcement Policy Optimization”, MLNLP Community
  • 2025.10: “Agentic Reinforcement Policy Optimization”, EvoAgentX Community
  • 2025.09: “Agentic Reinforcement Policy Optimization”, HunYuan Team, Tencent
  • 2025.08: “ARPO: Encouraging Agents to Explore at Critical Moments”, NICE Community

📝 Selected Preprints

* for corresponding author, # for equal contribution.

📝 Selected Publications

🔍 Academic Services

Journal Reviewer

  • Knowledge-Based Systems (KBS)

Senior Program Committee (SPC)

  • AAAI: 2026

Program Committee Member / Reviewer

  • Top-tier ML Conferences: NeurIPS (2024–2025), ICML (2025), ICLR (2023–2026)
  • Top-tier AI/DM Conferences: KDD (2025), SIGIR (2025), WWW (2025–2026), CIKM (2024–2025), AAAI (2026)
  • Top-tier NLP Conferences: ACL ARR (2024–2026)