About me

I am currently a second-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, fortunate to be co-advised by Prof. Zhicheng Dou and Prof. Jirong Wen. I earned my M.Eng (2024) and B.Eng (2021) degrees in Information and Communication Engineering from Beijing University of Posts and Telecommunications (BUPT), advised by Prof. Weiran Xu.

I have worked as a research intern at the Kuaishou Klear Team, the Alibaba Qwen Team, and the Meituan NLP Center. I have published over 40 papers in top-tier AI conferences and journals (10+ as first author), including NeurIPS, ICLR, ACL, WWW, EMNLP, NAACL, AAAI, IP&M, etc.

My long-term goal is to explore automated, scalable, and safe ways of fostering exceptional intelligence on the path to AGI.

Research Interests:

  • Agentic Reinforcement Learning
  • Deep Search & Research Agent
  • Alignment for Large Language Models

🔥 News

  • 2025.10: We introduce AEPO, designed to balance entropy for multi-turn LLM agent training! Featured as 🤗 HF Daily Paper #2
  • 2025.09: 🌐 WebThinker has been accepted by NeurIPS 2025! Feel free to check out our Demo!
  • 2025.08: We introduce ARPO, an agentic RL algorithm for multi-turn LLM agents! Featured as 🤗 HF Weekly Paper #1
  • 2025.08: 🔍 Search-o1 has been accepted by EMNLP 2025 as an Oral Presentation!
  • 2025.05: Four papers have been accepted by ACL 2025!
  • 2025.05: We propose 🌟Tool-Star, an LLM-brained multi-tool reasoner via RL! Check out our project!
  • 2025.04: We introduce 🌐 WebThinker, a powerful open-sourced deep research agent! Feel free to check out our Demo!
  • 2025.02: Our modular toolkit ⚡FlashRAG supports a range of multimodal retrievers and generators. Please check it out!
  • 2025.01: DPA-RAG, which is designed to align diverse preferences within RAG systems, has been accepted by WWW 2025.
  • 2025.01: Two papers have been accepted by ICLR 2025! AUTOIF is the secret behind Qwen’s instruction-following alignment.
  • 2024.12: Honored to be a contributor to Qwen2.5, a series of LLMs designed to meet diverse needs!
  • 2024.09: Glad to be a Ph.D. student at GSAI, Renmin University of China.
  • 2024.09: Two papers have been accepted by EMNLP 2024!
  • 2024.07: We release the technical report of Qwen2, a large-scale language model developed by Alibaba Group.
  • 2024.05: Four papers have been accepted by ACL 2024! Looking forward to seeing you in Bangkok!

📖 Education

  • 2024.9 - Present, Ph.D., Gaoling School of Artificial Intelligence, Renmin University of China.
  • 2021.9 - 2024.6, M.Eng, Artificial Intelligence, Beijing University of Posts and Telecommunications.
  • 2017.9 - 2021.6, B.Eng, Information and Communication Engineering, Beijing University of Posts and Telecommunications.
  • 2018.7 - 2018.8, Summer Exchange Internship, University of Oxford.

💻 Experiences

  • 2025.4 - Present, Kuaishou, Foundation LLM Team
  • 2023.6 - 2024.8, Alibaba, Qwen Team
  • 2022.9 - 2023.5, Meituan, NLP Center
    • Research Intern on Knowledge Augmented Generation
    • Mentor: Rumei Li

🏆 Honors & Awards

📝 Publications

* for corresponding author, # for equal contribution.

As the First Author

2025

2024

2023

2022

As a Co-author

2025

2024

2023

2022

🔍 Academic Services

  • SPC Reviewer for: AAAI
  • PC Reviewer for: ICLR, ACL, EMNLP, NAACL, COLING