I am a senior research engineer at BIGAI in Beijing, advised by Zilong Zheng (郑子隆). My current research covers diffusion language models, long-context modeling, and long-sequence generation.
I received a master’s degree in Computer Technology from Tsinghua University (清华大学) and a bachelor’s degree in Computer Science and Technology from Beijing Institute of Technology (北京理工大学).
I have interned at X-Tech, XiZi, SenseTime, IDEA, MSRA, DeepSeek, and BIGAI. At X-Tech, I was advised by Yifei Jin (金逸飞) and Jian Li (李建). At IDEA, I was advised by Hao Wang (王昊) and Jiaxing Zhang (张家兴). At MSRA, I was advised by Zhihao Fan (范智昊) and Yeyun Gong (宫叶云). I have published papers at top international AI conferences such as NeurIPS and ICML.
🔥 News
- 2024.09: 🎉🎉 Our paper was accepted to NeurIPS 2024
📝 Publications
Tong Wu, Yanpeng Zhao, Zilong Zheng
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Tong Wu^, Zhihao Fan^, Xiao Liu, Hai-Tao Zheng, Yeyun Gong, Yelong Shen, Jian Jiao, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen
Enhancing Text Generation with Cooperative Training
Tong Wu^, Hao Wang^, Zhongshen Zeng, Wei Wang, Hai-Tao Zheng, Jiaxing Zhang
- Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise, Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, Weizhu Chen, ICML 2023
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
📖 Education
- 2021.09 - 2024.06, Master, Tsinghua University, Beijing.
- 2017.08 - 2021.06, Bachelor, Beijing Institute of Technology, Beijing.
💻 Internships
- 2024.02 - 2024.06, BIGAI, NLCo, Beijing.
- 2023.07 - 2023.11, DeepSeek, Beijing.
- 2022.11 - 2023.07, MSRA, NLC, Beijing.
- 2022.04 - 2022.10, IDEA, CCNL, Shenzhen.
- 2021.10 - 2022.02, SenseTime, Shenzhen.
- 2021.01 - 2021.06, Xizi, Beijing.
- 2020.01 - 2020.05, X-Tech, Beijing.