I am a final-year Ph.D. candidate in the School of Software Engineering at South China University of Technology, advised by Prof. Mingkui Tan and Prof. Chuang Gan. I work on developing agents that can understand and interact with the multi-modal world. Toward this goal, my research mainly focuses on:
- Embodied AI: Visual Navigation; Robot Manipulation
- Multi-Modal Video Understanding: Self-Supervised Video Representation Learning; Temporal Action Localization; Visually-Aligned Sound Generation