UMass Embodied AGI Group

Research

Our long-term goal is to develop general-purpose embodied agents that can understand and interact with the physical world and other intelligent beings as flexibly as humans do. Ultimately, we aim to bring embodied general intelligence to both virtual and physical environments.

Lab Publications

2024

FlexAttention for Efficient High-Resolution Vision-Language Models
Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan
arXiv  ·  30 Jul 2024  ·  arXiv:2407.20228
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
arXiv  ·  18 Jun 2024  ·  arXiv:2311.01455
CoNav: A Benchmark for Human-Centered Collaborative Navigation
Changhao Li, Xinyu Sun, Peihao Chen, Jugang Fan, Zixu Wang, Yanxia Liu, Jinhui Zhu, Chuang Gan, Mingkui Tan
arXiv  ·  05 Jun 2024  ·  arXiv:2406.02425
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan
arXiv  ·  17 Apr 2024  ·  arXiv:2404.10775
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
arXiv  ·  11 Apr 2024  ·  arXiv:2310.05910
Thin-Shell Object Manipulations With Differentiable Physics Simulations
Yian Wang, Juntian Zheng, Zhehuan Chen, Zhou Xian, Gu Zhang, Chao Liu, Chuang Gan
arXiv  ·  02 Apr 2024  ·  arXiv:2404.00451
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning
Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Zhiqing Sun, Dan Gutfreund, Chuang Gan
Proceedings of the AAAI Conference on Artificial Intelligence  ·  24 Mar 2024  ·  doi:10.1609/aaai.v38i2.27888
3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan
arXiv  ·  15 Mar 2024  ·  arXiv:2403.09631
Building Cooperative Embodied Agents Modularly with Large Language Models
Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
arXiv  ·  20 Feb 2024  ·  arXiv:2307.02485
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han
arXiv  ·  07 Feb 2024  ·  arXiv:2205.14756
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan
arXiv  ·  24 Jan 2024  ·  arXiv:2401.12975
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan
arXiv  ·  17 Jan 2024  ·  arXiv:2401.08577

2023

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning
Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou, Mingkui Tan, Chuang Gan
arXiv  ·  12 Dec 2023  ·  arXiv:2312.05783
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
arXiv  ·  05 Dec 2023  ·  arXiv:2305.03047
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan
arXiv  ·  07 Nov 2023  ·  arXiv:2311.03354
$A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models
Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan
arXiv  ·  17 Aug 2023  ·  arXiv:2308.07997
Learning Vision-and-Language Navigation from YouTube Videos
Kunyang Lin, Peihao Chen, Diwei Huang, Thomas H. Li, Mingkui Tan, Chuang Gan
arXiv  ·  25 Jul 2023  ·  arXiv:2307.11984
3D-LLM: Injecting the 3D World into Large Language Models
Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
arXiv  ·  25 Jul 2023  ·  arXiv:2307.12981
Generating Visually Aligned Sound from Videos
Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan
arXiv  ·  19 Jul 2023  ·  arXiv:2008.00820
Masked Motion Encoding for Self-Supervised Video Representation Learning
Xinyu Sun, Peihao Chen, Liangwei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan
arXiv  ·  24 Mar 2023  ·  arXiv:2210.06096

2022

Learning Active Camera for Multi-Object Navigation
Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas H. Li, Mingkui Tan, Chuang Gan
arXiv  ·  17 Oct 2022  ·  arXiv:2210.07505
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
Peihao Chen, Dongyu Ji, Kunyang Lin, Runhao Zeng, Thomas H. Li, Mingkui Tan, Chuang Gan
arXiv  ·  17 Oct 2022  ·  arXiv:2210.07506

2021

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan
arXiv  ·  16 Mar 2021  ·  arXiv:2011.07949

2020

Relation Attention for Temporal Action Localization
Peihao Chen, Chuang Gan, Guangyao Shen, Wenbing Huang, Runhao Zeng, Mingkui Tan
IEEE Transactions on Multimedia  ·  01 Oct 2020  ·  doi:10.1109/TMM.2019.2959977
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan
arXiv  ·  21 Aug 2020  ·  arXiv:2008.09105
Foley Music: Learning to Generate Music from Videos
Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba
arXiv  ·  22 Jul 2020  ·  arXiv:2007.10984
Dense Regression Network for Video Grounding
Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan
arXiv  ·  08 Apr 2020  ·  arXiv:2004.03545

2019

Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization
Runhao Zeng, Chuang Gan, Peihao Chen, Wenbing Huang, Qingyao Wu, Mingkui Tan
IEEE Transactions on Image Processing  ·  01 Dec 2019  ·  doi:10.1109/TIP.2019.2922108
Self-supervised Moving Vehicle Tracking with Stereo Sound
Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba
arXiv  ·  28 Oct 2019  ·  arXiv:1910.11760