About Us
Welcome to the UMass Embodied AGI Group, where we aim to create intelligent agents that understand and interact with the world like humans. By combining physical and social intelligence with advanced models, we seek to push the boundaries of embodied general intelligence in both real-world and virtual environments.
News
- Jul 2024 FlexAttention for Efficient High-Resolution Vision-Language Models appears at ECCV 2024. We introduce a plug-and-play attention module, called FlexAttention, that enhances VLMs' ability to perceive details in high-resolution images in an efficient way.
- May 2024 RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text is now on arXiv. In this work, we introduce a challenging task of simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics, advancing beyond existing works that typically address only two modalities in isolation (text-to-motion, text-to-audio, or audio-to-motion).
- May 2024 RoboDreamer: Learning Compositional World Models for Robot Imagination appears at ICML 2024. We propose RoboDreamer, an approach that learns compositional world models to enable robot imagination.
- May 2024 We released SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge. This work aims to delve deeper into reasoning evaluations, specifically within dynamic, open-world, and structured context knowledge.
- Apr 2024 COMBO: Compositional World Models for Embodied Multi-Agent Cooperation is now on arXiv. In this work, we propose COMBO, a novel Compositional wOrld Model-based emBOdied multi-agent planning framework.
- Mar 2024 3D-VLA: A 3D Vision-Language-Action Generative World Model is now on arXiv. We introduce a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action through a generative world model.
- Mar 2024 Thin-Shell Object Manipulations With Differentiable Physics Simulations appears at ICLR 2024. We introduce ThinShellLab, a fully differentiable simulation platform tailored for robotic interactions with diverse thin-shell materials of varying material properties, enabling flexible thin-shell manipulation skill learning and evaluation.
Highlights
Our Research
Our long-term goal is to develop general-purpose embodied agents that can understand and interact with the physical world and other intelligent beings as flexibly as humans. Ultimately, we aim to bring embodied general intelligence to both virtual and physical environments.