EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation

Qi, Carl; Haramati, Dan; Daniel, Tal; Tamar, Aviv; Zhang, Amy


arXiv:2412.18907 (cs)

[Submitted on 25 Dec 2024 (v1), last revised 25 Sep 2025 (this version, v3)]

Abstract: Object manipulation is a common component of everyday tasks, but learning to manipulate objects from high-dimensional observations presents significant challenges. These challenges are heightened in multi-object environments due to the combinatorial complexity of the state space as well as of the desired behaviors. While recent approaches have utilized large-scale offline data to train models from pixel observations, achieving performance gains through scaling, these methods struggle with compositional generalization in unseen object configurations with constrained network and dataset sizes. To address these issues, we propose a novel behavioral cloning (BC) approach that leverages object-centric representations and an entity-centric Transformer with diffusion-based optimization, enabling efficient learning from offline image data. Our method first decomposes observations into an object-centric representation, which is then processed by our entity-centric Transformer that computes attention at the object level, simultaneously predicting object dynamics and the agent's actions. Combined with the ability of diffusion models to capture multi-modal behavior distributions, this results in substantial performance improvements in multi-object tasks and, more importantly, enables compositional generalization. We present BC agents capable of zero-shot generalization to tasks with novel compositions of objects and goals, including larger numbers of objects than seen during training. We provide video rollouts on our webpage.
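
The abstract's point about attention computed at the object level can be illustrated with a minimal sketch. This is not the authors' architecture: it is a plain single-head self-attention over a set of per-object feature vectors, written in pure Python, and it omits the diffusion process, action prediction, and everything else the paper describes. The function names (`entity_attention`, `random_matrix`), the feature size `d = 4`, and the random weights are all hypothetical choices made for illustration. The sketch only shows two properties that make set-style attention a natural fit for multi-object inputs: the same weights accept any number of entities, and reordering the entities reorders the outputs in the same way.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def entity_attention(entities, Wq, Wk, Wv):
    """Single-head self-attention over a set of entity tokens.

    entities: list of N feature vectors, one per object.
    Returns N output vectors; N may differ from call to call.
    """
    d = len(Wq)  # dimension of the query/key vectors
    queries = [matvec(Wq, e) for e in entities]
    keys = [matvec(Wk, e) for e in entities]
    values = [matvec(Wv, e) for e in entities]
    outputs = []
    for q in queries:
        # Scaled dot-product score of this entity against every entity.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output coordinate at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned projection weights.
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4  # assumed per-entity feature size, chosen only for this example
Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))

three_objects = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]
five_objects = three_objects + [[rng.uniform(-1, 1) for _ in range(d)]
                                for _ in range(2)]

# The same weights handle scenes with different numbers of objects.
out3 = entity_attention(three_objects, Wq, Wk, Wv)
out5 = entity_attention(five_objects, Wq, Wk, Wv)

# Reordering the input entities reorders the outputs identically.
perm = [2, 0, 1]
out_perm = entity_attention([three_objects[i] for i in perm], Wq, Wk, Wv)
```

Because no parameter in this sketch depends on the number or order of entities, nothing structural prevents it from being applied to more objects than it was fitted on. Whether that yields good behavior in practice is an empirical question, which is what the abstract's zero-shot generalization claim addresses.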

Subjects:	Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2412.18907 [cs.AI]
	(or arXiv:2412.18907v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2412.18907

Submission history

From: Carl Qi
[v1] Wed, 25 Dec 2024 13:50:15 UTC (27,960 KB)
[v2] Fri, 14 Feb 2025 20:43:22 UTC (27,962 KB)
[v3] Thu, 25 Sep 2025 04:41:03 UTC (27,975 KB)
