Aadarsh Sahoo is qualified to endorse.

Aligning Text, Images, and 3D Structure Token-by-Token

Aadarsh Sahoo is registered as an author of this paper and can endorse for cs.CV.

Vansh Tibrewal and Georgia Gkioxari are not registered as owners of this paper.