SAM 2: Segment Anything in Images and Videos
Authors:
Nikhila Ravi,
Valentin Gabeur,
Yuan-Ting Hu,
Ronghang Hu,
Chaitanya Ryali,
Tengyu Ma,
Haitham Khedr,
Roman Rädle,
Chloe Rolland,
Laura Gustafson,
Eric Mintun,
Junting Pan,
Kalyan Vasudev Alwala,
Nicolas Carion,
Chao-Yuan Wu,
Ross Girshick,
Piotr Dollár,
Christoph Feichtenhofer
Abstract:
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. We are releasing our main model, dataset, as well as code for model training and our demo.
Submitted 28 October, 2024; v1 submitted 1 August, 2024; originally announced August 2024.
AdaM: Adapting Multi-User Interfaces for Collaborative Environments in Real-Time
Authors:
Seonwook Park,
Christoph Gebhardt,
Roman Rädle,
Anna Feit,
Hana Vrzakova,
Niraj Dayama,
Hui-Shyong Yeo,
Clemens Klokmose,
Aaron Quigley,
Antti Oulasvirta,
Otmar Hilliges
Abstract:
Developing cross-device multi-user interfaces (UIs) is a challenging problem. There are numerous ways in which content and interactivity can be distributed. However, good solutions must consider multiple users, their roles, their preferences and access rights, as well as device capabilities. Manual and rule-based solutions are tedious to create and do not scale to larger problems nor do they adapt to dynamic changes, such as users leaving or joining an activity. In this paper, we cast the problem of UI distribution as an assignment problem and propose to solve it using combinatorial optimization. We present a mixed integer programming formulation which allows real-time applications in dynamically changing collaborative settings. It optimizes the allocation of UI elements based on device capabilities, user roles, preferences, and access rights. We present a proof-of-concept designer-in-the-loop tool, allowing for quick solution exploration. Finally, we compare our approach to traditional paper prototyping in a lab study.
Submitted 29 March, 2018; v1 submitted 3 March, 2018; originally announced March 2018.