StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

Wang, Haibo; Feng, Bo; Lai, Zhengfeng; Xu, Mingze; Li, Shiyu; Ge, Weifeng; Dehghan, Afshin; Cao, Meng; Huang, Ping

arXiv:2505.05467 (cs)

[Submitted on 8 May 2025]

Abstract: We present StreamBridge, a simple yet effective framework that transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models to online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. Specifically, StreamBridge incorporates (1) a memory buffer combined with a round-decayed compression strategy, supporting long-context multi-turn interactions, and (2) a decoupled, lightweight activation model that can be integrated into existing Video-LLMs, enabling continuous proactive responses. To further support StreamBridge, we construct Stream-IT, a large-scale dataset tailored for streaming video understanding, featuring interleaved video-text sequences and diverse instruction formats. Extensive experiments show that StreamBridge significantly improves the streaming understanding capabilities of offline Video-LLMs across various tasks, outperforming even proprietary models such as GPT-4o and Gemini 1.5 Pro. At the same time, it achieves competitive or superior performance on standard video understanding benchmarks.
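
The abstract names a memory buffer with round-decayed compression but does not say how the compression operates. The sketch below is one possible reading, not the authors' implementation. It assumes that the buffer holds per-round frame embeddings, that the budget is counted in frames, and that older rounds are compressed first by averaging adjacent frames. The names `Round`, `MemoryBuffer`, `mean_pool_pairs`, and `max_frames` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Round:
    """One interaction round: frame embeddings (plain float vectors) plus its text."""
    frames: List[List[float]]
    text: str = ""


def mean_pool_pairs(frames: List[List[float]]) -> List[List[float]]:
    """Roughly halve the frame count by averaging adjacent pairs.

    An odd trailing frame is kept unchanged.
    """
    pooled = []
    for i in range(0, len(frames) - 1, 2):
        a, b = frames[i], frames[i + 1]
        pooled.append([(x + y) / 2.0 for x, y in zip(a, b)])
    if len(frames) % 2 == 1:
        pooled.append(frames[-1])
    return pooled


@dataclass
class MemoryBuffer:
    """Hypothetical buffer: stores rounds and keeps the total frame count bounded."""
    max_frames: int  # assumed budget, counted in frames
    rounds: List[Round] = field(default_factory=list)

    def total_frames(self) -> int:
        return sum(len(r.frames) for r in self.rounds)

    def append(self, rnd: Round) -> None:
        self.rounds.append(rnd)
        self.compress()

    def compress(self) -> None:
        # Assumption: compress the oldest round first and move toward newer
        # rounds only if the budget is still exceeded, so recent context
        # keeps the most detail. If every round is already down to a single
        # frame, the buffer may remain over budget.
        for rnd in self.rounds:
            while self.total_frames() > self.max_frames and len(rnd.frames) > 1:
                rnd.frames = mean_pool_pairs(rnd.frames)
            if self.total_frames() <= self.max_frames:
                break
```

With `max_frames=6`, appending two rounds of four frames each pools only the first round down to two frames and leaves the most recent round untouched. A real system would operate on visual token tensors rather than Python lists, and the pooling rule and budget would follow the paper rather than this sketch.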

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2505.05467 [cs.CV]
	(or arXiv:2505.05467v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.05467

Submission history

From: Haibo Wang
[v1] Thu, 8 May 2025 17:57:40 UTC (1,521 KB)
