-
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
Authors:
Ruotong Liao,
Max Erler,
Huiyu Wang,
Guangyao Zhai,
Gengyuan Zhang,
Yunpu Ma,
Volker Tresp
Abstract:
In the video-language domain, recent works in leveraging zero-shot Large Language Model-based reasoning for video understanding have become competitive challengers to previous end-to-end models. However, long video understanding presents unique challenges due to the complexity of reasoning over extended timespans, even for zero-shot LLM-based approaches. The challenge of information redundancy in long videos prompts the question of what specific information is essential for large language models (LLMs) and how to leverage them for complex spatial-temporal reasoning in long-form video analysis. We propose a framework VideoINSTA, i.e. INformative Spatial-TemporAl Reasoning for zero-shot long-form video understanding. VideoINSTA contributes (1) a zero-shot framework for long video understanding using LLMs; (2) an event-based temporal reasoning and content-based spatial reasoning approach for LLMs to reason over spatial-temporal information in videos; (3) a self-reflective information reasoning scheme balancing temporal factors based on information sufficiency and prediction confidence. Our model significantly improves the state-of-the-art on three long video question-answering benchmarks: EgoSchema, NextQA, and IntentQA, and the open question answering dataset ActivityNetQA. The code is released here: https://github.com/mayhugotong/VideoINSTA.
Submitted 4 October, 2024; v1 submitted 30 September, 2024;
originally announced September 2024.
-
Algorithmic Optimisations for Iterative Deconvolution Methods
Authors:
Martin Welk,
Martin Erler
Abstract:
We investigate possibilities to speed up iterative algorithms for non-blind image deconvolution. We focus on algorithms in which convolution with the point-spread function to be deconvolved is used in each iteration, and aim at accelerating these convolution operations as they are typically the most expensive part of the computation. We follow two approaches: First, for some practically important specific point-spread functions, algorithmically efficient sliding window or list processing techniques can be used. In some constellations this allows faster computation than via the Fourier domain. Second, as iterations progress, computation of convolutions can be restricted to subsets of pixels. For moderate thinning rates this can be done with almost no impact on the reconstruction quality. Both approaches are demonstrated in the context of Richardson-Lucy deconvolution but are not restricted to this method.
Submitted 26 April, 2013;
originally announced April 2013.
-
Fast and Robust Linear Motion Deblurring
Authors:
Martin Welk,
Patrik Raudaschl,
Thomas Schwarzbauer,
Martin Erler,
Martin Läuter
Abstract:
We investigate efficient algorithmic realisations for robust deconvolution of grey-value images with known space-invariant point-spread function, with emphasis on 1D motion blur scenarios. The goal is to make deconvolution suitable as preprocessing step in automated image processing environments with tight time constraints. Candidate deconvolution methods are selected for their restoration quality, robustness and efficiency. Evaluation of restoration quality and robustness on synthetic and real-world test images leads us to focus on a combination of Wiener filtering with few iterations of robust and regularised Richardson-Lucy deconvolution. We discuss algorithmic optimisations for specific scenarios. In the case of uniform linear motion blur in coordinate direction, it is possible to achieve real-time performance (less than 50 ms) in single-threaded CPU computation on images of $256\times256$ pixels. For more general space-invariant blur settings, still favourable computation times are obtained. Exemplary parallel implementations demonstrate that the proposed method also achieves real-time performance for general 1D motion blurs in a multi-threaded CPU setting, and for general 2D blurs on a GPU.
Submitted 10 December, 2012;
originally announced December 2012.