Search | arXiv e-print repository

LTX-Video: Realtime Video Latent Diffusion

Authors: Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, Ofir Bibi

Abstract: We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic approach to video generation by seamlessly integrating the responsibilities of the Video-VAE and the denoising transformer. Unlike existing methods, which treat these components as independent, LTX-Video aims to optimize their interaction for improved efficiency and quality. At its core is a carefully designed Video-VAE that achieves a high compression ratio of 1:192, with spatiotemporal downscaling of 32 x 32 x 8 pixels per token, enabled by relocating the patchifying operation from the transformer's input to the VAE's input. Operating in this highly compressed latent space enables the transformer to efficiently perform full spatiotemporal self-attention, which is essential for generating high-resolution videos with temporal consistency. However, the high compression inherently limits the representation of fine details. To address this, our VAE decoder is tasked with both latent-to-pixel conversion and the final denoising step, producing the clean result directly in pixel space. This approach preserves the ability to generate fine details without incurring the runtime cost of a separate upsampling module. Our model supports diverse use cases, including text-to-video and image-to-video generation, with both capabilities trained simultaneously. It achieves faster-than-real-time generation, producing 5 seconds of 24 fps video at 768x512 resolution in just 2 seconds on an Nvidia H100 GPU, outperforming all existing models of similar scale. The source code and pre-trained models are publicly available, setting a new benchmark for accessible and scalable video generation.

Submitted 30 December, 2024; originally announced January 2025.
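
Note: the figures quoted in the LTX-Video abstract above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper's code. It assumes RGB input (3 channels) and a latent of 128 channels per token; the channel count is not stated in the abstract and is chosen here because it reproduces the quoted 1:192 ratio. It also assumes exactly 5 x 24 = 120 frames and ignores any special handling of the first frame. The function name `ltx_back_of_envelope` is hypothetical.

```python
def ltx_back_of_envelope(width=768, height=512, seconds=5, fps=24,
                         patch=(32, 32, 8), rgb_channels=3,
                         latent_channels=128, generation_seconds=2):
    """Rough arithmetic for the numbers quoted in the abstract.

    latent_channels=128 is an ASSUMPTION (not given in the abstract).
    """
    patch_w, patch_h, patch_t = patch
    # Pixels covered by one latent token: 32 * 32 * 8.
    pixels_per_token = patch_w * patch_h * patch_t
    # Raw input values per token (pixels times colour channels).
    values_per_token = pixels_per_token * rgb_channels
    # Compression ratio: input values per latent value.
    compression = values_per_token / latent_channels
    # Token count for the whole clip under the assumed frame count.
    frames = seconds * fps
    tokens = (width // patch_w) * (height // patch_h) * (frames // patch_t)
    # Seconds of video produced per second of compute.
    realtime_factor = seconds / generation_seconds
    return pixels_per_token, compression, tokens, realtime_factor


pixels_per_token, compression, tokens, realtime_factor = ltx_back_of_envelope()
print(pixels_per_token, compression, tokens, realtime_factor)
```

Under these assumptions, each token covers 8192 pixels, the compression ratio comes out at 192, the 768x512 clip maps to a few thousand tokens (which is what makes full spatiotemporal self-attention affordable), and generation runs at 2.5 times real time.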

arXiv:2202.00541 [pdf, ps, other]

doi 10.1016/B978-0-32-390504-6.00007-3

Transport and Optimal Control of Vaccination Dynamics for COVID-19

Authors: Mohamed Abdelaziz Zaitri, Mohand Ouamer Bibi, Delfim F. M. Torres

Abstract: We develop a mathematical model for transferring the vaccine BNT162b2 based on the heat diffusion equation. Then, we apply optimal control theory to the proposed generalized SEIR model. We introduce vaccination for the susceptible population to control the spread of the COVID-19 epidemic. For this, we use the Pontryagin minimum principle to find the necessary optimality conditions for the optimal control. The optimal control problem and the heat diffusion equation are solved numerically. Finally, several simulations are done to study and predict the spread of the COVID-19 epidemic in Italy. In particular, we compare the model in the presence and absence of vaccination.

Submitted 21 April, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

Comments: This is a preprint whose final form is published by Elsevier at [https://doi.org/10.1016/B978-0-32-390504-6.00007-3]

MSC Class: 35K05; 49K15; 92-10

arXiv:2110.08893 [pdf, other]

Temporally stable video segmentation without video annotations

Authors: Aharon Azulay, Tavi Halperin, Orestis Vantzos, Nadav Borenstein, Ofir Bibi

Abstract: Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmented videos appear more stable in practice, we verify that the consistency measure is well correlated with human judgement via a user study. Training a new multi-input multi-output decoder using this measure as a loss, together with a technique for refining current image segmentation datasets and a temporal weighted-guided filter, we observe stability improvements in the generated segmented videos with minimal loss of accuracy.

Submitted 17 March, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3449-3458. 2022

arXiv:2107.11849 [pdf, ps, other]

doi 10.48129/kjs.splcov.13961

Optimal Control to Limit the Spread of COVID-19 in Italy

Authors: Mohamed Abdelaziz Zaitri, Mohand Ouamer Bibi, Delfim F. M. Torres

Abstract: We apply optimal control theory to a generalized SEIR-type model. The proposed system has three controls, representing social distancing, preventive means, and treatment measures to combat the spread of the COVID-19 pandemic. We analyze this optimal control problem with respect to real data transmission in Italy. Our results show the appropriateness of the model, in particular with respect to the number of quarantined/hospitalized (confirmed and infected) and recovered individuals. Considering the Pontryagin controls, we show how in a perfect world one could have drastically diminished the number of susceptible, exposed, infected, quarantined/hospitalized, and deceased individuals, by increasing the population of insusceptible/protected.

Submitted 25 July, 2021; originally announced July 2021.

Comments: This is a preprint of a paper whose final and definite form is published by 'Kuwait Journal of Science' (KJS), ISSN 2307-4108 (print), ISSN 2307-4116 (online), available at [https://journalskuwait.org/kjs]

MSC Class: 49K15; 92D30

Journal ref: Kuwait J. Sci., Special Issue (2021), 1--14

arXiv:2105.09374 [pdf]

doi 10.1145/3450626.3459935

Endless Loops: Detecting and Animating Periodic Patterns in Still Images

Authors: Tavi Halperin, Hanit Hakim, Orestis Vantzos, Gershon Hochman, Netai Benaim, Lior Sassy, Michael Kupchik, Ofir Bibi, Ohad Fried

Abstract: We present an algorithm for producing a seamless animated loop from a single image. The algorithm detects periodic structures, such as the windows of a building or the steps of a staircase, and generates a non-trivial displacement vector field that maps each segment of the structure onto a neighboring segment along a user- or auto-selected main direction of motion. This displacement field is used, together with suitable temporal and spatial smoothing, to warp the image and produce the frames of a continuous animation loop. Our cinemagraphs are created in under a second on a mobile device. Over 140,000 users downloaded our app and exported over 350,000 cinemagraphs. Moreover, we conducted two user studies that show that users prefer our method for creating surreal and structured cinemagraphs compared to more manual approaches and compared to previous methods.

Submitted 19 May, 2021; originally announced May 2021.

Comments: SIGGRAPH 2021. Project page: https://pub.res.lightricks.com/endless-loops/ . Video: https://youtu.be/8ZYUvxWuD2Y

Journal ref: ACM Trans. Graph., Vol. 40, No. 4, Article 142. Publication date: August 2021

arXiv:1903.02582 [pdf, other]

Clear Skies Ahead: Towards Real-Time Automatic Sky Replacement in Video

Authors: Tavi Halperin, Harel Cain, Ofir Bibi, Michael Werman

Abstract: Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies, a poorly exposed sky, or simply suffer from an uninteresting or plain looking sky. Professionals may edit these videos using advanced and time-consuming tools unavailable to most users, to replace the sky with a more expressive or imaginative sky. In this work, we propose an algorithm for automatic replacement of the sky region in a video with a different sky, providing nonprofessional users with a simple yet efficient tool to seamlessly replace the sky. The method is fast, achieving close to real-time performance on mobile devices and the user's involvement can remain as limited as simply selecting the replacement sky.

Submitted 6 March, 2019; originally announced March 2019.

Comments: Eurographics 2019. Supplementary video: https://youtu.be/1uZ46YzX-pI

Showing 1–6 of 6 results for author: Bibi, O