Showing 1–31 of 31 results for author: Gallo, O

  1. arXiv:2502.13078  [pdf, other]

    cs.CV

    L4P: Low-Level 4D Vision Perception Unified

    Authors: Abhishek Badki, Hang Su, Bowen Wen, Orazio Gallo

    Abstract: The spatio-temporal relationship between the pixels of a video carries critical information for low-level 4D perception tasks. A single model that reasons about it should be able to solve several such tasks well. Yet, most state-of-the-art methods rely on architectures specialized for the task at hand. We present L4P, a feedforward, general-purpose architecture that solves low-level 4D perception…

    Submitted 25 April, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  2. arXiv:2501.10357  [pdf, other]

    cs.CV

    Zero-Shot Monocular Scene Flow Estimation in the Wild

    Authors: Yiqing Liang, Abhishek Badki, Hang Su, James Tompkin, Orazio Gallo

    Abstract: Large models have shown generalization across datasets for many low-level vision tasks, like depth estimation, but no such general models exist for scene flow. Even though scene flow has wide potential use, it is not used in practice because current predictive models do not generalize well. We identify three key challenges and propose solutions for each. First, we create a method that jointly esti…

    Submitted 19 January, 2025; v1 submitted 17 January, 2025; originally announced January 2025.

    Comments: Project Website: https://research.nvidia.com/labs/lpr/zero_msf//

  3. arXiv:2501.09898  [pdf, other]

    cs.CV cs.LG cs.RO

    FoundationStereo: Zero-Shot Stereo Matching

    Authors: Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield

    Abstract: Tremendous progress has been made in deep stereo matching to excel on benchmark datasets through per-domain fine-tuning. However, achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching. We introduce FoundationStereo, a foundation model for stereo depth estimation designed to achieve strong zero-shot gener…

    Submitted 3 April, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

    Comments: CVPR 2025

  4. arXiv:2410.12074  [pdf, other]

    cs.CV

    nvTorchCam: An Open-source Library for Camera-Agnostic Differentiable Geometric Vision

    Authors: Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, Orazio Gallo

    Abstract: We introduce nvTorchCam, an open-source library under the Apache 2.0 license, designed to make deep learning algorithms camera model-independent. nvTorchCam abstracts critical camera operations such as projection and unprojection, allowing developers to implement algorithms once and apply them across diverse camera models--including pinhole, fisheye, and 360 equirectangular panoramas, which are co…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Source code and installation instructions are available at https://github.com/NVlabs/nvTorchCam

  5. arXiv:2410.03387  [pdf, ps, other]

    econ.TH

    Anonymity and strategy-proofness on a domain of single-peaked and single-dipped preferences

    Authors: Oihane Gallo

    Abstract: We analyze the problem of locating a public facility on a line in a society where agents have either single-peaked or single-dipped preferences. We consider the domain analyzed in Alcalde-Unzu et al. (2024), where the type of preference of each agent is public information, but the location of her peak/dip as well as the rest of the preference are unknown. We characterize all strategy-proof and typ…

    Submitted 4 October, 2024; originally announced October 2024.

  6. arXiv:2401.13786  [pdf, other]

    cs.CV

    FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization

    Authors: Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, Orazio Gallo

    Abstract: Wide field-of-view (FoV) cameras efficiently capture large portions of the scene, which makes them attractive in multiple domains, such as automotive and robotics. For such applications, estimating depth from multiple images is a critical task, and therefore, a large amount of ground truth (GT) data is available. Unfortunately, most of the GT data is for pinhole cameras, making it impossible to pr…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 3DV 2024 (Oral); Project Website: https://research.nvidia.com/labs/lpr/fova-depth/

  7. arXiv:2311.03950  [pdf, ps, other]

    econ.TH

    Stable partitions for proportional generalized claims problems

    Authors: Oihane Gallo, Bettina Klaus

    Abstract: We consider a set of agents who have claims on an endowment that is not large enough to cover all claims. Agents can form coalitions but a minimal coalition size $θ$ is required to have positive coalitional funding that is proportional to the sum of the claims of its members. We analyze the structure of stable partitions when coalition members use well-behaved rules to allocate coalitional endowme…

    Submitted 29 August, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

  8. arXiv:2306.00200  [pdf, other]

    cs.CV

    Zero-shot Pose Transfer for Unrigged Stylized 3D Characters

    Authors: Jiashun Wang, Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Xiaolong Wang, Jan Kautz

    Abstract: Transferring the pose of a reference avatar to stylized 3D characters of various shapes is a fundamental task in computer graphics. Existing methods either require the stylized characters to be rigged, or they use the stylized character in the desired pose as ground truth at training. We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training,…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: CVPR 2023

  9. arXiv:2305.03713  [pdf, other]

    cs.CV

    Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos

    Authors: Ekta Prashnani, Koki Nagano, Shalini De Mello, David Luebke, Orazio Gallo

    Abstract: Modern avatar generators allow anyone to synthesize photorealistic real-time talking avatars, ushering in a new era of avatar-based human communication, such as with immersive AR/VR interactions or videoconferencing with limited bandwidths. Their safe adoption, however, requires a mechanism to verify if the rendered avatar is trustworthy: does it use the appearance of an individual without their c…

    Submitted 4 August, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: 26 pages, 8 figures

  10. arXiv:2303.05781  [pdf, ps, other]

    econ.TH

    Strategy-proofness with single-peaked and single-dipped preferences

    Authors: Jorge Alcalde-Unzu, Oihane Gallo, Marc Vorsatz

    Abstract: We analyze the problem of locating a public facility in a domain of single-peaked and single-dipped preferences when the social planner knows the type of preference (single-peaked or single-dipped) of each agent. Our main result characterizes all strategy-proof rules and shows that they can be decomposed into two steps. In the first step, the agents with single-peaked preferences are asked about t…

    Submitted 30 March, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

  11. arXiv:2302.07618  [pdf, other]

    econ.TH

    Solidarity to achieve stability

    Authors: Jorge Alcalde-Unzu, Oihane Gallo, Elena Inarra, Juan D. Moreno-Ternero

    Abstract: Agents may form coalitions. Each coalition shares its endowment among its agents by applying a sharing rule. The sharing rule induces a coalition formation problem by assuming that agents rank coalitions according to the allocation they obtain in the corresponding sharing problem. We characterize the sharing rules that induce a class of stable coalition formation problems as those that satisfy a n…

    Submitted 5 October, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

  12. arXiv:2112.11347  [pdf, other]

    cs.CV

    Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects

    Authors: Atsuhiro Noguchi, Umar Iqbal, Jonathan Tremblay, Tatsuya Harada, Orazio Gallo

    Abstract: Rendering articulated objects while controlling their poses is critical to applications such as virtual reality or animation for movies. Manipulating the pose of an object, however, requires the understanding of its underlying structure, that is, its joints and how they interact with each other. Unfortunately, assuming the structure to be known, as existing methods do, precludes the ability to wor…

    Submitted 6 April, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: CVPR 2022, 16 pages, Project page: https://nvlabs.github.io/watch-it-move

  13. arXiv:2112.07945  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Efficient Geometry-aware 3D Generative Adversarial Networks

    Authors: Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, Gordon Wetzstein

    Abstract: Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape…

    Submitted 27 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Project page: https://matthew-a-chan.github.io/EG3D

  14. Real-time ground filtering algorithm of cloud points acquired using Terrestrial Laser Scanner (TLS)

    Authors: Nelson Diaz, Omar Gallo, Jhon Caceres, Hernan Porras

    Abstract: 3D modeling based on point clouds requires ground-filtering algorithms that separate ground from non-ground objects. This study presents two ground filtering algorithms. The first one is based on normal vectors. It has two variants depending on the procedure to compute the k-nearest neighbors. The second algorithm is based on transforming the cloud points into a voxel structure. To evaluate them,…

    Submitted 22 November, 2021; originally announced November 2021.

    Comments: 25 pages, 7 figures

  15. arXiv:2105.05994  [pdf, other]

    cs.CV

    Neural Trajectory Fields for Dynamic Novel View Synthesis

    Authors: Chaoyang Wang, Ben Eckart, Simon Lucey, Orazio Gallo

    Abstract: Recent approaches to render photorealistic views from a limited set of photographs have pushed the boundaries of our interactions with pictures of static scenes. The ability to recreate moments, that is, time-varying sequences, is perhaps an even more interesting scenario, but it remains largely unsolved. We introduce DCT-NeRF, a coordinate-based neural representation for dynamic scenes. DCT-NeRF le…

    Submitted 12 May, 2021; originally announced May 2021.

  16. arXiv:2104.08038  [pdf, other]

    cs.CV cs.LG

    Noise-Aware Video Saliency Prediction

    Authors: Ekta Prashnani, Orazio Gallo, Joohwan Kim, Josef Spjut, Pradeep Sen, Iuri Frosio

    Abstract: We tackle the problem of predicting saliency maps for videos of dynamic scenes. We note that the accuracy of the maps reconstructed from the gaze data of a fixed number of observers varies with the frame, as it depends on the content of the scene. This issue is particularly pressing when a limited number of observers are available. In such cases, directly minimizing the discrepancy between the pre…

    Submitted 22 November, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: 10 pages, 3 figures, 7 tables

    Journal ref: British Machine Vision Conference (BMVC) 2021

  17. arXiv:2101.04777  [pdf, other]

    cs.CV cs.RO

    Binary TTC: A Temporal Geofence for Autonomous Navigation

    Authors: Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen

    Abstract: Time-to-contact (TTC), the time for an object to collide with the observer's plane, is a powerful tool for path planning: it is potentially more informative than the depth, velocity, and acceleration of objects in the scene -- even for humans. TTC presents several advantages, including requiring only a monocular, uncalibrated camera. However, regressing TTC for each pixel is not straightforward, a…

    Submitted 28 April, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: To be presented at CVPR 2021

  18. arXiv:2008.09106  [pdf, other]

    cs.CV cs.LG eess.IV

    Generative View Synthesis: From Single-view Semantics to Novel-view Images

    Authors: Tewodros Habtegebrial, Varun Jampani, Orazio Gallo, Didier Stricker

    Abstract: Content creation, central to applications such as virtual reality, can be tedious and time-consuming. Recent image synthesis methods simplify this task by offering tools to generate new views from as little as a single input image, or by converting a semantic map into a photorealistic image. We propose to push the envelope further, and introduce Generative View Synthesis (GVS), which can synthes…

    Submitted 2 October, 2020; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted at NeurIPS 2020

  19. arXiv:2005.07274  [pdf, other]

    cs.CV cs.RO

    Bi3D: Stereo Depth Estimation via Binary Classifications

    Authors: Abhishek Badki, Alejandro Troccoli, Kihwan Kim, Jan Kautz, Pradeep Sen, Orazio Gallo

    Abstract: Stereo-based depth estimation is a cornerstone of computer vision, with state-of-the-art methods delivering accurate results in real time. For several applications such as autonomous navigation, however, it may be useful to trade accuracy for lower latency. We present Bi3D, a method that estimates depth via a series of binary classifications. Rather than testing if objects are at a particular dept…

    Submitted 1 June, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: To be presented at CVPR 2020

  20. arXiv:2004.01294  [pdf, other]

    cs.CV

    Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera

    Authors: Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, Jan Kautz

    Abstract: This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. A key challenge for the novel view synthesis arises from dynamic scene reconstruction where epipolar geometry does not apply to the local motion of dynamic contents. To address this challenge, we propose to combine the depth from single view (DSV) and the depth fr…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: This paper is accepted to CVPR 2020

  21. arXiv:2001.01744  [pdf, other]

    cs.CV

    Meshlet Priors for 3D Mesh Reconstruction

    Authors: Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen

    Abstract: Estimating a mesh from an unordered set of sparse, noisy 3D points is a challenging problem that requires carefully selected priors. Existing hand-crafted priors, such as smoothness regularizers, impose an undesirable trade-off between attenuating noise and preserving local detail. Recent deep-learning approaches produce impressive results by learning priors directly from the data. However, the pr…

    Submitted 1 June, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

    Comments: To be presented at CVPR 2020

  22. arXiv:1907.13622  [pdf, other]

    cs.CV

    Video Stitching for Linear Camera Arrays

    Authors: Wei-Sheng Lai, Orazio Gallo, Jinwei Gu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

    Abstract: Despite the long history of image and video stitching research, existing academic and commercial solutions still produce strong artifacts. In this work, we propose a wide-baseline video stitching algorithm for linear camera arrays that is temporally stable and tolerant to strong parallax. Our key insight is that stitching can be cast as a problem of learning a smooth spatial interpolation between…

    Submitted 31 July, 2019; originally announced July 2019.

    Comments: This work is accepted in BMVC 2019. Project website: http://vllab.ucmerced.edu/wlai24/video_stitching/

  23. arXiv:1904.05373  [pdf, other]

    cs.CV cs.GR cs.LG

    Pixel-Adaptive Convolutional Neural Networks

    Authors: Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, Jan Kautz

    Abstract: Convolutions are the fundamental building block of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it also is a major limitation, as it makes convolutions content agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied w…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: CVPR 2019. Video introduction: https://youtu.be/gsQZbHuR64o

  24. arXiv:1812.04777  [pdf, other]

    cs.CV

    Extreme View Synthesis

    Authors: Inchang Choi, Orazio Gallo, Alejandro Troccoli, Min H. Kim, Jan Kautz

    Abstract: We present Extreme View Synthesis, a solution for novel view extrapolation that works even when the number of input images is small--as few as two. In this context, occlusions and depth uncertainty are two of the most pressing issues, and worsen as the degree of extrapolation increases. We follow the traditional paradigm of performing depth-based warping and refinement, with a few key improvements…

    Submitted 29 August, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

    Comments: Accepted as an oral presentation at IEEE ICCV 2019

  25. arXiv:1810.10066  [pdf, other]

    cs.CV

    A Fusion Approach for Multi-Frame Optical Flow Estimation

    Authors: Zhile Ren, Orazio Gallo, Deqing Sun, Ming-Hsuan Yang, Erik B. Sudderth, Jan Kautz

    Abstract: To date, top-performing optical flow estimation methods only take pairs of consecutive frames into account. While elegant and appealing, the idea of using more than two frames has not yet produced state-of-the-art results. We present a simple, yet effective fusion approach for multi-frame optical flow that benefits from longer-term temporal cues. Our method first warps the optical flow from previo…

    Submitted 29 November, 2018; v1 submitted 23 October, 2018; originally announced October 2018.

    Comments: Work accepted at IEEE Winter Conference on Applications of Computer Vision (WACV 2019)

  26. arXiv:1807.10376  [pdf, other]

    cs.CV

    Tackling 3D ToF Artifacts Through Learning and the FLAT Dataset

    Authors: Qi Guo, Iuri Frosio, Orazio Gallo, Todd Zickler, Jan Kautz

    Abstract: Scene motion, multiple reflections, and sensor noise introduce artifacts in the depth reconstruction performed by time-of-flight cameras. We propose a two-stage, deep-learning approach to address all of these sources of artifacts simultaneously. We also introduce FLAT, a synthetic dataset of 2000 ToF measurements that captures all of these nonidealities and allows simulating different camera hard…

    Submitted 26 July, 2018; originally announced July 2018.

    Comments: ECCV 2018

  27. arXiv:1801.05117  [pdf, other]

    cs.CV

    Reblur2Deblur: Deblurring Videos via Self-Supervised Learning

    Authors: Huaijin Chen, Jinwei Gu, Orazio Gallo, Ming-Yu Liu, Ashok Veeraraghavan, Jan Kautz

    Abstract: Motion blur is a fundamental problem in computer vision as it impacts image quality and hinders inference. Traditional deblurring algorithms leverage the physics of the image formation model and use hand-crafted priors: they usually produce results that better reflect the underlying scene, but present artifacts. Recent learning-based methods implicitly extract the distribution of natural images di…

    Submitted 16 January, 2018; originally announced January 2018.

  28. arXiv:1712.02099  [pdf, other]

    cs.CV

    Separating Reflection and Transmission Images in the Wild

    Authors: Patrick Wieschollek, Orazio Gallo, Jinwei Gu, Jan Kautz

    Abstract: The reflections caused by common semi-reflectors, such as glass windows, can impact the performance of computer vision algorithms. State-of-the-art methods can remove reflections on synthetic data and in controlled scenarios. However, they are based on strong assumptions and do not generalize well to real-world images. Contrary to a common misconception, real-world images are challenging even when…

    Submitted 16 August, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

    Comments: Accepted at ECCV 2018

  29. arXiv:1612.00986  [pdf, other]

    cs.CV

    Deep Learning with Energy-efficient Binary Gradient Cameras

    Authors: Suren Jayasuriya, Orazio Gallo, Jinwei Gu, Jan Kautz

    Abstract: Power consumption is a critical factor for the deployment of embedded computer vision systems. We explore the use of computational cameras that directly output binary gradient images to reduce the portion of the power consumption allocated to image sensing. We survey the accuracy of binary gradient cameras on a number of computer vision tasks using deep learning. These include object recognition,…

    Submitted 3 December, 2016; originally announced December 2016.

  30. arXiv:1511.08861  [pdf, other]

    cs.CV

    Loss Functions for Neural Networks for Image Processing

    Authors: Hang Zhao, Orazio Gallo, Iuri Frosio, Jan Kautz

    Abstract: Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems. The impact of the loss layer of neural networks, however, has not received much attention in the context of image processing: the default and virtually only choice is L2. In this paper, we bring attention to alternative choices for…

    Submitted 20 April, 2018; v1 submitted 27 November, 2015; originally announced November 2015.

    Comments: This paper was published in IEEE Transactions on Computational Imaging on December 23, 2016

  31. arXiv:1504.01441  [pdf, other]

    cs.CV

    Locally Non-rigid Registration for Mobile HDR Photography

    Authors: Orazio Gallo, Alejandro Troccoli, Jun Hu, Kari Pulli, Jan Kautz

    Abstract: Image registration for stack-based HDR photography is challenging. If not properly accounted for, camera motion and scene changes result in artifacts in the composite image. Unfortunately, existing methods to address this problem are either accurate, but too slow for mobile devices, or fast, but prone to failing. We propose a method that fills this void: our approach is extremely fast---under 700m…

    Submitted 4 May, 2015; v1 submitted 6 April, 2015; originally announced April 2015.