-
Fine-Grained Spatially Varying Material Selection in Images
Authors:
Julia Guerrero-Viu,
Michael Fischer,
Iliyan Georgiev,
Elena Garces,
Diego Gutierrez,
Belen Masia,
Valentin Deschaintre
Abstract:
Selection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposi…
▽ More
Selection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than prior methods. Furthermore, we enable selection at two levels: texture and subtexture, leveraging a new two-level material selection (DuMaS) dataset which includes dense annotations for over 800,000 synthetic images, both on the texture and subtexture levels.
△ Less
Submitted 11 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Crossing Borders Without Crossing Boundaries: How Sociolinguistic Awareness Can Optimize User Engagement with Localized Spanish AI Models Across Hispanophone Countries
Authors:
Martin Capdevila,
Esteban Villa Turek,
Ellen Karina Chumbe Fernandez,
Luis Felipe Polo Galvez,
Luis Cadavid,
Andrea Marroquin,
Rebeca Vargas Quesada,
Johanna Crew,
Nicole Vallejo Galarraga,
Christopher Rodriguez,
Diego Gutierrez,
Radhi Datla
Abstract:
Large language models are, by definition, based on language. In an effort to underscore the critical need for regional localized models, this paper examines primary differences between variants of written Spanish across Latin America and Spain, with an in-depth sociocultural and linguistic contextualization therein. We argue that these differences effectively constitute significant gaps in the quo…
▽ More
Large language models are, by definition, based on language. In an effort to underscore the critical need for regional localized models, this paper examines primary differences between variants of written Spanish across Latin America and Spain, with an in-depth sociocultural and linguistic contextualization therein. We argue that these differences effectively constitute significant gaps in the quotidian use of Spanish among dialectal groups by creating sociolinguistic dissonances, to the extent that locale-sensitive AI models would play a pivotal role in bridging these divides. In doing so, this approach informs better and more efficient localization strategies that also serve to more adequately meet inclusivity goals, while securing sustainable active daily user growth in a major low-risk investment geographic area. Therefore, implementing at least the proposed five sub variants of Spanish addresses two lines of action: to foment user trust and reliance on AI language models while also demonstrating a level of cultural, historical, and sociolinguistic awareness that reflects positively on any internationalization strategy.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
PreciseCam: Precise Camera Control for Text-to-Image Generation
Authors:
Edurne Bernal-Berdun,
Ana Serrano,
Belen Masia,
Matheus Gadelha,
Yannick Hold-Geoffroy,
Xin Sun,
Diego Gutierrez
Abstract:
Images as an artistic medium often rely on specific camera angles and lens distortions to convey ideas or emotions; however, such precise control is missing in current text-to-image models. We propose an efficient and general solution that allows precise control over the camera when generating both photographic and artistic images. Unlike prior methods that rely on predefined shots, we rely solely…
▽ More
Images as an artistic medium often rely on specific camera angles and lens distortions to convey ideas or emotions; however, such precise control is missing in current text-to-image models. We propose an efficient and general solution that allows precise control over the camera when generating both photographic and artistic images. Unlike prior methods that rely on predefined shots, we rely solely on four simple extrinsic and intrinsic camera parameters, removing the need for pre-existing geometry, reference 3D objects, and multi-view data. We also present a novel dataset with more than 57,000 images, along with their text prompts and ground-truth camera parameters. Our evaluation shows precise camera control in text-to-image generation, surpassing traditional prompt engineering approaches. Our data, model, and code are publicly available at https://graphics.unizar.es/projects/PreciseCam2024.
△ Less
Submitted 22 January, 2025;
originally announced January 2025.
-
Polarimetric BSSRDF Acquisition of Dynamic Faces
Authors:
Hyunho Ha,
Inseung Hwang,
Nestor Monzon,
Jaemin Cho,
Donggun Kim,
Seung-Hwan Baek,
Adolfo Muñoz,
Diego Gutierrez,
Min H. Kim
Abstract:
Acquisition and modeling of polarized light reflection and scattering help reveal the shape, structure, and physical characteristics of an object, which is increasingly important in computer graphics. However, current polarimetric acquisition systems are limited to static and opaque objects. Human faces, on the other hand, present a particularly difficult challenge, given their complex structure a…
▽ More
Acquisition and modeling of polarized light reflection and scattering help reveal the shape, structure, and physical characteristics of an object, which is increasingly important in computer graphics. However, current polarimetric acquisition systems are limited to static and opaque objects. Human faces, on the other hand, present a particularly difficult challenge, given their complex structure and reflectance properties, the strong presence of spatially-varying subsurface scattering, and their dynamic nature. We present a new polarimetric acquisition method for dynamic human faces, which focuses on capturing spatially varying appearance and precise geometry, across a wide spectrum of skin tones and facial expressions. It includes both single and heterogeneous subsurface scattering, index of refraction, and specular roughness and intensity, among other parameters, while revealing biophysically-based components such as inner- and outer-layer hemoglobin, eumelanin and pheomelanin. Our method leverages such components' unique multispectral absorption profiles to quantify their concentrations, which in turn inform our model about the complex interactions occurring within the skin layers. To our knowledge, our work is the first to simultaneously acquire polarimetric and spectral reflectance information alongside biophysically-based skin parameters and geometry of dynamic human faces. Moreover, our polarimetric skin model integrates seamlessly into various rendering pipelines.
△ Less
Submitted 29 December, 2024;
originally announced January 2025.
-
Towards an Autonomous Surface Vehicle Prototype for Artificial Intelligence Applications of Water Quality Monitoring
Authors:
Luis Miguel Díaz,
Samuel Yanes Luis,
Alejandro Mendoza Barrionuevo,
Dame Seck Diop,
Manuel Perales,
Alejandro Casado,
Sergio Toral,
Daniel Gutiérrez
Abstract:
The use of Autonomous Surface Vehicles, equipped with water quality sensors and artificial vision systems, allows for a smart and adaptive deployment in water resources environmental monitoring. This paper presents a real implementation of a vehicle prototype that to address the use of Artificial Intelligence algorithms and enhanced sensing techniques for water quality monitoring. The vehicle is f…
▽ More
The use of Autonomous Surface Vehicles, equipped with water quality sensors and artificial vision systems, allows for a smart and adaptive deployment in water resources environmental monitoring. This paper presents a real implementation of a vehicle prototype that to address the use of Artificial Intelligence algorithms and enhanced sensing techniques for water quality monitoring. The vehicle is fully equipped with high-quality sensors to measure water quality parameters and water depth. Furthermore, by means of a stereo-camera, it also can detect and locate macro-plastics in real environments by means of deep visual models, such as YOLOv5. In this paper, experimental results, carried out in Lago Mayor (Sevilla), has been presented as proof of the capabilities of the proposed architecture. The overall system, and the early results obtained, are expected to provide a solid example of a real platform useful for the water resource monitoring task, and to serve as a real case scenario for deploying Artificial Intelligence algorithms, such as path planning, artificial vision, etc.
△ Less
Submitted 8 October, 2024;
originally announced October 2024.
-
Best Practices for Multi-Fidelity Bayesian Optimization in Materials and Molecular Research
Authors:
Víctor Sabanza-Gil,
Riccardo Barbano,
Daniel Pacheco Gutiérrez,
Jeremy S. Luterbacher,
José Miguel Hernández-Lobato,
Philippe Schwaller,
Loïc Roch
Abstract:
Multi-fidelity Bayesian Optimization (MFBO) is a promising framework to speed up materials and molecular discovery as sources of information of different accuracies are at hand at increasing cost. Despite its potential use in chemical tasks, there is a lack of systematic evaluation of the many parameters playing a role in MFBO. In this work, we provide guidelines and recommendations to decide when…
▽ More
Multi-fidelity Bayesian Optimization (MFBO) is a promising framework to speed up materials and molecular discovery as sources of information of different accuracies are at hand at increasing cost. Despite its potential use in chemical tasks, there is a lack of systematic evaluation of the many parameters playing a role in MFBO. In this work, we provide guidelines and recommendations to decide when to use MFBO in experimental settings. We investigate MFBO methods applied to molecules and materials problems. First, we test two different families of acquisition functions in two synthetic problems and study the effect of the informativeness and cost of the approximate function. We use our implementation and guidelines to benchmark three real discovery problems and compare them against their single-fidelity counterparts. Our results may help guide future efforts to implement MFBO as a routine tool in the chemical sciences.
△ Less
Submitted 1 June, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
TexSliders: Diffusion-Based Texture Editing in CLIP Space
Authors:
Julia Guerrero-Viu,
Milos Hasan,
Arthur Roullier,
Midhun Harikumar,
Yiwei Hu,
Paul Guerrero,
Diego Gutierrez,
Belen Masia,
Valentin Deschaintre
Abstract:
Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show…
▽ More
Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Automating Personalized Parsons Problems with Customized Contexts and Concepts
Authors:
Andre del Carpio Gutierrez,
Paul Denny,
Andrew Luxton-Reilly
Abstract:
Parsons problems provide useful scaffolding for introductory programming students learning to write code. However, generating large numbers of high-quality Parsons problems that appeal to the diverse range of interests in a typical introductory course is a significant challenge for educators. Large language models (LLMs) may offer a solution, by allowing students to produce on-demand Parsons probl…
▽ More
Parsons problems provide useful scaffolding for introductory programming students learning to write code. However, generating large numbers of high-quality Parsons problems that appeal to the diverse range of interests in a typical introductory course is a significant challenge for educators. Large language models (LLMs) may offer a solution, by allowing students to produce on-demand Parsons problems for topics covering the breadth of the introductory programming curriculum, and targeting thematic contexts that align with their personal interests. In this paper, we introduce PuzzleMakerPy, an educational tool that uses an LLM to generate unlimited contextualized drag-and-drop programming exercises in the form of Parsons Problems, which introductory programmers can use as a supplemental learning resource. We evaluated PuzzleMakerPy by deploying it in a large introductory programming course, and found that the ability to personalize the contextual framing used in problem descriptions was highly engaging for students, and being able to customize the programming topics was reported as being useful for their learning.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
Predicting Perceived Gloss: Do Weak Labels Suffice?
Authors:
Julia Guerrero-Viu,
J. Daniel Subias,
Ana Serrano,
Katherine R. Storrs,
Roland W. Fleming,
Belen Masia,
Diego Gutierrez
Abstract:
Estimating perceptual attributes of materials directly from images is a challenging task due to their complex, not fully-understood interactions with external factors, such as geometry and lighting. Supervised deep learning models have recently been shown to outperform traditional approaches, but rely on large datasets of human-annotated images for accurate perception predictions. Obtaining reliab…
▽ More
Estimating perceptual attributes of materials directly from images is a challenging task due to their complex, not fully-understood interactions with external factors, such as geometry and lighting. Supervised deep learning models have recently been shown to outperform traditional approaches, but rely on large datasets of human-annotated images for accurate perception predictions. Obtaining reliable annotations is a costly endeavor, aggravated by the limited ability of these models to generalise to different aspects of appearance. In this work, we show how a much smaller set of human annotations ("strong labels") can be effectively augmented with automatically derived "weak labels" in the context of learning a low-dimensional image-computable gloss metric. We evaluate three alternative weak labels for predicting human gloss perception from limited annotated data. Incorporating weak labels enhances our gloss prediction beyond the current state of the art. Moreover, it enables a substantial reduction in human annotation costs without sacrificing accuracy, whether working with rendered images or real photographs.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Self-Calibrating, Fully Differentiable NLOS Inverse Rendering
Authors:
Kiseok Choi,
Inchul Kim,
Dongyoung Choi,
Julio Marco,
Diego Gutierrez,
Min H. Kim
Abstract:
Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully…
▽ More
Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully-differentiable end-to-end NLOS inverse rendering pipeline that self-calibrates the imaging parameters during the reconstruction of hidden scenes, using as input only the measured illumination while working both in the time and frequency domains. Our pipeline extracts a geometric representation of the hidden scene from NLOS volumetric intensities and estimates the time-resolved illumination at the relay wall produced by such geometric information using differentiable transient rendering. We then use gradient descent to optimize imaging parameters by minimizing the error between our simulated time-resolved illumination and the measured illumination. Our end-to-end differentiable pipeline couples diffraction-based volumetric NLOS reconstruction with path-space light transport and a simple ray marching technique to extract detailed, dense sets of surface points and normals of hidden scenes. We demonstrate the robustness of our method to consistently reconstruct geometry and albedo, even under significant noise levels.
△ Less
Submitted 25 September, 2023; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Virtual Mirrors: Non-Line-of-Sight Imaging Beyond the Third Bounce
Authors:
Diego Royo,
Talha Sultan,
Adolfo Muñoz,
Khadijeh Masumnia-Bisheh,
Eric Brandt,
Diego Gutierrez,
Andreas Velten,
Julio Marco
Abstract:
Non-line-of-sight (NLOS) imaging methods are capable of reconstructing complex scenes that are not visible to an observer using indirect illumination. However, they assume only third-bounce illumination, so they are currently limited to single-corner configurations, and present limited visibility when imaging surfaces at certain orientations. To reason about and tackle these limitations, we make t…
▽ More
Non-line-of-sight (NLOS) imaging methods are capable of reconstructing complex scenes that are not visible to an observer using indirect illumination. However, they assume only third-bounce illumination, so they are currently limited to single-corner configurations, and present limited visibility when imaging surfaces at certain orientations. To reason about and tackle these limitations, we make the key observation that planar diffuse surfaces behave specularly at wavelengths used in the computational wave-based NLOS imaging domain. We call such surfaces virtual mirrors. We leverage this observation to expand the capabilities of NLOS imaging using illumination beyond the third bounce, addressing two problems: imaging single-corner objects at limited visibility angles, and imaging objects hidden behind two corners. To image objects at limited visibility angles, we first analyze the reflections of the known illuminated point on surfaces of the scene as an estimator of the position and orientation of objects with limited visibility. We then image those limited visibility objects by computationally building secondary apertures at other surfaces that observe the target object from a direct visibility perspective. Beyond single-corner NLOS imaging, we exploit the specular behavior of virtual mirrors to image objects hidden behind a second corner by imaging the space behind such virtual mirrors, where the mirror image of objects hidden around two corners is formed. No specular surfaces were involved in the making of this paper.
△ Less
Submitted 26 July, 2023;
originally announced July 2023.
-
The Visual Language of Fabrics
Authors:
Valentin Deschaintre,
Julia Guerrero-Viu,
Diego Gutierrez,
Tamy Boubekeur,
Belen Masia
Abstract:
We introduce text2fabric, a novel dataset that links free-text descriptions to various fabric materials. The dataset comprises 15,000 natural language descriptions associated to 3,000 corresponding images of fabric materials. Traditionally, material descriptions come in the form of tags/keywords, which limits their expressivity, induces pre-existing knowledge of the appropriate vocabulary, and ult…
▽ More
We introduce text2fabric, a novel dataset that links free-text descriptions to various fabric materials. The dataset comprises 15,000 natural language descriptions associated to 3,000 corresponding images of fabric materials. Traditionally, material descriptions come in the form of tags/keywords, which limits their expressivity, induces pre-existing knowledge of the appropriate vocabulary, and ultimately leads to a chopped description system. Therefore, we study the use of free-text as a more appropriate way to describe material appearance, taking the use case of fabrics as a common item that non-experts may often deal with. Based on the analysis of the dataset, we identify a compact lexicon, set of attributes and key structure that emerge from the descriptions. This allows us to accurately understand how people describe fabrics and draw directions for generalization to other types of materials. We also show that our dataset enables specializing large vision-language models such as CLIP, creating a meaningful latent space for fabric appearance, and significantly improving applications such as fine-grained material retrieval and automatic captioning.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Application of federated learning techniques for arrhythmia classification using 12-lead ECG signals
Authors:
Daniel Mauricio Jimenez Gutierrez,
Hafiz Muuhammad Hassan,
Lorella Landi,
Andrea Vitaletti,
Ioannis Chatzigiannakis
Abstract:
Artificial Intelligence-based (AI) analysis of large, curated medical datasets is promising for providing early detection, faster diagnosis, and more effective treatment using low-power Electrocardiography (ECG) monitoring devices information. However, accessing sensitive medical data from diverse sources is highly restricted since improper use, unsafe storage, or data leakage could violate a pers…
▽ More
Artificial Intelligence-based (AI) analysis of large, curated medical datasets is promising for providing early detection, faster diagnosis, and more effective treatment using low-power Electrocardiography (ECG) monitoring devices information. However, accessing sensitive medical data from diverse sources is highly restricted since improper use, unsafe storage, or data leakage could violate a person's privacy. This work uses a Federated Learning (FL) privacy-preserving methodology to train AI models over heterogeneous sets of high-definition ECG from 12-lead sensor arrays collected from six heterogeneous sources. We evaluated the capacity of the resulting models to achieve equivalent performance compared to state-of-the-art models trained in a Centralized Learning (CL) fashion. Moreover, we assessed the performance of our solution over Independent and Identical distributed (IID) and non-IID federated data. Our methodology involves machine learning techniques based on Deep Neural Networks and Long-Short-Term Memory models. It has a robust data preprocessing pipeline with feature engineering, selection, and data balancing techniques. Our AI models demonstrated comparable performance to models trained using CL, IID, and non-IID approaches. They showcased advantages in reduced complexity and faster training time, making them well-suited for cloud-edge architectures.
△ Less
Submitted 5 January, 2024; v1 submitted 23 August, 2022;
originally announced August 2022.
-
Sparse Ellipsometry: Portable Acquisition of Polarimetric SVBRDF and Shape with Unstructured Flash Photography
Authors:
Inseung Hwang,
Daniel S. Jeon,
Adolfo Muñoz,
Diego Gutierrez,
Xin Tong,
Min H. Kim
Abstract:
Ellipsometry techniques allow to measure polarization information of materials, requiring precise rotations of optical components with different configurations of lights and sensors. This results in cumbersome capture devices, carefully calibrated in lab conditions, and in very long acquisition times, usually in the order of a few days per object. Recent techniques allow to capture polarimetric sp…
▽ More
Ellipsometry techniques allow to measure polarization information of materials, requiring precise rotations of optical components with different configurations of lights and sensors. This results in cumbersome capture devices, carefully calibrated in lab conditions, and in very long acquisition times, usually in the order of a few days per object. Recent techniques allow to capture polarimetric spatially-varying reflectance information, but limited to a single view, or to cover all view directions, but limited to spherical objects made of a single homogeneous material. We present sparse ellipsometry, a portable polarimetric acquisition method that captures both polarimetric SVBRDF and 3D shape simultaneously. Our handheld device consists of off-the-shelf, fixed optical components. Instead of days, the total acquisition time varies between twenty and thirty minutes per object. We develop a complete polarimetric SVBRDF model that includes diffuse and specular components, as well as single scattering, and devise a novel polarimetric inverse rendering algorithm with data augmentation of specular reflection samples via generative modeling. Our results show a strong agreement with a recent ground-truth dataset of captured polarimetric BRDFs of real-world objects.
△ Less
Submitted 8 February, 2023; v1 submitted 9 July, 2022;
originally announced July 2022.
-
Differentiable Transient Rendering
Authors:
Shinyoung Yi,
Donggun Kim,
Kiseok Choi,
Adrian Jarabo,
Diego Gutierrez,
Min H. Kim
Abstract:
Recent differentiable rendering techniques have become key tools to tackle many inverse problems in graphics and vision. Existing models, however, assume steady-state light transport, i.e., infinite speed of light. While this is a safe assumption for many applications, recent advances in ultrafast imaging leverage the wealth of information that can be extracted from the exact time of flight of lig…
▽ More
Recent differentiable rendering techniques have become key tools to tackle many inverse problems in graphics and vision. Existing models, however, assume steady-state light transport, i.e., infinite speed of light. While this is a safe assumption for many applications, recent advances in ultrafast imaging leverage the wealth of information that can be extracted from the exact time of flight of light. In this context, physically-based transient rendering allows to efficiently simulate and analyze light transport considering that the speed of light is indeed finite. In this paper, we introduce a novel differentiable transient rendering framework, to help bring the potential of differentiable approaches into the transient regime. To differentiate the transient path integral we need to take into account that scattering events at path vertices are no longer independent; instead, tracking the time of flight of light requires treating such scattering events at path vertices jointly as a multidimensional, evolving manifold. We thus turn to the generalized transport theorem, and introduce a novel correlated importance term, which links the time-integrated contribution of a path to its light throughput, and allows us to handle discontinuities in the light and sensor functions. Last, we present results in several challenging scenarios where the time of flight of light plays an important role such as optimizing indices of refraction, non-line-of-sight tracking with nonplanar relay walls, and non-line-of-sight tracking around two corners.
△ Less
Submitted 13 June, 2022;
originally announced June 2022.
-
A Probabilistic Time-Evolving Approach to Scanpath Prediction
Authors:
Daniel Martin,
Diego Gutierrez,
Belen Masia
Abstract:
Human visual attention is a complex phenomenon that has been studied for decades. Within it, the particular problem of scanpath prediction poses a challenge, particularly due to the inter- and intra-observer variability, among other reasons. Besides, most existing approaches to scanpath prediction have focused on optimizing the prediction of a gaze point given the previous ones. In this work, we p…
▽ More
Human visual attention is a complex phenomenon that has been studied for decades. Within it, the particular problem of scanpath prediction poses a challenge, particularly due to the inter- and intra-observer variability, among other reasons. Besides, most existing approaches to scanpath prediction have focused on optimizing the prediction of a gaze point given the previous ones. In this work, we present a probabilistic time-evolving approach to scanpath prediction, based on Bayesian deep learning. We optimize our model using a novel spatio-temporal loss function based on a combination of Kullback-Leibler divergence and dynamic time warping, jointly considering the spatial and temporal dimensions of scanpaths. Our scanpath prediction framework yields results that outperform those of current state-of-the-art approaches, and are almost on par with the human baseline, suggesting that our model is able to generate scanpaths whose behavior closely resembles those of the real ones.
△ Less
Submitted 20 April, 2022;
originally announced April 2022.
-
Single-image Full-body Human Relighting
Authors:
Manuel Lagunas,
Xin Sun,
Jimei Yang,
Ruben Villegas,
Jianming Zhang,
Zhixin Shu,
Belen Masia,
Diego Gutierrez
Abstract:
We present a single-image data-driven method to automatically relight images with full-body humans in them. Our framework is based on a realistic scene decomposition leveraging precomputed radiance transfer (PRT) and spherical harmonics (SH) lighting. In contrast to previous work, we lift the assumptions on Lambertian materials and explicitly model diffuse and specular reflectance in our data. Mor…
▽ More
We present a single-image data-driven method to automatically relight images with full-body humans in them. Our framework is based on a realistic scene decomposition leveraging precomputed radiance transfer (PRT) and spherical harmonics (SH) lighting. In contrast to previous work, we lift the assumptions on Lambertian materials and explicitly model diffuse and specular reflectance in our data. Moreover, we introduce an additional light-dependent residual term that accounts for errors in the PRT-based image reconstruction. We propose a new deep learning architecture, tailored to the decomposition performed in PRT, that is trained using a combination of L1, logarithmic, and rendering losses. Our model outperforms the state of the art for full-body human relighting both with synthetic images and photographs.
△ Less
Submitted 15 July, 2021;
originally announced July 2021.
-
Virtual Light Transport Matrices for Non-Line-Of-Sight Imaging
Authors:
Julio Marco,
Adrian Jarabo,
Ji Hyun Nam,
Xiaochun Liu,
Miguel Ángel Cosculluela,
Andreas Velten,
Diego Gutierrez
Abstract:
The light transport matrix (LTM) is an instrumental tool in line-of-sight (LOS) imaging, describing how light interacts with the scene and enabling applications such as relighting or separation of illumination components. We introduce a framework to estimate the LTM of non-line-of-sight (NLOS) scenarios, coupling recent virtual forward light propagation models for NLOS imaging with the LOS light t…
▽ More
The light transport matrix (LTM) is an instrumental tool in line-of-sight (LOS) imaging, describing how light interacts with the scene and enabling applications such as relighting or separation of illumination components. We introduce a framework to estimate the LTM of non-line-of-sight (NLOS) scenarios, coupling recent virtual forward light propagation models for NLOS imaging with the LOS light transport equation. We design computational projector-camera setups, and use these virtual imaging systems to estimate the transport matrix of hidden scenes. We introduce the specific illumination functions to compute the different elements of the matrix, overcoming the challenging wide-aperture conditions of NLOS setups. Our NLOS light transport matrix allows us to (re)illuminate specific locations of a hidden scene, and separate direct, first-order indirect, and higher-order indirect illumination of complex cluttered hidden scenes, similar to existing LOS techniques.
△ Less
Submitted 5 October, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Multimodality in VR: A survey
Authors:
Daniel Martin,
Sandra Malpica,
Diego Gutierrez,
Belen Masia,
Ana Serrano
Abstract:
Virtual reality (VR) is rapidly growing, with the potential to change the way we create and consume content. In VR, users integrate multimodal sensory information they receive, to create a unified perception of the virtual world. In this survey, we review the body of work addressing multimodality in VR, and its role and benefits in user experience, together with different applications that leverag…
▽ More
Virtual reality (VR) is rapidly growing, with the potential to change the way we create and consume content. In VR, users integrate multimodal sensory information they receive, to create a unified perception of the virtual world. In this survey, we review the body of work addressing multimodality in VR, and its role and benefits in user experience, together with different applications that leverage multimodality in many disciplines. These works thus encompass several fields of research, and demonstrate that multimodality plays a fundamental role in VR; enhancing the experience, improving overall performance, and yielding unprecedented abilities in skill and knowledge transfer.
△ Less
Submitted 12 April, 2022; v1 submitted 19 January, 2021;
originally announced January 2021.
-
The joint role of geometry and illumination on material recognition
Authors:
Manuel Lagunas,
Ana Serrano,
Diego Gutierrez,
Belen Masia
Abstract:
Observing and recognizing materials is a fundamental part of our daily life. Under typical viewing conditions, we are capable of effortlessly identifying the objects that surround us and recognizing the materials they are made of. Nevertheless, understanding the underlying perceptual processes that take place to accurately discern the visual properties of an object is a long-standing problem. In t…
▽ More
Observing and recognizing materials is a fundamental part of our daily life. Under typical viewing conditions, we are capable of effortlessly identifying the objects that surround us and recognizing the materials they are made of. Nevertheless, understanding the underlying perceptual processes that take place to accurately discern the visual properties of an object is a long-standing problem. In this work, we perform a comprehensive and systematic analysis of how the interplay of geometry, illumination, and their spatial frequencies affects human performance on material recognition tasks. We carry out large-scale behavioral experiments where participants are asked to recognize different reference materials among a pool of candidate samples. In the different experiments, we carefully sample the information in the frequency domain of the stimuli. From our analysis, we find significant first-order interactions between the geometry and the illumination, of both the reference and the candidates. In addition, we observe that simple image statistics and higher-order image histograms do not correlate with human performance. Therefore, we perform a high-level comparison of highly non-linear statistics by training a deep neural network on material recognition tasks. Our results show that such models can accurately classify materials, which suggests that they are capable of defining a meaningful representation of material appearance from labeled proximal image data. Last, we find preliminary evidence that these highly non-linear models and humans may use similar high-level factors for material recognition tasks.
△ Less
Submitted 4 February, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
A Similarity Measure for Material Appearance
Authors:
Manuel Lagunas,
Sandra Malpica,
Ana Serrano,
Elena Garces,
Diego Gutierrez,
Belen Masia
Abstract:
We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shape and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that indeed a shared pe…
▽ More
We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shape and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that indeed a shared perception of appearance similarity exists. We feed this data to a deep learning architecture with a novel loss function, which learns a feature space for materials that correlates with such perceived appearance similarity. Our evaluation shows that our model outperforms existing metrics. Last, we demonstrate several applications enabled by our metric, including appearance-based search for material suggestions, database visualization, clustering and summarization, and gamut mapping.
△ Less
Submitted 4 May, 2019;
originally announced May 2019.
-
Learning icons appearance similarity
Authors:
Manuel Lagunas,
Elena Garces,
Diego Gutierrez
Abstract:
Selecting an optimal set of icons is a crucial step in the pipeline of visual design to structure and navigate through content. However, designing the icons sets is usually a difficult task for which expert knowledge is required. In this work, to ease the process of icon set selection to the users, we propose a similarity metric which captures the properties of style and visual identity. We train…
▽ More
Selecting an optimal set of icons is a crucial step in the pipeline of visual design to structure and navigate through content. However, designing the icons sets is usually a difficult task for which expert knowledge is required. In this work, to ease the process of icon set selection to the users, we propose a similarity metric which captures the properties of style and visual identity. We train a Siamese Neural Network with an online dataset of icons organized in visually coherent collections that are used to adaptively sample training data and optimize the training process. As the dataset contains noise, we further collect human-rated information on the perception of icon's similarity which will be used for evaluating and testing the proposed model. We present several results and applications based on searches, kernel visualizations and optimized set proposals that can be helpful for designers and non-expert users while exploring large collections of icons.
△ Less
Submitted 1 February, 2019;
originally announced February 2019.
-
Virtual Wave Optics for Non-Line-of-Sight Imaging
Authors:
Xiaochun Liu,
Ibón Guillén,
Marco La Manna,
Ji Hyun Nam,
Syed Azer Reza,
Toan Huu Le,
Diego Gutierrez,
Adrian Jarabo,
Andreas Velten
Abstract:
Non-Line-of-Sight (NLOS) imaging allows to observe objects partially or fully occluded from direct view, by analyzing indirect diffuse reflections off a secondary, relay surface. Despite its many potential applications, existing methods lack practical usability due to several shared limitations, including the assumption of single scattering only, lack of occlusions, and Lambertian reflectance. We…
▽ More
Non-Line-of-Sight (NLOS) imaging allows to observe objects partially or fully occluded from direct view, by analyzing indirect diffuse reflections off a secondary, relay surface. Despite its many potential applications, existing methods lack practical usability due to several shared limitations, including the assumption of single scattering only, lack of occlusions, and Lambertian reflectance. We lift these limitations by transforming the NLOS problem into a virtual Line-Of-Sight (LOS) one. Since imaging information cannot be recovered from the irradiance arriving at the relay surface, we introduce the concept of the phasor field, a mathematical construct representing a fast variation in irradiance. We show that NLOS light transport can be modeled as the propagation of a phasor field wave, which can be solved accurately by the Rayleigh-Sommerfeld diffraction integral. We demonstrate for the first time NLOS reconstruction of complex scenes with strong multiply scattered and ambient light, arbitrary materials, large depth range, and occlusions. Our method handles these challenging cases without explicitly developing a light transport model. By leveraging existing fast algorithms, we outperform existing methods in terms of execution speed, computational complexity, and memory use. We believe that our approach will help unlock the potential of NLOS imaging, and the development of novel applications not restricted to lab conditions. For example, we demonstrate both refocusing and transient NLOS videos of real-world, complex scenes with large depth.
△ Less
Submitted 6 August, 2019; v1 submitted 17 October, 2018;
originally announced October 2018.
-
On-the-Fly Power-Aware Rendering
Authors:
Yunjin Zhang,
Marta Ortin,
Victor Arellano,
Rui Wang,
Diego Gutierrez,
Hujun Bao
Abstract:
Power saving is a prevailing concern in desktop computers and, especially, in battery-powered devices such as mobile phones. This is generating a growing demand for power-aware graphics applications that can extend battery life, while preserving good quality. In this paper, we address this issue by presenting a real-time power-efficient rendering framework, able to dynamically select the rendering…
▽ More
Power saving is a prevailing concern in desktop computers and, especially, in battery-powered devices such as mobile phones. This is generating a growing demand for power-aware graphics applications that can extend battery life, while preserving good quality. In this paper, we address this issue by presenting a real-time power-efficient rendering framework, able to dynamically select the rendering configuration with the best quality within a given power budget. Different from the current state of the art, our method does not require precomputation of the whole camera-view space, nor Pareto curves to explore the vast power-error space; as such, it can also handle dynamic scenes. Our algorithm is based on two key components: our novel power prediction model, and our runtime quality error estimation mechanism. These components allow us to search for the optimal rendering configuration at runtime, being transparent to the user. We demonstrate the performance of our framework on two different platforms: a desktop computer, and a mobile device. In both cases, we produce results close to the maximum quality, while achieving significant power savings.
△ Less
Submitted 31 July, 2018;
originally announced July 2018.
-
An intuitive control space for material appearance
Authors:
Ana Serrano,
Diego Gutierrez,
Karol Myszkowski,
Hans-Peter Seidel,
Belen Masia
Abstract:
Many different techniques for measuring material appearance have been proposed in the last few years. These have produced large public datasets, which have been used for accurate, data-driven appearance modeling. However, although these datasets have allowed us to reach an unprecedented level of realism in visual appearance, editing the captured data remains a challenge. In this paper, we present…
▽ More
Many different techniques for measuring material appearance have been proposed in the last few years. These have produced large public datasets, which have been used for accurate, data-driven appearance modeling. However, although these datasets have allowed us to reach an unprecedented level of realism in visual appearance, editing the captured data remains a challenge. In this paper, we present an intuitive control space for predictable editing of captured BRDF data, which allows for artistic creation of plausible novel material appearances, bypassing the difficulty of acquiring novel samples. We first synthesize novel materials, extending the existing MERL dataset up to 400 mathematically valid BRDFs. We then design a large-scale experiment, gathering 56,000 subjective ratings on the high-level perceptual attributes that best describe our extended dataset of materials. Using these ratings, we build and train networks of radial basis functions to act as functionals mapping the perceptual attributes to an underlying PCA-based representation of BRDFs. We show that our functionals are excellent predictors of the perceived attributes of appearance. Our control space enables many applications, including intuitive material editing of a wide range of visual properties, guidance for gamut mapping, analysis of the correlation between perceptual attributes, or novel appearance similarity metrics. Moreover, our methodology can be used to derive functionals applicable to classic analytic BRDF representations. We release our code and dataset publicly, in order to support and encourage further research in this direction.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.
-
Convolutional Sparse Coding for High Dynamic Range Imaging
Authors:
Ana Serrano,
Felix Heide,
Diego Gutierrez,
Gordon Wetzstein,
Belen Masia
Abstract:
Current HDR acquisition techniques are based on either (i) fusing multibracketed, low dynamic range (LDR) images, (ii) modifying existing hardware and capturing different exposures simultaneously with multiple sensors, or (iii) reconstructing a single image with spatially-varying pixel exposures. In this paper, we propose a novel algorithm to recover high-quality HDRI images from a single, coded e…
▽ More
Current HDR acquisition techniques are based on either (i) fusing multibracketed, low dynamic range (LDR) images, (ii) modifying existing hardware and capturing different exposures simultaneously with multiple sensors, or (iii) reconstructing a single image with spatially-varying pixel exposures. In this paper, we propose a novel algorithm to recover high-quality HDRI images from a single, coded exposure. The proposed reconstruction method builds on recently-introduced ideas of convolutional sparse coding (CSC); this paper demonstrates how to make CSC practical for HDR imaging. We demonstrate that the proposed algorithm achieves higher-quality reconstructions than alternative methods, we evaluate optical coding schemes, analyze algorithmic parameters, and build a prototype coded HDR camera that demonstrates the utility of convolutional sparse HDRI coding with a custom hardware platform.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.
-
Convolutional sparse coding for capturing high speed video content
Authors:
Ana Serrano,
Elena Garces,
Diego Gutierrez,
Belen Masia
Abstract:
Video capture is limited by the trade-off between spatial and temporal resolution: when capturing videos of high temporal resolution, the spatial resolution decreases due to bandwidth limitations in the capture system. Achieving both high spatial and temporal resolution is only possible with highly specialized and very expensive hardware, and even then the same basic trade-off remains. The recent…
▽ More
Video capture is limited by the trade-off between spatial and temporal resolution: when capturing videos of high temporal resolution, the spatial resolution decreases due to bandwidth limitations in the capture system. Achieving both high spatial and temporal resolution is only possible with highly specialized and very expensive hardware, and even then the same basic trade-off remains. The recent introduction of compressive sensing and sparse reconstruction techniques allows for the capture of single-shot high-speed video, by coding the temporal information in a single frame, and then reconstructing the full video sequence from this single coded image and a trained dictionary of image patches. In this paper, we first analyze this approach, and find insights that help improve the quality of the reconstructed videos. We then introduce a novel technique, based on convolutional sparse coding (CSC), and show how it outperforms the state-of-the-art, patch-based approach in terms of flexibility and efficiency, due to the convolutional nature of its filter banks. The key idea for CSC high-speed video acquisition is extending the basic formulation by imposing an additional constraint in the temporal dimension, which enforces sparsity of the first-order derivatives over time.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.
-
Movie Editing and Cognitive Event Segmentation in Virtual Reality Video
Authors:
Ana Serrano,
Vincent Sitzmann,
Jaime Ruiz-Borau,
Gordon Wetzstein,
Diego Gutierrez,
Belen Masia
Abstract:
Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers in general experience no trouble perceiving the discontinuous flow of information as a coherent set of events. However, Virtual Reality (VR) movies are intrinsicall…
▽ More
Traditional cinematography has relied for over a century on a well-established set of editing rules, called continuity editing, to create a sense of situational continuity. Despite massive changes in visual content across cuts, viewers in general experience no trouble perceiving the discontinuous flow of information as a coherent set of events. However, Virtual Reality (VR) movies are intrinsically different from traditional movies in that the viewer controls the camera orientation at all times. As a consequence, common editing techniques that rely on camera orientations, zooms, etc., cannot be used. In this paper we investigate key relevant questions to understand how well traditional movie editing carries over to VR. To do so, we rely on recent cognition studies and the event segmentation theory, which states that our brains segment continuous actions into a series of discrete, meaningful events. We first replicate one of these studies to assess whether the predictions of such theory can be applied to VR. We next gather gaze data from viewers watching VR videos containing different edits with varying parameters, and provide the first systematic analysis of viewers' behavior and the perception of continuity in VR. From this analysis we make a series of relevant findings; for instance, our data suggests that predictions from the cognitive event segmentation theory are useful guides for VR editing; that different types of edits are equally well understood in terms of continuity; and that spatial misalignments between regions of interest at the edit boundaries favor a more exploratory behavior even after viewers have fixated on a new region of interest. In addition, we propose a number of metrics to describe viewers' attentional behavior in VR. We believe the insights derived from our work can be useful as guidelines for VR content creation.
△ Less
Submitted 13 June, 2018;
originally announced June 2018.
-
Progressive Transient Photon Beams
Authors:
Julio Marco,
Ibón Guillén,
Wojciech Jarosz,
Diego Gutierrez,
Adrian Jarabo
Abstract:
In this work we introduce a novel algorithm for transient rendering in participating media. Our method is consistent, robust, and is able to generate animations of time-resolved light transport featuring complex caustic light paths in media. We base our method on the observation that the spatial continuity provides an increased coverage of the temporal domain, and generalize photon beams to transi…
▽ More
In this work we introduce a novel algorithm for transient rendering in participating media. Our method is consistent, robust, and is able to generate animations of time-resolved light transport featuring complex caustic light paths in media. We base our method on the observation that the spatial continuity provides an increased coverage of the temporal domain, and generalize photon beams to transient-state. We extend the beam steady-state radiance estimates to include the temporal domain. Then, we develop a progressive version of spatio-temporal density estimations, that converges to the correct solution with finite memory requirements by iteratively averaging several realizations of independent renders with a progressively reduced kernel bandwidth. We derive the optimal convergence rates accounting for space and time kernels, and demonstrate our method against previous consistent transient rendering methods for participating media.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
DeepToF: Off-the-Shelf Real-Time Correction of Multipath Interference in Time-of-Flight Imaging
Authors:
Julio Marco,
Quercus Hernandez,
Adolfo Muñoz,
Yue Dong,
Adrian Jarabo,
Min H. Kim,
Xin Tong,
Diego Gutierrez
Abstract:
Time-of-flight (ToF) imaging has become a widespread technique for depth estimation, allowing affordable off-the-shelf cameras to provide depth maps in real time. However, multipath interference (MPI) resulting from indirect illumination significantly degrades the captured depth. Most previous works have tried to solve this problem by means of complex hardware modifications or costly computations.…
▽ More
Time-of-flight (ToF) imaging has become a widespread technique for depth estimation, allowing affordable off-the-shelf cameras to provide depth maps in real time. However, multipath interference (MPI) resulting from indirect illumination significantly degrades the captured depth. Most previous works have tried to solve this problem by means of complex hardware modifications or costly computations. In this work we avoid these approaches, and propose a new technique that corrects errors in depth caused by MPI that requires no camera modifications, and corrects depth in just 10 milliseconds per frame. By observing that most MPI information can be expressed as a function of the captured depth, we pose MPI removal as a convolutional approach, and model it using a convolutional neural network. In particular, given that the input and output data present similar structure, we base our network in an autoencoder, which we train in two stages: first, we use the encoder (convolution filters) to learn a suitable basis to represent corrupted range images; then, we train the decoder (deconvolution filters) to correct depth from the learned basis from synthetically generated scenes. This approach allows us to tackle the lack of reference data, by using a large-scale captured training set with corrupted depth to train the encoder, and a smaller synthetic training set with ground truth depth to train the corrector stage of the network, which we generate by using a physically-based, time-resolved rendering. We demonstrate and validate our method on both synthetic and real complex scenarios, using an off-the-shelf ToF camera, and with only the captured incorrect depth as input.
△ Less
Submitted 23 May, 2018;
originally announced May 2018.
-
Second-Order Occlusion-Aware Volumetric Radiance Caching
Authors:
Julio Marco,
Adrian Jarabo,
Wojciech Jarosz,
Diego Gutierrez
Abstract:
We present a second-order gradient analysis of light transport in participating media and use this to develop an improved radiance caching algorithm for volumetric light transport. We adaptively sample and interpolate radiance from sparse points in the medium using a second-order Hessian-based error metric to determine when interpolation is appropriate. We derive our metric from each point's incom…
▽ More
We present a second-order gradient analysis of light transport in participating media and use this to develop an improved radiance caching algorithm for volumetric light transport. We adaptively sample and interpolate radiance from sparse points in the medium using a second-order Hessian-based error metric to determine when interpolation is appropriate. We derive our metric from each point's incoming light field, computed by using a proxy triangulation-based representation of the radiance reflected by the surrounding medium and geometry. We use this representation to efficiently compute the first- and second-order derivatives of the radiance at the cache points while accounting for occlusion changes.
We also propose a self-contained two-dimensional model for light transport in media and use it to validate and analyze our approach, demonstrating that our method outperforms previous radiance caching algorithms both in terms of accurate derivative estimates and final radiance extrapolation. We generalize these findings to practical three-dimensional scenarios, where we show improved results while reducing computation time by up to 30\% compared to previous work.
△ Less
Submitted 23 May, 2018;
originally announced May 2018.
-
Analyzing Interfaces and Workflows for Light Field Editing
Authors:
Marta Ortin,
Adrian Jarabo,
Belen Masia,
Diego Gutierrez
Abstract:
With the increasing number of available consumer light field cameras, such as Lytro TM, Raytrix TM, or Pelican Imaging TM, this new form of photography is progressively becoming more common. However, there are still very few tools for light field editing, and the interfaces to create those edits remain largely unexplored. Given the extended dimensionality of light field data, it is not clear what…
▽ More
With the increasing number of available consumer light field cameras, such as Lytro TM, Raytrix TM, or Pelican Imaging TM, this new form of photography is progressively becoming more common. However, there are still very few tools for light field editing, and the interfaces to create those edits remain largely unexplored. Given the extended dimensionality of light field data, it is not clear what the most intuitive interfaces and optimal workflows are, in contrast with well-studied 2D image manipulation software. In this work we provide a detailed description of subjects' performance and preferences for a number of simple editing tasks, which form the basis for more complex operations. We perform a detailed state sequence analysis and hidden Markov chain analysis based on the sequence of tools and interaction paradigms users employ while editing light fields. These insights can aid researchers and designers in creating new light field editing tools and interfaces, thus helping to close the gap between 4D and 2D image editing.
△ Less
Submitted 22 May, 2018;
originally announced May 2018.
-
A Radiative Transfer Framework for Spatially-Correlated Materials
Authors:
Adrian Jarabo,
Carlos Aliaga,
Diego Gutierrez
Abstract:
We introduce a non-exponential radiative framework that takes into account the local spatial correlation of scattering particles in a medium. Most previous works in graphics have ignored this, assuming uncorrelated media with a uniform, random local distribution of particles. However, positive and negative correlation lead to slower- and faster-than-exponential attenuation respectively, which cann…
▽ More
We introduce a non-exponential radiative framework that takes into account the local spatial correlation of scattering particles in a medium. Most previous works in graphics have ignored this, assuming uncorrelated media with a uniform, random local distribution of particles. However, positive and negative correlation lead to slower- and faster-than-exponential attenuation respectively, which cannot be predicted by the Beer-Lambert law. As our results show, this has a major effect on extinction, and thus appearance. From recent advances in neutron transport, we first introduce our Extended Generalized Boltzmann Equation, and develop a general framework for light transport in correlated media. We lift the limitations of the original formulation, including an analysis of the boundary conditions, and present a model suitable for computer graphics, based on optical properties of the media and statistical distributions of scatterers. In addition, we present an analytic expression for transmittance in the case of positive correlation, and show how to incorporate it efficiently into a Monte Carlo renderer. We show results with a wide range of both positive and negative correlation, and demonstrate the differences compared to classic light transport.
△ Less
Submitted 7 May, 2018;
originally announced May 2018.
-
A Computational Model of a Single-Photon Avalanche Diode Sensor for Transient Imaging
Authors:
Quercus Hernandez,
Diego Gutierrez,
Adrian Jarabo
Abstract:
Single-Photon Avalanche Diodes (SPAD) are affordable photodetectors, capable to collect extremely fast low-energy events, due to their single-photon sensibility. This makes them very suitable for time-of-flight-based range imaging systems, allowing to reduce costs and power requirements, without sacrifizing much temporal resolution. In this work we describe a computational model to simulate the be…
▽ More
Single-Photon Avalanche Diodes (SPAD) are affordable photodetectors, capable to collect extremely fast low-energy events, due to their single-photon sensibility. This makes them very suitable for time-of-flight-based range imaging systems, allowing to reduce costs and power requirements, without sacrifizing much temporal resolution. In this work we describe a computational model to simulate the behaviour of SPAD sensors, aiming to provide a realistic camera model for time-resolved light transport simulation, with applications on prototyping new reconstructions techniques based on SPAD time-of-flight data. Our model accounts for the major effects of the sensor on the incoming signal. We compare our model against real-world measurements, and apply it to a variety of scenarios, including complex multiply-scattered light transport.
△ Less
Submitted 23 February, 2017;
originally announced March 2017.
-
Fast Back-Projection for Non-Line of Sight Reconstruction
Authors:
Victor Arellano,
Diego Gutierrez,
Adrian Jarabo
Abstract:
Recent works have demonstrated non-line of sight (NLOS) reconstruction by using the time-resolved signal frommultiply scattered light. These works combine ultrafast imaging systems with computation, which back-projects the recorded space-time signal to build a probabilistic map of the hidden geometry. Unfortunately, this computation is slow, becoming a bottleneck as the imaging technology improves…
▽ More
Recent works have demonstrated non-line of sight (NLOS) reconstruction by using the time-resolved signal frommultiply scattered light. These works combine ultrafast imaging systems with computation, which back-projects the recorded space-time signal to build a probabilistic map of the hidden geometry. Unfortunately, this computation is slow, becoming a bottleneck as the imaging technology improves. In this work, we propose a new back-projection technique for NLOS reconstruction, which is up to a thousand times faster than previous work, with almost no quality loss. We base on the observation that the hidden geometry probability map can be built as the intersection of the three-bounce space-time manifolds defined by the light illuminating the hidden geometry and the visible point receiving the scattered light from such hidden geometry. This allows us to pose the reconstruction of the hidden geometry as the voxelization of these space-time manifolds, which has lower theoretic complexity and is easily implementable in the GPU. We demonstrate the efficiency and quality of our technique compared against previous methods in both captured and synthetic data
△ Less
Submitted 20 June, 2017; v1 submitted 6 March, 2017;
originally announced March 2017.
-
How do people explore virtual environments?
Authors:
Vincent Sitzmann,
Ana Serrano,
Amy Pavel,
Maneesh Agrawala,
Diego Gutierrez,
Belen Masia,
Gordon Wetzstein
Abstract:
Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency or visual attention. Whereas a body of recent work has focused on modeling saliency in desktop viewing conditions, VR is very different from these conditions in that view…
▽ More
Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency or visual attention. Whereas a body of recent work has focused on modeling saliency in desktop viewing conditions, VR is very different from these conditions in that viewing behavior is governed by stereoscopic vision and by the complex interaction of head orientation, gaze, and other kinematic constraints. To further our understanding of viewing behavior and saliency in VR, we capture and analyze gaze and head orientation data of 169 users exploring stereoscopic, static omni-directional panoramas, for a total of 1980 head and gaze trajectories for three different viewing conditions. We provide a thorough analysis of our data, which leads to several important insights, such as the existence of a particular fixation bias, which we then use to adapt existing saliency predictors to immersive VR conditions. In addition, we explore other applications of our data and analysis, including automatic alignment of VR video cuts, panorama thumbnails, panorama video synopsis, and saliency-based compression.
△ Less
Submitted 19 September, 2017; v1 submitted 13 December, 2016;
originally announced December 2016.
-
Recent Advances in Transient Imaging: A Computer Graphics and Vision Perspective
Authors:
Adrian Jarabo,
Belen Masia,
Julio Marco,
Diego Gutierrez
Abstract:
Transient imaging has recently made a huge impact in the computer graphics and computer vision fields. By capturing, reconstructing, or simulating light transport at extreme temporal resolutions, researchers have proposed novel techniques to show movies of light in motion, see around corners, detect objects in highly-scattering media, or infer material properties from a distance, to name a few. Th…
▽ More
Transient imaging has recently made a huge impact in the computer graphics and computer vision fields. By capturing, reconstructing, or simulating light transport at extreme temporal resolutions, researchers have proposed novel techniques to show movies of light in motion, see around corners, detect objects in highly-scattering media, or infer material properties from a distance, to name a few. The key idea is to leverage the wealth of information in the temporal domain at the pico or nanosecond resolution, information usually lost during the capture-time temporal integration. This paper presents recent advances in this field of transient imaging from a graphics and vision perspective, including capture techniques, analysis, applications and simulation.
△ Less
Submitted 3 November, 2016;
originally announced November 2016.
-
Intrinsic Light Field Images
Authors:
Elena Garces,
Jose I. Echevarria,
Wen Zhang,
Hongzhi Wu,
Kun Zhou,
Diego Gutierrez
Abstract:
We present a method to automatically decompose a light field into its intrinsic shading and albedo components. Contrary to previous work targeted to 2D single images and videos, a light field is a 4D structure that captures non-integrated incoming radiance over a discrete angular domain. This higher dimensionality of the problem renders previous state-of-the-art algorithms impractical either due t…
▽ More
We present a method to automatically decompose a light field into its intrinsic shading and albedo components. Contrary to previous work targeted to 2D single images and videos, a light field is a 4D structure that captures non-integrated incoming radiance over a discrete angular domain. This higher dimensionality of the problem renders previous state-of-the-art algorithms impractical either due to their cost of processing a single 2D slice, or their inability to enforce proper coherence in additional dimensions. We propose a new decomposition algorithm that jointly optimizes the whole light field data for proper angular coherence. For efficiency, we extend Retinex theory, working on the gradient domain, where new albedo and occlusion terms are introduced. Results show our method provides 4D intrinsic decompositions difficult to achieve with previous state-of-the-art algorithms. We further provide a comprehensive analysis and comparisons with existing intrinsic image/video decomposition methods on light field images.
△ Less
Submitted 12 April, 2017; v1 submitted 15 August, 2016;
originally announced August 2016.