-
A Challenge to Build Neuro-Symbolic Video Agents
Authors:
Sahil Shah,
Harsh Goel,
Sai Shankar Narasimhan,
Minkyu Choi,
S P Sharan,
Oguzhan Akcin,
Sandeep Chinchali
Abstract:
Modern video understanding systems excel at tasks such as scene classification, object detection, and short video retrieval. However, as video analysis becomes increasingly central to real-world applications, there is a growing need for proactive video agents: systems that not only interpret video streams but also reason about events and take informed actions. A key obstacle in this direction is temporal reasoning: while deep learning models have made remarkable progress in recognizing patterns within individual frames or short clips, they struggle to understand the sequencing and dependencies of events over time, which is critical for action-driven decision-making. Addressing this limitation demands moving beyond conventional deep learning approaches. We posit that tackling this challenge requires a neuro-symbolic perspective, where video queries are decomposed into atomic events, structured into coherent sequences, and validated against temporal constraints. Such an approach can enhance interpretability, enable structured reasoning, and provide stronger guarantees on system behavior, all key properties for advancing trustworthy video agents. To this end, we present a grand challenge to the research community: developing the next generation of intelligent video agents that integrate three core capabilities: (1) autonomous video search and analysis, (2) seamless real-world interaction, and (3) advanced content generation. By addressing these pillars, we can transition from passive perception to intelligent video agents that reason, predict, and act, pushing the boundaries of video understanding.
Submitted 19 May, 2025;
originally announced May 2025.
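The decomposition described in the abstract above (atomic events, ordered sequences, temporal constraints) can be illustrated with a toy check. This is a minimal sketch and not the authors' system: the per-frame event labels, the function name `satisfies_order`, and the simple "each event occurs in a later frame than the previous one" constraint are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set


def satisfies_order(frames: Iterable[Set[str]], sequence: Sequence[str]) -> bool:
    """Return True if the atomic events in `sequence` occur in order.

    `frames` is a hypothetical per-frame collection of detected event labels.
    At most one event is matched per frame, so each event must be observed
    in a strictly later frame than the one before it.
    """
    idx = 0  # index of the next event we are waiting for
    for detected in frames:
        if idx < len(sequence) and sequence[idx] in detected:
            idx += 1
    return idx == len(sequence)


# Hypothetical detections for four frames of a video.
frames = [{"car_stops"}, set(), {"door_opens"}, {"person_exits"}]
print(satisfies_order(frames, ["car_stops", "door_opens", "person_exits"]))
```

A real agent would presumably replace the label sets with detector outputs and the ordering rule with a richer temporal logic; the sketch only shows the shape of the validation step.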
-
Digital Kitchen Remodeling: Editing and Relighting Intricate Indoor Scenes from a Single Panorama
Authors:
Guanzhou Ji,
Azadeh O. Sawyer,
Srinivasa G. Narasimhan
Abstract:
We present a novel virtual staging application for kitchen remodeling from a single panorama. To ensure the realism of the virtual rendered scene, we capture real-world High Dynamic Range (HDR) panoramas and recover the absolute scene radiance for high-quality scene relighting. Our application pipeline consists of three key components: (1) HDR photography for capturing paired indoor and outdoor panoramas, (2) automatic kitchen layout generation with new kitchen components, and (3) an editable rendering pipeline that flexibly edits scene materials and relights the new virtual scene with global illumination. Additionally, we contribute a novel Pano-Pano HDR dataset with 141 paired indoor and outdoor panoramas and present a low-cost photometric calibration method for panoramic HDR photography.
Submitted 4 February, 2025;
originally announced April 2025.
-
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Authors:
Khiem Vuong,
Anurag Ghosh,
Deva Ramanan,
Srinivasa Narasimhan,
Shubham Tulsiani
Abstract:
We explore the task of geometric reconstruction of images captured from a mixture of ground and aerial views. Current state-of-the-art learning-based approaches fail to handle the extreme viewpoint variation between aerial-ground image pairs. Our hypothesis is that the lack of high-quality, co-registered aerial-ground datasets for training is a key reason for this failure. Such data is difficult to assemble precisely because it is difficult to reconstruct in a scalable way. To overcome this challenge, we propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes (e.g., Google Earth) with real, ground-level crowd-sourced images (e.g., MegaDepth). The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images where mesh-based renderings lack sufficient detail, effectively bridging the domain gap between real images and pseudo-synthetic renderings. Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks. For example, we observe that baseline DUSt3R localizes fewer than 5% of aerial-ground pairs within 5 degrees of camera rotation error, while fine-tuning with our data raises accuracy to nearly 56%, addressing a major failure point in handling large viewpoint changes. Beyond camera estimation and scene reconstruction, our dataset also improves performance on downstream tasks like novel-view synthesis in challenging aerial-ground scenarios, demonstrating the practical value of our approach in real-world applications.
Submitted 17 April, 2025;
originally announced April 2025.
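The "within 5 degrees of camera rotation error" figure in the abstract above refers to an angular distance between estimated and ground-truth rotations. A minimal sketch of one standard way to compute such an angle, the geodesic distance between two rotation matrices, follows. The function names and the accuracy helper are illustrative assumptions, and the paper's exact evaluation protocol may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rotation_error_deg(r_est: Matrix, r_gt: Matrix) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_est^T R_gt) equals the elementwise product summed over all entries.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def fraction_within(errors: Sequence[float], threshold_deg: float = 5.0) -> float:
    """Fraction of (non-empty) `errors` strictly below `threshold_deg`."""
    return sum(e < threshold_deg for e in errors) / len(errors)


def rot_z(deg: float) -> Matrix:
    """Rotation about the z-axis, used here only to build a test case."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rot_z(0.0)
print(rotation_error_deg(rot_z(10.0), identity))
```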
-
Securing the supply of graphite for batteries
Authors:
Karan Bhuwalka,
Hari Ramachandran,
Swati Narasimhan,
Adrian Yao,
Julia Frohmann,
Leopold Peiseler,
William Chueh,
Adam Boies,
Steven J. Davis,
Sally Benson
Abstract:
The increasing demand for graphite in batteries has led to concerns around supply chain security. Currently, over 92% of global anode material is produced in China, posing a geopolitical risk for other countries reliant on graphite supply for domestic industries. This paper assesses the costs of producing battery-grade graphite (natural and synthetic) in the US and China using process-based cost models. We find that production costs in the US significantly exceed those in China due to higher capital intensity and input costs. Our analysis reveals that a majority of modeled projects in the US are not competitive at current market prices. We identify key cost drivers, including capital costs, economies of scale, and input material prices, and explore pathways to improve the competitiveness of US graphite production, such as supportive financing and process innovation directions. The analysis of conventional graphite production costs at scale also informs ceiling costs for alternative, promising pathways such as methane pyrolysis and catalytic graphitization. This study highlights the challenges and trade-offs in building a diversified graphite supply chain and informs policy and investment decisions.
Submitted 29 April, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Accenture-NVS1: A Novel View Synthesis Dataset
Authors:
Thomas Sugg,
Kyle O'Brien,
Lekh Poudel,
Alex Dumouchelle,
Michelle Jou,
Marc Bosch,
Deva Ramanan,
Srinivasa Narasimhan,
Shubham Tulsiani
Abstract:
This paper introduces ACC-NVS1, a specialized dataset designed for research on Novel View Synthesis specifically for airborne and ground imagery. Data for ACC-NVS1 was collected in Austin, TX and Pittsburgh, PA in 2023 and 2024. The collection encompasses six diverse real-world scenes captured from both airborne and ground cameras, resulting in a total of 148,000 images. ACC-NVS1 addresses challenges such as varying altitudes and transient objects. This dataset is intended to supplement existing datasets, providing additional resources for comprehensive research, rather than serving as a benchmark.
Submitted 24 March, 2025;
originally announced March 2025.
-
Distributed Certifiably Correct Range-Aided SLAM
Authors:
Alexander Thoms,
Alan Papalia,
Jared Velasquez,
David M. Rosen,
Sriram Narasimhan
Abstract:
Reliable simultaneous localization and mapping (SLAM) algorithms are necessary for safety-critical autonomous navigation. In the communication-constrained multi-agent setting, navigation systems increasingly use point-to-point range sensors as they afford measurements with low bandwidth requirements and known data association. The state estimation problem for these systems takes the form of range-aided (RA) SLAM. However, distributed algorithms for solving the RA-SLAM problem lack formal guarantees on the quality of the returned estimate. To this end, we present the first distributed algorithm for RA-SLAM that can efficiently recover certifiably globally optimal solutions. Our algorithm, distributed certifiably correct RA-SLAM (DCORA), achieves this via the Riemannian Staircase method, where computational procedures developed for distributed certifiably correct pose graph optimization are generalized to the RA-SLAM problem. We demonstrate DCORA's efficacy on real-world multi-agent datasets by achieving absolute trajectory errors comparable to those of a state-of-the-art centralized certifiably correct RA-SLAM algorithm. Additionally, we perform a parametric study on the structure of the RA-SLAM problem using synthetic data, revealing how common parameters affect DCORA's performance.
Submitted 13 May, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
Indoor Light and Heat Estimation from a Single Panorama
Authors:
Guanzhou Ji,
Sriram Narayanan,
Azadeh Sawyer,
Srinivasa Narasimhan
Abstract:
This paper presents a novel application for directly estimating indoor light and heat maps from captured indoor-outdoor High Dynamic Range (HDR) panoramas. In our image-based rendering method, the indoor panorama is used to estimate the 3D room layout, while the corresponding outdoor panorama serves as an environment map to infer spatially-varying light and material properties. We establish a connection between indoor light transport and heat transport and implement transient heat simulation to generate indoor heat panoramas. The sensitivity analysis of various thermal parameters is conducted, and the resulting heat maps are compared with the images captured by the thermal camera in real-world scenarios. This digital application enables automatic indoor light and heat estimation without manual inputs and cumbersome field measurements.
Submitted 10 February, 2025;
originally announced February 2025.
-
Supporting Contraceptive Decision-Making in the Intermediated Pharmacy Setting in Kenya
Authors:
Lisa Orii,
Elizabeth K Harrington,
Serah Gitome,
Nelson Kiprotich Cheruiyot,
Elizabeth Anne Bukusi,
Sandy Cheng,
Ariel Fu,
Khushi Khandelwal,
Shrimayee Narasimhan,
Richard Anderson
Abstract:
Adolescent girls and young women (AGYW) in sub-Saharan Africa face unique barriers to contraceptive access and lack AGYW-centered contraceptive decision-support resources. To empower AGYW to make informed choices and improve reproductive health outcomes, we developed a tablet-based application to provide contraceptive education and decision-making support in the pharmacy setting, a key source of contraceptive services for AGYW, in Kenya. We conducted workshops with AGYW and pharmacy providers in Kenya to gather app feedback and understand how to integrate the intervention into the pharmacy setting. Our analysis highlights how intermediated interactions (a multiuser, cooperative effort to enable technology use and information access) could inform a successful contraceptive intervention in Kenya. The potential strengths of intermediation in our setting inform implications for technological health interventions in intermediated scenarios in low- and middle-income countries, including challenges and opportunities for extending impact to different populations and integrating technology into resource-constrained healthcare settings.
Submitted 7 February, 2025;
originally announced February 2025.
-
Visual-Lidar Map Alignment for Infrastructure Inspections
Authors:
Jake McLaughlin,
Nicholas Charron,
Sriram Narasimhan
Abstract:
Routine and repetitive infrastructure inspections present safety, efficiency, and consistency challenges as they are performed manually, often in challenging or hazardous environments. They can also introduce subjectivity and errors into the process, resulting in undesirable outcomes. Simultaneous localization and mapping (SLAM) presents an opportunity to generate high-quality 3D maps that can be used to extract accurate and objective inspection data. Yet, many SLAM algorithms are limited in their ability to align 3D maps from repeated inspections in GPS-denied settings automatically. This limitation hinders practical long-term asset health assessments by requiring tedious manual alignment for data association across scans from previous inspections. This paper introduces a versatile map alignment algorithm leveraging both visual and lidar data for improved place recognition robustness and presents an infrastructure-focused dataset tailored for consecutive inspections. By detaching map alignment from SLAM, our approach enhances infrastructure inspection pipelines, supports monitoring asset degradation over time, and invigorates SLAM research by permitting exploration beyond existing multi-session SLAM algorithms.
Submitted 27 January, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Object-level Visual Prompts for Compositional Image Generation
Authors:
Gaurav Parmar,
Or Patashnik,
Kuan-Chieh Wang,
Daniil Ostashev,
Srinivasa Narasimhan,
Jun-Yan Zhu,
Daniel Cohen-Or,
Kfir Aberman
Abstract:
We introduce a method for composing object-level visual prompts within a text-to-image diffusion model. Our approach addresses the task of generating semantically coherent compositions across diverse scenes and styles, similar to the versatility and expressiveness offered by text prompts. A key challenge in this task is to preserve the identity of the objects depicted in the input visual prompts, while also generating diverse compositions across different images. To address this challenge, we introduce a new KV-mixed cross-attention mechanism, in which keys and values are learned from distinct visual representations. The keys are derived from an encoder with a small bottleneck for layout control, whereas the values come from a larger bottleneck encoder that captures fine-grained appearance details. By mixing keys and values from these complementary sources, our model preserves the identity of the visual prompts while supporting flexible variations in object arrangement, pose, and composition. During inference, we further propose object-level compositional guidance to improve the method's identity preservation and layout correctness. Results show that our technique produces diverse scene compositions that preserve the unique characteristics of each visual prompt, expanding the creative potential of text-to-image generation.
Submitted 2 January, 2025;
originally announced January 2025.
-
SynDiff-AD: Improving Semantic Segmentation and End-to-End Autonomous Driving with Synthetic Data from Latent Diffusion Models
Authors:
Harsh Goel,
Sai Shankar Narasimhan,
Oguzhan Akcin,
Sandeep Chinchali
Abstract:
In recent years, significant progress has been made in collecting large-scale datasets to improve segmentation and autonomous driving models. These large-scale datasets are often dominated by common environmental conditions such as "Clear and Day" weather, leading to decreased performance in under-represented conditions like "Rainy and Night". To address this issue, we introduce SynDiff-AD, a novel data augmentation pipeline that leverages diffusion models (DMs) to generate realistic images for such subgroups. SynDiff-AD uses ControlNet, a DM that guides data generation conditioned on semantic maps, along with a novel prompting scheme that generates subgroup-specific, semantically dense prompts. By augmenting datasets with SynDiff-AD, we improve the performance of segmentation models like Mask2Former and SegFormer by up to 1.2% and 2.3% on the Waymo dataset, and up to 1.4% and 0.7% on the DeepDrive dataset, respectively. Additionally, we demonstrate that our SynDiff-AD pipeline enhances the driving performance of end-to-end autonomous driving models, like AIM-2D and AIM-BEV, by up to 20% across diverse environmental conditions in the CARLA autonomous driving simulator, providing a more robust model.
Submitted 25 November, 2024;
originally announced November 2024.
-
Context Matters: Leveraging Contextual Features for Time Series Forecasting
Authors:
Sameep Chattopadhyay,
Pulkit Paliwal,
Sai Shankar Narasimhan,
Shubhankar Agarwal,
Sandeep P. Chinchali
Abstract:
Time series forecasts are often influenced by exogenous contextual features in addition to their corresponding history. For example, in financial settings, it is hard to accurately predict a stock price without considering public sentiments and policy decisions in the form of news articles, tweets, etc. Though this is common knowledge, the current state-of-the-art (SOTA) forecasting models fail to incorporate such contextual information, owing to its heterogeneity and multimodal nature. To address this, we introduce ContextFormer, a novel plug-and-play method to surgically integrate multimodal contextual information into existing pre-trained forecasting models. ContextFormer effectively distills forecast-specific information from rich multimodal contexts, including categorical, continuous, time-varying, and even textual information, to significantly enhance the performance of existing base forecasters. ContextFormer outperforms SOTA forecasting models by up to 30% on a range of real-world datasets spanning energy, traffic, environmental, and financial domains.
Submitted 13 January, 2025; v1 submitted 16 October, 2024;
originally announced October 2024.
-
Constrained Posterior Sampling: Time Series Generation with Hard Constraints
Authors:
Sai Shankar Narasimhan,
Shubhankar Agarwal,
Litu Rout,
Sanjay Shakkottai,
Sandeep P. Chinchali
Abstract:
Generating realistic time series samples is crucial for stress-testing models and protecting user privacy by using synthetic data. In engineering and safety-critical applications, these samples must meet certain hard constraints that are domain-specific or naturally imposed by physics or nature. Consider, for example, generating electricity demand patterns with constraints on peak demand times. This can be used to stress-test the functioning of power grids during adverse weather conditions. Existing approaches for generating constrained time series are either not scalable or degrade sample quality. To address these challenges, we introduce Constrained Posterior Sampling (CPS), a diffusion-based sampling algorithm that aims to project the posterior mean estimate into the constraint set after each denoising update. Notably, CPS scales to a large number of constraints (~100) without requiring additional training. We provide theoretical justifications highlighting the impact of our projection step on sampling. Empirically, CPS outperforms state-of-the-art methods in sample quality and similarity to real time series by around 10% and 42%, respectively, on real-world stocks, traffic, and air quality datasets.
Submitted 16 October, 2024;
originally announced October 2024.
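The CPS abstract above describes projecting an estimate into a constraint set after each denoising update. The sketch below shows only that general pattern, using a box constraint (elementwise bounds) whose Euclidean projection is simple clipping. It is not the CPS algorithm: the `update` callable stands in for a denoising step, and all names and bounds here are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def project_to_box(x: Vector, lower: float, upper: float) -> Vector:
    """Euclidean projection onto the box [lower, upper]^n (elementwise clip)."""
    return [min(max(v, lower), upper) for v in x]


def refine_with_projection(
    x0: Vector,
    update: Callable[[Vector], Vector],
    lower: float,
    upper: float,
    steps: int,
) -> Vector:
    """Alternate a generic update step with projection onto the box.

    `update` is a placeholder for one denoising update; in this sketch it
    can be any function mapping a vector to a vector of the same length.
    """
    x = list(x0)
    for _ in range(steps):
        x = update(x)                        # stand-in for a denoising update
        x = project_to_box(x, lower, upper)  # enforce the hard constraint
    return x


# Toy update that scales the series; the projection keeps it within bounds.
result = refine_with_projection([1.0, -2.0, 0.5], lambda x: [1.5 * v for v in x],
                                lower=-1.0, upper=2.0, steps=5)
print(result)
```

Because the projection runs after every update, the returned series satisfies the bounds regardless of what the update does, which is the property the abstract attributes to hard-constrained generation.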
-
OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation
Authors:
Siddarth Narasimhan,
Aaron Hao Tan,
Daniel Choi,
Goldie Nejat
Abstract:
Service robots in human-centered environments such as hospitals, office buildings, and long-term care homes need to navigate while adhering to social norms to ensure the safety and comfort of the people they are sharing the space with. Furthermore, they need to adapt to new social scenarios that can arise during robot navigation. In this paper, we present a novel Online Lifelong Vision Language architecture, OLiVia-Nav, which uniquely integrates vision-language models (VLMs) with an online lifelong learning framework for robot social navigation. We introduce a unique distillation approach, Social Context Contrastive Language Image Pre-training (SC-CLIP), to transfer the social reasoning capabilities of large VLMs to a lightweight VLM, in order for OLiVia-Nav to directly encode social and environment context during robot navigation. These encoded embeddings are used to generate and select socially compliant robot trajectories. The lifelong learning capabilities of SC-CLIP enable OLiVia-Nav to update the robot trajectory planning over time as new social scenarios are encountered. We conducted extensive real-world experiments in diverse social navigation scenarios. The results showed that OLiVia-Nav outperformed existing state-of-the-art DRL and VLM methods in terms of mean squared error, Hausdorff loss, and personal space violation duration. Ablation studies also verified the design choices for OLiVia-Nav.
Submitted 8 March, 2025; v1 submitted 20 September, 2024;
originally announced September 2024.
-
Incorporating dense metric depth into neural 3D representations for view synthesis and relighting
Authors:
Arkadeep Narayan Chaudhury,
Igor Vasiljevic,
Sergey Zakharov,
Vitor Guizilini,
Rares Ambrus,
Srinivasa Narasimhan,
Christopher G. Atkeson
Abstract:
Synthesizing accurate geometry and photo-realistic appearance of small scenes is an active area of research with compelling use cases in gaming, virtual reality, robotic manipulation, autonomous driving, convenient product capture, and consumer-level photography. When applying scene geometry and appearance estimation techniques to robotics, we found that the narrow cone of possible viewpoints due to the limited range of robot motion and scene clutter caused current estimation techniques to produce poor-quality estimates or even fail. On the other hand, in robotic applications, dense metric depth can often be measured directly using stereo, and illumination can be controlled. Depth can provide a good initial estimate of the object geometry to improve reconstruction, while multi-illumination images can facilitate relighting. In this work, we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations and address an artifact observed while jointly refining geometry and appearance by disambiguating between texture and geometry edges. We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis with a few training views.
Submitted 4 September, 2024;
originally announced September 2024.
-
ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Authors:
Anurag Ghosh,
Robert Tamburo,
Shen Zheng,
Juan R. Alvarez-Padilla,
Hailiang Zhu,
Michael Cardei,
Nicholas Dunn,
Christoph Mertz,
Srinivasa G. Narasimhan
Abstract:
Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for developing new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe, analyze, and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset, we improve upon detecting work zone objects (+26.2 AP), while discovering work zones with higher precision (+32.5%) at a much higher discovery rate (12.8 times), and significantly improve detecting (+23.9 AP) and reading (+14.2% 1-NED) work zone signs and describing work zones (+36.7 SPICE). We also compute drivable paths from work zone navigation videos and show that it is possible to predict navigational goals and pathways such that 53.6% of goals have angular error (AE) < 0.5 degrees (+9.9%) and 75.3% of pathways have AE < 0.5 degrees (+8.1%).
Submitted 11 June, 2024;
originally announced June 2024.
-
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection
Authors:
Mehmet Kerem Turkcan,
Sanjeev Narasimhan,
Chengbo Zang,
Gyung Hyun Je,
Bo Yu,
Mahshid Ghasemi,
Javad Ghaderi,
Gil Zussman,
Zoran Kostic
Abstract:
We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.
Submitted 25 April, 2024;
originally announced April 2024.
-
Robot Safety Monitoring using Programmable Light Curtains
Authors:
Karnik Ram,
Shobhit Aggarwal,
Robert Tamburo,
Siddharth Ancha,
Srinivasa Narasimhan
Abstract:
As factories continue to evolve into collaborative spaces with multiple robots working together with human supervisors in the loop, ensuring safety for all actors involved becomes critical. Currently, laser-based light curtain sensors are widely used in factories for safety monitoring. While these conventional safety sensors meet high accuracy standards, they are difficult to reconfigure and can only monitor a fixed user-defined region of space. Furthermore, they are typically expensive. Instead, we leverage a controllable depth sensor, programmable light curtains (PLC), to develop an inexpensive and flexible real-time safety monitoring system for collaborative robot workspaces. Our system projects virtual dynamic safety envelopes that tightly envelop the moving robot at all times and detect any objects that intrude the envelope. Furthermore, we develop an instrumentation algorithm that optimally places (multiple) PLCs in a workspace to maximize the visibility coverage of robots. Our work enables fence-less human-robot collaboration, while scaling to monitor multiple robots with few sensors. We analyze our system in a real manufacturing testbed with four robot arms and demonstrate its capabilities as a fast, accurate, and inexpensive safety monitoring solution.
Submitted 4 April, 2024;
originally announced April 2024.
-
WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion
Authors:
Khiem Vuong,
N. Dinesh Reddy,
Robert Tamburo,
Srinivasa G. Narasimhan
Abstract:
Current methods for 2D and 3D object understanding struggle with severe occlusions in busy urban environments, partly due to the lack of large-scale labeled ground-truth annotations for learning occlusion. In this work, we introduce a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. By leveraging off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth, unoccluded 3D objects are identified automatically and composited into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations. The resulting clip-art image with pseudo-groundtruth enables efficient training of object reconstruction methods that are robust to occlusions. Our method demonstrates significant improvements in both 2D and 3D reconstruction, particularly in scenarios with heavily occluded objects like vehicles and people in urban scenes.
Submitted 1 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation
Authors:
Shen Zheng,
Anurag Ghosh,
Srinivasa G. Narasimhan
Abstract:
Driving is challenging in conditions like night, rain, and snow. Lack of good labeled datasets has hampered progress in scene understanding under such conditions. Unsupervised Domain Adaptation (UDA) using large labeled clear-day datasets is a promising research direction in such cases. However, many UDA methods are trained with dominant scene backgrounds (e.g., roads, sky, sidewalks) that appear dramatically different across domains. As a result, they struggle to learn effective features of smaller and often sparse foreground objects (e.g., people, vehicles, signs).
In this work, we improve UDA training by applying in-place image warping to focus on salient objects. We design instance-level saliency guidance to adaptively oversample object regions and undersample background areas, which reduces adverse effects from background context and enhances backbone feature learning. Our approach improves adaptation across geographies, lighting, and weather conditions, and is agnostic to the task (segmentation, detection), domain adaptation algorithm, saliency guidance, and underlying model architecture. Result highlights include +6.1 mAP50 for BDD100K Clear $\rightarrow$ DENSE Foggy, +3.7 mAP50 for BDD100K Day $\rightarrow$ Night, +3.0 mAP50 for BDD100K Clear $\rightarrow$ Rainy, and +6.3 mIoU for Cityscapes $\rightarrow$ ACDC. In addition, our method adds minimal training memory and no additional inference latency. Code is available at https://github.com/ShenZheng2000/Instance-Warp
Submitted 4 December, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
One-Step Image Translation with Text-to-Image Models
Authors:
Gaurav Parmar,
Taesung Park,
Srinivasa Narasimhan,
Jun-Yan Zhu
Abstract:
In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like Control-Net for Sketch2Photo and Edge2Image, but with a single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at https://github.com/GaParmar/img2img-turbo.
Submitted 18 March, 2024;
originally announced March 2024.
-
Gaze-based Human-Robot Interaction System for Infrastructure Inspections
Authors:
Sunwoong Choi,
Zaid Abbas Al-Sabbag,
Sriram Narasimhan,
Chul Min Yeum
Abstract:
Routine inspections for critical infrastructures such as bridges are required in most jurisdictions worldwide. Such routine inspections are largely visual in nature and are therefore qualitative, subjective, and not repeatable. Although robotic infrastructure inspections address such limitations, they cannot replace the superior ability of experts to make decisions in complex situations, thus making human-robot interaction systems a promising technology. This study presents a novel gaze-based human-robot interaction system, designed to augment the visual inspection performance through mixed reality. Through holograms from a mixed reality device, gaze can be utilized effectively to estimate the properties of the defect in real-time. Additionally, inspectors can monitor the inspection progress online, which enhances the speed of the entire inspection process. Limited controlled experiments demonstrate its effectiveness across various users and defect types. To our knowledge, this is the first demonstration of the real-time application of eye gaze in civil infrastructure inspections.
Submitted 12 March, 2024;
originally announced March 2024.
-
Time Weaver: A Conditional Time Series Generation Model
Authors:
Sai Shankar Narasimhan,
Shubhankar Agarwal,
Oguzhan Akcin,
Sujay Sanghavi,
Sandeep Chinchali
Abstract:
Imagine generating a city's electricity demand pattern based on weather, the presence of an electric vehicle, and location, which could be used for capacity planning during a winter freeze. Such real-world time series are often enriched with paired heterogeneous contextual metadata (weather, location, etc.). Current approaches to time series generation often ignore this paired metadata, and its heterogeneity poses several practical challenges in adapting existing conditional generation approaches from the image, audio, and video domains to the time series domain. To address this gap, we introduce Time Weaver, a novel diffusion-based model that leverages the heterogeneous metadata in the form of categorical, continuous, and even time-variant variables to significantly improve time series generation. Additionally, we show that naive extensions of standard evaluation metrics from the image to the time series domain are insufficient. These metrics do not penalize conditional generation approaches for their poor specificity in reproducing the metadata-specific features in the generated time series. Thus, we propose a novel evaluation metric that accurately captures the specificity of conditional generation and the realism of the generated time series. We show that Time Weaver outperforms state-of-the-art benchmarks, such as Generative Adversarial Networks (GANs), by up to 27% in downstream classification tasks on real-world energy, medical, air quality, and traffic data sets.
Submitted 5 March, 2024;
originally announced March 2024.
-
4CNet: A Diffusion Approach to Map Prediction for Decentralized Multi-Robot Exploration
Authors:
Aaron Hao Tan,
Siddarth Narasimhan,
Goldie Nejat
Abstract:
Mobile robots in unknown cluttered environments with irregularly shaped obstacles often face energy and communication challenges which directly affect their ability to explore these environments. In this paper, we introduce a novel deep learning architecture, Confidence-Aware Contrastive Conditional Consistency Model (4CNet), for robot map prediction during decentralized, resource-limited multi-robot exploration. 4CNet uniquely incorporates: 1) a conditional consistency model for map prediction in unstructured unknown regions, 2) a contrastive map-trajectory pretraining framework for a trajectory encoder that extracts spatial information from the trajectories of nearby robots during map prediction, and 3) a confidence network to measure the uncertainty of map prediction for effective exploration under resource constraints. We incorporate 4CNet within our proposed robot exploration with map prediction architecture, 4CNet-E. We then conduct extensive comparison studies with 4CNet-E and state-of-the-art heuristic and learning methods to investigate both map prediction and exploration performance in environments consisting of irregularly shaped obstacles and uneven terrain. Results showed that 4CNet-E obtained statistically significantly higher prediction accuracy and area coverage with varying environment sizes, number of robots, energy budgets, and communication limitations when compared to database and learning-based methods. Hardware experiments were performed and validated the applicability and generalizability of 4CNet-E in both unstructured indoor and real natural outdoor environments.
Submitted 8 April, 2025; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Virtual Home Staging: Inverse Rendering and Editing an Indoor Panorama under Natural Illumination
Authors:
Guanzhou Ji,
Azadeh O. Sawyer,
Srinivasa G. Narasimhan
Abstract:
We propose a novel inverse rendering method that enables the transformation of existing indoor panoramas with new indoor furniture layouts under natural illumination. To achieve this, we captured indoor HDR panoramas along with real-time outdoor hemispherical HDR photographs. Indoor and outdoor HDR images were linearly calibrated with measured absolute luminance values for accurate scene relighting. Our method consists of three key components: (1) panoramic furniture detection and removal, (2) automatic floor layout design, and (3) global rendering with scene geometry, new furniture objects, and a real-time outdoor photograph. We demonstrate the effectiveness of our workflow in rendering indoor scenes under different outdoor illumination conditions. Additionally, we contribute a new calibrated HDR (Cali-HDR) dataset that consists of 137 calibrated indoor panoramas and their associated outdoor photographs.
Submitted 28 January, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Toward Planet-Wide Traffic Camera Calibration
Authors:
Khiem Vuong,
Robert Tamburo,
Srinivasa G. Narasimhan
Abstract:
Despite the widespread deployment of outdoor cameras, their potential for automated analysis remains largely untapped due, in part, to calibration challenges. The absence of precise camera calibration data, including intrinsic and extrinsic parameters, hinders accurate real-world distance measurements from captured videos. To address this, we present a scalable framework that utilizes street-level imagery to reconstruct a metric 3D model, facilitating precise calibration of in-the-wild traffic cameras. Notably, our framework achieves 3D scene reconstruction and accurate localization of over 100 global traffic cameras and is scalable to any camera with sufficient street-level imagery. For evaluation, we introduce a dataset of 20 fully calibrated traffic cameras, demonstrating our method's significant enhancements over existing automatic calibration techniques. Furthermore, we highlight our approach's utility in traffic analysis by extracting insights via 3D vehicle reconstruction and speed measurement, thereby opening up the potential of using outdoor cameras for automated analysis.
Submitted 6 November, 2023;
originally announced November 2023.
-
TPSeNCE: Towards Artifact-Free Realistic Rain Generation for Deraining and Object Detection in Rain
Authors:
Shen Zheng,
Changjie Lu,
Srinivasa G. Narasimhan
Abstract:
Rain generation algorithms have the potential to improve the generalization of deraining methods and scene understanding in rainy conditions. However, in practice, they produce artifacts and distortions and struggle to control the amount of rain generated due to a lack of proper constraints. In this paper, we propose an unpaired image-to-image translation framework for generating realistic rainy images. We first introduce a Triangular Probability Similarity (TPS) constraint to guide the generated images toward clear and rainy images in the discriminator manifold, thereby minimizing artifacts and distortions during rain generation. Unlike conventional contrastive learning approaches, which indiscriminately push negative samples away from the anchors, we propose a Semantic Noise Contrastive Estimation (SeNCE) strategy and reassess the pushing force of negative samples based on the semantic similarity between the clear and the rainy images and the feature similarity between the anchor and the negative samples. Experiments demonstrate realistic rain generation with minimal artifacts and distortions, which benefits image deraining and object detection in rain. Furthermore, the method can be used to generate realistic snowy and night images, underscoring its potential for broader applicability. Code is available at https://github.com/ShenZheng2000/TPSeNCE.
Submitted 7 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Analysis of potential flow networks: Variations in transport time with $discrete$, $continuous$, and $selfish$ operation
Authors:
Varghese Kurian,
Sridharakumar Narasimhan
Abstract:
In potential flow networks, the equilibrium flow rates are usually not proportional to the demands and flow control elements are required to regulate the flow. The control elements can broadly be classified into two types - discrete and continuous. Discrete control elements can have only two operational states: fully open or fully closed. On the other hand, continuous control elements may be operated in any intermediate position in addition to the fully open and fully closed states. Naturally, with their increased flexibility, continuous control elements can provide better network performance, but $to~what~extent$?
We consider a class of branched networks with a single source and multiple sinks. The potential drop across edges ($\Delta H$) is assumed to be proportional to the $n^{th}$ power of flow rate ($Q$), i.e., $\Delta H = kQ^n$ ($n \geq 1$). We define $\textbf{R}$ as the ratio of minimal operational times required to transport a given quantum of material with either type of control element and show that $1\leq \textbf{R}\leq m^{\left(1-1/n\right)}$, where $m$ is the maximum depth of the network. The results point to the role of network topology in the variations in operational time. Further analysis reveals that the selfish operation of a network with continuous control valves has the same bounds on the price of anarchy.
Submitted 16 October, 2023;
originally announced October 2023.
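The bound $1\leq \textbf{R}\leq m^{\left(1-1/n\right)}$ stated in the abstract above can be evaluated for a few cases to see how it behaves. The following is a minimal sketch of that formula only: the function name `ratio_upper_bound` and the sample values of $m$ and $n$ are assumptions chosen for illustration and do not come from the paper.

```python
def ratio_upper_bound(m: int, n: float) -> float:
    """Upper bound m**(1 - 1/n) on R for maximum network depth m and exponent n >= 1."""
    if m < 1 or n < 1:
        raise ValueError("expected m >= 1 and n >= 1")
    return m ** (1.0 - 1.0 / n)


# Illustrative (hypothetical) depths and exponents, not values from the paper.
for m, n in [(4, 1.0), (4, 2.0), (9, 2.0)]:
    print(f"m={m}, n={n}: 1 <= R <= {ratio_upper_bound(m, n):.3f}")
```

For $n=1$ the upper bound collapses to 1, so the bound implies that both types of control element give the same minimal operational time. For larger $n$ the bound grows with the depth $m$, which matches the abstract's remark about the role of network topology.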
-
Edge Ranking of Graphs in Transportation Networks using a Graph Neural Network (GNN)
Authors:
Debasish Jana,
Sven Malama,
Sriram Narasimhan,
Ertugrul Taciroglu
Abstract:
Many networks, such as transportation, power, and water distribution, can be represented as graphs. A crucial challenge in graph representations is identifying the importance of graph edges and their influence on overall network efficiency and information flow performance. For example, important edges in a transportation network are those roads that, when affected, will significantly alter the network's overall efficiency. A commonly used approach to finding such important edges is "edge betweenness centrality" (EBC), an edge ranking measure to determine the influential edges of the graph based on connectivity and information spread. Computing the EBC utilizing the common Brandes algorithm involves calculating the shortest paths for every node pair, which can be computationally expensive and restrictive, especially for large graphs. Changes in the graph parameters, e.g., in the edge weight or the addition and deletion of nodes or edges, require the recalculation of the EBC. As the main contribution, we propose an approximate method to estimate the EBC using a Graph Neural Network (GNN), a deep learning-based approach. We show that it is computationally efficient compared to the conventional method, especially for large graphs. The proposed method of GNN-based edge ranking is evaluated on several synthetic graphs and a real-world transportation data set. We show that this framework can estimate the approximate edge ranking much faster compared to the conventional method. This approach is inductive, i.e., training and testing are performed on different sets of graphs with varying numbers of nodes and edges. The proposed method is especially suitable for applications on large-scale networks when edge information is desired, for example, in urban infrastructure improvement projects, power and water network resilience analyses, and optimizing resource allocations in engineering networks.
Submitted 25 March, 2023;
originally announced March 2023.
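To make the conventional baseline in the abstract above concrete, the following sketch computes exact edge betweenness centrality with a Brandes-style accumulation on a small unweighted, undirected graph. It illustrates the exact computation that the paper's GNN is meant to approximate, not the GNN method itself. The function name `edge_betweenness`, the adjacency-dict input format, and the six-node example graph are assumptions made for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Exact (unnormalised) edge betweenness for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered node pair was counted from both endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}


# Hypothetical toy network: two triangles joined by the bridge edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = edge_betweenness(graph)
top_edge = max(scores, key=scores.get)
print(sorted(top_edge), scores[top_edge])
```

In this toy graph the bridge edge carries every shortest path between the two triangles, so it ranks highest. The loop over all source nodes is the all-pairs cost that the abstract describes as expensive for large graphs.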
-
Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
Authors:
Anurag Ghosh,
N. Dinesh Reddy,
Christoph Mertz,
Srinivasa G. Narasimhan
Abstract:
Real-time efficient perception is critical for autonomous navigation and city-scale sensing. Orthogonal to architectural improvements, streaming perception approaches have exploited adaptive sampling to improve real-time detection performance. In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the 3D scene (a ground plane and a plane above) to resample images for efficient object detection. This significantly improves small and far-away object detection performance while also being more efficient both in terms of latency and memory. For autonomous navigation, using the same detector and scale, our approach improves detection rate by +4.1 $AP_{S}$ (+39%) and real-time performance by +5.3 $sAP_{S}$ (+63%) for small objects over state-of-the-art (SOTA). For fixed traffic cameras, our approach detects small objects at image scales other methods cannot. At the same scale, our approach improves detection of small objects by 195% (+12.5 $AP_{S}$) over naive-downsampling and 63% (+4.2 $AP_{S}$) over SOTA.
Submitted 24 March, 2023;
originally announced March 2023.
-
A Target-Based Extrinsic Calibration Framework for Non-Overlapping Camera-Lidar Systems Using a Motion Capture System
Authors:
Nicholas Charron,
Huaiyuan Weng,
Steven L. Waslander,
Sriram Narasimhan
Abstract:
We present a novel target-based lidar-camera extrinsic calibration methodology that can be used for non-overlapping field of view (FOV) sensors. Contrary to previous work, our methodology overcomes the non-overlapping FOV challenge using a motion capture system (MCS) instead of traditional simultaneous localization and mapping approaches. Due to the high relative precision of MCSs, our methodology can achieve both the high accuracy and repeatable calibrations common to traditional target-based methods, regardless of the amount of overlap in the sensors' field of view. Furthermore, we design a target-agnostic implementation that does not require uniquely identifiable features by using an iterative closest point approach, enabled by the MCS measurements. We show using simulation that we can accurately recover extrinsic calibrations for a range of perturbations to the true calibration that would be expected in real circumstances. We prove experimentally that our method outperforms state-of-the-art lidar-camera extrinsic calibration methods that can be used for non-overlapping FOV systems, while using a target-based approach that guarantees repeatably high accuracy. Lastly, we show in simulation that different target designs can be used, including easily constructed 3D targets such as a cylinder that are normally considered degenerate in most calibration formulations.
Submitted 2 March, 2025; v1 submitted 19 March, 2023;
originally announced March 2023.
-
Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits
Authors:
Siddharth Ancha,
Gaurav Pathak,
Ji Zhang,
Srinivasa Narasimhan,
David Held
Abstract:
To navigate in an environment safely and autonomously, robots must accurately estimate where obstacles are and how they move. Instead of using expensive traditional 3D sensors, we explore the use of a much cheaper, faster, and higher resolution alternative: programmable light curtains. Light curtains are controllable depth sensors that sense only along a surface that the user selects. We adapt a probabilistic method based on particle filters and occupancy grids to explicitly estimate the position and velocity of 3D points in the scene using partial measurements made by light curtains. The central challenge is to decide where to place the light curtain to accurately perform this task. We propose multiple curtain placement strategies guided by maximizing information gain and verifying predicted object locations. Then, we combine these strategies using an online learning framework. We propose a novel self-supervised reward function that evaluates the accuracy of current velocity estimates using future light curtain placements. We use a multi-armed bandit framework to intelligently switch between placement policies in real time, outperforming fixed policies. We develop a full-stack navigation system that uses position and velocity estimates from light curtains for downstream tasks such as localization, mapping, path-planning, and obstacle avoidance. This work paves the way for controllable light curtains to accurately, efficiently, and purposefully perceive and navigate complex and dynamic environments. Project website: https://siddancha.github.io/projects/active-velocity-estimation/
Submitted 29 May, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
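The idea of switching between placement policies with a multi-armed bandit, as described in the abstract above, can be sketched with a generic bandit loop. The abstract does not say which bandit algorithm is used, so the sketch swaps in the standard UCB1 rule as an assumption. The class name `UCB1`, the helper `run`, and the fixed per-policy rewards are hypothetical stand-ins, in particular for the paper's self-supervised reward, which is not reproduced here.

```python
import math


class UCB1:
    """Standard UCB1 bandit; each arm stands for one (hypothetical) placement policy."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # times each arm was selected
        self.means = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Try every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a] + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(rewards, rounds):
    """Simulate the loop with a fixed reward per arm (a stand-in for a real reward signal)."""
    bandit = UCB1(len(rewards))
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, rewards[arm])
    return bandit.counts


# Made-up rewards for three placeholder policies; arm 2 is the best one.
counts = run([0.2, 0.5, 0.9], rounds=500)
print(counts)
```

With these made-up rewards the loop spends most of its rounds on the highest-reward arm while still sampling the others occasionally, which is the general exploration and exploitation trade-off a bandit-based policy switcher relies on.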
-
Safe Networked Robotics with Probabilistic Verification
Authors:
Sai Shankar Narasimhan,
Sharachchandra Bhat,
Sandeep P. Chinchali
Abstract:
Autonomous robots must utilize rich sensory data to make safe control decisions. To process this data, compute-constrained robots often require assistance from remote computation, or the cloud, that runs compute-intensive deep neural network perception or control models. However, this assistance comes at the cost of a time delay due to network latency, resulting in past observations being used in the cloud to compute the control commands for the present robot state. Such communication delays could potentially lead to the violation of essential safety properties, such as collision avoidance. This paper develops methods to ensure the safety of robots operated over communication networks with stochastic latency. To do so, we use tools from formal verification to construct a shield, i.e., a run-time monitor, that provides a list of safe actions for any delayed sensory observation, given the expected and maximum network latency. Our shield is minimally intrusive and enables networked robots to satisfy key safety constraints, expressed as temporal logic specifications, with desired probability. We demonstrate our approach on a real F1/10th autonomous vehicle that navigates in indoor environments and transmits rich LiDAR sensory data over congested WiFi links.
Submitted 3 December, 2024; v1 submitted 17 February, 2023;
originally announced February 2023.
-
On Text Style Transfer via Style Masked Language Models
Authors:
Sharan Narasimhan,
Pooja Shekar,
Suvodip Dey,
Maunendra Sankar Desarkar
Abstract:
Text Style Transfer (TST) is performable through approaches such as latent space disentanglement, cycle-consistency losses, and prototype editing. The prototype editing approach, which is known to be quite successful in TST, involves two key phases: (a) masking of source style-associated tokens and (b) reconstruction of this source-style masked sentence conditioned on the target style. We follow a similar transduction method, in which we transpose the more difficult direct source-to-target TST task to a simpler Style-Masked Language Model (SMLM) task, wherein, similar to BERT, the goal of our model is now to reconstruct the source sentence from its style-masked version. We arrive at the SMLM mechanism naturally by formulating prototype editing/transduction methods in a probabilistic framework, where TST resolves into estimating a hypothetical parallel dataset from a partially observed parallel dataset, wherein each domain is assumed to have a common latent style-masked prior. To generate this style-masked prior, we use "Explainable Attention" as our choice of attribution for a more precise style-masking step and also introduce a cost-effective and accurate "Attribution-Surplus" method of determining the position of masks from any arbitrary attribution model in O(1) time. We empirically show that this non-generational approach suits the "content preserving" criteria well for a task like TST, even for a complex style like Discourse Manipulation. Our model, the Style MLM, outperforms strong TST baselines and is on par with state-of-the-art TST models, which use complex architectures and orders of magnitude more parameters.
Submitted 12 October, 2022;
originally announced October 2022.
-
Learning Continuous Implicit Representation for Near-Periodic Patterns
Authors:
Bowei Chen,
Tiancheng Zhi,
Martial Hebert,
Srinivasa G. Narasimhan
Abstract:
Near-Periodic Patterns (NPP) are ubiquitous in man-made scenes and are composed of tiled motifs with appearance differences caused by lighting, defects, or design elements. A good NPP representation is useful for many applications including image completion, segmentation, and geometric remapping. But representing NPP is challenging because it needs to maintain global consistency (tiled motifs layout) while preserving local variations (appearance differences). Methods trained on general scenes using a large dataset or single-image optimization struggle to satisfy these constraints, while methods that explicitly model periodicity are not robust to periodicity detection errors. To address these challenges, we learn a neural implicit representation using a coordinate-based MLP with single image optimization. We design an input feature warping module and a periodicity-guided patch loss to handle both global consistency and local variations. To further improve the robustness, we introduce a periodicity proposal module to search and use multiple candidate periodicities in our pipeline. We demonstrate the effectiveness of our method on more than 500 images of building facades, friezes, wallpapers, ground, and Mondrian patterns on single and multi-planar scenes.
Submitted 25 August, 2022;
originally announced August 2022.
-
Doppler: Automated SKU Recommendation in Migrating SQL Workloads to the Cloud
Authors:
Joyce Cahoon,
Wenjing Wang,
Yiwen Zhu,
Katherine Lin,
Sean Liu,
Raymond Truong,
Neetu Singh,
Chengcheng Wan,
Alexandra M Ciortea,
Sreraman Narasimhan,
Subru Krishnan
Abstract:
Selecting the optimal cloud target to migrate SQL estates from on-premises to the cloud remains a challenge. Current solutions are not only time-consuming and error-prone, requiring significant user input, but also fail to provide appropriate recommendations. We present Doppler, a scalable recommendation engine that provides right-sized Azure SQL Platform-as-a-Service (PaaS) recommendations without requiring access to sensitive customer data and queries. Doppler introduces a novel price-performance methodology that allows customers to get a personalized rank of relevant cloud targets solely based on low-level resource statistics, such as latency and memory usage. Doppler supplements this rank with internal knowledge of Azure customer behavior to help guide new migration customers towards one optimal target. Experimental results over a 9-month period from prospective and existing customers indicate that Doppler can identify optimal targets and adapt to changes in customer workloads. It has also found cost-saving opportunities among over-provisioned cloud customers, without compromising on capacity or other requirements. Doppler has been integrated and released in the Azure Data Migration Assistant v5.5, which receives hundreds of assessment requests daily.
Submitted 9 August, 2022;
originally announced August 2022.
-
Semantically Supervised Appearance Decomposition for Virtual Staging from a Single Panorama
Authors:
Tiancheng Zhi,
Bowei Chen,
Ivaylo Boyadzhiev,
Sing Bing Kang,
Martial Hebert,
Srinivasa G. Narasimhan
Abstract:
We describe a novel approach to decompose a single panorama of an empty indoor environment into four appearance components: specular, direct sunlight, diffuse and diffuse ambient without direct sunlight. Our system is weakly supervised by automatically generated semantic maps (with floor, wall, ceiling, lamp, window and door labels) that have shown success on perspective views and are trained for panoramas using transfer learning without any further annotations. A GAN-based approach supervised by coarse information obtained from the semantic map extracts specular reflection and direct sunlight regions on the floor and walls. These lighting effects are removed via a similar GAN-based approach and a semantic-aware inpainting step. The appearance decomposition enables multiple applications including sun direction estimation, virtual furniture insertion, floor material replacement, and sun direction change, providing an effective tool for virtual home staging. We demonstrate the effectiveness of our approach on a large and recently released dataset of panoramas of empty homes.
Submitted 26 May, 2022;
originally announced May 2022.
-
Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer
Authors:
Sharan Narasimhan,
Suvodip Dey,
Maunendra Sankar Desarkar
Abstract:
Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geometry of such models is organised well enough to perform on datasets where the style is "coarse-grained", i.e. a small fraction of words alone in a sentence is enough to determine the overall style label. A recent study uses a discrete token-based perturbation approach to map "similar" sentences ("similar" defined by low Levenshtein distance/high word overlap) close by in latent space. This definition of "similarity" does not look into the underlying nuances of the constituent words while mapping latent space neighbourhoods and therefore fails to recognise sentences with different style-based semantics. We introduce EPAAEs (Embedding Perturbed Adversarial AutoEncoders), which complete this perturbation model by adding a finely adjustable noise component on the continuous embedding space. We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together, (b) performs better on a diverse set of text style transfer tasks than similar denoising-inspired baselines, and (c) is capable of fine-grained control of style transfer strength. We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE. To the best of our knowledge, extending style transfer to NLI tasks has not been explored before.
Submitted 4 May, 2022;
originally announced May 2022.
-
Active Safety Envelopes using Light Curtains with Probabilistic Guarantees
Authors:
Siddharth Ancha,
Gaurav Pathak,
Srinivasa G. Narasimhan,
David Held
Abstract:
To safely navigate unknown environments, robots must accurately perceive dynamic obstacles. Instead of directly measuring the scene depth with a LiDAR sensor, we explore the use of a much cheaper and higher resolution sensor: programmable light curtains. Light curtains are controllable depth sensors that sense only along a surface that a user selects. We use light curtains to estimate the safety envelope of a scene: a hypothetical surface that separates the robot from all obstacles. We show that generating light curtains that sense random locations (from a particular distribution) can quickly discover the safety envelope for scenes with unknown objects. Importantly, we produce theoretical safety guarantees on the probability of detecting an obstacle using random curtains. We combine random curtains with a machine learning based model that forecasts and tracks the motion of the safety envelope efficiently. Our method accurately estimates safety envelopes while providing probabilistic safety guarantees that can be used to certify the efficacy of a robot perception system to detect and avoid dynamic obstacles. We evaluate our approach in a simulated urban driving environment and a real-world environment with moving pedestrians using a light curtain device and show that we can estimate safety envelopes efficiently and effectively. Project website: https://siddancha.github.io/projects/active-safety-envelopes-with-guarantees
Submitted 8 July, 2021;
originally announced July 2021.
-
Symmetric products and moduli spaces of vector bundles of curves
Authors:
Kyoung-Seog Lee,
M. S. Narasimhan
Abstract:
Let $X$ be a smooth projective curve of genus $g \geq 2$ and $M$ be the moduli space of rank 2 stable vector bundles on $X$ whose determinants are isomorphic to a fixed odd degree line bundle $L$. There has been a lot of work studying this moduli space, and recently the bounded derived category of coherent sheaves on $M$ has drawn a lot of attention. It was proved by the second named author and by Fonarev-Kuznetsov that the derived category of $X$ can be embedded into the derived category of $M$. In this paper we prove that the derived category of the second symmetric product of $X$ can be embedded into the derived category of $M$ when $X$ is non-hyperelliptic and $g \geq 16$.
Submitted 9 June, 2021;
originally announced June 2021.
-
Higgs bundles twisted by a vector bundle
Authors:
Guillermo Gallego,
Oscar Garcia-Prada,
M. S. Narasimhan
Abstract:
In this paper, we consider a generalization of the theory of Higgs bundles over a smooth complex projective curve in which the twisting of the Higgs field by the canonical bundle of the curve is replaced by a rank 2 vector bundle. We define a Hitchin map and give a spectral correspondence. We also state a Hitchin-Kobayashi correspondence for a generalization of the Hitchin equations to this situation. In a certain sense, this theory lies halfway between the theories of Higgs bundles on a curve and on a higher dimensional variety.
Submitted 18 November, 2023; v1 submitted 12 May, 2021;
originally announced May 2021.
-
NVIDIA SimNet^{TM}: an AI-accelerated multi-physics simulation framework
Authors:
Oliver Hennigh,
Susheela Narasimhan,
Mohammad Amin Nabian,
Akshay Subramaniam,
Kaustubh Tangsali,
Max Rietmann,
Jose del Aguila Ferrandis,
Wonmin Byeon,
Zhiwei Fang,
Sanjay Choudhry
Abstract:
We present SimNet, an AI-driven multi-physics simulation framework, to accelerate simulations across a wide range of disciplines in science and engineering. Compared to traditional numerical solvers, SimNet addresses a wide range of use cases - coupled forward simulations without any training data, inverse and data assimilation problems. SimNet offers fast turnaround time by enabling parameterized system representation that solves for multiple configurations simultaneously, as opposed to the traditional solvers that solve for one configuration at a time. SimNet is integrated with parameterized constructive solid geometry as well as STL modules to generate point clouds. Furthermore, it is customizable with APIs that enable user extensions to geometry, physics and network architecture. It has advanced network architectures that are optimized for high-performance GPU computing, and offers scalable performance for multi-GPU and multi-Node implementation with accelerated linear algebra as well as FP32, FP64 and TF32 computations. In this paper we review the neural network solver methodology, the SimNet architecture, and the various features that are needed for effective solution of the PDEs. We present real-world use cases that range from challenging forward multi-physics simulations with turbulence and complex 3D geometries, to industrial design optimization and inverse problems that are not addressed efficiently by the traditional solvers. Extensive comparisons of SimNet results with open source and commercial solvers show good correlation.
Submitted 14 December, 2020;
originally announced December 2020.
-
Identification of Errors-in-Variables ARX Models Using Modified Dynamic Iterative PCA
Authors:
Deepak Maurya,
Arun K. Tangirala,
Shankar Narasimhan
Abstract:
Identification of autoregressive models with exogenous input (ARX) is a classical problem in system identification. This article considers the errors-in-variables (EIV) ARX model identification problem, where input measurements are also corrupted with noise. The recently proposed DIPCA technique solves the EIV identification problem but is only applicable to white measurement errors. We propose a novel identification algorithm based on a modified Dynamic Iterative Principal Components Analysis (DIPCA) approach for identifying the EIV-ARX model for single-input, single-output (SISO) systems where the output measurements are corrupted with coloured noise consistent with the ARX model. Most of the existing methods assume important parameters like input-output orders, delay, or noise-variances to be known. This work's novelty lies in the joint estimation of error variances, process order, delay, and model parameters. The central idea used to obtain all these parameters in a theoretically rigorous manner is based on transforming the lagged measurements using the appropriate error covariance matrix, which is obtained using estimated error variances and model parameters. Simulation studies on two systems are presented to demonstrate the efficacy of the proposed algorithm.
Submitted 30 November, 2020;
originally announced November 2020.
-
SEDRo: A Simulated Environment for Developmental Robotics
Authors:
Aishwarya Pothula,
Md Ashaduzzaman Rubel Mondol,
Sanath Narasimhan,
Sm Mazharul Islam,
Deokgun Park
Abstract:
Even with impressive advances in application-specific models, we still lack knowledge about how to build a model that can learn in a human-like way and do multiple tasks. To learn in a human-like way, we need to provide a diverse experience that is comparable to that of humans. In this paper, we introduce our ongoing effort to build a simulated environment for developmental robotics (SEDRo). SEDRo provides diverse human experiences ranging from those of a fetus to those of a 12-month-old. A series of simulated tests based on developmental psychology will be used to evaluate the progress of a learning model. We anticipate that SEDRo will lower the cost of entry and facilitate research in the developmental robotics community.
Submitted 3 September, 2020;
originally announced September 2020.
-
Identification of MISO systems in Minimal Realization Form
Authors:
Chaithanya K. Donda,
Deepak Maurya,
Arun K. Tangirala,
Shankar Narasimhan
Abstract:
The paper is concerned with identifying the transfer functions of individual input channels, in minimal realization form, of a Multi-Input Single-Output (MISO) system from input-output data in which all variables are corrupted by error. Such a framework is commonly referred to as errors-in-variables (EIV). A common approach in the existing methods for identification of MISO systems is to estimate a non-minimal order transfer function under a subset of simplistic assumptions like homoskedastic error variances, known order, and known delay. In this work, we deal with the challenging problem of identifying the order and delay of each input of the minimal realization form separately while estimating the transfer functions. We also estimate the heteroskedastic noise variances in each of the multiple input and output variables. An automated approach for the identification of MISO systems in minimal realization form in the EIV framework is proposed. Numerical case studies are presented to illustrate the efficacy of the proposed algorithm in identifying the transfer function along with the order, delay, and noise variances.
Submitted 12 August, 2020;
originally announced August 2020.
-
ARX Model Identification using Generalized Spectral Decomposition
Authors:
Deepak Maurya,
Arun K. Tangirala,
Shankar Narasimhan
Abstract:
This article is concerned with the identification of autoregressive with exogenous inputs (ARX) models. Most of the existing approaches, like prediction error minimization and the state-space framework, are widely accepted and utilized for the estimation of ARX models but are known to deliver unbiased and consistent parameter estimates only for a correctly supplied guess of input-output orders and delay.
In this paper, we propose a novel automated framework which recovers orders, delay, and output noise distribution along with parameter estimates. The primary tool utilized in the proposed framework is generalized spectral decomposition. The proposed algorithm systematically estimates all the parameters in two steps. The first step estimates the order by examining the generalized eigenvalues, and the second step estimates the parameters from the generalized eigenvectors. Simulation studies are presented to demonstrate the efficacy of the proposed method, which is observed to deliver consistent estimates even at low signal-to-noise ratio (SNR).
Submitted 11 August, 2020;
originally announced August 2020.
-
Active Perception using Light Curtains for Autonomous Driving
Authors:
Siddharth Ancha,
Yaadhav Raaj,
Peiyun Hu,
Srinivasa G. Narasimhan,
David Held
Abstract:
Most real-world 3D sensors such as LiDARs perform fixed scans of the entire environment, while being decoupled from the recognition system that processes the sensor data. In this work, we propose a method for 3D object recognition using light curtains, a resource-efficient controllable sensor that measures depth at user-specified locations in the environment. Crucially, we propose using prediction uncertainty of a deep learning based 3D point cloud detector to guide active perception. Given a neural network's uncertainty, we derive an optimization objective to place light curtains using the principle of maximizing information gain. Then, we develop a novel and efficient optimization algorithm to maximize this objective by encoding the physical constraints of the device into a constraint graph and optimizing with dynamic programming. We show how a 3D detector can be trained to detect objects in a scene by sequentially placing uncertainty-guided light curtains to successively improve detection accuracy. Code and details can be found on the project webpage: http://siddancha.github.io/projects/active-perception-light-curtains.
Submitted 5 August, 2020;
originally announced August 2020.
-
TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video
Authors:
Tiancheng Zhi,
Christoph Lassner,
Tony Tung,
Carsten Stoll,
Srinivasa G. Narasimhan,
Minh Vo
Abstract:
We present TexMesh, a novel approach to reconstruct detailed human meshes with high-resolution full-body texture from RGB-D video. TexMesh enables high quality free-viewpoint rendering of humans. Given the RGB frames, the captured environment map, and the coarse per-frame human mesh from RGB-D tracking, our method reconstructs spatiotemporally consistent and detailed per-frame meshes along with a high-resolution albedo texture. By using the incident illumination we are able to accurately estimate local surface geometry and albedo, which allows us to further use photometric constraints to adapt a synthetically trained model to real-world sequences in a self-supervised manner for detailed surface geometry and high-resolution texture estimation. In practice, we train our models on a short example sequence for self-adaptation and the model runs at interactive framerate afterwards. We validate TexMesh on synthetic and real-world data, and show it outperforms the state of the art quantitatively and qualitatively.
Submitted 20 September, 2020; v1 submitted 31 July, 2020;
originally announced August 2020.
-
Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild
Authors:
Minh Vo,
Yaser Sheikh,
Srinivasa G. Narasimhan
Abstract:
Bundle adjustment jointly optimizes camera intrinsics and extrinsics and 3D point triangulation to reconstruct a static scene. The triangulation constraint, however, is invalid for moving points captured in multiple unsynchronized videos, and bundle adjustment is not designed to estimate the temporal alignment between cameras. We present a spatiotemporal bundle adjustment framework that jointly optimizes four coupled sub-problems: estimating camera intrinsics and extrinsics, triangulating static 3D points, estimating sub-frame temporal alignment between cameras, and computing 3D trajectories of dynamic points. Key to our joint optimization is the careful integration of physics-based motion priors within the reconstruction pipeline, validated on a large motion capture corpus of human subjects. We devise an incremental reconstruction and alignment algorithm to strictly enforce the motion prior during the spatiotemporal bundle adjustment. This algorithm is further made more efficient by a divide and conquer scheme while still maintaining high accuracy. We apply this algorithm to reconstruct 3D motion trajectories of human bodies in dynamic events captured by multiple uncalibrated and unsynchronized video cameras in the wild. To make the reconstruction visually more interpretable, we fit a statistical 3D human body model to the asynchronous video streams. Compared to the baseline, the fitting significantly benefits from the proposed spatiotemporal bundle adjustment procedure. Because the videos are aligned with sub-frame precision, we reconstruct 3D motion at much higher temporal resolution than the input videos.
Submitted 24 July, 2020;
originally announced July 2020.
-
4D Visualization of Dynamic Events from Unconstrained Multi-View Videos
Authors:
Aayush Bansal,
Minh Vo,
Yaser Sheikh,
Deva Ramanan,
Srinivasa Narasimhan
Abstract:
We present a data-driven approach for 4D space-time visualization of dynamic events from videos captured by multiple hand-held cameras. Key to our approach is the use of self-supervised neural networks specific to the scene to compose static and dynamic aspects of an event. Though captured from discrete viewpoints, this model enables us to move around the space-time of the event continuously. This model allows us to create virtual cameras that facilitate: (1) freezing the time and exploring views; (2) freezing a view and moving through time; and (3) simultaneously changing both time and view. We can also edit the videos and reveal occluded objects for a given view if they are visible in any of the other views. We validate our approach on challenging in-the-wild events captured using up to 15 mobile cameras.
Submitted 27 May, 2020;
originally announced May 2020.