Showing 1–41 of 41 results for author: Kadambi, A

  1. arXiv:2510.13845  [pdf, ps, other]

    q-bio.NC

    Embodiment in multimodal large language models

    Authors: Akila Kadambi, Lisa Aziz-Zadeh, Antonio Damasio, Marco Iacoboni, Srini Narayanan

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated extraordinary progress in bridging textual and visual inputs. However, MLLMs still face challenges in situated physical and social interactions in sensorially rich, multimodal and real-world settings where the embodied experience of the living organism is essential. We posit that next frontiers for MLLM development require incorporating bot…

    Submitted 11 October, 2025; originally announced October 2025.

  2. arXiv:2510.04390  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator

    Authors: Xuehai He, Shijie Zhou, Thivyanth Venkateswaran, Kaizhi Zheng, Ziyu Wan, Achuta Kadambi, Xin Eric Wang

    Abstract: World models that support controllable and editable spatiotemporal environments are valuable for robotics, enabling scalable training data, reproducible evaluation, and flexible task design. While recent text-to-video models generate realistic dynamics, they are constrained to 2D views and offer limited interaction. We introduce MorphoSim, a language-guided framework that generates 4D sc…

    Submitted 5 October, 2025; originally announced October 2025.

  3. arXiv:2509.23517  [pdf, ps, other]

    cs.CV cs.AI

    Evaluating point-light biological motion in multimodal large language models

    Authors: Akila Kadambi, Marco Iacoboni, Lisa Aziz-Zadeh, Srini Narayanan

    Abstract: Humans can extract rich semantic information from minimal visual cues, as demonstrated by point-light displays (PLDs), which consist of sparse sets of dots localized to key joints of the human body. This ability emerges early in development and is largely attributed to human embodied experience. Since PLDs isolate body motion as the sole source of meaning, they represent key stimuli for testing th…

    Submitted 27 September, 2025; originally announced September 2025.

  4. arXiv:2508.02095  [pdf, ps, other]

    cs.CV cs.AI

    VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

    Authors: Shijie Zhou, Alexander Vilesov, Xuehai He, Ziyu Wan, Shuwang Zhang, Aditya Nagachandra, Di Chang, Dongdong Chen, Xin Eric Wang, Achuta Kadambi

    Abstract: Vision language models (VLMs) have shown remarkable capabilities in integrating linguistic and visual reasoning but remain fundamentally limited in understanding dynamic spatiotemporal interactions. Humans effortlessly track and reason about object movements, rotations, and perspective shifts, abilities essential for robust dynamic real-world understanding yet notably lacking in current VLMs. In th…

    Submitted 6 August, 2025; v1 submitted 4 August, 2025; originally announced August 2025.

    Comments: ICCV 2025, Project Website: https://vlm4d.github.io/

  5. arXiv:2503.20776  [pdf, other]

    cs.CV

    Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

    Authors: Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi

    Abstract: Recent advancements in 2D and multimodal models have achieved remarkable success by leveraging large-scale training on extensive datasets. However, extending these achievements to enable free-form interactions and high-level semantic operations with complex 3D/4D scenes remains challenging. This difficulty stems from the limited availability of large-scale, annotated 3D/4D or multi-view datasets,…

    Submitted 28 March, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  6. arXiv:2412.10846  [pdf]

    cs.CV cs.HC

    Detecting Activities of Daily Living in Egocentric Video to Contextualize Hand Use at Home in Outpatient Neurorehabilitation Settings

    Authors: Adesh Kadambi, José Zariffa

    Abstract: Wearable egocentric cameras and machine learning have the potential to provide clinicians with a more nuanced understanding of patient hand use at home after stroke and spinal cord injury (SCI). However, they require detailed contextual information (i.e., activities and object interactions) to effectively interpret metrics and meaningfully guide therapy planning. We demonstrate that an object-cent…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: To be submitted to IEEE Transactions on Neural Systems and Rehabilitation Engineering. 11 pages, 3 figures, 2 tables

  7. arXiv:2412.06753  [pdf, other]

    cs.CV

    InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

    Authors: Howard Zhang, Yuval Alaluf, Sizhuo Ma, Achuta Kadambi, Jian Wang, Kfir Aberman

    Abstract: Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features. Existing methods often struggle with slow processing times and suboptimal restoration, especially under severe degradation, failing to accurately reconstruct finer-level identi…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Project page: https://snap-research.github.io/InstantRestore/

  8. arXiv:2412.00372  [pdf, other]

    cs.HC cs.AI

    2-Factor Retrieval for Improved Human-AI Decision Making in Radiology

    Authors: Jim Solomon, Laleh Jalilian, Alexander Vilesov, Meryl Mathew, Tristan Grogan, Arash Bedayat, Achuta Kadambi

    Abstract: Human-machine teaming in medical AI requires us to understand to what degree a trained clinician should weigh AI predictions. While previous work has shown the potential of AI assistance at improving clinical predictions, existing clinical decision support systems either provide no explainability of their predictions or use techniques like saliency and Shapley values, which do not allow for physic…

    Submitted 30 November, 2024; originally announced December 2024.

  9. arXiv:2410.18956  [pdf, other]

    cs.CV

    Large Spatial Model: End-to-end Unposed Images to Semantic 3D

    Authors: Zhiwen Fan, Jian Zhang, Wenyan Cong, Peihao Wang, Renjie Li, Kairun Wen, Shijie Zhou, Achuta Kadambi, Zhangyang Wang, Danfei Xu, Boris Ivanovic, Marco Pavone, Yue Wang

    Abstract: Reconstructing and understanding 3D structures from a limited number of images is a well-established problem in computer vision. Traditional methods usually break this task into multiple subtasks, each requiring complex transformations between different data representations. For instance, dense reconstruction through Structure-from-Motion (SfM) involves converting images into key points, optimizin…

    Submitted 30 October, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: Project Website: https://largespatialmodel.github.io

  10. arXiv:2407.16902  [pdf, other]

    cs.CY cs.AI

    The Potential and Perils of Generative Artificial Intelligence for Quality Improvement and Patient Safety

    Authors: Laleh Jalilian, Daniel McDuff, Achuta Kadambi

    Abstract: Generative artificial intelligence (GenAI) has the potential to improve healthcare through automation that enhances the quality and safety of patient care. Powered by foundation models that have been pretrained and can generate complex content, GenAI represents a paradigm shift away from the more traditional focus on task-specific classifiers that have dominated the AI landscape thus far. We posit…

    Submitted 23 June, 2024; originally announced July 2024.

  11. arXiv:2407.11936  [pdf, other]

    cs.CV

    Thermal Imaging and Radar for Remote Sleep Monitoring of Breathing and Apnea

    Authors: Kai Del Regno, Alexander Vilesov, Adnan Armouti, Anirudh Bindiganavale Harish, Selim Emir Can, Ashley Kita, Achuta Kadambi

    Abstract: Polysomnography (PSG), the current gold standard method for monitoring and detecting sleep disorders, is cumbersome and costly. At-home testing solutions, known as home sleep apnea testing (HSAT), exist. However, they are contact-based, a feature which limits the ability of some patient populations to tolerate testing and discourages widespread deployment. Previous work on non-contact sleep monito…

    Submitted 7 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.04169  [pdf, other]

    cs.CV cs.CR

    Solutions to Deepfakes: Can Camera Hardware, Cryptography, and Deep Learning Verify Real Images?

    Authors: Alexander Vilesov, Yuan Tian, Nader Sehatbakhsh, Achuta Kadambi

    Abstract: The exponential progress in generative AI poses serious implications for the credibility of all real images and videos. There will exist a point in the future where 1) digital content produced by generative AI will be indistinguishable from those created by cameras, 2) high-quality generative algorithms will be accessible to anyone, and 3) the ratio of all synthetic to real images will be large. I…

    Submitted 4 July, 2024; originally announced July 2024.

  13. arXiv:2406.13527  [pdf, other]

    cs.CV

    4K4DGen: Panoramic 4D Generation at 4K Resolution

    Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

    Abstract: The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image, failing to meet the requirements of VR/AR applications that need free-viewpoint, 360…

    Submitted 3 October, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  14. arXiv:2405.17315  [pdf, other]

    cs.CV

    All-day Depth Completion

    Authors: Vadim Ezhov, Hyoungseob Park, Zhaoyang Zhang, Rishi Upadhyay, Howard Zhang, Chethan Chinder Chandrappa, Achuta Kadambi, Yunhao Ba, Julie Dorsey, Alex Wong

    Abstract: We propose a method for depth estimation under different illumination conditions, i.e., day and night time. As photometry is uninformative in regions under low-illumination, we tackle the problem through a multi-sensor fusion approach, where we take as input an additional synchronized sparse point cloud (i.e., from a LiDAR) projected onto the image plane as a sparse depth map, along with a camera…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 8 pages, 4 figures

  15. arXiv:2404.06903  [pdf, other]

    cs.CV cs.AI

    DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

    Authors: Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

    Abstract: The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360° scene generation pipeline that facilitates the creation of comprehensive 360° scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement…

    Submitted 25 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  16. arXiv:2403.14874  [pdf, other]

    cs.CV cs.LG

    WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather

    Authors: Blake Gella, Howard Zhang, Rishi Upadhyay, Tiffany Chang, Nathan Wei, Matthew Waliman, Yunhao Ba, Celso de Melo, Alex Wong, Achuta Kadambi

    Abstract: We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and find that they exhibit a large performance drop as compared to those captured under clear weather. To control for changes in scene structures, we propose WeatherProof, the first…

    Submitted 7 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.09534

  17. arXiv:2403.12327  [pdf, other]

    cs.CV cs.LG

    GT-Rain Single Image Deraining Challenge Report

    Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

    Abstract: This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained o…

    Submitted 18 March, 2024; originally announced March 2024.

  18. arXiv:2312.17234  [pdf, other]

    cs.CV

    Personalized Restoration via Dual-Pivot Tuning

    Authors: Pradyumna Chari, Sizhuo Ma, Daniil Ostashev, Achuta Kadambi, Gurunandan Krishnan, Jian Wang, Kfir Aberman

    Abstract: Generative diffusion models can serve as a prior which ensures that solutions of image restoration systems adhere to the manifold of natural images. However, for restoring facial images, a personalized prior is necessary to accurately represent and reconstruct unique facial features of a given individual. In this paper, we propose a simple, yet effective, method for personalized restoration, calle…

    Submitted 28 December, 2023; originally announced December 2023.

  19. arXiv:2312.09534  [pdf, other]

    cs.CV

    WeatherProof: A Paired-Dataset Approach to Semantic Segmentation in Adverse Weather

    Authors: Blake Gella, Howard Zhang, Rishi Upadhyay, Tiffany Chang, Matthew Waliman, Yunhao Ba, Alex Wong, Achuta Kadambi

    Abstract: The introduction of large, foundational models to computer vision has led to drastically improved performance on the task of semantic segmentation. However, these existing methods exhibit a large performance drop when testing on images degraded by weather conditions such as rain, fog, or snow. We introduce a general paired-training method that can be applied to all current foundational model archi…

    Submitted 14 December, 2023; originally announced December 2023.

  20. arXiv:2312.04875  [pdf, other]

    cs.CV

    MVDD: Multi-View Depth Diffusion Models

    Authors: Zhen Wang, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang

    Abstract: Denoising diffusion models have demonstrated outstanding results in 2D image generation, yet it remains a challenge to replicate its success in 3D shape generation. In this paper, we propose leveraging multi-view depth, which represents complex 3D shapes in a 2D data format that is easy to denoise. We pair this representation with a diffusion model, MVDD, that is capable of generating high-quality…

    Submitted 19 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

  21. arXiv:2312.03203  [pdf, other]

    cs.CV

    Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

    Authors: Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi

    Abstract: 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundat…

    Submitted 8 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  22. arXiv:2312.00944  [pdf, other]

    cs.CV cs.GR

    Enhancing Diffusion Models with 3D Perspective Geometry Constraints

    Authors: Rishi Upadhyay, Howard Zhang, Yunhao Ba, Ethan Yang, Blake Gella, Sicheng Jiang, Alex Wong, Achuta Kadambi

    Abstract: While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principle…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Project Webpage: http://visual.ee.ucla.edu/diffusionperspective.htm/

  23. arXiv:2312.00206  [pdf, other]

    cs.CV cs.LG eess.IV

    SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

    Authors: Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi

    Abstract: 3D Gaussian Splatting (3DGS) has recently enabled real-time rendering of unbounded 3D scenes for novel view synthesis. However, this technique requires dense training views to accurately reconstruct 3D geometry. A limited number of input views will significantly degrade reconstruction quality, resulting in artifacts such as "floaters" and "background collapse" at unseen viewpoints. In this work, w…

    Submitted 26 March, 2025; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Version accepted to 3DV 2025. Project page: https://github.com/ForMyCat/SparseGS

  24. arXiv:2311.17907  [pdf, other]

    cs.CV cs.AI

    CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting

    Authors: Alexander Vilesov, Pradyumna Chari, Achuta Kadambi

    Abstract: With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi…

    Submitted 29 November, 2023; originally announced November 2023.

  25. arXiv:2304.08832  [pdf, ps, other]

    eess.IV

    Improving Infrared Thermography after Solar Loading

    Authors: Ellin Q. Zhao, Alexander Vilesov, Pradyumna Chari, Laleh Jalilian, Achuta Kadambi

    Abstract: Widely deployed for fever screening, infrared thermometers (IRTs) enable rapid non-contact detection of body temperature, but they are inaccurate in unconstrained environments. Previous works have studied the impact of transient skin temperature on IRTs, but no studies have quantified the effect of skin temperature elevation due to absorbed solar radiation, which we call solar loading. Solar loadi…

    Submitted 19 August, 2025; v1 submitted 18 April, 2023; originally announced April 2023.

  26. arXiv:2304.03243  [pdf, other]

    cs.AI cs.LG stat.AP

    Synthetic Data in Healthcare

    Authors: Daniel McDuff, Theodore Curran, Achuta Kadambi

    Abstract: Synthetic data are becoming a critical tool for building artificially intelligent systems. Simulators provide a way of generating data systematically and at scale. These data can then be used either exclusively, or in conjunction with real data, for training and testing systems. Synthetic data are particularly attractive in cases where the availability of "real" training examples might be a bott…

    Submitted 6 April, 2023; originally announced April 2023.

  27. arXiv:2212.04096  [pdf, other]

    cs.CV

    ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

    Authors: Zhen Wang, Shijie Zhou, Jeong Joon Park, Despoina Paschalidou, Suya You, Gordon Wetzstein, Leonidas Guibas, Achuta Kadambi

    Abstract: This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid…

    Submitted 8 December, 2022; originally announced December 2022.

  28. arXiv:2209.00746  [pdf, other]

    cs.LG cs.CV

    MIME: Minority Inclusion for Majority Group Enhancement of AI Performance

    Authors: Pradyumna Chari, Yunhao Ba, Shreeram Athreya, Achuta Kadambi

    Abstract: Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. A common misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that…

    Submitted 1 September, 2022; originally announced September 2022.

  29. arXiv:2206.10779  [pdf, other]

    cs.CV

    Not Just Streaks: Towards Ground Truth for Single Image Deraining

    Authors: Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso de Melo, Suya You, Stefano Soatto, Alex Wong, Achuta Kadambi

    Abstract: We propose a large-scale dataset of real-world rainy and clean image pairs and a method to remove degradations, induced by rain streaks and rain accumulation, from the image. As there exists no real-world dataset for deraining, current state-of-the-art methods rely on synthetic data and thus are limited by the sim2real domain gap; moreover, rigorous evaluation remains a challenge due to the absenc…

    Submitted 29 July, 2024; v1 submitted 21 June, 2022; originally announced June 2022.

  30. arXiv:2109.13488  [pdf, other]

    cs.CV

    Towards Rotation Invariance in Object Detection

    Authors: Agastya Kalra, Guy Stoppi, Bradley Brown, Rishav Agarwal, Achuta Kadambi

    Abstract: Rotation augmentations generally improve a model's invariance/equivariance to rotation - except in object detection. In object detection the shape is not known, therefore rotation creates a label ambiguity. We show that the de-facto method for bounding box label rotation, the Largest Box Method, creates very large labels, leading to poor performance and in many cases worse performance than using n…

    Submitted 30 September, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Accepted ICCV 2021

  31. arXiv:2109.05959  [pdf]

    cs.ET physics.optics

    Physics-AI Symbiosis

    Authors: Bahram Jalali, Achuta Kadambi, Vwani Roychowdhury

    Abstract: The phenomenal success of physics in explaining nature and designing hardware is predicated on efficient computational models. A universal codebook of physical laws defines the computational rules and a physical system is an interacting ensemble governed by these rules. Led by deep neural networks, artificial intelligence (AI) has introduced an alternate end-to-end data-driven computational framew…

    Submitted 10 September, 2021; originally announced September 2021.

  32. arXiv:2106.06007  [pdf, other]

    cs.CV

    Overcoming Difficulty in Obtaining Dark-skinned Subjects for Remote-PPG by Synthetic Augmentation

    Authors: Yunhao Ba, Zhen Wang, Kerim Doruk Karinca, Oyku Deniz Bozkurt, Achuta Kadambi

    Abstract: Camera-based remote photoplethysmography (rPPG) provides a non-contact way to measure physiological signals (e.g., heart rate) using facial videos. Recent deep learning architectures have improved the accuracy of such physiological measurement significantly, yet they are restricted by the diversity of the annotated videos. The existing datasets MMSE-HR, AFRL, and UBFC-RPPG contain roughly 10%, 0%,…

    Submitted 10 June, 2021; originally announced June 2021.

  33. arXiv:2010.12769  [pdf]

    eess.IV

    Diverse R-PPG: Camera-Based Heart Rate Estimation for Diverse Subject Skin-Tones and Scenes

    Authors: Pradyumna Chari, Krish Kabra, Doruk Karinca, Soumyarup Lahiri, Diplav Srivastava, Kimaya Kulkarni, Tianyuan Chen, Maxime Cannesson, Laleh Jalilian, Achuta Kadambi

    Abstract: Heart rate (HR) is an essential clinical measure for the assessment of cardiorespiratory instability. Since communities of color are disproportionately affected by both COVID-19 and cardiovascular disease, there is a pressing need to deploy contactless HR sensing solutions for high-quality telemedicine evaluations. Existing computer vision methods that estimate HR from facial videos exhibit biased…

    Submitted 9 December, 2020; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: 49 pages, 6 figures, 3 tables, Supplement with 7 figures

  34. arXiv:1911.12906  [pdf]

    eess.IV cs.CV

    Enhancing Passive Non-Line-of-Sight Imaging Using Polarization Cues

    Authors: Kenichiro Tanaka, Yasuhiro Mukaigawa, Achuta Kadambi

    Abstract: This paper presents a method of passive non-line-of-sight (NLOS) imaging using polarization cues. A key observation is that the oblique light has a different polarimetric signal. It turns out this effect is due to the polarization axis rotation, a phenomenon which can be used to better condition the light transport matrix for non-line-of-sight imaging. Our analysis and results show that the use of…

    Submitted 28 November, 2019; originally announced November 2019.

  35. arXiv:1911.11893  [pdf, other]

    cs.CV

    Visual Physics: Discovering Physical Laws from Videos

    Authors: Pradyumna Chari, Chinmay Talegaonkar, Yunhao Ba, Achuta Kadambi

    Abstract: In this paper, we teach a machine to discover the laws of physics from video streams. We assume no prior knowledge of physics, beyond a temporal stream of bounding boxes. The problem is very difficult because a machine must learn not only a governing equation (e.g. projectile motion) but also the existence of governing parameters (e.g. velocities). We evaluate our ability to discover physical laws…

    Submitted 26 November, 2019; originally announced November 2019.

  36. arXiv:1910.00201  [pdf, other]

    cs.LG stat.ML

    Blending Diverse Physical Priors with Neural Networks

    Authors: Yunhao Ba, Guangyuan Zhao, Achuta Kadambi

    Abstract: Machine learning in the context of physical systems merits a re-examination of the learning strategy. In addition to data, one can leverage a vast library of physical prior models (e.g. kinematics, fluid flow, etc.) to perform more robust inference. The nascent sub-field of physics-based learning (PBL) studies the blending of neural networks with physical priors. While previous PBL algorithms ha…

    Submitted 1 October, 2019; originally announced October 2019.

  37. arXiv:1903.10210  [pdf, other]

    cs.CV cs.LG

    Deep Shape from Polarization

    Authors: Yunhao Ba, Alex Ross Gilbert, Franklin Wang, Jinfa Yang, Rui Chen, Yiqin Wang, Lei Yan, Boxin Shi, Achuta Kadambi

    Abstract: This paper makes a first attempt to bring the Shape from Polarization (SfP) problem to the realm of deep learning. The previous state-of-the-art methods for SfP have been purely physics-based. We see value in these principled models, and blend these physical models as priors into a neural network architecture. This proposed approach achieves results that exceed the previous state-of-the-art on a c…

    Submitted 25 May, 2020; v1 submitted 25 March, 2019; originally announced March 2019.

  38. arXiv:1605.02066  [pdf, other]

    cs.CV

    Shape from Mixed Polarization

    Authors: Vage Taamazyan, Achuta Kadambi, Ramesh Raskar

    Abstract: Shape from Polarization (SfP) estimates surface normals using photos captured at different polarizer rotations. Fundamentally, the SfP model assumes that light is reflected either diffusely or specularly. However, this model is not valid for many real-world surfaces exhibiting a mixture of diffuse and specular properties. To address this challenge, previous methods have used a sequential solution:…

    Submitted 11 June, 2016; v1 submitted 5 May, 2016; originally announced May 2016.

    Comments: 13 pages, 5 figures

  39. arXiv:1503.01804  [pdf, other]

    cs.CV cs.GR

    Frequency Domain TOF: Encoding Object Depth in Modulation Frequency

    Authors: Achuta Kadambi, Vage Taamazyan, Suren Jayasuriya, Ramesh Raskar

    Abstract: Time of flight cameras may emerge as the 3-D sensor of choice. Today, time of flight sensors use phase-based sampling, where the phase delay between emitted and received, high-frequency signals encodes distance. In this paper, we present a new time of flight architecture that relies only on frequency; we refer to this technique as frequency-domain time of flight (FD-TOF). Inspired by optical cohe…

    Submitted 5 March, 2015; originally announced March 2015.

    Comments: 10 pages

  40. arXiv:1501.04878   

    cs.CV

    A Light Transport Model for Mitigating Multipath Interference in TOF Sensors

    Authors: Nikhil Naik, Achuta Kadambi, Christoph Rhemann, Shahram Izadi, Ramesh Raskar, Sing Bing Kang

    Abstract: Continuous-wave Time-of-flight (TOF) range imaging has become a commercially viable technology with many applications in computer vision and graphics. However, the depth images obtained from TOF cameras contain scene dependent errors due to multipath interference (MPI). Specifically, MPI occurs when multiple optical reflections return to a single spatial location on the imaging sensor. Many prior…

    Submitted 30 January, 2015; v1 submitted 20 January, 2015; originally announced January 2015.

    Comments: This paper has been withdrawn by the submitter as the submission was made due to a miscommunication

  41. arXiv:1404.1116  [pdf, other]

    cs.CV cs.IT physics.optics

    Resolving Multi-path Interference in Time-of-Flight Imaging via Modulation Frequency Diversity and Sparse Regularization

    Authors: Ayush Bhandari, Achuta Kadambi, Refael Whyte, Christopher Barsi, Micha Feigin, Adrian Dorrington, Ramesh Raskar

    Abstract: Time-of-flight (ToF) cameras calculate depth maps by reconstructing phase shifts of amplitude-modulated signals. For broad illumination or transparent objects, reflections from multiple scene points can illuminate a given pixel, giving rise to an erroneous depth map. We report here a sparsity regularized solution that separates K-interfering components using multiple modulation frequency measureme…

    Submitted 3 April, 2014; originally announced April 2014.

    Comments: 11 Pages, 4 figures, appeared with minor changes in Optics Letters

    Journal ref: Optics Letters, Vol. 39, Issue 6, pp. 1705-1708 (2014)