Showing 1–10 of 10 results for author: Dagli, R

  1. arXiv:2506.07932  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor

    Authors: Rishit Dagli, Yushi Guan, Sankeerth Durvasula, Mohammadreza Mofayezi, Nandita Vijaykumar

    Abstract: We propose Squeeze3D, a novel framework that leverages implicit prior knowledge learnt by existing pre-trained 3D generative models to compress 3D data at extremely high compression ratios. Our approach bridges the latent spaces between a pre-trained encoder and a pre-trained generation model through trainable mapping networks. Any 3D model represented as a mesh, point cloud, or a radiance field i…

    Submitted 9 June, 2025; originally announced June 2025.

  2. arXiv:2503.19356  [pdf, other]

    cs.CV

    Can Vision-Language Models Answer Face to Face Questions in the Real-World?

    Authors: Reza Pourreza, Rishit Dagli, Apratim Bhattacharyya, Sunny Panchal, Guillaume Berger, Roland Memisevic

    Abstract: AI models have made significant strides in recent years in their ability to describe and answer questions about real-world images. They have also made progress in the ability to converse with users in real-time using audio input. This raises the question: have we reached the point where AI models, connected to a camera and microphone, can converse with users in real-time about scenes and events th…

    Submitted 25 March, 2025; originally announced March 2025.

  3. arXiv:2410.02921  [pdf, other]

    cs.CV

    AirLetters: An Open Video Dataset of Characters Drawn in the Air

    Authors: Rishit Dagli, Guillaume Berger, Joanna Materzynska, Ingo Bax, Roland Memisevic

    Abstract: We introduce AirLetters, a new video dataset consisting of real-world videos of human-generated, articulated motions. Specifically, our dataset requires a vision model to predict letters that humans draw in the air. Unlike existing video datasets, accurate classification predictions for AirLetters rely critically on discerning motion patterns and on integrating long-range information in the video…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: ECCV'24, HANDS workshop

  4. arXiv:2408.10258  [pdf, other]

    cs.CV cs.LG

    NeRF-US: Removing Ultrasound Imaging Artifacts from Neural Radiance Fields in the Wild

    Authors: Rishit Dagli, Atsuhiro Hibi, Rahul G. Krishnan, Pascal N. Tyrrell

    Abstract: Current methods for performing 3D reconstruction and novel view synthesis (NVS) in ultrasound imaging data often face severe artifacts when training NeRF-based approaches. The artifacts produced by current approaches differ from NeRF floaters in general scenes because of the unique nature of ultrasound capture. Furthermore, existing models fail to produce reasonable 3D reconstructions when ultraso…

    Submitted 20 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2406.10724  [pdf, other]

    eess.IV cs.CV cs.LG

    Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

    Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman, et al. (4 additional authors not shown)

    Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in 38th Annual Small Satellite Conference

  6. arXiv:2406.06612  [pdf, ps, other]

    cs.CV cs.LG cs.SD eess.AS

    SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

    Authors: Rishit Dagli, Shivesh Prakash, Robert Wu, Houman Khosravani

    Abstract: Generating combined visual and auditory sensory experiences is critical for the consumption of immersive content. Recent advances in neural generative models have enabled the creation of high-resolution content across multiple modalities such as images, text, speech, and videos. Despite these successes, there remains a significant gap in the generation of high-quality spatial audio that complement…

    Submitted 7 July, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://see2sound.github.io/

  7. arXiv:2402.18575  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    DiffuseRAW: End-to-End Generative RAW Image Processing for Low-Light Images

    Authors: Rishit Dagli

    Abstract: Imaging under extremely low-light conditions presents a significant challenge and is an ill-posed problem due to the low signal-to-noise ratio (SNR) caused by minimal photon capture. Previously, diffusion models have been used for multiple kinds of generative tasks and image-to-image tasks, however, these models work as a post-processing step. These diffusion models are trained on processed images…

    Submitted 12 December, 2023; originally announced February 2024.

  8. arXiv:2402.10100  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data

    Authors: Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas X. Perri, Houman Khosravani

    Abstract: This study assesses deep learning models for audio classification in a clinical setting with the constraint of small datasets reflecting real-world prospective data collection. We analyze CNNs, including DenseNet and ConvNeXt, alongside transformer models like ViT, SWIN, and AST, and compare them against pre-trained audio models such as YAMNet and VGGish. Our method highlights the benefits of pre-…

    Submitted 5 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: CHIL 2024

  9. arXiv:2304.05350  [pdf, other]

    cs.CV cs.AI cs.LG

    Astroformer: More Data Might not be all you need for Classification

    Authors: Rishit Dagli

    Abstract: Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data and training or deploying these state-of-the-art methods to resource constraint environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies for…

    Submitted 26 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: 21 pages, 7 figures. ICLR 2023

  10. arXiv:2112.09569  [pdf, other]

    cs.CV cs.AI cs.LG

    CPPE-5: Medical Personal Protective Equipment Dataset

    Authors: Rishit Dagli, Ali Mustufa Shaikh

    Abstract: We present a new challenging dataset, CPPE - 5 (Medical Personal Protective Equipment), with the goal to allow the study of subordinate categorization of medical personal protective equipments, which is not possible with other popular data sets that focus on broad-level categories (such as PASCAL VOC, ImageNet, Microsoft COCO, OpenImages, etc). To make it easy for models trained on this dataset to…

    Submitted 18 February, 2023; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 18 pages, 6 tables, 6 figures. Code and models are available at https://git.io/cppe5-dataset