
Private Text Generation by Seeding Large Language Model Prompts

Authors: Supriya Nagesh, Justin Y. Chen, Nina Mishra, Tal Wagner

Abstract: We explore how private synthetic text can be generated by suitably prompting a large language model (LLM). This addresses a challenge for organizations like hospitals, which hold sensitive text data like patient medical records, and wish to share it in order to train machine learning models for medical tasks, while preserving patient privacy. Methods that rely on training or finetuning a model may be out of reach, either due to API limits of third-party LLMs, or due to ethical and legal prohibitions on sharing the private data with the LLM itself. We propose Differentially Private Keyphrase Prompt Seeding (DP-KPS), a method that generates a private synthetic text corpus from a sensitive input corpus, by accessing an LLM only through privatized prompts. It is based on seeding the prompts with private samples from a distribution over phrase embeddings, thus capturing the input corpus while achieving requisite output diversity and maintaining differential privacy. We evaluate DP-KPS on downstream ML text classification tasks, and show that the corpora it generates preserve much of the predictive power of the original ones. Our findings offer hope that institutions can reap ML insights by privately sharing data with simple prompts and little compute.

Submitted 18 February, 2025; originally announced February 2025.

arXiv:2404.12819 [pdf, other]

Unveiling the Ambiguity in Neural Inverse Rendering: A Parameter Compensation Analysis

Authors: Georgios Kouros, Minye Wu, Sushruth Nagesh, Xianling Zhang, Tinne Tuytelaars

Abstract: Inverse rendering aims to reconstruct the scene properties of objects solely from multiview images. However, it is an ill-posed problem prone to producing ambiguous estimations deviating from physically accurate representations. In this paper, we utilize Neural Microfacet Fields (NMF), a state-of-the-art neural inverse rendering method, to illustrate the inherent ambiguity. We propose an evaluation framework to assess the degree of compensation or interaction between the estimated scene properties, aiming to explore the mechanisms behind this ill-posed problem and potential mitigation strategies. Specifically, we introduce artificial perturbations to one scene property and examine how adjusting another property can compensate for these perturbations. To facilitate such experiments, we introduce a disentangled NMF where material properties are independent. The experimental findings underscore the intrinsic ambiguity present in neural inverse rendering and highlight the importance of providing additional guidance through geometry, material, and illumination priors.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2310.19372 [pdf, other]

RGB-X Object Detection via Scene-Specific Fusion Modules

Authors: Sri Aditya Deevi, Connor Lee, Lu Gan, Sushruth Nagesh, Gaurav Pandey, Soon-Jo Chung

Abstract: Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small number of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

arXiv:2309.04362 [pdf, ps, other]

Sparse Codesigned Communication and Radar Systems

Authors: Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Saravanan Nagesh, Andreas Bathelt, David González G., Osvaldo Gonsa, Hans-Ludwig Bloecher

Abstract: In the envisioned beyond-fifth-generation (B5G) and sixth-generation (6G) scenarios, which expect massive multiple-input multiple-output (mMIMO) and high-frequency communications in the millimeter-wave (mmWave) and Terahertz (THz) bands, efficiency in both energy and spectrum is of increasing significance. To that end, a novel ISAC framework called "sparse codesigned communication and radar (SCCR)" systems is described, which codesigns both communication and radar signals by a sparsification of the resource domain and the waveform spectrum domain. This improves the spectral and energy efficiency, but at the inherent cost of missing radar spectrum and irregular beampattern, and decreased throughput and diversity. Such challenges can, however, be mitigated by leveraging various sparsity-robust signal processing techniques such as sparse radar reconstruction and index modulation (IM). In light of the above, this white paper aims to outline the proposed article, which provides an overview and a novel classification of the relevant state-of-the-art (SotA) methods and the implications of the challenges in the sparse codesign of the system, followed by a variety of novel SCCR frameworks.

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2308.09878 [pdf, other]

DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets

Authors: Shubham Shrivastava, Xianling Zhang, Sushruth Nagesh, Armin Parchami

Abstract: Data imbalance is a well-known issue in the field of machine learning, attributable to the cost of data collection, the difficulty of labeling, and the geographical distribution of the data. In computer vision, bias in data distribution caused by image appearance remains largely unexplored. Compared to categorical distributions using class labels, image appearance reveals complex relationships between objects beyond what class labels provide. Clustering deep perceptual features extracted from raw pixels gives a richer representation of the data. This paper presents a novel method for addressing data imbalance in machine learning. The method computes sample likelihoods based on image appearance using deep perceptual embeddings and clustering. It then uses these likelihoods to weight samples differently during training with a proposed $\textbf{Generalized Focal Loss}$ function. This loss can be easily integrated with deep learning algorithms. Experiments validate the method's effectiveness across autonomous driving vision datasets including KITTI and nuScenes. The loss function improves state-of-the-art 3D object detection methods, achieving over $200\%$ AP gains on under-represented classes (Cyclist) in the KITTI dataset. The results demonstrate the method is generalizable, complements existing techniques, and is particularly beneficial for smaller datasets and rare classes. Code is available at: https://github.com/towardsautonomy/DatasetEquity

Submitted 21 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: ICCV 2023 Workshop

arXiv:2308.08530 [pdf, other]

Ref-DVGO: Reflection-Aware Direct Voxel Grid Optimization for an Improved Quality-Efficiency Trade-Off in Reflective Scene Reconstruction

Authors: Georgios Kouros, Minye Wu, Shubham Shrivastava, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars

Abstract: Neural Radiance Fields (NeRFs) have revolutionized the field of novel view synthesis, demonstrating remarkable performance. However, the modeling and rendering of reflective objects remain challenging problems. Recent methods have shown significant improvements over the baselines in handling reflective scenes, albeit at the expense of efficiency. In this work, we aim to strike a balance between efficiency and quality. To this end, we investigate an implicit-explicit approach based on conventional volume rendering to enhance the reconstruction quality and accelerate the training and rendering processes. We adopt an efficient density-based grid representation and reparameterize the reflected radiance in our pipeline. Our proposed reflection-aware approach achieves a competitive quality-efficiency trade-off compared to competing methods. Based on our experimental results, we propose and discuss hypotheses regarding the factors influencing the results of density-based methods for reconstructing reflective objects. The source code is available at https://github.com/gkouros/ref-dvgo.

Submitted 21 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 5 pages, 4 figures, 3 tables, ICCV TRICKY 2023 Workshop

arXiv:2306.06325 [pdf, other]

Explaining a machine learning decision to physicians via counterfactuals

Authors: Supriya Nagesh, Nina Mishra, Yonatan Naamad, James M. Rehg, Mehul A. Shah, Alexei Wagner

Abstract: Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. \textit{How can the decision of an ML model be explained to a physician?} The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulted in the opposite outcome. Specifically, time-series CFs are investigated, inspired by the way physicians converse and reason out decisions: `I would have given the patient a vasopressor if their blood pressure was lower and falling'. Key properties of CFs that are particularly meaningful in clinical settings are outlined: physiological plausibility, relevance to the task, and sparse perturbations. Past work on CF generation does not satisfy these properties, specifically plausibility, in that realistic time-series CFs are not generated. A variational autoencoder (VAE)-based approach is proposed that captures these desired properties. The method produces CFs that improve on prior approaches quantitatively (more plausible CFs as evaluated by their likelihood w.r.t. the original data distribution, and 100$\times$ faster at generating CFs) and qualitatively (2$\times$ more plausible and relevant) as evaluated by three physicians.

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2212.07514 [pdf, other]

PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation

Authors: Maxwell A. Xu, Alexander Moreno, Supriya Nagesh, V. Burak Aydemir, David W. Wetter, Santosh Kumar, James M. Rehg

Abstract: The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.

Submitted 15 December, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: NeurIPS 2022 | Code available at: https://github.com/rehg-lab/pulseimpute | Data available at: https://doi.org/10.5281/zenodo.7129964

Journal ref: Advances in Neural Information Processing Systems 35 (2022) 26874-26888

arXiv:2208.07362 [pdf, other]

Look Both Ways: Bidirectional Visual Sensing for Automatic Multi-Camera Registration

Authors: Subodh Mishra, Sushruth Nagesh, Sagar Manglani, Graham Mills, Punarjay Chakravarty, Gaurav Pandey

Abstract: This work describes the automatic registration of a large network (approximately 40) of fixed, ceiling-mounted environment cameras spread over a large area (approximately 800 square meters) using a mobile calibration robot equipped with a single upward-facing fisheye camera and a backlit ArUco marker for easy detection. The fisheye camera is used to do visual odometry (VO), and the ArUco marker facilitates easy detection of the calibration robot in the environment cameras. In addition, the fisheye camera is also able to detect the environment cameras. This two-way, bidirectional detection constrains the pose of the environment cameras to solve an optimization problem. Such an approach can be used to automatically register a large-scale multi-camera system used for surveillance, automated parking, or robotic applications. This VO-based multi-camera registration method has been extensively validated using real-world experiments, and also compared against a similar approach which uses a LiDAR, an expensive, heavier, and power-hungry sensor.

Submitted 7 October, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

arXiv:2208.06195 [pdf, other]

Category-Level Pose Retrieval with Contrastive Features Learnt with Occlusion Augmentation

Authors: Georgios Kouros, Shubham Shrivastava, Cédric Picron, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars

Abstract: Pose estimation is usually tackled as either a bin classification or a regression problem. In both cases, the idea is to directly predict the pose of an object. This is a non-trivial task due to appearance variations between similar poses and similarities between dissimilar poses. Instead, we follow the key idea that comparing two poses is easier than directly predicting one. Render-and-compare approaches have been employed to that end; however, they tend to be unstable, computationally expensive, and slow for real-time applications. We propose doing category-level pose estimation by learning an alignment metric in an embedding space using a contrastive loss with a dynamic margin and a continuous pose-label space. For efficient inference, we use a simple real-time image retrieval scheme with a pre-rendered and pre-embedded reference set of renderings. To achieve robustness to real-world conditions, we employ synthetic occlusions, bounding box perturbations, and appearance augmentations. Our approach achieves state-of-the-art performance on PASCAL3D and OccludedPASCAL3D and surpasses the competing methods on KITTI3D in a cross-dataset evaluation setting. The code is currently available at https://github.com/gkouros/contrastive-pose-retrieval.

Submitted 12 October, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

Comments: 29 pages, 16 Figures, 14 tables, BMVC 2022

arXiv:2205.12923 [pdf, other]

Domain Adaptation for Object Detection using SE Adaptors and Center Loss

Authors: Sushruth Nagesh, Shreyas Rajesh, Asfiya Baig, Savitha Srinivasan

Abstract: Despite growing interest in object detection, very few works address the extremely practical problem of cross-domain robustness, especially for automotive applications. In order to prevent drops in performance due to domain shift, we introduce an unsupervised domain adaptation method built on the foundation of Faster R-CNN with two domain adaptation components addressing the shift at the instance and image levels respectively, and apply a consistency regularization between them. We also introduce a family of adaptation layers, called SE Adaptors, that leverage the squeeze-and-excitation mechanism to improve domain attention and thus improve performance without requiring any prior knowledge of the new target domain. Finally, we incorporate a center loss in the instance and image level representations to improve the intra-class variance. We report all results with Cityscapes as our source domain and Foggy Cityscapes as the target domain, outperforming previous baselines.

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.12519 [pdf, other]

Structure Aware and Class Balanced 3D Object Detection on nuScenes Dataset

Authors: Sushruth Nagesh, Asfiya Baig, Savitha Srinivasan, Akshay Rangesh, Mohan Trivedi

Abstract: 3-D object detection is pivotal for autonomous driving. Point-cloud-based methods have become increasingly popular for 3-D object detection, owing to their accurate depth information. NuTonomy's nuScenes dataset greatly extends commonly used datasets such as KITTI in size, sensor modalities, categories, and annotation numbers. However, it suffers from severe class imbalance. The Class-balanced Grouping and Sampling (CBGS) paper addresses this issue and suggests an augmentation and sampling strategy. However, the localization precision of this model is affected by the loss of spatial information in the downscaled feature maps. We propose to enhance the performance of the CBGS model by designing an auxiliary network that makes full use of the structure information of the 3D point cloud, in order to improve the localization accuracy. The detachable auxiliary network is jointly optimized by two point-level supervisions, namely foreground segmentation and center estimation. The auxiliary network does not introduce any extra computation during inference, since it can be detached at test time.

Submitted 3 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2111.01222 [pdf, other]

Kernel Deformed Exponential Families for Sparse Continuous Attention

Authors: Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg

Abstract: Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, Martins et al. (2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families: the latter has sparse support. Farinhas et al. (2021) extended this to use Gaussian mixture attention densities, which are a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families. Experiments show that kernel deformed exponential families can attend to multiple compact regions of the data domain.

Submitted 12 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

arXiv:2111.01193 [pdf, other]

Transformers for prompt-level EMA non-response prediction

Authors: Supriya Nagesh, Alexander Moreno, Stephanie M. Carpenter, Jamie Yap, Soujanya Chatterjee, Steven Lloyd Lizotte, Neng Wan, Santosh Kumar, Cho Lam, David W. Wetter, Inbal Nahum-Shani, James M. Rehg

Abstract: Ecological Momentary Assessments (EMAs) are an important psychological data source for measuring current cognitive states, affect, behavior, and environmental factors from participants in mobile health (mHealth) studies and treatment programs. Non-response, in which participants fail to respond to EMA prompts, is an endemic problem. The ability to accurately predict non-response could be utilized to improve EMA delivery and develop compliance interventions. Prior work has explored classical machine learning models for predicting non-response. However, as increasingly large EMA datasets become available, there is the potential to leverage deep learning models that have been effective in other fields. Recently, transformer models have shown state-of-the-art performance in NLP and other domains. This work is the first to explore the use of transformers for EMA data analysis. We address three key questions in applying transformers to EMA data: (1) input representation, (2) encoding temporal information, and (3) the utility of pre-training for improving downstream prediction task performance. The transformer model achieves a non-response prediction AUC of 0.77 and is significantly better than classical ML and LSTM-based deep learning models. We will make our predictive model, trained on a corpus of 40K EMA samples, freely available to the research community, in order to facilitate the development of future transformer-based EMA analysis works.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:1901.09482 [pdf, other]

doi 10.1109/TPAMI.2020.2996538

Bridging the Gap Between Computational Photography and Visual Recognition

Authors: Rosaura G. VidalMata, Sreya Banerjee, Brandon RichardWebster, Michael Albright, Pedro Davalos, Scott McCloskey, Ben Miller, Asong Tambo, Sushobhan Ghosh, Sudarshan Nagesh, Ye Yuan, Yueyu Hu, Junru Wu, Wenhan Yang, Xiaoshuai Zhang, Jiaying Liu, Zhangyang Wang, Hwann-Tzong Chen, Tzu-Wei Huang, Wen-Chi Chin, Yi-Chun Li, Mahmoud Lababidi, Charles Otto, Walter J. Scheirer

Abstract: What is the current state-of-the-art for image restoration and enhancement applied to degraded images acquired under less than ideal circumstances? Can the application of such algorithms as a pre-processing step improve image interpretability for manual analysis or automatic visual recognition to classify scene content? While there have been important advances in the area of computational photography to restore or enhance the visual quality of an image, the capabilities of such techniques have not always translated in a useful way to visual recognition tasks. Consequently, there is a pressing need for the development of algorithms that are designed for the joint problem of improving visual appearance and recognition, which will be an enabling factor for the deployment of visual recognition tools in many real-world scenarios. To address this, we introduce the UG^2 dataset as a large-scale benchmark composed of video imagery captured under challenging conditions, and two enhancement tasks designed to test algorithmic impact on visual quality and automatic object recognition. Furthermore, we propose a set of metrics to evaluate the joint improvement of such tasks as well as individual algorithmic advances, including a novel psychophysics-based evaluation regime for human assessment and a realistic set of quantitative measures for object recognition performance. We introduce six new algorithms for image restoration or enhancement, which were created as part of the IARPA-sponsored UG^2 Challenge workshop held at CVPR 2018. Under the proposed evaluation regime, we present an in-depth analysis of these algorithms and a host of deep learning-based and classic baseline approaches. From the observed results, it is evident that we are in the early days of building a bridge between computational photography and visual recognition, leaving many opportunities for innovation in this area.

Submitted 19 February, 2020; v1 submitted 27 January, 2019; originally announced January 2019.

Comments: CVPR Prize Challenge: http://www.ug2challenge.org

Showing 1–15 of 15 results for author: Nagesh, S