Search | arXiv e-print repository

Digital cloning of online social networks for language-sensitive agent-based modeling of misinformation spread

Authors: Prateek Puri, Gabriel Hassler, Anton Shenk, Sai Katragadda

Abstract: We develop a simulation framework for studying misinformation spread within online social networks that blends agent-based modeling and natural language processing techniques. While many other agent-based simulations exist in this space, questions over their fidelity and generalization to existing networks in part hinder their ability to provide actionable insights. To partially address these concerns, we create a 'digital clone' of a known misinformation sharing network by downloading social media histories for over ten thousand of its users. We parse these histories to both extract the structure of the network and model the nuanced ways in which information is shared and spread among its members. Unlike many other agent-based methods in this space, information sharing between users in our framework is sensitive to the topic of discussion, user preferences, and online community dynamics. To evaluate the fidelity of our method, we seed our cloned network with a set of posts recorded in the base network and compare propagation dynamics between the two, observing reasonable agreement across the twin networks over a variety of metrics. Lastly, we explore how the cloned network may serve as a flexible, low-cost testbed for misinformation countermeasure evaluation and red teaming analysis.
We hope the tools explored here augment existing efforts in the space and unlock new opportunities for misinformation countermeasure evaluation, a field that may become increasingly important to consider with the anticipated rise of misinformation campaigns fueled by generative artificial intelligence.
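Topic-sensitive sharing can be pictured with a generic independent-cascade model in which the chance that a user reshares a post scales with that user's interest in the post's topic. The sketch below is a minimal illustration under that assumption only: the follower graph, the `affinity` table, and `base_p` are hypothetical inputs, and the paper's language-based sharing model and community dynamics are not reproduced.

```python
import random
from collections import deque

def simulate_cascade(followers, affinity, topic, seeds, base_p, rng):
    """Independent-cascade spread with a topic-dependent share probability (sketch).

    followers: dict mapping a user to the users who see that user's posts.
    affinity:  dict mapping a user to {topic: weight in [0, 1]} (hypothetical).
    Each newly activated user gets exactly one chance to activate each follower.
    Returns the set of users who ended up sharing the post.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in followers.get(u, []):
            if v in active:
                continue
            # Share probability scales with the follower's interest in this topic.
            p = base_p * affinity.get(v, {}).get(topic, 0.0)
            if rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active
```

Passing a seeded `random.Random` makes runs reproducible, which matters when comparing cascades across countermeasure scenarios.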

Submitted 23 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.11464

Task-specific regularization loss towards model calibration for reliable lung cancer detection

Authors: Mehar Prateek Kalra, Mansi Singhal, Rohan Raju Dhanakashirur

Abstract: Lung cancer is one of the leading causes of cancer-related deaths globally. Early detection and treatment improve the chances of survival. Traditionally, CT scans have been used to extract the most significant lung infection information and diagnose cancer. This process is carried out manually by an expert radiologist. The imbalance in the radiologist-to-population ratio in a country like India places significant work pressure on radiologists and thus raises the need to automate some of their responsibilities. The tendency of modern deep neural networks to make overconfident mistakes limits their use in cancer detection. In this paper, we propose a new task-specific loss function to calibrate the neural network and reduce the risk of overconfident mistakes. We use the state-of-the-art Multi-class Difference in Confidence and Accuracy (MDCA) loss in conjunction with the proposed task-specific loss function to achieve this. We also integrate post-hoc calibration by performing temperature scaling on top of the train-time calibrated model. We demonstrate a 5.98% improvement in the Expected Calibration Error (ECE) and a 17.9% improvement in the Maximum Calibration Error (MCE) compared to the best-performing state-of-the-art (SOTA) algorithm.
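The two metrics and the post-hoc step named above can be made concrete with a short sketch. It shows only the standard definitions: ECE is the sample-weighted average gap between accuracy and confidence over confidence bins, MCE is the largest such gap, and temperature scaling divides logits by a scalar T before the softmax. The 10 equal-width bins and the grid search for T are illustrative assumptions; the paper's MDCA and task-specific losses are not shown.

```python
import numpy as np

def calibration_errors(probs, labels, n_bins=10):
    """Return (ECE, MCE) with equal-width confidence bins.

    probs: (N, C) predicted class probabilities; labels: (N,) integer labels.
    """
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)  # half-open bin (lo, hi]
        if not in_bin.any():
            continue
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
        mce = max(mce, gap)
    return ece, mce

def temperature_scale(logits, T):
    """Softmax of logits / T; T > 1 softens overconfident predictions."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick T on held-out data by minimizing negative log-likelihood over a grid."""
    def nll(T):
        p = temperature_scale(logits, T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)
```

Because temperature scaling is a monotone rescaling of the logits, it leaves the predicted class (and hence accuracy) unchanged and only moves the confidences, which is why it can sit on top of a train-time calibrated model.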

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.11154

Motility and pair-wise interactions of chemically active droplets in 1-D confinement

Authors: Pawan Kumar, Prateek Dwivedi, Sobiya Ashraf, Dipin Pillai, Rahul Mangal

Abstract: Self-propelled droplets serve as ideal model systems for deepening our understanding of the motion of biological micro-swimmers by mimicking their motility. Biological microorganisms are renowned for showcasing a diverse array of dynamic swimming behaviors when confronted with physical constraints. This study aims to elucidate the impact of physical constraints on the swimming characteristics of biological microorganisms. To achieve this, we present observations on the individual and pair-wise behavior of micellar-solubilized, self-propelled 4-cyano-4'-pentylbiphenyl (5CB) oil droplets in a square capillary channel filled with an aqueous solution of the surfactant tetradecyltrimethylammonium bromide (TTAB). To explore the effect of the underlying Péclet number ($Pe$) of the swimming droplets, the study is also performed in the presence of additives such as the high-molecular-weight polymer polyethylene oxide (PEO) and the molecular solute glycerol. The capillary confinement restricts the droplets to predominantly one-dimensional (1D) motion, albeit with noticeable differences in their motion across the three scenarios. Through a characterization of the chemical and hydrodynamic flow fields surrounding the droplets, we illustrate that the confinement-induced modification of the droplets' chemical field varies significantly with the underlying Péclet number in these cases. This alteration in the chemical field distribution notably affects the individual droplets' motion.
Moreover, these distinct chemical field interactions between the droplets also lead to variations in their pair-wise motion, ranging from chasing to scattering.

Submitted 20 January, 2024; originally announced January 2024.

Comments: 13 pages, 9 figures

arXiv:2401.11103

Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

Authors: Jiachen T. Wang, Prateek Mittal, Ruoxi Jia

Abstract: This work addresses an open problem in the data valuation literature concerning the efficient computation of Data Shapley for the weighted $K$-nearest-neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley as a counting problem and introduce a quadratic-time algorithm, a notable improvement over $O(N^K)$, the best result in the existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart.
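To make the quantity being computed concrete, the sketch below evaluates Data Shapley by brute force for a tiny training set, using hard-label weighted KNN accuracy on one validation point as the utility. It enumerates all $2^{N-1}$ subsets per point, so it only illustrates the definition; it is not the paper's quadratic-time counting algorithm. The inverse-distance weight rounded to 0.1 is a hypothetical stand-in for "discretized weights".

```python
from itertools import combinations
from math import factorial
import numpy as np

def wknn_utility(subset, X, y, x_val, y_val, K):
    """U(S): 1.0 if hard-label weighted KNN built on `subset` classifies
    (x_val, y_val) correctly, else 0.0. The empty set has utility 0."""
    if not subset:
        return 0.0
    idx = np.array(subset)
    dist = np.linalg.norm(X[idx] - x_val, axis=1)
    votes = {}
    for pos in np.argsort(dist, kind="stable")[:K]:
        # Hypothetical discretized weight: inverse distance rounded to 0.1.
        w = round(1.0 / (1.0 + dist[pos]), 1)
        label = int(y[idx[pos]])
        votes[label] = votes.get(label, 0.0) + w
    pred = max(sorted(votes), key=votes.get)  # ties go to the smallest label
    return float(pred == y_val)

def exact_shapley(n, utility):
    """Shapley values by enumerating every subset: O(2^n) utility calls, tiny n only."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (utility(list(S) + [i]) - utility(list(S)))
    return phi
```

The values satisfy the efficiency property: they sum to $U(\text{all points}) - U(\emptyset)$, one of the fairness axioms the abstract refers to.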

Submitted 19 January, 2024; originally announced January 2024.

Comments: AISTATS 2024 Oral

arXiv:2401.09856

EDAF: An End-to-End Delay Analytics Framework for 5G-and-Beyond Networks

Authors: Samie Mostafavi, Marius Tillner, Gourav Prateek Sharma, James Gross

Abstract: Supporting applications in emerging domains like cyber-physical systems and human-in-the-loop scenarios typically requires adherence to strict end-to-end delay guarantees. The contributions of many tandem processes unfolding layer by layer within the wireless network result in violations of delay constraints, thereby severely degrading application performance. Meeting the application's stringent requirements necessitates coordinated optimization of the end-to-end delay by fine-tuning all contributing processes. To achieve this, we designed and implemented EDAF, a framework that decomposes packets' end-to-end delays and determines each component's significance in 5G networks. We showcase EDAF on the OpenAirInterface 5G uplink, modified to report timestamps across the data plane. By applying the obtained insights, we optimized the end-to-end uplink delay by eliminating segmentation and frame-alignment delays, decreasing the average delay from 12 ms to 4 ms.
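The core idea of decomposing an end-to-end delay from per-hop timestamps can be sketched as follows. The stage names are hypothetical placeholders, not EDAF's actual measurement points, and real traces would need clock synchronization across nodes. Because averages are linear, the per-segment mean delays sum exactly to the mean end-to-end delay, so each segment's share shows where optimization effort pays off.

```python
from statistics import mean

# Hypothetical timestamp points along an uplink packet's path, in order.
STAGES = ["app_send", "rlc_enqueue", "mac_tx", "gnb_rx", "core_out"]

def decompose(packets):
    """packets: list of dicts mapping stage name -> timestamp (ms).

    Returns (per-segment mean delays, mean end-to-end delay, each segment's share).
    """
    segments = list(zip(STAGES[:-1], STAGES[1:]))
    seg_means = {f"{a}->{b}": mean(p[b] - p[a] for p in packets) for a, b in segments}
    e2e_mean = mean(p[STAGES[-1]] - p[STAGES[0]] for p in packets)
    shares = {name: d / e2e_mean for name, d in seg_means.items()}
    return seg_means, e2e_mean, shares
```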

Submitted 18 January, 2024; originally announced January 2024.

Comments: Submitted to the 11th International Workshop on Computer and Networking Experimental Research using Testbeds (CNERT 2024)

arXiv:2401.04343

Private Fine-tuning of Large Language Models with Zeroth-order Optimization

Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal

Abstract: Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner, but has proven difficult to scale to the era of foundation models. We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods. A key insight behind the design of our method is that the direction of the gradient in the zeroth-order optimization we use is random, and the only information from the training data is the step size, i.e., a scalar. Therefore, we only need to privatize the scalar step size, which is memory-efficient. DP-ZO provides a strong privacy-utility trade-off across different tasks and model sizes, comparable to DP-SGD under $(\varepsilon,\delta)$-DP. Notably, DP-ZO has significant advantages over DP-SGD in memory efficiency, and obtains higher utility under $\varepsilon$-DP when using the Laplace mechanism.
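The insight that only a scalar needs privatizing can be illustrated with a toy step. The sketch assumes a two-point (SPSA-style) estimator, per-example clipping of the scalar, and the Gaussian mechanism on the clipped sum; a least-squares loss stands in for an LLM objective, and the privacy accounting that maps `sigma` to an $(\varepsilon,\delta)$ guarantee is omitted. It is not the authors' implementation.

```python
import numpy as np

def dp_zo_step(theta, X, y, lr, mu, clip, sigma, rng):
    """One zeroth-order step in which only a per-example scalar is privatized (sketch)."""
    z = rng.standard_normal(theta.shape)  # random direction, independent of the data

    def per_example_loss(t):
        return 0.5 * (X @ t - y) ** 2  # toy least-squares loss, one value per example

    # Two-point finite difference: the only data-dependent quantity in the update.
    scalars = (per_example_loss(theta + mu * z) - per_example_loss(theta - mu * z)) / (2 * mu)
    clipped = np.clip(scalars, -clip, clip)  # bound each example's contribution
    # Gaussian mechanism on one summed scalar; noise std = sigma * clip.
    noisy_step = (clipped.sum() + rng.normal(0.0, sigma * clip)) / len(y)
    return theta - lr * noisy_step * z
```

The noise is drawn once per step for a single number, regardless of how many parameters `theta` has, which is where the memory advantage over per-coordinate gradient noising comes from.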

Submitted 30 January, 2025; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.02412

LLM Augmented LLMs: Expanding Capabilities through Composition

Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

Abstract: Foundational models with billions of parameters that have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM (Composition to Augment Language Models), which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) it scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data; (ii) existing model weights are kept intact, hence preserving existing capabilities; and (iii) it applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks, on par with fully fine-tuned counterparts.
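A single composition point of the kind described above can be sketched as cross-attention from the anchor model's hidden states to the augmenting model's hidden states, with the result added back residually. Everything here is an assumption for illustration: one attention head, a linear projection between model widths, and randomly initialized new parameters (`W_proj`, `W_q`, `W_k`, `W_v`). Which layers are composed and how CALM trains them are not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compose_layer(h_anchor, h_aug, params):
    """Cross-attend from anchor states (T_a, d_a) to augmenting states (T_b, d_b)
    and add the result residually (single head, sketch)."""
    proj = h_aug @ params["W_proj"]  # map augmenting width d_b -> anchor width d_a
    q = h_anchor @ params["W_q"]
    k = proj @ params["W_k"]
    v = proj @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T_a, T_b) attention weights
    return h_anchor + attn @ v  # residual path leaves the anchor representation intact

def init_params(d_a, d_b, rng, scale=0.02):
    """Only these new matrices would be trained; both base models stay frozen."""
    return {
        "W_proj": rng.normal(0.0, scale, (d_b, d_a)),
        "W_q": rng.normal(0.0, scale, (d_a, d_a)),
        "W_k": rng.normal(0.0, scale, (d_a, d_a)),
        "W_v": rng.normal(0.0, scale, (d_a, d_a)),
    }
```

With `W_v` set to zero the layer returns the anchor states unchanged, which mirrors the claim that composition adds capabilities without overwriting the base model's weights.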

Submitted 4 January, 2024; originally announced January 2024.

Comments: 17 pages, 2 figures, 8 tables

arXiv:2401.00446

Dissipation of AGN jets in a clumpy interstellar medium

Authors: Riju Dutta, Prateek Sharma, Kartick C. Sarkar, James M. Stone

Abstract: Accreting supermassive black holes (SMBHs) frequently power jets that interact with the interstellar/circumgalactic medium (ISM/CGM), regulating star formation in the galaxy. Highly supersonic jets launched by active galactic nuclei (AGN) power a cocoon that confines them and shocks the ambient medium. We build upon the models of narrow conical jets interacting with a smooth ambient medium to include the effect of dense clouds, which are an essential ingredient of a multiphase ISM. The key physical ingredient of this model is that the clouds along the supersonic jet beam strongly decelerate the jet head, whereas the subsonic cocoon easily moves around the clouds without much resistance. We propose scalings for important physical quantities: cocoon pressure, head and cocoon speeds, and jet radius. We obtain, for the first time, the analytic condition on the clumpiness of the ambient medium for the jet to dissipate within the cocoon, and verify it with numerical simulations of conical jets interacting with a uniform ISM with embedded spherical clouds. A jet is defined to be dissipated when the cocoon speed exceeds the speed of the jet head. We compare our models to more sophisticated numerical simulations and to direct observations of jet-ISM interaction (e.g., quasar J1316+1753), and discuss implications for the Fermi/eROSITA bubbles. Our work also motivates effective subgrid models for AGN jet feedback in a clumpy ISM unresolved by the present generation of cosmological galaxy formation simulations.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 23 pages, 12 figures, 3 tables; to be submitted; comments are welcome; accompanying video: http://youtu.be/DUpSwMMrGfk

arXiv:2312.15010

SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology

Authors: Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna

Abstract: Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local and global interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.

Submitted 18 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.14461

Attacking Byzantine Robust Aggregation in High Dimensions

Authors: Sarthak Choudhary, Aashish Kolluri, Prateek Saxena

Abstract: Training modern neural networks or models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can skew or bias the average vectors used to train the model, forcing the model to learn specific patterns or avoid learning anything useful. Byzantine robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximum bias in computing centrality statistics, such as the mean, even when some fraction of inputs are arbitrarily corrupted. Designing such aggregators is challenging when dealing with high dimensions. However, the first polynomial-time algorithms with strong theoretical bounds on the bias have recently been proposed. Their bounds are independent of the number of dimensions, promising a conceptual limit on the power of poisoning attacks in their ongoing arms race against defenses. In this paper, we show a new attack called HIDRA on practical realizations of strong defenses, which subverts their claim of dimension-independent bias. HIDRA highlights a novel computational bottleneck that has not been a concern of prior information-theoretic analysis. Our experimental evaluation shows that our attacks almost completely destroy the model performance, whereas existing attacks with the same goal fail to have much effect. Our findings leave the arms race between poisoning attacks and provable defenses wide open.
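Why robust aggregation matters can be seen with a small poisoning experiment: a few corrupted vectors drag the plain mean far away, while a robust aggregator limits the shift. The sketch uses the coordinate-wise median, a simple classical aggregator; it is not one of the dimension-independent defenses the paper attacks, and the attack shown (all poisoned vectors at one far point) is a toy, not HIDRA. Honest vectors are drawn around the origin, so the norm of each aggregate is its bias.

```python
import numpy as np

def mean_aggregate(vectors):
    """Plain average: a few extreme inputs can move it arbitrarily far."""
    return vectors.mean(axis=0)

def coordinate_median(vectors):
    """Coordinate-wise median: a simple robust aggregator (not one of the
    dimension-independent defenses studied in the paper)."""
    return np.median(vectors, axis=0)

def poisoned_sample(n_honest, n_bad, d, shift, rng):
    """Honest vectors ~ N(0, I); poisoned vectors all equal `shift` in every coordinate."""
    honest = rng.standard_normal((n_honest, d))
    bad = np.full((n_bad, d), shift)
    return np.vstack([honest, bad])
```

With 10% of 100-dimensional inputs poisoned at a large shift, the mean's bias grows with the shift, while the median's bias stays small regardless of the shift's magnitude.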

Submitted 15 December, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.11805

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.07330

Learned representation-guided diffusion models for large-image generation

Authors: Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras

Abstract: To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotations required in specialized domains like histopathology and satellite imagery; annotation is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies for fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings. The SSL embeddings used to generate a large image can either be extracted from a reference image or sampled from an auxiliary model conditioned on any related modality (e.g., class labels, text, genomic data).
As a proof of concept, we introduce the text-to-large-image synthesis paradigm, where we successfully synthesize large pathology and satellite images from text descriptions.

Submitted 28 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06992

doi 10.3847/1538-4357/ad1605

Between the cosmic-ray `knee' and the `ankle': Contribution from star clusters

Authors: Sourav Bhadra, Satyendra Thoudam, Biman B Nath, Prateek Sharma

Abstract: We show that massive young star clusters may be possible candidates for accelerating Galactic cosmic rays (CRs) in the range of $10^7$–$10^9$ GeV (between the `knee' and the `ankle'). Various plausible scenarios, such as acceleration at the wind termination shock (WTS) or at supernova shocks inside these young star clusters, have been proposed, since it is difficult to accelerate particles up to the $10^7$–$10^9$ GeV range in the standard paradigm of CR acceleration in supernova remnants. We consider a model for the production of different nuclei in CRs from massive stellar winds using the observed distribution of young star clusters in the Galactic plane. We present a detailed calculation of CR transport in the Galaxy, taking into account the effects of diffusion, interaction losses during propagation, and particle re-acceleration by old supernova remnants, to determine the all-particle CR spectrum. Using the maximum energy estimate from the Hillas criterion, we argue that a young massive star cluster can accelerate protons up to a few tens of PeV. Upon comparison with the observed data, our model requires a CR source spectrum with an exponential cutoff of $5\times 10^7 Z$ GeV ($50\,Z$ PeV) from these clusters, together with a cosmic-ray injection fraction of $\sim 5\%$ of the wind kinetic energy. We discuss the possibility of achieving these requirements in star clusters, and the associated uncertainties, in the context of considering star clusters as the natural accelerator of the `second component' of Galactic cosmic rays.
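The Hillas criterion mentioned above requires a particle's Larmor radius to fit inside the accelerator, giving $E_{\max} \approx Z e \beta c B R$ for a region of size $R$, magnetic field $B$, and shock speed $\beta c$. The helper below simply evaluates that expression in eV; the cluster parameters that go into it ($B$, $R$, $\beta$ at the wind termination shock) are left to the caller, and the paper's specific values are not assumed here.

```python
# Physical constants (SI)
C_LIGHT = 2.998e8  # speed of light, m/s
PARSEC = 3.086e16  # metres per parsec
GAUSS = 1e-4       # tesla per gauss

def hillas_emax_eV(Z, beta, B_gauss, R_pc):
    """Hillas estimate E_max ~ Z e beta c B R, returned in eV.

    The energy in joules is Z * e * beta * c * B * R; dividing by e (joules per eV)
    cancels e, so E_max[eV] = Z * beta * c[m/s] * B[T] * R[m].
    """
    return Z * beta * C_LIGHT * (B_gauss * GAUSS) * (R_pc * PARSEC)
```

As a sanity check, $B = 1\,\mu$G, $R = 1$ kpc, and $\beta = 1$ give roughly $0.9\times 10^{18}$ eV per unit charge, the usual benchmark for this formula. The linear dependence on $Z$ is also why the fitted cutoff is quoted per charge, $5\times 10^7 Z$ GeV $= 50\,Z$ PeV.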

Submitted 12 December, 2023; originally announced December 2023.

Comments: 18 pages, 6 figures, accepted for publication in ApJ

arXiv:2312.06749

doi 10.1007/JHEP06(2024)089

Electroweak Phase Transition with a Double Well Done Doubly Well

Authors: Prateek Agrawal, Simone Blasi, Alberto Mariotti, Michael Nee

Abstract: We revisit the electroweak phase transition in the scalar singlet extension of the standard model with a $\mathbb{Z}_2$ symmetry. In significant parts of the parameter space, the phase transition occurs in two steps, including canonical benchmarks used in experimental projections for gravitational waves. Domain walls produced in the first step of the transition seed the final step to the electroweak vacuum, an effect which is typically neglected but leads to an exponentially enhanced tunnelling rate. We improve previous results obtained for the seeded transition, which made use of the thin-wall or high-temperature approximations, by using the mountain pass algorithm that was recently proposed as a useful tool for seeded processes. We then determine the predictions of the seeded transition for the latent heat, bubble size, and characteristic time scale of the transition. Differences compared to homogeneous transitions are most pronounced when there are relatively few domain walls per Hubble patch, potentially leading to an enhanced gravitational wave signal. We also provide a derivation of the percolation criterion for a generic seeded transition, which applies to the domain wall seeds we consider as well as to strings and monopoles.

Submitted 29 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 24 pages, 8 figures, Journal Version

Report number: DESY-23-208

arXiv:2311.18281

Utilizing Radiomic Feature Analysis For Automated MRI Keypoint Detection: Enhancing Graph Applications

Authors: Sahar Almahfouz Nasser, Shashwat Pathak, Keshav Singhal, Mohit Meena, Nihar Gupte, Ananya Chinmaya, Prateek Garg, Amit Sethi

Abstract: Graph neural networks (GNNs) present a promising alternative to CNNs and transformers in certain image processing applications due to their parameter efficiency in modeling spatial relationships. Currently, a major area of research involves converting non-graph input data for GNN-based models, notably in scenarios where the data originates from images. One approach involves converting images into nodes by identifying significant keypoints within them. Super-Retina, a semi-supervised technique, has been used for detecting keypoints in retinal images. However, its limitation lies in its dependency on a small initial set of ground truth keypoints, which is progressively expanded to detect more keypoints. Having encountered difficulties in detecting consistent initial keypoints in brain images using SIFT and LoFTR, we proposed a new approach: radiomic feature-based keypoint detection. We demonstrated the anatomical significance of the detected keypoints by showing their efficacy in improving registration processes guided by these keypoints. Subsequently, these keypoints were employed as the ground truth for the keypoint detection method (LK-SuperRetina). Furthermore, the study showcases the application of GNNs in image matching, highlighting their superior performance in terms of both the number of good matches and confidence scores. This research sets the stage for expanding GNN applications into various other tasks, including but not limited to image classification, segmentation, and registration.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.15468

doi 10.1103/PhysRevD.108.103509

Adaptive friends-of-friends algorithm for identifying gravitationally bound cosmological structures

Authors: Prateek Gupta, Surajit Paul

Abstract: The Universe at the present epoch is found to be a network of matter over-dense and under-dense regions. To date, this picture of the Universe is best revealed through cosmological large-volume simulations and large-scale galaxy redshift surveys, in which the most important step is the appropriate identification of structures. So far, these structures have been identified using various group-finding codes, mostly based on the friends-of-friends (FoF) or spherical over-density (SO) algorithms. Although the main purpose is to identify gravitationally bound structures, surprisingly, the mass information has hardly been used effectively by these codes. Moreover, the methods used so far either constrain the over-density or use only the real unstructured geometry. Even though these are key factors in accurately determining the mass of structures, hardly any attempt has yet been made to consider these important parameters together while formulating grouping algorithms. In this paper, we present an algorithm that takes care of all the above-mentioned features and ensures that the identified structures are bound by means of physical quantities, mainly mass and total energy. We introduce a novel concept of a physically relevant arm-length for each element, depending on its individual gravity, leading to a distinct linking length for each unique pair of elements.
The proposed algorithm is thus fundamentally new: not only does it capture gravitationally bound structures with their real, unstructured geometry, it also identifies them roughly within a predefined, physically motivated density threshold. This combination could not be achieved before by any of the usual FoF- or SO-based methods. We also demonstrate the unique ability of the code to identify structures appropriately, both in large-volume cosmological simulations and in galaxy redshift surveys.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 28 pages, 13 figures, published in Physical Review D

Journal ref: Vol. 108, Issue 10, Page 103509, Year 2023, Phys. Rev. D
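The adaptive FoF entry above builds on the classic friends-of-friends idea. As a minimal sketch (not the authors' code), the Python below groups points with union-find. It uses a hypothetical per-point `arm_lengths` list so that each pair gets its own linking length, assumed here to be the sum of the two arm lengths. The paper derives arm lengths from each element's gravity and adds an energy-based boundedness check. Neither of those is reproduced here.

```python
import math
from itertools import combinations


def find(parent, i):
    """Return the root of i's group, halving the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def fof_groups(points, arm_lengths):
    """Friends-of-friends grouping with a pair-specific linking length.

    Points i and j are linked when their distance is below
    arm_lengths[i] + arm_lengths[j] (a hypothetical rule); with all arm
    lengths equal to b / 2 this reduces to standard FoF with linking length b.
    """
    parent = list(range(len(points)))
    # O(n^2) pair loop for clarity; production codes use trees or grids.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < arm_lengths[i] + arm_lengths[j]:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[rj] = ri
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.3, 5.0)]
print(fof_groups(points, [0.3] * 5))                  # [[0, 1, 2], [3, 4]]
print(fof_groups(points, [0.3, 0.3, 0.1, 0.1, 0.1]))  # [[0, 1], [2], [3], [4]]
```

In the second call, the same geometry splits into more groups because the smaller arm lengths shorten the linking length of every pair they take part in.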

arXiv:2311.14744 [pdf]

Coarse-Grained Configurational Polymer Fingerprints for Property Prediction using Machine Learning

Authors: Ishan Kumar, Prateek K Jha

Abstract: In this work, we present a method to generate a configuration-level fingerprint for polymers using the bead-spring model. Some previous fingerprinting approaches employ monomer-level information, where atomistic descriptors are computed using quantum chemistry calculations. Unlike those approaches, this one incorporates configurational information from a coarse-grained model of a long polymer chain. The proposed approach may be advantageous for studying behavior that results from large molecular weights. To create this fingerprint, we use two kinds of descriptors. First, we calculate geometric descriptors such as the squared end-to-end distance $R_e^2$ and the squared radius of gyration $R_g^2$, and label them Calculated Descriptors. Second, we generate a set of data-driven descriptors using an unsupervised autoencoder model and call them Learnt Descriptors. Using a combination of both, we learn mappings from the structure to various properties of the polymer chain by training ML models. We test our fingerprint by predicting the probability of occurrence of a configuration at equilibrium. This probability is approximated by a simple linear relationship between the instantaneous internal energy and the equilibrium average internal energy.

Submitted 20 November, 2023; originally announced November 2023.
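The two Calculated Descriptors named in the polymer-fingerprint entry have standard definitions for a single chain configuration: $R_e^2 = |\mathbf{r}_N - \mathbf{r}_1|^2$ and $R_g^2 = \frac{1}{N}\sum_i |\mathbf{r}_i - \mathbf{r}_{\mathrm{cm}}|^2$. Below is a minimal sketch of both. It assumes, for illustration only, that a configuration is given as a list of equal-mass bead coordinates; the paper's actual data format may differ.

```python
def squared_end_to_end(beads):
    """R_e^2: squared distance between the first and last bead of the chain."""
    first, last = beads[0], beads[-1]
    return sum((a - b) ** 2 for a, b in zip(last, first))


def squared_radius_of_gyration(beads):
    """R_g^2: mean squared distance of the beads from the chain's centre of
    mass (all beads assumed to have equal mass)."""
    n = len(beads)
    dim = len(beads[0])
    com = [sum(bead[k] for bead in beads) / n for k in range(dim)]
    return sum(
        sum((bead[k] - com[k]) ** 2 for k in range(dim)) for bead in beads
    ) / n


# A fully stretched three-bead chain along x with unit bond length.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(squared_end_to_end(chain), squared_radius_of_gyration(chain))
```

For this stretched chain, the sketch gives $R_e^2 = 4$ and $R_g^2 = 2/3$.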

arXiv:2311.13171 [pdf, other]

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

Authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal

Abstract: Parameter-efficient fine-tuning (PEFT) techniques make it possible to efficiently adapt a language model to create "expert" models that specialize in new tasks or domains. Recent techniques in model merging and compositional generalization leverage these expert models by dynamically composing modules to improve zero/few-shot generalization. Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or to serve multiple experts on a single GPU. To address these issues, we present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT-based models. ComPEFT employs sparsification and ternary quantization to reduce the size of the PEFT module without any additional retraining, while preserving or enhancing model performance. In extensive evaluation across T5, T0, and LLaMA-based models with 200M to 65B parameters, ComPEFT achieves compression ratios of 8x to 50x. In particular, we show that ComPEFT improves with scale: stronger models exhibit higher compressibility and better performance. For example, we show that ComPEFT applied to LLaMA outperforms QLoRA by 4.16% on MMLU with a storage size reduction of up to 26x. In addition, we show that the compressed experts produced by ComPEFT maintain few-shot compositional generalization capabilities, facilitate efficient communication and computation, and exhibit enhanced performance when merged.
Lastly, we analyze the different components of the method, compare ComPEFT with other PEFT methods, and test its efficacy for compressing the residual of full fine-tuning. Our code is available at https://github.com/prateeky2806/compeft.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 25 Pages, 6 Figures, 16 Tables
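ComPEFT's abstract names two steps: sparsification and ternary quantization of the task vector. The sketch below shows one plausible reading on plain Python lists:

- keep the top `density` fraction of entries by magnitude;
- replace the kept entries with their signs;
- store a single shared scale.

The function names, the `density` parameter, and the choice of scale (the mean magnitude of the kept entries) are assumptions for illustration, not the paper's exact recipe.

```python
def compress_task_vector(finetuned, base, density=0.1):
    """Sparsify and ternarize the task vector tau = finetuned - base.

    Returns (signs, scale): signs holds -1/0/+1 per parameter and scale is a
    single float shared by all kept entries (assumed: mean |tau| of kept entries).
    """
    tau = [f - b for f, b in zip(finetuned, base)]
    k = max(1, round(density * len(tau)))
    # Keep the k largest-magnitude entries; sorted() is stable, so ties keep index order.
    keep = sorted(range(len(tau)), key=lambda i: abs(tau[i]), reverse=True)[:k]
    scale = sum(abs(tau[i]) for i in keep) / k
    signs = [0] * len(tau)
    for i in keep:
        signs[i] = (tau[i] > 0) - (tau[i] < 0)  # sign as -1, 0 or +1
    return signs, scale


def decompress(base, signs, scale):
    """Reconstruct approximate fine-tuned weights from the compressed update."""
    return [b + scale * s for b, s in zip(base, signs)]


base = [0.0, 0.0, 0.0, 0.0, 0.0]
finetuned = [0.5, -2.0, 0.1, 3.0, -0.2]
signs, scale = compress_task_vector(finetuned, base, density=0.4)
print(signs, scale, decompress(base, signs, scale))
```

The compressed update needs only the sign of each kept entry plus one float, which is where the storage savings come from. The paper describes how ComPEFT actually encodes and tunes the scale.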

arXiv:2311.13168 [pdf, other]

3D Face Style Transfer with a Hybrid Solution of NeRF and Mesh Rasterization

Authors: Jianwei Feng, Prateek Singhal

Abstract: Style transfer for human faces has been widely researched in recent years. Most existing approaches work in the 2D image domain and suffer from 3D inconsistency when applied to different viewpoints of the same face. In this paper, we tackle the problem of 3D face style transfer, which aims to generate stylized novel views of a 3D human face with multi-view consistency. We propose to use a neural radiance field (NeRF) to represent the 3D human face and to combine it with 2D style transfer to stylize the 3D face. We find that directly training a NeRF on stylized images from 2D style transfer introduces 3D inconsistency and causes blurriness. On the other hand, training a NeRF jointly with 2D style transfer objectives shows poor convergence due to the identity and head-pose gap between the style image and the content image. It also poses challenges in training time and memory, because volume rendering of the full image is needed to apply style transfer loss functions. We therefore propose a hybrid framework of NeRF and mesh rasterization that combines the high-fidelity geometry reconstruction of NeRF with the fast rendering speed of meshes. Our framework consists of three stages: 1. training a NeRF model on input face images to learn the 3D geometry; 2. extracting a mesh from the trained NeRF model and optimizing it with style transfer objectives via differentiable rasterization; 3. training a new color network in NeRF, conditioned on a style embedding, to enable arbitrary style transfer to the 3D face.
Experimental results show that our approach generates high-quality face style transfer with strong 3D consistency, while also enabling flexible style control.

Submitted 22 November, 2023; originally announced November 2023.

Journal ref: WACV 2024

arXiv:2311.07449 [pdf, other]

Semantically Grounded QFormer for Efficient Vision Language Understanding

Authors: Moulik Choraria, Xinbo Wu, Sourya Basu, Nitesh Sekhar, Yue Wu, Xu Zhang, Prateek Singhal, Lav R. Varshney

Abstract: General-purpose Vision Language Models (VLMs) have received tremendous interest in recent years, owing to their ability to learn rich vision-language correlations as well as their broad zero-shot competencies. One immensely popular line of work uses frozen unimodal models and bridges vision representations to language with a trainable module called the QFormer. However, this method relies heavily on large-scale multimodal pretraining with huge computational overheads. To that end, we propose a more efficient framework for QFormer-based vision-language alignment. Our key idea rests on the observation that QFormer latents correspond more strongly to the frozen LLM's intermediate latent space. Consequently, instead of using QFormer latents as inputs to the LLM, we alter the framework to use the latents to directly condition the LLM latent space for image-to-text generation. We demonstrate the effectiveness of our approach against existing baselines in improving the efficiency of vision-language pretraining.

Submitted 16 December, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Preprint Under Review

arXiv:2311.03376 [pdf, other]

Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints

Authors: Soumyabrata Pal, Arun Sai Suggala, Karthikeyan Shanmugam, Prateek Jain

Abstract: We consider the problem of \emph{blocked} collaborative bandits, where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time, under the \emph{constraint} that no arm of a user is pulled more than $\mathsf{B}$ times. This problem was originally considered by \cite{Bresler:2014}, and designing regret-optimal algorithms for it has since remained an open problem. In this work, we propose an algorithm called \texttt{B-LATTICE} (Blocked Latent bAndiTs via maTrIx ComplEtion) that collaborates across users to maximize their cumulative rewards while satisfying the budget constraints. Theoretically, under certain reasonable assumptions on the latent structure, with $\mathsf{M}$ users, $\mathsf{N}$ arms, $\mathsf{T}$ rounds per user, and $\mathsf{C}=O(1)$ latent clusters, \texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(1 + \mathsf{N}\mathsf{M}^{-1})})$ under a budget constraint of $\mathsf{B}=\Theta(\log \mathsf{T})$. These are the first sub-linear regret bounds for this problem, and they match the minimax regret bounds when $\mathsf{B}=\mathsf{T}$. Empirically, we demonstrate that our algorithm outperforms baselines even when $\mathsf{B}=1$.
\texttt{B-LATTICE} runs in phases; in each phase, it clusters users into groups and collaborates across users within a group to quickly learn their reward models.

Submitted 31 October, 2023; originally announced November 2023.

Comments: 44 pages, To Appear in NeurIPS 2023

arXiv:2311.01279 [pdf, other]

doi 10.1145/3583740.3626819

ExPECA: An Experimental Platform for Trustworthy Edge Computing Applications

Authors: Samie Mostafavi, Vishnu Narayanan Moothedath, Stefan Rönngren, Neelabhro Roy, Gourav Prateek Sharma, Sangwon Seo, Manuel Olguín Muñoz, James Gross

Abstract: This paper presents ExPECA, an edge computing and wireless communication research testbed designed to tackle two pressing challenges: comprehensive end-to-end experimentation and high levels of experimental reproducibility. ExPECA leverages the OpenStack-based Chameleon Infrastructure (CHI) framework for its proven flexibility and ease of operation. It is located in a unique, isolated underground facility, providing a highly controlled setting for wireless experiments. The testbed is engineered to facilitate integrated studies of both communication and computation. It offers a diverse array of Software-Defined Radios (SDRs) and Commercial Off-The-Shelf (COTS) wireless and wired links, as well as containerized computational environments. We exemplify the experimental possibilities of the testbed using OpenRTiST, a latency-sensitive, bandwidth-intensive application, and analyze its performance. Lastly, we highlight an array of research domains and experimental setups that stand to gain from ExPECA's features, including closed-loop applications and time-sensitive networking.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.19332 [pdf, other]

Solar Flare Prediction and Feature Selection using Light Gradient Boosting Machine Algorithm

Authors: Vysakh P. A., Prateek Mayank

Abstract: Solar flares are among the most severe space weather phenomena and can generate radiation storms and radio disruptions on Earth. Accurately predicting solar flare events remains a significant challenge. It requires continuous monitoring and identification of specific features that can aid in forecasting this phenomenon, particularly for different classes of solar flares. In this study, we aim to forecast C- and M-class solar flares using a machine-learning algorithm, namely the Light Gradient Boosting Machine. We used a dataset spanning 9 years, obtained from the Space-weather Helioseismic and Magnetic Imager Active Region Patches (SHARP), with a temporal resolution of 1 hour. A total of 37 flare features were considered in our analysis, comprising 25 active region parameters and 12 flare history features. To address class imbalance in solar flare data, we employed the Synthetic Minority Oversampling Technique (SMOTE). We used two labeling approaches: a fixed 24-hour window label and a varying window that accounts for the changing nature of solar activity. The model was then trained and tested using forecast verification metrics, with an emphasis on the true skill statistic (TSS). Furthermore, we implemented a feature selection algorithm to determine which of the 37 features best distinguish flaring from non-flaring active regions.
We found that using a limited set of useful features improved prediction performance. For the 24-hour prediction window, we achieved a TSS of 0.63 (0.69) and an accuracy of 0.90 (0.97) for $\geq$C ($\geq$M) class solar flares.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted for publication in Solar Physics journal
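The solar-flare entry reports the true skill statistic (TSS). TSS is defined from a binary confusion matrix as the hit rate minus the false-alarm rate:

$\mathrm{TSS} = \frac{TP}{TP+FN} - \frac{FP}{FP+TN}$

A small sketch with hypothetical counts (not taken from the paper) shows why TSS is often preferred over accuracy for imbalanced flare data. Multiplying the number of non-flaring cases by ten, at the same false-alarm rate, leaves TSS unchanged but inflates accuracy.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """Hit rate (recall on flares) minus false-alarm rate on non-flares."""
    return tp / (tp + fn) - fp / (fp + tn)


def accuracy(tp, fn, fp, tn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical confusion-matrix counts for illustration only.
base = dict(tp=40, fn=10, fp=50, tn=900)
# Ten times more negatives with the same false-alarm rate.
imbalanced = dict(tp=40, fn=10, fp=500, tn=9000)

for name, counts in [("base", base), ("imbalanced", imbalanced)]:
    print(name, round(true_skill_statistic(**counts), 3), round(accuracy(**counts), 3))
```

Both rows report a TSS of 0.747, while accuracy rises from 0.94 to 0.947 purely because there are more easy negatives.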

arXiv:2310.18219 [pdf, other]

SWASTi-CME: A physics-based model to study CME evolution and its interaction with Solar Wind

Authors: Prateek Mayank, Bhargav Vaidya, Wageesh Mishra, D. Chakrabarty

Abstract: Coronal mass ejections (CMEs) are primary drivers of space weather, and studying their evolution in the inner heliosphere is vital for preparing a timely response. Solar wind streams act as the background and influence both CME propagation in the heliosphere and the associated geomagnetic storm activity. This study introduces SWASTi-CME, a newly developed MHD-based CME model integrated into the Space Weather Adaptive SimulaTion (SWASTi) framework. It incorporates a non-magnetized elliptic cone model and a magnetized flux rope CME model. To validate the model's performance against in-situ observations at L1, two Carrington rotations were chosen: one during solar maximum with multiple CMEs, and one during solar minimum with a single CME. The study also presents a quantitative analysis of CME-solar wind interaction using this model. To account for ambient solar wind effects, two scenarios with solar wind conditions of different complexity were established. The results indicate that ambient conditions can significantly impact some of the CME properties in the inner heliosphere. We found that the drag force on the CME front is variable, resulting in asymmetric deformation of the CME leading edge. Additionally, the study reveals that the distribution of CME internal pressure is affected primarily during the initial stage, whereas the CME density distribution is affected throughout its propagation.
Moreover, regardless of the ambient conditions, we observed that after a certain propagation time $t$, the CME volume follows a non-fractal power-law expansion, $V \propto t^{\alpha}$ with $\alpha$ between 3.03 and 3.33. This is attributed to the CME attaining a balanced state with the ambient medium.

Submitted 27 October, 2023; originally announced October 2023.

Comments: Accepted for publication in ApJS
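The SWASTi-CME entry reports that, after an initial phase, CME volume grows as a power law in time. A common way to estimate such an exponent from sampled data is a least-squares fit of $\log V$ against $\log t$. The sketch below does this on synthetic data with an arbitrarily chosen exponent of 3.2. It is an illustration, not the authors' analysis pipeline. In practice the fit should use only samples taken after the balanced state is reached.

```python
import math


def fit_power_law_exponent(times, volumes):
    """Return alpha from a least-squares fit of log V = alpha * log t + c."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in volumes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic samples with a known exponent (3.2, chosen arbitrarily).
times = [1.0, 2.0, 4.0, 8.0, 16.0]
volumes = [5.0 * t ** 3.2 for t in times]
print(round(fit_power_law_exponent(times, volumes), 6))
```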

arXiv:2310.16033 [pdf, other]

Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs

Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

Abstract: Multimodal Large Language Models (MLLMs) have recently achieved promising zero-shot accuracy on visual question answering (VQA), a fundamental task affecting various downstream applications and domains. Given the great potential for the broad use of these models, it is important to investigate their limitations in dealing with different image and question properties. In this work, we investigate whether MLLMs can perceive small details as well as large details in images. In particular, we show that their zero-shot accuracy in answering visual questions is very sensitive to the size of the visual subject of the question, declining by up to 46% as the subject's size decreases. Furthermore, we show that this effect is causal by observing that human visual cropping can significantly mitigate their sensitivity to size. Inspired by the usefulness of human cropping, we then propose five automatic visual cropping methods as inference-time mechanisms to improve the zero-shot performance of MLLMs. These methods leverage either external localization models or the decision process of the given MLLM itself. We study their effectiveness on four popular VQA datasets and a subset of the VQAv2 dataset tailored towards fine visual details. Our findings suggest that MLLMs should be used with caution in detail-sensitive VQA applications, and that visual cropping is a promising direction for improving their zero-shot performance. To facilitate further investigation of MLLMs' behaviors, our code and data are publicly released.

Submitted 12 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 20 pages, 12 figures, 7 tables

arXiv:2310.13076 [pdf, other]

PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses

Authors: Chong Xiang, Tong Wu, Sihui Dai, Jonathan Petit, Suman Jana, Prateek Mittal

Abstract: State-of-the-art defenses against adversarial patch attacks can now achieve strong certifiable robustness with a marginal drop in model utility. However, this impressive performance typically comes at the cost of 10-100x more inference-time computation than undefended models. The research community has thus witnessed an intense three-way trade-off between certifiable robustness, model utility, and computation efficiency. In this paper, we propose a defense framework named PatchCURE to address this trade-off. PatchCURE provides sufficient "knobs" for tuning defense performance and allows us to build a family of defenses. The most robust PatchCURE instance can match the performance of any existing state-of-the-art defense (without efficiency considerations), while the most efficient PatchCURE instance has inference efficiency similar to that of undefended models. Notably, PatchCURE achieves state-of-the-art robustness and utility across all efficiency levels. For example, when computation efficiency is required to be close to that of undefended models, it has 16-23% absolute advantages in clean accuracy and certified robust accuracy over prior defenses. The family of PatchCURE defenses enables us to flexibly choose appropriate defenses to satisfy given computation and/or utility constraints in practice.

Submitted 2 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: USENIX Security 2024. (extended) technical report

arXiv:2310.12916 [pdf, ps, other]

Plücker inequalities for weakly separated coordinates in totally nonnegative Grassmannian

Authors: Daniel Soskin, Prateek Kumar Vishwakarma

Abstract: We show that the partial sums of the long Plücker relations for pairs of weakly separated Plücker coordinates oscillate around $0$ on the totally nonnegative part of the Grassmannian. Our result generalizes the classical oscillating inequalities by Gantmacher--Krein (1941) and recent results on totally nonnegative matrix inequalities by Fallat--Vishwakarma (2023). In fact, we obtain a characterization of weak separability by showing that no other pair of Plücker coordinates satisfies this property. Weakly separated sets were initially introduced by Leclerc and Zelevinsky and are closely connected with the cluster algebra of the Grassmannian. Moreover, our work connects several fundamental objects, such as weak separability, Temperley--Lieb immanants, and Plücker relations. It also provides a very general and natural class of additive determinantal inequalities on the totally nonnegative part of the Grassmannian.

Submitted 24 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Updated the main theorem (and its proof) to a more general setting. Minor changes to the exposition. 21 pages, 20 figures

MSC Class: Primary 15A15; 15B48; 15A15; secondary 15A45; 20C08

arXiv:2310.10636 [pdf, other]

Dual-Encoders for Extreme Multi-Label Classification

Authors: Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit Dhillon

Abstract: Dual-encoder (DE) models are widely used in retrieval tasks. They are most commonly studied on open QA benchmarks, which are often characterized by multi-class labels and limited training data. In contrast, their performance in multi-label, data-rich retrieval settings like extreme multi-label classification (XMC) remains under-explored. Current empirical evidence indicates that DE models fall significantly short on XMC benchmarks. There, SOTA methods scale the number of learnable parameters linearly with the total number of classes (documents in the corpus) by employing a per-class classification head. To this end, we first study and highlight that existing multi-label contrastive training losses are not appropriate for training DE models on XMC tasks. We propose decoupled softmax loss, a simple modification to the InfoNCE loss that overcomes the limitations of existing contrastive losses. We further extend our loss design to a soft top-k operator-based loss tailored to optimize top-k prediction performance. When trained with our proposed loss functions, standard DE models alone can match or outperform SOTA methods by up to 2% at Precision@1, even on the largest XMC datasets, while being 20x smaller in terms of the number of trainable parameters. This leads to more parameter-efficient and universally applicable solutions for retrieval tasks. Our code and models are publicly available at https://github.com/nilesh2797/dexml.

Submitted 17 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 27 pages, 8 figures

Journal ref: ICLR 2024 camera-ready publication

arXiv:2310.10294 [pdf, other]

Key-phrase boosted unsupervised summary generation for FinTech organization

Authors: Aadit Deshpande, Shreya Goyal, Prateek Nagwanshi, Avinash Tripathy

Abstract: With recent advances in social media, the use of NLP techniques for social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such analysis of social media discourse, which provides an external perspective on consumer behavior. NLP applications such as intent detection, sentiment classification, and text summarization can help FinTech organizations use social media language data to find useful external insights. These insights can be further used in downstream NLP tasks. In particular, a summary that highlights the intents and sentiments of users can be very useful for these organizations to gain an external perspective. This perspective can help them better manage their products, offers, promotional campaigns, etc. However, challenges such as a lack of labeled domain-specific datasets impede further exploration of these tasks in the FinTech domain. To overcome these challenges, we design an unsupervised phrase-based summary generation method for social media data, using 'Action-Object' pairs (intent phrases). We compared the proposed method with other key-phrase-based summary generation methods in terms of how much contextual information from various Reddit discussion threads is retained in the different summaries. We introduce "Context Metrics", such as the number of unique words, Action-Object pairs, and noun chunks, to evaluate the contextual information retrieved from the source text in these phrase-based summaries.
We demonstrate that our methods significantly outperform the baseline on these metrics, thus providing a qualitative and quantitative measure of their efficacy. The proposed framework has been used as a web utility portal hosted within Amex.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 4 figures

arXiv:2310.08891 [pdf, other]

EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval

Authors: Ramnath Kumar, Anshul Mittal, Nilesh Gupta, Aditya Kusupati, Inderjit Dhillon, Prateek Jain

Abstract: Dense embedding-based retrieval is widely used for semantic search and ranking. However, conventional two-stage approaches, which involve contrastive embedding learning followed by approximate nearest neighbor search (ANNS), can suffer from misalignment between these stages. This mismatch degrades retrieval performance. We propose End-to-end Hierarchical Indexing (EHI), a novel method that directly addresses this issue by jointly optimizing embedding generation and the ANNS structure. EHI leverages a dual encoder for embedding queries and documents while simultaneously learning an inverted file index (IVF)-style tree structure. To facilitate effective learning of this discrete structure, EHI introduces dense path embeddings that encode the path traversed by queries and documents within the tree. Extensive evaluations on standard benchmarks, including MS MARCO (Dev set) and TREC DL19, demonstrate EHI's superiority over traditional ANNS indices. Under the same computational constraints, EHI outperforms existing state-of-the-art methods by +1.45% in MRR@10 on MS MARCO (Dev) and +8.2% in nDCG@10 on TREC DL19, highlighting the benefits of our end-to-end approach.

Submitted 13 October, 2024; v1 submitted 13 October, 2023; originally announced October 2023.
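The EHI entry reports results in MRR@10 and nDCG@10. For reference, here is a minimal sketch of both metrics on toy rankings with hypothetical document IDs.

- MRR@10 averages, over queries, the reciprocal rank of the first relevant document within the top 10 (zero if there is none).
- nDCG@10 divides the discounted cumulative gain of the ranking by that of an ideal ordering.

Linear gains are assumed here; some evaluations use $2^{rel}-1$ instead.

```python
import math


def mrr_at_k(rankings, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant document in each top-k list."""
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, doc in enumerate(ranking[:k], start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def ndcg_at_k(ranking, grades, k=10):
    """nDCG@k with linear gains; grades maps doc id -> graded relevance."""
    dcg = sum(grades.get(doc, 0) / math.log2(i + 1)
              for i, doc in enumerate(ranking[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


print(mrr_at_k([["d1", "d2", "d3"], ["d4", "d5"]], [{"d2"}, {"d9"}]))  # 0.25
print(ndcg_at_k(["d1", "d2"], {"d2": 1}), ndcg_at_k(["d2", "d1"], {"d2": 1}))
```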

arXiv:2310.07931 [pdf, other]

D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning

Authors: Adyasha Maharana, Prateek Yadav, Mohit Bansal

Abstract: Analytical theories suggest that higher-quality data can lead to lower test errors in models trained on a fixed data budget. Moreover, a model can be trained on a lower compute budget without compromising performance if a dataset can be stripped of its redundancies. Coreset selection (or data pruning) seeks to select a subset of the training data so as to maximize the performance of models trained on this subset, also referred to as the coreset. There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics. Optimizing for data diversity leads to a coreset that is biased towards easier samples, whereas selection by difficulty ranking omits easy samples that are necessary for the training of deep learning models. This demonstrates that data diversity and importance scores are two complementary factors that need to be jointly considered during coreset selection. We represent a dataset as an undirected graph and propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection. D2 Pruning updates the difficulty scores of each example by incorporating the difficulty of its neighboring examples in the dataset graph. Then, these updated difficulty scores direct a graph-based sampling method to select a coreset that encapsulates both diverse and difficult regions of the dataset space.
We evaluate supervised and self-supervised versions of our method on various vision and language datasets. Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates. Additionally, we find that using D2 Pruning for filtering large multimodal datasets leads to increased diversity in the dataset and improved generalization of pretrained models.
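The forward/reverse message-passing idea can be illustrated with a small, dependency-free sketch. Everything here is an assumption for illustration, not the authors' released code: the function names `knn` and `d2_style_prune`, the RBF weights `exp(-gamma * d)` on squared Euclidean distances, directed k-nearest-neighbour lists in place of a symmetric graph, and greedy top-score selection standing in for the paper's graph-based sampling.

```python
import math


def knn(points, k):
    """For each point, return [(neighbour_index, squared_distance), ...] for its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbours.append([(j, d) for d, j in dists[:k]])
    return neighbours


def d2_style_prune(points, difficulty, budget, k=2, gamma_f=1.0, gamma_r=1.0):
    """Hypothetical D2-style selection: forward message passing, then greedy picks
    with reverse message passing that down-weights the neighbours of each pick."""
    nbrs = knn(points, k)
    # Forward pass: each example absorbs the RBF-weighted difficulty of its neighbours.
    scores = [
        difficulty[i] + sum(math.exp(-gamma_f * d) * difficulty[j] for j, d in nbrs[i])
        for i in range(len(points))
    ]
    selected, remaining = [], set(range(len(points)))
    for _ in range(min(budget, len(points))):
        best = max(remaining, key=lambda i: scores[i])
        selected.append(best)
        remaining.discard(best)
        # Reverse pass: penalise neighbours of the chosen example to keep the coreset diverse.
        for j, d in nbrs[best]:
            scores[j] -= math.exp(-gamma_r * d) * scores[best]
    return selected


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
difficulty = [1.0, 0.9, 0.8, 0.7]
print(d2_style_prune(points, difficulty, budget=2, k=1))  # [0, 2]
```

On two tight clusters, the reverse pass suppresses the near-duplicate of the first pick, so the second pick comes from the other cluster; without it, the two hardest points of the first cluster would both be kept.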

Submitted 11 October, 2023; originally announced October 2023.

Comments: 17 pages (Our code is available at https://github.com/adymaharana/d2pruning)

arXiv:2310.07727

Deep Learning based Systems for Crater Detection: A Review

Authors: Atal Tewari, K Prateek, Amrita Singh, Nitin Khanna

Abstract: Craters are one of the most prominent features on planetary surfaces, used in applications such as age estimation, hazard detection, and spacecraft navigation. Crater detection is a challenging problem for several reasons, including complex crater characteristics such as varying sizes and shapes, data resolution, and planetary data types. As in other computer vision tasks, deep learning-based approaches have significantly impacted research on crater detection in recent years. This survey aims to assist researchers in this field by examining the development of deep learning-based crater detection algorithms (CDAs). The review includes over 140 research works covering diverse crater detection approaches, as well as planetary data, crater databases, and evaluation metrics. To be specific, we discuss the challenges in crater detection due to the complex properties of the craters and survey the DL-based CDAs by categorizing them into three parts: (a) semantic segmentation-based, (b) object detection-based, and (c) classification-based. Additionally, we have conducted training and testing of all the semantic segmentation-based CDAs on a common dataset to evaluate the effectiveness of each architecture for crater detection and its potential applications. Finally, we have provided recommendations for potential future work.

Submitted 28 September, 2023; originally announced October 2023.

arXiv:2310.07707

MatFormer: Nested Transformer for Elastic Inference

Authors: Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain

Abstract: Foundation models are applied in a broad spectrum of settings with different inference constraints, from massive multi-accelerator clusters to resource-constrained standalone mobile devices. However, the substantial costs associated with training these models often limit the number of unique model sizes that can be offered. Consequently, practitioners are compelled to select a model that may not be optimally aligned with their specific latency and cost requirements. We present MatFormer, a novel Transformer architecture designed to provide elastic inference across diverse deployment constraints. MatFormer achieves this by incorporating a nested Feed Forward Network (FFN) block structure within a standard Transformer model. During training, we optimize the parameters of multiple nested FFN blocks with varying sizes, enabling the extraction of hundreds of accurate smaller models without incurring additional computational costs. We empirically validate the efficacy of MatFormer across different model classes (decoders and encoders) and modalities (language and vision), demonstrating its potential for real-world deployment. We show that an 850M decoder-only MatFormer language model (MatLM) allows us to extract multiple smaller models spanning from 582M to 850M parameters, each exhibiting better validation loss and one-shot downstream evaluations than independently trained counterparts.
Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval. Finally, we showcase that speculative decoding with the accurate and consistent submodels extracted from MatFormer can lead to significant reduction in inference latency. Project website: https://devvrit.github.io/matformer/
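The nested FFN structure can be pictured as sub-models that use only a prefix of the hidden units of one shared FFN. The sketch below is a minimal pure-Python illustration under that assumption; the function name `nested_ffn`, the ReLU activation, and the toy weights are hypothetical, and the joint training over several granularities is not shown.

```python
def nested_ffn(x, w1, w2, m):
    """ReLU FFN that uses only the first m hidden units.

    w1 is d_model x d_ff and w2 is d_ff x d_model; every sub-model with m < d_ff
    reuses a prefix of the full model's weights, so nothing extra is stored.
    """
    d_model = len(x)
    hidden = [max(0.0, sum(x[i] * w1[i][h] for i in range(d_model))) for h in range(m)]
    return [sum(hidden[h] * w2[h][o] for h in range(m)) for o in range(len(w2[0]))]


w1 = [[1.0, 0.0, 1.0, -1.0],
      [0.0, 1.0, 1.0, 1.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
x = [1.0, 2.0]
for m in (2, 4):  # two granularities extracted from one set of weights
    print(m, nested_ffn(x, w1, w2, m))
# 2 [1.0, 2.0]
# 4 [6.0, 5.0]
```

Because the smaller block is literally a slice of the larger one, choosing a deployment size at inference time amounts to choosing `m`.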

Submitted 14 December, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 30 pages, 11 figures, first three authors contributed equally. NeurIPS, 2024

arXiv:2310.07514

Causal inference for disruption management in urban metro networks

Authors: Nan Zhang, Daniel Horcher, Prateek Bansal, Daniel J. Graham

Abstract: Urban metro systems can provide highly efficient and effective movements of vast passenger volumes in cities, but they are often affected by disruptions, causing delays, crowding, and ultimately a decline in passenger satisfaction and patronage. To manage and mitigate such adverse consequences, metro operators could benefit greatly from a quantitative understanding of the causal impact of disruptions. Such information would allow them to predict future delays, prepare effective recovery plans, and develop real-time information systems for passengers on trip re-routing options. In this paper, we develop a performance evaluation tool for metro operators that can quantify the causal effects of service disruptions on passenger flows, journey times, travel speeds and crowding densities. Our modelling framework is simple to implement, robust to statistical sources of bias, and can be used with high-frequency large-scale smart card data (over 4.85 million daily trips in our case) and train movement data. We recover disruption effects at the points of disruption (e.g. at disrupted stations) as well as spillover effects that propagate throughout the metro network. This allows us to deliver novel insights on the spatio-temporal propagation of delays in densely used urban public transport networks. We find robust empirical evidence that the causal impacts of disruptions adversely affect service quality throughout the network, in ways that would be hard to predict absent a causal model.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.03717

Beyond radial profiles: Using log-normal distributions to model the multiphase circumgalactic medium

Authors: Alankar Dutta, Mukesh Singh Bisht, Prateek Sharma, Ritali Ghosh, Manami Roy, Biman B. Nath

Abstract: Recent observations and simulations reveal that the circumgalactic medium (CGM) surrounding galaxies is multiphase, with gas temperatures at most radii spanning a wide range, from $\sim 10^4$ K to the virial temperature ($\sim 10^6$ K for the Milky Way). Traditional CGM models using simple density profiles are inadequate for reproducing observations that indicate a broad temperature range. Alternatively, a model based on probability distribution functions (PDFs) with parameters motivated by simulations can better match multi-wavelength observations. In this work, we use log-normal distributions, commonly seen in the simulations of the multiphase interstellar and circumgalactic media, to model the multiphase CGM. We generalize the isothermal background model of Faerman et al. (2017) to include more general CGM profiles. We extend the existing probabilistic models from 1D-PDFs in temperature to 2D-PDFs in density-temperature phase space and constrain their parameters using a Milky Way-like Illustris TNG50-1 halo. We generate various synthetic observables such as column densities of different ions, UV/X-ray spectra, and dispersion and emission measures. X-ray and radio (Fast Radio Burst) observations mainly constrain the hot gas properties. However, interpreting cold/warm phase diagnostics is not straightforward since these phases are patchy, with inherent variability in intercepting these clouds along arbitrary lines of sight.
We provide a tabulated comparison of model predictions with observations and plan to expand this into a comprehensive compilation of models and data. Our modeling provides a simple analytic framework that is useful for describing important aspects of the multiphase CGM.
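As a minimal illustration of why a log-normal PDF is convenient, the fraction of gas falling in any temperature band has a closed form through the error function. This is a one-component 1D sketch with hypothetical, unfitted parameters (`t_med`, `sigma`), not the paper's 2D density-temperature model.

```python
import math


def lognormal_fraction(t_lo, t_hi, t_med, sigma):
    """Fraction of gas with t_lo < T < t_hi, assuming ln T ~ Normal(ln t_med, sigma**2).

    Whether this is a volume or mass fraction depends on how the PDF is defined.
    """
    def cdf(t):
        return 0.5 * (1.0 + math.erf(math.log(t / t_med) / (sigma * math.sqrt(2.0))))

    return cdf(t_hi) - cdf(t_lo)


# Illustrative (not fitted) parameters: median 1e6 K, width 1 in ln T.
print(lognormal_fraction(1e6 / math.e, 1e6 * math.e, t_med=1e6, sigma=1.0))  # ~0.683, the 1-sigma band
print(lognormal_fraction(1e4, 1e5, t_med=1e6, sigma=1.0))
```

A multiphase model would sum several such components with different medians and weights; the single component here only shows the mechanics.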

Submitted 9 April, 2024; v1 submitted 26 September, 2023; originally announced October 2023.

Comments: 23 pages, 15 figures, 4 tables; submitted to MNRAS

arXiv:2310.03693

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Authors: Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson

Abstract: Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing: even if a model's initial safety alignment is impeccable, it is not necessarily maintained after custom fine-tuning.
We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.02166

Large Language Models Meet Knowledge Graphs to Answer Factoid Questions

Authors: Mikhail Salnikov, Hai Le, Prateek Rajput, Irina Nikishina, Pavel Braslavski, Valentin Malykh, Alexander Panchenko

Abstract: Recently, it has been shown that the incorporation of structured knowledge into Large Language Models significantly improves the results for a variety of NLP tasks. In this paper, we propose a method for exploring pre-trained Text-to-Text Language Models enriched with additional information from Knowledge Graphs for answering factoid questions. More specifically, we propose an algorithm for subgraph extraction from a Knowledge Graph based on question entities and answer candidates. Then, we procure easily interpreted information with Transformer-based models through the linearization of the extracted subgraphs. Final re-ranking of the answer candidates with the extracted information boosts Hits@1 scores of the pre-trained text-to-text language models by 4-6%.
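A minimal sketch of the two steps named above, subgraph extraction between a question entity and an answer candidate followed by linearization into text. The shortest-path extraction, the `head relation tail ; ...` text format, the function names, and the toy knowledge graph are all assumptions for illustration, not the authors' algorithm.

```python
from collections import deque


def shortest_path_triples(triples, source, target):
    """Breadth-first search over an undirected view of the KG; returns the
    (head, relation, tail) triples on one shortest source-target path, or []."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((t, (h, r, t)))
        adj.setdefault(t, []).append((h, (h, r, t)))
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt, triple in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, triple)
                queue.append(nxt)
    if target not in prev:
        return []
    path, node = [], target
    while prev[node] is not None:
        node, triple = prev[node]
        path.append(triple)
    return path[::-1]


def linearize(triples):
    """Flatten a subgraph into plain text a text-to-text model can read."""
    return " ; ".join(f"{h} {r} {t}" for h, r, t in triples)


kg = [
    ("Paris", "capital of", "France"),
    ("France", "member of", "EU"),
    ("Berlin", "capital of", "Germany"),
]
print(linearize(shortest_path_triples(kg, "Paris", "EU")))
# Paris capital of France ; France member of EU
```

In a re-ranking setup, one such string per answer candidate would be paired with the question and scored by the model; that scoring step is not shown.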

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2310.01334

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

Authors: Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen

Abstract: Sparsely activated Mixture-of-Experts (SMoE) has shown promise for scaling up the learning capacity of neural networks; however, it has issues such as (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse. Therefore, vanilla SMoE models are memory inefficient and non-scalable, especially for resource-constrained downstream scenarios. In this paper, we ask: Can we craft a compact SMoE model by consolidating expert information? What is the best recipe to merge multiple experts into fewer but more knowledgeable experts? Our pilot investigation reveals that conventional model merging methods are ineffective at such expert merging for SMoE. The potential reasons are: (1) redundant information overshadows critical experts; (2) the neuron permutation needed to bring all experts into alignment is missing. To address this, we propose M-SMoE, which leverages routing statistics to guide expert merging. Specifically, it starts with neuron permutation alignment for experts; then, dominant experts and their "group members" are formed; lastly, every expert group is merged into a single expert by utilizing each expert's activation frequency as its weight for merging, thus diminishing the impact of insignificant experts. Moreover, we observe that our proposed merging promotes a low dimensionality in the merged expert's weight space, naturally paving the way for additional compression.
Hence, our final method, MC-SMoE (i.e., Merge, then Compress SMoE), further decomposes the merged experts into low-rank and structural sparse alternatives. Extensive experiments across 8 benchmarks validate the effectiveness of MC-SMoE. For instance, our MC-SMoE achieves up to 80% memory and a 20% FLOPs reduction, with virtually no loss in performance.
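The frequency-weighted merging step can be sketched as a weighted average of each group's (flattened) expert weights. This assumes the neuron-permutation alignment and the grouping have already been done, and the name `merge_expert_group` and the toy numbers are hypothetical; the low-rank and sparse compression stage is not shown.

```python
def merge_expert_group(expert_weights, activation_counts):
    """Frequency-weighted average of a group of experts' (flattened) weights.

    Experts that the router rarely selects contribute little to the merged expert.
    Assumes the experts' neurons have already been permutation-aligned.
    """
    total = sum(activation_counts)
    if total <= 0:
        raise ValueError("at least one expert in the group must have been activated")
    n_params = len(expert_weights[0])
    return [
        sum(c * w[p] for w, c in zip(expert_weights, activation_counts)) / total
        for p in range(n_params)
    ]


# A dominant expert (routed 3 times) and a group member (routed once).
print(merge_expert_group([[1.0, 0.0], [3.0, 4.0]], [3, 1]))  # [1.5, 1.0]
```

Weighting by routing frequency, rather than averaging uniformly, keeps the merged expert close to the experts the router actually relies on.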

Submitted 14 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: This paper is accepted at ICLR 2024

arXiv:2309.14393

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models

Authors: Ahmad Faiz, Sotaro Kaneda, Ruhan Wang, Rita Osi, Prateek Sharma, Fan Chen, Lei Jiang

Abstract: The carbon footprint associated with large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes, including operational and embodied carbon emissions. An essential aspect is accurately estimating the carbon impact of emerging LLMs even before their training, which heavily relies on GPU usage. Existing studies have reported the carbon footprint of LLM training, but only one tool, mlco2, can predict the carbon footprint of new neural networks prior to physical training. However, mlco2 has several serious limitations. It cannot extend its estimation to dense or mixture-of-experts (MoE) LLMs, disregards critical architectural parameters, focuses solely on GPUs, and cannot model embodied carbon footprints. Addressing these gaps, we introduce LLMCarbon, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs. Compared to mlco2, LLMCarbon significantly enhances the accuracy of carbon footprint estimations for various LLMs. The source code is released at https://github.com/SotaroKaneda/MLCarbon.
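The split between operational and embodied emissions can be made concrete with a generic back-of-envelope accounting. This is not LLMCarbon's model, which also accounts for architectural parameters and hardware efficiency; the function names and every number below (GPU count, power draw, PUE, grid intensity, lifetime, per-GPU embodied footprint) are hypothetical.

```python
def operational_carbon_kg(n_gpus, hours, avg_power_watts, pue, grid_kg_per_kwh):
    """Operational CO2e: IT energy, scaled by data-centre PUE, times grid carbon intensity."""
    energy_kwh = n_gpus * hours * avg_power_watts / 1000.0
    return energy_kwh * pue * grid_kg_per_kwh


def embodied_carbon_kg(n_gpus, hours, lifetime_hours, embodied_kg_per_gpu):
    """Embodied CO2e: each GPU's manufacturing footprint amortised over its service lifetime."""
    return n_gpus * embodied_kg_per_gpu * hours / lifetime_hours


# Hypothetical job: 8 GPUs for 100 h at 400 W average draw.
op = operational_carbon_kg(8, 100, 400, pue=1.1, grid_kg_per_kwh=0.4)
emb = embodied_carbon_kg(8, 100, lifetime_hours=40_000, embodied_kg_per_gpu=150)
print(round(op, 1), round(emb, 1))  # 140.8 3.0
```

Amortising embodied emissions by usage time is one common convention; other allocation schemes are possible.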

Submitted 19 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: 15 pages, 8 figures

Journal ref: published in ICLR 2024

arXiv:2309.13159

doi 10.1016/j.trb.2025.103220

Nonparametric mixed logit model with market-level parameters estimated from market share data

Authors: Xiyuan Ren, Joseph Y. J. Chow, Prateek Bansal

Abstract: We propose a nonparametric mixed logit model that is estimated using market-level choice share data. The model treats each market as an agent and represents taste heterogeneity through market-specific parameters by solving a multiagent inverse utility maximization problem, addressing the limitations of existing market-level choice models with parametric estimation. A simulation study is conducted to evaluate the performance of our model in terms of estimation time, estimation accuracy, and out-of-sample predictive accuracy. In a real data application, we estimate the travel mode choice of 53.55 million trips made by 19.53 million residents in New York State. These trips are aggregated based on population segments and census block group-level origin-destination (OD) pairs, resulting in 120,740 markets. We benchmark our model against multinomial logit (MNL), nested logit (NL), inverse product differentiation logit (IPDL), and the BLP models. The results show that the proposed model improves the out-of-sample accuracy from 65.30% to 81.78%, with a computation time less than one-tenth of that taken to estimate the BLP model. The price elasticities and diversion ratios retrieved from our model and benchmark models exhibit similar substitution patterns. Moreover, the market-level parameters estimated by our model provide additional insights and facilitate their seamless integration into supply-side optimization models for transportation design.
By measuring the compensating variation for the driving mode, we found that a $9 congestion toll would impact roughly 60% of the total travelers. As an application of supply-demand integration, we showed that a 50% discount on transit fares could bring a maximum ridership increase of 9402 trips per day under a budget of $50,000 per day.
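For a single market with fixed parameters, the choice shares and the welfare effect of a toll follow standard multinomial logit formulas: shares are a softmax of utilities, and the change in expected consumer surplus is the change in the log-sum divided by the marginal utility of money. The sketch below uses that textbook MNL form, not the paper's nonparametric estimator; the two-mode market, the zero baseline utilities, and `cost_coef = -0.1` are hypothetical.

```python
import math


def choice_shares(utilities):
    """Multinomial logit shares: softmax of systematic utilities (max-shifted for stability)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]


def logsum(utilities):
    """Expected maximum utility (log-sum-exp) of a logit choice set."""
    m = max(utilities)
    return m + math.log(sum(math.exp(u - m) for u in utilities))


def consumer_surplus_change(v_before, v_after, cost_coef):
    """Per-traveller change in expected consumer surplus, in money units.
    cost_coef is the (negative) utility coefficient on cost, so -cost_coef is the
    marginal utility of money. Negative results mean travellers are worse off."""
    return (logsum(v_after) - logsum(v_before)) / (-cost_coef)


# Hypothetical market: [drive, transit] with equal utilities, then a $9 toll on driving.
cost_coef = -0.1
before = [0.0, 0.0]
after = [0.0 + cost_coef * 9, 0.0]
print(choice_shares(after))
print(round(consumer_surplus_change(before, after, cost_coef), 2))  # -3.52
```

The loss is smaller than the $9 toll itself because some travellers switch to transit; in a market-level model, the same calculation would be repeated per market with that market's own parameters.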

Submitted 19 April, 2025; v1 submitted 22 September, 2023; originally announced September 2023.

Journal ref: Transportation Research Part B 196 (2025) 103220

arXiv:2309.10023

doi 10.1007/JHEP07(2024)133

Searching for axion forces with spin precession in atoms and molecules

Authors: Prateek Agrawal, Nicholas R. Hutzler, David E. Kaplan, Surjeet Rajendran, Mario Reig

Abstract: We propose to use atoms and molecules as quantum sensors of axion-mediated monopole-dipole forces. We show that electron spin precession experiments using atomic and molecular beams are well-suited for axion searches thanks to the presence of co-magnetometer states and single-shot temporal resolution. Experimental strategies to detect axion gradients from localised sources and the Earth are presented, taking ACME III as a prototype example. Other possibilities, including atomic beams and laser-cooled atoms and molecules, are discussed.

Submitted 29 August, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 9 pages, 2 figures. Comments welcome. V2: matches published version. Appendix on axion co-magnetometry added

Journal ref: J. High Energy Phys. 2024, 133 (2024)

arXiv:2309.09212

RobotPerf: An Open-Source, Vendor-Agnostic, Benchmarking Suite for Evaluating Robotics Computing System Performance

Authors: Víctor Mayoral-Vilches, Jason Jabbour, Yu-Shun Hsiao, Zishen Wan, Martiño Crespo-Álvarez, Matthew Stewart, Juan Manuel Reina-Muñoz, Prateek Nagras, Gaurav Vikhe, Mohammad Bakhshalipour, Martin Pinzger, Stefan Rass, Smruti Panigrahi, Giulio Corradi, Niladri Roy, Phillip B. Gibbons, Sabrina M. Neuman, Brian Plancher, Vijay Janapa Reddi

Abstract: We introduce RobotPerf, a vendor-agnostic benchmarking suite designed to evaluate robotics computing performance across a diverse range of hardware platforms using ROS 2 as its common baseline. The suite encompasses ROS 2 packages covering the full robotics pipeline and integrates two distinct benchmarking approaches: black-box testing, which measures performance by eliminating upper layers and replacing them with a test application, and grey-box testing, an application-specific measure that observes internal system states with minimal interference. Our benchmarking framework provides ready-to-use tools and is easily adaptable for the assessment of custom ROS 2 computational graphs. Drawing from the knowledge of leading robot architects and system architecture experts, RobotPerf establishes a standardized approach to robotics benchmarking. As an open-source initiative, RobotPerf remains committed to evolving with community input to advance the future of hardware-accelerated robotics.
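In black-box style testing, a test application stamps messages on the way in and on the way out, and the benchmark reports latency statistics. The sketch below only shows that summarisation step from paired timestamps; it is not RobotPerf's tooling, and the function name `latency_summary`, the nearest-rank percentile choice, and the sample timestamps are hypothetical.

```python
import math


def latency_summary(sent_ns, received_ns):
    """Summarise end-to-end latencies from paired send/receive timestamps (nanoseconds)."""
    lat_ms = sorted((r - s) / 1e6 for s, r in zip(sent_ns, received_ns))
    if not lat_ms:
        raise ValueError("no samples")

    def percentile(p):
        # Nearest-rank percentile: smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p / 100 * len(lat_ms)))
        return lat_ms[rank - 1]

    return {
        "mean_ms": sum(lat_ms) / len(lat_ms),
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "max_ms": lat_ms[-1],
    }


sent = [0, 10_000_000, 20_000_000, 30_000_000]
received = [1_000_000, 12_000_000, 23_000_000, 34_000_000]
print(latency_summary(sent, received))
# {'mean_ms': 2.5, 'p50_ms': 2.0, 'p95_ms': 4.0, 'max_ms': 4.0}
```

Reporting tail percentiles alongside the mean matters for robotics pipelines, where occasional slow frames can be more harmful than a slightly higher average.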

Submitted 29 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.08751

Diverse Audio Embeddings -- Bringing Features Back Outperforms CLAP!

Authors: Prateek Verma

Abstract: With the advent of modern AI architectures, there has been a shift towards end-to-end architectures. This pivot has led to neural architectures being trained without domain-specific biases/knowledge, optimized according to the task. In this paper, we learn audio embeddings via diverse, in this case domain-specific, feature representations. For audio classification over hundreds of sound categories, we learn robust, separate embeddings for diverse audio properties such as pitch, timbre, and neural representation, as well as an embedding learned by an end-to-end architecture. We observe that handcrafted embeddings (e.g., pitch- and timbre-based), although unable on their own to beat a fully end-to-end representation, significantly improve performance when combined with the end-to-end embedding. This work could pave the way for bringing domain expertise into end-to-end models to learn robust, diverse representations that surpass the performance of purely end-to-end models.
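One simple way to combine handcrafted and end-to-end embeddings is to normalise each and concatenate them before a classification head. The abstract does not specify the fusion mechanism, so concatenation with L2 normalisation, the function names, and the toy vectors below are all assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so no single embedding dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(embeddings):
    """Concatenate per-property embeddings (e.g. pitch, timbre, end-to-end) after L2 normalisation."""
    fused = []
    for e in embeddings:
        fused.extend(l2_normalize(e))
    return fused


def linear_logits(features, weights, bias):
    """A linear classification head over the fused feature vector."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


pitch, timbre, end_to_end = [3.0, 4.0], [0.0, 2.0], [1.0, 0.0]
features = fuse([pitch, timbre, end_to_end])
print(features)  # [0.6, 0.8, 0.0, 1.0, 1.0, 0.0]
```

Normalising before concatenation is a design choice that keeps a high-magnitude embedding from swamping the others; learned per-branch scaling is a common alternative.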

Submitted 6 May, 2025; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 6 pages, 1 figure, 2 tables

arXiv:2309.07330

Automated Assessment of Critical View of Safety in Laparoscopic Cholecystectomy

Authors: Yunfan Li, Himanshu Gupta, Haibin Ling, IV Ramakrishnan, Prateek Prasanna, Georgios Georgakis, Aaron Sasson

Abstract: Cholecystectomy (gallbladder removal) is one of the most common procedures in the US, with more than 1.2M procedures annually. Compared with classical open cholecystectomy, laparoscopic cholecystectomy (LC) is associated with a significantly shorter recovery period, and hence is the preferred method. However, LC is also associated with an increase in bile duct injuries (BDIs), resulting in significant morbidity and mortality. The primary cause of BDIs from LCs is misidentification of the cystic duct with the bile duct. Critical view of safety (CVS) is the most effective safety protocol; it is said to be achieved during surgery if certain criteria are met. However, due to suboptimal understanding and implementation of CVS, BDI rates have remained stable over the last three decades. In this paper, we develop deep-learning techniques to automate the assessment of CVS in LCs. An innovative aspect of our research is the development of specialized learning techniques that incorporate domain knowledge to compensate for the limited training data available in practice. In particular, our CVS assessment process involves a fusion of two segmentation maps, followed by an estimation of a certain region of interest based on anatomical structures close to the gallbladder, and finally a determination of each of the three CVS criteria via rule-based assessment of structural information.
We achieved a gain of over 11.8% in mIoU on relevant classes with our two-stream semantic segmentation approach when compared to a single-model baseline, and 1.84% in mIoU with our proposed Sobel loss function when compared to a Transformer-based baseline model. For CVS criteria, we achieved up to 16% improvement and, for the overall CVS assessment, we achieved 5% improvement in balanced accuracy compared to DeepCVS under the same experimental settings.
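For reference, mean intersection-over-union (mIoU) averages per-class IoU between predicted and ground-truth label maps. The sketch below computes it over flattened label lists; skipping classes absent from both maps is one common convention, and the paper's exact class subset and averaging are not specified here.

```python
def mean_iou(pred, target, classes):
    """Mean intersection-over-union over `classes` for flattened label maps.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU 1/2; class 1: IoU 2/3; mean 7/12.
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1]), 4))  # 0.5833
```

Restricting `classes` to the anatomically relevant structures gives an "mIoU on relevant classes" style figure.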

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.06439

Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning

Authors: Saarthak Kapse, Srijan Das, Jingwei Zhang, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna

Abstract: We propose DiRL, a Diversity-inducing Representation Learning technique for histopathology imaging. Self-supervised learning (SSL) techniques, such as contrastive and non-contrastive approaches, have been shown to learn rich and effective representations of digitized tissue samples with limited pathologist supervision. Our analysis of vanilla SSL-pretrained models' attention distribution reveals an insightful observation: sparsity in attention, i.e., models tend to localize most of their attention to some prominent patterns in the image. Although attention sparsity can be beneficial in natural images, where these prominent patterns are the object of interest itself, it can be sub-optimal in digital pathology because, unlike natural images, digital pathology scans are not object-centric, but rather a complex phenotype of various spatially intermixed biological components. Inadequate diversification of attention in these complex images could result in crucial information loss. To address this, we leverage cell segmentation to densely extract multiple histopathology-specific representations, and then propose a prior-guided dense pretext task for SSL, designed to match the multiple corresponding representations between the views. Through this, the model learns to attend to various components more closely and evenly, thus inducing adequate diversification in attention for capturing context-rich representations.
Through quantitative and qualitative analysis on multiple tasks across cancer types, we demonstrate the efficacy of our method and observe that the attention is more globally distributed.
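One way to quantify the "sparse versus globally distributed" attention contrast is the normalised entropy of an attention distribution. The abstract does not say which diagnostic the authors use, so this metric, the function name, and the example weights are assumptions offered only to make the notion of attention sparsity concrete.

```python
import math


def normalized_attention_entropy(weights):
    """Entropy of an attention distribution divided by its maximum, log(n).

    1.0 means attention is spread evenly over all n tokens/patches;
    values near 0 mean it is concentrated on a few (sparse attention).
    Requires at least two tokens so that log(n) > 0.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


print(round(normalized_attention_entropy([0.25, 0.25, 0.25, 0.25]), 3))  # 1.0
print(round(normalized_attention_entropy([0.97, 0.01, 0.01, 0.01]), 3))
```

Averaging this score over heads and images would give a single number for comparing how evenly two pretrained models attend.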

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.06349

Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

Authors: Prateek Jaiswal, Debdeep Pati, Anirban Bhattacharya, Bani K. Mallick

Abstract: Thompson sampling (TS) is one of the earliest and most popular algorithms for solving stochastic multi-armed bandit problems. We consider a variant of TS, named $α$-TS, where we use a fractional or $α$-posterior ($α\in(0,1)$) instead of the standard posterior distribution. To compute an $α$-posterior, the likelihood in the definition of the standard posterior is tempered by raising it to the power $α$. For $α$-TS we obtain both instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*} Δ_k\left(\frac{\log(T)}{C(α)Δ_k^2} + \frac{1}{2} \right)\right)$ and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $Δ_k$ is the gap between the true mean rewards of the $k^{th}$ and the best arms, and $C(α)$ is a known constant. Both the sub-Gaussian and exponential family models satisfy our general conditions on the reward distribution. Our conditions on the prior distribution just require its density to be positive, continuous, and bounded. We also establish another instance-dependent regret upper bound that matches (up to constants) that of improved UCB [Auer and Ortner, 2010]. Our regret analysis carefully combines recent theoretical developments in the non-asymptotic concentration analysis and Bernstein-von Mises type results for the $α$-posterior distribution. Moreover, our analysis does not require additional structural properties such as closed-form posteriors or conjugate priors.
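For Bernoulli rewards with a Beta(a, b) prior, tempering the likelihood by $α$ gives the closed-form α-posterior Beta(a + α·successes, b + α·failures), which makes α-TS easy to sketch. The Beta-Bernoulli setting, the uniform Beta(1, 1) prior, the function names, and the simulated arms are assumptions chosen only because they keep the example short; the paper's analysis explicitly does not require conjugacy.

```python
import random


def alpha_posterior_params(a, b, successes, failures, alpha):
    """Beta prior x (Bernoulli likelihood)**alpha stays Beta with tempered counts."""
    return a + alpha * successes, b + alpha * failures


def alpha_thompson_sampling(true_means, horizon, alpha, seed=0):
    """Simulate alpha-TS on Bernoulli arms; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    succ, fail, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(horizon):
        # Sample a plausible mean for every arm from its alpha-posterior.
        samples = []
        for i in range(k):
            a, b = alpha_posterior_params(1.0, 1.0, succ[i], fail[i], alpha)
            samples.append(rng.betavariate(a, b))
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        succ[arm] += reward
        fail[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(alpha_thompson_sampling([0.2, 0.8], horizon=2000, alpha=0.5, seed=1))
```

With $α < 1$ the posterior is wider than the standard one, so the sampler explores somewhat more, but it still concentrates its pulls on the better arm.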

Submitted 12 September, 2023; originally announced September 2023.
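The tempering step in $α$-TS can be sketched for the simplest case: Bernoulli rewards with a Beta prior. This is an illustrative assumption only. The abstract stresses that the analysis does not need conjugate priors; conjugacy is used here just to keep the example short. With prior $\mathrm{Beta}(a_0, b_0)$ and likelihood raised to the power $α$, an arm with $s$ successes and $f$ failures has $α$-posterior $\mathrm{Beta}(a_0 + α s, b_0 + α f)$. The function name `alpha_ts_bernoulli`, its parameters and the example arm means are hypothetical.

```python
import numpy as np


def alpha_ts_bernoulli(means, horizon, alpha=0.5, prior=(1.0, 1.0), seed=0):
    """Hypothetical sketch: alpha-TS on a Bernoulli bandit; returns pulls per arm.

    Assumes a Beta(a0, b0) prior on each arm. Raising the Bernoulli likelihood
    to the power alpha turns the usual Beta update into
    Beta(a0 + alpha * successes, b0 + alpha * failures).
    alpha = 1 recovers standard Beta-Bernoulli Thompson sampling.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    a0, b0 = prior
    for _ in range(horizon):
        # Draw one sample from each arm's alpha-posterior.
        samples = rng.beta(a0 + alpha * successes, b0 + alpha * failures)
        arm = int(np.argmax(samples))
        # Simulated Bernoulli reward from the true mean (unknown to the agent).
        reward = float(rng.random() < means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls


pulls = alpha_ts_bernoulli([0.2, 0.8], horizon=2000, alpha=0.5)
print(pulls)  # most pulls should go to the better arm (index 1)
```

Smaller $α$ flattens each posterior, which encourages more exploration for the same data. This is one intuition for why the bounds above carry the constant $C(α)$.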

arXiv:2309.05000

Multiphase Neutral Interstellar Medium: Analyzing Simulation with H I 21cm Observational Data Analysis Techniques

Authors: Soumyadeep Bhattacharjee, Nirupam Roy, Prateek Sharma, Amit Seta, Christoph Federrath

Abstract: Several different methods are regularly used to infer the properties of the neutral interstellar medium (ISM) using atomic hydrogen (H I) 21cm absorption and emission spectra. In this work, we study various techniques used for inferring ISM gas phase properties, namely the correlation between brightness temperature and optical depth ($T_B(v)$, $τ(v)$) at each channel velocity ($v$), and decomposition into Gaussian components, by creating mock spectra from a 3D magnetohydrodynamic simulation of a two-phase, turbulent ISM. We propose a physically motivated model to explain the $T_B(v)-τ(v)$ distribution and relate the model parameters to properties like warm gas spin temperature and cold cloud length scales. Two methods based on Gaussian decomposition, one using only absorption spectra and one using both absorption and emission spectra, are used to infer the column density distribution as a function of temperature. In observations, such analysis reveals the puzzle of large amounts (significantly higher than in simulations) of gas with temperature in the thermally unstable range of $\sim$200 K to $\sim$2000 K and a lack of the expected bimodal (two-phase) temperature distribution. We show that, in simulation, both methods recover the actual gas distribution reasonably well up to temperatures $\lesssim 2500$ K (and the two-phase distribution in general). We find our results to be robust to a range of effects such as noise, varying emission beam size, and simulation resolution. This shows that the observational inferences are unlikely to be artifacts, thus highlighting a tension between observations and simulations. We discuss possible reasons for this tension and ways to resolve it.

Submitted 23 November, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: 22 pages (including appendixes), 16 figures, 3 tables, Accepted for publication in MNRAS
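The $T_B(v)-τ(v)$ relation and the column-density inference above rest on two textbook relations for a single isothermal component, ignoring background continuum and emission from other components:

- $T_B = T_s(1 - e^{-τ})$
- $N_{\rm HI} = 1.823\times10^{18}\, T_s \int τ\, dv$ cm$^{-2}$, with $v$ in km/s.

The minimal sketch below applies these relations to a synthetic Gaussian absorption component. It is not the physically motivated model proposed in the paper. The function names and the example values ($T_s = 80$ K, peak $τ = 0.5$, $σ_v = 2$ km/s) are hypothetical.

```python
import numpy as np

HI_CONST = 1.823e18  # cm^-2 per (K km/s), standard 21cm column-density constant


def spin_temperature(t_b, tau):
    """Invert T_B = T_s (1 - exp(-tau)) channel by channel (isothermal gas)."""
    return t_b / (1.0 - np.exp(-tau))


def column_density(t_s, tau, v):
    """N_HI = 1.823e18 * T_s * integral(tau dv); v in km/s, result in cm^-2."""
    # Trapezoidal integral of the optical-depth profile over velocity.
    integral = np.sum(0.5 * (tau[1:] + tau[:-1]) * np.diff(v))
    return HI_CONST * t_s * integral


# Hypothetical cold component: Gaussian optical-depth profile.
v = np.linspace(-20.0, 20.0, 2001)          # channel velocities, km/s
tau = 0.5 * np.exp(-0.5 * (v / 2.0) ** 2)   # peak tau 0.5, sigma_v 2 km/s
t_b = 80.0 * (1.0 - np.exp(-tau))           # brightness temperature, K

mask = tau > 1e-3                           # skip channels with negligible tau
print(spin_temperature(t_b[mask], tau[mask])[:3])
print(column_density(80.0, tau, v))
```

For an isothermal component, plotting $T_B$ against $1 - e^{-τ}$ gives a straight line of slope $T_s$. Real sightlines mix several components with different temperatures, so their spread about such a line carries the extra information that the proposed model exploits.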

arXiv:2309.03934

The Monodromic Axion-Photon Coupling

Authors: Prateek Agrawal, Arthur Platschorre

Abstract: We consider the general form of the axion coupling to photons in the axion-Maxwell theory. On general grounds, this coupling takes the form of a monodromic function of the axion, which we call $g(a)$, multiplying the Chern-Pontryagin density $F \widetilde{F}$ of the photon. We show that the non-linearity of $g(a)$ is a spurion for the shift symmetry of the axion. In this context, when $g(a) \neq \mathbb{Z}a$, the linearized coupling of the axion $g'(a)$ is not quantized and there is a correlated mass term for the axion. Singularities in $g(a)$ due to the fast rearrangement of degrees of freedom are shown to have corresponding cusps and singularities in the axion potential. We derive the general form of $g(a)$ for the QCD axion, axions with perturbatively broken shift symmetries, and axions descending from extra dimensions. In all cases, we show that there is a uniform general form of the monodromic function $g(a)$ and that it is connected to the axion potential.

Submitted 10 October, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: 20 pages, 1 figure; v2: typos corrected, references added

arXiv:2309.00748

PathLDM: Text conditioned Latent Diffusion Model for Histopathology

Authors: Srikar Yellapragada, Alexandros Graikos, Prateek Prasanna, Tahsin Kurc, Joel Saltz, Dimitris Samaras

Abstract: To achieve high-quality results, diffusion models must be trained on large datasets. This requirement can be prohibitive for models in specialized domains, such as computational pathology. Conditioning on labeled data is known to help in data-efficient model training. Therefore, histopathology reports, which are rich in valuable clinical information, are an ideal choice as guidance for a histopathology generative model. In this paper, we introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Leveraging the rich contextual information provided by pathology text reports, our approach fuses image and textual data to enhance the generation process. By utilizing GPT's capabilities to distill and summarize complex text reports, we establish an effective conditioning mechanism. Through strategic conditioning and necessary architectural enhancements, we achieved a state-of-the-art (SoTA) FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor (FID 30.1; lower is better).

Submitted 30 November, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: WACV 2024 publication
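FID, the metric quoted in this abstract, is the Fréchet distance between Gaussians fitted to feature embeddings of real and generated images:

$$\lVert μ_r - μ_g \rVert^2 + \mathrm{Tr}\left(Σ_r + Σ_g - 2(Σ_r Σ_g)^{1/2}\right)$$

The minimal NumPy sketch below assumes the features have already been extracted. Standard FID uses Inception-v3 pooling features, and that step is omitted here. The function names are hypothetical. Instead of a matrix square root, the trace term is computed from eigenvalues. This is valid because $Σ_r Σ_g$ is similar to a positive semidefinite matrix, so its eigenvalues are real and non-negative.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^{1/2}) = sum of square roots of the eigenvalues of
    # sigma1 @ sigma2; clip tiny negative/imaginary parts from round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fid_from_features(feats_real, feats_gen):
    """FID from (n_samples, n_features) arrays of precomputed embeddings."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    # np.cov squeezes single-feature input to 0-d; keep covariances 2-D.
    sig_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sig_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))
print(fid_from_features(real, fake))
```

Because FID compares only first and second moments of the embeddings, it measures distributional similarity rather than per-image fidelity. Comparing numbers such as 7.64 and 30.1 is meaningful only when both use the same feature extractor and sample sizes.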

arXiv:2308.15709

Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

Authors: Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal

Abstract: Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, despite its importance, data valuation faces significant yet frequently overlooked privacy challenges. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods in current use. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate a DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves performance comparable to KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.

Submitted 25 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: NeurIPS 2023 Spotlight
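The abstract does not spell out the threshold variant, so the sketch below shows only the baseline it refines. This is the closed-form recursion that gives exact Shapley values for an unweighted KNN classifier. The utility of a subset is the fraction of its $k$ nearest neighbours (to one test point) that share the test label. Sorting training points by distance, nearest first, gives $s_N = \mathbb{1}[y_N = y_{\rm test}]/N$ and $s_i = s_{i+1} + \frac{\mathbb{1}[y_i = y_{\rm test}] - \mathbb{1}[y_{i+1} = y_{\rm test}]}{k}\cdot\frac{\min(k, i)}{i}$. The function name `knn_shapley` and the Euclidean distance are illustrative assumptions.

```python
import numpy as np


def knn_shapley(x_train, y_train, x_test, y_test, k):
    """Hypothetical sketch: exact Shapley values of training points for one
    test point, under an unweighted KNN classifier whose utility is the
    fraction of the k nearest neighbours sharing the test label."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    n = len(y_train)
    # Sort training points by Euclidean distance to the test point.
    dists = np.linalg.norm(x_train - np.asarray(x_test, dtype=float), axis=1)
    order = np.argsort(dists, kind="stable")
    match = (y_train[order] == y_test).astype(float)
    s = np.zeros(n)
    # Base case: the farthest point.
    s[n - 1] = match[n - 1] / n
    # Walk toward the nearest point; rank is the 1-based sorted position.
    for i in range(n - 2, -1, -1):
        rank = i + 1
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank
    # Map values back to the original training order.
    values = np.zeros(n)
    values[order] = s
    return values


x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1, 0, 1, 1])
print(knn_shapley(x, y, np.array([0.1]), 1, k=2))
```

The cost is dominated by the sort, so each test point costs $O(N \log N)$ rather than the exponential cost of enumerating coalitions. Values are typically averaged over a validation set. The values sum to the utility of the full training set (the efficiency property), which is a convenient sanity check.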

Showing 151–200 of 906 results for author: Prateek