-
Wavelet GPT: Wavelet Inspired Large Language Models
Authors:
Prateek Verma
Abstract:
Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. We live in a world where most of the data around us, e.g., text, audio, and music, has a multi-scale structure. This paper infuses LLMs with a traditional signal processing idea, namely wavelets, during pre-training to take advantage of the structure. Without adding any extra parameters to a GPT-style LLM architecture in an academic setup, we achieve the same pre-training performance almost twice as fast in text, audio, and images. This is done by imposing a structure on intermediate embeddings. When trained for the same number of training steps, we achieve significant gains in performance, which is comparable to pre-training a larger neural architecture. Further, we show this extends to the Long Range Arena benchmark and several input representations such as characters, BPE tokens, bytes, waveform, math expression, and image pixels. Our architecture allows every next token prediction access to intermediate embeddings at different temporal resolutions in every decoder block. We hope this will pave the way for incorporating multi-rate signal processing into pre-training.
Submitted 9 February, 2025; v1 submitted 3 September, 2024;
originally announced September 2024.
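The abstract above describes giving each next-token prediction access to intermediate embeddings at different temporal resolutions without adding parameters. A minimal sketch of one way to picture this, assuming a causal moving average whose window length depends on the embedding dimension: the function name `causal_multiscale`, the window sizes, and the toy input are all hypothetical, and this is not claimed to be the paper's exact construction.

```python
def causal_multiscale(embeddings, kernel_sizes):
    """Smooth each embedding dimension along time with its own causal window.

    embeddings: list of T rows, each a list of D floats (time x dimension).
    kernel_sizes: list of D ints >= 1 (hypothetical choice); dimension d is
    averaged over the current position and up to kernel_sizes[d] - 1 past ones.
    """
    num_steps = len(embeddings)
    num_dims = len(kernel_sizes)
    out = [[0.0] * num_dims for _ in range(num_steps)]
    for d, k in enumerate(kernel_sizes):
        for t in range(num_steps):
            # Only positions <= t are used, so no future token leaks in.
            start = max(0, t - k + 1)
            window = [embeddings[i][d] for i in range(start, t + 1)]
            out[t][d] = sum(window) / len(window)
    return out


# Toy input: 4 time steps, 2 embedding dimensions.
x = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
# Dimension 0 keeps full resolution (window 1); dimension 1 is coarser (window 2).
y = causal_multiscale(x, kernel_sizes=[1, 2])
print(y)
```

The averaging has no learnable weights, which matches the "no extra parameters" framing; how window sizes are assigned across dimensions is purely an assumption here.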
-
Adaptive Large Language Models By Layerwise Attention Shortcuts
Authors:
Prateek Verma,
Mert Pilanci
Abstract:
Transformer architectures are the backbone of the modern AI revolution. However, they are based on simply stacking the same blocks in dozens of layers and processing information sequentially from one block to another. In this paper, we propose to challenge this and introduce adaptive computations for LLM-like setups, which allow the final layer to attend to all of the intermediate layers as it deems fit through the attention mechanism, thereby introducing computational attention shortcuts. These shortcuts can thus make the architecture depth and context adaptive. We showcase four different datasets, namely acoustic tokens, natural language, and symbolic music, and we achieve superior performance for a GPT-like architecture. We give evidence via attention maps that the models learn complex dependencies across layers that are adaptive in context and depth depending on the input tokens.
Submitted 16 September, 2024;
originally announced September 2024.
-
Comparing One- and Two-way Quantum Repeater Architectures
Authors:
Prateek Mantri,
Kenneth Goodenough,
Don Towsley
Abstract:
Quantum repeaters are an essential building block for realizing long-distance quantum communications. However, due to the fragile nature of quantum information, these repeaters suffer from loss and operational errors. Prior works have classified repeaters into three broad categories based on their use of probabilistic or near-deterministic methods to mitigate these errors. Besides differences in classical communication times, these approaches also vary in technological complexity, with near-deterministic methods requiring more advanced technology. Recent increases in the number of available memories, and introduction of entanglement generation through multiplexing motivate a re-comparison of one-way and two-way repeater architectures. In this work, we propose a novel protocol that optimizes multiplexed elementary link generation and distillation in memory-unconstrained 'connection-oriented' two-way repeaters to boost the entanglement generation rates. We introduce a recursive formulation to derive the probability distribution of the number of Bell pairs in multiplexed two-way repeater architectures, compatible with probabilistic $n$-to-$k$ distillation protocols. We then compare the performance of this new protocol with one-way schemes in the parameter regime where one-way schemes have previously been shown to be advantageous, and find that the multiplexed two-way protocol provides better performance with lower resource and technology requirements.
Submitted 9 September, 2024;
originally announced September 2024.
-
Empirical evidence of Large Language Model's influence on human spoken communication
Authors:
Hiromu Yakura,
Ezequiel Lopez-Lopez,
Levin Brinkmann,
Ignacio Serna,
Prateek Gupta,
Iyad Rahwan
Abstract:
Artificial Intelligence (AI) agents now interact with billions of humans in natural language, thanks to advances in Large Language Models (LLMs) like ChatGPT. This raises the question of whether AI has the potential to shape a fundamental aspect of human culture: the way we speak. Recent analyses revealed that scientific publications already exhibit evidence of AI-specific language. But this evidence is inconclusive, since scientists may simply be using AI to copy-edit their writing. To explore whether AI has influenced human spoken communication, we transcribed and analyzed about 280,000 English-language videos of presentations, talks, and speeches from more than 20,000 YouTube channels of academic institutions. We find a significant shift in the trend of word usage specific to words distinctively associated with ChatGPT following its release. These findings provide the first empirical evidence that humans increasingly imitate LLMs in their spoken language. Our results raise societal and policy-relevant concerns about the potential of AI to unintentionally reduce linguistic diversity, or to be deliberately misused for mass manipulation. They also highlight the need for further investigation into the feedback loops between machine behavior and human culture.
Submitted 3 September, 2024;
originally announced September 2024.
-
Multi-objective Bayesian optimization for Likelihood-Free inference in sequential sampling models of decision making
Authors:
David Chen,
Xinwei Li,
Eui-Jin Kim,
Prateek Bansal,
David Nott
Abstract:
Statistical models are often defined by a generative process for simulating synthetic data, but this can lead to intractable likelihoods. Likelihood-free inference (LFI) methods enable Bayesian inference to be performed in this case. Extending a popular approach to simulation-efficient LFI for single-source data, we propose Multi-objective Bayesian Optimization for Likelihood Free Inference (MOBOLFI) to perform LFI using multi-source data. MOBOLFI models a multi-dimensional discrepancy between observed and simulated data, using a separate discrepancy for each data source. The use of a multivariate discrepancy allows for approximations to individual data source likelihoods in addition to the joint likelihood, enabling detection of conflicting information and deeper understanding of the importance of different data sources in estimating individual parameters. The adaptive choice of simulation parameters using multi-objective Bayesian optimization ensures simulation-efficient approximation of likelihood components for all data sources. We illustrate our approach in sequential sampling models (SSMs), which are widely used in psychology and consumer-behavior modeling. SSMs are often fitted using multi-source data, such as choice and response time. The advantages of our approach are illustrated in comparison with a single discrepancy for an SSM fitted to data assessing preferences of ride-hailing drivers in Singapore to rent electric vehicles.
Submitted 4 June, 2025; v1 submitted 3 September, 2024;
originally announced September 2024.
-
Histo-Diffusion: A Diffusion Super-Resolution Method for Digital Pathology with Comprehensive Quality Assessment
Authors:
Xuan Xu,
Saarthak Kapse,
Prateek Prasanna
Abstract:
Digital pathology has advanced significantly over the last decade, with Whole Slide Images (WSIs) encompassing vast amounts of data essential for accurate disease diagnosis. High-resolution WSIs are essential for precise diagnosis, but technical limitations in scanning equipment and variability in slide preparation can hinder obtaining these images. Super-resolution techniques can enhance low-resolution images; while Generative Adversarial Networks (GANs) have been effective in natural image super-resolution tasks, they often struggle with histopathology due to overfitting and mode collapse. Traditional evaluation metrics fall short in assessing the complex characteristics of histopathology images, necessitating robust histology-specific evaluation methods.
We introduce Histo-Diffusion, a novel diffusion-based method specially designed for generating and evaluating super-resolution images in digital pathology. It includes a restoration module for histopathology prior and a controllable diffusion module for generating high-quality images. We have curated two histopathology datasets and proposed a comprehensive evaluation strategy which incorporates both full-reference and no-reference metrics to thoroughly assess the quality of digital pathology images.
Comparative analyses on multiple datasets with state-of-the-art methods reveal that Histo-Diffusion outperforms GANs. Our method offers a versatile solution for histopathology image super-resolution, capable of handling multi-resolution generation from varied input sizes, providing valuable support in diagnostic processes.
Submitted 27 August, 2024;
originally announced August 2024.
-
Mixing of active scalars due to random shock waves in two dimensions
Authors:
Joaquim P. Jossy,
Prateek Gupta
Abstract:
In this work, we investigate the mixing of active scalars in two dimensions by the stirring action of stochastically generated shock waves. We use direct numerical simulations (DNS) of the interaction of shock waves with two non-reacting species to analyse the mixing dynamics for different Atwood numbers (At). Unlike passive scalars, the presence of density gradients in active scalars makes the species diffusion nonlinear, introducing a concentration gradient-driven term and a density gradient-driven nonlinear dissipation term in the concentration evolution equation. We show that the direction of the concentration gradient causes the interface across which molecular diffusion occurs to expand outward or inward, even without any stirring action. Shock waves enhance the mixing process by increasing the perimeter of the interface and by sustaining concentration gradients. Negative Atwood number mixtures sustain concentration gradients for a longer time than positive Atwood number mixtures due to the so-called nonlinear dissipation terms. We estimate the time up to which the action of stirring is dominant over molecular mixing. We also highlight the role of baroclinicity in increasing the interface perimeter in the stirring-dominant regime. We compare the stirring effect of shock waves on mixing of passive scalars with active scalars and show that the vorticity generated by baroclinicity is responsible for the folding and stretching of the interface in the case of active scalars. We conclude by showing that lighter mixtures with denser inhomogeneities (At < 0) take longer to homogenise than denser mixtures with lighter inhomogeneities (At > 0).
Submitted 14 August, 2024;
originally announced August 2024.
-
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
Authors:
Prateek Yadav,
Colin Raffel,
Mohammed Muqeeth,
Lucas Caccia,
Haokun Liu,
Tianlong Chen,
Mohit Bansal,
Leshem Choshen,
Alessandro Sordoni
Abstract:
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved performance or generalization. A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application. The promise, effectiveness, and large design space of MoErging have spurred the development of many new methods over the past few years. This rapid pace of development has made it challenging to compare different MoErging methods, which are rarely compared to one another and are often validated in different experimental setups. To remedy such gaps, we present a comprehensive survey of MoErging methods that includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method. Apart from surveying MoErging research, we inventory software tools and applications that make use of MoErging. We additionally discuss related fields of study such as model merging, multitask learning, and mixture-of-experts models. Taken as a whole, our survey provides a unified overview of existing MoErging methods and creates a solid foundation for future work in this burgeoning field.
Submitted 13 August, 2024;
originally announced August 2024.
-
FaasMeter: Energy-First Serverless Computing
Authors:
Abdul Rehman,
Alexander Fuerst,
Prateek Sharma
Abstract:
Functions as a Service has emerged as a popular abstraction for a wide range of cloud applications and an important cloud workload. We present the design and implementation of FaasMeter, a FaaS control plane which provides energy monitoring, accounting, control, and pricing as first-class operations. The highly diverse and dynamic workloads of FaaS create additional complexity to measuring and controlling energy usage which FaasMeter can mitigate.
We develop a new statistical energy disaggregation approach to provide accurate and complete energy footprints for functions, despite using noisy and coarse-grained system-level power readings (not just CPU power). Our accurate and robust footprints are achieved by combining conventional power models with Kalman filters and Shapley values. FaasMeter is a full-spectrum energy profiler, and fairly attributes energy of shared resources to functions (such as energy used by the control plane itself). We develop new energy profiling validation metrics, and show that FaasMeter's energy footprints are accurate to within 1% of carefully obtained marginal energy ground truth measurements.
Submitted 12 August, 2024;
originally announced August 2024.
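The FaasMeter abstract above mentions Shapley values for fairly attributing the energy of shared resources to functions. A minimal sketch of exact Shapley attribution, assuming a made-up energy model: the function names `f` and `g`, the 10 J shared overhead, and the per-function costs are hypothetical illustration values, and the paper's power models and Kalman filtering are not represented here.

```python
from itertools import permutations


def shapley(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which players can join the coalition."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def energy(coalition):
    """Hypothetical energy model in joules: a 10 J shared overhead whenever
    any function runs, plus a per-function cost."""
    per_function = {"f": 4.0, "g": 6.0}
    if not coalition:
        return 0.0
    return 10.0 + sum(per_function[p] for p in coalition)


shares = shapley(["f", "g"], energy)
print(shares)  # the shared overhead is split evenly between f and g
```

The attributed shares always sum to the energy of the full set of functions, which is the property that makes this style of attribution "complete". Enumerating all orderings is factorial in the number of players, so this exact form only suits very small examples.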
-
ConfusedPilot: Confused Deputy Risks in RAG-based LLMs
Authors:
Ayush RoyChowdhury,
Mulong Luo,
Prateek Sahu,
Sarbartha Banerjee,
Mohit Tiwari
Abstract:
Retrieval augmented generation (RAG) is a process where a large language model (LLM) retrieves useful information from a database and then generates the responses. It is becoming popular in enterprise settings for daily business operations. For example, Copilot for Microsoft 365 has accumulated millions of businesses. However, the security implications of adopting such RAG-based systems are unclear.
In this paper, we introduce ConfusedPilot, a class of security vulnerabilities of RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. First, we investigate a vulnerability that embeds malicious text in the modified prompt in RAG, corrupting the responses generated by the LLM. Second, we demonstrate a vulnerability that leaks secret data, which leverages the caching mechanism during retrieval. Third, we investigate how both vulnerabilities can be exploited to propagate misinformation within the enterprise and ultimately impact its operations, such as sales and manufacturing. We also discuss the root cause of these attacks by investigating the architecture of a RAG-based system. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.
Submitted 23 October, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
Authors:
Marco AF Pimentel,
Clément Christophe,
Tathagata Raha,
Prateek Munjal,
Praveen K Kanithi,
Shadab Khan
Abstract:
As large language models (LLMs) continue to evolve, the need for robust and standardized evaluation benchmarks becomes paramount. Evaluating the performance of these models is a complex challenge that requires careful consideration of various linguistic tasks, model architectures, and benchmarking methodologies. In recent years, various frameworks have emerged as noteworthy contributions to the field, offering comprehensive evaluation tests and benchmarks for assessing the capabilities of LLMs across diverse domains. This paper provides an exploration and critical analysis of some of these evaluation methodologies, shedding light on their strengths, limitations, and impact on advancing the state-of-the-art in natural language processing.
Submitted 28 July, 2024;
originally announced July 2024.
-
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Authors:
Gagan Jain,
Nidhi Hegde,
Aditya Kusupati,
Arsha Nagrani,
Shyamal Buch,
Prateek Jain,
Anurag Arnab,
Sujoy Paul
Abstract:
The visual medium (images and videos) naturally contains a large amount of information redundancy, thereby providing a great opportunity for leveraging efficiency in processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate scalability while maintaining the same inference-time costs, but they come with a larger parameter footprint. We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve. Given a compute budget, MoNE learns to dynamically choose tokens in a priority order, and thus redundant tokens are processed through cheaper nested experts. Using this framework, we achieve performance equivalent to the baseline models, while reducing inference-time compute by over two-fold. We validate our approach on standard image and video datasets: ImageNet-21K, Kinetics400, and Something-Something-v2. We further highlight MoNE's adaptability by showcasing its ability to maintain strong performance across different inference-time compute budgets on videos, using only a single trained model.
Submitted 30 July, 2024; v1 submitted 29 July, 2024;
originally announced July 2024.
-
LookupViT: Compressing visual information to a limited number of tokens
Authors:
Rajat Koner,
Gagan Jain,
Prateek Jain,
Volker Tresp,
Sujoy Paul
Abstract:
Vision Transformers (ViT) have emerged as the de-facto choice for numerous industry-grade vision solutions. But their inference cost can be prohibitive for many settings, as they compute self-attention in each layer, which suffers from quadratic computational complexity in the number of tokens. On the other hand, spatial information in images and spatio-temporal information in videos is usually sparse and redundant. In this work, we introduce LookupViT, which aims to exploit this information sparsity to reduce ViT inference cost. LookupViT provides a novel general-purpose vision transformer block that operates by compressing information from higher-resolution tokens to a fixed number of tokens. These few compressed tokens undergo meticulous processing, while the higher-resolution tokens are passed through computationally cheaper layers. Information sharing between these two token sets is enabled through a bidirectional cross-attention mechanism. The approach offers multiple advantages: (a) it is easy to implement on standard ML accelerators (GPUs/TPUs) via standard high-level operators, (b) it is applicable to standard ViT and its variants, and thus generalizes to various tasks, and (c) it can handle different tokenization and attention approaches. LookupViT also offers flexibility for the compressed tokens, enabling performance-computation trade-offs in a single trained model. We show LookupViT's effectiveness on multiple domains: (a) image classification (ImageNet-1K and ImageNet-21K), (b) video classification (Kinetics400 and Something-Something V2), and (c) image captioning (COCO-Captions) with a frozen encoder. LookupViT provides a $2\times$ reduction in FLOPs while upholding or improving accuracy across these domains. In addition, LookupViT demonstrates out-of-the-box robustness and generalization on image classification (ImageNet-C,R,A,O), improving by up to 4% over ViT.
Submitted 17 July, 2024;
originally announced July 2024.
-
Parametric Modeling and Estimation of Photon Registrations for 3D Imaging
Authors:
Weijian Zhang,
Hashan K. Weerasooriya,
Prateek Chennuri,
Stanley H. Chan
Abstract:
In single-photon light detection and ranging (SP-LiDAR) systems, the histogram distortion due to hardware dead time fundamentally limits the precision of depth estimation. To compensate for the dead time effects, the photon registration distribution is typically modeled based on the Markov chain self-excitation process. However, this is a discrete process and it is computationally expensive, thus hindering potential neural network applications and fast simulations. In this paper, we overcome the modeling challenge by proposing a continuous parametric model. We introduce a Gaussian-uniform mixture model (GUMM) and periodic padding to address high noise floors and noise slopes respectively. By deriving and implementing a customized expectation maximization (EM) algorithm, we achieve accurate histogram matching in scenarios that were deemed difficult in the literature.
Submitted 2 July, 2024;
originally announced July 2024.
-
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Authors:
Ashwinee Panda,
Berivan Isik,
Xiangyu Qi,
Sanmi Koyejo,
Tsachy Weissman,
Prateek Mittal
Abstract:
Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights -- causing destructive interference between tasks. The resulting effects, such as catastrophic forgetting of earlier tasks, make it challenging to obtain good performance on multiple tasks at the same time. To mitigate this, we propose Lottery Ticket Adaptation (LoTA), a sparse adaptation method that identifies and optimizes only a sparse subnetwork of the model. We evaluate LoTA on a wide range of challenging tasks such as instruction following, reasoning, math, and summarization. LoTA obtains better performance than full fine-tuning and low-rank adaptation (LoRA), and maintains good performance even after training on other tasks -- thus, avoiding catastrophic forgetting. By extracting and fine-tuning over lottery tickets (or sparse task vectors), LoTA also enables model merging over highly dissimilar tasks. Our code is made publicly available at https://github.com/kiddyboots216/lottery-ticket-adaptation.
Submitted 25 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Authors:
Terry Yue Zhuo,
Minh Chien Vu,
Jenny Chim,
Han Hu,
Wenhao Yu,
Ratnadira Widyasari,
Imam Nur Bani Yusuf,
Haolan Zhan,
Junda He,
Indraneil Paul,
Simon Brunner,
Chen Gong,
Thong Hoang,
Armel Randy Zebaze,
Xiaoheng Hong,
Wen-Ding Li,
Jean Kaddour,
Ming Xu,
Zhihan Zhang,
Prateek Yadav,
Naman Jain,
Alex Gu,
Zhoujun Cheng,
Jiawei Liu,
Qian Liu
, et al. (8 additional authors not shown)
Abstract:
Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks or standalone function calls. Solving challenging and practical tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical tasks via programs, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. To evaluate LLMs rigorously, each task encompasses 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.
Submitted 1 April, 2025; v1 submitted 22 June, 2024;
originally announced June 2024.
-
Resilience of the Electric Grid through Trustable IoT-Coordinated Assets (Extended version)
Authors:
Vineet J. Nair,
Venkatesh Venkataramanan,
Priyank Srivastava,
Partha S. Sarker,
Anurag Srivastava,
Laurentiu D. Marinovici,
Jun Zha,
Christopher Irwin,
Prateek Mittal,
John Williams,
Jayant Kumar,
H. Vincent Poor,
Anuradha M. Annaswamy
Abstract:
The electricity grid has evolved from a physical system to a cyber-physical system with digital devices that perform measurement, control, communication, computation, and actuation. The increased penetration of distributed energy resources (DERs) including renewable generation, flexible loads, and storage provides extraordinary opportunities for improvements in efficiency and sustainability. However, they can introduce new vulnerabilities in the form of cyberattacks, which can cause significant challenges in ensuring grid resilience. We propose a framework in this paper for achieving grid resilience through suitably coordinated assets including a network of Internet of Things (IoT) devices. A local electricity market is proposed to identify trustable assets and carry out this coordination. Situational Awareness (SA) of locally available DERs with the ability to inject power or reduce consumption is enabled by the market, together with a monitoring procedure for their trustability and commitment. With this SA, we show that a variety of cyberattacks can be mitigated using local trustable resources without stressing the bulk grid. Multiple demonstrations are carried out using a high-fidelity co-simulation platform, real-time hardware-in-the-loop validation, and a utility-friendly simulator.
Submitted 30 January, 2025; v1 submitted 21 June, 2024;
originally announced June 2024.
-
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Authors:
Tinghao Xie,
Xiangyu Qi,
Yi Zeng,
Yangsibo Huang,
Udari Madhushani Sehwag,
Kaixuan Huang,
Luxi He,
Boyi Wei,
Dacheng Li,
Ying Sheng,
Ruoxi Jia,
Bo Li,
Kai Li,
Danqi Chen,
Peter Henderson,
Prateek Mittal
Abstract:
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and over-represent some fine-grained topics. For example, among the ten existing datasets that we evaluated, tests for refusals of self-harm instructions are over 3x less represented than tests for fraudulent activities. SORRY-Bench improves on this by using a fine-grained taxonomy of 44 potentially unsafe topics, and 440 class-balanced unsafe instructions, compiled through human-in-the-loop methods. Second, linguistic characteristics and formatting of prompts, such as different languages and dialects, are often overlooked and only implicitly considered in many evaluations. We supplement SORRY-Bench with 20 diverse linguistic augmentations to systematically examine these effects. Third, existing evaluations rely on large LLMs (e.g., GPT-4) for evaluation, which can be computationally expensive. We investigate design choices for creating a fast, accurate automated safety evaluator. By collecting 7K+ human annotations and conducting a meta-evaluation of diverse LLM-as-a-judge designs, we show that fine-tuned 7B LLMs can achieve accuracy comparable to GPT-4 scale LLMs, with lower computational cost. Putting these together, we evaluate over 50 proprietary and open-weight LLMs on SORRY-Bench, analyzing their distinctive safety refusal behaviors. We hope our effort provides a building block for systematic evaluations of LLMs' safety refusal capabilities, in a balanced, granular, and efficient manner. Benchmark demo, data, code, and models are available through https://sorry-bench.github.io.
Submitted 1 March, 2025; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Data Shapley in One Training Run
Authors:
Jiachen T. Wang,
Prateek Mittal,
Dawn Song,
Ruoxi Jia
Abstract:
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, meaning they cannot perform targeted attribution towards a specific model obtained from a single run of the algorithm. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest. In its most efficient implementation, our technique incurs negligible additional runtime compared to standard model training. This dramatic efficiency improvement makes it possible to perform data attribution for the foundation model pretraining stage for the first time. We present several case studies that offer fresh insights into pretraining data's contribution and discuss their implications for copyright in generative AI and pretraining data curation.
Submitted 7 June, 2025; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Towards Signal Processing In Large Language Models
Authors:
Prateek Verma,
Mert Pilanci
Abstract:
This paper introduces the idea of applying signal processing inside a Large Language Model (LLM). With the recent explosion of generative AI, our work can help bridge two fields together, namely the field of signal processing and large language models. We draw parallels between classical Fourier-Transforms and Fourier Transform-like learnable time-frequency representations for every intermediate activation signal of an LLM. Once we decompose every activation signal across tokens into a time-frequency representation, we learn how to filter and reconstruct them, with all components learned from scratch, to predict the next token given the previous context. We show that for GPT-like architectures, our work achieves faster convergence and significantly increases performance by adding a minuscule number of extra parameters when trained for the same epochs. We hope this work paves the way for algorithms exploring signal processing inside the signals found in neural architectures like LLMs and beyond.
Submitted 10 June, 2024;
originally announced June 2024.
-
Quantum Hardware-Enabled Molecular Dynamics via Transfer Learning
Authors:
Abid Khan,
Prateek Vaish,
Yaoqi Pang,
Nikhil Kowshik,
Michael S. Chen,
Clay H. Batton,
Grant M. Rotskoff,
J. Wayne Mullinax,
Bryan K. Clark,
Brenda M. Rubenstein,
Norm M. Tubman
Abstract:
The ability to perform ab initio molecular dynamics simulations using potential energies calculated on quantum computers would allow virtually exact dynamics for chemical and biochemical systems, with substantial impacts on the fields of catalysis and biophysics. However, noisy hardware, the costs of computing gradients, and the number of qubits required to simulate large systems present major challenges to realizing the potential of dynamical simulations using quantum hardware. Here, we demonstrate that some of these issues can be mitigated by recent advances in machine learning. By combining transfer learning with techniques for building machine-learned potential energy surfaces, we propose a new path forward for molecular dynamics simulations on quantum hardware. We use transfer learning to reduce the number of energy evaluations that use quantum hardware by first training models on larger, less accurate classical datasets and then refining them on smaller, more accurate quantum datasets. We demonstrate this approach by training machine learning models to predict a molecule's potential energy using Behler-Parrinello neural networks. When successfully trained, the model enables energy gradient predictions necessary for dynamics simulations that cannot be readily obtained directly from quantum hardware. To reduce the quantum resources needed, the model is initially trained with data derived from low-cost techniques, such as Density Functional Theory, and subsequently refined with a smaller dataset obtained from the optimization of the Unitary Coupled Cluster ansatz. We show that this approach significantly reduces the size of the quantum training dataset while capturing the high accuracies needed for quantum chemistry simulations.
Submitted 12 June, 2024;
originally announced June 2024.
-
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
Authors:
Xiangyu Qi,
Ashwinee Panda,
Kaifeng Lyu,
Xiao Ma,
Subhrajit Roy,
Ahmad Beirami,
Prateek Mittal,
Peter Henderson
Abstract:
The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output tokens. We refer to this issue as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and provide evidence that current aligned LLMs are subject to this issue. We also show how these findings help explain multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. Importantly, we discuss how this consolidated notion of shallow safety alignment sheds light on promising research directions for mitigating these vulnerabilities. For instance, we show that deepening the safety alignment beyond just the first few tokens can often meaningfully improve robustness against some common exploits. Finally, we design a regularized finetuning objective that makes the safety alignment more persistent against fine-tuning attacks by constraining updates on initial tokens. Overall, we advocate that future safety alignment should be made more than just a few tokens deep.
Submitted 9 June, 2024;
originally announced June 2024.
-
Breaking into the window of primordial black hole dark matter with x-ray microlensing
Authors:
Manish Tamta,
Nirmal Raj,
Prateek Sharma
Abstract:
Primordial black holes (PBHs) in the mass range $10^{-16}-10^{-11}~M_\odot$ may constitute all the dark matter. We show that gravitational microlensing of bright x-ray pulsars provides the most robust and immediately implementable opportunity to uncover PBH dark matter in this mass window. As proofs of concept, we show that the currently operational NICER telescope can probe this window near $10^{-14}~M_\odot$ with just two months of exposure on the x-ray pulsar SMC-X1, and that the forthcoming STROBE-X telescope can probe complementary regions in only a few weeks. These times are comparable to the week-long exposures obtained by NICER on various individual sources. We take into account the effects of wave optics and the finite extent of the source, which become important for the mass range of our PBHs. We also provide a spectral diagnostic to distinguish microlensing from transient background events and to broadly mark the PBH mass if true microlensing events are observed. In light of the powerful science case, i.e., the imminent discovery of dark matter searchable over multiple decades of PBH masses with achievable exposures, we strongly urge the commission of a dedicated large broadband telescope for x-ray microlensing. We derive the microlensing reach of such a telescope by assuming sensitivities of detector components of proposed missions, and find that with hard x-ray pulsar sources PBH masses down to a few $10^{-17}~M_\odot$ can be probed.
Submitted 28 February, 2025; v1 submitted 30 May, 2024;
originally announced May 2024.
-
AI Risk Management Should Incorporate Both Safety and Security
Authors:
Xiangyu Qi,
Yangsibo Huang,
Yi Zeng,
Edoardo Debenedetti,
Jonas Geiping,
Luxi He,
Kaixuan Huang,
Udari Madhushani,
Vikash Sehwag,
Weijia Shi,
Boyi Wei,
Tinghao Xie,
Danqi Chen,
Pin-Yu Chen,
Jeffrey Ding,
Ruoxi Jia,
Jiaqi Ma,
Arvind Narayanan,
Weijie J Su,
Mengdi Wang,
Chaowei Xiao,
Bo Li,
Dawn Song,
Peter Henderson,
Prateek Mittal
Abstract:
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise the most effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.
Submitted 29 May, 2024;
originally announced May 2024.
-
Certifiably Robust RAG against Retrieval Corruption
Authors:
Chong Xiang,
Tong Wu,
Zexuan Zhong,
David Wagner,
Danqi Chen,
Prateek Mittal
Abstract:
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we get LLM responses from each passage in isolation and then securely aggregate these isolated responses. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG can achieve certifiable robustness: we can formally prove and certify that, for certain queries, RobustRAG can always return accurate responses, even when the attacker has full knowledge of our defense and can arbitrarily inject a small number of malicious passages. We evaluate RobustRAG on open-domain QA and long-form text generation datasets and demonstrate its effectiveness and generalizability across various tasks and datasets.
Submitted 24 May, 2024;
originally announced May 2024.
-
Rapid Sensing of Heat Stress using Machine Learning of Micrographs of Red Blood Cells Dispersed in Liquid Crystals
Authors:
Prateek Verma,
Elizabeth Adeogun,
Elizabeth S. Greene,
Sami Dridi,
Ukash Nakarmi,
Karthik Nayani
Abstract:
An imbalance between bodily heat production and heat dissipation leads to heat stress in organisms. In addition to diminished animal well-being, heat stress is detrimental to the poultry industry as poultry entails fast growth and high yield, resulting in greater metabolic activity and higher body heat production. When stressed, cells overexpress heat shock proteins (such as HSP70, a well-established intracellular stress indicator) and may undergo changes in their mechanical properties. Liquid crystals (LCs, fluids with orientational order) have been recently employed to rapidly characterize changes in mechanical properties of cells, enabling a means of optically reporting the presence of disease in organisms. In this work, we explore the difference in the expression of HSP70 to a change in the LC response pattern via the use of convolutional neural networks (CNNs). The machine-learning (ML) models were trained on hundreds of such LC-response micrographs of chicken red blood cells with and without heat stress. Trained models exhibited remarkable accuracy of up to 99% on detecting the presence of heat stress in unseen microscope samples. We also show that crosslinking the chicken and human RBCs using glutaraldehyde in order to simulate a diseased cell was an efficient strategy for planning, building, training, and evaluating ML models. Overall, our efforts build towards the rapid detection of disease in organisms, which is accompanied by a distinct change in the mechanical properties of cells. We envision CNN-enabled LC-sensors that can rapidly report the presence of disease in scenarios where human judgment could be prohibitively difficult or slow.
Submitted 23 May, 2024;
originally announced May 2024.
-
Generative Active Learning for the Search of Small-molecule Protein Binders
Authors:
Maksym Korablyov,
Cheng-Hao Liu,
Moksh Jain,
Almer M. van der Sloot,
Eric Jolicoeur,
Edward Ruediger,
Andrei Cristian Nica,
Emmanuel Bengio,
Kostiantyn Lapchevskyi,
Daniel St-Cyr,
Doris Alexandra Schuetz,
Victor Ion Butoi,
Jarrid Rector-Brooks,
Simon Blackburn,
Leo Feng,
Hadi Nekoei,
SaiKrishna Gottipati,
Priyesh Vijayan,
Prateek Gupta,
Ladislav Rampášek,
Sasikanth Avancha,
Pierre-Luc Bacon,
William L. Hamilton,
Brooks Paige,
Sanchit Misra
, et al. (9 additional authors not shown)
Abstract:
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
Submitted 2 May, 2024;
originally announced May 2024.
-
Position: Towards Resilience Against Adversarial Examples
Authors:
Sihui Dai,
Chong Xiang,
Tong Wu,
Prateek Mittal
Abstract:
Current research on defending against adversarial examples focuses primarily on achieving robustness against a single attack type such as $\ell_2$ or $\ell_{\infty}$-bounded attacks. However, the space of possible perturbations is much larger than considered by many existing defenses and is difficult to mathematically model, so the attacker can easily bypass the defense by using a type of attack that is not covered by the defense. In this position paper, we argue that in addition to robustness, we should also aim to develop defense algorithms that are adversarially resilient -- defense algorithms should specify a means to quickly adapt the defended model to be robust against new attacks. We provide a definition of adversarial resilience and outline considerations of designing an adversarially resilient defense. We then introduce a subproblem of adversarial resilience which we call continual adaptive robustness, in which the defender gains knowledge of the formulation of possible perturbation spaces over time and can then update their model based on this information. Additionally, we demonstrate the connection between continual adaptive robustness and previously studied problems of multiattack robustness and unforeseen attack robustness and outline open directions within these fields which can contribute to improving continual adaptive robustness and adversarial resilience.
Submitted 8 October, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis
Authors:
Prateek Verma,
Minh-Hao Van,
Xintao Wu
Abstract:
Vision language models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data. VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning. Additionally, a universal segmentation model by Meta AI, Segment Anything Model (SAM), shows unprecedented performance at isolating objects from unforeseen images. Since medical experts, biologists, and materials scientists routinely examine microscopy or medical images in conjunction with textual information in the form of captions, literature, or reports, and draw conclusions of great importance and merit, it is indubitably essential to test the performance of VLMs and foundation models such as SAM on these images. In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images. We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable at isolating artefacts in a general sense. However, the performance is not close to that of a domain expert: the models are readily encumbered by the introduction of impurities, defects, artefact overlaps and diversity present in the images.
Submitted 1 May, 2024;
originally announced May 2024.
-
Using sunRunner3D to interpret the global structure of the heliosphere from in situ measurements
Authors:
José Juan González-Avilés,
Pete Riley,
Michal Ben-Nun,
Prateek Mayank,
Bhargav Vaidya
Abstract:
Understanding the large-scale three-dimensional structure of the inner heliosphere, while important in its own right, is crucial for space weather applications, such as forecasting the time of arrival and propagation of coronal mass ejections (CMEs). This study uses sunRunner3D (SR3D), a 3-D magnetohydrodynamic (MHD) model, to simulate solar wind (SW) streams and generate background states. SR3D employs the boundary conditions generated by CORona-HELiosphere (CORHEL) and the PLUTO code to compute the plasma properties of the SW with the MHD approximation up to 1.1 AU in the inner heliosphere. We demonstrate that SR3D reproduces global features of Corotating Interaction Regions (CIRs) observed by Earth-based spacecraft (OMNI) and the Solar TErrestrial RElations Observatory (STEREO)-A for a set of Carrington rotations (CRs) that cover a period that lies in the late declining phase of solar cycle 24. Additionally, we demonstrate that the model solutions are valid in the corotating and inertial frames of reference. Moreover, a comparison between SR3D simulations and in-situ measurements shows reasonable agreement with the observations, and our results are comparable to those achieved by Predictive Science Inc.'s Magnetohydrodynamic Algorithm outside a Sphere (MAS) code. We have also undertaken a comparative analysis with the Space Weather Adaptive Simulation Framework for Solar Wind (SWASTi-SW), a PLUTO physics-based model, to evaluate the precision of various initial boundary conditions. Finally, we discuss the disparities in the solutions derived from inertial and rotating frames.
Submitted 30 April, 2024;
originally announced May 2024.
-
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Authors:
Clément Christophe,
Praveen K Kanithi,
Prateek Munjal,
Tathagata Raha,
Nasir Hayat,
Ronnie Rajan,
Ahmed Al-Mahrooqi,
Avani Gupta,
Muhammad Umar Salman,
Gurpreet Gosal,
Bhargav Kanakiya,
Charles Chen,
Natalia Vassilieva,
Boulbaba Ben Amor,
Marco AF Pimentel,
Shadab Khan
Abstract:
This study presents a comprehensive analysis and comparison of two predominant fine-tuning methodologies - full-parameter fine-tuning and parameter-efficient tuning - within the context of medical Large Language Models (LLMs). We developed and refined a series of LLMs, based on the Llama-2 architecture, specifically designed to enhance medical knowledge retrieval, reasoning, and question-answering capabilities. Our experiments systematically evaluate the effectiveness of these tuning strategies across various well-known medical benchmarks. Notably, our medical LLM Med42 showed an accuracy level of 72% on the US Medical Licensing Examination (USMLE) datasets, setting a new standard in performance for openly available medical LLMs. Through this comparative analysis, we aim to identify the most effective and efficient method for fine-tuning LLMs in the medical domain, thereby contributing significantly to the advancement of AI-driven healthcare applications.
Submitted 23 April, 2024;
originally announced April 2024.
-
Chemical interactions in active droplets
Authors:
Prateek Dwivedi,
Sobiya Ashraf,
Pawan Kumar,
Dipin Pillai,
Rahul Mangal
Abstract:
Interactions among biologically active agents are facilitated by their self-generated chemical and hydrodynamic fields. In order to elucidate the pair-wise interactions between such micro-organisms, we employ active droplets as a model system, capable of self-generating chemical and hydrodynamic fields. We demonstrate that the solute Péclet number ($Pe$), characterizing the relative strength of its convective to diffusive transport, plays a crucial role in determining how the chemical and hydrodynamic fields impact their interactions. Our findings reveal that at low $Pe$, the interaction is predominantly governed by chemo-repulsive effects, leading to droplets avoiding physical contact. Conversely, at elevated $Pe$, hydrodynamic interactions become more influential, leading to physical engagement. However, irrespective of $Pe$, the interaction of a droplet with the chemical trail of another droplet is always governed by chemo-repulsive effects. Furthermore, our results establish that the chemo-repulsive deflection/rebounding of droplets is influenced by the droplets' inherent chemical polarity, as determined by their $Pe$, independent of their approach orientation. Our findings offer a methodology for tuning the outcomes of binary interactions among chemically active droplets, laying the groundwork for potential studies on their collective dynamics.
Submitted 8 February, 2025; v1 submitted 21 April, 2024;
originally announced April 2024.
-
Pair statistics of oblate spheroids settling in a turbulent flow
Authors:
Prateek Anand,
Samriddhi Sankar Ray
Abstract:
We perform direct numerical simulations of sub-Kolmogorov, inertial spheroids settling under gravity in homogeneous, isotropic turbulence and find that small-scale clustering, measured via the correlation dimension, depends sensitively on their aspect ratios. In particular, such particles are shown to cluster more as their anisotropy increases. Further, the approach rate for pairs of spheroids is calculated and found to deviate significantly from the spherical-particle limit. Our study, spanning a range of Stokes numbers and aspect ratios, provides critical inputs for developing collision models to understand the dynamics of sedimenting, anisotropic particles in general and ice crystals in clouds in particular.
Submitted 9 April, 2025; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Ram pressure stripping in clusters: Gravity can bind the ISM but not the CGM
Authors:
Ritali Ghosh,
Alankar Dutta,
Prateek Sharma
Abstract:
We explore the survival of a galaxy's circumgalactic medium (CGM) as it experiences ram pressure stripping (RPS) moving through the intracluster medium (ICM). For a satellite galaxy, the CGM is often assumed to be entirely stripped/evaporated, an assumption that may not always be justified. We carry out 3D-hydrodynamic simulations of the interstellar and circumgalactic media (ISM+CGM) of a galaxy like JO201 moving through the ICM. The CGM can survive long at cluster outskirts ($\gtrsim2 \rm \ Gyr$) but at smaller cluster-centric distances, 90\% of the CGM mass is lost within $\sim 500$ Myr. The gravitational restoring force on the CGM is mostly negligible and the CGM-ICM interaction is analogous to \textit{`cloud-wind interaction'}. The CGM stripping timescale does not depend on the ram pressure but on the CGM to ICM density contrast $χ$. Two distinct regimes emerge for CGM stripping: the $χ>1$ regime, which is the well-known \textit{`cloud crushing'} problem, and the $χ<1$ regime, which we refer to as the (relatively unexplored) \textit{`bubble drag'} problem. The first pericentric passage near the cluster core can rapidly -- over a crossing time $t_{\rm drag} \sim R/v_{\rm rel}$ -- strip the CGM in the \textit{bubble drag} regime. The ISM stripping criterion unlike the CGM criterion, still depends on the ram pressure $ρ_{\rm ICM} v_{\rm rel}^2$. The stripped tails of satellites contain contributions from both the disk and the CGM. The X-ray plume in M89 in the Virgo cluster and a lack of it in the nearby M90 might be attributed to their orbital histories. M90 has likely undergone stripping in the bubble drag regime due to a pericentric passage close to the cluster center.
Submitted 11 August, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
Authors:
Taishi Nakamura,
Mayank Mishra,
Simone Tedeschi,
Yekun Chai,
Jason T Stillerman,
Felix Friedrich,
Prateek Yadav,
Tanmay Laud,
Vu Minh Chien,
Terry Yue Zhuo,
Diganta Misra,
Ben Bogin,
Xuan-Son Vu,
Marzena Karpinska,
Arnav Varma Dantuluri,
Wojciech Kusa,
Tommaso Furlanello,
Rio Yokota,
Niklas Muennighoff,
Suhas Pai,
Tosin Adewumi,
Veronika Laippala,
Xiaozhe Yao,
Adalberto Junior,
Alpay Ariyak
, et al. (20 additional authors not shown)
Abstract:
Pretrained language models are an integral part of AI applications, but their high computational cost for training limits accessibility. Initiatives such as Bloom and StarCoder aim to democratize access to pretrained models for collaborative community development. Despite these efforts, such models encounter challenges such as limited multilingual capabilities, risks of catastrophic forgetting during continual pretraining, and the high costs of training models from scratch, alongside the need to align with AI safety standards and regulatory frameworks.
This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435B additional tokens, Aurora-M surpasses 2T tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.
We evaluate Aurora-M across a wide range of tasks and languages, showcasing its robustness against catastrophic forgetting and its superior performance in multilingual settings, particularly in safety evaluations. We open-source Aurora-M and its variants to encourage responsible open-source development of large language models at https://huggingface.co/aurora-m.
Submitted 26 December, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Positivity preservers over finite fields
Authors:
Dominique Guillot,
Himanshu Gupta,
Prateek Kumar Vishwakarma,
Chi Hoi Yip
Abstract:
We resolve an algebraic version of Schoenberg's celebrated theorem [Duke Math. J., 1942] characterizing entrywise matrix transforms that preserve positive definiteness. Compared to the classical real and complex settings, we consider matrices with entries in a finite field and obtain a complete characterization of such preservers for matrices of a fixed dimension. When the dimension of the matrices is at least $3$, we prove that, surprisingly, the positivity preservers are precisely the positive multiples of the field's automorphisms. We also obtain characterizations of preservers for matrices of dimension $2$ over a finite field with $q$ elements, unless $q \equiv 1 \pmod 4$ and $q$ is not a square. Our proofs build on several novel connections between positivity preservers and field automorphisms via the works of Weil, Carlitz, and Muzychuk-Kovács, and via the structure of cliques in Paley graphs.
Submitted 18 October, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Authors:
Jinhyuk Lee,
Zhuyun Dai,
Xiaoqi Ren,
Blair Chen,
Daniel Cer,
Jeremy R. Cole,
Kai Hui,
Michael Boratko,
Rajvi Kapadia,
Wen Ding,
Yi Luan,
Sai Meher Karthik Duddu,
Gustavo Hernandez Abrego,
Weiqiang Shi,
Nithi Gupta,
Aditya Kusupati,
Prateek Jain,
Siddhartha Reddy Jonnalagadda,
Ming-Wei Chang,
Iftekhar Naim
Abstract:
We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness of Gecko. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding size. Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher dimensional embeddings.
Submitted 29 March, 2024;
originally announced March 2024.
-
Zutu: A Platform for Localization and Navigation of Swarm Robots Using Virtual Grids
Authors:
Prateek,
Pawan Wadhwani,
Reshesh Kumar Pathak,
Mayur Bhosale,
A Helen Victoria
Abstract:
Swarm robots, which are inspired by the way insects behave collectively in order to achieve a common goal, have become a major part of research with applications involving search and rescue, area exploration, surveillance, etc. In this paper, we present a swarm of robots that do not require individual extrinsic sensors to sense the environment but instead use a single central camera to locate and map the swarm. The robots can be easily built using readily available components with the main chassis being 3D printed, making the system low-cost, low-maintenance, and easy to replicate. We describe Zutu's hardware and software architecture, the algorithms to map the robots to the real world, and some experiments conducted using four of our robots. Finally, we conclude with the possible applications of our system in research, education, and industries.
Submitted 17 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering
Authors:
Saeel Sandeep Nachane,
Ojas Gramopadhye,
Prateek Chanda,
Ganesh Ramakrishnan,
Kshitij Sharad Jadhav,
Yatin Nandwani,
Dinesh Raghu,
Sachindra Joshi
Abstract:
In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning, reaching a correct response to medical questions. We empirically demonstrate how CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Liévin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we utilize a reward model mechanism, replacing the elimination process performed by MCQ-ELIMINATIVE.
Submitted 15 October, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Teach LLMs to Phish: Stealing Private Information from Language Models
Authors:
Ashwinee Panda,
Christopher A. Choquette-Choo,
Zhengming Zhang,
Yaoqing Yang,
Prateek Mittal
Abstract:
When large language models are trained on private data, it can be a significant privacy risk for them to memorize and regurgitate sensitive information. In this work, we propose a new practical data extraction attack that we call "neural phishing". This attack enables an adversary to target and extract sensitive or personally identifiable information (PII), e.g., credit card numbers, from a model trained on user data with upwards of 10% attack success rates, at times, as high as 50%. Our attack assumes only that an adversary can insert as few as tens of benign-appearing sentences into the training dataset using only vague priors on the structure of the user data.
Submitted 1 March, 2024;
originally announced March 2024.
-
On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study
Authors:
Minh-Hao Van,
Prateek Verma,
Xintao Wu
Abstract:
Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables users to explore emergent abilities with multimodal data. Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, there are enormous applications of large models that could potentially be used in the biomedical imaging field. Along that direction, there is a lack of related work showing the ability of large models to diagnose diseases. In this work, we study the zero-shot and few-shot robustness of VLMs on medical imaging analysis tasks. Our comprehensive experiments demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays.
Submitted 21 February, 2024;
originally announced February 2024.
-
Finite-temperature grain boundary properties from quasistatic atomistics
Authors:
Miguel Spínola,
Shashank Saxena,
Prateek Gupta,
Brandon Runnels,
Dennis M. Kochmann
Abstract:
Grain boundary (GB) properties greatly influence the mechanical, electrical, and thermal response of polycrystalline materials. Most computational studies of GB properties at finite temperatures use molecular dynamics (MD), which is computationally expensive, limited in the range of accessible timescales, and requires cumbersome techniques like thermodynamic integration to estimate free energies. This restricts the reasonable computation (without incurring excessive computational expense) of GB properties to regimes that are often unrealistic, such as zero temperature or extremely high strain rates. Consequently, there is a need for simulation methodology that avoids the timescale limitations of MD, while providing reliable estimates of GB properties. The Gaussian Phase-Packet (GPP) method is a temporal coarse-graining technique that can predict relaxed atomic structures at finite temperature in the quasistatic limit. This work applies GPP, combined with the quasiharmonic approximation for computing the free energy, to the problem of determining the free energy and shear coupling factor of grain boundaries in metals over a range of realistic temperatures. Validation is achieved by comparison to thermodynamic integration, which confirms that the presented approach captures relaxed-energy GB structures and shear coupling factors at finite temperature with a high degree of accuracy.
Submitted 19 February, 2024;
originally announced February 2024.
-
Dissipation of nonlinear acoustic waves in thermoviscous pores
Authors:
Krishna Sahithi,
Prateek Gupta
Abstract:
We derive a nonlinear acoustic wave propagation model for analysing the thermoviscous dissipation in narrow pores with wavy walls. As the nonlinear waves propagate in the thermoviscous pores, the wave-steepening effect competes with the bulk dissipation, as well as the thermoviscous heat transfer and shear from the pore walls. Consequently, the length scale of the wave is modified. We use the characteristic nonlinear wave thickness scale to obtain linear and nonlinear wave equations governing the unsteady shock-wall interaction. We also perform two-dimensional shock-resolved DNS of the wave propagation inside the pores and compare the results with model equations. We show that for flat walls and shock strength parameter $ε$, the dimensional wall heat-flux and shear scale as $ε$. For wavy walls, the scaling becomes $ε^{3/2 - n(k)}$ where $k$ is the wall-waviness wavenumber and the exponent $n$ increases from $0.5$ for $k=0$ to $n(k)\approx0.65$ for $k=10$, $n(k)\approx 0.75$ for $k=20$, and $n(k)\approx0.85$ for $k=40$. Hence, increasing the wall waviness reduces the dependence of the wall heat-flux and shear on nonlinear acoustic wave strength.
Submitted 15 February, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference
Authors:
Yashas Samaga B L,
Varun Yerram,
Chong You,
Srinadh Bhojanapalli,
Sanjiv Kumar,
Prateek Jain,
Praneeth Netrapalli
Abstract:
Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound where most of the time is spent on transferring model parameters from high bandwidth memory (HBM) to cache. On the other hand, recent works show that LLMs can maintain quality with significant sparsity/redundancy in the feedforward (FFN) layers by appropriately training the model to operate on a top-$k$ fraction of rows/columns (where $k \approx 0.05$), thereby suggesting a way to reduce the transfer of model parameters, and hence latency. However, exploiting this sparsity for improving latency is hindered by the fact that identifying top rows/columns is data-dependent and is usually performed using full matrix operations, severely limiting potential gains. To address these issues, we introduce HiRE (High Recall Approximate Top-$k$ Estimation). HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator. We demonstrate that on a one billion parameter model, HiRE applied to both the softmax as well as feedforward layers, achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
Submitted 14 February, 2024;
originally announced February 2024.
-
Tandem Transformers for Inference Efficient LLMs
Authors:
Aishwarya P S,
Pranav Ajit Nair,
Yashas Samaga,
Toby Boyd,
Sanjiv Kumar,
Prateek Jain,
Praneeth Netrapalli
Abstract:
The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face limitations: either relying on less accurate smaller models for generation or failing to fully leverage the base LLM's representations.
We introduce a novel architecture, Tandem transformers, to address these issues. This architecture uniquely combines (1) a small autoregressive model and (2) a large model operating in block mode (processing multiple tokens simultaneously). The small model's predictive accuracy is substantially enhanced by granting it attention to the large model's richer representations. On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko demonstrates a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter model with comparable downstream performance. We further incorporate the tandem model within the speculative decoding (SPEED) framework where the large model validates tokens from the small model. This ensures that the Tandem of PaLM2-Bison and PaLM2-Gecko achieves substantial speedup (around 1.14x faster than using vanilla PaLM2-Gecko in SPEED) while maintaining identical downstream task accuracy.
Submitted 20 October, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Rhizomes and Diffusions for Processing Highly Skewed Graphs on Fine-Grain Message-Driven Systems
Authors:
Bibrak Qamar Chandio,
Prateek Srivastava,
Maciej Brodowicz,
Martin Swany,
Thomas Sterling
Abstract:
The paper provides a unified co-design of 1) a programming and execution model that allows spawning tasks from within the vertex data at runtime, 2) language constructs for \textit{actions} that send work to where the data resides, combining parallel expressiveness of local control objects (LCOs) to implement asynchronous graph processing primitives, and 3) an innovative vertex-centric data-structure, using the concept of Rhizomes, that parallelizes both the out and in-degree load of vertex objects across many cores and yet provides a single programming abstraction to the vertex objects. The data structure hierarchically parallelizes the out-degree load of vertices and the in-degree load laterally. The rhizomes internally communicate and remain consistent, using event-driven synchronization mechanisms, to provide a unified and correct view of the vertex.
Simulated experimental results show performance gains for BFS, SSSP, and PageRank on large chip sizes for the tested input graph datasets containing highly skewed degree distribution. The improvements come from the ability to express and create fine-grain dynamic computing tasks in the form of \textit{actions}, language constructs that aid the compiler to generate code that the runtime system uses to optimally schedule tasks, and the data structure that shares both in and out-degree compute workload among memory-processing elements.
Submitted 7 May, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Authors:
Boyi Wei,
Kaixuan Huang,
Yangsibo Huang,
Tinghao Xie,
Xiangyu Qi,
Mengzhou Xia,
Prateek Mittal,
Mengdi Wang,
Peter Henderson
Abstract:
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This study explores this brittleness of safety alignment by leveraging pruning and low-rank modifications. We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. Surprisingly, the isolated regions we find are sparse, comprising about $3\%$ at the parameter level and $2.5\%$ at the rank level. Removing these regions compromises safety without significantly impacting utility, corroborating the inherent brittleness of the model's safety mechanisms. Moreover, we show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted. These findings underscore the urgent need for more robust safety strategies in LLMs.
Submitted 24 October, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Localizing uniformly moving single-frequency sources using an inverse 2.5D approach
Authors:
Christian H. Kasess,
Wolfgang Kreuzer,
Prateek Soni,
Holger Waubke
Abstract:
Localizing linearly moving sound sources using microphone arrays is challenging as the transient nature of the signal leads to relatively short observation periods. Commonly, a moving focus is used and most methods operate at least partially in the time domain. In contrast, this manuscript presents an inverse source localization algorithm for uniformly moving single-frequency sources that acts entirely in the frequency domain. For this, a 2.5D approach is utilized and a transfer function between sources and a microphone grid is derived. By solving a least squares problem using the data at the microphone grid, the unknown source distribution in the moving frame can be determined. First, the time signals need to be transformed from the time domain into the frequency domain using a windowed discrete Fourier transform (DFT), which leads to spectral leakage that depends on the length of the time interval and the analysis window used. To include spectral leakage in the numerical model, the calculation of the transfer matrix is modified using the Fourier transform of the analysis window in the DFT applied to the measurements. Currently, this approach is limited to single-frequency sources as this restriction allows for simplified calculations and reduces the computational effort. The least squares problem is solved using a Tikhonov regularization and an L-curve approach. As moving sources are considered, utilizing the Doppler effect enhances the stability of the system by combining the transfer functions for multiple frequencies in the measured signals. The performance is validated using simulated data of a moving point source with or without a reflecting ground. Numerical experiments are performed to show the effect of the choice of frequencies in the receiver spectrum, the effect of the DFT, the source frequency, the distance between source and receiver, and the robustness with respect to noise.
Submitted 20 August, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
AI as a Medical Ally: Evaluating ChatGPT's Usage and Impact in Indian Healthcare
Authors:
Aryaman Raina,
Prateek Mishra,
Harshit Goyal,
Dhruv Kumar
Abstract:
This study investigates the integration and impact of Large Language Models (LLMs), like ChatGPT, in India's healthcare sector. Our research employs a dual approach, engaging both general users and medical professionals through surveys and interviews respectively. Our findings reveal that healthcare professionals value ChatGPT in medical education and preliminary clinical settings, but exercise caution due to concerns about reliability, privacy, and the need for cross-verification with medical references. General users show a preference for AI interactions in healthcare, but concerns regarding accuracy and trust persist. The study underscores the need for these technologies to complement, not replace, human medical expertise, highlighting the importance of developing LLMs in collaboration with healthcare providers. This paper enhances the understanding of LLMs in healthcare, detailing current usage, user trust, and improvement areas. Our insights inform future research and development, underscoring the need for ethically compliant, user-focused LLM advancements that address healthcare-specific challenges.
Submitted 28 January, 2024;
originally announced January 2024.