-
Matryoshka Quantization
Authors:
Pranav Nair,
Puranjay Datta,
Jeff Dean,
Prateek Jain,
Aditya Kusupati
Abstract:
Quantizing model weights is critical for reducing the communication and inference costs of large models. However, quantizing models -- especially to low precisions like int4 or int2 -- requires a trade-off in model quality; int2, in particular, is known to severely degrade model quality. Consequently, practitioners are often forced to maintain multiple models with different quantization levels or serve a single model that best satisfies the quality-latency trade-off. On the other hand, integer data types, such as int8, inherently possess a nested (Matryoshka) structure where smaller bit-width integers, like int4 or int2, are nested within the most significant bits. Leveraging this insight, in this paper, we propose Matryoshka Quantization (MatQuant), a novel multi-scale quantization technique that alleviates the aforementioned challenge. This technique allows us to train and maintain a single quantized model but serve it with the precision demanded by the deployment. Furthermore, leveraging MatQuant's co-training and co-distillation regularization, int2 precision models extracted by MatQuant outperform standard int2 quantization by up to 4% and 7% with OmniQuant and QAT as base algorithms respectively. Finally, we demonstrate that by using an extra bit to represent outliers, a model with an effective precision of 2.05-bit gives an additional 6% improvement with OmniQuant as the base algorithm.
Submitted 3 March, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
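The nested structure mentioned in the Matryoshka Quantization abstract can be illustrated with a minimal sketch. It assumes unsigned 8-bit codes and plain truncation of the low bits; the function names `slice_msb` and `expand` are hypothetical, and the slicing operator actually used by MatQuant may differ (for example, it may round rather than truncate).

```python
def slice_msb(code: int, bits: int, full_bits: int = 8) -> int:
    """Keep the `bits` most significant bits of an unsigned `full_bits`-bit code.

    Illustrative assumption: plain truncation, not necessarily the paper's operator.
    """
    if not 0 < bits <= full_bits:
        raise ValueError("bits must be in 1..full_bits")
    if not 0 <= code < (1 << full_bits):
        raise ValueError("code does not fit in full_bits")
    return code >> (full_bits - bits)


def expand(code: int, bits: int, full_bits: int = 8) -> int:
    """Map a `bits`-bit code back onto the `full_bits`-bit grid (low bits zeroed)."""
    return code << (full_bits - bits)


# One int8 code yields its int4 and int2 versions without storing extra weights.
int8_code = 0b10110111
int4_code = slice_msb(int8_code, 4)  # top four bits: 0b1011
int2_code = slice_msb(int8_code, 2)  # top two bits: 0b10
```

Expanding a sliced code back to the 8-bit grid lets all precisions share a single scale, which is one way a single stored model could be served at several bit-widths.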
-
Adapting to Evolving Adversaries with Regularized Continual Robust Training
Authors:
Sihui Dai,
Christian Cianfarani,
Arjun Bhagoji,
Vikash Sehwag,
Prateek Mittal
Abstract:
Robust training methods typically defend against specific attack types, such as Lp attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model's robustness against different attacks is bounded by how far each attack perturbs a sample in the model's logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and ImageNette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.
Submitted 6 February, 2025;
originally announced February 2025.
-
Model Provenance Testing for Large Language Models
Authors:
Ivica Nikolic,
Teodora Baluta,
Prateek Saxena
Abstract:
Large language models are increasingly customized through fine-tuning and other adaptations, creating challenges in enforcing licensing terms and managing downstream impacts. Tracking model origins is crucial both for protecting intellectual property and for identifying derived models when biases or vulnerabilities are discovered in foundation models. We address this challenge by developing a framework for testing model provenance: Whether one model is derived from another. Our approach is based on the key observation that real-world model derivations preserve significant similarities in model outputs that can be detected through statistical analysis. Using only black-box access to models, we employ multiple hypothesis testing to compare model similarities against a baseline established by unrelated models. On two comprehensive real-world benchmarks spanning models from 30M to 4B parameters and comprising over 600 models, our tester achieves 90-95% precision and 80-90% recall in identifying derived models. These results demonstrate the viability of systematic provenance verification in production environments even when only API access is available.
Submitted 2 February, 2025;
originally announced February 2025.
-
Masked Generative Nested Transformers with Decode Time Scaling
Authors:
Sahil Goyal,
Debapriya Tula,
Gagan Jain,
Pradeep Shenoy,
Prateek Jain,
Sujoy Paul
Abstract:
Recent advances in visual generation have made significant strides in producing content of exceptional quality. However, most methods suffer from a fundamental problem - a bottleneck of inference computational efficiency. Most of these algorithms involve multiple passes over a transformer model to generate tokens or denoise inputs. However, the model size is kept consistent throughout all iterations, which makes it computationally expensive. In this work, we aim to address this issue primarily through two key ideas - (a) not all parts of the generation process need equal compute, and we design a decode time model scaling schedule to utilize compute effectively, and (b) we can cache and reuse some of the computation. Combining these two ideas leads to using smaller models to process more tokens while large models process fewer tokens. These different-sized models do not increase the parameter size, as they share parameters. We rigorously experiment with ImageNet256$\times$256, UCF101, and Kinetics600 to showcase the efficacy of the proposed method for image/video generation and frame prediction. Our experiments show that with almost $3\times$ less compute than baseline, our model obtains competitive performance.
Submitted 1 February, 2025;
originally announced February 2025.
-
Continuous Algebra: Algebraic Semantics for Continuous Propositional Logic
Authors:
Purbita Jana,
Prateek
Abstract:
We have introduced continuous algebra as the algebraic semantics for Continuous Propositional Logic (CPL). A continuous algebra is an MV-algebra together with a unary operator $κ$, analogous to the unary connective $\dfrac{1}{2}$ in CPL. We establish structural results, including the subdirect representation theorem. We also introduce $\ell u^*$-groups, which are lattice ordered groups with strong unit $u$, denoted by $\ell u$-groups, with a partial operator $^*$ that mimics the behavior of $κ$ over the interval $[id,u]$. This addition enables a natural correspondence between $\ell u^*$-groups and the continuous algebras, allowing us to prove Chang's completeness theorem for the continuous algebras.
Submitted 29 May, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
-
Monte Carlo Simulations of Infection Spread in Indoor Environment
Authors:
Rahul Sheshanarayana,
Prateek K. Jha
Abstract:
The dynamics of infection spread in populations has received popular attention since the outbreak of Covid-19 and many statistical models have been developed. One of the interesting areas of research is short-time dynamics in confined, indoor environments. We have modeled this using a simple Monte Carlo scheme. Our model is generally applicable for the peer-to-peer transmission case, when the infection spread occurs only between an infected subject and a healthy subject with a certain probability, i.e., airborne and surface transmission is neglected. The probability of infection spread is incorporated using a simple exponential decay with distance between the subjects. Simulations are performed for the cases of (1) constant subject population and (2) variable subject population due to inflow/outflow. We specifically focus on the large fluctuations in the dynamics due to the finite number of subjects. Results of our study may be useful to determine social-distancing guidelines in indoor contexts.
Submitted 17 January, 2025;
originally announced January 2025.
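A minimal sketch of the kind of peer-to-peer Monte Carlo scheme described in the abstract above, for the constant-population case. Everything specific here is an assumption for illustration, not the authors' model: subjects stay fixed at uniformly random positions in a unit square, and the names `simulate`, `p0` (contact probability at zero distance), and `decay_length` are hypothetical. The per-step infection probability between an infected and a healthy subject is taken as `p0 * exp(-d / decay_length)`.

```python
import math
import random


def simulate(n_subjects=50, n_steps=20, p0=0.5, decay_length=0.1, seed=0):
    """Return the number of infected subjects after each step (index 0 = start)."""
    rng = random.Random(seed)
    # Assumption: static subjects placed uniformly in a unit square.
    positions = [(rng.random(), rng.random()) for _ in range(n_subjects)]
    infected = [False] * n_subjects
    infected[0] = True  # a single initially infected subject
    history = [1]
    for _ in range(n_steps):
        newly_infected = set()
        for i in range(n_subjects):
            if not infected[i]:
                continue
            for j in range(n_subjects):
                if infected[j]:
                    continue
                d = math.dist(positions[i], positions[j])
                # Infection probability decays exponentially with distance.
                if rng.random() < p0 * math.exp(-d / decay_length):
                    newly_infected.add(j)
        # Apply new infections only after the step, so they cannot spread within it.
        for j in newly_infected:
            infected[j] = True
        history.append(sum(infected))
    return history


counts = simulate()
```

Repeating the run with different seeds gives a rough picture of the fluctuations that arise from a finite number of subjects.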
-
PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery
Authors:
Shristi Das Biswas,
Matthew Shreve,
Xuelu Li,
Prateek Singhal,
Kaushik Roy
Abstract:
Recent advancements in language-guided diffusion models for image editing are often bottle-necked by cumbersome prompt engineering to precisely articulate desired changes. An intuitive alternative calls on guidance from in-the-wild image exemplars to help users bring their imagined edits to life. Contemporary exemplar-based editing methods shy away from leveraging the rich latent space learnt by pre-existing large text-to-image (TTI) models and fall back on training with curated objective functions to achieve the task. Though somewhat effective, this demands significant computational resources and lacks compatibility with diverse base models and arbitrary exemplar count. On further investigation, we also find that these techniques restrict user control to only applying uniform global changes over the entire edited region. In this paper, we introduce a novel framework for progressive exemplar-driven editing with off-the-shelf diffusion models, dubbed PIXELS, to enable customization by providing granular control over edits, allowing adjustments at the pixel or region level. Our method operates solely during inference to facilitate imitative editing, enabling users to draw inspiration from a dynamic number of reference images, or multimodal prompts, and progressively incorporate all the desired changes without retraining or fine-tuning existing TTI models. This capability of fine-grained control opens up a range of new possibilities, including selective modification of individual objects and specifying gradual spatial changes. We demonstrate that PIXELS delivers high-quality edits efficiently, leading to a notable improvement in quantitative metrics as well as human evaluation. By making high-quality image editing more accessible, PIXELS has the potential to enable professional-grade edits to a wider audience with the ease of using any open-source image generation model.
Submitted 16 January, 2025;
originally announced January 2025.
-
Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark
Authors:
Alexis Roger,
Prateek Humane,
Daniel Z. Kaplan,
Kshitij Gupta,
Qi Sun,
George Adamopoulos,
Jonathan Siu Chi Lim,
Quentin Anthony,
Edwin Fennell,
Irina Rish
Abstract:
The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin - a novel suite of VLMs that we built by combining Large Language Models (LLMs) and Vision Encoders (VEs) at multiple scales, and use Robin to identify shortcomings of current evaluation approaches across scales. Next, to overcome the identified limitations, we introduce CHIRP - a new long form response benchmark we developed for more robust and complete VLM evaluation. We provide open access to the Robin training code, model suite, and CHIRP benchmark to promote reproducibility and advance VLM research.
Submitted 20 January, 2025; v1 submitted 16 January, 2025;
originally announced January 2025.
-
Effects of pressure gradient histories on skin friction and mean flow of high Reynolds number turbulent boundary layers over smooth and rough walls
Authors:
Thomas Preskett,
Marco Virgilio,
Prateek Jaiswal,
Bharathram Ganapathisubramani
Abstract:
Experiments are conducted over smooth and rough walls to explore the influence of pressure gradient histories on skin friction and mean flow of turbulent boundary layers. Different pressure gradient histories are imposed on the boundary layer through an aerofoil mounted in the freestream. Hot-wire measurements are taken at different freestream velocities downstream of the aerofoil where the flow has locally recovered to zero pressure gradient but retains the history effects. Direct skin friction measurements are also made using oil film interferometry for smooth walls and a floating element drag balance for rough walls. The friction Reynolds number, $Re_τ$, varies between $3000$ and $27000$, depending both on the surface conditions and the freestream velocity ensuring sufficient scale separation. Results align with previous findings, showing that adverse pressure gradients just upstream of the measurement location increase wake strength and reduce the local skin friction while favourable pressure gradients suppress the wake and increase skin friction. The roughness length scale, $y_0$, remains constant across different pressure gradient histories for rough wall boundary layers. Inspired by previous works, a new correlation is proposed to infer skin friction based on the mean flow. The difference in skin friction between an arbitrary pressure gradient history and zero pressure gradient condition can be predicted using only the local wake strength parameter ($Π$), and the variations in wake strength for different histories are related to a weighted integral of the pressure gradient history normalised by local quantities. This allows us to develop a general correlation that can be used to infer skin friction for turbulent boundary layers experiencing arbitrary pressure-gradient histories.
Submitted 15 January, 2025;
originally announced January 2025.
-
Computational Astrophysics, Data Science & AI/ML in Astronomy: A Perspective from Indian Community
Authors:
Prateek Sharma,
Bhargav Vaidya,
Yogesh Wadadekar,
Jasjeet Bagla,
Piyali Chatterjee,
Shravan Hanasoge,
Prayush Kumar,
Dipanjan Mukherjee,
Ninan Sajeeth Philip,
Nishant Singh
Abstract:
In contemporary astronomy and astrophysics (A&A), the integration of high-performance computing (HPC), big data analytics, and artificial intelligence/machine learning (AI/ML) has become essential for advancing research across a wide range of scientific domains. These tools are playing an increasingly pivotal role in accelerating discoveries, simulating complex astrophysical phenomena, and analyzing vast amounts of observational data. For India to maintain and enhance its competitive edge in the global landscape of computational astrophysics and data science, it is crucial for the Indian A&A community to fully embrace these transformative technologies. Despite limited resources, the expanding Indian community has already made significant scientific contributions. However, to remain globally competitive in the coming years, it is vital to establish a robust national framework that provides researchers with reliable access to state-of-the-art computational resources. This system should involve the regular solicitation of computational proposals, which can be assessed by domain experts and HPC specialists, ensuring that high-impact research receives the necessary support. By building such a system, India can cultivate the talent, infrastructure, and collaborative environment necessary to foster world-class research in computational astrophysics and data science.
Submitted 7 January, 2025;
originally announced January 2025.
-
Revisiting In-Context Learning with Long Context Language Models
Authors:
Jinheon Baek,
Sun Jae Lee,
Prakhar Gupta,
Geunseob Oh,
Siddharth Dalmia,
Prateek Kolhar
Abstract:
In-Context Learning (ICL) is a technique by which language models make predictions based on examples provided in their input context. Previously, their context window size imposed a limit on the number of examples that can be shown, making example selection techniques crucial for identifying the maximally effective set of examples. However, the recent advent of Long Context Language Models (LCLMs) has significantly increased the number of examples that can be included in context, raising an important question of whether ICL performance in a many-shot regime is still sensitive to the method of sample selection. To answer this, we revisit these approaches in the context of LCLMs through extensive experiments on 18 datasets spanning 4 tasks. Surprisingly, we observe that sophisticated example selection techniques do not yield significant improvements over a simple random sample selection method. Instead, we discover that the advent of LCLMs has fundamentally shifted the challenge of ICL from that of selecting the most effective examples to that of collecting sufficient examples to fill the context window. Specifically, in certain datasets, including all available examples does not fully utilize the context window; however, by augmenting the examples in context with a simple data augmentation approach, we substantially improve ICL performance by 5%.
Submitted 28 May, 2025; v1 submitted 22 December, 2024;
originally announced December 2024.
-
LearnLM: Improving Gemini for Learning
Authors:
LearnLM Team,
Abhinit Modi,
Aditya Srikanth Veerubhotla,
Aliya Rysbek,
Andrea Huber,
Brett Wiltshire,
Brian Veprek,
Daniel Gillick,
Daniel Kasenberg,
Derek Ahmed,
Irina Jurenka,
James Cohan,
Jennifer She,
Julia Wilkowski,
Kaiz Alarakyia,
Kevin R. McKee,
Lisa Wang,
Markus Kunesch,
Mike Schaekermann,
Miruna Pîslar,
Nikhil Joshi,
Parsa Mahmoudieh,
Paul Jhun,
Sara Wiltberger,
Shakir Mohamed
, et al. (21 additional authors not shown)
Abstract:
Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of pedagogical instruction following, where training and evaluation examples include system-level instructions describing the specific pedagogy attributes present or desired in subsequent model turns. This framing avoids committing our models to any particular definition of pedagogy, and instead allows teachers or developers to specify desired model behavior. It also clears a path to improving Gemini models for learning -- by enabling the addition of our pedagogical data to post-training mixtures -- alongside their rapidly expanding set of capabilities. Both represent important changes from our initial tech report. We show how training with pedagogical instruction following produces a LearnLM model (available on Google AI Studio) that is preferred substantially by expert raters across a diverse set of learning scenarios, with average preference strengths of 31% over GPT-4o, 11% over Claude 3.5, and 13% over the Gemini 1.5 Pro model LearnLM was based on.
Submitted 25 December, 2024; v1 submitted 20 December, 2024;
originally announced December 2024.
-
Personalized Generative Low-light Image Denoising and Enhancement
Authors:
Xijun Wang,
Prateek Chennuri,
Yu Yuan,
Bole Ma,
Xingguang Zhang,
Stanley Chan
Abstract:
While smartphone cameras today can produce astonishingly good photos, their performance in low light is still not completely satisfactory because of the fundamental limits in photon shot noise and sensor read noise. Generative image restoration methods have demonstrated promising results compared to traditional methods, but they suffer from hallucinatory content generation when the signal-to-noise ratio (SNR) is low. Recognizing the availability of personalized photo galleries on users' smartphones, we propose Personalized Generative Denoising (PGD) by building a diffusion model customized for different users. Our core innovation is an identity-consistent physical buffer that extracts the physical attributes of the person from the gallery. This ID-consistent physical buffer provides a strong prior that can be integrated with the diffusion model to restore the degraded images, without the need for fine-tuning. Over a wide range of low-light testing scenarios, we show that PGD achieves superior image denoising and enhancement performance compared to existing diffusion-based denoising approaches.
Submitted 10 March, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Whisper-GPT: A Hybrid Representation Audio Large Language Model
Authors:
Prateek Verma
Abstract:
We propose WHISPER-GPT: a generative large language model (LLM) for speech and music that allows us to work with continuous audio representations and discrete tokens simultaneously as part of a single architecture. There has been a huge surge in generative audio, speech, and music models that utilize discrete audio tokens derived from neural compression algorithms, e.g. ENCODEC. However, one of the major drawbacks of this approach is handling the context length, which blows up for a high-fidelity generative architecture if one has to account for all the audio content at various frequencies for the next token prediction. By combining a continuous audio representation like the spectrogram with discrete acoustic tokens, we retain the best of both worlds: we have all the information needed from the audio at a specific time instance in a single token, yet allow the LLM to predict the future token, enabling sampling and the other benefits a discrete space provides. We show how our architecture improves the perplexity and negative log-likelihood scores for the next token prediction compared to a token-based LLM for speech and music.
Submitted 16 December, 2024;
originally announced December 2024.
-
AI and the Future of Digital Public Squares
Authors:
Beth Goldberg,
Diana Acosta-Navas,
Michiel Bakker,
Ian Beacock,
Matt Botvinick,
Prateek Buch,
Renée DiResta,
Nandika Donthi,
Nathanael Fast,
Ravi Iyer,
Zaria Jalan,
Andrew Konya,
Grace Kwak Danciu,
Hélène Landemore,
Alice Marwick,
Carl Miller,
Aviv Ovadya,
Emily Saltz,
Lisa Schirch,
Dalit Shalom,
Divya Siddarth,
Felix Sieker,
Christopher Small,
Jonathan Stray,
Audrey Tang
, et al. (2 additional authors not shown)
Abstract:
Two substantial technological advances have reshaped the public square in recent decades: first with the advent of the internet and second with the recent introduction of large language models (LLMs). LLMs offer opportunities for a paradigm shift towards more decentralized, participatory online spaces that can be used to facilitate deliberative dialogues at scale, but also create risks of exacerbating societal schisms. Here, we explore four applications of LLMs to improve digital public squares: collective dialogue systems, bridging systems, community moderation, and proof-of-humanity systems. Building on the input from over 70 civil society experts and technologists, we argue that LLMs both afford promising opportunities to shift the paradigm for conversations at scale and pose distinct risks for digital public squares. We lay out an agenda for future research and investments in AI that will strengthen digital public squares and safeguard against potential misuses of AI.
Submitted 13 December, 2024;
originally announced December 2024.
-
Capturing the Temporal Dependence of Training Data Influence
Authors:
Jiachen T. Wang,
Dawn Song,
James Zou,
Prateek Mittal,
Ruoxi Jia
Abstract:
Traditional data influence estimation methods, like influence functions, assume that learning algorithms are permutation-invariant with respect to training data. However, modern training paradigms, especially for foundation models using stochastic algorithms and multi-stage curricula, are sensitive to data ordering, thus violating this assumption. This mismatch renders influence functions inadequate for answering a critical question in machine learning: How can we capture the dependence of data influence on the optimization trajectory during training? To address this gap, we formalize the concept of trajectory-specific leave-one-out (LOO) influence, which quantifies the impact of removing a data point from a specific iteration during training, accounting for the exact sequence of data encountered and the model's optimization trajectory. However, exactly evaluating the trajectory-specific LOO presents a significant computational challenge. To address this, we propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO. Specifically, we compute a training data embedding that encapsulates the cumulative interactions between data and the evolving model parameters. The LOO can then be efficiently approximated through a simple dot-product between the data value embedding and the gradient of the given test data. As data value embedding captures training data ordering, it offers valuable insights into model training dynamics. In particular, we uncover distinct phases of data influence, revealing that data points in the early and late stages of training exert a greater impact on the final model. These insights translate into actionable strategies for managing the computational overhead of data selection by strategically timing the selection process, potentially opening new avenues in data curation research.
Submitted 12 December, 2024;
originally announced December 2024.
-
The Causal Effect of the Two-For-One Strategy in the National Basketball Association
Authors:
Prateek Sasan,
Daryl Swartzentruber
Abstract:
This study evaluates the effectiveness of the two-for-one strategy in basketball by applying a causal inference framework to play-by-play data from the 2018-19 and 2021-22 National Basketball Association regular seasons. Incorporating factors such as player lineup, betting odds, and player ratings, we compute the average treatment effect and find that the two-for-one strategy has a positive impact on game outcomes, suggesting it can benefit teams when employed effectively. Additionally, we investigate potential heterogeneity in the strategy's effectiveness using the causal forest framework, with tests indicating no significant variation across different contexts. These findings offer valuable insights into the tactical advantages of the two-for-one strategy in professional basketball.
Submitted 11 December, 2024;
originally announced December 2024.
-
Verification and Validation of a Vision-Based Landing System for Autonomous VTOL Air Taxis
Authors:
Ayoosh Bansal,
Duo Wang,
Mikael Yeghiazaryan,
Yangge Li,
Chuyuan Tao,
Hyung-Jin Yoon,
Prateek Arora,
Christos Papachristos,
Petros Voulgaris,
Sayan Mitra,
Lui Sha,
Naira Hovakimyan
Abstract:
Autonomous air taxis are poised to revolutionize urban mass transportation; however, ensuring their safety and reliability remains an open challenge. Validating autonomy solutions on air taxis in the real world presents complexities, risks, and costs that further complicate this challenge. Verification and Validation (V&V) frameworks play a crucial role in the design and development of highly reliable systems by formally verifying safety properties and validating algorithm behavior across diverse operational scenarios. Advancements in high-fidelity simulators have significantly enhanced their capability to emulate real-world conditions, encouraging their use for validating autonomous air taxi solutions, especially during early development stages. This evolution underscores the growing importance of simulation environments, not only as complementary tools to real-world testing but as essential platforms for evaluating algorithms in a controlled, reproducible, and scalable manner.
This work presents a V&V framework for a vision-based landing system for air taxis with vertical take-off and landing (VTOL) capabilities. Specifically, we use Verse, a tool for formal verification, to model and verify the safety of the system by obtaining and analyzing the reachable sets. To conduct this analysis, we utilize a photorealistic simulation environment. The simulation environment, built on Unreal Engine, provides realistic terrain, weather, and sensor characteristics to emulate real-world conditions with high fidelity. To validate the safety analysis results, we conduct extensive scenario-based testing to assess the reachability set and robustness of the landing algorithm in various conditions. This approach showcases the representativeness of high-fidelity simulators, offering an effective means to analyze and refine algorithms before real-world deployment.
Submitted 11 December, 2024;
originally announced December 2024.
-
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Authors:
Xiangyu Qi,
Boyi Wei,
Nicholas Carlini,
Yangsibo Huang,
Tinghao Xie,
Luxi He,
Matthew Jagielski,
Milad Nasr,
Prateek Mittal,
Peter Henderson
Abstract:
Stakeholders -- from model developers to policymakers -- seek to minimize the dual-use risks of large language models (LLMs). An open challenge to this goal is whether technical safeguards can impede the misuse of LLMs, even when models are customizable via fine-tuning or when model weights are fully open. In response, several recent studies have proposed methods to produce durable LLM safeguards for open-weight LLMs that can withstand adversarial modifications of the model's weights via fine-tuning. This holds the promise of raising adversaries' costs even under strong threat models where adversaries can directly fine-tune model weights. However, in this paper, we urge more careful characterization of the limits of these approaches. Through several case studies, we demonstrate that even evaluating these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are. We draw lessons from the evaluation pitfalls that we identify and suggest that future research carefully cabin claims to more constrained, well-defined, and rigorously examined threat models, which can provide more useful and candid assessments to stakeholders.
Submitted 9 December, 2024;
originally announced December 2024.
-
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
Authors:
Meilong Xu,
Saumya Gupta,
Xiaoling Hu,
Chen Li,
Shahira Abousamra,
Dimitris Samaras,
Prateek Prasanna,
Chao Chen
Abstract:
Accurately modeling multi-class cell topology is crucial in digital pathology, as it provides critical insights into tissue structure and pathology. The synthetic generation of cell topology enables realistic simulations of complex tissue environments, enhances downstream tasks by augmenting training data, aligns more closely with pathologists' domain knowledge, and offers new opportunities for controlling and generalizing the tumor microenvironment. In this paper, we propose a novel approach that integrates topological constraints into a diffusion model to improve the generation of realistic, contextually accurate cell topologies. Our method refines the simulation of cell distributions and interactions, increasing the precision and interpretability of results in downstream tasks such as cell detection and classification. To assess the topological fidelity of generated layouts, we introduce a new metric, Topological Frechet Distance (TopoFD), which overcomes the limitations of traditional metrics like FID in evaluating topological structure. Experimental results demonstrate the effectiveness of our approach in generating multi-class cell layouts that capture intricate topological relationships. Code is available at https://github.com/Melon-Xu/TopoCellGen.
Submitted 24 March, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.
-
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
Authors:
Sravanti Addepalli,
Yerram Varun,
Arun Suggala,
Karthikeyan Shanmugam,
Prateek Jain
Abstract:
Large Language Models (LLMs) are known to be susceptible to crafted adversarial attacks or jailbreaks that lead to the generation of objectionable content despite being aligned to human preferences using safety fine-tuning methods. While the large dimensionality of the input token space makes it inevitable that adversarial prompts capable of jailbreaking these models can be found, we aim to evaluate whether safety fine-tuned LLMs are safe against natural prompts which are semantically related to toxic seed prompts that elicit safe responses after alignment. We surprisingly find that popular aligned LLMs such as GPT-4 can be compromised using naive prompts that are NOT even crafted with an objective of jailbreaking the model. Furthermore, we empirically show that given a seed prompt that elicits a toxic response from an unaligned model, one can systematically generate several semantically related natural prompts that can jailbreak aligned LLMs. Towards this, we propose a method of Response Guided Question Augmentation (ReG-QA) to evaluate the generalization of safety aligned LLMs to natural prompts, that first generates several toxic answers given a seed question using an unaligned LLM (Q to A), and further leverages an LLM to generate questions that are likely to produce these answers (A to Q). We interestingly find that safety fine-tuned LLMs such as GPT-4o are vulnerable to producing natural jailbreak questions from unsafe content (without denial) and can thus be used for the latter (A to Q) step. We obtain attack success rates that are comparable to or better than leading adversarial attack methods on the JailbreakBench leaderboard, while being significantly more stable against defenses such as Smooth-LLM and Synonym Substitution, which are effective against all existing attacks on the leaderboard.
Submitted 25 March, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
Time-Reversal Provides Unsupervised Feedback to LLMs
Authors:
Yerram Varun,
Rahul Madhavan,
Sravanti Addepalli,
Arun Suggala,
Karthikeyan Shanmugam,
Prateek Jain
Abstract:
Large Language Models (LLMs) are typically trained to predict in the forward direction of time. However, recent works have shown that prompting these models to look back and critique their own generations can produce useful feedback. Motivated by this, we explore the question of whether LLMs can be empowered to think (predict and score) backwards to provide unsupervised feedback that complements forward LLMs. Towards this, we introduce Time Reversed Language Models (TRLMs), which can score and generate queries when conditioned on responses, effectively functioning in the reverse direction of time. Further, to effectively infer in the response to query direction, we pre-train and fine-tune a language model (TRLM-Ba) in the reverse token order from scratch. We show empirically (and theoretically in a stylized setting) that time-reversed models can indeed complement forward model predictions when used to score the query given response for re-ranking multiple forward generations. We obtain up to 5% improvement on the widely used AlpacaEval Leaderboard over the competent baseline of best-of-N re-ranking using self log-perplexity scores. We further show that TRLM scoring outperforms conventional forward scoring of response given query, resulting in significant gains in applications such as citation generation and passage retrieval. We next leverage the generative ability of TRLM to augment or provide unsupervised feedback to input safety filters of LLMs, demonstrating a drastic reduction in false negative rate with negligible impact on false positive rates against several attacks published on the popular JailbreakBench leaderboard.
Submitted 2 February, 2025; v1 submitted 3 December, 2024;
originally announced December 2024.
-
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Authors:
Yu Yuan,
Xijun Wang,
Yichen Sheng,
Prateek Chennuri,
Xingguang Zhang,
Stanley Chan
Abstract:
Image generation today can produce somewhat realistic images from text prompts. However, if one asks the generator to synthesize a specific camera setting such as creating different fields of view using a 24mm lens versus a 70mm lens, the generator will not be able to interpret and generate scene-consistent images. This limitation not only hinders the adoption of generative tools in professional photography but also highlights the broader challenge of aligning data-driven models with real-world physical settings. In this paper, we introduce Generative Photography, a framework that allows controlling camera intrinsic settings during content generation. The core innovations of this work are the concepts of Dimensionality Lifting and Differential Camera Intrinsics Learning, enabling smooth and consistent transitions across different camera settings. Experimental results show that our method produces significantly more scene-consistent photorealistic images than state-of-the-art models such as Stable Diffusion 3 and FLUX. Our code and additional results are available at https://generative-photography.github.io/project.
Submitted 24 March, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning
Authors:
Varun Belagali,
Srikar Yellapragada,
Alexandros Graikos,
Saarthak Kapse,
Zilinghan Li,
Tarak Nath Nandi,
Ravi K Madduri,
Prateek Prasanna,
Joel Saltz,
Dimitris Samaras
Abstract:
Self-supervised learning (SSL) methods have emerged as strong visual representation learners by training an image encoder to maximize similarity between features of different views of the same image. To perform this view-invariance task, current SSL algorithms rely on hand-crafted augmentations such as random cropping and color jittering to create multiple views of an image. Recently, generative diffusion models have been shown to improve SSL by providing a wider range of data augmentations. However, these diffusion models require pre-training on large-scale image-text datasets, which might not be available for many specialized domains like histopathology. In this work, we introduce Gen-SIS, a diffusion-based augmentation technique trained exclusively on unlabeled image data, eliminating any reliance on external sources of supervision such as text captions. We first train an initial SSL encoder on a dataset using only hand-crafted augmentations. We then train a diffusion model conditioned on embeddings from that SSL encoder. Following training, given an embedding of the source image, this diffusion model can synthesize its diverse views. We show that these 'self-augmentations', i.e. generative augmentations based on the vanilla SSL encoder embeddings, facilitate the training of a stronger SSL encoder. Furthermore, based on the ability to interpolate between images in the encoder latent space, we introduce the novel pretext task of disentangling the two source images of an interpolated synthetic image. We validate Gen-SIS's effectiveness by demonstrating performance improvements across various downstream tasks in both natural images, which are generally object-centric, as well as digital histopathology images, which are typically context-based.
Submitted 2 December, 2024;
originally announced December 2024.
-
Misty, patchy, and turbulent: constraining the cold circumgalactic medium with mCC
Authors:
Mukesh Singh Bisht,
Prateek Sharma,
Alankar Dutta,
Biman B. Nath
Abstract:
The circumgalactic medium (CGM) is the largest baryon reservoir around galaxies, but its extent, mass, and temperature distribution remain uncertain. We propose that cold gas in the CGM resides primarily in $\sim 100 \hbox{--} 10^4$ cloud complexes (CCs), each containing a mist of tiny cold cloudlets dispersed in a warm/hot medium ($\sim 10^5 \hbox{--} 10^6$~K). Modeling CCs as uniform and misty simplifies the calculation of observables like ion absorption columns compared to resolving tiny individual cloudlets. Using Monte Carlo simulations, we explore how CC properties affect the observed spread in column densities. A power-law distribution of CCs ($dN_{\rm CC}/dR \propto R^{-1.2}$) reproduces MgII equivalent width (EW) and column density distribution trends with impact parameter ($R_\perp$). We show that the area-averaged MgII column density, combined with the covering fraction of CCs, provides a robust proxy for estimating the cold CGM mass, independent of other model parameters. Modeling individual CCs demonstrates that turbulent broadening blends cloudlet absorption lines, allowing CCs to approximate the observational effects of their constituent cloudlets analytically. Direct simulations of cloudlets within multiple CCs confirm the computational challenges of fully resolving the mist-like structure. Comparing modeled MgII absorption with observations, we estimate the cold CGM mass of Milky Way-like galaxies to be $\sim 10^{10} \, M_\odot$, about $10\%$ of the total CGM mass. This work provides a practical framework for connecting CGM models with observations, shedding light on the cold gas distribution in galaxy halos and its role in the galactic baryon cycle.
Submitted 28 November, 2024; v1 submitted 26 November, 2024;
originally announced November 2024.
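The Monte Carlo step mentioned in the abstract above draws cloud complexes from a power-law radial distribution, $dN_{\rm CC}/dR \propto R^{-1.2}$. A minimal sketch of one common way to draw such radii is inverse-transform sampling, shown below. The abstract does not say how the authors sample, so this is an illustration only: the function name `sample_cc_radius` and the radial bounds `r_min` and `r_max` (taken here as 10 and 300 in arbitrary length units) are hypothetical and do not come from the paper.

```python
import random


def sample_cc_radius(u, r_min, r_max, alpha=1.2):
    """Map a uniform draw u in [0, 1] to a radius R with dN/dR proportional to R**(-alpha).

    Inverts the CDF of a power law truncated to [r_min, r_max].
    Only valid for alpha != 1 (alpha == 1 needs the logarithmic form).
    """
    if alpha == 1.0:
        raise ValueError("alpha == 1 requires a log-uniform sampler")
    k = 1.0 - alpha                      # exponent of the integrated power law
    lo, hi = r_min ** k, r_max ** k      # CDF endpoints before normalization
    return (lo + u * (hi - lo)) ** (1.0 / k)


# Hypothetical bounds, chosen only for illustration.
R_MIN, R_MAX = 10.0, 300.0

rng = random.Random(0)                   # fixed seed for reproducibility
radii = [sample_cc_radius(rng.random(), R_MIN, R_MAX) for _ in range(1000)]
```

Because the slope is steeper than zero, the draws cluster toward small radii. A full model would still need to convert these radii into projected positions and column densities, which this sketch does not attempt.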
-
ZoomLDM: Latent Diffusion Model for multi-scale image generation
Authors:
Srikar Yellapragada,
Alexandros Graikos,
Kostas Triaridis,
Prateek Prasanna,
Rajarsi R. Gupta,
Joel Saltz,
Dimitris Samaras
Abstract:
Diffusion models have revolutionized image generation, yet several challenges restrict their application to large-image domains, such as digital pathology and satellite imagery. Given that it is infeasible to directly train a model on 'whole' images from domains with potential gigapixel sizes, diffusion-based generative methods have focused on synthesizing small, fixed-size patches extracted from these images. However, generating small patches has limited applicability since patch-based models fail to capture the global structures and wider context of large images, which can be crucial for synthesizing (semantically) accurate samples. To overcome this limitation, we present ZoomLDM, a diffusion model tailored for generating images across multiple scales. Central to our approach is a novel magnification-aware conditioning mechanism that utilizes self-supervised learning (SSL) embeddings and allows the diffusion model to synthesize images at different 'zoom' levels, i.e., fixed-size patches extracted from large images at varying scales. ZoomLDM synthesizes coherent histopathology images that remain contextually accurate and detailed at different zoom levels, achieving state-of-the-art image generation quality across all scales and excelling in the data-scarce setting of generating thumbnails of entire large images. The multi-scale nature of ZoomLDM unlocks additional capabilities in large image generation, enabling computationally tractable and globally coherent image synthesis up to $4096 \times 4096$ pixels and $4\times$ super-resolution. Additionally, multi-scale features extracted from ZoomLDM are highly effective in multiple instance learning experiments.
Submitted 24 March, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency
Authors:
Wentao Huang,
Meilong Xu,
Xiaoling Hu,
Shahira Abousamra,
Aniruddha Ganguly,
Saarthak Kapse,
Alisa Yurovsky,
Prateek Prasanna,
Tahsin Kurc,
Joel Saltz,
Michael L. Miller,
Chao Chen
Abstract:
Spatial transcriptomics (ST) provides essential spatial context by mapping gene expression within tissue, enabling detailed study of cellular heterogeneity and tissue organization. However, aligning ST data with histology images poses challenges due to inherent spatial distortions and modality-specific variations. Existing methods largely rely on direct alignment, which often fails to capture complex cross-modal relationships. To address these limitations, we propose a novel framework that aligns gene and image features using a ranking-based alignment loss, preserving relative similarity across modalities and enabling robust multi-scale alignment. To further enhance the alignment's stability, we employ self-supervised knowledge distillation with a teacher-student network architecture, effectively mitigating disruptions from high dimensionality, sparsity, and noise in gene expression data. Extensive experiments on seven public datasets that encompass gene expression prediction, slide-level classification, and survival analysis demonstrate the efficacy of our method, showing improved alignment and predictive performance over existing methods.
Submitted 22 March, 2025; v1 submitted 22 November, 2024;
originally announced November 2024.
-
Translating C To Rust: Lessons from a User Study
Authors:
Ruishi Li,
Bo Wang,
Tianyu Li,
Prateek Saxena,
Ashish Kundu
Abstract:
Rust aims to offer full memory safety for programs, a guarantee that untamed C programs do not enjoy. How difficult is it to translate existing C code to Rust? To get a complementary view from that of automatic C to Rust translators, we report on a user study asking humans to translate real-world C programs to Rust. Our participants are able to produce safe Rust translations, whereas state-of-the-art automatic tools are not able to do so. Our analysis highlights that the high-level strategy taken by users departs significantly from that of the automatic tools we study. We also find that users often choose zero-cost (static) abstractions for temporal safety, which addresses a predominant component of runtime costs in other full memory safety defenses. User-provided translations showcase a rich landscape of specialized strategies to translate the same C program in different ways to safe Rust, which future automatic translators can consider.
Submitted 5 December, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
-
SoK: A Systems Perspective on Compound AI Threats and Countermeasures
Authors:
Sarbartha Banerjee,
Prateek Sahu,
Mulong Luo,
Anjo Vahldiek-Oberwagner,
Neeraja J. Yadwadkar,
Mohit Tiwari
Abstract:
Large language models (LLMs) used across enterprises often use proprietary models and operate on sensitive inputs and data. The wide range of attack vectors identified in prior research - targeting various software and hardware components used in training and inference - makes it extremely challenging to enforce confidentiality and integrity policies.
As we advance towards constructing compound AI inference pipelines that integrate multiple large language models (LLMs), the attack surfaces expand significantly. Attackers now focus on the AI algorithms as well as the software and hardware components associated with these systems. While current research often examines these elements in isolation, we find that combining cross-layer attack observations can enable powerful end-to-end attacks with minimal assumptions about the threat model. Given the sheer number of existing attacks at each layer, we need a holistic and systemized understanding of different attack vectors at each layer.
This SoK discusses different software and hardware attacks applicable to compound AI systems and demonstrates how combining multiple attack mechanisms can reduce the threat model assumptions required for an isolated attack. Next, we systematize the ML attacks in line with the MITRE ATT&CK framework to better position each attack based on the threat model. Finally, we outline the existing countermeasures for both software and hardware layers and discuss the necessity of a comprehensive defense strategy to enable the secure and high-performance deployment of compound AI systems.
Submitted 20 November, 2024;
originally announced November 2024.
-
OASIS: Open Agent Social Interaction Simulations with One Million Agents
Authors:
Ziyi Yang,
Zaibin Zhang,
Zirui Zheng,
Yuxian Jiang,
Ziyue Gan,
Zhiyu Wang,
Zijian Ling,
Jinsong Chen,
Martz Ma,
Bowen Dong,
Prateek Gupta,
Shuyue Hu,
Zhenfei Yin,
Guohao Li,
Xu Jia,
Lijun Wang,
Bernard Ghanem,
Huchuan Lu,
Chaochao Lu,
Wanli Ouyang,
Yu Qiao,
Philip Torr,
Jing Shao
Abstract:
There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (e.g., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. While they hold promise, each simulator is specifically designed to study a particular scenario, making it time-consuming and resource-intensive to explore other phenomena using the same ABM. Additionally, these models simulate only a limited number of agents, whereas real-world social media platforms involve millions of users. To this end, we propose OASIS, a generalizable and scalable social media simulator. OASIS is designed based on real-world social media platforms, incorporating dynamically updated environments (i.e., dynamic social networks and post information), diverse action spaces (e.g., following, commenting), and recommendation systems (i.e., interest-based and hot-score-based). Additionally, OASIS supports large-scale user simulations, capable of modeling up to one million users. With these features, OASIS can be easily extended to different social media platforms to study large-scale group phenomena and behaviors. We replicate various social phenomena, including information spreading, group polarization, and herd effects across X and Reddit platforms. Moreover, we provide observations of social phenomena at different agent group scales. We observe that larger agent group scales lead to stronger group dynamics and more diverse and helpful agent opinions. These findings demonstrate OASIS's potential as a powerful tool for studying complex systems in digital environments.
Submitted 23 March, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
The Jevons Paradox In Cloud Computing: A Thermodynamics Perspective
Authors:
Prateek Sharma
Abstract:
How do we explain the simultaneous growth in energy efficiency of cloud computing and its energy consumption? The Jevons paradox provides one perspective on this phenomenon. However, it is not clear or obvious why the Jevons paradox exists, or when it is applicable. To answer these questions, we seek inspiration from thermodynamics, and model the cloud as a thermodynamic system. We find that system growth, due to the revenue generation of cloud platforms, is a key driver behind energy consumption. This thermodynamic model provides energy consumption insights into modern hyperscale clouds, and we validate it using data from Meta and Google. Our investigation points to the necessity of future work in new and meaningful efficiency metrics, implications for future applications and edge clouds, and the need for studying system-wide energy and sustainability.
Submitted 18 November, 2024;
originally announced November 2024.
-
CLUE-MARK: Watermarking Diffusion Models using CLWE
Authors:
Kareem Shehata,
Aashish Kolluri,
Prateek Saxena
Abstract:
As AI-generated images become widespread, reliable watermarking is essential for content verification, copyright enforcement, and combating disinformation. Existing techniques rely on heuristic approaches and lack formal guarantees of undetectability, making them vulnerable to steganographic attacks that can expose or erase the watermark. Additionally, these techniques often degrade output quality by introducing perceptible changes, which is not only undesirable but an important barrier to adoption in practice.
In this work, we introduce CLUE-Mark, the first provably undetectable watermarking scheme for diffusion models. CLUE-Mark requires no changes to the model being watermarked, is computationally efficient, and, because it is provably undetectable, is guaranteed to have no impact on model output quality. Our approach leverages the Continuous Learning With Errors (CLWE) problem -- a cryptographically hard lattice problem -- to embed watermarks in the latent noise vectors used by diffusion models. By proving undetectability via reduction from a cryptographically hard problem, we ensure that the watermark is imperceptible not only to human observers or ad hoc heuristics, but to \emph{any} efficient detector that does not have the secret key. CLUE-Mark allows multiple keys to be embedded, enabling traceability of images to specific users without altering model parameters. Empirical evaluations on state-of-the-art diffusion models confirm that CLUE-Mark achieves high recoverability, preserves image quality, and is robust to minor perturbations such as JPEG compression and brightness adjustments. Uniquely, CLUE-Mark can be neither detected nor removed by recent steganographic attacks.
Submitted 12 December, 2024; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Simulating the Arrival of Multiple Coronal Mass Ejections that Triggered the Gannon Superstorm on May 10, 2024
Authors:
Smitha V. Thampi,
Ankush Bhaskar,
Prateek Mayank,
Bhargav Vaidya,
Indu Venugopal
Abstract:
The May 10, 2024 space weather event stands out as the most powerful storm recorded during the current solar cycle. This study employs a numerical framework utilizing a semi-empirical coronal model, along with HUXt (Heliospheric Upwind eXtrapolation with time-dependence) and cone-CME models for the inner heliosphere, to forecast solar wind velocity and the arrival of CMEs associated with this event. The simulations were also carried out using Space Weather Adaptive SimulaTion (SWASTi) and a drag-based model (DBM) for this complex event of multiple CMEs. Predicted arrival times and velocities from these models are compared with actual observations at the Sun-Earth L1 point. These simulations reveal that three coronal mass ejections (CMEs) reached Earth nearly simultaneously, resulting in the extreme space weather event, followed by the arrival of a few more eruptions. The simulations predicted the arrival times of these CMEs with a discrepancy of approximately 5 hours or less. Further, the ensemble study of DBM shows the sensitivity of the CME arrival time to the background solar wind speed and drag parameters. All three models reproduce arrival times close to the observed arrivals of the CMEs responsible for the extreme geomagnetic storm of May 10, 2024. These rare solar storms offered a unique opportunity to thoroughly evaluate and validate our advanced models for predicting their arrival at Earth.
Submitted 13 November, 2024;
originally announced November 2024.
-
Quality of Control based Resource Dimensioning for Collaborative Edge Robotics
Authors:
Neelabhro Roy,
Mani H. Dhullipalla,
Gourav Prateek Sharma,
Dimos V. Dimarogonas,
James Gross
Abstract:
With the increasing focus on flexible automation, which emphasizes systems capable of adapting to varied tasks and conditions, exploring future deployments of cloud and edge-based network infrastructures in robotic systems becomes crucial. This work examines how wireless solutions could support the shift from rigid, wired setups toward more adaptive, flexible automation in industrial environments. We provide a quality of control (QoC) based abstraction for robotic workloads, parameterized on loop latency and reliability, and jointly optimize system performance. The setup involves collaborative robots working on distributed tasks, underscoring how wireless communication can enable more dynamic coordination in flexible automation systems. We use our abstraction to maximize the QoC, ensuring efficient operation even under varying network conditions. Additionally, our solution allocates the communication resources in time slots, optimizing the balance between communication and control costs. Our simulation results highlight that minimizing the delay in the system does not always yield the best QoC; relaxing delays at times, so that more packets are delivered reliably, can lead to substantial gains in QoC.
Submitted 11 November, 2024;
originally announced November 2024.
-
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale
Authors:
Flavio Di Palo,
Prateek Singhi,
Bilal Fadlallah
Abstract:
Large Language Models (LLMs) face significant challenges at inference time due to their high computational demands. To address this, we present Performance-Guided Knowledge Distillation (PGKD), a cost-effective and high-throughput solution for production text classification applications. PGKD utilizes teacher-student Knowledge Distillation to distill the knowledge of LLMs into smaller, task-specific models. PGKD establishes an active learning routine between the student model and the LLM; the LLM continuously generates new training data leveraging hard-negative mining, student model validation performance, and early-stopping protocols to inform the data generation. By employing a cyclical, performance-aware approach tailored for highly multi-class, sparsely annotated datasets prevalent in industrial text classification, PGKD effectively addresses training challenges and outperforms traditional BERT-base models and other knowledge distillation methods on several multi-class classification datasets. Additionally, cost and latency benchmarking reveals that models fine-tuned with PGKD are up to 130X faster and 25X less expensive than LLMs for inference on the same classification task. While PGKD is showcased for text classification tasks, its versatile framework can be extended to any LLM distillation task, including language generation, making it a powerful tool for optimizing performance across a wide range of AI applications.
Submitted 6 November, 2024;
originally announced November 2024.
-
Machine learning and optimization-based approaches to duality in statistical physics
Authors:
Andrea E. V. Ferrari,
Prateek Gupta,
Nabil Iqbal
Abstract:
The notion of duality -- that a given physical system can have two different mathematical descriptions -- is a key idea in modern theoretical physics. Establishing a duality in lattice statistical mechanics models requires the construction of a dual Hamiltonian and a map from the original to the dual observables. By using simple neural networks to parameterize these maps and introducing a loss function that penalises the difference between correlation functions in original and dual models, we formulate the process of duality discovery as an optimization problem. We numerically solve this problem and show that our framework can rediscover the celebrated Kramers-Wannier duality for the 2d Ising model, reconstructing the known mapping of temperatures. We also discuss an alternative approach which uses known features of the mapping of topological lines to reduce the problem to optimizing the couplings in a dual Hamiltonian, and explore next-to-nearest neighbour deformations of the 2d Ising duality. We discuss future directions and prospects for discovering new dualities within this framework.
Submitted 7 November, 2024;
originally announced November 2024.
-
TopoTxR: A topology-guided deep convolutional network for breast parenchyma learning on DCE-MRIs
Authors:
Fan Wang,
Zhilin Zou,
Nicole Sakla,
Luke Partyka,
Nil Rawal,
Gagandeep Singh,
Wei Zhao,
Haibin Ling,
Chuan Huang,
Prateek Prasanna,
Chao Chen
Abstract:
Characterization of breast parenchyma in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of underlying tissue structures. Existing quantitative approaches, like radiomics and deep learning models, lack explicit quantification of intricate and subtle parenchymal structures, including fibroglandular tissue. To address this, we propose a novel topological approach that explicitly extracts multi-scale topological structures to better approximate breast parenchymal structures, and then incorporates these structures into a deep-learning-based prediction model via an attention mechanism. Our topology-informed deep learning model, \emph{TopoTxR}, leverages topology to provide enhanced insights into tissues critical for disease pathophysiology and treatment response. We empirically validate \emph{TopoTxR} using the VICTRE phantom breast dataset, showing that the topological structures extracted by our model effectively approximate the breast parenchymal structures. We further demonstrate \emph{TopoTxR}'s efficacy in predicting response to neoadjuvant chemotherapy. Our qualitative and quantitative analyses suggest that, in treatment-naïve imaging, the topological behavior of breast tissue differs between patients who respond favorably to therapy, achieving pathological complete response (pCR), and those who do not. In a comparative analysis with several baselines on the publicly available I-SPY 1 dataset (N=161, including 47 patients with pCR and 114 without) and the Rutgers proprietary dataset (N=120, with 69 patients achieving pCR and 51 not), \emph{TopoTxR} demonstrates a notable improvement, achieving a 2.6\% increase in accuracy and a 4.6\% enhancement in AUC compared to the state-of-the-art method.
Submitted 5 November, 2024;
originally announced November 2024.
-
Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD
Authors:
Aniket Das,
Dheeraj Nagaraj,
Soumyabrata Pal,
Arun Suggala,
Prateek Varshney
Abstract:
We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped-SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. More precisely, with $T$ samples, we show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error of $\sqrt{\frac{\mathsf{Tr}(\Sigma)+\sqrt{\mathsf{Tr}(\Sigma)\|\Sigma\|_2}\log(\frac{\log(T)}{\delta})}{T}}$ with probability $1-\delta$, where $\Sigma$ is the covariance of the clipped gradient. Note that the fluctuations (depending on $\frac{1}{\delta}$) are of lower order than the term $\mathsf{Tr}(\Sigma)$. This improves upon the current best rate of $\sqrt{\frac{\mathsf{Tr}(\Sigma)\log(\frac{1}{\delta})}{T}}$ for Clipped-SGD, known only for smooth and strongly convex objectives. Our results also extend to smooth convex and Lipschitz convex objectives. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of Catoni and Giulini.
Submitted 26 October, 2024;
originally announced October 2024.
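To make the Clipped-SGD idea in the abstract above concrete, here is a minimal sketch, not the authors' algorithm or analysis. It assumes the simplest possible instance: streaming mean estimation with the objective $\frac{1}{2}\|x-\mu\|^2$, whose stochastic gradient at a sample $s$ is $x - s$. The function names, the clipping threshold, and the $1/(t+1)$ step size are all illustrative assumptions.

```python
import math


def clip(vec, threshold):
    """Rescale vec so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= threshold:
        return list(vec)
    scale = threshold / norm
    return [v * scale for v in vec]


def clipped_sgd_mean(stream, dim, threshold):
    """Single pass over the stream using O(dim) memory.

    Hypothetical setup: the objective is 0.5 * ||x - mu||^2, so the
    stochastic gradient at a sample s is (x - s). The step size
    1 / (t + 1) is an assumption chosen for simplicity.
    """
    x = [0.0] * dim
    for t, sample in enumerate(stream):
        grad = [xi - si for xi, si in zip(x, sample)]
        clipped = clip(grad, threshold)
        eta = 1.0 / (t + 1)
        x = [xi - eta * gi for xi, gi in zip(x, clipped)]
    return x
```

With a very large threshold and this step size, the iterate reduces to the running sample mean; a finite threshold bounds how far any single heavy-tailed sample can move the estimate.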
-
Quanta Video Restoration
Authors:
Prateek Chennuri,
Yiheng Chi,
Enze Jiang,
G. M. Dilshan Godaliyadda,
Abhiram Gnanasambandam,
Hamid R. Sheikh,
Istvan Gyongy,
Stanley H. Chan
Abstract:
The proliferation of single-photon image sensors has opened the door to a plethora of high-speed and low-light imaging applications. However, data collected by these sensors are often 1-bit or few-bit, and corrupted by noise and strong motion. Conventional video restoration methods are not designed to handle this situation, while specialized quanta burst algorithms have limited performance when the number of input frames is low. In this paper, we introduce Quanta Video Restoration (QUIVER), an end-to-end trainable network built on the core ideas of classical quanta restoration methods, i.e., pre-filtering, flow estimation, fusion, and refinement. We also collect and publish I2-2000FPS, a high-speed video dataset with the highest temporal resolution of 2000 frames-per-second, for training and testing. On simulated and real data, QUIVER outperforms existing quanta restoration methods by a significant margin. Code and dataset available at https://github.com/chennuriprateek/Quanta_Video_Restoration-QUIVER-
Submitted 14 November, 2024; v1 submitted 19 October, 2024;
originally announced October 2024.
-
Adaptive and Stratified Subsampling Techniques for High Dimensional Non-Standard Data Environments
Authors:
Prateek Mittal,
Jai Dalmotra,
Joohi Chauhan
Abstract:
This paper addresses the challenge of estimating high-dimensional parameters in non-standard data environments, where traditional methods often falter due to issues such as heavy-tailed distributions, data contamination, and dependent observations. We propose robust subsampling techniques, specifically Adaptive Importance Sampling (AIS) and Stratified Subsampling, designed to enhance the reliability and efficiency of parameter estimation. Under some clearly outlined conditions, we establish consistency and asymptotic normality for the proposed estimators, providing non-asymptotic error bounds that quantify their performance. Our theoretical foundations are complemented by controlled experiments demonstrating the superiority of our methods over conventional approaches. By bridging the gap between theory and practice, this work offers significant contributions to robust statistical estimation, paving the way for advancements in various applied domains.
Submitted 16 October, 2024;
originally announced October 2024.
-
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Authors:
Tong Wu,
Shujian Zhang,
Kaiqiang Song,
Silei Xu,
Sanqiang Zhao,
Ravi Agrawal,
Sathish Reddy Indurthi,
Chong Xiang,
Prateek Mittal,
Wenxuan Zhou
Abstract:
Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and data. As a result, lower-priority user prompts may override more critical system instructions, including safety protocols. Existing approaches to achieving instruction hierarchy, such as delimiters and instruction-based training, do not address this issue at the architectural level. We introduce the Instructional Segment Embedding (ISE) technique, inspired by BERT, to modern large language models, which embeds instruction priority information directly into the model. This approach enables models to explicitly differentiate and prioritize various instruction types, significantly improving safety against malicious prompts that attempt to override priority rules. Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate an average robust accuracy increase of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement in instruction-following capability of up to 4.1% evaluated on AlpacaEval. Overall, our approach offers a promising direction for enhancing the safety and effectiveness of LLM architectures.
Submitted 1 March, 2025; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Glider: Global and Local Instruction-Driven Expert Router
Authors:
Pingzhi Li,
Prateek Yadav,
Jaehong Yoon,
Jie Peng,
Yi-Lin Sung,
Mohit Bansal,
Tianlong Chen
Abstract:
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to particular domains. This has enabled the creation of powerful and adaptive routing-based "Model MoErging" methods with the goal of using expert modules to create an aggregate system with improved performance or generalization. However, existing MoErging methods often prioritize generalization to unseen tasks at the expense of performance on held-in tasks, which limits their practical applicability in real-world deployment scenarios. We observe that current token-level routing mechanisms neglect the global semantic context of the input task. This token-wise independence hinders effective expert selection for held-in tasks, as routing decisions fail to incorporate the semantic properties of the task. To address this, we propose the Global and Local Instruction Driven Expert Router (GLIDER), which integrates a multi-scale routing mechanism, encompassing a semantic global router and a learned local router. The global router leverages an LLM's advanced reasoning capabilities for semantic-related contexts to enhance expert selection. Given the input query and an LLM, the router generates semantic task instructions that guide the retrieval of the most relevant experts across all layers. This global guidance is complemented by a local router that facilitates token-level routing decisions within each module, enabling finer control and enhanced performance on unseen tasks. Our experiments using T5-based models for T0 and FLAN tasks demonstrate that GLIDER achieves substantially improved held-in performance while maintaining strong generalization on held-out tasks. We also perform ablation experiments to dive deeper into the components of GLIDER. Our experiments highlight the importance of our multi-scale routing that leverages LLM-driven semantic reasoning for MoErging methods.
Submitted 11 April, 2025; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization
Authors:
Prateek Varshney,
Mert Pilanci
Abstract:
Deploying large and complex deep neural networks on resource-constrained edge devices poses significant challenges due to their computational demands and the complexities of non-convex optimization. Traditional compression methods such as distillation and pruning often retain non-convexity that complicates fine-tuning in real-time on such devices. Moreover, these methods often necessitate extensive end-to-end network fine-tuning after compression to preserve model performance, which is not only time-consuming but also requires fully annotated datasets, thus potentially negating the benefits of efficient network compression. In this paper, we introduce a novel distillation technique that efficiently compresses the model via convex optimization -- eliminating intermediate non-convex activation functions and using only intermediate activations from the original model. Our approach enables distillation in a label-free data setting and achieves performance comparable to the original model without requiring any post-compression fine-tuning. We demonstrate the effectiveness of our method for image classification models on multiple standard datasets, and further show that in the data limited regime, our method can outperform standard non-convex distillation approaches. Our method promises significant advantages for deploying high-efficiency, low-footprint models on edge devices, making it a practical choice for real-world applications. We show that convex neural networks, when provided with rich feature representations from a large pre-trained non-convex model, can achieve performance comparable to their non-convex counterparts, opening up avenues for future research at the intersection of convex optimization and deep learning.
Submitted 9 October, 2024;
originally announced October 2024.
-
Axion Couplings in Heterotic String Theory
Authors:
Prateek Agrawal,
Michael Nee,
Mario Reig
Abstract:
We study the coupling of axions to gauge bosons in heterotic string theory. The axion-gauge boson couplings in the low energy 4d theory are derived by matching mixed anomalies between higher-form global symmetries and the zero-form gauge symmetry in the 10d theory. When the standard model gauge group is embedded in a single simple group in the 10d theory -- as is the case for almost all heterotic models studied in the literature -- the ratio of the axion-photon coupling to the axion mass is bounded above by the QCD line. This bound is relevant for a large number of axion searches which have sensitivity to axion parameter space above this line. The discovery of an axion in these searches will rule out a large class of heterotic models, making such a signal challenging to explain within heterotic string theory.
Submitted 4 October, 2024;
originally announced October 2024.
-
What Matters for Model Merging at Scale?
Authors:
Prateek Yadav,
Tu Vu,
Jonathan Lai,
Alexandra Chronopoulou,
Manaal Faruqui,
Mohit Bansal,
Tsendsuren Munkhdalai
Abstract:
Model merging aims to combine multiple expert models into a more capable single model, offering benefits such as reduced storage and serving costs, improved generalization, and support for decentralized model development. Despite its promise, previous studies have primarily focused on merging a few small models. This leaves many unanswered questions about the effect of scaling model size and how it interplays with other key factors -- like the base model quality and number of expert models -- to affect the merged model's performance. This work systematically evaluates the utility of model merging at scale, examining the impact of these different factors. We experiment with merging fully fine-tuned models using 4 popular merging methods -- Averaging, Task Arithmetic, Dare, and TIES -- across model sizes ranging from 1B to 64B parameters and merging up to 8 different expert models. We evaluate the merged models on both held-in tasks, i.e., the experts' training tasks, and zero-shot generalization to unseen held-out tasks. Our experiments provide several new insights about model merging at scale and the interplay between different factors. First, we find that merging is more effective when experts are created from strong base models, i.e., models with good zero-shot performance. Second, larger models facilitate easier merging. Third, merging consistently improves generalization capabilities. Notably, when merging 8 large expert models, the merged models often generalize better compared to the multitask trained models. Fourth, we can better merge more expert models when working with larger models. Fifth, different merging methods behave very similarly at larger scales. Overall, our findings shed light on some interesting properties of model merging while also highlighting some limitations. We hope that this study will serve as a reference point on large-scale merging for upcoming research.
Submitted 4 October, 2024;
originally announced October 2024.
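Two of the merging methods named in the abstract above, Averaging and Task Arithmetic, can be sketched on toy parameter dictionaries. This is an illustrative sketch under stated assumptions, not the paper's experimental code: parameters are plain Python lists keyed by a hypothetical layer name, the function names are invented for this example, and Task Arithmetic is taken in its common form of adding a scaled sum of task vectors (expert minus base) back to the base model.

```python
def average_merge(experts):
    """Element-wise mean of the experts' parameters."""
    n = len(experts)
    return {
        name: [sum(vals) / n for vals in zip(*(e[name] for e in experts))]
        for name in experts[0]
    }


def task_arithmetic_merge(base, experts, scale=1.0):
    """base + scale * sum over experts of (expert - base), per parameter."""
    merged = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(e[name][i] - b for e in experts)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Toy example with a single hypothetical parameter tensor "w".
base = {"w": [1.0, 2.0]}
experts = [{"w": [2.0, 2.0]}, {"w": [1.0, 4.0]}]
avg = average_merge(experts)
ta_full = task_arithmetic_merge(base, experts, scale=1.0)
ta_half = task_arithmetic_merge(base, experts, scale=0.5)
```

When all experts share the same base and `scale` equals one over the number of experts, this form of Task Arithmetic coincides with Averaging, which is why `ta_half` matches `avg` in the toy example.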
-
Hard Negative Sample Mining for Whole Slide Image Classification
Authors:
Wentao Huang,
Xiaoling Hu,
Shahira Abousamra,
Prateek Prasanna,
Chao Chen
Abstract:
Weakly supervised whole slide image (WSI) classification is challenging due to the lack of patch-level labels and high computational costs. State-of-the-art methods use self-supervised patch-wise feature representations for multiple instance learning (MIL). Recently, methods have been proposed to fine-tune the feature representation on the downstream task using pseudo labeling, but they mostly focus on selecting high-quality positive patches. In this paper, we propose to mine hard negative samples during fine-tuning. This allows us to obtain better feature representations and reduce the training cost. Furthermore, we propose a novel patch-wise ranking loss in MIL to better exploit these hard negative samples. Experiments on two public datasets demonstrate the efficacy of these proposed ideas. Our code is available at https://github.com/winston52/HNM-WSI
Submitted 3 October, 2024;
originally announced October 2024.
-
Semi-Supervised Contrastive VAE for Disentanglement of Digital Pathology Images
Authors:
Mahmudul Hasan,
Xiaoling Hu,
Shahira Abousamra,
Prateek Prasanna,
Joel Saltz,
Chao Chen
Abstract:
Despite the strong prediction power of deep learning models, their interpretability remains an important concern. Disentanglement models increase interpretability by decomposing the latent space into interpretable subspaces. In this paper, we propose the first disentanglement method for pathology images. We focus on the task of detecting tumor-infiltrating lymphocytes (TIL). We propose several ideas, including cascading disentanglement, a novel architecture, and reconstruction branches. We achieve superior performance on complex pathology images, thus improving the interpretability and even generalization power of TIL detection deep learning models. Our code is available at https://github.com/Shauqi/SS-cVAE.
Submitted 2 October, 2024;
originally announced October 2024.
-
RadGazeGen: Radiomics and Gaze-guided Medical Image Generation using Diffusion Models
Authors:
Moinak Bhattacharya,
Gagandeep Singh,
Shubham Jain,
Prateek Prasanna
Abstract:
In this work, we present RadGazeGen, a novel framework for integrating experts' eye gaze patterns and radiomic feature maps as controls to text-to-image diffusion models for high fidelity medical image generation. Despite the recent success of text-to-image diffusion models, text descriptions are often found to be inadequate and fail to convey detailed disease-specific information to these models to generate clinically accurate images. The anatomy, disease texture patterns, and location of the disease are extremely important to generate realistic images; moreover, the fidelity of image generation can have significant implications in downstream tasks involving disease diagnosis or treatment response assessment. Hence, there is a growing need to carefully define the controls used in diffusion models for medical image generation. Eye gaze patterns of radiologists are important visuo-cognitive information, indicative of subtle disease patterns and spatial location. Radiomic features further provide important subvisual cues regarding disease phenotype. In this work, we propose to use these gaze patterns in combination with standard radiomics descriptors, as controls, to generate anatomically correct and disease-aware medical images. RadGazeGen is evaluated for image generation quality and diversity on the REFLACX dataset. To demonstrate clinical applicability, we also show classification performance on the generated images from the CheXpert test set (n=500) and long-tailed learning performance on the MIMIC-CXR-LT test set (n=23550).
Submitted 30 September, 2024;
originally announced October 2024.
-
Study of Evolution and Geo-effectiveness of CME-CME Interactions using MHD Simulations with SWASTi framework
Authors:
Prateek Mayank,
Stefan Lotz,
Bhargav Vaidya,
Wageesh Mishra,
D. Chakrabarty
Abstract:
The geo-effectiveness of Coronal Mass Ejections (CMEs) is a critical area of study in space weather, particularly in the lesser-explored domain of CME-CME interactions and their geomagnetic consequences. This study leverages the SWASTi framework to perform 3D MHD simulations of a range of CME-CME interaction scenarios within realistic solar wind conditions. The focus is on the dynamics of the initial magnetic flux, speed, density, and tilt of CMEs, and their individual and combined impacts on the disturbance storm time (Dst) index. Additionally, the kinematic, magnetic, and structural impacts on the leading CME, as well as the mixing of both CMEs, are analyzed. Time series in-situ studies are conducted through virtual spacecraft positioned along three different longitudes at 1 AU. Our findings reveal that CME-CME interactions are non-uniform along different longitudes due to the inhomogeneous ambient solar wind conditions. A significant increase in the momentum and kinetic energy of the leading CME is observed due to collisions with the trailing CME, along with the formation of reverse shocks in cases of strong interaction. These reverse shocks lead to complex wave patterns inside CME2, which can prolong the storm recovery phase. Furthermore, we observe that the minimum Dst value decreases with an increase in the initial density, tilt, and speed of the trailing CME.
Submitted 30 September, 2024;
originally announced September 2024.
-
A Bi-criterion Steiner Traveling Salesperson Problem with Time Windows for Last-Mile Electric Vehicle Logistics
Authors:
Prateek Agarwal,
Debojjal Bagchi,
Tarun Rambha,
Venktesh Pandey
Abstract:
This paper addresses the problem of energy-efficient and safe routing of last-mile electric freight vehicles. With the rising environmental footprint of the transportation sector and the growing popularity of E-Commerce, freight companies are likely to benefit from optimal time-window-feasible tours that minimize energy usage while reducing traffic conflicts at intersections and thereby improving safety. We formulate this problem as a Bi-criterion Steiner Traveling Salesperson Problem with Time Windows (BSTSPTW) with energy consumed and the number of left turns at intersections as the two objectives while also considering regenerative braking capabilities. We first discuss an exact mixed-integer programming model with scalarization to enumerate points on the efficiency frontier for small instances. For larger networks, we develop an efficient local search-based heuristic, which uses several operators to intensify and diversify the search process. We demonstrate the utility of the proposed methods using benchmark data and real-world instances from Amazon delivery routes in Austin, US. Comparisons with state-of-the-art solvers show that our heuristics can generate near-optimal solutions within reasonable time budgets, effectively balancing energy efficiency and safety under practical delivery constraints.
Submitted 23 September, 2024;
originally announced September 2024.