-
Many LLMs Are More Utilitarian Than One
Authors:
Anita Keshmirian,
Razan Baltaji,
Babak Hemmatian,
Hadi Asghari,
Lav R. Varshney
Abstract:
Moral judgment is integral to large language model (LLM) alignment and social reasoning. As multi-agent systems gain prominence, it becomes crucial to understand how LLMs function collectively during collaboration, compared to individual agents. In human moral judgment, group deliberation leads to a utilitarian boost: a tendency to endorse norm violations that maximize benefits for the greatest number of people despite harms. We study whether a similar dynamic emerges in multi-agent LLM systems. We tested six models on well-established sets of moral dilemmas across two conditions: (1) Solo, where models reasoned independently, and (2) Group, where they engaged in multi-turn discussions in pairs or triads. In personal moral dilemmas, where agents must decide to directly harm one individual to maximize the utility for others, all models found moral violations to be more acceptable when part of a group than individually, similar to human experiments. Some models endorsed actions that maximized overall well-being, even if they benefited strangers over familiar individuals. Others became more willing to violate moral norms in groups. However, while human groups show a similar action bias, the mechanism behind their utilitarian boost differs from that of LLMs. Whereas the human shift comes from heightened sensitivity to decision outcomes, LLM groups show either reduced norm sensitivity or enhanced impartiality. This suggests that while the surface behavior of LLM collectives mimics human group reasoning, the underlying drivers differ. We discuss the implications for AI alignment, multi-agent design, and artificial moral reasoning.
Submitted 1 July, 2025;
originally announced July 2025.
-
A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search
Authors:
Austin R. Ellis-Mohr,
Anuj K. Nayak,
Lav R. Varshney
Abstract:
Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field's recent progress, inference costs now represent a significant and growing component of the overall resource burden, particularly for reasoning-focused models. Existing characterizations of compute-optimality that consider model size, dataset size, and inference tokens in isolation or in fixed combinations risk overlooking more efficient operating points. We introduce directed stochastic skill search (DS3), a general framework that represents inference as stochastic traversal over a learned skill graph. From a simplified yet expressive instantiation, we derive closed-form expressions for task success and compute cost across a wide range of inference strategies -- including chain-of-thought (CoT) and tree-of-thought (ToT) -- enabling comparative analysis as a function of task difficulty and model capability. To that end, we extend a prior first-principles tripartite graph framework of LLM training to incorporate inference, and separately bridge DS3 with empirical methods that characterize LLM scaling behavior. We theoretically recover empirically observed patterns, including: linear accuracy scaling with logarithmic compute; variation in preferred inference strategies as a function of task difficulty and model capability; emergent behavior elicited by reasoning even when performance plateaus under parameter scaling; and both best-of-N (BoN) and majority voting behavior captured within a unified analytical framework. By explicitly characterizing training-inference interdependencies, our framework deepens theoretical understanding and supports principled algorithmic design and resource allocation.
Submitted 10 June, 2025;
originally announced July 2025.
-
Concealment of Intent: A Game-Theoretic Analysis
Authors:
Xinbo Wu,
Abhishek Umrawal,
Lav R. Varshney
Abstract:
As large language models (LLMs) grow more capable, concerns about their safe deployment have also grown. Although alignment mechanisms have been introduced to deter misuse, they remain vulnerable to carefully designed adversarial prompts. In this work, we present a scalable attack strategy: intent-hiding adversarial prompting, which conceals malicious intent through the composition of skills. We develop a game-theoretic framework to model the interaction between such attacks and defense systems that apply both prompt and response filtering. Our analysis identifies equilibrium points and reveals structural advantages for the attacker. To counter these threats, we propose and analyze a defense mechanism tailored to intent-hiding attacks. Empirically, we validate the attack's effectiveness on multiple real-world LLMs across a range of malicious behaviors, demonstrating clear advantages over existing adversarial prompting techniques.
Submitted 27 May, 2025;
originally announced May 2025.
-
Spark: A System for Scientifically Creative Idea Generation
Authors:
Aishik Sanyal,
Samuel Schapiro,
Sumuk Shashidhar,
Royce Moon,
Lav R. Varshney,
Dilek Hakkani-Tur
Abstract:
Recently, large language models (LLMs) have shown promising abilities to generate novel research ideas in science, a direction which coincides with many foundational principles in computational creativity (CC). In light of these developments, we present an idea generation system named Spark that couples retrieval-augmented idea generation using LLMs with a reviewer model named Judge trained on 600K scientific reviews from OpenReview. Our work is both a system demonstration and intended to inspire other CC researchers to explore grounding the generation and evaluation of scientific ideas within foundational CC principles. To this end, we release the annotated dataset used to train Judge, inviting other researchers to explore the use of LLMs for idea generation and creative evaluations.
Submitted 21 May, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
Platonic Grounding for Efficient Multimodal Language Models
Authors:
Moulik Choraria,
Xinbo Wu,
Akhil Bhimaraju,
Nitesh Sekhar,
Yue Wu,
Xu Zhang,
Prateek Singhal,
Lav R. Varshney
Abstract:
The hyperscaling of data and parameter count in Transformer-based models is yielding diminishing performance improvement, especially when weighed against training costs. Such plateauing indicates the importance of methods for more efficient finetuning and inference, while retaining similar performance. This is especially relevant for multimodal learning paradigms, where inference costs of processing multimodal tokens can determine the model's practical viability. At the same time, research on representations and mechanistic interpretability has improved our understanding of the inner workings of Transformer-based models; one such line of work reveals an implicit alignment in the deeper layers of pretrained models, across modalities. Taking inspiration from this, we motivate and propose a simple modification to existing multimodal frameworks that rely on aligning pretrained models. We demonstrate that our approach maintains and, in some cases, even improves performance of baseline methods while achieving significant gains in both training and inference-time compute. Our work also has implications for combining pretrained models into larger systems efficiently.
Submitted 27 April, 2025;
originally announced April 2025.
-
Transformational Creativity in Science: A Graphical Theory
Authors:
Samuel Schapiro,
Jonah Black,
Lav R. Varshney
Abstract:
Creative processes are typically divided into three types: combinatorial, exploratory, and transformational. Here, we provide a graphical theory of transformational scientific creativity, synthesizing Boden's insight that transformational creativity arises from changes in the "enabling constraints" of a conceptual space and Kuhn's structure of scientific revolutions as resulting from paradigm shifts. We prove that modifications made to axioms of our graphical model have the most transformative potential and then illustrate how several historical instances of transformational creativity can be captured by our framework.
Submitted 20 May, 2025; v1 submitted 25 April, 2025;
originally announced April 2025.
-
Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning
Authors:
Raghav Singhal,
Kaustubh Ponkshe,
Rohit Vartak,
Lav R. Varshney,
Praneeth Vepakomma
Abstract:
Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditional federated averaging of individual adapters. Existing solutions either incur prohibitively high communication cost that scales linearly with the number of clients or suffer from performance degradation due to limited expressivity. We introduce Federated Silver Bullet (Fed-SB), a novel approach for federated fine-tuning of LLMs using LoRA-SB, a recently proposed low-rank adaptation method. LoRA-SB optimally aligns the optimization trajectory with the ideal low-rank full fine-tuning projection by learning a small square matrix (R) between adapters B and A, keeping other components fixed. Direct averaging of R guarantees exact updates, substantially reducing communication cost, which remains independent of the number of clients, and enables scalability. Fed-SB achieves state-of-the-art performance across commonsense reasoning, arithmetic reasoning, and language inference tasks while reducing communication costs by up to 230x. In private settings, Fed-SB further improves performance by (1) reducing trainable parameters, thereby lowering the noise required for differential privacy and (2) avoiding noise amplification introduced by other methods. Overall, Fed-SB establishes a new Pareto frontier in the tradeoff between communication and performance, offering an efficient and scalable solution for both private and non-private federated fine-tuning. Our code is publicly available at https://github.com/CERT-Lab/fed-sb.
Submitted 21 February, 2025;
originally announced February 2025.
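The claim in the Fed-SB abstract that directly averaging R gives exact updates follows from linearity: when every client shares the same fixed B and A, the product B R A is linear in R, so averaging the R matrices equals averaging the full updates. The sketch below is a minimal pure-Python illustration of that identity, not the authors' implementation; all matrices, shapes, and helper names (`matmul`, `mean_matrices`, `max_abs_diff`) are made up for illustration. It also shows, for contrast, that averaging separate per-client B and A adapters does not reproduce the average of their products.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean_matrices(mats):
    """Element-wise mean of a list of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)] for i in range(rows)]

def max_abs_diff(X, Y):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

# Shared, frozen adapters (hypothetical values): B is 3x2, A is 2x3.
B = [[1, 0], [0, 1], [1, 1]]
A = [[1, 2, 0], [0, 1, 1]]

# Each client trains only its own small square matrix R (2x2 here).
client_R = [[[1, 0], [0, 1]], [[3, 2], [-2, 1]]]

# Server averages R, then forms the update once.
update_from_mean_R = matmul(matmul(B, mean_matrices(client_R)), A)
# Ideal target: the average of the clients' full updates B R_i A.
mean_of_updates = mean_matrices([matmul(matmul(B, R), A) for R in client_R])
lora_sb_gap = max_abs_diff(update_from_mean_R, mean_of_updates)

# Contrast: each client has its own B_i and A_i, averaged separately.
client_B = [[[1], [0]], [[0], [1]]]
client_A = [[[1, 0]], [[0, 1]]]
update_from_mean_BA = matmul(mean_matrices(client_B), mean_matrices(client_A))
mean_of_BA_updates = mean_matrices(
    [matmul(Bi, Ai) for Bi, Ai in zip(client_B, client_A)]
)
vanilla_gap = max_abs_diff(update_from_mean_BA, mean_of_BA_updates)

print("gap when averaging R:", lora_sb_gap)
print("gap when averaging B and A separately:", vanilla_gap)
```

In this toy setup each client would upload only the 2x2 matrix R, whose size depends on the adapter rank rather than on the number of clients.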
-
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
Authors:
Saurabh Jha,
Rohan Arora,
Yuji Watanabe,
Takumi Yanagawa,
Yinfang Chen,
Jackson Clark,
Bhavya Bhavya,
Mudit Verma,
Harshit Kumar,
Hirokuni Kitahara,
Noah Zheutlin,
Saki Takano,
Divya Pathak,
Felix George,
Xinbo Wu,
Bekir O. Turkkan,
Gerard Vanloo,
Michael Nidd,
Ting Dai,
Oishik Chatterjee,
Pranjal Gupta,
Suranjana Samanta,
Pooja Aggarwal,
Rong Lee,
Pavankumar Murali
, et al. (18 additional authors not shown)
Abstract:
Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Security Operations (CISO), and Financial Operations (FinOps). The design enables AI researchers to understand the challenges and opportunities of AI agents for IT automation with push-button workflows and interpretable metrics. ITBench includes an initial set of 94 real-world scenarios, which can be easily extended by community contributions. Our results show that agents powered by state-of-the-art models resolve only 13.8% of SRE scenarios, 25.2% of CISO scenarios, and 0% of FinOps scenarios. We expect ITBench to be a key enabler of AI-driven IT automation that is correct, safe, and fast.
Submitted 7 February, 2025;
originally announced February 2025.
-
Suspense and surprise in the book of technology: Understanding innovation dynamics
Authors:
Oh-Hyun Kwon,
Jisung Yoon,
Lav R. Varshney,
Woo-Sung Jung,
Hyejin Youn
Abstract:
We envision future technologies through science fiction, strategic planning, or academic research. Yet, our expectations do not always match what actually unfolds, much like navigating a story where some events align with expectations while others surprise us. This gap indicates the inherent uncertainty of innovation: how technologies emerge and evolve in unpredictable ways. Here, we elaborate on this inherent uncertainty of innovation in the way technologies emerge and evolve. We define suspense as capturing accumulated uncertainty and describing events anticipated before their realization, while surprise represents a dramatic shift in understanding when an event occurs unexpectedly. We identify those connections in U.S. patents and show that suspenseful innovations tend to integrate more smoothly into society, achieving higher citations and market value. In contrast, surprising innovations, though often disruptive and groundbreaking, face challenges in adoption due to their extreme novelty. We further show that these categories allow us to identify distinct stages of technology life cycles, suggesting a way to identify the systematic trajectory of technologies and anticipate their future paths.
Submitted 10 December, 2024;
originally announced December 2024.
-
Formal Verification of Digital Twins with TLA and Information Leakage Control
Authors:
Luwen Huang,
Lav R. Varshney,
Karen E. Willcox
Abstract:
Verifying the correctness of a digital twin provides a formal guarantee that the digital twin operates as intended. Digital twin verification is challenging due to the presence of uncertainties in the virtual representation, the physical environment, and the bidirectional flow of information between physical and virtual. A further challenge is that a digital twin of a complex system is composed of distributed components. This paper presents a methodology to specify and verify digital twin behavior, translating uncertain processes into a formally verifiable finite state machine. We use the Temporal Logic of Actions (TLA) to create a specification, an implementation abstraction that defines the properties required for correct system behavior. Our approach includes a novel weakening of formal security properties, allowing controlled information leakage while preserving theoretical guarantees. We demonstrate this approach on a digital twin of an unmanned aerial vehicle, verifying synchronization of physical-to-virtual and virtual-to-digital data flows to detect unintended misalignments.
Submitted 27 November, 2024;
originally announced November 2024.
-
From Phytochemicals to Recipes: Health Indications and Culinary Uses of Herbs and Spices
Authors:
Rishemjit Kaur,
Shuchen Zhang,
Bhavika Berwal,
Sonalika Ray,
Ritesh Kumar,
Lav R. Varshney
Abstract:
Herbs and spices each contain about 3000 phytochemicals on average and there is much traditional knowledge on their health benefits. However, there is a lack of systematic study to understand the relationship among herbs and spices, their phytochemical constituents, their potential health benefits, and their usage in regional cuisines. Here we use a network-based approach to elucidate established relationships and predict novel associations between the phytochemicals present in herbs and spices with health indications. Our top 100 inferred indication-phytochemical relationships rediscover 40% known relationships and 20% that have been inferred via gene-chemical interactions with high confidence. The remaining 40% are hypotheses generated in a principled way for further experimental investigations. We also develop an algorithm to find the minimum set of spices needed to cover a target group of health conditions. Drawing on spice usage patterns in several regional Indian cuisines, and a copy-mutate model for regional cuisine evolution, we characterize the spectrum of health conditions covered by existing regional cuisines. The spectrum of health conditions can expand through the nationalization/globalization of culinary practice.
Submitted 18 October, 2024;
originally announced October 2024.
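Finding a minimum set of spices that covers a target group of health conditions is an instance of the set cover problem. The abstract does not describe the authors' algorithm, so the sketch below substitutes the standard greedy heuristic, which is not guaranteed to return a minimum cover but is within a logarithmic factor of it. The function name `greedy_cover` and the spice-to-condition mapping are hypothetical placeholders and carry no real health information.

```python
def greedy_cover(coverage, targets):
    """Greedy set cover: repeatedly pick the item covering the most uncovered targets.

    coverage: dict mapping an item name to the set of targets it covers.
    targets: iterable of targets that must all be covered.
    Returns the chosen item names in the order they were picked.
    """
    remaining = set(targets)
    chosen = []
    while remaining:
        # Iterate over sorted names so that ties are broken deterministically.
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            raise ValueError(f"targets cannot be covered: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gained
    return chosen

# Placeholder data: which (made-up) conditions each (made-up) spice covers.
coverage = {
    "spice_a": {"condition_1", "condition_2", "condition_3"},
    "spice_b": {"condition_3", "condition_4"},
    "spice_c": {"condition_4", "condition_5"},
    "spice_d": {"condition_1"},
}
targets = {"condition_1", "condition_2", "condition_3", "condition_4", "condition_5"}

chosen = greedy_cover(coverage, targets)
print(chosen)
```

For small instances an exhaustive search over subsets would give a true minimum; the greedy version is shown because it stays tractable as the number of spices grows.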
-
Online Reinforcement Learning with Passive Memory
Authors:
Anay Pattanaik,
Lav R. Varshney
Abstract:
This paper considers an online reinforcement learning algorithm that leverages pre-collected data (passive memory) from the environment for online interaction. We show that using passive memory improves performance and further provide theoretical guarantees for regret that turns out to be near-minimax optimal. Results show that the quality of passive memory determines sub-optimality of the incurred regret. The proposed approach and results hold in both continuous and discrete state-action spaces.
Submitted 18 October, 2024;
originally announced October 2024.
-
An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models
Authors:
Anuj K. Nayak,
Lav R. Varshney
Abstract:
Recent empirical studies show three phenomena with increasing size of language models: compute-optimal size scaling, emergent capabilities, and performance plateauing. We present a simple unified mathematical framework to explain all of these language model scaling phenomena, building on recent skill-text bipartite graph frameworks for semantic learning. Modeling the learning of concepts from texts as an iterative process yields an analogy to iterative decoding of low-density parity check (LDPC) codes in information theory. Thence, drawing on finite-size scaling characterizations of LDPC decoding, we derive the compute-optimal size scaling (Chinchilla rule) for language models. Further, using tools from random network theory, we provide a simple explanation for both emergence of complex skills and plateauing of performance as the size of language models scales. We see multiple plateaus.
Submitted 15 October, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Compute-Update Federated Learning: A Lattice Coding Approach Over-the-Air
Authors:
Seyed Mohammad Azimi-Abarghouyi,
Lav R. Varshney
Abstract:
This paper introduces a federated learning framework that enables over-the-air computation via digital communications, using a new joint source-channel coding scheme. Without relying on channel state information at devices, this scheme employs lattice codes to both quantize model parameters and exploit interference from the devices. We propose a novel receiver structure at the server, designed to reliably decode an integer combination of the quantized model parameters as a lattice point for the purpose of aggregation. We present a mathematical approach to derive a convergence bound for the proposed scheme and offer design remarks. In this context, we suggest an aggregation metric and a corresponding algorithm to determine effective integer coefficients for the aggregation in each communication round. Our results illustrate that, regardless of channel dynamics and data heterogeneity, our scheme consistently delivers superior learning accuracy across various parameters and markedly surpasses other over-the-air methodologies.
Submitted 5 November, 2024; v1 submitted 10 September, 2024;
originally announced September 2024.
-
SwitchCIT: Switching for Continual Instruction Tuning
Authors:
Xinbo Wu,
Max Hartman,
Vidhata Arjun Jayaraman,
Lav R. Varshney
Abstract:
Large language models (LLMs) and multimodal models (MMs) have exhibited impressive capabilities in various domains, particularly in general language understanding and visual reasoning. However, these models, trained on massive data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial to adapt a large model to evolving tasks and domains, ensuring their effectiveness and relevance across a wide range of applications. In the context of continual instruction tuning, where models are sequentially trained on different tasks, catastrophic forgetting can occur, leading to performance degradation on previously learned tasks. This work addresses the catastrophic forgetting in continual instruction learning through a switching mechanism for routing computations to parameter-efficient tuned models. We demonstrate the effectiveness of our method through experiments on continual instruction tuning of different natural language generation tasks and vision-language tasks. We also showcase the advantages of our proposed method in terms of efficiency, scalability, portability, and privacy preservation.
Submitted 18 December, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Fractional Budget Allocation for Influence Maximization under General Marketing Strategies
Authors:
Akhil Bhimaraju,
Eliot W. Robson,
Lav R. Varshney,
Abhishek K. Umrawal
Abstract:
We consider the fractional influence maximization problem, i.e., identifying users on a social network to be incentivized with potentially partial discounts to maximize the influence on the network. The larger the discount given to a user, the higher the likelihood of that user's activation (adopting a new product or innovation); an activated user then attempts to activate its neighboring users, causing a cascade effect of influence through the network. Our goal is to devise efficient algorithms that assign initial discounts to the network's users to maximize the total number of activated users at the end of the cascade, subject to a constraint on the total sum of discounts given. In general, the activation likelihood could be any non-decreasing function of the discount, whereas our focus lies on the case when the activation likelihood is an affine function of the discount, potentially varying across different users. As this problem is shown to be NP-hard, we propose and analyze an efficient (1-1/e)-approximation algorithm. Furthermore, we run experiments on real-world social networks to show the performance and scalability of our method.
Submitted 8 July, 2024;
originally announced July 2024.
-
Reliable Quantum Memories with Unreliable Components
Authors:
Anuj K. Nayak,
Eric Chitambar,
Lav R. Varshney
Abstract:
Quantum memory systems are vital in quantum information processing for dependable storage and retrieval of quantum states. Inspired by classical reliability theories that synthesize reliable computing systems from unreliable components, we formalize the problem of reliable storage of quantum information using noisy components. We introduce the notion of stable quantum memories and define the storage rate as the ratio of the number of logical qubits to the total number of physical qubits, as well as the circuit complexity of the decoder, which includes both quantum gates and measurements. We demonstrate that a strictly positive storage rate can be achieved by constructing a quantum memory system with quantum expander codes. Moreover, by reducing the reliable storage problem to reliable quantum communication, we provide upper bounds on the achievable storage capacity. In the case of physical qubits corrupted by noise satisfying hypercontractivity conditions, we provide a tighter upper bound on storage capacity using an entropy dissipation argument. Furthermore, observing that the time complexity of the decoder scales non-trivially with the number of physical qubits, achieving asymptotic rates may not be possible due to the induced dependence of the noise on the number of physical qubits. In this constrained non-asymptotic setting, we derive upper bounds on storage capacity using finite blocklength communication bounds. Finally, we numerically analyze the gap between upper and lower bounds in both asymptotic and non-asymptotic cases, and provide suggestions to tighten the gap.
Submitted 8 June, 2024;
originally announced June 2024.
-
Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation
Authors:
Razan Baltaji,
Babak Hemmatian,
Lav R. Varshney
Abstract:
Multi-agent AI systems can be used for simulating collective decision-making in scientific and practical applications. They can also be used to introduce a diverse group discussion step in chatbot pipelines, enhancing the cultural sensitivity of the chatbot's responses. These applications, however, are predicated on the ability of AI agents to reliably adopt assigned personas and mimic human interactions. To see whether LLM agents satisfy these requirements, we examine AI agent ensembles engaged in cross-national collaboration and debate by analyzing their private responses and chat transcripts. Our findings suggest that multi-agent discussions can support collective AI decisions that more often reflect diverse perspectives, yet this effect is tempered by the agents' susceptibility to conformity due to perceived peer pressure and occasional challenges in maintaining consistent personas and opinions. Instructions that encourage debate in support of one's opinions rather than collaboration increase the rate of inconstancy. Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs or more realistic simulations of group decision-making may remain untapped.
Submitted 14 August, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Semantic Compression with Information Lattice Learning
Authors:
Haizi Yu,
Lav R. Varshney
Abstract:
Data-driven artificial intelligence (AI) techniques are becoming prominent for learning in support of data compression, but are focused on standard problems such as text compression. To instead address the emerging problem of semantic compression, we argue that the lattice theory of information is particularly expressive and mathematically precise in capturing notions of abstraction as a form of lossy semantic compression. As such, we demonstrate that a novel AI technique called information lattice learning, originally developed for knowledge discovery and creativity, is powerful for learning to compress in a semantically-meaningful way. The lattice structure further implies the optimality of group codes and the successive refinement property for progressive transmission.
Submitted 3 April, 2024;
originally announced April 2024.
-
Federated Learning via Lattice Joint Source-Channel Coding
Authors:
Seyed Mohammad Azimi-Abarghouyi,
Lav R. Varshney
Abstract:
This paper introduces a universal federated learning framework that enables over-the-air computation via digital communications, using a new joint source-channel coding scheme. Without relying on channel state information at devices, this scheme employs lattice codes to both quantize model parameters and exploit interference from the devices. A novel two-layer receiver structure at the server is designed to reliably decode an integer combination of the quantized model parameters as a lattice point for the purpose of aggregation. Numerical experiments validate the effectiveness of the proposed scheme. Even with the challenges posed by channel conditions and device heterogeneity, the proposed scheme markedly surpasses other over-the-air FL strategies.
Submitted 1 March, 2024;
originally announced March 2024.
-
Transformer-based Causal Language Models Perform Clustering
Authors:
Xinbo Wu,
Lav R. Varshney
Abstract:
Even though large language models (LLMs) have demonstrated remarkable capability in solving various natural language tasks, the capability of an LLM to follow human instructions is still a concern. Recent works have shown great improvements in the instruction-following capability via additional training for instruction-following tasks. However, the mechanisms responsible for effective instruction-following capabilities remain inadequately understood. Here, we introduce a simplified instruction-following task and use synthetic datasets to analyze a Transformer-based causal language model. Our findings suggest that the model learns task-specific information by clustering data within its hidden space, with this clustering process evolving dynamically during learning. We also demonstrate how this phenomenon assists the model in handling unseen instances, and validate our results in a more realistic setting. Furthermore, we present inspired applications regarding pre-training and alignment.
Submitted 3 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Dynamic Resource Allocation to Minimize Concave Costs of Shortfalls
Authors:
Akhil Bhimaraju,
Avhishek Chatterjee,
Lav R. Varshney
Abstract:
We study a resource allocation problem over time, where a finite (random) resource needs to be distributed among a set of users at each time instant. Shortfalls in the resource allocated result in user dissatisfaction, which we model as an increasing function of the long-term average shortfall for each user. In many scenarios such as wireless multimedia streaming, renewable energy grids, or supply chain logistics, a natural choice for this cost function turns out to be concave, rather than the usual convex cost functions. We consider minimizing the (normalized) cumulative cost across users. Depending on whether users' mean consumption rates are known or unknown, this problem can be reduced to two different structured non-convex problems. The "known" case is a concave minimization problem subject to a linear constraint. By exploiting a well-chosen linearization of the cost functions, we solve this provably within $\mathcal{O}\left(\frac{1}{m}\right)$ of the optimum, in $\mathcal{O}\left(m \log{m}\right)$ time, where $m$ is the number of users in the system. In the "unknown" case, we are faced with minimizing the sum of functions that are concave on part of the domain and convex on the rest, subject to a linear constraint. We present a provably exact algorithm when the cost functions and prior distributions on mean consumption are the same across all users.
Submitted 1 December, 2023;
originally announced December 2023.
-
Semantically Grounded QFormer for Efficient Vision Language Understanding
Authors:
Moulik Choraria,
Xinbo Wu,
Sourya Basu,
Nitesh Sekhar,
Yue Wu,
Xu Zhang,
Prateek Singhal,
Lav R. Varshney
Abstract:
General purpose Vision Language Models (VLMs) have received tremendous interest in recent years, owing to their ability to learn rich vision-language correlations as well as their broad zero-shot competencies. One immensely popular line of work utilizes frozen unimodal models, by bridging vision representations to language using a trainable module called the QFormer. However, this method relies heavily on large-scale multimodal pretraining with huge computational overheads. To that end, we propose a more efficient framework for QFormer-based vision-language alignment. Our key idea relies on the observation that QFormer latents correspond more strongly to the frozen LLM's intermediate latent space. Consequently, instead of using QFormer latents as inputs to the LLM, we alter the framework by using the latents to directly condition the LLM latent space for image-to-text generation. We demonstrate the effectiveness of our approach against existing baselines in improving the efficiency of vision-language pretraining.
Submitted 16 December, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Muslim-Violence Bias Persists in Debiased GPT Models
Authors:
Babak Hemmatian,
Razan Baltaji,
Lav R. Varshney
Abstract:
Abid et al. (2021) showed a tendency in GPT-3 to generate mostly violent completions when prompted about Muslims, compared with other religions. Two pre-registered replication attempts found few violent completions and only a weak anti-Muslim bias in the more recent InstructGPT, fine-tuned to eliminate biased and toxic outputs. However, more pre-registered experiments showed that using common names associated with the religions in prompts increases several-fold the rate of violent completions, revealing a significant second-order anti-Muslim bias. ChatGPT showed a bias many times stronger regardless of prompt format, suggesting that the effects of debiasing were reduced with continued model development. Our content analysis revealed religion-specific themes containing offensive stereotypes across all experiments. Our results show the need for continual de-biasing of models in ways that address both explicit and higher-order associations.
Submitted 9 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Efficient Model-Agnostic Multi-Group Equivariant Networks
Authors:
Razan Baltaji,
Sourya Basu,
Lav R. Varshney
Abstract:
Constructing model-agnostic group equivariant networks, such as equitune (Basu et al., 2023b) and its generalizations (Kim et al., 2023), can be computationally expensive for large product groups. We address this problem by providing efficient model-agnostic equivariant designs for two related problems: one where the network has multiple inputs each with potentially different groups acting on them, and another where there is a single input but the group acting on it is a large product group. For the first design, we initially consider a linear model and characterize the entire equivariant space that satisfies this constraint. This characterization gives rise to a novel fusion layer between different channels that satisfies an invariance-symmetry (IS) constraint, which we call an IS layer. We then extend this design beyond linear models, similar to equitune, consisting of equivariant and IS layers. We also show that the IS layer is a universal approximator of invariant-symmetric functions. Inspired by the first design, we use the notion of the IS property to design a second efficient model-agnostic equivariant design for large product groups acting on a single input. For the first design, we provide experiments on multi-image classification where each view is transformed independently with transformations such as rotations. We find equivariant models are robust to such transformations and perform competitively otherwise. For the second design, we consider three applications: language compositionality on the SCAN dataset to product groups; fairness in natural language generation from GPT-2 to address intersectionality; and robust zero-shot image classification with CLIP. Overall, our methods are simple and general, competitive with equitune and its variants, while also being computationally more efficient.
Submitted 7 October, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
A Meta-Learning Perspective on Transformers for Causal Language Modeling
Authors:
Xinbo Wu,
Lav R. Varshney
Abstract:
The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task, by explicating an inner optimization process within the Transformer. Further, within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models. Our analysis is supported by experiments in various settings.
Submitted 25 March, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Dynamic Batching of Online Arrivals to Leverage Economies of Scale
Authors:
Akhil Bhimaraju,
S. Rasoul Etesami,
Lav R. Varshney
Abstract:
Many settings, such as medical testing of patients in hospitals or matching riders to drivers in ride-hailing platforms, require handling arrivals over time. In such applications, it is often beneficial to group the arriving orders, samples, or requests into batches and process the larger batches rather than individual arrivals. However, waiting too long to create larger batches incurs a waiting cost for past arrivals. On the other hand, processing the arrivals too soon leads to higher processing costs by missing the economies of scale of grouping larger numbers of arrivals into larger batches. Moreover, the timing of the next arrival is often unknown, meaning that fixed-size batches or fixed wait times tend to be suboptimal. In this work, we consider the problem of finding the optimal batching schedule to minimize the average wait time plus the average processing cost under both offline and online settings. In the offline problem in which all arrival times are known a priori, we show that the optimal batching schedule can be found in polynomial time by reducing it to a shortest path problem on a weighted acyclic graph. For the online problem with unknown arrival times, we develop online algorithms that are provably competitive for a broad range of processing-cost functions. We also provide a lower bound on the competitive ratio that no online algorithm can beat. Finally, we run extensive numerical experiments on simulated and real data to demonstrate the effectiveness of our proposed algorithms against the optimal offline benchmark.
Submitted 28 September, 2023;
originally announced September 2023.
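The offline reduction described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation; it assumes that each batch is processed at the time of its last arrival, that the objective is total (rather than average) waiting time plus processing cost, and that `proc_cost` is a hypothetical cost function of batch size. Nodes 0..n mark cut points between sorted arrivals, an edge (i, j) groups arrivals i+1..j into one batch, and the cheapest path from 0 to n gives the schedule.

```python
from math import inf


def optimal_batches(arrivals, proc_cost):
    """Offline batching as a shortest path on a DAG (illustrative sketch).

    arrivals  : iterable of arrival times
    proc_cost : hypothetical function mapping batch size -> processing cost
    Returns (total_cost, list_of_batches).
    """
    t = sorted(arrivals)
    n = len(t)

    # prefix[k] = sum of the first k arrival times, for O(1) wait computation
    prefix = [0.0]
    for x in t:
        prefix.append(prefix[-1] + x)

    best = [inf] * (n + 1)   # best[j] = cheapest cost to serve the first j arrivals
    best[0] = 0.0
    parent = [0] * (n + 1)   # parent[j] = start index of the last batch ending at j

    # Nodes are already in topological order, so one forward pass suffices.
    for j in range(1, n + 1):
        for i in range(j):
            size = j - i
            # Assumption: the batch is processed when its last member arrives,
            # so each member k waits t[j-1] - t[k].
            wait = size * t[j - 1] - (prefix[j] - prefix[i])
            cost = best[i] + wait + proc_cost(size)
            if cost < best[j]:
                best[j] = cost
                parent[j] = i

    # Walk the parent pointers back from n to recover the batches.
    batches = []
    j = n
    while j > 0:
        i = parent[j]
        batches.append(t[i:j])
        j = i
    batches.reverse()
    return best[n], batches


if __name__ == "__main__":
    # Flat cost of 2 per batch, regardless of size (an assumed toy cost).
    total, schedule = optimal_batches([0, 1, 10], lambda size: 2.0)
    print(total, schedule)
```

The double loop examines O(n^2) edges, which is polynomial in the number of arrivals. A flat per-batch cost is used in the usage example only because it makes the economies of scale easy to see; any other cost function of batch size can be passed in.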
-
On Simultaneous Information and Energy Transmission through Quantum Channels
Authors:
Bishal Kumar Das,
Lav R. Varshney,
Vaibhav Madhok
Abstract:
The optimal rate at which information can be sent through a quantum channel when the transmitted signal must simultaneously carry some minimum amount of energy is characterized. To do so, we introduce the quantum-classical analogue of the capacity-power function and generalize results in classical information theory for transmitting classical information through noisy channels. We show that the capacity-power function for a classical-quantum channel, for both unassisted and private protocols, is concave and also prove additivity for unentangled and uncorrelated ensembles of input signals for such channels. This implies we do not need regularized formulas for calculation. We show these properties also hold for all noiseless channels when we restrict the set of input states to be pure quantum states. For general channels, we find that the capacity-power function is piece-wise concave. We give an elegant visual proof for this supported by numerical simulations. We connect channel capacity and properties of random quantum states. In particular, we obtain analytical expressions for the capacity-power function for the case of noiseless channels using properties of random quantum states under an energy constraint and concentration phenomena in large Hilbert spaces.
Submitted 1 January, 2025; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Transformers are Universal Predictors
Authors:
Sourya Basu,
Moulik Choraria,
Lav R. Varshney
Abstract:
We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.
Submitted 15 July, 2023;
originally announced July 2023.
-
Efficient Equivariant Transfer Learning from Pretrained Models
Authors:
Sourya Basu,
Pulkit Katdare,
Prasanna Sattigeri,
Vijil Chenthamarakshan,
Katherine Driggs-Campbell,
Payel Das,
Lav R. Varshney
Abstract:
Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works of Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et al. (2022) are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better quality features for certain transformations than others and simply averaging them is deleterious. Hence, we propose λ-equitune that averages the features using importance weights, λs. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that λ-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of Kaba et al. (2022) used with appropriate loss functions, which we call equizero, also gives excellent zero-shot and finetuned performance. Both equitune and equizero are special cases of λ-equitune. To show the simplicity and generality of our method, we validate on a wide range of diverse applications and models such as 1) image classification using CLIP, 2) deep Q-learning, 3) fairness in natural language generation (NLG), 4) compositional generalization in languages, and 5) image classification using pretrained CNNs such as Resnet and Alexnet.
Submitted 10 October, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
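The group-averaging idea named in the abstract above can be sketched on a toy example. This is only an illustration under stated assumptions, not the authors' code: the group is taken to be the four 90-degree rotations of a square grid, `row_cumsum` is a hypothetical stand-in for a non-equivariant pretrained model, and uniform weights are used (the learned importance weights of λ-equitune are not modeled). The averaged map is F(x) = (1/4) Σ_k R^{-k} f(R^k x), which commutes with rotation by construction.

```python
def rot_ccw(m):
    """Rotate a square grid (list of lists) 90 degrees counterclockwise."""
    return [list(row) for row in zip(*m)][::-1]


def rotate(m, k):
    """Apply k counterclockwise quarter turns; negative k rotates clockwise."""
    for _ in range(k % 4):
        m = rot_ccw(m)
    return m


def group_average(f):
    """Wrap f so that the result is equivariant to 90-degree rotations.

    Each group element transforms the input, f is applied, the output is
    transformed back, and the four results are averaged uniformly.
    """
    def f_eq(x):
        n = len(x)
        total = [[0.0] * n for _ in range(n)]
        for k in range(4):
            y = rotate(f(rotate(x, k)), -k)
            for r in range(n):
                for c in range(n):
                    total[r][c] += y[r][c]
        return [[v / 4 for v in row] for row in total]
    return f_eq


def row_cumsum(x):
    """Hypothetical non-equivariant 'model': cumulative sum along each row."""
    return [[sum(row[: j + 1]) for j in range(len(row))] for row in x]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    f_eq = group_average(row_cumsum)
    # Rotating the input and then applying f_eq should match
    # applying f_eq and then rotating the output.
    print(f_eq(rotate(x, 1)) == rotate(f_eq(x), 1))
```

The cost of this construction is one forward pass per group element, which is why it becomes expensive for large groups.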
-
Designing Discontinuities
Authors:
Ibtihal Ferwana,
Suyoung Park,
Ting-Yi Wu,
Lav R. Varshney
Abstract:
Discontinuities can be fairly arbitrary but also cause a significant impact on outcomes in larger systems. Indeed, their arbitrariness is why they have been used to infer causal relationships among variables in numerous settings. Regression discontinuity from econometrics assumes the existence of a discontinuous variable that splits the population into distinct partitions to estimate the causal effects of a given phenomenon. Here we consider the design of partitions for a given discontinuous variable to optimize a certain effect previously studied using regression discontinuity. To do so, we propose a quantization-theoretic approach to optimize the effect of interest, first learning the causal effect size of a given discontinuous variable and then applying dynamic programming for optimal quantization design of discontinuities to balance the gain and loss in that effect size. We also develop a computationally-efficient reinforcement learning algorithm for the dynamic programming formulation of optimal quantization. We demonstrate our approach by designing optimal time zone borders for counterfactuals of social capital, social mobility, and health. This is based on regression discontinuity analyses we perform on novel data, which may be of independent empirical interest.
Submitted 27 December, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Network Analysis as a Tool for Shaping Conservation and Development Policy: A Case Study of Timber Market Optimization in India
Authors:
Xiou Ge,
Sarah E. Brown,
Pushpendra Rana,
Lav R. Varshney,
Daniel C. Miller
Abstract:
The incorporation of trees on farms can help to improve livelihoods and build resilience among small-holder farmers in developing countries. On-farm trees can help generate additional income from commercial tree harvest as well as contribute significant environmental benefits and ecosystem services to increase resiliency. Long-term benefits from tree-based livelihoods, however, depend on sustainable timber harvesting. In this paper, we discuss the potential for network analysis as a tool to inform conservation and development decision-making. Specifically, we formulate the commercial tree market between farmers and traders as a transportation problem and optimize the transactions. We create a model of the commercial tree market in the Bilaspur district of Himachal Pradesh, India based on a detailed dataset of market interactions between farmers and timber traders, using the existing road network of this region. Using this model, we perform a maximum-flow-minimum-cost optimization for tree commodity flow. We compare the results of our optimized model with actual flow within the network, and we find a high potential to increase efficiency of market transactions within this region, noting a significant reduction to the minimum-cost flow value for our optimized model compared to the flow cost for actual transactions. We propose that using this network flow optimization model to strategically distribute permits can reduce costs associated with market transactions. Our results suggest that this direct policy action would be beneficial to the region. Finally, we suggest that cost savings could be used to establish tree planting programs to support a long-term sustainable tree market. Shaping policies to address these market inefficiencies in developing regions could help support and elevate tree-based livelihoods for farmers, traders, and industries.
Submitted 26 April, 2023;
originally announced April 2023.
-
Learning Optimal Features via Partial Invariance
Authors:
Moulik Choraria,
Ibtihal Ferwana,
Ankur Mani,
Lav R. Varshney
Abstract:
Learning models that are robust to distribution shifts is a key concern in the context of their real-life applicability. Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments. The success of IRM requires an important assumption: the underlying causal mechanisms/features remain invariant across environments. When not satisfied, we show that IRM can over-constrain the predictor and to remedy this, we propose a relaxation via $\textit{partial invariance}$. In this work, we theoretically highlight the sub-optimality of IRM and then demonstrate how learning from a partition of training domains can help improve invariant models. Several experiments, conducted both in linear settings as well as with deep neural networks on tasks over both language and image data, allow us to verify our conclusions.
Submitted 3 April, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Limits of Fault-Tolerance on Resource-Constrained Quantum Circuits for Classical Problems
Authors:
Uthirakalyani. G,
Anuj K. Nayak,
Avhishek Chatterjee,
Lav R. Varshney
Abstract:
Existing lower bounds on redundancy in fault-tolerant quantum circuits are applicable when both the input and the intended output are quantum states. These bounds may not necessarily hold, however, when the input and the intended output are classical bits, as in the Deutsch-Jozsa, Grover, or Shor algorithms. Here we show that indeed, noise thresholds obtained from existing bounds do not apply to a simple fault-tolerant implementation of the Deutsch-Jozsa algorithm. Then we obtain the first lower bound on the minimum required redundancy for fault-tolerant quantum circuits with classical inputs and outputs. Recent results show that due to physical resource constraints in quantum circuits, increasing redundancy can increase noise, which in turn may render many fault-tolerance schemes useless. So it is of both practical and theoretical interest to characterize the effect of resource constraints on the fundamental limits of fault-tolerant quantum circuits. Thus as an application of our lower bound, we characterize the fundamental limit of fault-tolerant quantum circuits with classical inputs and outputs under resource constraint-induced noise models.
Submitted 26 October, 2023; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models
Authors:
Sourya Basu,
Prasanna Sattigeri,
Karthikeyan Natesan Ramamurthy,
Vijil Chenthamarakshan,
Kush R. Varshney,
Lav R. Varshney,
Payel Das
Abstract:
We introduce equi-tuning, a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group equivariant models while incurring minimum $L_2$ loss between the feature representations of the pretrained and the equivariant models. Large pretrained models can be equi-tuned for different groups to satisfy the needs of various downstream tasks. Equi-tuned models benefit from both group equivariance as an inductive bias and semantic priors from pretrained models. We provide applications of equi-tuning on three different tasks: image classification, compositional generalization in language, and fairness in natural language generation (NLG). We also provide a novel group-theoretic definition for fairness in NLG. The effectiveness of this definition is shown by testing it against a standard empirical method of fairness in NLG. We provide experimental results for equi-tuning using a variety of pretrained models: Alexnet, Resnet, VGG, and Densenet for image classification; RNNs, GRUs, and LSTMs for compositional generalization; and GPT2 for fairness in NLG. We test these models on benchmark datasets across all considered tasks to show the generality and effectiveness of the proposed method.
Submitted 4 February, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Optimal Recovery for Causal Inference
Authors:
Ibtihal Ferwana,
Lav R. Varshney
Abstract:
Problems in causal inference can be fruitfully addressed using signal processing techniques. As an example, it is crucial to successfully quantify the causal effects of an intervention to determine whether the intervention achieved desired outcomes. We present a new geometric signal processing approach to classical synthetic control called ellipsoidal optimal recovery (EOpR), for estimating the unobservable outcome of a treatment unit. EOpR provides policy evaluators with both worst-case and typical outcomes to help in decision making. It is an approximation-theoretic technique that relates to the theory of principal components, which recovers unknown observations given a learned signal class and a set of known observations. We show EOpR can improve pre-treatment fit and mitigate bias of the post-treatment estimate relative to other methods in causal inference. Beyond recovery of the unit of interest, an advantage of EOpR is that it produces worst-case limits over the estimates produced. We assess our approach on artificially-generated data, on datasets commonly used in the econometrics literature, and in the context of the COVID-19 pandemic, showing better performance than baseline techniques.
Submitted 19 December, 2023; v1 submitted 13 August, 2022;
originally announced August 2022.
-
Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts
Authors:
Babak Hemmatian,
Lav R. Varshney
Abstract:
Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.
Submitted 10 August, 2022; v1 submitted 8 August, 2022;
originally announced August 2022.
-
Accelerated Design and Deployment of Low-Carbon Concrete for Data Centers
Authors:
Xiou Ge,
Richard T. Goodwin,
Haizi Yu,
Pablo Romero,
Omar Abdelrahman,
Amruta Sudhalkar,
Julius Kusuma,
Ryan Cialdella,
Nishant Garg,
Lav R. Varshney
Abstract:
Concrete is the most widely used engineered material in the world with more than 10 billion tons produced annually. Unfortunately, with that scale comes a significant burden in terms of energy, water, and release of greenhouse gases and other pollutants; indeed 8% of worldwide carbon emissions are attributed to the production of cement, a key ingredient in concrete. As such, there is interest in creating concrete formulas that minimize this environmental burden, while satisfying engineering performance requirements including compressive strength. Specifically for computing, concrete is a major ingredient in the construction of data centers.
In this work, we use conditional variational autoencoders (CVAEs), a type of semi-supervised generative artificial intelligence (AI) model, to discover concrete formulas with desired properties. Our model is trained just using a small open dataset from the UCI Machine Learning Repository joined with environmental impact data from standard lifecycle analysis. Computational predictions demonstrate CVAEs can design concrete formulas with much lower carbon requirements than existing formulations while meeting design requirements. Next we report laboratory-based compressive strength experiments for five AI-generated formulations, which demonstrate that the formulations exceed design requirements. The resulting formulations were then used by Ozinga Ready Mix -- a concrete supplier -- to generate field-ready concrete formulations, based on local conditions and their expertise in concrete design. Finally, we report on how these formulations were used in the construction of buildings and structures in a Meta data center in DeKalb, IL, USA. Results from field experiments as part of this real-world deployment corroborate the efficacy of AI-generated low-carbon concrete mixes.
Submitted 11 April, 2022;
originally announced April 2022.
-
Hypergraph-based Source Codes for Function Computation Under Maximal Distortion
Authors:
Sourya Basu,
Daewon Seo,
Lav R. Varshney
Abstract:
This work investigates functional source coding problems with maximal distortion, motivated by approximate function computation in many modern applications. The maximal distortion treats imprecise reconstruction of a function value as good as perfect computation if it deviates less than a tolerance level, while treating reconstruction that differs by more than that level as a failure. Using a geometric understanding of the maximal distortion, we propose a hypergraph-based source coding scheme for function computation that is constructive in the sense that it gives an explicit procedure for finding optimal or good auxiliary random variables. Moreover, we find that the hypergraph-based coding scheme achieves the optimal rate-distortion function in the setting of coding for computing with side information and achieves the Berger-Tung sum-rate inner bound in the setting of distributed source coding for computing. It also achieves the El Gamal-Cover inner bound for multiple description coding for computing and is optimal for successive refinement and cascade multiple description problems for computing. Lastly, the benefit of complexity reduction of finding a forward test channel is shown for a class of Markov sources.
Submitted 28 December, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Advanced Methods for Connectome-Based Predictive Modeling of Human Intelligence: A Novel Approach Based on Individual Differences in Cortical Topography
Authors:
Evan D. Anderson,
Ramsey Wilcox,
Anuj Nayak,
Christopher Zwilling,
Pablo Robles-Granda,
Been Kim,
Lav R. Varshney,
Aron K. Barbey
Abstract:
Individual differences in human intelligence can be modeled and predicted from in vivo neurobiological connectivity. Many established modeling frameworks for predicting intelligence, however, discard higher-order information about individual differences in brain network topology, and show only moderate performance when generalized to make predictions in out-of-sample subjects. In this paper, we propose that connectome-based predictive modeling, a common predictive modeling framework for neuroscience data, can be productively modified to incorporate information about brain network topology and individual differences via the incorporation of bagged decision trees and the network based statistic. These modifications produce a novel predictive modeling framework that leverages individual differences in cortical tractography to generate accurate regression predictions of intelligence scores. Network topology-based feature selection provides for natively interpretable networks as input features, increasing the model's explainability. Investigating the proposed modeling framework's efficacy, we find that advanced connectome-based predictive modeling generates neuroscience predictions that account for a significantly greater proportion of variance in general intelligence scores than previously established methods, advancing our scientific understanding of the network architecture that underlies human intelligence.
Submitted 3 March, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Learning from One and Only One Shot
Authors:
Haizi Yu,
Igor Mineyev,
Lav R. Varshney,
James A. Evans
Abstract:
Humans can generalize from only a few examples and from little pretraining on similar tasks. Yet, machine learning (ML) typically requires large data to learn or pre-learn to transfer. Motivated by nativism and artificial general intelligence, we directly model human-innate priors in abstract visual tasks such as character and doodle recognition. This yields a white-box model that learns general-appearance similarity by mimicking how humans naturally ``distort'' an object at first sight. Using just nearest-neighbor classification on this cognitively-inspired similarity space, we achieve human-level recognition with only $1$--$10$ examples per class and no pretraining. This differs from few-shot learning that uses massive pretraining. In the tiny-data regime of MNIST, EMNIST, Omniglot, and QuickDraw benchmarks, we outperform both modern neural networks and classical ML. For unsupervised learning, by learning the non-Euclidean, general-appearance similarity space in a $k$-means style, we achieve multifarious visual realizations of abstract concepts by generating human-intuitive archetypes as cluster centroids.
Submitted 21 May, 2024; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Balancing Fairness and Robustness via Partial Invariance
Authors:
Moulik Choraria,
Ibtihal Ferwana,
Ankur Mani,
Lav R. Varshney
Abstract:
The Invariant Risk Minimization (IRM) framework aims to learn invariant features from a set of environments for solving the out-of-distribution (OOD) generalization problem. The underlying assumption is that the causal components of the data generating distributions remain constant across the environments or alternately, the data "overlaps" across environments to find meaningful invariant features. Consequently, when the "overlap" assumption does not hold, the set of truly invariant features may not be sufficient for optimal prediction performance. Such cases arise naturally in networked settings and hierarchical data-generating models, wherein the IRM performance becomes suboptimal. To mitigate this failure case, we argue for a partial invariance framework. The key idea is to introduce flexibility into the IRM framework by partitioning the environments based on hierarchical differences, while enforcing invariance locally within the partitions. We motivate this framework in classification settings with causal distribution shifts across environments. Our results show the capability of the partial invariant risk minimization to alleviate the trade-off between fairness and risk in certain settings.
Submitted 24 December, 2021; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Optimizing the Energy Efficiency of Unreliable Memories for Quantized Kalman Filtering
Authors:
Jonathan Kern,
Elsa Dupraz,
Abdeldjalil Aïssa-El-Bey,
Lav R. Varshney,
François Leduc-Primeau
Abstract:
This paper presents a quantized Kalman filter implemented using unreliable memories. We consider that both the quantization and the unreliable memories introduce errors in the computations, and develop an error propagation model that takes into account these two sources of errors. In addition to providing updated Kalman filter equations, the proposed error model accurately predicts the covariance of the estimation error and gives a relation between the performance of the filter and its energy consumption, depending on the noise level in the memories. Then, since memories are responsible for a large part of the energy consumption of embedded systems, optimization methods are introduced so as to minimize the memory energy consumption under a desired estimation performance of the filter. The first method computes the optimal energy levels allocated to each memory bank individually, and the second one optimizes the energy allocation per groups of memory banks. Simulations show a close match between the theoretical analysis and experimental results. Furthermore, they demonstrate an important reduction in energy consumption of more than 50%.
Submitted 3 September, 2021;
originally announced September 2021.
-
Limits of Detecting Extraterrestrial Civilizations
Authors:
Ian George,
Xinan Chen,
Lav R. Varshney
Abstract:
The search for extraterrestrial intelligence (SETI) is a scientific endeavor which struggles with unique issues -- a strong indeterminacy in what data to look for and when to do so. This has led to attempts at finding both fundamental limits of the communication between extraterrestrial intelligence and human civilizations, as well as benchmarks so as to predict what kinds of signals we might most expect. Previous work has been formulated in terms of the information-theoretic task of communication, but we instead argue it should be viewed as a detection problem, specifically one-shot (asymmetric) hypothesis testing. With this new interpretation, we develop fundamental limits as well as provide simple examples of how to use this framework to analyze and benchmark different possible signals from extraterrestrial civilizations. We show that electromagnetic signaling for detection requires much less power than for communication, that detection as a function of power can be non-linear, and that much of the analysis in this framework may be addressed using computationally efficient optimization problems, thereby demonstrating tools for further inquiry.
Submitted 20 July, 2021;
originally announced July 2021.
-
Evaluating State-of-the-Art Classification Models Against Bayes Optimality
Authors:
Ryan Theisen,
Huan Wang,
Lav R. Varshney,
Caiming Xiong,
Richard Socher
Abstract:
Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the \emph{Bayes error}, which measures the optimal classification error theoretically achievable for a given data distribution. While generally an intractable quantity, we show that we can compute the exact Bayes error of generative models learned using normalizing flows. Our technique relies on a fundamental result, which states that the Bayes error is invariant under invertible transformation. Therefore, we can compute the exact Bayes error of the learned flow models by computing it for Gaussian base distributions, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some -- but not all -- cases, these models are capable of obtaining accuracy very near optimal. Finally, we use our method to evaluate the intrinsic "hardness" of standard benchmark datasets, and classes within those datasets.
Submitted 7 June, 2021;
originally announced June 2021.
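The invariance of the Bayes error under invertible transformations can be checked by hand in a toy case. The sketch below assumes two equally likely classes with 1-D Gaussian class conditionals sharing a standard deviation, where the Bayes error has the closed form Φ(−|μ₁−μ₀|/(2σ)); it then applies an invertible affine map and recomputes the error. This is only an illustration: it does not use normalizing flows or Holmes-Diaconis-Ross integration, and all function names are hypothetical.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_error_two_gaussians(mu0, mu1, sigma):
    # Equal priors, class conditionals N(mu0, sigma^2) and N(mu1, sigma^2).
    return normal_cdf(-abs(mu1 - mu0) / (2.0 * sigma))

def affine(mu0, mu1, sigma, a, b):
    # Parameters of both classes after the invertible map y = a*x + b (a != 0).
    return a * mu0 + b, a * mu1 + b, abs(a) * sigma

before = bayes_error_two_gaussians(0.0, 2.0, 1.0)
after = bayes_error_two_gaussians(*affine(0.0, 2.0, 1.0, a=-3.0, b=5.0))
print(before, after)
```

Both values agree because the map rescales the gap between the means and the standard deviation by the same factor.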
-
Autoequivariant Network Search via Group Decomposition
Authors:
Sourya Basu,
Akshayaa Magesh,
Harshit Yadav,
Lav R. Varshney
Abstract:
Recent works show that group equivariance as an inductive bias improves neural network performance for both classification and generation. However, designing group-equivariant neural networks is challenging when the group of interest is large and is unknown. Moreover, inducing equivariance can significantly reduce the number of independent parameters in a network with fixed feature size, affecting its overall performance. We address these problems by proving a new group-theoretic result in the context of equivariant neural networks that shows that a network is equivariant to a large group if and only if it is equivariant to smaller groups from which it is constructed. Using this result, we design a novel fast group equivariant construction algorithm, and a deep Q-learning-based search algorithm in a reduced search space, yielding what we call autoequivariant networks (AENs). AENs find the right balance between equivariance and network size when tested on new benchmark datasets, G-MNIST and G-Fashion-MNIST, obtained via group transformations on MNIST and Fashion-MNIST respectively that we release. Extending these results to group convolutional neural networks, where we optimize between equivariances, augmentations, and network sizes, we find group equivariance to be the most dominating factor in all high-performing GCNNs on several datasets like CIFAR10, SVHN, RotMNIST, ASL, EMNIST, and KMNIST.
Submitted 8 June, 2021; v1 submitted 10 April, 2021;
originally announced April 2021.
-
Wireless Network Coding with Intelligent Reflecting Surfaces
Authors:
Amanat Kafizov,
Ahmed Elzanaty,
Lav R. Varshney,
Mohamed-Slim Alouini
Abstract:
Conventional wireless techniques are becoming inadequate for beyond fifth-generation (5G) networks due to latency and bandwidth considerations. To improve the error performance and throughput of wireless communication systems, we propose physical layer network coding (PNC) in an intelligent reflecting surface (IRS)-assisted environment. We consider an IRS-aided butterfly network, where we propose an algorithm for obtaining the optimal IRS phases. Also, analytic expressions for the bit error rate (BER) are derived. The numerical results demonstrate that the proposed scheme significantly improves the BER performance. For instance, the BER at the relay in the presence of a 32-element IRS is three orders of magnitude less than that without an IRS.
Submitted 22 March, 2021;
originally announced March 2021.
-
Expected Extinction Times of Epidemics with State-Dependent Infectiousness
Authors:
Akhil Bhimaraju,
Avhishek Chatterjee,
Lav R. Varshney
Abstract:
We model an epidemic where the per-person infectiousness in a network of geographic localities changes with the total number of active cases. This would happen as people adopt more stringent non-pharmaceutical precautions when the population has a larger number of active cases. We show that there exists a sharp threshold such that when the curing rate for the infection is above this threshold, the mean time for the epidemic to die out is logarithmic in the initial infection size, whereas when the curing rate is below this threshold, the mean time for epidemic extinction is infinite. We also show that when the per-person infectiousness goes to zero asymptotically as a function of the number of active cases, the mean extinction times all have the same asymptote independent of network structure. Simulations bear out these results, while also demonstrating that if the per-person infectiousness is large when the epidemic size is small (i.e., the precautions are lax when the epidemic is small and only get stringent after the epidemic has become large), it might take a very long time for the epidemic to die out. We also provide some analytical insight into these observations.
Submitted 5 December, 2021; v1 submitted 21 March, 2021;
originally announced March 2021.
-
Wireless Power Transfer for Future Networks: Signal Processing, Machine Learning, Computing, and Sensing
Authors:
Bruno Clerckx,
Kaibin Huang,
Lav R. Varshney,
Sennur Ulukus,
Mohamed-Slim Alouini
Abstract:
Wireless power transfer (WPT) is an emerging paradigm that will enable using wireless to its full potential in future networks, not only to convey information but also to deliver energy. Such networks will enable trillions of future low-power devices to sense, compute, connect, and energize anywhere, anytime, and on the move. The design of such future networks brings new challenges and opportunities for signal processing, machine learning, sensing, and computing so as to make the best use of the RF radiations, spectrum, and network infrastructure in providing cost-effective and real-time power supplies to wireless devices and enable wireless-powered applications. In this paper, we first review recent signal processing techniques to make WPT and wireless information and power transfer as efficient as possible. Topics include power amplifier and energy harvester nonlinearities, active and passive beamforming, intelligent reflecting surfaces, receive combining with multi-antenna harvester, modulation, coding, waveform, massive MIMO, channel acquisition, transmit diversity, multi-user power region characterization, coordinated multipoint, and distributed antenna systems. Then, we overview two different design methodologies: the model and optimize approach relying on analytical system models, modern convex optimization, and communication theory, and the learning approach based on data-driven end-to-end learning and physics-based learning. We discuss the pros and cons of each approach, especially when accounting for various nonlinearities in wireless-powered networks, and identify interesting emerging opportunities for the approaches to complement each other. Finally, we identify new emerging wireless technologies where WPT may play a key role -- wireless-powered mobile edge computing and wireless-powered sensing -- arguing WPT, communication, computation, and sensing must be jointly designed.
Submitted 12 January, 2021;
originally announced January 2021.
-
Adversarial Linear Contextual Bandits with Graph-Structured Side Observations
Authors:
Lingda Wang,
Bingcong Li,
Huozhi Zhou,
Georgios B. Giannakis,
Lav R. Varshney,
Zhizhen Zhao
Abstract:
This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: \emph{contexts} and \emph{side observations}. In this setting, a learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on \texttt{EXP3}. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, \texttt{EXP3-LGC-U}, achieves the regret of order $\mathcal{O}(\sqrt{(K+\alpha(G)d)T\log{K}})$ over the time horizon $T$, where $\alpha(G)$ is the average \emph{independence number} of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{\alpha(G)dT\log{K}\log(KT)})$ for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of the proposed algorithms.
Submitted 16 February, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
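For orientation, the sketch below shows one step of the plain loss-based EXP3 rule that the two algorithms above build on: sample from normalized weights, form an importance-weighted loss estimate for the played arm, and update that arm's weight exponentially. It is a minimal sketch only. Contexts, feedback graphs, and the specifics of EXP3-LGC-U and EXP3-LGC-IX are not modeled, and the function names and the learning rate `eta` are assumptions for illustration.

```python
import math

def exp3_probabilities(weights):
    # Sampling distribution proportional to the current weights.
    total = sum(weights)
    return [w / total for w in weights]

def exp3_update(weights, arm, loss, eta):
    # Importance-weighted loss estimate for the played arm only;
    # unplayed arms receive an estimate of zero and keep their weights.
    probs = exp3_probabilities(weights)
    estimate = loss / probs[arm]
    updated = list(weights)
    updated[arm] *= math.exp(-eta * estimate)
    return updated

# One round with two arms: arm 0 is played and suffers loss 1.
weights = exp3_update([1.0, 1.0], arm=0, loss=1.0, eta=0.1)
probs = exp3_probabilities(weights)
print(weights, probs)
```

With side observations, losses of neighboring arms would also be observed each round, which is what allows regret bounds that depend on the independence number rather than only on $K$.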