-
ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
Authors:
Landon Butler,
Abhineet Agarwal,
Justin Singh Kang,
Yigit Efe Erginbas,
Bin Yu,
Kannan Ramchandran
Abstract:
Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order, causing them to scale poorly with the number of inputs $n$. Recently, Kang et al. (2025) proposed SPEX, an information-theoretic approach th…
▽ More
Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order, causing them to scale poorly with the number of inputs $n$. Recently, Kang et al. (2025) proposed SPEX, an information-theoretic approach that uses interaction sparsity to scale to $n \approx 10^3$ features. SPEX greatly improves upon prior methods but requires tens of thousands of model inferences, which can be prohibitive for large models. In this paper, we observe that LLM feature interactions are often hierarchical -- higher-order interactions are accompanied by their lower-order subsets -- which enables more efficient discovery. To exploit this hierarchy, we propose ProxySPEX, an interaction attribution algorithm that first fits gradient boosted trees to masked LLM outputs and then extracts the important interactions. Experiments across four challenging high-dimensional datasets show that ProxySPEX more faithfully reconstructs LLM outputs by 20% over marginal attribution approaches while using $10\times$ fewer inferences than SPEX. By accounting for interactions, ProxySPEX identifies features that influence model output over 20% more than those selected by marginal approaches. Further, we apply ProxySPEX to two interpretability tasks. Data attribution, where we identify interactions among CIFAR-10 training samples that influence test predictions, and mechanistic interpretability, where we uncover interactions between attention heads, both within and across layers, on a question-answering task. ProxySPEX identifies interactions that enable more aggressive pruning of heads than marginal approaches.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
$\texttt{DIAMONDs}$: A Dataset for $\mathbb{D}$ynamic $\mathbb{I}$nformation $\mathbb{A}$nd $\mathbb{M}$ental modeling $\mathbb{O}$f $\mathbb{N}$umeric $\mathbb{D}$iscussions
Authors:
Sayontan Ghosh,
Mahnaz Koupaee,
Yash Kumar Lal,
Pegah Alipoormolabashi,
Mohammad Saqib Hasan,
Jun Seok Kang,
Niranjan Balasubramanian
Abstract:
Understanding multiparty conversations demands robust Theory of Mind (ToM) capabilities, including the ability to track dynamic information, manage knowledge asymmetries, and distinguish relevant information across extended exchanges. To advance ToM evaluation in such settings, we present a carefully designed scalable methodology for generating high-quality benchmark conversation-question pairs wi…
▽ More
Understanding multiparty conversations demands robust Theory of Mind (ToM) capabilities, including the ability to track dynamic information, manage knowledge asymmetries, and distinguish relevant information across extended exchanges. To advance ToM evaluation in such settings, we present a carefully designed scalable methodology for generating high-quality benchmark conversation-question pairs with these characteristics. Using this methodology, we create $\texttt{DIAMONDs}$, a new conversational QA dataset covering common business, financial or other group interactions. In these goal-oriented conversations, participants often have to track certain numerical quantities (say $\textit{expected profit}$) of interest that can be derived from other variable quantities (like $\textit{marketing expenses, expected sales, salary}$, etc.), whose values also change over the course of the conversation. $\texttt{DIAMONDs}$ questions pose simple numerical reasoning problems over such quantities of interest (e.g., $\textit{funds required for charity events, expected company profit next quarter}$, etc.) in the context of the information exchanged in conversations. This allows for precisely evaluating ToM capabilities for carefully tracking and reasoning over participants' knowledge states.
Our evaluation of state-of-the-art language models reveals significant challenges in handling participant-centric reasoning, specifically in situations where participants have false beliefs. Models also struggle with conversations containing distractors and show limited ability to identify scenarios with insufficient information. These findings highlight current models' ToM limitations in handling real-world multi-party conversations.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
SPEX: Scaling Feature Interaction Explanations for LLMs
Authors:
Justin Singh Kang,
Landon Butler,
Abhineet Agarwal,
Yigit Efe Erginbas,
Ramtin Pedarsani,
Kannan Ramchandran,
Bin Yu
Abstract:
Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attr…
▽ More
Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths ($\approx 1000)$. SPEX exploits underlying natural sparsity among interactions -- common in real-world data -- and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal attribution methods by up to 20% in terms of faithfully reconstructing LLM outputs. Further, SPEX successfully identifies key features and interactions that strongly influence model output. For one of our datasets, HotpotQA, SPEX provides interactions that align with human annotations. Finally, we use our model-agnostic approach to generate explanations to demonstrate abstract reasoning in closed-source LLMs (GPT-4o mini) and compositional reasoning in vision-language models.
△ Less
Submitted 19 February, 2025;
originally announced February 2025.
-
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Authors:
Darin Tsui,
Aryan Musharaf,
Yigit Efe Erginbas,
Justin Singh Kang,
Amirali Aghazadeh
Abstract:
The growing adoption of machine learning models for biological sequences has intensified the need for interpretable predictions, with Shapley values emerging as a theoretically grounded standard for model explanation. While effective for local explanations of individual input sequences, scaling Shapley-based interpretability to extract global biological insights requires evaluating thousands of se…
▽ More
The growing adoption of machine learning models for biological sequences has intensified the need for interpretable predictions, with Shapley values emerging as a theoretically grounded standard for model explanation. While effective for local explanations of individual input sequences, scaling Shapley-based interpretability to extract global biological insights requires evaluating thousands of sequences--incurring exponential computational cost per query. We introduce SHAP zero, a novel algorithm that amortizes the cost of Shapley value computation across large-scale biological datasets. After a one-time model sketching step, SHAP zero enables near-zero marginal cost for future queries by uncovering an underexplored connection between Shapley values, high-order feature interactions, and the sparse Fourier transform of the model. Applied to models of guide RNA efficacy, DNA repair outcomes, and protein fitness, SHAP zero explains predictions orders of magnitude faster than existing methods, recovering rich combinatorial interactions previously inaccessible at scale. This work opens the door to principled, efficient, and scalable interpretability for black-box sequence models in biology.
△ Less
Submitted 22 May, 2025; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Learning to Understand: Identifying Interactions via the Möbius Transform
Authors:
Justin S. Kang,
Yigit E. Erginbas,
Landon Butler,
Ramtin Pedarsani,
Kannan Ramchandran
Abstract:
One of the key challenges in machine learning is to find interpretable representations of learned functions. The Möbius transform is essential for this purpose, as its coefficients correspond to unique importance scores for sets of input variables. This transform is closely related to widely used game-theoretic notions of importance like the Shapley and Bhanzaf value, but it also captures crucial…
▽ More
One of the key challenges in machine learning is to find interpretable representations of learned functions. The Möbius transform is essential for this purpose, as its coefficients correspond to unique importance scores for sets of input variables. This transform is closely related to widely used game-theoretic notions of importance like the Shapley and Bhanzaf value, but it also captures crucial higher-order interactions. Although computing the obius Transform of a function with $n$ inputs involves $2^n$ coefficients, it becomes tractable when the function is sparse and of low-degree as we show is the case for many real-world functions. Under these conditions, the complexity of the transform computation is significantly reduced. When there are $K$ non-zero coefficients, our algorithm recovers the Möbius transform in $O(Kn)$ samples and $O(Kn^2)$ time asymptotically under certain assumptions, the first non-adaptive algorithm to do so. We also uncover a surprising connection between group testing and the Möbius transform. For functions where all interactions involve at most $t$ inputs, we use group testing results to compute the Möbius transform with $O(Kt\log n)$ sample complexity and $O(K\mathrm{poly}(n))$ time. A robust version of this algorithm withstands noise and maintains this complexity. This marks the first $n$ sub-linear query complexity, noise-tolerant algorithm for the Möbius transform. In several examples, we observe that representations generated via sparse Möbius transform are up to twice as faithful to the original function, as compared to Shaply and Banzhaf values, while using the same number of terms.
△ Less
Submitted 15 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Efficiently Computing Sparse Fourier Transforms of $q$-ary Functions
Authors:
Yigit Efe Erginbas,
Justin Singh Kang,
Amirali Aghazadeh,
Kannan Ramchandran
Abstract:
Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a $q$-a…
▽ More
Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a $q$-ary alphabet? These types of functions arise naturally in many areas including biology. A typical workaround is to encode the $q$-ary sequence in binary, however, this approach is computationally inefficient and fundamentally incompatible with the existing sparse Fourier transform techniques. Herein, we develop a sparse Fourier transform algorithm specifically for $q$-ary functions of length $n$ sequences, dubbed $q$-SFT, which provably computes an $S$-sparse transform with vanishing error as $q^n \rightarrow \infty$ in $O(Sn)$ function evaluations and $O(S n^2 \log q)$ computations, where $S = q^{nδ}$ for some $δ< 1$. Under certain assumptions, we show that for fixed $q$, a robust version of $q$-SFT has a sample complexity of $O(Sn^2)$ and a computational complexity of $O(Sn^3)$ with the same asymptotic guarantees. We present numerical simulations on synthetic and real-world RNA data, demonstrating the scalability of $q$-SFT to massively high dimensional $q$-ary functions.
△ Less
Submitted 15 January, 2023;
originally announced January 2023.
-
Minimum Feedback for Collision-Free Scheduling in Massive Random Access
Authors:
Justin Singh Kang,
Wei Yu
Abstract:
Consider a massive random access scenario in which a small set of $k$ active users out of a large number of $n$ potential users need to be scheduled in $b\ge k$ slots. What is the minimum common feedback to the users needed to ensure that scheduling is collision-free? Instead of a naive scheme of listing the indices of the $k$ active users in the order in which they should transmit, at a cost of…
▽ More
Consider a massive random access scenario in which a small set of $k$ active users out of a large number of $n$ potential users need to be scheduled in $b\ge k$ slots. What is the minimum common feedback to the users needed to ensure that scheduling is collision-free? Instead of a naive scheme of listing the indices of the $k$ active users in the order in which they should transmit, at a cost of $k\log(n)$ bits, this paper shows that for the case of $b=k$, the rate of the minimum fixed-length common feedback code scales only as $k \log(e)$ bits, plus an additive term that scales in $n$ as $Θ\left(\log \log(n) \right)$ for fixed $k$. If a variable-length code can be used, assuming uniform activity among the users, the minimum average common feedback rate still requires $k \log(e)$ bits, but the dependence on $n$ can be reduced to $O(1)$. When $b>k$, the number of feedback bits needed for collision-free scheduling can be significantly further reduced. Moreover, a similar scaling on the minimum feedback rate is derived for the case of scheduling $m$ users per slot, when $k \le mb$. The problem of constructing a minimum collision-free feedback scheduling code is connected to that of constructing a perfect hashing family, which allows practical feedback scheduling codes to be constructed from perfect hashing algorithms.
△ Less
Submitted 20 September, 2021; v1 submitted 30 July, 2020;
originally announced July 2020.
-
A Cross-Domain Transferable Neural Coherence Model
Authors:
Peng Xu,
Hamidreza Saghir,
Jin Sung Kang,
Teng Long,
Avishek Joey Bose,
Yanshuai Cao,
Jackie Chi Kit Cheung
Abstract:
Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to disc…
▽ More
Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to discriminate against during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. The proposed coherence model is simple in structure, yet it significantly outperforms previous state-of-art methods on a standard benchmark dataset on the Wall Street Journal corpus, as well as in multiple new challenging settings of transfer to unseen categories of discourse on Wikipedia articles.
△ Less
Submitted 9 July, 2019; v1 submitted 28 May, 2019;
originally announced May 2019.
-
PoMo: Generating Entity-Specific Post-Modifiers in Context
Authors:
Jun Seok Kang,
Robert L. Logan IV,
Zewei Chu,
Yang Chen,
Dheeru Dua,
Kevin Gimpel,
Sameer Singh,
Niranjan Balasubramanian
Abstract:
We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually…
▽ More
We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually relevant post-modifier. To this end, we build PoMo, a post-modifier dataset created automatically from news articles reflecting a journalistic need for incorporating entity information that is relevant to a particular news event. PoMo consists of more than 231K sentences with post-modifiers and associated facts extracted from Wikidata for around 57K unique entities. We use crowdsourcing to show that modeling contextual relevance is necessary for accurate post-modifier generation. We adapt a number of existing generation approaches as baselines for this dataset. Our results show there is large room for improvement in terms of both identifying relevant facts to include (knowing which claims are relevant gives a >20% improvement in BLEU score), and generating appropriate post-modifier text for the context (providing relevant claims is not sufficient for accurate generation). We conduct an error analysis that suggests promising directions for future research.
△ Less
Submitted 8 April, 2019; v1 submitted 5 April, 2019;
originally announced April 2019.