-
AI-Driven MRI-based Brain Tumour Segmentation Benchmarking
Authors:
Connor Ludwig,
Khashayar Namdar,
Farzad Khalvati
Abstract:
Medical image segmentation has greatly aided medical diagnosis, with U-Net based architectures and nnU-Net providing state-of-the-art performance. There have been numerous general promptable models and medical variations introduced in recent years, but there is currently a lack of evaluation and comparison of these models across a variety of prompt qualities on a common medical dataset. This resea…
▽ More
Medical image segmentation has greatly aided medical diagnosis, with U-Net based architectures and nnU-Net providing state-of-the-art performance. There have been numerous general promptable models and medical variations introduced in recent years, but there is currently a lack of evaluation and comparison of these models across a variety of prompt qualities on a common medical dataset. This research uses Segment Anything Model (SAM), Segment Anything Model 2 (SAM 2), MedSAM, SAM-Med-3D, and nnU-Net to obtain zero-shot inference on the BraTS 2023 adult glioma and pediatrics dataset across multiple prompt qualities for both points and bounding boxes. Several of these models exhibit promising Dice scores, particularly SAM and SAM 2 achieving scores of up to 0.894 and 0.893, respectively when given extremely accurate bounding box prompts which exceeds nnU-Net's segmentation performance. However, nnU-Net remains the dominant medical image segmentation network due to the impracticality of providing highly accurate prompts to the models. The model and prompt evaluation, as well as the comparison, are extended through fine-tuning SAM, SAM 2, MedSAM, and SAM-Med-3D on the pediatrics dataset. The improvements in point prompt performance after fine-tuning are substantial and show promise for future investigation, but are unable to achieve better segmentation than bounding boxes or nnU-Net.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Logical Inferentialism & Attacks on Classical Logic
Authors:
Khashayar Irani
Abstract:
This paper undertakes a foundational inquiry into logical inferentialism with particular emphasis on the normative standards it establishes and the implications these pose for classical logic. The central question addressed herein is: 'What is Logical Inferentialism & How do its Standards challenge Classical Logic?' In response, the study begins with a survey of the three principal proof systems t…
▽ More
This paper undertakes a foundational inquiry into logical inferentialism with particular emphasis on the normative standards it establishes and the implications these pose for classical logic. The central question addressed herein is: 'What is Logical Inferentialism & How do its Standards challenge Classical Logic?' In response, the study begins with a survey of the three principal proof systems that is, David Hilbert's axiomatic systems and Gerhard Gentzen's natural deduction and his sequent calculus, thus situating logical inferentialism within a broader proof-theoretic landscape. The investigation then turns to the core tenets of logical inferentialism by focusing on the role of introduction and elimination rules in determining the meaning of logical constants. Through this framework, natural deduction is evaluated as a system that satisfies key inferentialist virtues including harmony, conservativeness and the subformula property. Ultimately, the paper presents challenges to classical logic from intuitionist and revisionist perspectives by arguing that certain classical principles fail to uphold inferentialist standards, consequently undermining their legitimacy within a meaning-theoretic framework.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
LLM-based Property-based Test Generation for Guardrailing Cyber-Physical Systems
Authors:
Khashayar Etemadi,
Marjan Sirjani,
Mahshid Helali Moghadam,
Per Strandberg,
Paul Pettersson
Abstract:
Cyber-physical systems (CPSs) are complex systems that integrate physical, computational, and communication subsystems. The heterogeneous nature of these systems makes their safety assurance challenging. In this paper, we propose a novel automated approach for guardrailing cyber-physical systems using property-based tests (PBTs) generated by Large Language Models (LLMs). Our approach employs an LL…
▽ More
Cyber-physical systems (CPSs) are complex systems that integrate physical, computational, and communication subsystems. The heterogeneous nature of these systems makes their safety assurance challenging. In this paper, we propose a novel automated approach for guardrailing cyber-physical systems using property-based tests (PBTs) generated by Large Language Models (LLMs). Our approach employs an LLM to extract properties from the code and documentation of CPSs. Next, we use the LLM to generate PBTs that verify the extracted properties on the CPS. The generated PBTs have two uses. First, they are used to test the CPS before it is deployed, i.e., at design time. Secondly, these PBTs can be used after deployment, i.e., at run time, to monitor the behavior of the system and guardrail it against unsafe states. We implement our approach in ChekProp and conduct preliminary experiments to evaluate the generated PBTs in terms of their relevance (how well they match manually crafted properties), executability (how many run with minimal manual modification), and effectiveness (coverage of the input space partitions). The results of our experiments and evaluation demonstrate a promising path forward for creating guardrails for CPSs using LLM-generated property-based tests.
△ Less
Submitted 13 June, 2025; v1 submitted 29 May, 2025;
originally announced May 2025.
-
Preserving Plasticity in Continual Learning with Adaptive Linearity Injection
Authors:
Seyed Roozbeh Razavi Rohani,
Khashayar Khajavi,
Wesley Chung,
Mo Chen,
Sharan Vaswani
Abstract:
Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to incrementally learn and has been identified as a key obstacle to learning in non-stationary problem settings. Recent work has shown that deep linear networks tend to be resilient towards loss of plasticity. Motivated by this observation, we propose Adaptive Linearization (AdaLin), a general approach that d…
▽ More
Loss of plasticity in deep neural networks is the gradual reduction in a model's capacity to incrementally learn and has been identified as a key obstacle to learning in non-stationary problem settings. Recent work has shown that deep linear networks tend to be resilient towards loss of plasticity. Motivated by this observation, we propose Adaptive Linearization (AdaLin), a general approach that dynamically adapts each neuron's activation function to mitigate plasticity loss. Unlike prior methods that rely on regularization or periodic resets, AdaLin equips every neuron with a learnable parameter and a gating mechanism that injects linearity into the activation function based on its gradient flow. This adaptive modulation ensures sufficient gradient signal and sustains continual learning without introducing additional hyperparameters or requiring explicit task boundaries. When used with conventional activation functions like ReLU, Tanh, and GeLU, we demonstrate that AdaLin can significantly improve performance on standard benchmarks, including Random Label and Permuted MNIST, Random Label and Shuffled CIFAR-10, and Class-Split CIFAR-100. Furthermore, its efficacy is shown in more complex scenarios, such as class-incremental learning on CIFAR-100 with a ResNet-18 backbone, and in mitigating plasticity loss in off-policy reinforcement learning agents. We perform a systematic set of ablations that show that neuron-level adaptation is crucial for good performance and analyze a number of metrics in the network that might be correlated to loss of plasticity.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
Zero Dynamics Attack Detection and Isolation in Cyber-Physical Systems with Event-triggered Communication
Authors:
Ali Eslami,
Khashayar Khorasani
Abstract:
This paper investigates the problem of Zero Dynamics (ZD) cyber-attack detection and isolation in Cyber-Physical Systems (CPS). By utilizing the notion of auxiliary systems with event-based communications, we will develop a detection mechanism capable of detecting and isolating the ZD cyber-attack even when the attackers have full knowledge of the dynamics of the auxiliary system and can launch Fa…
▽ More
This paper investigates the problem of Zero Dynamics (ZD) cyber-attack detection and isolation in Cyber-Physical Systems (CPS). By utilizing the notion of auxiliary systems with event-based communications, we will develop a detection mechanism capable of detecting and isolating the ZD cyber-attack even when the attackers have full knowledge of the dynamics of the auxiliary system and can launch False Data Injection (FDI) attacks on all the communication channels. More specifically, we will utilize a self-triggering rule for the communication channels connecting the auxiliary system with the Command & Control (C&C) center, leveraging its properties to detect the ZD cyber-attack. Finally, the effectiveness and capabilities of our approach are verified and demonstrated through simulation case studies.
△ Less
Submitted 9 May, 2025;
originally announced May 2025.
-
Rethinking Invariance in In-context Learning
Authors:
Lizhe Fang,
Yifei Wang,
Khashayar Gatmiry,
Lei Fang,
Yisen Wang
Abstract:
In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models, yet it is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. To address this issue, recent studies have introduced several variant algorithms of ICL that achieve permutation invariance. However, many of these do not exhibit comparable p…
▽ More
In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models, yet it is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. To address this issue, recent studies have introduced several variant algorithms of ICL that achieve permutation invariance. However, many of these do not exhibit comparable performance with the standard auto-regressive ICL algorithm. In this work, we identify two crucial elements in the design of an invariant ICL algorithm: information non-leakage and context interdependence, which are not simultaneously achieved by any of the existing methods. These investigations lead us to the proposed Invariant ICL (InvICL), a methodology designed to achieve invariance in ICL while ensuring the two properties. Empirically, our findings reveal that InvICL surpasses previous models, both invariant and non-invariant, in most benchmark datasets, showcasing superior generalization capabilities across varying input lengths. Code is available at https://github.com/PKU-ML/InvICL.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
On the one-dimensional SPH approximation of fractional-order operators
Authors:
Khashayar Ghorbani,
Fabio Semperlotti
Abstract:
This work presents a theoretical formalism and the corresponding numerical techniques to obtain the approximation of fractional-order operators over a 1D domain via the smoothed particle hydrodynamics (SPH) method. The method is presented for both constant- and variable-order operators, in either integral or differential forms. Several numerical examples are presented in order to validate the theo…
▽ More
This work presents a theoretical formalism and the corresponding numerical techniques to obtain the approximation of fractional-order operators over a 1D domain via the smoothed particle hydrodynamics (SPH) method. The method is presented for both constant- and variable-order operators, in either integral or differential forms. Several numerical examples are presented in order to validate the theory against analytical results and to evaluate the performance of the methodology. This formalism paves the way for the solution of fractional-order continuum mechanics models via the SPH method.
△ Less
Submitted 7 May, 2025;
originally announced May 2025.
-
Red Teaming Large Language Models for Healthcare
Authors:
Vahid Balazadeh,
Michael Cooper,
David Pellow,
Atousa Assadi,
Jennifer Bell,
Mark Coastworth,
Kaivalya Deshpande,
Jim Fackler,
Gabriel Funingana,
Spencer Gable-Cook,
Anirudh Gangadhar,
Abhishek Jaiswal,
Sumanth Kaja,
Christopher Khoury,
Amrit Krishnan,
Randy Lin,
Kaden McKeen,
Sara Naimimohasses,
Khashayar Namdar,
Aviraj Newatia,
Allan Pang,
Anshul Pattoo,
Sameer Peesapati,
Diana Prepelita,
Bogdana Rakova
, et al. (10 additional authors not shown)
Abstract:
We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large lang…
▽ More
We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large language model (LLM) outputs a response that could cause clinical harm. Red-teaming with clinicians enables the identification of LLM vulnerabilities that may not be recognised by LLM developers lacking clinical expertise. We report the vulnerabilities found, categorise them, and present the results of a replication study assessing the vulnerabilities across all LLMs provided.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
Unscented Particle Filter for Visual-inertial Navigation using IMU and Landmark Measurements
Authors:
Khashayar Ghanizadegan,
Hashim A. Hashim
Abstract:
This paper introduces a geometric Quaternion-based Unscented Particle Filter for Visual-Inertial Navigation (QUPF-VIN) specifically designed for a vehicle operating with six degrees of freedom (6 DoF). The proposed QUPF-VIN technique is quaternion-based capturing the inherently nonlinear nature of true navigation kinematics. The filter fuses data from a low-cost inertial measurement unit (IMU) and…
▽ More
This paper introduces a geometric Quaternion-based Unscented Particle Filter for Visual-Inertial Navigation (QUPF-VIN) specifically designed for a vehicle operating with six degrees of freedom (6 DoF). The proposed QUPF-VIN technique is quaternion-based capturing the inherently nonlinear nature of true navigation kinematics. The filter fuses data from a low-cost inertial measurement unit (IMU) and landmark observations obtained via a vision sensor. The QUPF-VIN is implemented in discrete form to ensure seamless integration with onboard inertial sensing systems. Designed for robustness in GPS-denied environments, the proposed method has been validated through experiments with real-world dataset involving an unmanned aerial vehicle (UAV) equipped with a 6-axis IMU and a stereo camera, operating with 6 DoF. The numerical results demonstrate that the QUPF-VIN provides superior tracking accuracy compared to ground truth data. Additionally, a comparative analysis with a standard Kalman filter-based navigation technique further highlights the enhanced performance of the QUPF-VIN.
△ Less
Submitted 27 April, 2025;
originally announced April 2025.
-
Enhancing Fault Detection and Isolation in an All-Electric Auxiliary Power Unit (APU) Gas Generator by Utilizing Starter/Generator Signal
Authors:
Haotian Mao,
Khashayar Khorasani,
Yingqing Guo
Abstract:
This study proposes a novel paradigm for enhancing fault detection and isolation (FDI) of gas generators in all-electric auxiliary power unit (APU) by utilizing shaft power information from the starter/generator. First, we conduct a pioneering investigation into the challenges and opportunities for FDI brought about by APU electrification. Our analysis reveals that the electrification of APU opens…
▽ More
This study proposes a novel paradigm for enhancing fault detection and isolation (FDI) of gas generators in all-electric auxiliary power unit (APU) by utilizing shaft power information from the starter/generator. First, we conduct a pioneering investigation into the challenges and opportunities for FDI brought about by APU electrification. Our analysis reveals that the electrification of APU opens up new possibilities for utilizing shaft power estimates from starter/generator to improve gas generator FDI. We then provide comprehensive theoretical and analytical evidence demonstrating why, how, and to what extent, the shaft power information from the starter/generator can fundamentally enhance the estimation accuracy of system states and health parameters of the gas generator, while also identifying the key factors influencing these improvements in FDI performance. The effectiveness of the proposed paradigm and its theoretical foundations are validated through extensive Monte Carlo simulations. Furthermore, through comprehensive comparative analysis with state-of-the-art gas generator fault diagnosis methods, our experimental results not only demonstrate the superior performance of the proposed approach but also validate that the diagnostic capabilities of existing advanced FDI techniques can be substantially enhanced by incorporating shaft power information. And the observed performance improvement patterns strongly align with our theoretical analysis, verifying both the effectiveness and guiding significance of our theoretical framework. These research findings provide a unique perspective in answering three fundamental questions: why joint fault diagnosis of the starter/generator and gas generator is essential, how it can be implemented, and what factors determine its effectiveness, thereby opening up promising new avenues for FDI technologies in all-electric APU systems.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
DeepUKF-VIN: Adaptively-tuned Deep Unscented Kalman Filter for 3D Visual-Inertial Navigation based on IMU-Vision-Net
Authors:
Khashayar Ghanizadegan,
Hashim A. Hashim
Abstract:
This paper addresses the challenge of estimating the orientation, position, and velocity of a vehicle operating in three-dimensional (3D) space with six degrees of freedom (6-DoF). A Deep Learning-based Adaptation Mechanism (DLAM) is proposed to adaptively tune the noise covariance matrices of Kalman-type filters for the Visual-Inertial Navigation (VIN) problem, leveraging IMU-Vision-Net. Subseque…
▽ More
This paper addresses the challenge of estimating the orientation, position, and velocity of a vehicle operating in three-dimensional (3D) space with six degrees of freedom (6-DoF). A Deep Learning-based Adaptation Mechanism (DLAM) is proposed to adaptively tune the noise covariance matrices of Kalman-type filters for the Visual-Inertial Navigation (VIN) problem, leveraging IMU-Vision-Net. Subsequently, an adaptively tuned Deep Learning Unscented Kalman Filter for 3D VIN (DeepUKF-VIN) is introduced to utilize the proposed DLAM, thereby robustly estimating key navigation components, including orientation, position, and linear velocity. The proposed DeepUKF-VIN integrates data from onboard sensors, specifically an inertial measurement unit (IMU) and visual feature points extracted from a camera, and is applicable for GPS-denied navigation. Its quaternion-based design effectively captures navigation nonlinearities and avoids the singularities commonly encountered with Euler-angle-based filters. Implemented in discrete space, the DeepUKF-VIN facilitates practical filter deployment. The filter's performance is evaluated using real-world data collected from an IMU and a stereo camera at low sampling rates. The results demonstrate filter stability and rapid attenuation of estimation errors, highlighting its high estimation accuracy. Furthermore, comparative testing against the standard Unscented Kalman Filter (UKF) in two scenarios consistently shows superior performance across all navigation components, thereby validating the efficacy and robustness of the proposed DeepUKF-VIN. Keywords: Deep Learning, Unscented Kalman Filter, Adaptive tuning, Estimation, Navigation, Unmanned Aerial Vehicle, Sensor-fusion.
△ Less
Submitted 12 March, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Quaternion-based Unscented Kalman Filter for 6-DoF Vision-based Inertial Navigation in GPS-denied Regions
Authors:
Khashayar Ghanizadegan,
Hashim A. Hashim
Abstract:
This paper investigates the orientation, position, and linear velocity estimation problem of a rigid-body moving in three-dimensional (3D) space with six degrees-of-freedom (6 DoF). The highly nonlinear navigation kinematics are formulated to ensure global representation of the navigation problem. A computationally efficient Quaternion-based Navigation Unscented Kalman Filter (QNUKF) is proposed o…
▽ More
This paper investigates the orientation, position, and linear velocity estimation problem of a rigid-body moving in three-dimensional (3D) space with six degrees-of-freedom (6 DoF). The highly nonlinear navigation kinematics are formulated to ensure global representation of the navigation problem. A computationally efficient Quaternion-based Navigation Unscented Kalman Filter (QNUKF) is proposed on $\mathbb{S}^{3}\times\mathbb{R}^{3}\times\mathbb{R}^{3}$ imitating the true nonlinear navigation kinematics and utilize onboard Visual-Inertial Navigation (VIN) units to achieve successful GPS-denied navigation. The proposed QNUKF is designed in discrete form to operate based on the data fusion of photographs garnered by a vision unit (stereo or monocular camera) and information collected by a low-cost inertial measurement unit (IMU). The photographs are processed to extract feature points in 3D space, while the 6-axis IMU supplies angular velocity and accelerometer measurements expressed with respect to the body-frame. Robustness and effectiveness of the proposed QNUKF have been confirmed through experiments on a real-world dataset collected by a drone navigating in 3D and consisting of stereo images and 6-axis IMU measurements. Also, the proposed approach is validated against standard state-of-the-art filtering techniques. IEEE Keywords: Localization, Navigation, Unmanned Aerial Vehicle, Sensor-fusion, Inertial Measurement Unit, Vision Unit.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
VeCoGen: Automating Generation of Formally Verified C Code with Large Language Models
Authors:
Merlijn Sevenhuijsen,
Khashayar Etemadi,
Mattias Nyberg
Abstract:
Large language models have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address this limitation, this paper introduces VECOGEN, a novel tool that combines large language models with formal verification to automate the generation of formally ve…
▽ More
Large language models have demonstrated impressive capabilities in generating code, yet they often produce programs with flaws or deviations from intended behavior, limiting their suitability for safety-critical applications. To address this limitation, this paper introduces VECOGEN, a novel tool that combines large language models with formal verification to automate the generation of formally verified C programs. VECOGEN takes a formal specification in ANSI/ISO C Specification Language, a natural language specification, and a set of test cases to attempt to generate a verified program. This program-generation process consists of two steps. First, VECOGEN generates an initial set of candidate programs. Secondly, the tool iteratively improves on previously generated candidates. If a candidate program meets the formal specification, then we are sure the program is correct. We evaluate VECOGEN on 15 problems presented in Codeforces competitions. On these problems, VECOGEN solves 13 problems. This work shows the potential of combining large language models with formal verification to automate program generation.
△ Less
Submitted 7 April, 2025; v1 submitted 28 November, 2024;
originally announced November 2024.
-
MBExplainer: Multilevel bandit-based explanations for downstream models with augmented graph embeddings
Authors:
Ashkan Golgoon,
Ryan Franks,
Khashayar Filom,
Arjun Ravi Kannan
Abstract:
In many industrial applications, it is common that the graph embeddings generated from training GNNs are used in an ensemble model where the embeddings are combined with other tabular features (e.g., original node or edge features) in a downstream ML task. The tabular features may even arise naturally if, e.g., one tries to build a graph such that some of the node or edge features are stored in a…
▽ More
In many industrial applications, it is common that the graph embeddings generated from training GNNs are used in an ensemble model where the embeddings are combined with other tabular features (e.g., original node or edge features) in a downstream ML task. The tabular features may even arise naturally if, e.g., one tries to build a graph such that some of the node or edge features are stored in a tabular format. Here we address the problem of explaining the output of such ensemble models for which the input features consist of learned neural graph embeddings combined with additional tabular features. We propose MBExplainer, a model-agnostic explanation approach for downstream models with augmented graph embeddings. MBExplainer returns a human-legible triple as an explanation for an instance prediction of the whole pipeline consisting of three components: a subgraph with the highest importance, the topmost important nodal features, and the topmost important augmented downstream features. A game-theoretic formulation is used to take the contributions of each component and their interactions into account by assigning three Shapley values corresponding to their own specific games. Finding the explanation requires an efficient search through the corresponding local search spaces corresponding to each component. MBExplainer applies a novel multilevel search algorithm that enables simultaneous pruning of local search spaces in a computationally tractable way. In particular, three interweaved Monte Carlo Tree Search are utilized to iteratively prune the local search spaces. MBExplainer also includes a global search algorithm that uses contextual bandits to efficiently allocate pruning budget among the local search spaces. We show the effectiveness of MBExplainer by presenting a set of comprehensive numerical examples on multiple public graph datasets for both node and graph classification tasks.
△ Less
Submitted 31 October, 2024;
originally announced November 2024.
-
On the Role of Depth and Looping for In-Context Learning with Task Diversity
Authors:
Khashayar Gatmiry,
Nikunj Saunshi,
Sashank J. Reddi,
Stefanie Jegelka,
Sanjiv Kumar
Abstract:
The intriguing in-context learning (ICL) abilities of deep Transformer models have lately garnered significant attention. By studying in-context linear regression on unimodal Gaussian data, recent empirical and theoretical works have argued that ICL emerges from Transformers' abilities to simulate learning algorithms like gradient descent. However, these works fail to capture the remarkable abilit…
▽ More
The intriguing in-context learning (ICL) abilities of deep Transformer models have lately garnered significant attention. By studying in-context linear regression on unimodal Gaussian data, recent empirical and theoretical works have argued that ICL emerges from Transformers' abilities to simulate learning algorithms like gradient descent. However, these works fail to capture the remarkable ability of Transformers to learn multiple tasks in context. To this end, we study in-context learning for linear regression with diverse tasks, characterized by data covariance matrices with condition numbers ranging from $[1, κ]$, and highlight the importance of depth in this setting. More specifically, (a) we show theoretical lower bounds of $\log(κ)$ (or $\sqrtκ$) linear attention layers in the unrestricted (or restricted) attention setting and, (b) we show that multilayer Transformers can indeed solve such tasks with a number of layers that matches the lower bounds. However, we show that this expressivity of multilayer Transformer comes at the price of robustness. In particular, multilayer Transformers are not robust to even distributional shifts as small as $O(e^{-L})$ in Wasserstein distance, where $L$ is the depth of the network. We then demonstrate that Looped Transformers -- a special class of multilayer Transformers with weight-sharing -- not only exhibit similar expressive power but are also provably robust under mild assumptions. Besides out-of-distribution generalization, we also show that Looped Transformers are the only models that exhibit a monotonic behavior of loss with respect to depth.
△ Less
Submitted 28 October, 2024;
originally announced October 2024.
-
Computing Optimal Regularizers for Online Linear Optimization
Authors:
Khashayar Gatmiry,
Jon Schneider,
Stefanie Jegelka
Abstract:
Follow-the-Regularized-Leader (FTRL) algorithms are a popular class of learning algorithms for online linear optimization (OLO) that guarantee sub-linear regret, but the choice of regularizer can significantly impact dimension-dependent factors in the regret bound. We present an algorithm that takes as input convex and symmetric action sets and loss sets for a specific OLO instance, and outputs a…
▽ More
Follow-the-Regularized-Leader (FTRL) algorithms are a popular class of learning algorithms for online linear optimization (OLO) that guarantee sub-linear regret, but the choice of regularizer can significantly impact dimension-dependent factors in the regret bound. We present an algorithm that takes as input convex and symmetric action sets and loss sets for a specific OLO instance, and outputs a regularizer such that running FTRL with this regularizer guarantees regret within a universal constant factor of the best possible regret bound. In particular, for any choice of (convex, symmetric) action set and loss set we prove that there exists an instantiation of FTRL which achieves regret within a constant factor of the best possible learning algorithm, strengthening the universality result of Srebro et al., 2011.
Our algorithm requires preprocessing time and space exponential in the dimension $d$ of the OLO instance, but can be run efficiently online assuming a membership and linear optimization oracle for the action and loss sets, respectively (and is fully polynomial time for the case of constant dimension $d$). We complement this with a lower bound showing that even deciding whether a given regularizer is $α$-strongly-convex with respect to a given norm is NP-hard.
△ Less
Submitted 22 October, 2024;
originally announced October 2024.
-
Simplicity Bias via Global Convergence of Sharpness Minimization
Authors:
Khashayar Gatmiry,
Zhiyuan Li,
Sashank J. Reddi,
Stefanie Jegelka
Abstract:
The remarkable generalization ability of neural networks is usually attributed to the implicit bias of SGD, which often yields models with lower complexity using simpler (e.g. linear) and low-rank features. Recent works have provided empirical and theoretical evidence for the bias of particular variants of SGD (such as label noise SGD) toward flatter regions of the loss landscape. Despite the folk…
▽ More
The remarkable generalization ability of neural networks is usually attributed to the implicit bias of SGD, which often yields models with lower complexity using simpler (e.g. linear) and low-rank features. Recent works have provided empirical and theoretical evidence for the bias of particular variants of SGD (such as label noise SGD) toward flatter regions of the loss landscape. Despite the folklore intuition that flat solutions are 'simple', the connection with the simplicity of the final trained model (e.g. low-rank) is not well understood. In this work, we take a step toward bridging this gap by studying the simplicity structure that arises from minimizers of the sharpness for a class of two-layer neural networks. We show that, for any high dimensional training data and certain activations, with small enough step size, label noise SGD always converges to a network that replicates a single linear feature across all neurons; thereby, implying a simple rank one feature matrix. To obtain this result, our main technical contribution is to show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks. Along the way, we discover a novel property -- a local geodesic convexity -- of the trace of Hessian of the loss at approximate stationary points on the manifold of zero loss, which links sharpness to the geometry of the manifold. This tool may be of independent interest.
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
AGN feeding along a one-armed spiral in NGC 4593: A study using ALMA CO(2-1) observations
Authors:
K. Kianfar,
P. Andreani,
J. A. Fernández-Ontiveros,
F. Combes,
L. Spinoglio,
E. Hatziminaoglou,
C. Ricci,
A. Bewketu-Belete,
M. Imanishi,
M. Pereira-Santaella,
R. Slater,
M. Malheiro
Abstract:
We investigate active galactic nuclei (AGN) feeding through the molecular gas (CO(2-1) emission) properties of the local Seyfert 1 galaxy NGC 4593, using Atacama Large Millimeter Array (ALMA) observations and other multi-wavelength data. Our study aims to understand the interplay between the AGN and the interstellar medium (ISM) in this galaxy, examining the role of the AGN in steering gas dynamic…
▽ More
We investigate active galactic nuclei (AGN) feeding through the molecular gas (CO(2-1) emission) properties of the local Seyfert 1 galaxy NGC 4593, using Atacama Large Millimeter Array (ALMA) observations and other multi-wavelength data. Our study aims to understand the interplay between the AGN and the interstellar medium (ISM) in this galaxy, examining the role of the AGN in steering gas dynamics within its host galaxy, evaluating the energy injected into the ISM, and determining whether gas is inflowing or outflowing from the galaxy. After reducing the ALMA CO(2-1) images, we employed two models, 3D-Barolo and DISCFIT, to construct a disc model and fit its emission to the ALMA data. Additionally, we used photometric data to build a spectral energy distribution (SED) and applied the CIGALE code to derive key physical properties of the AGN and its host. Our analysis reveals a complex interplay within NGC 4593, including a clear rotational pattern, the influence of a non-axisymmetric bar potential, and a central molecular zone (CMZ)-like ring. We observe an outflow of CO(2-1) gas along the minor axis, at a distance of approximately 220 pc from the nucleus. The total molecular gas mass is estimated to be between $1 - 5 \times 10^8 \, M_{\odot}$, with non-circular motions contributing about 10\%. Our SED analysis indicates an AGN fraction of 0.88 and a star formation rate (SFR) of 0.42 $M_{\odot} \, \text{yr}^{-1}$. These findings highlight the complex dynamics in the centre of NGC 4593, which are significantly influenced by the presence of the AGN. The overall physical properties of this system suggest that the AGN has a substantial impact on the evolution of NGC 4593.
△ Less
Submitted 13 October, 2024;
originally announced October 2024.
-
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Authors:
Khashayar Gatmiry,
Nikunj Saunshi,
Sashank J. Reddi,
Stefanie Jegelka,
Sanjiv Kumar
Abstract:
The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate a multi-step algorithms -- such as gradient descent -- with their weights in a single forward pass. Recently, there has been progress in understanding this complex phenomenon from an expressivity point of view, by demonstr…
▽ More
The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate a multi-step algorithms -- such as gradient descent -- with their weights in a single forward pass. Recently, there has been progress in understanding this complex phenomenon from an expressivity point of view, by demonstrating that Transformers can express such multi-step algorithms. However, our knowledge about the more fundamental aspect of its learnability, beyond single layer models, is very limited. In particular, can training Transformers enable convergence to algorithmic solutions? In this work we resolve this for in-context linear regression with linear looped Transformers -- a multi-layer model with weight sharing that is conjectured to have an inductive bias to learn fix-point iterative algorithms. More specifically, for this setting we show that the global minimizer of the population training loss implements multi-step preconditioned gradient descent, with a preconditioner that adapts to the data distribution. Furthermore, we show a fast convergence for gradient flow on the regression loss, despite the non-convexity of the landscape, by proving a novel gradient dominance condition. To our knowledge, this is the first theoretical analysis for multi-layer Transformer in this setting. We further validate our theoretical findings through synthetic experiments.
△ Less
Submitted 10 October, 2024;
originally announced October 2024.
-
What does guidance do? A fine-grained analysis in a simple setting
Authors:
Muthu Chidambaram,
Khashayar Gatmiry,
Sitan Chen,
Holden Lee,
Jianfeng Lu
Abstract:
The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by rigorously proving that guidance fails to sample from the intended tilted distribution.
Our main result is to give a fine-grained characterization of…
▽ More
The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by rigorously proving that guidance fails to sample from the intended tilted distribution.
Our main result is to give a fine-grained characterization of the dynamics of guidance in two cases, (1) mixtures of compactly supported distributions and (2) mixtures of Gaussians, which reflect salient properties of guidance that manifest on real-world data. In both cases, we prove that as the guidance parameter increases, the guided model samples more heavily from the boundary of the support of the conditional distribution. We also prove that for any nonzero level of score estimation error, sufficiently large guidance will result in sampling away from the support, theoretically justifying the empirical finding that large guidance results in distorted generations.
In addition to verifying these results empirically in synthetic settings, we also show how our theoretical insights can offer useful prescriptions for practical deployment.
△ Less
Submitted 19 September, 2024;
originally announced September 2024.
-
Mechanistic interpretability of large language models with applications to the financial services industry
Authors:
Ashkan Golgoon,
Khashayar Filom,
Arjun Ravi Kannan
Abstract:
Large Language Models such as GPTs (Generative Pre-trained Transformers) exhibit remarkable capabilities across a broad spectrum of applications. Nevertheless, due to their intrinsic complexity, these models present substantial challenges in interpreting their internal decision-making processes. This lack of transparency poses critical challenges when it comes to their adaptation by financial inst…
▽ More
Large Language Models such as GPTs (Generative Pre-trained Transformers) exhibit remarkable capabilities across a broad spectrum of applications. Nevertheless, due to their intrinsic complexity, these models present substantial challenges in interpreting their internal decision-making processes. This lack of transparency poses critical challenges when it comes to their adaptation by financial institutions, where concerns and accountability regarding bias, fairness, and reliability are of paramount importance. Mechanistic interpretability aims at reverse engineering complex AI models such as transformers. In this paper, we are pioneering the use of mechanistic interpretability to shed some light on the inner workings of large language models for use in financial services applications. We offer several examples of how algorithmic tasks can be designed for compliance monitoring purposes. In particular, we investigate GPT-2 Small's attention pattern when prompted to identify potential violation of Fair Lending laws. Using direct logit attribution, we study the contributions of each layer and its corresponding attention heads to the logit difference in the residual stream. Finally, we design clean and corrupted prompts and use activation patching as a causal intervention method to localize our task completion components further. We observe that the (positive) heads $10.2$ (head $2$, layer $10$), $10.7$, and $11.3$, as well as the (negative) heads $9.6$ and $10.6$ play a significant role in the task completion.
△ Less
Submitted 15 October, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Adversarial Online Learning with Temporal Feedback Graphs
Authors:
Khashayar Gatmiry,
Jon Schneider
Abstract:
We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on a strategy of partitioning the losses…
▽ More
We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on a strategy of partitioning the losses across sub-cliques of this graph. We complement this with a lower bound that is tight in many practical settings, and which we conjecture to be within a constant factor of optimal. For the important class of transitive feedback graphs, we prove that this algorithm is efficiently implementable and obtains the optimal regret bound (up to a universal constant).
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
Mokav: Execution-driven Differential Testing with LLMs
Authors:
Khashayar Etemadi,
Bardia Mohammadi,
Zhendong Su,
Martin Monperrus
Abstract:
It is essential to detect functional differences in various software engineering tasks, such as automated program repair, mutation testing, and code refactoring. The problem of detecting functional differences between two programs can be reduced to searching for a difference exposing test (DET): a test input that results in different outputs on the subject programs. In this paper, we propose Mokav…
▽ More
It is essential to detect functional differences in various software engineering tasks, such as automated program repair, mutation testing, and code refactoring. The problem of detecting functional differences between two programs can be reduced to searching for a difference exposing test (DET): a test input that results in different outputs on the subject programs. In this paper, we propose Mokav, a novel execution-driven tool that leverages LLMs to generate DETs. Mokav takes two versions of a program (P and Q) and an example test input. When successful, Mokav generates a valid DET, a test input that leads to different outputs on P and Q. Mokav iteratively prompts an LLM with a specialized prompt to generate new test inputs. At each iteration, Mokav provides execution-based feedback regarding previously generated tests until the LLM produces a DET. We evaluate Mokav on 1,535 pairs of Python programs collected from the Codeforces competition platform and 32 pairs of programs from the QuixBugs dataset. Our experiments show that Mokav outperforms the state-of-the-art, Pynguin and Differential Prompting, by a large margin. Mokav can generate DETs for 81.7% (1,255/1,535) of the program pairs in our benchmark (versus 4.9% for Pynguin and 37.3% for Differential Prompting). We demonstrate that all components in our system, including the iterative and execution-driven approaches, contribute to its high effectiveness.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving
Authors:
Jonathan Booher,
Khashayar Rohanimanesh,
Junhong Xu,
Vladislav Isenbaev,
Ashwin Balakrishna,
Ishan Gupta,
Wei Liu,
Aleksandr Petiushko
Abstract:
Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to le…
▽ More
Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings such as autonomous driving. Both of these challenges make deploying purely cloned or pure RL policies in safety critical applications such as autonomous vehicles challenging. In this paper we propose Combining IMitation and Reinforcement Learning (CIMRL) approach - a safe reinforcement learning framework that enables training driving policies in simulation through leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed loop simulation and real world driving benchmarks.
△ Less
Submitted 11 November, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Opportunities for Persian Digital Humanities Research with Artificial Intelligence Language Models; Case Study: Forough Farrokhzad
Authors:
Arash Rasti Meymandi,
Zahra Hosseini,
Sina Davari,
Abolfazl Moshiri,
Shabnam Rahimi-Golkhandan,
Khashayar Namdar,
Nikta Feizi,
Mohamad Tavakoli-Targhi,
Farzad Khalvati
Abstract:
This study explores the integration of advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to analyze and interpret Persian literature, focusing on the poetry of Forough Farrokhzad. Utilizing computational methods, we aim to unveil thematic, stylistic, and linguistic patterns in Persian poetry. Specifically, the study employs AI models including transformer-based…
▽ More
This study explores the integration of advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to analyze and interpret Persian literature, focusing on the poetry of Forough Farrokhzad. Utilizing computational methods, we aim to unveil thematic, stylistic, and linguistic patterns in Persian poetry. Specifically, the study employs AI models including transformer-based language models for clustering of the poems in an unsupervised framework. This research underscores the potential of AI in enhancing our understanding of Persian literary heritage, with Forough Farrokhzad's work providing a comprehensive case study. This approach not only contributes to the field of Persian Digital Humanities but also sets a precedent for future research in Persian literary studies using computational techniques.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Learning Mixtures of Gaussians Using Diffusion Models
Authors:
Khashayar Gatmiry,
Jonathan Kelner,
Holden Lee
Abstract:
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$…
▽ More
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly\,log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity, under a minimum weight assumption. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number, for which no sub-exponential algorithm was previously known. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically rely on learning the score function (gradient log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions for a Gaussian mixture to show that that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and combine this with known convergence results for diffusion models.
△ Less
Submitted 4 March, 2025; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications
Authors:
Zihan Zhou,
Jonathan Booher,
Khashayar Rohanimanesh,
Wei Liu,
Aleksandr Petiushko,
Animesh Garg
Abstract:
Safe reinforcement learning tasks are a challenging domain despite being very common in the real world. The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states. In safety-critical domains, such behaviors could lead to disastrous outcomes. To address this issue, we first describe the problem with a stronger Uniformly Constraine…
▽ More
Safe reinforcement learning tasks are a challenging domain despite being very common in the real world. The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states. In safety-critical domains, such behaviors could lead to disastrous outcomes. To address this issue, we first describe the problem with a stronger Uniformly Constrained MDP (UCMDP) model where we impose constraints on all reachable states; we then propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic, as a solution to the Lagrangian dual of a UCMDP. We benchmark Objective Suppression in two multi-constraint safety domains, including an autonomous driving domain where any incorrect behavior can lead to disastrous consequences. On the driving domain, we evaluate on open source and proprietary data and evaluate transfer to a real autonomous fleet. Empirically, we demonstrate that our proposed method, when combined with existing safe RL algorithms, can match the task reward achieved by baselines with significantly fewer constraint violations.
△ Less
Submitted 28 August, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
CigaR: Cost-efficient Program Repair with LLMs
Authors:
Dávid Hidvégi,
Khashayar Etemadi,
Sofia Bobadilla,
Martin Monperrus
Abstract:
Large language models (LLM) have proven to be effective at automated program repair (APR). However, using LLMs can be costly, with companies invoicing users by the number of tokens. In this paper, we propose CigaR, the first LLM-based APR tool that focuses on minimizing the repair cost. CigaR works in two major steps: generating a first plausible patch and multiplying plausible patches. CigaR opti…
▽ More
Large language models (LLM) have proven to be effective at automated program repair (APR). However, using LLMs can be costly, with companies invoicing users by the number of tokens. In this paper, we propose CigaR, the first LLM-based APR tool that focuses on minimizing the repair cost. CigaR works in two major steps: generating a first plausible patch and multiplying plausible patches. CigaR optimizes the prompts and the prompt setting to maximize the information given to LLMs using the smallest possible number of tokens. Our experiments on 429 bugs from the widely used Defects4J and HumanEval-Java datasets shows that CigaR reduces the token cost by 73%. On average, CigaR spends 127k tokens per bug while the baseline uses 467k tokens per bug. On the subset of bugs that are fixed by both, CigaR spends 20k per bug while the baseline uses 608k tokens, a cost saving of 96%. Our extensive experiments show that CigaR is a cost-effective LLM-based program repair tool that uses a low number of tokens to automatically generate patches.
△ Less
Submitted 18 April, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Improving Pediatric Low-Grade Neuroepithelial Tumors Molecular Subtype Identification Using a Novel AUROC Loss Function for Convolutional Neural Networks
Authors:
Khashayar Namdar,
Matthias W. Wagner,
Cynthia Hawkins,
Uri Tabori,
Birgit B. Ertl-Wagner,
Farzad Khalvati
Abstract:
Pediatric Low-Grade Neuroepithelial Tumors (PLGNT) are the most common pediatric cancer type, accounting for 40% of brain tumors in children, and identifying PLGNT molecular subtype is crucial for treatment planning. However, the gold standard to determine the PLGNT subtype is biopsy, which can be impractical or dangerous for patients. This research improves the performance of Convolutional Neural…
▽ More
Pediatric Low-Grade Neuroepithelial Tumors (PLGNT) are the most common pediatric cancer type, accounting for 40% of brain tumors in children, and identifying PLGNT molecular subtype is crucial for treatment planning. However, the gold standard to determine the PLGNT subtype is biopsy, which can be impractical or dangerous for patients. This research improves the performance of Convolutional Neural Networks (CNNs) in classifying PLGNT subtypes through MRI scans by introducing a loss function that specifically improves the model's Area Under the Receiver Operating Characteristic (ROC) Curve (AUROC), offering a non-invasive diagnostic alternative. In this study, a retrospective dataset of 339 children with PLGNT (143 BRAF fusion, 71 with BRAF V600E mutation, and 125 non-BRAF) was curated. We employed a CNN model with Monte Carlo random data splitting. The baseline model was trained using binary cross entropy (BCE), and achieved an AUROC of 86.11% for differentiating BRAF fusion and BRAF V600E mutations, which was improved to 87.71% using our proposed AUROC loss function (p-value 0.045). With multiclass classification, the AUROC improved from 74.42% to 76. 59% (p-value 0.0016).
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Generative AI to Generate Test Data Generators
Authors:
Benoit Baudry,
Khashayar Etemadi,
Sen Fang,
Yogya Gamage,
Yi Liu,
Yuxin Liu,
Martin Monperrus,
Javier Ron,
André Silva,
Deepika Tiwari
Abstract:
Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design t…
▽ More
Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) raw test data generation, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.
△ Less
Submitted 14 June, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Scalar mixing and entrainment in an axisymmetric jet subjected to external turbulence
Authors:
Khashayar F. Kohan,
Susan J. Gaskin
Abstract:
The present study aims to understand the process of turbulent entrainment into a jet, as affected by background turbulence, using scalar statistics. Planar-laser-induced fluorescence was employed to capture the orthogonal cross sections of the jet at a fixed downstream station with varying background turbulence intensities and length scales. The conditional scalar profiles revealed that the thickn…
▽ More
The present study aims to understand the process of turbulent entrainment into a jet, as affected by background turbulence, using scalar statistics. Planar-laser-induced fluorescence was employed to capture the orthogonal cross sections of the jet at a fixed downstream station with varying background turbulence intensities and length scales. The conditional scalar profiles revealed that the thickness of the scalar turbulent/turbulent interface (TTI) is greater than that of the traditional turbulent/non-turbulent interface (TNTI), and the interfacial thickness is an increasing function of the background turbulence intensity. Although nibbling remains the primary entrainment mechanism in the far field, increased occurrence of concentration 'holes' within the interfacial layer in the presence of ambient turbulence suggests a more significant role of large-scale engulfment in the turbulent/turbulent entrainment process (although still below 1% of the total mass flux). Enhanced contribution of the area of detached jet patches (i.e. 'islands') to that of the main jet is hypothesized to be evidence of intense detrainment events in the background turbulence. This can potentially result in a reduced net entrainment into the jet, which manifests as less negative values of scalar skewness within the jet core.
△ Less
Submitted 16 September, 2024; v1 submitted 29 October, 2023;
originally announced October 2023.
-
EM for Mixture of Linear Regression with Clustered Data
Authors:
Amirhossein Reisizadeh,
Khashayar Gatmiry,
Asuman Ozdaglar
Abstract:
Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federa…
▽ More
Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federated learning where a common latent variable governs the distribution of all the samples generated by a client. It is therefore natural to ask how the underlying clustered structures in distributed data can be exploited to improve learning schemes. In this paper, we tackle this question in the special case of estimating $d$-dimensional parameters of a two-component mixture of linear regressions problem where each of $m$ nodes generates $n$ samples with a shared latent variable. We employ the well-known Expectation-Maximization (EM) method to estimate the maximum likelihood parameters from $m$ batches of dependent samples each containing $n$ measurements. Discarding the clustered structure in the mixture model, EM is known to require $O(\log(mn/d))$ iterations to reach the statistical accuracy of $O(\sqrt{d/(mn)})$. In contrast, we show that if initialized properly, EM on the structured data requires only $O(1)$ iterations to reach the same statistical accuracy, as long as $m$ grows up as $e^{o(n)}$. Our analysis establishes and combines novel asymptotic optimization and generalization guarantees for population and empirical EM with dependent samples, which may be of independent interest.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms
Authors:
Khashayar Khosravi,
Renato Paes Leme,
Chara Podimata,
Apostolis Tsorvantzis
Abstract:
We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states that we call Bandits with Deterministically Evolving States ($B$-$DES$). The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short…
▽ More
We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states that we call Bandits with Deterministically Evolving States ($B$-$DES$). The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short-term reward of the action chosen and how "healthy" the system is (i.e., as measured by its state). For example, in recommendation systems, the reward that the platform obtains from a user's engagement with a particular type of content depends not only on the inherent features of the specific content, but also on how the user's preferences have evolved as a result of interacting with other types of content on the platform. Our general model accounts for the different rate $λ\in [0,1]$ at which the state evolves (e.g., how fast a user's preferences shift as a result of previous content consumption) and encompasses standard multi-armed bandits as a special case. The goal of the algorithm is to minimize a notion of regret against the best-fixed sequence of arms pulled, which is significantly harder to attain compared to standard benchmark of the best-fixed action in hindsight. We present online learning algorithms for any possible value of the evolution rate $λ$ and we show the robustness of our results to various model misspecifications.
△ Less
Submitted 28 January, 2025; v1 submitted 21 July, 2023;
originally announced July 2023.
-
A Unified Approach to Controlling Implicit Regularization via Mirror Descent
Authors:
Haoyuan Sun,
Khashayar Gatmiry,
Kwangjun Ahn,
Navid Azizan
Abstract:
Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it h…
▽ More
Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it has been argued that gradient descent (GD) induces an implicit $\ell_2$-norm regularization in regression and classification problems. However, the implicit regularization of different algorithms are confined to either a specific geometry or a particular class of learning problems, indicating a gap in a general approach for controlling the implicit regularization. To address this, we present a unified approach using mirror descent (MD), a notable generalization of GD, to control implicit regularization in both regression and classification settings. More specifically, we show that MD with the general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering a long-standing question in the classification setting. Further, we show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions. Through comprehensive experiments, we demonstrate that MD is a versatile method to produce learned models with different regularizers, which in turn have different generalization performances.
△ Less
Submitted 11 January, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Authors:
Khashayar Gatmiry,
Zhiyuan Li,
Ching-Yao Chuang,
Sashank Reddi,
Tengyu Ma,
Stefanie Jegelka
Abstract:
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear wh…
▽ More
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in an important setting: learning deep linear networks from linear measurements, also known as \emph{deep matrix factorization}. We show that for all depth greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization. We empirically verify our theoretical findings on synthetic datasets.
△ Less
Submitted 22 June, 2023;
originally announced June 2023.
-
Projection-Free Online Convex Optimization via Efficient Newton Iterations
Authors:
Khashayar Gatmiry,
Zakaria Mhammedi
Abstract:
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projections onto the convex set $\cK$ to ensure feasibility of their iterates. Alternative algorithms, such as those based on the Frank-Wolfe method, swap poten…
▽ More
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projections onto the convex set $\cK$ to ensure feasibility of their iterates. Alternative algorithms, such as those based on the Frank-Wolfe method, swap potentially-expensive Euclidean projections onto $\mathcal{K}$ for linear optimization over $\mathcal{K}$. However, such algorithms have a sub-optimal regret in OCO compared to projection-based algorithms. In this paper, we look at a third type of algorithms that output approximate Newton iterates using a self-concordant barrier for the set of interest. The use of a self-concordant barrier automatically ensures feasibility without the need for projections. However, the computation of the Newton iterates requires a matrix inverse, which can still be expensive. As our main contribution, we show how the stability of the Newton iterates can be leveraged to compute the inverse Hessian only a vanishing fraction of the rounds, leading to a new efficient projection-free OCO algorithm with a state-of-the-art regret bound.
△ Less
Submitted 19 June, 2023;
originally announced June 2023.
-
Public-Key Encryption with Quantum Keys
Authors:
Khashayar Barooti,
Alex B. Grilo,
Loïs Huguenin-Dumittan,
Giulio Malavolta,
Or Sattath,
Quoc-Huy Vu,
Michael Walter
Abstract:
In the framework of Impagliazzo's five worlds, a distinction is often made between two worlds, one where public-key encryption exists (Cryptomania), and one in which only one-way functions exist (MiniCrypt). However, the boundaries between these worlds can change when quantum information is taken into account. Recent work has shown that quantum variants of oblivious transfer and multi-party comput…
▽ More
In the framework of Impagliazzo's five worlds, a distinction is often made between two worlds, one where public-key encryption exists (Cryptomania), and one in which only one-way functions exist (MiniCrypt). However, the boundaries between these worlds can change when quantum information is taken into account. Recent work has shown that quantum variants of oblivious transfer and multi-party computation, both primitives that are classically in Cryptomania, can be constructed from one-way functions, placing them in the realm of quantum MiniCrypt (the so-called MiniQCrypt). This naturally raises the following question: Is it possible to construct a quantum variant of public-key encryption, which is at the heart of Cryptomania, from one-way functions or potentially weaker assumptions?
In this work, we initiate the formal study of the notion of quantum public-key encryption (qPKE), i.e., public-key encryption where keys are allowed to be quantum states. We propose new definitions of security and several constructions of qPKE based on the existence of one-way functions (OWF), or even weaker assumptions, such as pseudorandom function-like states (PRFS) and pseudorandom function-like states with proof of destruction (PRFSPD). Finally, to give a tight characterization of this primitive, we show that computational assumptions are necessary to build quantum public-key encryption. That is, we give a self-contained proof that no quantum public-key encryption scheme can provide information-theoretic security.
△ Less
Submitted 20 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?
Authors:
Yuansi Chen,
Khashayar Gatmiry
Abstract:
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $ε$ error in total variation distance from a warm start by $\tilde O(d^{1/4}\text{polylog}(1/ε))$ and demonstra…
▽ More
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $ε$ error in total variation distance from a warm start by $\tilde O(d^{1/4}\text{polylog}(1/ε))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass previous analysis on Metropolis-adjusted Langevin algorithm (MALA) that has $\tilde{O}(d^{1/2}\text{polylog}(1/ε))$ dimension dependency in Wu et al. (2022), we reveal a key feature in our proof that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Moreover, to deal with another bottleneck on the HMC proposal distribution overlap control in the literature, we provide a new approach to upper bound the Kullback-Leibler divergence between push-forwards of the Gaussian distribution through HMC dynamics initialized at two different points. Notably, our analysis does not require log-concavity or independence of the marginals, and only relies on an isoperimetric inequality. To illustrate the applicability of our result, several examples of natural functions that fall into our framework are discussed.
△ Less
Submitted 8 June, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry
Authors:
Yuansi Chen,
Khashayar Gatmiry
Abstract:
We study the mixing time of Metropolis-Adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$. We assume that the target density satisfies $ψ_μ$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $Υ$ respectively. Our main result establishes that, from a warm start, to achieve $ε$-total variation distance to the target density, MALA…
▽ More
We study the mixing time of Metropolis-Adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$. We assume that the target density satisfies $ψ_μ$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $Υ$ respectively. Our main result establishes that, from a warm start, to achieve $ε$-total variation distance to the target density, MALA mixes in $O\left(\frac{(LΥ)^{\frac12}}{ψ_μ^2} \log\left(\frac{1}ε\right)\right)$ iterations. Notably, this result holds beyond the log-concave sampling setting and the mixing time depends on only $Υ$ rather than its upper bound $L d$. In the $m$-strongly logconcave and $L$-log-smooth sampling setting, our bound recovers the previous minimax mixing bound of MALA~\cite{wu2021minimax}.
△ Less
Submitted 8 June, 2023; v1 submitted 8 April, 2023;
originally announced April 2023.
-
Approximation of group explainers with coalition structure using Monte Carlo sampling on the product space of coalitions and features
Authors:
Konstandinos Kotsiopoulos,
Alexey Miroshnikov,
Khashayar Filom,
Arjun Ravi Kannan
Abstract:
In recent years, many Machine Learning (ML) explanation techniques have been designed using ideas from cooperative game theory. These game-theoretic explainers suffer from high complexity, hindering their exact computation in practical settings. In our work, we focus on a wide class of linear game values, as well as coalitional values, for the marginal game based on a given ML model and predictor…
▽ More
In recent years, many Machine Learning (ML) explanation techniques have been designed using ideas from cooperative game theory. These game-theoretic explainers suffer from high complexity, hindering their exact computation in practical settings. In our work, we focus on a wide class of linear game values, as well as coalitional values, for the marginal game based on a given ML model and predictor vector. By viewing these explainers as expectations over appropriate sample spaces, we design a novel Monte Carlo sampling algorithm that estimates them at a reduced complexity that depends linearly on the size of the background dataset. We set up a rigorous framework for the statistical analysis and obtain error bounds for our sampling methods. The advantage of this approach is that it is fast, easily implementable, and model-agnostic. Furthermore, it has similar statistical accuracy as other known estimation techniques that are more complex and model-specific. We provide rigorous proofs of statistical convergence, as well as numerical experiments whose results agree with our theoretical findings.
△ Less
Submitted 18 April, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
A Multi-Agent Adaptive Deep Learning Framework for Online Intrusion Detection
Authors:
Mahdi Soltani,
Khashayar Khajavi,
Mahdi Jafari Siavoshani,
Amir Hossein Jahangir
Abstract:
The network security analyzers use intrusion detection systems (IDSes) to distinguish malicious traffic from benign ones. The deep learning-based IDSes are proposed to auto-extract high-level features and eliminate the time-consuming and costly signature extraction process. However, this new generation of IDSes still suffers from a number of challenges. One of the main issues of an IDS is facing t…
▽ More
The network security analyzers use intrusion detection systems (IDSes) to distinguish malicious traffic from benign ones. The deep learning-based IDSes are proposed to auto-extract high-level features and eliminate the time-consuming and costly signature extraction process. However, this new generation of IDSes still suffers from a number of challenges. One of the main issues of an IDS is facing traffic concept drift which manifests itself as new (i.e., zero-day) attacks, in addition to the changing behavior of benign users/applications. Furthermore, a practical DL-based IDS needs to be conformed to a distributed architecture to handle big data challenges.
We propose a framework for adapting DL-based models to the changing attack/benign traffic behaviors, considering a more practical scenario (i.e., online adaptable IDSes). This framework employs continual deep anomaly detectors in addition to the federated learning approach to solve the above-mentioned challenges. Furthermore, the proposed framework implements sequential packet labeling for each flow, which provides an attack probability score for the flow by gradually observing each flow packet and updating its estimation. We evaluate the proposed framework by employing different deep models (including CNN-based and LSTM-based) over the CIC-IDS2017 and CSE-CIC-IDS2018 datasets. Through extensive evaluations and experiments, we show that the proposed distributed framework is well adapted to the traffic concept drift. More precisely, our results indicate that the CNN-based models are well suited for continually adapting to the traffic concept drift (i.e., achieving an average detection rate of above 95% while needing just 128 new flows for the updating phase), and the LSTM-based models are a good candidate for sequential packet labeling in practical online IDSes (i.e., detecting intrusions by just observing their first 15 packets).
△ Less
Submitted 5 March, 2023;
originally announced March 2023.
-
Nonlocality under Computational Assumptions
Authors:
Khashayar Barooti,
Alexandru Gheorghiu,
Grzegorz Głuch,
Marc-Olivier Renou
Abstract:
Nonlocality and its connections to entanglement are fundamental features of quantum mechanics that have found numerous applications in quantum information science. A set of correlations is said to be nonlocal if it cannot be reproduced by spacelike-separated parties sharing randomness and performing local operations. An important practical consideration is that the runtime of the parties has to be…
▽ More
Nonlocality and its connections to entanglement are fundamental features of quantum mechanics that have found numerous applications in quantum information science. A set of correlations is said to be nonlocal if it cannot be reproduced by spacelike-separated parties sharing randomness and performing local operations. An important practical consideration is that the runtime of the parties has to be shorter than the time it takes light to travel between them. One way to model this restriction is to assume that the parties are computationally bounded. We therefore initiate the study of nonlocality under computational assumptions and derive the following results:
(a) We define the set $\mathsf{NeL}$ (not-efficiently-local) as consisting of all bipartite states whose correlations arising from local measurements cannot be reproduced with shared randomness and \emph{polynomial-time} local operations.
(b) Under the assumption that the Learning With Errors problem cannot be solved in \emph{quantum} polynomial-time, we show that $\mathsf{NeL}=\mathsf{ENT}$, where $\mathsf{ENT}$ is the set of \emph{all} bipartite entangled states (pure and mixed). This is in contrast to the standard notion of nonlocality where it is known that some entangled states, e.g. Werner states, are local. In essence, we show that there exist (efficient) local measurements producing correlations that cannot be reproduced through shared randomness and quantum polynomial-time computation.
(c) We prove that if $\mathsf{NeL}=\mathsf{ENT}$ unconditionally, then $\mathsf{BQP}\neq\mathsf{PP}$. In other words, the ability to certify all bipartite entangled states against computationally bounded adversaries gives a non-trivial separation of complexity classes.
(d) Using (c), we show that a certain natural class of 1-round delegated quantum computation protocols that are sound against $\mathsf{PP}$ provers cannot exist.
△ Less
Submitted 28 November, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
A Simple Construction of Quantum Public-Key Encryption from Quantum-Secure One-Way Functions
Authors:
Khashayar Barooti,
Giulio Malavolta,
Michael Walter
Abstract:
Quantum public-key encryption [Gottesman; Kawachi et al., Eurocrypt'05] generalizes public-key encryption (PKE) by allowing the public keys to be quantum states. Prior work indicated that quantum PKE can be constructed from assumptions that are potentially weaker than those needed to realize its classical counterpart. In this work, we show that quantum PKE can be constructed from any quantum-secur…
▽ More
Quantum public-key encryption [Gottesman; Kawachi et al., Eurocrypt'05] generalizes public-key encryption (PKE) by allowing the public keys to be quantum states. Prior work indicated that quantum PKE can be constructed from assumptions that are potentially weaker than those needed to realize its classical counterpart. In this work, we show that quantum PKE can be constructed from any quantum-secure one-way function. In contrast, classical PKE is believed to require more structured assumptions. Our construction is simple, uses only classical ciphertexts, and satisfies the strong notion of CCA security.
△ Less
Submitted 2 March, 2023;
originally announced March 2023.
-
Sampling with Barriers: Faster Mixing via Lewis Weights
Authors:
Khashayar Gatmiry,
Jonathan Kelner,
Santosh S. Vempala
Abstract:
We analyze Riemannian Hamiltonian Monte Carlo (RHMC) for sampling a polytope defined by $m$ inequalities in $\R^n$ endowed with the metric defined by the Hessian of a convex barrier function. The advantage of RHMC over Euclidean methods such as the ball walk, hit-and-run and the Dikin walk is in its ability to take longer steps. However, in all previous work, the mixing rate has a linear dependenc…
▽ More
We analyze Riemannian Hamiltonian Monte Carlo (RHMC) for sampling a polytope defined by $m$ inequalities in $\R^n$ endowed with the metric defined by the Hessian of a convex barrier function. The advantage of RHMC over Euclidean methods such as the ball walk, hit-and-run and the Dikin walk is in its ability to take longer steps. However, in all previous work, the mixing rate has a linear dependence on the number of inequalities. We introduce a hybrid of the Lewis weights barrier and the standard logarithmic barrier and prove that the mixing rate for the corresponding RHMC is bounded by $\tilde O(m^{1/3}n^{4/3})$, improving on the previous best bound of $\tilde O(mn^{2/3})$ (based on the log barrier). This continues the general parallels between optimization and sampling, with the latter typically leading to new tools and more refined analysis. To prove our main results, we have to overcomes several challenges relating to the smoothness of Hamiltonian curves and the self-concordance properties of the barrier. In the process, we give a general framework for the analysis of Markov chains on Riemannian manifolds, derive new smoothness bounds on Hamiltonian curves, a central topic of comparison geometry, and extend self-concordance to the infinity norm, which gives sharper bounds; these properties appear to be of independent interest.
△ Less
Submitted 19 April, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
On marginal feature attributions of tree-based models
Authors:
Khashayar Filom,
Alexey Miroshnikov,
Konstandinos Kotsiopoulos,
Arjun Ravi Kannan
Abstract:
Due to their power and ease of use, tree-based machine learning models, such as random forests and gradient-boosted tree ensembles, have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g. marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such methods are true to the model and implementation invariant, i.e. dependent onl…
▽ More
Due to their power and ease of use, tree-based machine learning models, such as random forests and gradient-boosted tree ensembles, have become very popular. To interpret them, local feature attributions based on marginal expectations, e.g. marginal (interventional) Shapley, Owen or Banzhaf values, may be employed. Such methods are true to the model and implementation invariant, i.e. dependent only on the input-output function of the model. We contrast this with the popular TreeSHAP algorithm by presenting two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features, whereas the marginal Shapley values coincide. Furthermore, we discuss how the internal structure of tree-based models may be leveraged to help with computing their marginal feature attributions according to a linear game value. One important observation is that these are simple (piecewise-constant) functions with respect to a certain grid partition of the input space determined by the trained model. Another crucial observation, showcased by experiments with XGBoost, LightGBM and CatBoost libraries, is that only a portion of all features appears in a tree from the ensemble. Thus, the complexity of computing marginal Shapley (or Owen or Banzhaf) feature attributions may be reduced. This remains valid for a broader class of game values which we shall axiomatically characterize. A prime example is the case of CatBoost models where the trees are oblivious (symmetric) and the number of features in each of them is no larger than the depth. We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models. This results in a fast, accurate algorithm for estimating these feature attributions.
△ Less
Submitted 5 May, 2024; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Online Tool Selection with Learned Grasp Prediction Models
Authors:
Khashayar Rohanimanesh,
Jake Metzger,
William Richards,
Aviv Tamar
Abstract:
Deep learning-based grasp prediction models have become an industry standard for robotic bin-picking systems. To maximize pick success, production environments are often equipped with several end-effector tools that can be swapped on-the-fly, based on the target object. Tool-change, however, takes time. Choosing the order of grasps to perform, and corresponding tool-change actions, can improve sys…
▽ More
Deep learning-based grasp prediction models have become an industry standard for robotic bin-picking systems. To maximize pick success, production environments are often equipped with several end-effector tools that can be swapped on-the-fly, based on the target object. Tool-change, however, takes time. Choosing the order of grasps to perform, and corresponding tool-change actions, can improve system throughput; this is the topic of our work. The main challenge in planning tool change is uncertainty - we typically cannot see objects in the bin that are currently occluded. Inspired by queuing and admission control problems, we model the problem as a Markov Decision Process (MDP), where the goal is to maximize expected throughput, and we pursue an approximate solution based on model predictive control, where at each time step we plan based only on the currently visible objects. Special to our method is the idea of void zones, which are geometrical boundaries in which an unknown object will be present, and therefore cannot be accounted for during planning. Our planning problem can be solved using integer linear programming (ILP). However, we find that an approximate solution based on sparse tree search yields near optimal performance at a fraction of the time. Another question that we explore is how to measure the performance of tool-change planning: we find that throughput alone can fail to capture delicate and smooth behavior, and propose a principled alternative. Finally, we demonstrate our algorithms on both synthetic and real world bin picking tasks.
△ Less
Submitted 15 February, 2023;
originally announced February 2023.
-
Near-Optimal Algorithms for Group Distributionally Robust Optimization and Beyond
Authors:
Tasuku Soma,
Khashayar Gatmiry,
Sharut Gupta,
Stefanie Jegelka
Abstract:
Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also pro…
▽ More
Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound that implies our bounds are tight for group DRO. Empirically, too, our algorithms outperform known methods.
△ Less
Submitted 31 January, 2025; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Augmenting Diffs With Runtime Information
Authors:
Khashayar Etemadi,
Aman Sharma,
Fernanda Madeiral,
Martin Monperrus
Abstract:
Source code diffs are used on a daily basis as part of code review, inspection, and auditing. To facilitate understanding, they are typically accompanied by explanations that describe the essence of what is changed in the program. As manually crafting high-quality explanations is a cumbersome task, researchers have proposed automatic techniques to generate code diff explanations. Existing explanat…
▽ More
Source code diffs are used on a daily basis as part of code review, inspection, and auditing. To facilitate understanding, they are typically accompanied by explanations that describe the essence of what is changed in the program. As manually crafting high-quality explanations is a cumbersome task, researchers have proposed automatic techniques to generate code diff explanations. Existing explanation generation methods solely focus on static analysis, i.e., they do not take advantage of runtime information to explain code changes. In this paper, we propose Collector-Sahab, a novel tool that augments code diffs with runtime difference information. Collector-Sahab compares the program states of the original (old) and patched (new) versions of a program to find unique variable values. Then, Collector-Sahab adds this novel runtime information to the source code diff as shown, for instance, in code reviewing systems. As an evaluation, we run Collector-Sahab on 584 code diffs for Defects4J bugs and find it successfully augments the code diff for 95% (555/584) of them. We also perform a user study and ask eight participants to score the augmented code diffs generated by Collector-Sahab. Per this user study, we conclude that developers find the idea of adding runtime data to code diffs promising and useful. Overall, our experiments show the effectiveness and usefulness of Collector-Sahab in augmenting code diffs with runtime difference information. Publicly-available repository: https://github.com/ASSERT-KTH/collector-sahab.
△ Less
Submitted 30 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Non-invasive Liver Fibrosis Screening on CT Images using Radiomics
Authors:
Jay J. Yoo,
Khashayar Namdar,
Sean Carey,
Sandra E. Fischer,
Chris McIntosh,
Farzad Khalvati,
Patrik Rogalla
Abstract:
Objectives: To develop and evaluate a radiomics machine learning model for detecting liver fibrosis on CT of the liver.
Methods: For this retrospective, single-centre study, radiomic features were extracted from Regions of Interest (ROIs) on CT images of patients who underwent simultaneous liver biopsy and CT examinations. Combinations of contrast, normalization, machine learning model, and feat…
▽ More
Objectives: To develop and evaluate a radiomics machine learning model for detecting liver fibrosis on CT of the liver.
Methods: For this retrospective, single-centre study, radiomic features were extracted from Regions of Interest (ROIs) on CT images of patients who underwent simultaneous liver biopsy and CT examinations. Combinations of contrast, normalization, machine learning model, and feature selection method were determined based on their mean test Area Under the Receiver Operating Characteristic curve (AUC) on randomly placed ROIs. The combination and selected features with the highest AUC were used to develop a final liver fibrosis screening model.
Results: The study included 101 male and 68 female patients (mean age = 51.2 years $\pm$ 14.7 [SD]). When averaging the AUC across all combinations, non-contrast enhanced (NC) CT (AUC, 0.6100; 95% CI: 0.5897, 0.6303) outperformed contrast-enhanced CT (AUC, 0.5680; 95% CI: 0.5471, 0.5890). The combination of hyperparameters and features that yielded the highest AUC was a logistic regression model with inputs features of maximum, energy, kurtosis, skewness, and small area high gray level emphasis extracted from non-contrast enhanced NC CT normalized using Gamma correction with $γ$ = 1.5 (AUC, 0.7833; 95% CI: 0.7821, 0.7845), (sensitivity, 0.9091; 95% CI: 0.9091, 0.9091).
Conclusions: Radiomics-based machine learning models allow for the detection of liver fibrosis with reasonable accuracy and high sensitivity on NC CT. Thus, these models can be used to non-invasively screen for liver fibrosis, contributing to earlier detection of the disease at a potentially curable stage.
△ Less
Submitted 26 February, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Automating Cobb Angle Measurement for Adolescent Idiopathic Scoliosis using Instance Segmentation
Authors:
Chaojun Chen,
Khashayar Namdar,
Yujie Wu,
Shahob Hosseinpour,
Manohar Shroff,
Andrea S. Doria,
Farzad Khalvati
Abstract:
Scoliosis is a three-dimensional deformity of the spine, most often diagnosed in childhood. It affects 2-3% of the population, which is approximately seven million people in North America. Currently, the reference standard for assessing scoliosis is based on the manual assignment of Cobb angles at the site of the curvature center. This manual process is time consuming and unreliable as it is affec…
▽ More
Scoliosis is a three-dimensional deformity of the spine, most often diagnosed in childhood. It affects 2-3% of the population, which is approximately seven million people in North America. Currently, the reference standard for assessing scoliosis is based on the manual assignment of Cobb angles at the site of the curvature center. This manual process is time consuming and unreliable as it is affected by inter- and intra-observer variance. To overcome these inaccuracies, machine learning (ML) methods can be used to automate the Cobb angle measurement process. This paper proposes to address the Cobb angle measurement task using YOLACT, an instance segmentation model. The proposed method first segments the vertebrae in an X-Ray image using YOLACT, then it tracks the important landmarks using the minimum bounding box approach. Lastly, the extracted landmarks are used to calculate the corresponding Cobb angles. The model achieved a Symmetric Mean Absolute Percentage Error (SMAPE) score of 10.76%, demonstrating the reliability of this process in both vertebra localization and Cobb angle measurement.
△ Less
Submitted 25 November, 2022;
originally announced November 2022.