-
Large Language and Reasoning Models are Shallow Disjunctive Reasoners
Authors:
Irtaza Khalid,
Amir Masoud Nourollah,
Steven Schockaert
Abstract:
Large Language Models (LLMs) have been found to struggle with systematic reasoning. Even on tasks where they appear to perform well, their performance often depends on shortcuts, rather than on genuine reasoning abilities, leading them to collapse on out-of-distribution (OOD) examples. Post-training strategies based on reinforcement learning and chain-of-thought prompting have recently been hailed as a step change. However, little is known about the potential of the resulting "Large Reasoning Models" (LRMs) beyond maths and programming-based problem solving, where genuine OOD problems can be sparse. In this paper, we focus on tasks that require systematic relational composition for qualitative spatial and temporal reasoning. The setting allows fine control over problem difficulty to precisely measure OOD generalization. We find that zero-shot LRMs generally outperform their LLM counterparts in single-path reasoning tasks but struggle in the multi-path setting. Whilst showing comparatively better results, fine-tuned LLMs are also not capable of multi-path generalization. We also provide evidence for the behavioral interpretation of this, i.e., that LRMs are shallow disjunctive reasoners.
Submitted 2 June, 2025; v1 submitted 30 March, 2025;
originally announced March 2025.
-
Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
Authors:
Zara Siddique,
Irtaza Khalid,
Liam D. Turner,
Luis Espinosa-Anke
Abstract:
We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes. When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively. Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender. By leveraging their collective strength, SVE outperforms individual steering vectors in both bias reduction and maintaining model performance. The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.
Submitted 7 March, 2025;
originally announced March 2025.
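The averaging step that the abstract attributes to Steering Vector Ensembles can be illustrated with a minimal sketch. It assumes each steering vector is a plain list of floats with the same dimension as the activation it modifies; the function names, the `scale` parameter, and the choice to add the vector directly to an activation are hypothetical and are not taken from the paper.

```python
def average_steering_vectors(vectors):
    """Element-wise mean of several steering vectors (one per bias axis)."""
    if not vectors:
        raise ValueError("need at least one steering vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all steering vectors must share one dimension")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def apply_steering(activation, steering, scale=1.0):
    """Shift an activation by a scaled steering vector (assumed additive)."""
    if len(activation) != len(steering):
        raise ValueError("activation and steering vector sizes differ")
    return [a + scale * s for a, s in zip(activation, steering)]


# Hypothetical per-axis vectors, e.g. one tuned for age and one for gender.
age_vector = [1.0, 0.0]
gender_vector = [3.0, 2.0]
ensemble = average_steering_vectors([age_vector, gender_vector])
steered = apply_steering([0.5, 0.5], ensemble, scale=0.5)
```

In a real model the activation would be a hidden-state tensor at a chosen layer; which layer and scale to use are tuning decisions that this sketch leaves open.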
-
Systematic Relational Reasoning With Epistemic Graph Neural Networks
Authors:
Irtaza Khalid,
Steven Schockaert
Abstract:
Developing models that can learn to reason is a notoriously challenging problem. We focus on reasoning in relational domains, where the use of Graph Neural Networks (GNNs) seems like a natural choice. However, previous work has shown that regular GNNs lack the ability to systematically generalize from training examples on test graphs requiring longer inference chains, which fundamentally limits their reasoning abilities. A common solution relies on neuro-symbolic methods that systematically reason by learning rules, but their scalability is often limited and they tend to make unrealistically strong assumptions, e.g., that the answer can always be inferred from a single relational path. We propose the Epistemic GNN (EpiGNN), a novel parameter-efficient and scalable GNN architecture with an epistemic inductive bias for systematic reasoning. Node embeddings in EpiGNNs are treated as epistemic states, and message passing is implemented accordingly. We show that EpiGNNs achieve state-of-the-art results on link prediction tasks that require systematic reasoning. Furthermore, for inductive knowledge graph completion, EpiGNNs rival the performance of state-of-the-art specialized approaches. Finally, we introduce two new benchmarks that go beyond standard relational reasoning by requiring the aggregation of information from multiple paths. Here, existing neuro-symbolic approaches fail, yet EpiGNNs learn to reason accurately. Code and datasets are available at https://github.com/erg0dic/gnn-sg.
Submitted 27 February, 2025; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Sample-efficient Model-based Reinforcement Learning for Quantum Control
Authors:
Irtaza Khalid,
Carrie A. Weidner,
Edmond A. Jonckheere,
Sophie G. Shermer,
Frank C. Langbein
Abstract:
We propose a model-based reinforcement learning (RL) approach for noisy time-dependent gate optimization with improved sample complexity over model-free RL. Sample complexity is the number of controller interactions with the physical system. Leveraging an inductive bias, inspired by recent advances in neural ordinary differential equations (ODEs), we use an auto-differentiable ODE parametrised by a learnable Hamiltonian ansatz to represent the model approximating the environment whose time-dependent part, including the control, is fully known. Control alongside Hamiltonian learning of continuous time-independent parameters is addressed through interactions with the system. We demonstrate an order of magnitude advantage in the sample complexity of our method over standard model-free RL in preparing some standard unitary gates with closed and open system dynamics, in realistic numerical experiments incorporating single shot measurements, arbitrary Hilbert space truncations and uncertainty in Hamiltonian parameters. Also, the learned Hamiltonian can be leveraged by existing control methods like GRAPE for further gradient-based optimization with the controllers found by RL as initializations. Our algorithm, which we apply to nitrogen vacancy (NV) centers and transmons in this paper, is well suited for controlling partially characterised one- and two-qubit systems.
Submitted 2 October, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Analyzing and Unifying Robustness Measures for Excitation Transfer Control in Spin Networks
Authors:
S. P. O'Neil,
I. Khalid,
A. A. Rompokos,
C. A. Weidner,
F. C. Langbein,
S. G. Schirmer,
E. A. Jonckheere
Abstract:
Recent achievements in quantum control have resulted in advanced techniques for designing controllers for applications in quantum communication, computing, and sensing. However, the susceptibility of such systems to noise and uncertainties necessitates robust controllers that perform effectively under these conditions to realize the full potential of quantum devices. The time-domain log-sensitivity and a recently introduced robustness infidelity measure (RIM) are two means to quantify controller robustness in quantum systems. The former can be found analytically, while the latter requires Monte-Carlo sampling. In this work, the correlation between the log-sensitivity and the RIM for evaluating the robustness of single excitation transfer fidelity in spin chains and rings in the presence of dephasing is investigated. We show that the expected differential sensitivity of the error agrees with the differential sensitivity of the RIM, where the expectation is over the error probability distribution. Statistical analysis also demonstrates that the log-sensitivity and the RIM are linked via the differential sensitivity, and that the differential sensitivity and RIM are highly concordant. This unification of two means (one analytic and one via sampling) to assess controller robustness in a variety of realistic scenarios provides a first step in unifying various tools to model and assess robustness of quantum controllers.
Submitted 14 June, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Statistically Characterising Robustness and Fidelity of Quantum Controls and Quantum Control Algorithms
Authors:
Irtaza Khalid,
Carrie A. Weidner,
Edmond A. Jonckheere,
Sophie G. Shermer,
Frank C. Langbein
Abstract:
Robustness of quantum operations or controls is important to build reliable quantum devices. The robustness-infidelity measure (RIM$_p$) is introduced to statistically quantify the robustness and fidelity of a controller as the p-order Wasserstein distance between the fidelity distribution of the controller under any uncertainty and an ideal fidelity distribution. The RIM$_p$ is the p-th root of the p-th raw moment of the infidelity distribution. Using a metrization argument, we justify why RIM$_1$ (the average infidelity) suffices as a practical robustness measure. Based on the RIM$_p$, an algorithmic robustness-infidelity measure (ARIM) is developed to quantify the expected robustness and fidelity of controllers found by a control algorithm. The utility of the RIM and ARIM is demonstrated by considering the problem of robust control of spin-$\tfrac{1}{2}$ networks using energy landscape shaping subject to Hamiltonian uncertainty. The robustness and fidelity of individual control solutions as well as the expected robustness and fidelity of controllers found by different popular quantum control algorithms are characterized. For algorithm comparisons, stochastic and non-stochastic optimization objectives are considered, with the goal of effective RIM optimization in the latter. Although high fidelity and robustness are often conflicting objectives, some high fidelity, robust controllers can usually be found, irrespective of the choice of the quantum control algorithm. However, for noisy optimization objectives, adaptive sequential decision making approaches such as reinforcement learning have a cost advantage compared to standard control algorithms and, in contrast, the infidelities obtained are more consistent with higher RIM values for low noise levels.
Submitted 1 March, 2023; v1 submitted 15 July, 2022;
originally announced July 2022.
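The abstract's characterization of RIM$_p$ as the p-th root of the p-th raw moment of the infidelity distribution can be estimated from samples. The following is a minimal sketch, assuming the fidelities are already available as a list of floats in [0, 1] and that infidelity is defined as 1 minus fidelity; the function name `rim` and the sample values are hypothetical, not the authors' implementation.

```python
def rim(fidelities, p=1):
    """Sample estimate of RIM_p: (mean of (1 - F)**p) ** (1 / p)."""
    if not fidelities:
        raise ValueError("need at least one fidelity sample")
    if p < 1:
        raise ValueError("p must be at least 1")
    infidelities = [1.0 - f for f in fidelities]
    # p-th raw moment of the empirical infidelity distribution.
    moment = sum(x ** p for x in infidelities) / len(infidelities)
    return moment ** (1.0 / p)


# Hypothetical fidelities of one controller under sampled uncertainty.
samples = [1.0, 0.9, 0.8]
rim_1 = rim(samples, p=1)  # the average infidelity
rim_2 = rim(samples, p=2)  # weights large infidelities more heavily
```

For p = 1 the estimate reduces to the average infidelity, which the abstract argues suffices as a practical robustness measure.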
-
Reinforcement Learning vs. Gradient-Based Optimisation for Robust Energy Landscape Control of Spin-1/2 Quantum Networks
Authors:
I. Khalid,
C. A. Weidner,
E. A. Jonckheere,
S. G. Schirmer,
F. C. Langbein
Abstract:
We explore the use of policy gradient methods in reinforcement learning for quantum control via energy landscape shaping of XX-Heisenberg spin chains in a model agnostic fashion. Their performance is compared to finding controllers using gradient-based L-BFGS optimisation with restarts, with full access to an analytical model. Hamiltonian noise and coarse-graining of fidelity measurements are considered. Reinforcement learning is able to tackle challenging, noisy quantum control problems where L-BFGS optimization algorithms struggle to perform well. Robustness analysis under different levels of Hamiltonian noise indicates that controllers found by reinforcement learning appear to be less affected by noise than those found with L-BFGS.
Submitted 15 September, 2021;
originally announced September 2021.
-
Customer Engagement Plans for Peak Load Reduction in Residential Smart Grids
Authors:
Naveed Ul Hassan,
Yawar Ismail Khalid,
Chau Yuen,
Wayes Tushar
Abstract:
In this paper, we propose and study the effectiveness of customer engagement plans that clearly specify the amount of intervention in customers' load settings by the grid operator for peak load reduction. We suggest two types of plans: Constant Deviation Plans (CDPs) and Proportional Deviation Plans (PDPs). We define an adjustable reference temperature for both CDPs and PDPs to limit the output temperature of each thermostat load and to control the number of devices eligible to participate in the Demand Response Program (DRP). We model thermostat loads as power throttling devices and design algorithms to evaluate the impact of power throttling states and plan parameters on peak load reduction. Based on the simulation results, we recommend PDPs to the customers of a residential community with variable thermostat set point preferences, while CDPs are suitable for customers with similar thermostat set point preferences. If thermostat loads have multiple power throttling states, customer engagement plans with smaller temperature deviations from thermostat set points are recommended. Contrary to classical ON/OFF control, higher temperature deviations are required to achieve a similar amount of peak load reduction. Several other interesting tradeoffs and useful guidelines for designing mutually beneficial incentives for both the grid operator and customers can also be identified.
Submitted 13 February, 2015;
originally announced February 2015.
-
Demand Response Management For Power Throttling Air Conditioning Loads In Residential Smart Grids
Authors:
Yawar Ismail Khalid,
Naveed Ul Hassan,
Chau Yuen,
Shisheng Huang
Abstract:
In this paper, we develop an algorithm for peak load reduction to reduce the impact of increased air conditioner usage in a residential smart grid community. We develop Demand Response Management (DRM) plans that clearly spell out the maximum duration as well as the maximum severity of inconvenience. We model the air conditioner as a power throttling device and, for any given DRM plan, we study the impact of increasing the number of power states on the resulting peak load reduction. Through simulations, we find that adding just one additional state to the basic ON/OFF model, which can throttle power to 50% of the rated air conditioner power, can result in a significant amount of peak reduction. However, the additional peak load reduction diminishes as the number of states increases. Furthermore, we also observe the impact of inconvenience duration and inconvenience severity in terms of peak load reduction. These observations can serve as useful guidelines for developing appropriate DRM plans.
Submitted 6 August, 2014;
originally announced August 2014.