-
Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs
Authors:
Hector Kohler,
Quentin Delfosse,
Waris Radji,
Riad Akrour,
Philippe Preux
Abstract:
There exist applications of reinforcement learning, such as medicine, where policies need to be "interpretable" by humans. User studies have shown that some policy classes might be more interpretable than others. However, it is costly to conduct human studies of policy interpretability. Furthermore, there is no clear definition of policy interpretability, i.e., no clear metrics for interpretability, and thus claims depend on the chosen definition. We tackle the problem of empirically evaluating policy interpretability without humans. Despite this lack of clear definition, researchers agree on the notion of "simulatability": policy interpretability should relate to how humans understand policy actions given states. To advance research in interpretable reinforcement learning, we contribute a new methodology to evaluate policy interpretability. This new methodology relies on proxies for simulatability that we use to conduct a large-scale empirical evaluation of policy interpretability. We use imitation learning to compute baseline policies by distilling expert neural networks into small programs. We then show that using our methodology to evaluate the baselines' interpretability leads to similar conclusions as user studies. We show that increasing interpretability does not necessarily reduce performance and can sometimes increase it. We also show that there is no policy class that better trades off interpretability and performance across tasks, making it necessary for researchers to have methodologies for comparing policies' interpretability.
Submitted 11 March, 2025;
originally announced March 2025.
-
Stable Tree Labelling for Accelerating Distance Queries on Dynamic Road Networks
Authors:
Henning Koehler,
Muhammad Farhan,
Qing Wang
Abstract:
Finding the shortest-path distance between two arbitrary vertices is an important problem in road networks. Due to real-time traffic conditions, road networks undergo dynamic changes all the time. Current state-of-the-art methods incrementally maintain a distance labelling based on a hierarchy among vertices to support efficient distance computation. However, their labelling sizes are often large and cannot be efficiently maintained. To combat these issues, we present a simple yet efficient labelling method, namely Stable Tree Labelling (STL), for answering distance queries on dynamic road networks. We observe that the properties of an underlying hierarchy play an important role in improving and balancing query and update performance. Thus, we introduce the notion of stable tree hierarchy which lays the ground for developing efficient maintenance algorithms on dynamic road networks. Based on stable tree hierarchy, STL can be efficiently constructed as a 2-hop labelling. A crucial ingredient of STL is to only store distances within subgraphs in labels, rather than distances in the entire graph, which restricts the labels affected by dynamic changes. We further develop two efficient maintenance algorithms upon STL: Label Search algorithm and Pareto Search algorithm. Label Search algorithm identifies affected ancestors in a stable tree hierarchy and performs efficient searches to update labels from those ancestors. Pareto Search algorithm explores the interaction between search spaces of different ancestors, and combines searches from multiple ancestors into only two searches for each update, eliminating duplicate graph traversals. The experiments show that our algorithms significantly outperform state-of-the-art dynamic methods in maintaining the labelling and query processing, while requiring an order of magnitude less space.
Submitted 28 January, 2025;
originally announced January 2025.
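The abstract above notes that STL is constructed as a 2-hop labelling. As a rough illustration of how a generic 2-hop labelling answers a distance query (this is not the STL construction or its maintenance algorithms), the sketch below takes the minimum of d(u, h) + d(h, v) over hubs h shared by both labels. The function name `two_hop_distance`, the dictionary layout, and the toy graph are assumptions made for this example, and the graph is assumed to be undirected.

```python
import math


def two_hop_distance(labels, u, v):
    """Distance via a 2-hop labelling: min over common hubs h of d(u,h) + d(h,v).

    `labels` maps each vertex to a dict {hub: distance to that hub}.
    Returns math.inf when the two labels share no hub.
    """
    label_u, label_v = labels[u], labels[v]
    # Iterate over the smaller label and probe the larger one.
    if len(label_u) > len(label_v):
        label_u, label_v = label_v, label_u
    best = math.inf
    for hub, dist_to_hub in label_u.items():
        other = label_v.get(hub)
        if other is not None:
            best = min(best, dist_to_hub + other)
    return best


# Hypothetical toy input: path graph a - b - c with unit edge weights.
# Hub b lies on every shortest path between distinct vertices.
labels = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"c": 0, "b": 1},
}
```

A query such as `two_hop_distance(labels, "a", "c")` only touches the two labels involved, which is why label size matters for both query time and update cost.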
-
Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
Authors:
Hector Kohler,
Quentin Delfosse,
Riad Akrour,
Kristian Kersting,
Philippe Preux
Abstract:
Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and the trust necessary for real-world deployment. So far, solutions learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tRee Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER's compact tree programs match oracles across a diverse set of sequential decision tasks and evaluate the impact of our design choices on interpretability and performance. We show that our policies can be interpreted and edited to correct misalignments on Atari games and to explain real farming strategies.
Submitted 23 May, 2024;
originally announced May 2024.
-
PID Tuning using Cross-Entropy Deep Learning: a Lyapunov Stability Analysis
Authors:
Hector Kohler,
Benoit Clement,
Thomas Chaffre,
Gilles Le Chenadec
Abstract:
Underwater Unmanned Vehicles (UUVs) have to constantly compensate for the external disturbing forces acting on their body. Adaptive Control theory is commonly used there to grant the control law some flexibility in its response to process variation. Today, learning-based (LB) adaptive methods are leading the field, where model-based control structures are combined with deep model-free learning algorithms. This work proposes experiments and metrics to empirically study the stability of such a controller. We perform this stability analysis on an LB adaptive control system whose adaptive parameters are determined using a Cross-Entropy Deep Learning method.
Submitted 18 April, 2024;
originally announced April 2024.
-
Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop
Authors:
Hector Kohler,
Quentin Delfosse,
Paul Festor,
Philippe Preux
Abstract:
Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: what distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in policies, without user studies? What reinforcement learning paradigms are the most suited to develop interpretable agents? Can Markov Decision Processes integrate interpretable state representations? In addition to motivating an Interpretable RL community centered around the aforementioned questions, we propose the first venue dedicated to Interpretable RL: the InterpPol Workshop.
Submitted 16 April, 2024;
originally announced April 2024.
-
Hierarchical Cut Labelling -- Scaling Up Distance Queries on Road Networks
Authors:
Muhammad Farhan,
Henning Koehler,
Robert Ohms,
Qing Wang
Abstract:
Answering the shortest-path distance between two arbitrary locations is a fundamental problem in road networks. Labelling-based solutions are the current state of the art for rendering fast response times, and can generally be categorised into hub-based labellings, highway-based labellings, and tree decomposition labellings. Hub-based and highway-based labellings exploit hierarchical structures of road networks with the aim to reduce labelling size for improving query efficiency. However, these solutions still result in large search spaces on distance labels at query time, particularly when road networks are large. Tree decomposition labellings leverage a hierarchy of vertices to reduce search spaces over distance labels at query time, but such a hierarchy is generated using tree decomposition techniques, which may yield very large labelling sizes and slow querying. In this paper, we propose a novel solution, hierarchical cut 2-hop labelling (HC2L), to address the drawbacks of the existing works. Our solution combines the benefits of hierarchical structures from both perspectives - reduce the size of a distance labelling at preprocessing time and further reduce the search space on a distance labelling at query time. At its core, we propose a new hierarchy, balanced tree hierarchy, which enables a fast, efficient data structure to reduce the size of distance labelling and to select a very small subset of labels to compute the shortest-path distance at query time. To speed up the construction process of HC2L, we further propose a parallel variant of our method, namely HC2L$^p$. We have evaluated our solution on 10 large real-world road networks through extensive experiments.
Submitted 18 November, 2023;
originally announced November 2023.
-
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Authors:
Hector Kohler,
Riad Akrour,
Philippe Preux
Abstract:
Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g. a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade off size and performance of DTs. In practice, a reactive policy for a partially observable Markov decision process (MDP) needs to be learned, which is still an open problem. We show in this paper that deep RL can fail even on simple toy tasks of this class. However, when the underlying decision problem is a supervised classification task, we show that finding the optimal tree can be cast as a fully observable Markov decision problem and be solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones.
Submitted 21 January, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Breiman meets Bellman: Non-Greedy Decision Trees with MDPs
Authors:
Hector Kohler,
Riad Akrour,
Philippe Preux
Abstract:
In supervised learning, decision trees are valued for their interpretability and performance. While greedy decision tree algorithms like CART remain widely used due to their computational efficiency, they often produce sub-optimal solutions with respect to a regularized training loss. Conversely, optimal decision tree methods can find better solutions but are computationally intensive and typically limited to shallow trees or binary features. We present Dynamic Programming Decision Trees (DPDT), a framework that bridges the gap between greedy and optimal approaches. DPDT relies on a Markov Decision Process formulation combined with heuristic split generation to construct near-optimal decision trees with significantly reduced computational complexity. Our approach dynamically limits the set of admissible splits at each node while directly optimizing the tree's regularized training loss. Theoretical analysis demonstrates that DPDT can minimize regularized training losses at least as well as CART. Our empirical study shows on multiple datasets that DPDT achieves near-optimal loss with orders of magnitude fewer operations than existing optimal solvers. More importantly, extensive benchmarking suggests statistically significant improvements of DPDT over both CART and optimal decision trees in terms of generalization to unseen data. We demonstrate DPDT's practicality through applications to boosting, where it consistently outperforms baselines. Our framework provides a promising direction for developing efficient, near-optimal decision tree algorithms that scale to practical applications.
Submitted 1 June, 2025; v1 submitted 22 September, 2023;
originally announced September 2023.
-
AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
Authors:
Timothée Mathieu,
Riccardo Della Vecchia,
Alena Shilova,
Matheus Medeiros Centa,
Hector Kohler,
Odalric-Ambrym Maillard,
Philippe Preux
Abstract:
Recently, the scientific community has questioned the statistical reproducibility of many empirical results, especially in the field of machine learning. To contribute to the resolution of this reproducibility crisis, we propose a theoretically sound methodology for comparing the performance of a set of algorithms. We exemplify our methodology in Deep Reinforcement Learning (Deep RL). The performance of one execution of a Deep RL algorithm is a random variable. Therefore, several independent executions are needed to evaluate its performance. When comparing algorithms with random performance, a major question concerns the number of executions to perform to ensure that the result of the comparison is theoretically sound. Researchers in Deep RL often use fewer than 5 independent executions to compare algorithms: we claim that this is not enough in general. Moreover, when comparing more than 2 algorithms at once, we have to use a multiple tests procedure to preserve low error guarantees. We introduce AdaStop, a new statistical test based on multiple group sequential tests. When used to compare algorithms, AdaStop adapts the number of executions to stop as early as possible while ensuring that enough information has been collected to distinguish algorithms that have different score distributions. We prove theoretically that AdaStop has a low probability of making a (family-wise) error. We illustrate the effectiveness of AdaStop in various use-cases, including toy examples and Deep RL algorithms on challenging MuJoCo environments. AdaStop is the first statistical test fitted to this sort of comparison: it is both a significant contribution to statistics, and an important contribution to computational studies performed in reinforcement learning and in other domains.
Submitted 12 December, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Lp- and Risk Consistency of Localized SVMs
Authors:
Hannes Köhler
Abstract:
Kernel-based regularized risk minimizers, also called support vector machines (SVMs), are known to possess many desirable properties but suffer from their super-linear computational requirements when dealing with large data sets. This problem can be tackled by using localized SVMs instead, which also offer the additional advantage of being able to apply different hyperparameters to different regions of the input space. In this paper, localized SVMs are analyzed with regard to their consistency. It is proven that they inherit $L_p$- as well as risk consistency from global SVMs under very weak conditions and even if the regions underlying the localized SVMs are allowed to change as the size of the training data set increases.
Submitted 16 May, 2023;
originally announced May 2023.
-
Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning
Authors:
Hector Kohler,
Riad Akrour,
Philippe Preux
Abstract:
Interpretability of AI models allows for user safety checks to build trust in these models. In particular, decision trees (DTs) provide a global view of the learned model and clearly outline the role of the features that are critical to classify a given data point. However, interpretability is hindered if the DT is too large. To learn compact trees, a Reinforcement Learning (RL) framework has been recently proposed to explore the space of DTs. A given supervised classification task is modeled as a Markov decision problem (MDP) and then augmented with additional actions that gather information about the features, equivalent to building a DT. By appropriately penalizing these actions, the RL agent learns to optimally trade off size and performance of a DT. However, to do so, this RL agent has to solve a partially observable MDP. The main contribution of this paper is to prove that it is sufficient to solve a fully observable problem to learn a DT optimizing the interpretability-performance trade-off. As such, any planning or RL algorithm can be used. We demonstrate the effectiveness of this approach on a set of classical supervised classification datasets and compare our approach with other interpretability-performance optimizing methods.
Submitted 11 April, 2023;
originally announced April 2023.
-
On the Connection between $L_p$ and Risk Consistency and its Implications on Regularized Kernel Methods
Authors:
Hannes Köhler
Abstract:
As a predictor's quality is often assessed by means of its risk, it is natural to regard risk consistency as a desirable property of learning methods, and many such methods have indeed been shown to be risk consistent. The first aim of this paper is to establish the close connection between risk consistency and $L_p$-consistency for a considerably wider class of loss functions than has been done before. The attempt to transfer this connection to shifted loss functions surprisingly reveals that this shift does not reduce the assumptions needed on the underlying probability measure to the same extent as it does for many other results. The results are applied to regularized kernel methods such as support vector machines.
Submitted 27 March, 2023;
originally announced March 2023.
-
BatchHL: Answering Distance Queries on Batch-Dynamic Networks at Scale
Authors:
Muhammad Farhan,
Qing Wang,
Henning Koehler
Abstract:
Many real-world applications operate on dynamic graphs that undergo rapid changes in their topological structure over time. However, it is challenging to design dynamic algorithms that are capable of supporting such graph changes efficiently. To circumvent the challenge, we propose a batch-dynamic framework for answering distance queries, which combines offline labelling and online searching to leverage the advantages from both sides - accelerating query processing through a partial distance labelling that is of limited size but provides a good approximation to bound online searches. We devise batch-dynamic algorithms to dynamize a distance labelling efficiently in order to reflect batch updates on the underlying graph. In addition to providing theoretical analysis for the correctness, labelling minimality, and computational complexity, we have conducted experiments on 14 real-world networks to empirically verify the efficiency and scalability of the proposed algorithms.
Submitted 23 April, 2022;
originally announced April 2022.
-
Query-by-Sketch: Scaling Shortest Path Graph Queries on Very Large Networks
Authors:
Ye Wang,
Qing Wang,
Henning Koehler,
Yu Lin
Abstract:
Computing shortest paths is a fundamental operation in processing graph data. In many real-world applications, discovering shortest paths between two vertices empowers us to make full use of the underlying structure to understand how vertices are related in a graph, e.g. the strength of social ties between individuals in a social network. In this paper, we study the shortest-path-graph problem that aims to efficiently compute a shortest path graph containing exactly all shortest paths between any arbitrary pair of vertices on complex networks. Our goal is to design an exact solution that can scale to graphs with millions or billions of vertices and edges. To achieve high scalability, we propose a novel method, Query-by-Sketch (QbS), which efficiently leverages offline labelling (i.e., precomputed labels) to guide online searching through a fast sketching process that summarizes the important structural aspects of shortest paths in answering shortest-path-graph queries. We theoretically prove the correctness of this method and analyze its computational complexity. To empirically verify the efficiency of QbS, we conduct experiments on 12 real-world datasets, among which the largest dataset has 1.7 billion vertices and 7.8 billion edges. The experimental results show that QbS can answer shortest-path graph queries in microseconds for million-scale graphs and less than half a second for billion-scale graphs.
Submitted 19 April, 2021;
originally announced April 2021.
-
Total Stability of SVMs and Localized SVMs
Authors:
Hannes Köhler,
Andreas Christmann
Abstract:
Regularized kernel-based methods such as support vector machines (SVMs) typically depend on the underlying probability measure $\mathrm{P}$ (respectively an empirical measure $\mathrm{D}_n$ in applications) as well as on the regularization parameter $λ$ and the kernel $k$. Whereas classical statistical robustness only considers the effect of small perturbations in $\mathrm{P}$, the present paper investigates the influence of simultaneous slight variations in the whole triple $(\mathrm{P},λ,k)$, respectively $(\mathrm{D}_n,λ_n,k)$, on the resulting predictor. Existing results from the literature are considerably generalized and improved. In order to also make them applicable to big data, where regular SVMs suffer from their super-linear computational requirements, we show how our results can be transferred to the context of localized learning. Here, the effect of slight variations in the applied regionalization, which might for example stem from changes in $\mathrm{P}$ respectively $\mathrm{D}_n$, is considered as well.
Submitted 29 January, 2021;
originally announced January 2021.
-
A characterization of maximal 2-dimensional subgraphs of transitive graphs
Authors:
Henning Koehler
Abstract:
A transitive graph is 2-dimensional if it can be represented as the intersection of two linear orders. Such representations make answering of reachability queries trivial, and allow many problems that are NP-hard on arbitrary graphs to be solved in polynomial time. One may therefore be interested in finding 2-dimensional graphs that closely approximate a given graph of arbitrary order dimension.
In this paper we show that the maximal 2-dimensional subgraphs of a transitive graph G are induced by the optimal near-transitive orientations of the complement of G. The same characterization holds for the maximal permutation subgraphs of a transitively orientable graph. We provide an algorithm that enables this problem reduction in near-linear time, and an approach for enlarging non-maximal 2-dimensional subgraphs, such as trees.
Submitted 6 April, 2019;
originally announced April 2019.
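The abstract above states that a 2-dimensional representation makes reachability queries trivial. A minimal sketch of why, assuming the two linear orders are already given: u reaches v exactly when u precedes v in both orders. The helper name `make_reachability` and the diamond example are hypothetical and only illustrate the query side, not the paper's algorithm for finding such representations.

```python
def make_reachability(order1, order2):
    """Build a reachability test from two linear orders over the same vertices.

    The transitive graph is taken to be the intersection of the two orders:
    u reaches v iff u comes before v in both order1 and order2.
    """
    pos1 = {vertex: index for index, vertex in enumerate(order1)}
    pos2 = {vertex: index for index, vertex in enumerate(order2)}

    def reaches(u, v):
        # Two integer comparisons per query, no graph traversal.
        return pos1[u] < pos1[v] and pos2[u] < pos2[v]

    return reaches


# Hypothetical example: transitively closed diamond a->b, a->c, b->d, c->d.
# b and c are incomparable, so the two orders disagree on their relative position.
reaches = make_reachability(["a", "b", "c", "d"], ["a", "c", "b", "d"])
```

With this representation each query costs two dictionary lookups per vertex, independent of the number of edges.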
-
Modular decomposition of transitive graphs and transitively orienting their complements
Authors:
Henning Koehler
Abstract:
The modular decomposition of a graph is a canonical representation of its modules. Algorithms for computing the modular decomposition of directed and undirected graphs differ significantly, with the undirected case being simpler, and algorithms for directed graphs often work by reducing the problem to decomposing undirected graphs. In this paper we show that transitive acyclic digraphs have the same strong modules as their undirected versions. This simplifies reduction for transitive digraphs, requiring only the computation of strongly connected components. Furthermore, we are interested in permutation graphs, where both the graph and its complement are transitively orientable. Such graphs may be represented indirectly, as the transitive closure of a given graph. For non-transitive graphs we present a linear-time algorithm which allows us to identify prime-free modules w.r.t. their transitive closure, which speeds up both modular decomposition and transitive orientation for sparse graphs. Finally, we show that any transitive orientation of a digraph's complement also transitively orients the complement of the digraph's transitive closure, allowing us to find such orientations in (near-)linear time.
Submitted 11 October, 2017;
originally announced October 2017.