-
Hybrid Control Strategies for Safe and Adaptive Robot-Assisted Dressing
Authors:
Yasmin Rafiq,
Baslin A. James,
Ke Xu,
Robert M. Hierons,
Sanja Dogramadzi
Abstract:
Safety, reliability, and user trust are crucial in human-robot interaction (HRI) where the robots must address hazards in real-time. This study presents hazard driven low-level control strategies implemented in robot-assisted dressing (RAD) scenarios where hazards like garment snags and user discomfort in real-time can affect task performance and user safety. The proposed control mechanisms includ…
▽ More
Safety, reliability, and user trust are crucial in human-robot interaction (HRI) where the robots must address hazards in real-time. This study presents hazard driven low-level control strategies implemented in robot-assisted dressing (RAD) scenarios where hazards like garment snags and user discomfort in real-time can affect task performance and user safety. The proposed control mechanisms include: (1) Garment Snagging Control Strategy, which detects excessive forces and either seeks user intervention via a chatbot or autonomously adjusts its trajectory, and (2) User Discomfort/Pain Mitigation Strategy, which dynamically reduces velocity based on user feedback and aborts the task if necessary. We used physical dressing trials in order to evaluate these control strategies. Results confirm that integrating force monitoring with user feedback improves safety and task continuity. The findings emphasise the need for hybrid approaches that balance autonomous intervention, user involvement, and controlled task termination, supported by bi-directional interaction and real-time user-driven adaptability, paving the way for more responsive and personalised HRI systems.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Using Causal Inference to Test Systems with Hidden and Interacting Variables: An Evaluative Case Study
Authors:
Michael Foster,
Robert M. Hierons,
Donghwan Shin,
Neil Walkinshaw,
Christopher Wild
Abstract:
Software systems with large parameter spaces, nondeterminism and high computational cost are challenging to test. Recently, software testing techniques based on causal inference have been successfully applied to systems that exhibit such characteristics, including scientific models and autonomous driving systems. One significant limitation is that these are restricted to test properties where all…
▽ More
Software systems with large parameter spaces, nondeterminism and high computational cost are challenging to test. Recently, software testing techniques based on causal inference have been successfully applied to systems that exhibit such characteristics, including scientific models and autonomous driving systems. One significant limitation is that these are restricted to test properties where all of the variables involved can be observed and where there are no interactions between variables. In practice, this is rarely guaranteed; the logging infrastructure may not be available to record all of the necessary runtime variable values, and it can often be the case that an output of the system can be affected by complex interactions between variables. To address this, we leverage two additional concepts from causal inference, namely effect modification and instrumental variable methods. We build these concepts into an existing causal testing tool and conduct an evaluative case study which uses the concepts to test three system-level requirements of CARLA, a high-fidelity driving simulator widely used in autonomous vehicle development and testing. The results show that we can obtain reliable test outcomes without requiring large amounts of highly controlled test data or instrumentation of the code, even when variables interact with each other and are not recorded in the test data.
△ Less
Submitted 25 April, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
Symbolic Runtime Verification and Adaptive Decision-Making for Robot-Assisted Dressing
Authors:
Yasmin Rafiq,
Gricel Vázquez,
Radu Calinescu,
Sanja Dogramadzi,
Robert M Hierons
Abstract:
We present a control framework for robot-assisted dressing that augments low-level hazard response with runtime monitoring and formal verification. A parametric discrete-time Markov chain (pDTMC) models the dressing process, while Bayesian inference dynamically updates this pDTMC's transition probabilities based on sensory and user feedback. Safety constraints from hazard analysis are expressed in…
▽ More
We present a control framework for robot-assisted dressing that augments low-level hazard response with runtime monitoring and formal verification. A parametric discrete-time Markov chain (pDTMC) models the dressing process, while Bayesian inference dynamically updates this pDTMC's transition probabilities based on sensory and user feedback. Safety constraints from hazard analysis are expressed in probabilistic computation tree logic, and symbolically verified using a probabilistic model checker. We evaluate reachability, cost, and reward trade-offs for garment-snag mitigation and escalation, enabling real-time adaptation. Our approach provides a formal yet lightweight foundation for safety-aware, explainable robotic assistance.
△ Less
Submitted 22 April, 2025;
originally announced April 2025.
-
Empirically Evaluating the Use of Bytecode for Diversity-Based Test Case Prioritisation
Authors:
Islam T. Elgendy,
Robert M. Hierons,
Phil McMinn
Abstract:
Regression testing assures software correctness after changes but is resource-intensive. Test Case Prioritisation (TCP) mitigates this by ordering tests to maximise early fault detection. Diversity-based TCP prioritises dissimilar tests, assuming they exercise different system parts and uncover more faults. Traditional static diversity-based TCP approaches (i.e., methods that utilise the dissimila…
▽ More
Regression testing assures software correctness after changes but is resource-intensive. Test Case Prioritisation (TCP) mitigates this by ordering tests to maximise early fault detection. Diversity-based TCP prioritises dissimilar tests, assuming they exercise different system parts and uncover more faults. Traditional static diversity-based TCP approaches (i.e., methods that utilise the dissimilarity of tests), like the state-of-the-art FAST approach, rely on textual diversity from test source code, which is effective but inefficient due to its relative verbosity and redundancies affecting similarity calculations. This paper is the first to study bytecode as the basis of diversity in TCP, leveraging its compactness for improved efficiency and accuracy. An empirical study on seven Defects4J projects shows that bytecode diversity improves fault detection by 2.3-7.8% over text-based TCP. It is also 2-3 orders of magnitude faster in one TCP approach and 2.5-6 times faster in FAST-based TCP. Filtering specific bytecode instructions improves efficiency up to fourfold while maintaining effectiveness, making bytecode diversity a superior static approach.
△ Less
Submitted 17 April, 2025;
originally announced April 2025.
-
Complete FSM Testing Using Strong Separability
Authors:
Robert M. Hierons,
Mohammad Reza Mousavi
Abstract:
Apartness is a concept developed in constructive mathematics, which has resurfaced as a powerful notion for separating states in the area of model learning and model-based testing. We identify some fundamental shortcomings of apartness in quantitative models, such as in hybrid and stochastic systems. We propose a closely-related alternative, called strong separability and show that using it to rep…
▽ More
Apartness is a concept developed in constructive mathematics, which has resurfaced as a powerful notion for separating states in the area of model learning and model-based testing. We identify some fundamental shortcomings of apartness in quantitative models, such as in hybrid and stochastic systems. We propose a closely-related alternative, called strong separability and show that using it to replace apartness addresses the identified shortcomings. We adapt a well-known complete model-based testing method, called the Harmonized State Identifiers (HSI) method, to adopt the proposed notion of strong separability. We prove that the adapted HSI method is complete. As far as we are aware, this is the first work to show how complete test suites can be generated for quantitative models such as those found in the development of cyber-physical systems.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
Bounding Random Test Set Size with Computational Learning Theory
Authors:
Neil Walkinshaw,
Michael Foster,
Jose Miguel Rojas,
Robert M Hierons
Abstract:
Random testing approaches work by generating inputs at random, or by selecting inputs randomly from some pre-defined operational profile. One long-standing question that arises in this and other testing contexts is as follows: When can we stop testing? At what point can we be certain that executing further tests in this manner will not explore previously untested (and potentially buggy) software b…
▽ More
Random testing approaches work by generating inputs at random, or by selecting inputs randomly from some pre-defined operational profile. One long-standing question that arises in this and other testing contexts is as follows: When can we stop testing? At what point can we be certain that executing further tests in this manner will not explore previously untested (and potentially buggy) software behaviors? This is analogous to the question in Machine Learning, of how many training examples are required in order to infer an accurate model. In this paper we show how probabilistic approaches to answer this question in Machine Learning (arising from Computational Learning Theory) can be applied in our testing context. This enables us to produce an upper bound on the number of tests that are required to achieve a given level of adequacy. We are the first to enable this from only knowing the number of coverage targets (e.g. lines of code) in the source code, without needing to observe a sample test executions. We validate this bound on a large set of Java units, and an autonomous driving system.
△ Less
Submitted 24 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
A Survey of the Metrics, Uses, and Subjects of Diversity-Based Techniques in Software Testing
Authors:
Islam T. Elgendy,
Robert M. Hierons,
Phil McMinn
Abstract:
There has been a significant amount of interest regarding the use of diversity-based testing techniques in software testing over the past two decades. Diversity-based testing (DBT) technique uses similarity metrics to leverage the dissimilarity between software artefacts - such as requirements, abstract models, program structures, or inputs - in order to address a software testing problem. DBT tec…
▽ More
There has been a significant amount of interest regarding the use of diversity-based testing techniques in software testing over the past two decades. Diversity-based testing (DBT) technique uses similarity metrics to leverage the dissimilarity between software artefacts - such as requirements, abstract models, program structures, or inputs - in order to address a software testing problem. DBT techniques have been used to assist in finding solutions to several different types of problems including generating test cases, prioritising them, and reducing very large test suites. This paper is a systematic survey of DBT techniques that summarises the key aspects and trends of 144 papers that report the use of 70 different similarity metrics with 24 different types of software artefacts, which have been used by researchers to tackle 11 different types of software testing problems. We further present an analysis of the recent trends in DBT techniques and review the different application domains to which the techniques have been applied, giving an overview of the tools developed by researchers to do so. Finally, the paper identifies some DBT challenges that are potential topics for future work.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
Testing Causality in Scientific Modelling Software
Authors:
Andrew G. Clark,
Michael Foster,
Benedikt Prifling,
Neil Walkinshaw,
Robert M. Hierons,
Volker Schmidt,
Robert D. Turner
Abstract:
From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in developing scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously diffic…
▽ More
From simulating galaxy formation to viral transmission in a pandemic, scientific models play a pivotal role in developing scientific theories and supporting government policy decisions that affect us all. Given these critical applications, a poor modelling assumption or bug could have far-reaching consequences. However, scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical. In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse data instead of costly experiments. This paper introduces the Causal Testing Framework: a framework that uses Causal Inference techniques to establish causal effects from existing data, enabling users to conduct software testing activities concerning the effect of a change, such as Metamorphic Testing, a posteriori. We present three case studies covering real-world scientific models, demonstrating how the Causal Testing Framework can infer metamorphic test outcomes from reused, confounded test data to provide an efficient solution for testing scientific modelling software.
△ Less
Submitted 30 June, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Test case generation for agent-based models: A systematic literature review
Authors:
Andrew G. Clark,
Neil Walkinshaw,
Robert M. Hierons
Abstract:
Agent-based models play an important role in simulating complex emergent phenomena and supporting critical decisions. In this context, a software fault may result in poorly informed decisions that lead to disastrous consequences. The ability to rigorously test these models is therefore essential. In this systematic literature review, we answer five research questions related to the key aspects of…
▽ More
Agent-based models play an important role in simulating complex emergent phenomena and supporting critical decisions. In this context, a software fault may result in poorly informed decisions that lead to disastrous consequences. The ability to rigorously test these models is therefore essential. In this systematic literature review, we answer five research questions related to the key aspects of test case generation in agent-based models: What are the information artifacts used to generate tests? How are these tests generated? How is a verdict assigned to a generated test? How is the adequacy of a generated test suite measured? What level of abstraction of an agent-based model is targeted by a generated test? Our results show that whilst the majority of techniques are effective for testing functional requirements at the agent and integration levels of abstraction, there are comparatively few techniques capable of testing society-level behaviour. Additionally, we identify a need for more thorough evaluation using realistic case studies that feature challenging properties associated with a typical agent-based model.
△ Less
Submitted 18 March, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Four-valued monitorability of $ω$-regular languages
Authors:
Zhe Chen,
Yunyun Chen,
Robert M. Hierons,
Yifan Wu
Abstract:
Runtime Verification (RV) is a lightweight formal technique in which program or system execution is monitored and analyzed, to check whether certain properties are satisfied or violated after a finite number of steps. The use of RV has led to interest in deciding whether a property is monitorable: whether it is always possible for the satisfaction or violation of the property to be determined afte…
▽ More
Runtime Verification (RV) is a lightweight formal technique in which program or system execution is monitored and analyzed, to check whether certain properties are satisfied or violated after a finite number of steps. The use of RV has led to interest in deciding whether a property is monitorable: whether it is always possible for the satisfaction or violation of the property to be determined after a finite future continuation. However, classical two-valued monitorability suffers from two inherent limitations. First, a property can only be evaluated as monitorable or non-monitorable; no information is available regarding whether only one verdict (satisfaction or violation) can be detected. Second, monitorability is defined at the language-level and does not tell us whether satisfaction or violation can be detected starting from the current monitor state during system execution.
To address these limitations, this paper proposes a new notion of four-valued monitorability for $ω$-languages and applies it at the state-level. Four-valued monitorability is more informative than two-valued monitorability as a property can be evaluated as a four-valued result, denoting that only satisfaction, only violation, or both are active for a monitorable property. We can also compute state-level weak monitorability, i.e., whether satisfaction or violation can be detected starting from a given state in a monitor, which enables state-level optimizations of monitoring algorithms. Based on a new six-valued semantics, we propose procedures for computing four-valued monitorability of $ω$-regular languages, both at the language-level and at the state-level. We have developed a new tool that implements the proposed procedure for computing monitorability of LTL formulas.
△ Less
Submitted 11 May, 2020; v1 submitted 16 February, 2020;
originally announced February 2020.
-
Characterizing Minimal Semantics-preserving Slices of predicate-linear, Free, Liberal Program Schemas
Authors:
Sebastian Danicic,
Robert M Hierons,
Michael R Laurence
Abstract:
A program schema defines a class of programs, all of which have identical statement structure, but whose functions and predicates may differ. A schema thus defines an entire class of programs according to how its symbols are interpreted. A subschema of a schema is obtained from a schema by deleting some of its statements. We prove that given a schema $S$ which is predicate-linear, free and liberal…
▽ More
A program schema defines a class of programs, all of which have identical statement structure, but whose functions and predicates may differ. A schema thus defines an entire class of programs according to how its symbols are interpreted. A subschema of a schema is obtained from a schema by deleting some of its statements. We prove that given a schema $S$ which is predicate-linear, free and liberal, such that the true and false parts of every if predicate satisfy a simple additional condition, and a slicing criterion defined by the final value of a given variable after execution of any program defined by $S$, the minimal subschema of $S$ which respects this slicing criterion contains all the function and predicate symbols `needed' by the variable according to the data dependence and control dependence relations used in program slicing, which is the symbol set given by Weiser's static slicing algorithm. Thus this algorithm gives predicate-minimal slices for classes of programs represented by schemas satisfying our set of conditions. We also give an example to show that the corresponding result with respect to the slicing criterion defined by termination behaviour is incorrect. This complements a result by the authors in which $S$ was required to be function-linear, instead of predicate-linear.
△ Less
Submitted 25 May, 2017;
originally announced May 2017.
-
On the computational complexity of dynamic slicing problems for program schemas
Authors:
Sebastian Danicic,
Robert M. Hierons,
Michael R. Laurence
Abstract:
Given a program, a quotient can be obtained from it by deleting zero or more statements. The field of program slicing is concerned with computing a quotient of a program which preserves part of the behaviour of the original program. All program slicing algorithms take account of the structural properties of a program such as control dependence and data dependence rather than the semantics of its f…
▽ More
Given a program, a quotient can be obtained from it by deleting zero or more statements. The field of program slicing is concerned with computing a quotient of a program which preserves part of the behaviour of the original program. All program slicing algorithms take account of the structural properties of a program such as control dependence and data dependence rather than the semantics of its functions and predicates, and thus work, in effect, with program schemas. The dynamic slicing criterion of Korel and Laski requires only that program behaviour is preserved in cases where the original program follows a particular path, and that the slice/quotient follows this path. In this paper we formalise Korel and Laski's definition of a dynamic slice as applied to linear schemas, and also formulate a less restrictive definition in which the path through the original program need not be preserved by the slice. The less restrictive definition has the benefit of leading to smaller slices. For both definitions, we compute complexity bounds for the problems of establishing whether a given slice of a linear schema is a dynamic slice and whether a linear schema has a non-trivial dynamic slice and prove that the latter problem is NP-hard in both cases. We also give an example to prove that minimal dynamic slices (whether or not they preserve the original path) need not be unique.
△ Less
Submitted 22 May, 2017;
originally announced May 2017.
-
Decidability and Complexity for Quiescent Consistency and its Variations
Authors:
Brijesh Dongol,
Robert M. Hierons
Abstract:
Quiescent consistency is a notion of correctness for a concurrent object that gives meaning to the object's behaviours in quiescent states, i.e., states in which none of the object's operations are being executed. Correctness of an implementation object is defined in terms of a corresponding abstract specification. This gives rise to two important verification questions: membership (checking wheth…
▽ More
Quiescent consistency is a notion of correctness for a concurrent object that gives meaning to the object's behaviours in quiescent states, i.e., states in which none of the object's operations are being executed. Correctness of an implementation object is defined in terms of a corresponding abstract specification. This gives rise to two important verification questions: membership (checking whether a behaviour of the implementation is allowed by the specification) and correctness (checking whether all behaviours of the implementation are allowed by the specification). In this paper, we show that the membership problem for quiescent consistency is NP-complete and that the correctness problem is decidable, but coNP-hard and in EXPSPACE. For both problems, we consider restricted versions of quiescent consistency by assuming an upper limit on the number of events between two quiescent points. Here, we show that the membership problem is in PTIME, whereas correctness is in PSPACE.
Quiescent consistency does not guarantee sequential consistency, i.e., it allows operation calls by the same process to be reordered when mapping to an abstract specification. Therefore, we also consider quiescent sequential consistency, which strengthens quiescent consistency with an additional sequential consistency condition. We show that the unrestricted versions of membership and correctness are NP-complete and undecidable, respectively. When by placing a limit on the number of events between two quiescent points, membership is in PTIME, while correctness is in PSPACE. Finally, we consider a version of quiescent sequential consistency that places an upper limit on the number of processes for every run of the implementation, and show that the membership problem for quiescent sequential consistency with this restriction is in PTIME.
△ Less
Submitted 26 November, 2015;
originally announced November 2015.
-
Checking Finite State Machine Conformance when there are Distributed Observations
Authors:
Robert M Hierons
Abstract:
This paper concerns state-based systems that interact with their environment at physically distributed interfaces, called ports. When such a system is used a projection of the global trace, called a local trace, is observed at each port. This leads to the environment having reduced observational power: the set of local traces observed need not uniquely define the global trace that occurred. We con…
▽ More
This paper concerns state-based systems that interact with their environment at physically distributed interfaces, called ports. When such a system is used a projection of the global trace, called a local trace, is observed at each port. This leads to the environment having reduced observational power: the set of local traces observed need not uniquely define the global trace that occurred. We consider the previously defined implementation relation $\sqsubseteq_s$ and start by investigating the problem of defining a language ${\mathcal {\tilde L}} (M)$ for a multi-port finite state machine (FSM) $M$ such that $N \sqsubseteq_s M$ if and only if every global trace of $N$ is in ${\mathcal {\tilde L}} (M)$. The motivation is that if we can produce such a language ${\mathcal {\tilde L}} (M)$ then this can potentially be used to inform development and testing. We show that ${\mathcal {\tilde L}} (M)$ can be uniquely defined but need not be regular. We then prove that it is generally undecidable whether $N \sqsubseteq_s M$, a consequence of this result being that it is undecidable whether there is a test case that is capable of distinguishing two states or two multi-port FSM in distributed testing. This result complements a previous result that it is undecidable whether there is a test case that is guaranteed to distinguish two states or multi-port FSMs. We also give some conditions under which $N \sqsubseteq_s M$ is decidable. We then consider the implementation relation $\sqsubseteq_s^k$ that only concerns input sequences of length $k$ or less. Naturally, given FSMs $N$ and $M$ it is decidable whether $N \sqsubseteq_s^k M$ since only a finite set of traces is relevant. We prove that if we place bounds on $k$ and the number of ports then we can decide $N \sqsubseteq_s^k M$ in polynomial time but otherwise this problem is NP-hard.
△ Less
Submitted 26 August, 2011;
originally announced August 2011.
-
Complexity of Data Dependence problems for Program Schemas with Concurrency
Authors:
Sebastian Danicic,
Robert M Hierons,
Michael R Laurence
Abstract:
The problem of deciding whether one point in a program is data dependent upon another is fundamental to program analysis and has been widely studied. In this paper we consider this problem at the abstraction level of program schemas in which computations occur in the Herbrand domain of terms and predicate symbols, which represent arbitrary predicate functions, are allowed. Given a vertex l in the…
▽ More
The problem of deciding whether one point in a program is data dependent upon another is fundamental to program analysis and has been widely studied. In this paper we consider this problem at the abstraction level of program schemas in which computations occur in the Herbrand domain of terms and predicate symbols, which represent arbitrary predicate functions, are allowed. Given a vertex l in the flowchart of a schema S having only equality (variable copying) assignments and variables v,w, we show that it is PSPACE-hard to decide whether there exists an execution of a program defined by S in which v holds the initial value of w at at least one occurrence of l on the path of execution, with membership in PSPACE holding provided there is a constant upper bound on the arity of any predicate in S. We also consider the `dual' problem in which v is required to hold the initial value of w at every occurrence of l, for which the analogous results hold. Additionally, the former problem for programs with non-deterministic branching (in effect, free schemas) in which assignments with functions are allowed is proved to be polynomial-time decidable provided a constant upper bound is placed upon the number of occurrences of the concurrency operator in the schemas being considered. This result is promising since many concurrent systems have a relatively small number of threads (concurrent processes), especially when compared with the number of statements they have.
△ Less
Submitted 12 April, 2011; v1 submitted 22 July, 2010;
originally announced July 2010.