-
Predicting At-Risk Programming Students in Small Imbalanced Datasets using Synthetic Data
Authors:
Daniel Flood,
Matthew England,
Beate Grawemeyer
Abstract:
This study is part of a larger project focused on measuring, understanding, and improving student engagement in programming education. We investigate whether synthetic data generation can help identify at-risk students earlier in a small, imbalanced dataset from an introductory programming module. The analysis used anonymised records from 379 students, with 15\% marked as failing, and applied seve…
▽ More
This study is part of a larger project focused on measuring, understanding, and improving student engagement in programming education. We investigate whether synthetic data generation can help identify at-risk students earlier in a small, imbalanced dataset from an introductory programming module. The analysis used anonymised records from 379 students, with 15\% marked as failing, and applied several machine learning algorithms. The first experiments showed poor recall for the failing group. However, using synthetic data generation methods led to a significant improvement in performance. Our results suggest that machine learning can help identify at-risk students early in programming courses when combined with synthetic data. This research lays the groundwork for validating and using these models with live student cohorts in the future, to allow for timely and effective interventions that can improve student outcomes. It also includes feature importance analysis to refine formative tasks. Overall, this study contributes to developing practical workflows that help detect disengagement early and improve student success in programming education.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
On Projective Delineability
Authors:
Lucas Michel,
Jasper Nalbach,
Pierre Mathonet,
Naïm Zénaïdi,
Christopher W. Brown,
Erika Ábrahám,
James H. Davenport,
Matthew England
Abstract:
We consider cylindrical algebraic decomposition (CAD) and the key concept of delineability which underpins CAD theory. We introduce the novel concept of projective delineability which is easier to guarantee computationally. We prove results about this which can allow reduced CAD computations.
We consider cylindrical algebraic decomposition (CAD) and the key concept of delineability which underpins CAD theory. We introduce the novel concept of projective delineability which is easier to guarantee computationally. We prove results about this which can allow reduced CAD computations.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Transformers to Predict the Applicability of Symbolic Integration Routines
Authors:
Rashid Barket,
Uzma Shafiq,
Matthew England,
Juergen Gerhard
Abstract:
Symbolic integration is a fundamental problem in mathematics: we consider how machine learning may be used to optimise this task in a Computer Algebra System (CAS). We train transformers that predict whether a particular integration method will be successful, and compare against the existing human-made heuristics (called guards) that perform this task in a leading CAS. We find the transformer can…
▽ More
Symbolic integration is a fundamental problem in mathematics: we consider how machine learning may be used to optimise this task in a Computer Algebra System (CAS). We train transformers that predict whether a particular integration method will be successful, and compare against the existing human-made heuristics (called guards) that perform this task in a leading CAS. We find the transformer can outperform these guards, gaining up to 30% accuracy and 70% precision. We further show that the inference time of the transformer is inconsequential which shows that it is well-suited to include as a guard in a CAS. Furthermore, we use Layer Integrated Gradients to interpret the decisions that the transformer is making. If guided by a subject-matter expert, the technique can explain some of the predictions based on the input tokens, which can lead to further optimisations.
△ Less
Submitted 31 October, 2024;
originally announced October 2024.
-
Recent Developments in Real Quantifier Elimination and Cylindrical Algebraic Decomposition
Authors:
Matthew England
Abstract:
This extended abstract accompanies an invited talk at CASC 2024, which surveys recent developments in Real Quantifier Elimination (QE) and Cylindrical Algebraic Decomposition (CAD). After introducing these concepts we will first consider adaptations of CAD inspired by computational logic, in particular the algorithms which underpin modern SAT solvers. CAD theory has found use in collaboration with…
▽ More
This extended abstract accompanies an invited talk at CASC 2024, which surveys recent developments in Real Quantifier Elimination (QE) and Cylindrical Algebraic Decomposition (CAD). After introducing these concepts we will first consider adaptations of CAD inspired by computational logic, in particular the algorithms which underpin modern SAT solvers. CAD theory has found use in collaboration with these via the Satisfiability Modulo Theory (SMT) paradigm; while the ideas behind SAT/SMT have led to new algorithms for Real QE. Second we will consider the optimisation of CAD through the use of Machine Learning (ML). The choice of CAD variable ordering has become a key case study for the use of ML to tune algorithms in computer algebra. We will also consider how explainable AI techniques might give insight for improved computer algebra software without any reliance on ML in the final code.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
Understanding Multistationarity of Fully Open Reaction Networks
Authors:
Shenghao Yao,
AmirHosein Sadeghimanesh,
Matthew England
Abstract:
This work addresses multistationarity of fully open reaction networks equipped with mass action kinetics. We improve upon the existing results relating existence of positive feedback loops in a reaction network and multistationarity; and we provide a novel deterministic operation to generate new non-multistationary networks. This is interesting because while there were many operations to create in…
▽ More
This work addresses multistationarity of fully open reaction networks equipped with mass action kinetics. We improve upon the existing results relating existence of positive feedback loops in a reaction network and multistationarity; and we provide a novel deterministic operation to generate new non-multistationary networks. This is interesting because while there were many operations to create infinitely many new multistationary networks from a multistationary example, this is the first such operation for the non-multistationary counterpart.
Such tools for the generation of example networks have a use-case in the application of data science to reaction network theory. We demonstrate this by using the new data, along with a novel graph representation of reaction networks that is unique up to a permutation on the name of species of the network, to train a graph attention neural network model to predict multistationarity of reaction networks. This is the first time machine learning tools being used for studying classification problems of reaction networks.
△ Less
Submitted 30 January, 2025; v1 submitted 1 July, 2024;
originally announced July 2024.
-
The Liouville Generator for Producing Integrable Expressions
Authors:
Rashid Barket,
Matthew England,
Jürgen Gerhard
Abstract:
There has been a growing need to devise processes that can create comprehensive datasets in the world of Computer Algebra, both for accurate benchmarking and for new intersections with machine learning technology. We present here a method to generate integrands that are guaranteed to be integrable, dubbed the LIOUVILLE method. It is based on Liouville's theorem and the Parallel Risch Algorithm for…
▽ More
There has been a growing need to devise processes that can create comprehensive datasets in the world of Computer Algebra, both for accurate benchmarking and for new intersections with machine learning technology. We present here a method to generate integrands that are guaranteed to be integrable, dubbed the LIOUVILLE method. It is based on Liouville's theorem and the Parallel Risch Algorithm for symbolic integration.
We show that this data generation method retains the best qualities of previous data generation methods, while overcoming some of the issues built into that prior work. The LIOUVILLE generator is able to generate sufficiently complex and realistic integrands, and could be used for benchmarking or machine learning training tasks related to symbolic integration.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Constrained Neural Networks for Interpretable Heuristic Creation to Optimise Computer Algebra Systems
Authors:
Dorian Florescu,
Matthew England
Abstract:
We present a new methodology for utilising machine learning technology in symbolic computation research. We explain how a well known human-designed heuristic to make the choice of variable ordering in cylindrical algebraic decomposition may be represented as a constrained neural network. This allows us to then use machine learning methods to further optimise the heuristic, leading to new networks…
▽ More
We present a new methodology for utilising machine learning technology in symbolic computation research. We explain how a well known human-designed heuristic to make the choice of variable ordering in cylindrical algebraic decomposition may be represented as a constrained neural network. This allows us to then use machine learning methods to further optimise the heuristic, leading to new networks of similar size, representing new heuristics of similar complexity as the original human-designed one. We present this as a form of ante-hoc explainability for use in computer algebra development.
△ Less
Submitted 26 April, 2024;
originally announced April 2024.
-
Symbolic Integration Algorithm Selection with Machine Learning: LSTMs vs Tree LSTMs
Authors:
Rashid Barket,
Matthew England,
Jürgen Gerhard
Abstract:
Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algo…
▽ More
Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algorithm choice. A key consideration of our methodology is how to represent the mathematics to the ML model: we hypothesise that a representation which encodes the tree structure of mathematical expressions would be well suited. We trained both an LSTM and a TreeLSTM model for sub-algorithm prediction and compared them to Maple's existing approach. Our TreeLSTM performs much better than the LSTM, highlighting the benefit of using an informed representation of mathematical expressions. It is able to produce better outputs than Maple's current state-of-the-art meta-algorithm, giving a strong basis for further research.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Assessing the Spurious Impacts of Ice-Constraining Methods on the Climate Response to Sea-Ice Loss using an Idealised Aquaplanet GCM
Authors:
Neil T. Lewis,
Mark R. England,
James A. Screen,
Ruth Geen,
Regan Mudhar,
William J. M. Seviour,
Stephen I. Thomson
Abstract:
Coupled climate model simulations designed to isolate the effects of Arctic sea-ice loss often apply artificial heating, either directly to the ice or through modification of the surface albedo, to constrain sea-ice in the absence of other forcings. Recent work has shown that this approach may lead to an overestimation of the climate response to sea-ice loss. In this study, we assess the spurious…
▽ More
Coupled climate model simulations designed to isolate the effects of Arctic sea-ice loss often apply artificial heating, either directly to the ice or through modification of the surface albedo, to constrain sea-ice in the absence of other forcings. Recent work has shown that this approach may lead to an overestimation of the climate response to sea-ice loss. In this study, we assess the spurious impacts of ice-constraining methods on the climate of an idealised aquaplanet general circulation model (GCM) with thermodynamic sea-ice. The true effect of sea-ice loss in this model is isolated by inducing ice loss through reduction of the freezing point of water, which does not require additional energy input. We compare results from freezing point modification experiments with experiments where sea-ice loss is induced using traditional ice-constraining methods, and confirm the result of previous work that traditional methods induce spurious additional warming. Furthermore, additional warming leads to an overestimation of the circulation response to sea-ice loss, which involves a weakening of the zonal wind and storm track activity in midlatitudes. Our results suggest that coupled model simulations with constrained sea-ice should be treated with caution, especially in boreal summer, where the true effect of sea-ice loss is weakest but we find the largest spurious response. Given that our results may be sensitive to the simplicity of the model we use, we suggest that devising methods to quantify the spurious effects of ice-constraining methods in more sophisticated models should be an urgent priority for future work.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD
Authors:
Tereso del Río,
Matthew England
Abstract:
Symbolic Computation algorithms and their implementation in computer algebra systems often contain choices which do not affect the correctness of the output but can significantly impact the resources required: such choices can benefit from having them made separately for each problem via a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, i…
▽ More
Symbolic Computation algorithms and their implementation in computer algebra systems often contain choices which do not affect the correctness of the output but can significantly impact the resources required: such choices can benefit from having them made separately for each problem via a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect that the lessons learned are applicable to other decisions in symbolic computation.
We utilise an existing dataset of examples derived from applications which was found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems problems that allows us to balance and further augment the dataset, improving the machine learning results by 28\% and 38\% on average, respectively. We then demonstrate how the existing machine learning methodology used for the problem $-$ classification $-$ might be recast into the regression paradigm. While this does not have a radical change on the performance, it does widen the scope in which the methodology can be applied to make choices.
△ Less
Submitted 20 June, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Iterated Resultants and Rational Functions in Real Quantifier Elimination
Authors:
James H. Davenport,
Matthew England,
Scott McCallum,
Ali K. Uncu
Abstract:
This paper builds and extends on the authors' previous work related to the algorithmic tool, Cylindrical Algebraic Decomposition (CAD), and one of its core applications, Real Quantifier Elimination (QE). These topics are at the heart of symbolic computation and were first implemented in computer algebra systems decades ago, but have recently received renewed interest as part of the ongoing develop…
▽ More
This paper builds and extends on the authors' previous work related to the algorithmic tool, Cylindrical Algebraic Decomposition (CAD), and one of its core applications, Real Quantifier Elimination (QE). These topics are at the heart of symbolic computation and were first implemented in computer algebra systems decades ago, but have recently received renewed interest as part of the ongoing development of SMT solvers for non-linear real arithmetic.
First, we consider the use of iterated univariate resultants in traditional CAD, and how this leads to inefficiencies, especially in the case of an input with multiple equational constraints. We reproduce the workshop paper [Davenport and England, 2023], adding important clarifications to our suggestions first made there to make use of multivariate resultants in the projection phase of CAD. We then consider an alternative approach to this problem first documented in [McCallum and Brown, 2009] which redefines the actual object under construction, albeit only in the case of two equational constraints. We correct an unhelpful typo and provide a proof missing from that paper.
We finish by revising the topic of how to deal with SMT or Real QE problems expressed using rational functions (as opposed to the usual polynomial ones) noting that these are often found in industrial applications. We revisit a proposal made in [Uncu, Davenport and England, 2023] for doing this in the case of satisfiability, explaining why such an approach does not trivially extend to more complicated quantification structure and giving a suitable alternative.
△ Less
Submitted 26 December, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
SMT-Solving Induction Proofs of Inequalities
Authors:
Ali K. Uncu,
James H. Davenport,
Matthew England
Abstract:
This paper accompanies a new dataset of non-linear real arithmetic problems for the SMT-LIB benchmark collection. The problems come from an automated proof procedure of Gerhold--Kauers, which is well suited for solution by SMT. The problems of this type have not been tackled by SMT-solvers before. We describe the proof technique and give one new such proof to illustrate it. We then describe the da…
▽ More
This paper accompanies a new dataset of non-linear real arithmetic problems for the SMT-LIB benchmark collection. The problems come from an automated proof procedure of Gerhold--Kauers, which is well suited for solution by SMT. The problems of this type have not been tackled by SMT-solvers before. We describe the proof technique and give one new such proof to illustrate it. We then describe the dataset and the results of benchmarking. The benchmarks on the new dataset are quite different to the existing ones. The benchmarking also brings forward some interesting debate on the use/inclusion of rational functions and algebraic numbers in the SMT-LIB.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
Iterated Resultants in CAD
Authors:
James H. Davenport,
Matthew England
Abstract:
Cylindrical Algebraic Decomposition (CAD) by projection and lifting requires many iterated univariate resultants. It has been observed that these often factor, but to date this has not been used to optimise implementations of CAD. We continue the investigation into such factorisations, writing in the specific context of SC-Square.
Cylindrical Algebraic Decomposition (CAD) by projection and lifting requires many iterated univariate resultants. It has been observed that these often factor, but to date this has not been used to optimise implementations of CAD. We continue the investigation into such factorisations, writing in the specific context of SC-Square.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
Data Augmentation for Mathematical Objects
Authors:
Tereso del Rio,
Matthew England
Abstract:
This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering…
▽ More
This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition to tackle these with. By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling when viewing the selection as a classification problem. We find this augmentation increases the accuracy of ML models by 63% on average. We study what part of this improvement is due to the balancing of the dataset and what is achieved thanks to further increasing the size of the dataset, concluding that both have a very significant effect. We finish the paper by reflecting on how this idea could be applied in other uses of machine learning in mathematics.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Generating Elementary Integrable Expressions
Authors:
Rashid Barket,
Matthew England,
Jürgen Gerhard
Abstract:
There has been an increasing number of applications of machine learning to the field of Computer Algebra in recent years, including to the prominent sub-field of Symbolic Integration. However, machine learning models require an abundance of data for them to be successful and there exist few benchmarks on the scale required. While methods to generate new data already exist, they are flawed in sever…
▽ More
There has been an increasing number of applications of machine learning to the field of Computer Algebra in recent years, including to the prominent sub-field of Symbolic Integration. However, machine learning models require an abundance of data for them to be successful and there exist few benchmarks on the scale required. While methods to generate new data already exist, they are flawed in several ways which may lead to bias in machine learning models trained upon them. In this paper, we describe how to use the Risch Algorithm for symbolic integration to create a dataset of elementary integrable expressions. Further, we show that data generated this way alleviates some of the flaws found in earlier methods.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
Intrinsically episodic Antarctic shelf intrusions of circumpolar deep water via canyons
Authors:
Ellie Q. Y. Ong,
Edward Doddridge,
Navid C. Constantinou,
Andrew McC. Hogg,
Matthew H. England
Abstract:
The structure of the Antarctic Slope Current at the continental shelf is crucial in governing the poleward transport of warm water. Canyons on the continental slope may provide a pathway for warm water to cross the slope current and intrude onto the continental shelf underneath ice shelves, which can increase rates of ice shelf melting, leading to reduced buttressing of ice shelves, accelerating g…
▽ More
The structure of the Antarctic Slope Current at the continental shelf is crucial in governing the poleward transport of warm water. Canyons on the continental slope may provide a pathway for warm water to cross the slope current and intrude onto the continental shelf underneath ice shelves, which can increase rates of ice shelf melting, leading to reduced buttressing of ice shelves, accelerating glacial flow and hence increased sea level rise. Observations and modelling studies of the Antarctic Slope Current and cross-shelf warm water intrusions are limited, particularly in the East Antarctica region. To explore this topic, an idealised configuration of the Antarctic Slope Current is developed, using an eddy-resolving isopycnal model that emulates the dynamics and topography of the East Antarctic sector. Warm water intrusions via canyons are found to occur in discrete episodes of large onshore flow induced by eddies, even in the absence of any temporal variability in external forcings, demonstrating the intrinsic nature of these intrusions to the slope current system. Canyon width is found to play a key role in modulating cross-shelf exchanges; warm water transport through narrower canyons is more irregular than transport through wider canyons. The intrinsically episodic cross-shelf transport is found to be driven by feedbacks between wind energy input and eddy generation in the Antarctic Slope Current. Improved understanding of the intrinsic variability of warm water intrusions can help guide future observational and modelling studies in the analysis of eddy impacts on Antarctic shelf circulation.
△ Less
Submitted 7 March, 2024; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Explainable AI Insights for Symbolic Computation: A case study on selecting the variable ordering for cylindrical algebraic decomposition
Authors:
Lynn Pickering,
Tereso Del Rio Almajano,
Matthew England,
Kelly Cohen
Abstract:
In recent years there has been increased use of machine learning (ML) techniques within mathematics, including symbolic computation where it may be applied safely to optimise or select algorithms. This paper explores whether using explainable AI (XAI) techniques on such ML models can offer new insight for symbolic computation, inspiring new implementations within computer algebra systems that do n…
▽ More
In recent years there has been increased use of machine learning (ML) techniques within mathematics, including symbolic computation where it may be applied safely to optimise or select algorithms. This paper explores whether using explainable AI (XAI) techniques on such ML models can offer new insight for symbolic computation, inspiring new implementations within computer algebra systems that do not directly call upon AI tools. We present a case study on the use of ML to select the variable ordering for cylindrical algebraic decomposition. It has already been demonstrated that ML can make the choice well, but here we show how the SHAP tool for explainability can be used to inform new heuristics of a size and complexity similar to those human-designed heuristics currently commonly used in symbolic computation.
△ Less
Submitted 29 August, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Levelwise construction of a single cylindrical algebraic cell
Authors:
Jasper Nalbach,
Erika Ábrahám,
Philippe Specht,
Christopher W. Brown,
James H. Davenport,
Matthew England
Abstract:
Satisfiability Modulo Theories (SMT) solvers check the satisfiability of quantifier-free first-order logic formulas. We consider the theory of non-linear real arithmetic where the formulae are logical combinations of polynomial constraints. Here a commonly used tool is the Cylindrical Algebraic Decomposition (CAD) to decompose real space into cells where the constraints are truth-invariant through…
▽ More
Satisfiability Modulo Theories (SMT) solvers check the satisfiability of quantifier-free first-order logic formulas. We consider the theory of non-linear real arithmetic where the formulae are logical combinations of polynomial constraints. Here a commonly used tool is the Cylindrical Algebraic Decomposition (CAD) to decompose real space into cells where the constraints are truth-invariant through the use of projection polynomials.
An improved approach is to repackage the CAD theory into a search-based algorithm: one that guesses sample points to satisfy the formula, and generalizes guesses that conflict constraints to cylindrical cells around samples which are avoided in the continuing search. Such an approach can lead to a satisfying assignment more quickly, or conclude unsatisfiability with fewer cells. A notable example of this approach is Jovanović and de Moura's NLSAT algorithm. Since these cells are produced locally to a sample we might need fewer projection polynomials than the traditional CAD projection. The original NLSAT algorithm reduced the set a little; while Brown's single cell construction reduced it much further still. However, the shape and size of the cell produced depends on the order in which the polynomials are considered.
This paper proposes a method to construct such cells levelwise, i.e. built level-by-level according to a variable ordering. We still use a reduced number of projection polynomials, but can now consider a variety of different reductions and use heuristics to select the projection polynomials in order to optimise the shape of the cell under construction. We formulate all the necessary theory as a proof system: while not a common presentation for work in this field, it allows an elegant decoupling of heuristics from the algorithm and its proof of correctness.
△ Less
Submitted 20 July, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
SC-Square: Future Progress with Machine Learning?
Authors:
Matthew England
Abstract:
The algorithms employed by our communities are often underspecified, and thus have multiple implementation choices, which do not effect the correctness of the output, but do impact the efficiency or even tractability of its production. In this extended abstract, to accompany a keynote talk at the 2021 SC-Square Workshop, we survey recent work (both the author's and from the literature) on the use…
▽ More
The algorithms employed by our communities are often underspecified, and thus have multiple implementation choices, which do not effect the correctness of the output, but do impact the efficiency or even tractability of its production. In this extended abstract, to accompany a keynote talk at the 2021 SC-Square Workshop, we survey recent work (both the author's and from the literature) on the use of Machine Learning technology to improve algorithms of interest to SC-Square.
△ Less
Submitted 9 September, 2022;
originally announced September 2022.
-
SC-Square: Overview to 2021
Authors:
Matthew England
Abstract:
This extended abstract was written to accompany an invited talk at the 2021 SC-Square Workshop, where the author was asked to give an overview of SC-Square progress to date. The author first reminds the reader of the definition of SC-Square, then briefly outlines some of the history, before picking out some (personal) scientific highlights.
This extended abstract was written to accompany an invited talk at the 2021 SC-Square Workshop, where the author was asked to give an overview of SC-Square progress to date. The author first reminds the reader of the definition of SC-Square, then briefly outlines some of the history, before picking out some (personal) scientific highlights.
△ Less
Submitted 9 September, 2022;
originally announced September 2022.
-
New heuristic to choose a cylindrical algebraic decomposition variable ordering motivated by complexity analysis
Authors:
Tereso del Río,
Matthew England
Abstract:
It is well known that the variable ordering can be critical to the efficiency or even tractability of the cylindrical algebraic decomposition (CAD) algorithm. We propose new heuristics inspired by complexity analysis of CAD to choose the variable ordering. These heuristics are evaluated against existing heuristics with experiments on the SMT-LIB benchmarks using both existing performance metrics a…
▽ More
It is well known that the variable ordering can be critical to the efficiency or even tractability of the cylindrical algebraic decomposition (CAD) algorithm. We propose new heuristics inspired by complexity analysis of CAD to choose the variable ordering. These heuristics are evaluated against existing heuristics with experiments on the SMT-LIB benchmarks using both existing performance metrics and a new metric we propose for the problem at hand. The best of these new heuristics chooses orderings that lead to timings on average 17% slower than the virtual-best: an improvement compared to the prior state-of-the-art which achieved timings 25% slower.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Resultant Tools for Parametric Polynomial Systems with Application to Population Models
Authors:
AmirHosein Sadeghimanesh,
Matthew England
Abstract:
We are concerned with the problem of decomposing the parameter space of a parametric system of polynomial equations, and possibly some polynomial inequality constraints, with respect to the number of real solutions that the system attains. Previous studies apply a two step approach to this problem, where first the discriminant variety of the system is computed via a Groebner Basis (GB), and then a…
▽ More
We are concerned with the problem of decomposing the parameter space of a parametric system of polynomial equations, and possibly some polynomial inequality constraints, with respect to the number of real solutions that the system attains. Previous studies apply a two step approach to this problem, where first the discriminant variety of the system is computed via a Groebner Basis (GB), and then a Cylindrical Algebraic Decomposition (CAD) of this is produced to give the desired computation. However, even on some reasonably small applied examples this process is too expensive, with computation of the discriminant variety alone infeasible. In this paper we develop new approaches to build the discriminant variety using resultant methods (the Dixon resultant and a new method using iterated univariate resultants). This reduces the complexity compared to GB and allows for a previous infeasible example to be tackled. We demonstrate the benefit by giving a symbolic solution to a problem from population dynamics -- the analysis of the steady states of three connected populations which exhibit Allee effects - which previously could only be tackled numerically.
△ Less
Submitted 9 February, 2022; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Feedback and Engagement on an Introductory Programming Module
Authors:
Beate Grawemeyer,
John Halloran,
Matthew England,
David Croft
Abstract:
We ran a study on engagement and achievement for a first year undergraduate programming module which used an online learning environment containing tasks which generate automated feedback. Students could also access human feedback from traditional labs. We gathered quantitative data on engagement and achievement which allowed us to split the cohort into 6 groups. We then ran interviews with studen…
▽ More
We ran a study on engagement and achievement for a first year undergraduate programming module which used an online learning environment containing tasks which generate automated feedback. Students could also access human feedback from traditional labs. We gathered quantitative data on engagement and achievement which allowed us to split the cohort into 6 groups. We then ran interviews with students after the end of the module to produce qualitative data on perceptions of what feedback is, how useful it is, the uses made of it, and how it bears on engagement. A general finding was that human and automated feedback are different but complementary. However there are different feedback needs by group. Our findings imply: (1) that a blended human-automated feedback approach improves engagement; and (2) that this approach needs to be differentiated according to type of student. We give implications for the design of feedback for programming modules.
△ Less
Submitted 4 January, 2022;
originally announced January 2022.
-
Proving UNSAT in SMT: The Case of Quantifier Free Non-Linear Real Arithmetic
Authors:
Erika Abraham,
James H. Davenport,
Matthew England,
Gereon Kremer
Abstract:
We discuss the topic of unsatisfiability proofs in SMT, particularly with reference to quantifier free non-linear real arithmetic. We outline how the methods here do not admit trivial proofs and how past formalisation attempts are not sufficient. We note that the new breed of local search based algorithms for this domain may offer an easier path forward.
We discuss the topic of unsatisfiability proofs in SMT, particularly with reference to quantifier free non-linear real arithmetic. We outline how the methods here do not admit trivial proofs and how past formalisation attempts are not sufficient. We note that the new breed of local search based algorithms for this domain may offer an easier path forward.
△ Less
Submitted 11 August, 2021;
originally announced August 2021.
-
The DEWCAD Project: Pushing Back the Doubly Exponential Wall of Cylindrical Algebraic Decomposition
Authors:
R. Bradford,
J. H. Davenport,
M. England,
A. Sadeghimanesh,
A. Uncu
Abstract:
This abstract seeks to introduce the ISSAC community to the DEWCAD project, which is based at Coventry University and the University of Bath, in the United Kingdom. The project seeks to push back the Doubly Exponential Wall of Cylindrical Algebraic Decomposition, through the integration of SAT/SMT technology, the extension of Lazard projection theory, and the development of new algorithms based on…
▽ More
This abstract seeks to introduce the ISSAC community to the DEWCAD project, which is based at Coventry University and the University of Bath, in the United Kingdom. The project seeks to push back the Doubly Exponential Wall of Cylindrical Algebraic Decomposition, through the integration of SAT/SMT technology, the extension of Lazard projection theory, and the development of new algorithms based on CAD technology but without producing CADs themselves. The project also seeks to develop applications of CAD and will focus on applications in the domains of economics and bio-network analysis.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs
Authors:
Dorian Florescu,
Matthew England
Abstract:
We are interested in the application of Machine Learning (ML) technology to improve mathematical software. It may seem that the probabilistic nature of ML tools would invalidate the exact results prized by such software, however, the algorithms which underpin the software often come with a range of choices which are good candidates for ML application. We refer to choices which have no effect on th…
▽ More
We are interested in the application of Machine Learning (ML) technology to improve mathematical software. It may seem that the probabilistic nature of ML tools would invalidate the exact results prized by such software, however, the algorithms which underpin the software often come with a range of choices which are good candidates for ML application. We refer to choices which have no effect on the mathematical correctness of the software, but do impact its performance.
In the past we experimented with one such choice: the variable ordering to use when building a Cylindrical Algebraic Decomposition (CAD). We used the Python library Scikit-Learn (sklearn) to experiment with different ML models, and developed new techniques for feature generation and hyper-parameter selection.
These techniques could easily be adapted for making decisions other than our immediate application of CAD variable ordering. Hence in this paper we present a software pipeline to use sklearn to pick the variable ordering for an algorithm that acts on a polynomial system. The code described is freely available online.
△ Less
Submitted 22 May, 2020;
originally announced May 2020.
-
New Opportunities for the Formal Proof of Computational Real Geometry?
Authors:
Erika {Á}brahám,
James Davenport,
Matthew England,
Gereon Kremer,
Zak Tonks
Abstract:
The purpose of this paper is to explore the question "to what extent could we produce formal, machine-verifiable, proofs in real algebraic geometry?" The question has been asked before but as yet the leading algorithms for answering such questions have not been formalised. We present a thesis that a new algorithm for ascertaining satisfiability of formulae over the reals via Cylindrical Algebraic…
▽ More
The purpose of this paper is to explore the question "to what extent could we produce formal, machine-verifiable, proofs in real algebraic geometry?" The question has been asked before but as yet the leading algorithms for answering such questions have not been formalised. We present a thesis that a new algorithm for ascertaining satisfiability of formulae over the reals via Cylindrical Algebraic Coverings [Ábrahám, Davenport, England, Kremer, \emph{Deciding the Consistency of Non-Linear Real Arithmetic Constraints with a Conflict Driver Search Using Cylindrical Algebraic Coverings}, 2020] might provide trace and outputs that allow the results to be more susceptible to machine verification than those of competing algorithms.
△ Less
Submitted 8 April, 2020;
originally announced April 2020.
-
Polynomial Superlevel Set Representation of the Multistationarity Region of Chemical Reaction Networks
Authors:
AmirHosein Sadeghimanesh,
Matthew England
Abstract:
In this paper we introduce a new representation for the multistationarity region of a reaction network, using polynomial superlevel sets. The advantages of using this polynomial superlevel set representation over the already existing representations (cylindrical algebraic decompositions, numeric sampling, rectangular divisions) is discussed, and algorithms to compute this new representation are pr…
▽ More
In this paper we introduce a new representation for the multistationarity region of a reaction network, using polynomial superlevel sets. The advantages of using this polynomial superlevel set representation over the already existing representations (cylindrical algebraic decompositions, numeric sampling, rectangular divisions) is discussed, and algorithms to compute this new representation are provided. The results are given for the general mathematical formalism of a parametric system of equations and so may be applied to other application domains.
△ Less
Submitted 12 September, 2022; v1 submitted 17 March, 2020;
originally announced March 2020.
-
Deciding the Consistency of Non-Linear Real Arithmetic Constraints with a Conflict Driven Search Using Cylindrical Algebraic Coverings
Authors:
Erika Ábrahám,
James H. Davenport,
Matthew England,
Gereon Kremer
Abstract:
We present a new algorithm for determining the satisfiability of conjunctions of non-linear polynomial constraints over the reals, which can be used as a theory solver for satisfiability modulo theory (SMT) solving for non-linear real arithmetic. The algorithm is a variant of Cylindrical Algebraic Decomposition (CAD) adapted for satisfiability, where solution candidates (sample points) are constru…
▽ More
We present a new algorithm for determining the satisfiability of conjunctions of non-linear polynomial constraints over the reals, which can be used as a theory solver for satisfiability modulo theory (SMT) solving for non-linear real arithmetic. The algorithm is a variant of Cylindrical Algebraic Decomposition (CAD) adapted for satisfiability, where solution candidates (sample points) are constructed incrementally, either until a satisfying sample is found or sufficient samples have been sampled to conclude unsatisfiability. The choice of samples is guided by the input constraints and previous conflicts.
The key idea behind our new approach is to start with a partial sample; demonstrate that it cannot be extended to a full sample; and from the reasons for that rule out a larger space around the partial sample, which build up incrementally into a cylindrical algebraic covering of the space. There are similarities with the incremental variant of CAD, the NLSAT method of Jovanovic and de Moura, and the NuCAD algorithm of Brown; but we present worked examples and experimental results on a preliminary implementation to demonstrate the differences to these, and the benefits of the new approach.
△ Less
Submitted 23 November, 2020; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Improved cross-validation for classifiers that make algorithmic choices to minimise runtime without compromising output correctness
Authors:
Dorian Florescu,
Matthew England
Abstract:
Our topic is the use of machine learning to improve software by making choices which do not compromise the correctness of the output, but do affect the time taken to produce such output. We are particularly concerned with computer algebra systems (CASs), and in particular, our experiments are for selecting the variable ordering to use when performing a cylindrical algebraic decomposition of $n$-di…
▽ More
Our topic is the use of machine learning to improve software by making choices which do not compromise the correctness of the output, but do affect the time taken to produce such output. We are particularly concerned with computer algebra systems (CASs), and in particular, our experiments are for selecting the variable ordering to use when performing a cylindrical algebraic decomposition of $n$-dimensional real space with respect to the signs of a set of polynomials.
In our prior work we explored the different ML models that could be used, and how to identify suitable features of the input polynomials. In the present paper we both repeat our prior experiments on problems which have more variables (and thus exponentially more possible orderings), and examine the metric which our ML classifiers targets. The natural metric is computational runtime, with classifiers trained to pick the ordering which minimises this. However, this leads to the situation were models do not distinguish between any of the non-optimal orderings, whose runtimes may still vary dramatically. In this paper we investigate a modification to the cross-validation algorithms of the classifiers so that they do distinguish these cases, leading to improved results.
△ Less
Submitted 28 November, 2019;
originally announced November 2019.
-
First Year Computer Science Projects at Coventry University: Activity-led integrative team projects with continuous assessment
Authors:
Simon Billings,
Matthew England
Abstract:
We describe the group projects undertaken by first year undergraduate Computer Science students at Coventry University. These are integrative course projects: designed to bring together the topics from the various modules students take, to apply them as a coherent whole. They follow an activity-led approach, with students given a loose brief and a lot of freedom in how to develop their project.…
▽ More
We describe the group projects undertaken by first year undergraduate Computer Science students at Coventry University. These are integrative course projects: designed to bring together the topics from the various modules students take, to apply them as a coherent whole. They follow an activity-led approach, with students given a loose brief and a lot of freedom in how to develop their project.
We outline the new regulations at Coventry University which eases the use of such integrative projects. We then describe our continuous assessment approach: where students earn a weekly mark by demonstrating progress to a teacher as an open presentation to the class. It involves a degree of self and peer assessment and allows for an assessment of group work that is both fair, and seen to be fair. It builds attendance, self-study / continuous engagement habits, public speaking / presentation skills, and rewards group members for making meaningful individual contributions.
△ Less
Submitted 25 November, 2019;
originally announced November 2019.
-
Computing with CodeRunner at Coventry University: Automated summative assessment of Python and C++ code
Authors:
David Croft,
Matthew England
Abstract:
CodeRunner is a free open-source Moodle plugin for automatically marking student code. We describe our experience using CodeRunner for summative assessment in our first year undergraduate programming curriculum at Coventry University. We use it to assess both Python3 and C++14 code (CodeRunner supports other languages also). We give examples of our questions and report on how key metrics have chan…
▽ More
CodeRunner is a free open-source Moodle plugin for automatically marking student code. We describe our experience using CodeRunner for summative assessment in our first year undergraduate programming curriculum at Coventry University. We use it to assess both Python3 and C++14 code (CodeRunner supports other languages also). We give examples of our questions and report on how key metrics have changed following its use at Coventry.
△ Less
Submitted 4 December, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Algorithmically generating new algebraic features of polynomial systems for machine learning
Authors:
Dorian Florescu,
Matthew England
Abstract:
There are a variety of choices to be made in both computer algebra systems (CASs) and satisfiability modulo theory (SMT) solvers which can impact performance without affecting mathematical correctness. Such choices are candidates for machine learning (ML) approaches, however, there are difficulties in applying standard ML techniques, such as the efficient identification of ML features from input d…
▽ More
There are a variety of choices to be made in both computer algebra systems (CASs) and satisfiability modulo theory (SMT) solvers which can impact performance without affecting mathematical correctness. Such choices are candidates for machine learning (ML) approaches, however, there are difficulties in applying standard ML techniques, such as the efficient identification of ML features from input data which is typically a polynomial system. Our focus is selecting the variable ordering for cylindrical algebraic decomposition (CAD), an important algorithm implemented in several CASs, and now also SMT-solvers. We created a framework to describe all the previously identified ML features for the problem and then enumerated all options in this framework to automatically generation many more features. We validate the usefulness of these with an experiment which shows that an ML choice for CAD variable ordering is superior to those made by human created heuristics, and further improved with these additional features. We expect that this technique of feature generation could be useful for other choices related to CAD, or even choices for other algorithms with polynomial systems for input.
△ Less
Submitted 3 June, 2019;
originally announced June 2019.
-
Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition
Authors:
Matthew England,
Dorian Florescu
Abstract:
There has been recent interest in the use of machine learning (ML) approaches within mathematical software to make choices that impact on the computing performance without affecting the mathematical correctness of the result. We address the problem of selecting the variable ordering for cylindrical algebraic decomposition (CAD), an important algorithm in Symbolic Computation. Prior work to apply M…
▽ More
There has been recent interest in the use of machine learning (ML) approaches within mathematical software to make choices that impact on the computing performance without affecting the mathematical correctness of the result. We address the problem of selecting the variable ordering for cylindrical algebraic decomposition (CAD), an important algorithm in Symbolic Computation. Prior work to apply ML on this problem implemented a Support Vector Machine (SVM) to select between three existing human-made heuristics, which did better than anyone heuristic alone. The present work extends to have ML select the variable ordering directly, and to try a wider variety of ML techniques.
We experimented with the NLSAT dataset and the Regular Chains Library CAD function for Maple 2018. For each problem, the variable ordering leading to the shortest computing time was selected as the target class for ML. Features were generated from the polynomial input and used to train the following ML models: k-nearest neighbours (KNN) classifier, multi-layer perceptron (MLP), decision tree (DT) and SVM, as implemented in the Python scikit-learn package. We also compared these with the two leading human constructed heuristics for the problem: Brown's heuristic and sotd. On this dataset all of the ML approaches outperformed the human made heuristics, some by a large margin.
△ Less
Submitted 5 June, 2019; v1 submitted 24 April, 2019;
originally announced April 2019.
-
Cylindrical Algebraic Decomposition with Equational Constraints
Authors:
Matthew England,
Russell Bradford,
James H. Davenport
Abstract:
Cylindrical Algebraic Decomposition (CAD) has long been one of the most important algorithms within Symbolic Computation, as a tool to perform quantifier elimination in first order logic over the reals. More recently it is finding prominence in the Satisfiability Checking community as a tool to identify satisfying solutions of problems in nonlinear real arithmetic.
The original algorithm produce…
▽ More
Cylindrical Algebraic Decomposition (CAD) has long been one of the most important algorithms within Symbolic Computation, as a tool to perform quantifier elimination in first order logic over the reals. More recently it is finding prominence in the Satisfiability Checking community as a tool to identify satisfying solutions of problems in nonlinear real arithmetic.
The original algorithm produces decompositions according to the signs of polynomials, when what is usually required is a decomposition according to the truth of a formula containing those polynomials. One approach to achieve that coarser (but hopefully cheaper) decomposition is to reduce the polynomials identified in the CAD to reflect a logical structure which reduces the solution space dimension: the presence of Equational Constraints (ECs).
This paper may act as a tutorial for the use of CAD with ECs: we describe all necessary background and the current state of the art. In particular, we present recent work on how McCallum's theory of reduced projection may be leveraged to make further savings in the lifting phase: both to the polynomials we lift with and the cells lifted over. We give a new complexity analysis to demonstrate that the double exponent in the worst case complexity bound for CAD reduces in line with the number of ECs. We show that the reduction can apply to both the number of polynomials produced and their degree.
△ Less
Submitted 20 March, 2019;
originally announced March 2019.
-
PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets
Authors:
S. Deshpande,
J. Shuttleworth,
J. Yang,
S. Taramonli,
M. England
Abstract:
Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome as most coding potential computation (CPC) to…
▽ More
Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome as most coding potential computation (CPC) tools fail to accurately identify them in transcriptomic data. Well-known CPC tools such as CPC2, lncScore, CPAT are primarily designed for prediction of lncRNAs based on the GENCODE, NONCODE and CANTATAdb databases. The prediction accuracy of these tools often drops when tested on transcriptomic datasets. This leads to higher false positive results and inaccuracy in the function annotation process. In this study, we present a novel tool, PLIT, for the identification of lncRNAs in plants RNA-seq datasets. PLIT implements a feature selection method based on L1 regularization and iterative Random Forests (iRF) classification for selection of optimal features. Based on sequence and codon-bias features, it classifies the RNA-seq derived FASTA sequences into coding or long non-coding transcripts. Using L1 regularization, 31 optimal features were obtained based on lncRNA and protein-coding transcripts from 8 plant species. The performance of the tool was evaluated on 7 plant RNA-seq datasets using 10-fold cross-validation. The analysis exhibited superior accuracy when evaluated against currently available state-of-the-art CPC tools.
△ Less
Submitted 12 February, 2019;
originally announced February 2019.
-
Identifying the Parametric Occurrence of Multiple Steady States for some Biological Networks
Authors:
R. Bradford,
J. H. Davenport,
M. England,
H. Errami,
V. Gerdt,
D. Grigoriev,
C. Hoyt,
M. Kosta,
O. Radulescu,
T. Sturm,
A. Weber
Abstract:
We consider a problem from biological network analysis of determining regions in a parameter space over which there are multiple steady states for positive real values of variables and parameters. We describe multiple approaches to address the problem using tools from Symbolic Computation. We describe how progress was made to achieve semi-algebraic descriptions of the multistationarity regions of…
▽ More
We consider a problem from biological network analysis of determining regions in a parameter space over which there are multiple steady states for positive real values of variables and parameters. We describe multiple approaches to address the problem using tools from Symbolic Computation. We describe how progress was made to achieve semi-algebraic descriptions of the multistationarity regions of parameter space, and compare symbolic results to numerical methods. The biological networks studied are models of the mitogen-activated protein kinases (MAPK) network which has already consumed considerable effort using special insights into its structure of corresponding models. Our main example is a model with 11 equations in 11 variables and 19 parameters, 3 of which are of interest for symbolic treatment. The model also imposes positivity conditions on all variables and parameters.
We apply combinations of symbolic computation methods designed for mixed equality/inequality systems, specifically virtual substitution, lazy real triangularization and cylindrical algebraic decomposition, as well as a simplification technique adapted from Gaussian elimination and graph theory. We are able to determine multistationarity of our main example over a 2-dimensional parameter space. We also study a second MAPK model and a symbolic grid sampling technique which can locate such regions in 3-dimensional parameter space.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Computing with Codio at Coventry University: Online virtual Linux boxes and automated formative feedback
Authors:
David Croft,
Matthew England
Abstract:
We describe our experience using Codio at Coventry University in our undergraduate programming curriculum. Codio provides students with online virtual Linux boxes, and allows staff to equip these with guides written in markdown and supplemental tasks that provide automated feedback. The use of Codio has coincided with a steady increase in student performance and satisfaction as well as far greater…
▽ More
We describe our experience using Codio at Coventry University in our undergraduate programming curriculum. Codio provides students with online virtual Linux boxes, and allows staff to equip these with guides written in markdown and supplemental tasks that provide automated feedback. The use of Codio has coincided with a steady increase in student performance and satisfaction as well as far greater data on student engagement and performance.
△ Less
Submitted 10 December, 2018;
originally announced December 2018.
-
A Combined CNN and LSTM Model for Arabic Sentiment Analysis
Authors:
Abdulaziz M. Alayba,
Vasile Palade,
Matthew England,
Rahat Iqbal
Abstract:
Deep neural networks have shown good data modelling capabilities when dealing with challenging and large datasets from a wide range of application areas. Convolutional Neural Networks (CNNs) offer advantages in selecting good features and Long Short-Term Memory (LSTM) networks have proven good abilities of learning sequential data. Both approaches have been reported to provide improved results in…
▽ More
Deep neural networks have shown good data modelling capabilities when dealing with challenging and large datasets from a wide range of application areas. Convolutional Neural Networks (CNNs) offer advantages in selecting good features and Long Short-Term Memory (LSTM) networks have proven good abilities of learning sequential data. Both approaches have been reported to provide improved results in areas such image processing, voice recognition, language translation and other Natural Language Processing (NLP) tasks. Sentiment classification for short text messages from Twitter is a challenging task, and the complexity increases for Arabic language sentiment classification tasks because Arabic is a rich language in morphology. In addition, the availability of accurate pre-processing tools for Arabic is another current limitation, along with limited research available in this area. In this paper, we investigate the benefits of integrating CNNs and LSTMs and report obtained improved accuracy for Arabic sentiment analysis on different datasets. Additionally, we seek to consider the morphological diversity of particular Arabic words by using different sentiment classification levels.
△ Less
Submitted 21 July, 2018; v1 submitted 8 July, 2018;
originally announced July 2018.
-
Non-linear Real Arithmetic Benchmarks derived from Automated Reasoning in Economics
Authors:
C. Mulligan,
R. Bradford,
J. H. Davenport,
M. England,
Z. Tonks
Abstract:
We consider problems originating in economics that may be solved automatically using mathematical software. We present and make freely available a new benchmark set of such problems. The problems have been shown to fall within the framework of non-linear real arithmetic, and so are in theory soluble via Quantifier Elimination (QE) technology as usually implemented in computer algebra systems. Furt…
▽ More
We consider problems originating in economics that may be solved automatically using mathematical software. We present and make freely available a new benchmark set of such problems. The problems have been shown to fall within the framework of non-linear real arithmetic, and so are in theory soluble via Quantifier Elimination (QE) technology as usually implemented in computer algebra systems. Further, they all can be phrased in prenex normal form with only existential quantifiers and so are also admissible to those Satisfiability Module Theory (SMT) solvers that support the QF_NRA. There is a great body of work considering QE and SMT application in science and engineering, but we demonstrate here that there is potential for this technology also in the social sciences.
△ Less
Submitted 28 June, 2018;
originally announced June 2018.
-
TheoryGuru: A Mathematica Package to apply Quantifier Elimination
Authors:
C. Mulligan,
J. H. Davenport,
M. England
Abstract:
We consider the use of Quantifier Elimination (QE) technology for automated reasoning in economics. There is a great body of work considering QE applications in science and engineering but we demonstrate here that it also has use in the social sciences. We explain how many suggested theorems in economics could either be proven, or even have their hypotheses shown to be inconsistent, automatically…
▽ More
We consider the use of Quantifier Elimination (QE) technology for automated reasoning in economics. There is a great body of work considering QE applications in science and engineering but we demonstrate here that it also has use in the social sciences. We explain how many suggested theorems in economics could either be proven, or even have their hypotheses shown to be inconsistent, automatically via QE.
However, economists who this technology could benefit are usually unfamiliar with QE, and the use of mathematical software generally. This motivated the development of a Mathematica Package TheoryGuru, whose purpose is to lower the costs of applying QE to economics. We describe the package's functionality and give examples of its use.
△ Less
Submitted 28 June, 2018;
originally announced June 2018.
-
Machine Learning for Mathematical Software
Authors:
M. England
Abstract:
While there has been some discussion on how Symbolic Computation could be used for AI there is little literature on applications in the other direction. However, recent results for quantifier elimination suggest that, given enough example problems, there is scope for machine learning tools like Support Vector Machines to improve the performance of Computer Algebra Systems. We survey the authors ow…
▽ More
While there has been some discussion on how Symbolic Computation could be used for AI there is little literature on applications in the other direction. However, recent results for quantifier elimination suggest that, given enough example problems, there is scope for machine learning tools like Support Vector Machines to improve the performance of Computer Algebra Systems. We survey the authors own work and similar applications for other mathematical software.
It may seem that the inherently probabilistic nature of machine learning tools would invalidate the exact results prized by mathematical software. However, algorithms and implementations often come with a range of choices which have no effect on the mathematical correctness of the end result but a great effect on the resources required to find it, and thus here, machine learning can have a significant impact.
△ Less
Submitted 28 June, 2018;
originally announced June 2018.
-
Towards Incremental Cylindrical Algebraic Decomposition in Maple
Authors:
Alexander Imani Cowen-Rivers,
Matthew England
Abstract:
Cylindrical Algebraic Decomposition (CAD) is an important tool within computational real algebraic geometry, capable of solving many problems for polynomial systems over the reals. It has long been studied by the Symbolic Computation community and has found recent interest in the Satisfiability Checking community. The present report describes a proof of concept implementation of an Incremental CAD…
▽ More
Cylindrical Algebraic Decomposition (CAD) is an important tool within computational real algebraic geometry, capable of solving many problems for polynomial systems over the reals. It has long been studied by the Symbolic Computation community and has found recent interest in the Satisfiability Checking community. The present report describes a proof of concept implementation of an Incremental CAD algorithm in Maple, where CADs are built and then refined as additional polynomial constraints are added. The aim is to make CAD suitable for use as a theory solver for SMT tools who search for solutions by continually reformulating logical formula and querying whether a logical solution is admissible. We describe experiments for the proof of concept, which clearly display the computational advantages compared to iterated re-computation. In addition, the project implemented this work under the recently verified Lazard projection scheme (with corresponding Lazard valuation).
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
Using Machine Learning to Improve Cylindrical Algebraic Decomposition
Authors:
Zongyan Huang,
Matthew England,
David Wilson,
James H. Davenport,
Lawrence C. Paulson
Abstract:
Cylindrical Algebraic Decomposition (CAD) is a key tool in computational algebraic geometry, best known as a procedure to enable Quantifier Elimination over real-closed fields. However, it has a worst case complexity doubly exponential in the size of the input, which is often encountered in practice. It has been observed that for many problems a change in algorithm settings or problem formulation…
▽ More
Cylindrical Algebraic Decomposition (CAD) is a key tool in computational algebraic geometry, best known as a procedure to enable Quantifier Elimination over real-closed fields. However, it has a worst case complexity doubly exponential in the size of the input, which is often encountered in practice. It has been observed that for many problems a change in algorithm settings or problem formulation can cause huge differences in runtime costs, changing problem instances from intractable to easy. A number of heuristics have been developed to help with such choices, but the complicated nature of the geometric relationships involved means these are imperfect and can sometimes make poor choices. We investigate the use of machine learning (specifically support vector machines) to make such choices instead.
Machine learning is the process of fitting a computer model to a complex function based on properties learned from measured data. In this paper we apply it in two case studies: the first to select between heuristics for choosing a CAD variable ordering; the second to identify when a CAD problem instance would benefit from Groebner Basis preconditioning. These appear to be the first such applications of machine learning to Symbolic Computation. We demonstrate in both cases that the machine learned choice outperforms human developed heuristics.
△ Less
Submitted 26 April, 2018;
originally announced April 2018.
-
Quantifier Elimination for Reasoning in Economics
Authors:
Casey B. Mulligan,
Russell Bradford,
James H. Davenport,
Matthew England,
Zak Tonks
Abstract:
We consider the use of Quantifier Elimination (QE) technology for automated reasoning in economics. QE dates back to Tarski's work in the 1940s with software to perform it dating to the 1970s. There is a great body of work considering its application in science and engineering but we show here how it can also find application in the social sciences. We explain how many suggested theorems in econom…
▽ More
We consider the use of Quantifier Elimination (QE) technology for automated reasoning in economics. QE dates back to Tarski's work in the 1940s with software to perform it dating to the 1970s. There is a great body of work considering its application in science and engineering but we show here how it can also find application in the social sciences. We explain how many suggested theorems in economics could either be proven, or even have their hypotheses shown to be inconsistent, automatically; and describe the application of this in both economics education and research. We describe a bank of QE examples gathered from economics literature and note the structure of these are, on average, quite different to those occurring in the computer algebra literature. This leads us to suggest a new incremental QE approach based on result memorization of commonly occurring generic QE results.
△ Less
Submitted 15 May, 2018; v1 submitted 26 April, 2018;
originally announced April 2018.
-
Summer Research Report: Towards Incremental Lazard Cylindrical Algebraic Decomposition
Authors:
Alexander I. Cowen-Rivers,
Matthew England
Abstract:
Cylindrical Algebraic Decomposition (CAD) is an important tool within computational real algebraic geometry, capable of solving many problems to do with polynomial systems over the reals, but known to have worst-case computational complexity doubly exponential in the number of variables. It has long been studied by the Symbolic Computation community and is implemented in a variety of computer alge…
▽ More
Cylindrical Algebraic Decomposition (CAD) is an important tool within computational real algebraic geometry, capable of solving many problems to do with polynomial systems over the reals, but known to have worst-case computational complexity doubly exponential in the number of variables. It has long been studied by the Symbolic Computation community and is implemented in a variety of computer algebra systems, however, it has also found recent interest in the Satisfiability Checking community for use with SMT-solvers. The SCSC Project seeks to build bridges between these communities.
The present report describes progress made during a Research Internship in Summer 2017 funded by the EU H2020 SCSC CSA. We describe a proof of concept implementation of an Incremental CAD algorithm in Maple, where CADs are built and refined incrementally by polynomial constraint, in contrast to the usual approach of a single computation from a single input. This advance would make CAD of use to SMT-solvers who search for solutions by constantly reformulating logical formula and querying solvers like CAD for whether a logical solution is admissible. We describe experiments for the proof of concept, which clearly display the computational advantages when compared to iterated re-computation. In addition, the project implemented this work under the recently verified Lazard projection scheme (with corresponding Lazard evaluation). That is the minimal complete CAD method in theory, and this is the first documented implementation.
△ Less
Submitted 23 April, 2018;
originally announced April 2018.
-
OpenMath and SMT-LIB
Authors:
James H. Davenport,
Matthew England,
Roberto Sebastiani,
Patrick Trentin
Abstract:
OpenMath and SMT-LIB are languages with very different origins, but both "represent mathematics". We describe SMT-LIB for the OpenMath community and consider adaptations for both languages to support the growing SC-Square initiative.
OpenMath and SMT-LIB are languages with very different origins, but both "represent mathematics". We describe SMT-LIB for the OpenMath community and consider adaptations for both languages to support the growing SC-Square initiative.
△ Less
Submitted 5 March, 2018;
originally announced March 2018.
-
Improving Sentiment Analysis in Arabic Using Word Representation
Authors:
Abdulaziz M. Alayba,
Vasile Palade,
Matthew England,
Rahat Iqbal
Abstract:
The complexities of Arabic language in morphology, orthography and dialects makes sentiment analysis for Arabic more challenging. Also, text feature extraction from short messages like tweets, in order to gauge the sentiment, makes this task even more difficult. In recent years, deep neural networks were often employed and showed very good results in sentiment classification and natural language p…
▽ More
The complexities of Arabic language in morphology, orthography and dialects makes sentiment analysis for Arabic more challenging. Also, text feature extraction from short messages like tweets, in order to gauge the sentiment, makes this task even more difficult. In recent years, deep neural networks were often employed and showed very good results in sentiment classification and natural language processing applications. Word embedding, or word distributing approach, is a current and powerful tool to capture together the closest words from a contextual text. In this paper, we describe how we construct Word2Vec models from a large Arabic corpus obtained from ten newspapers in different Arab countries. By applying different machine learning algorithms and convolutional neural networks with different text feature selections, we report improved accuracy of sentiment classification (91%-95%) on our publicly available Arabic language health sentiment dataset [1]
△ Less
Submitted 30 March, 2018; v1 submitted 28 February, 2018;
originally announced March 2018.
-
The Potential and Challenges of CAD with Equational Constraints for SC-Square
Authors:
James H. Davenport,
Matthew England
Abstract:
Cylindrical algebraic decomposition (CAD) is a core algorithm within Symbolic Computation, particularly for quantifier elimination over the reals and polynomial systems solving more generally. It is now finding increased application as a decision procedure for Satisfiability Modulo Theories (SMT) solvers when working with non-linear real arithmetic. We discuss the potentials from increased focus o…
▽ More
Cylindrical algebraic decomposition (CAD) is a core algorithm within Symbolic Computation, particularly for quantifier elimination over the reals and polynomial systems solving more generally. It is now finding increased application as a decision procedure for Satisfiability Modulo Theories (SMT) solvers when working with non-linear real arithmetic. We discuss the potentials from increased focus on the logical structure of the input brought by the SMT applications and SC-Square project, particularly the presence of equational constraints. We also highlight the challenges for exploiting these: primitivity restrictions, well-orientedness questions, and the prospect of incrementality.
△ Less
Submitted 1 November, 2017;
originally announced November 2017.
-
Symbolic Versus Numerical Computation and Visualization of Parameter Regions for Multistationarity of Biological Networks
Authors:
Matthew England,
Hassan Errami,
Dima Grigoriev,
Ovidiu Radulescu,
Thomas Sturm,
Andreas Weber
Abstract:
We investigate models of the mitogenactivated protein kinases (MAPK) network, with the aim of determining where in parameter space there exist multiple positive steady states. We build on recent progress which combines various symbolic computation methods for mixed systems of equalities and inequalities. We demonstrate that those techniques benefit tremendously from a newly implemented graph theor…
▽ More
We investigate models of the mitogenactivated protein kinases (MAPK) network, with the aim of determining where in parameter space there exist multiple positive steady states. We build on recent progress which combines various symbolic computation methods for mixed systems of equalities and inequalities. We demonstrate that those techniques benefit tremendously from a newly implemented graph theoretical symbolic preprocessing method. We compare computation times and quality of results of numerical continuation methods with our symbolic approach before and after the application of our preprocessing.
△ Less
Submitted 27 June, 2017;
originally announced June 2017.