-
Towards Environment-Sensitive Molecular Inference via Mixed Integer Linear Programming
Authors:
Jianshen Zhu,
Mao Takekida,
Naveed Ahmed Azam,
Kazuya Haraguchi,
Liang Zhao,
Tatsuya Akutsu
Abstract:
Traditional QSAR/QSPR and inverse QSAR/QSPR methods often assume that chemical properties are dictated by single molecules, overlooking the influence of molecular interactions and environmental factors. In this paper, we introduce a novel QSAR/QSPR framework that can capture the combined effects of multiple molecules (e.g., small molecules or polymers) and experimental conditions on property values. We design a feature function to integrate the information of multiple molecules and the environment. Specifically, for the Flory-Huggins $χ$-parameter, which characterizes the thermodynamic properties between the solute and the solvent and varies with temperature, we demonstrate through computational experiments that our approach can achieve a competitively high learning performance compared to existing works on predicting $χ$-parameter values, while inferring solute polymers with up to 50 non-hydrogen atoms in their monomer forms in a relatively short time. A comparison study with the simulation software J-OCTA demonstrates that the polymers inferred by our methods are of high quality.
Submitted 17 February, 2025;
originally announced March 2025.
-
Counting Tree-Like Multigraphs with a Given Number of Vertices and Multiple Edges
Authors:
Muhammad Ilyas,
Seemab Hayat,
Naveed Ahmed Azam
Abstract:
The enumeration of chemical graphs is an important topic in cheminformatics and bioinformatics, particularly in the discovery of novel drugs. These graphs are typically either tree-like multigraphs or composed of tree-like multigraphs connected to a core structure. In both cases, the tree-like components play a significant role in determining the properties and activities of chemical compounds. This paper introduces a method based on dynamic programming to efficiently count tree-like multigraphs with a given number $n$ of vertices and $Δ$ multiple edges. The idea of our method is to consider multigraphs as rooted multigraphs by selecting their unicentroid or bicentroid as the root, and define their canonical representation based on maximal subgraphs rooted at the children of the root. This representation guarantees that our proposed method will not repeat a multigraph in the counting process. Finally, recursive relations are derived based on the number of vertices and multiple edges in the maximal subgraphs rooted at the children of roots. These relations lead to an algorithm with a time complexity of $\mathcal{O}(n^2(n + Δ(n + Δ^2 \cdot \min\{n, Δ\})))$ and a space complexity of $\mathcal{O}(n^2(Δ^3+1))$. Experimental results show that the proposed algorithm efficiently counts the desired multigraphs with up to 170 vertices and 50 multiple edges in approximately 930 seconds, confirming its effectiveness and potential as a valuable tool for exploring the chemical graph space in novel drug discovery.
Submitted 8 February, 2025;
originally announced February 2025.
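The abstract above roots each multigraph at its unicentroid or bicentroid. As a minimal sketch of that rooting step only (not the authors' counting algorithm), the following finds the one or two centroids of a simple tree, assuming the standard definition that a centroid is a vertex whose removal leaves components of at most n/2 vertices. The function name `tree_centroids` and the edge-list input format are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

def tree_centroids(n, edges):
    """Return the one or two centroid vertices of a tree on vertices 0..n-1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS from vertex 0 to record parents and a preorder.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)

    # Subtree sizes: in reversed preorder every child is handled before its parent.
    size = [1] * n
    for u in reversed(order):
        if parent[u] != -1:
            size[parent[u]] += size[u]

    centroids = []
    for u in range(n):
        # Largest component after deleting u: the part above u or one child subtree.
        largest = n - size[u]
        for v in adj[u]:
            if parent[v] == u:
                largest = max(largest, size[v])
        if 2 * largest <= n:
            centroids.append(u)
    return centroids
```

A path on four vertices has two adjacent centroids (the bicentroid case), while a star has a single centroid at its hub (the unicentroid case).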
-
A Method to Generate Multi-interval Pairwise Compatibility Graphs
Authors:
Seemab Hayat,
Naveed Ahmed Azam
Abstract:
Reconstruction of evolutionary relationships between species is an important topic in the field of computational biology. Pairwise compatibility graphs (PCGs) are used to model such relationships. A graph is a PCG if its edges can be represented by the distance between the leaves of an edge-weighted tree within a fixed interval. If the number of intervals is more than one, then the graph with such a tree representation is called a multi-interval PCG. The aim of this paper is to generate all multi-interval PCGs with a given number of vertices. For this purpose, we propose a method to generate almost all multi-interval PCGs corresponding to a given tree by randomly assigning edge weights and selecting typical intervals. To reduce the exponential tree search space, we theoretically prove that for each multi-interval PCG there exists a tree whose internal vertices have degree exactly three, and develop an algorithm to enumerate such trees. The proposed method is applied to enumerate all two-interval PCGs (2-IPCGs) with up to ten vertices. Our computational results establish that all graphs with up to ten vertices are 2-IPCGs, making significant progress towards the open problem of determining whether a non-2-IPCG exists with fewer than 135 vertices.
Submitted 15 October, 2024; v1 submitted 14 October, 2024;
originally announced October 2024.
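The definition of a multi-interval PCG in the abstract above can be made concrete with a small sketch: given an edge-weighted tree and a list of closed intervals, two leaves are joined by an edge exactly when their tree distance falls in one of the intervals. This only illustrates the definition, not the authors' generation method; the function names and the `(u, v, weight)` edge format are assumptions made for this example.

```python
from itertools import combinations

def leaf_distances(tree_edges):
    """Return the sorted leaves of an edge-weighted tree and distances from each leaf."""
    adj = {}
    for u, v, w in tree_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(x for x in adj if len(adj[x]) == 1)

    def dist_from(src):
        # Paths in a tree are unique, so a plain traversal gives exact distances.
        dist = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        return dist

    return leaves, {a: dist_from(a) for a in leaves}

def multi_interval_pcg(tree_edges, intervals):
    """Edge set on the leaves: {a, b} is an edge iff d(a, b) lies in some interval."""
    leaves, dist = leaf_distances(tree_edges)
    return {
        (a, b)
        for a, b in combinations(leaves, 2)
        if any(lo <= dist[a][b] <= hi for lo, hi in intervals)
    }
```

For a star-shaped tree with centre `x` and leaf weights 1, 2 and 4, the leaf distances are 3, 5 and 6, so the single interval [3, 5] yields edges a-b and a-c, while the two intervals [3, 3] and [6, 6] yield a-b and b-c.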
-
A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility
Authors:
Muniba Batool,
Naveed Ahmed Azam,
Jianshen Zhu,
Kazuya Haraguchi,
Liang Zhao,
Tatsuya Akutsu
Abstract:
Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR) and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy compared to existing approaches, with accuracy in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time range of [6, 1204] seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive, and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.
Submitted 6 September, 2024;
originally announced September 2024.
-
Cycle-Configuration: A Novel Graph-theoretic Descriptor Set for Molecular Inference
Authors:
Bowen Song,
Jianshen Zhu,
Naveed Ahmed Azam,
Kazuya Haraguchi,
Liang Zhao,
Tatsuya Akutsu
Abstract:
In this paper, we propose a novel family of descriptors of chemical graphs, named cycle-configuration (CC), that can be used in the standard "two-layered (2L) model" of mol-infer, a molecular inference framework based on mixed integer linear programming (MILP) and machine learning (ML). Proposed descriptors capture the notion of ortho/meta/para patterns that appear in aromatic rings, which has been impossible in the framework so far. Computational experiments show that, when the new descriptors are supplied, we can construct prediction functions of similar or better performance for all of the 27 tested chemical properties. We also provide an MILP formulation that asks for a chemical graph with desired properties under the 2L model with CC descriptors (2L+CC model). We show that a chemical graph with up to 50 non-hydrogen vertices can be inferred in a practical time.
Submitted 9 August, 2024;
originally announced August 2024.
-
Molecular Design Based on Integer Programming and Splitting Data Sets by Hyperplanes
Authors:
Jianshen Zhu,
Naveed Ahmed Azam,
Kazuya Haraguchi,
Liang Zhao,
Hiroshi Nagamochi,
Tatsuya Akutsu
Abstract:
A novel framework for designing the molecular structure of chemical compounds with a desired chemical property has recently been proposed. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of a feature function defined by a two-layered model on chemical graphs and a prediction function constructed by a machine learning method. To improve the learning performance of prediction functions in the framework, we design a method that splits a given data set $\mathcal{C}$ into two subsets $\mathcal{C}^{(i)},i=1,2$ by a hyperplane in a chemical space so that most compounds in the first (resp., second) subset have observed values lower (resp., higher) than a threshold $θ$. We construct a prediction function $ψ$ to the data set $\mathcal{C}$ by combining prediction functions $ψ_i,i=1,2$ each of which is constructed on $\mathcal{C}^{(i)}$ independently. The results of our computational experiments suggest that the proposed method improved the learning performance for several chemical properties to which a good prediction function has been difficult to construct.
Submitted 27 April, 2023;
originally announced May 2023.
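The abstract above combines two prediction functions $ψ_1$ and $ψ_2$, one per side of a hyperplane. The abstract does not state the exact combination rule, so the following is only a sketch under one natural assumption: a feature vector is routed to $ψ_1$ or $ψ_2$ according to which side of the hyperplane $w \cdot x = b$ it falls on. The name `make_split_predictor` and the parameters `w`, `b` are hypothetical.

```python
def make_split_predictor(w, b, psi1, psi2):
    """Build psi that dispatches to psi1 or psi2 by the side of the hyperplane w.x = b.

    Assumption for illustration: points with w.x <= b go to psi1, the rest to psi2.
    """
    def psi(x):
        side = sum(wi * xi for wi, xi in zip(w, x)) - b
        return psi1(x) if side <= 0 else psi2(x)
    return psi
```

For example, with `w = (1, 0)` and `b = 2`, the vector `(1, 5)` is handled by `psi1` and the vector `(3, 0)` by `psi2`.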
-
Molecular Design Based on Integer Programming and Quadratic Descriptors in a Two-layered Model
Authors:
Jianshen Zhu,
Naveed Ahmed Azam,
Shengjuan Cao,
Ryota Ido,
Kazuya Haraguchi,
Liang Zhao,
Hiroshi Nagamochi,
Tatsuya Akutsu
Abstract:
A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property, where design of novel drugs is an important topic in bioinformatics and chemo-informatics. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of a feature function defined by a two-layered model on chemical graphs and a prediction function constructed by a machine learning method. A set of graph-theoretic descriptors in the feature function plays a key role in deriving a compact formulation of such an MILP. To improve the learning performance of prediction functions in the framework while maintaining the compactness of the MILP, this paper utilizes the product of two of those descriptors as a new descriptor and then designs a method of reducing the number of descriptors. The results of our computational experiments suggest that the proposed method improved the learning performance for many chemical properties and can infer a chemical structure with up to 50 non-hydrogen atoms.
Submitted 13 September, 2022;
originally announced September 2022.
-
A Method for Inferring Polymers Based on Linear Regression and Integer Programming
Authors:
Ryota Ido,
Shengjuan Cao,
Jianshen Zhu,
Naveed Ahmed Azam,
Kazuya Haraguchi,
Liang Zhao,
Hiroshi Nagamochi,
Tatsuya Akutsu
Abstract:
A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In this paper, we design a new method for inferring a polymer based on the framework. For this, we introduce a new way of representing a polymer as a form of monomer and define new descriptors that feature the structure of polymers. We also use linear regression as a building block for constructing a prediction function in the framework. The results of our computational experiments reveal a set of chemical properties on polymers for which a prediction function constructed with linear regression performs well. We also observe that the proposed method can infer polymers with up to 50 non-hydrogen atoms in a monomer form.
Submitted 24 August, 2021;
originally announced September 2021.
-
Molecular Design Based on Artificial Neural Networks, Integer Programming and Grid Neighbor Search
Authors:
Naveed Ahmed Azam,
Jianshen Zhu,
Kazuya Haraguchi,
Liang Zhao,
Hiroshi Nagamochi,
Tatsuya Akutsu
Abstract:
A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In the framework, a chemical graph with a target chemical value is inferred as a feasible solution of a mixed integer linear program that represents a prediction function and other requirements on the structure of graphs. In this paper, we propose a procedure for generating other feasible solutions of the mixed integer linear program by searching the neighborhood of the output chemical graph in a search space. The procedure is incorporated into the framework as a new building block. The results of our computational experiments suggest that the proposed method can generate an additional number of new chemical graphs with up to 50 non-hydrogen atoms.
Submitted 23 August, 2021;
originally announced August 2021.
-
An Inverse QSAR Method Based on Linear Regression and Integer Programming
Authors:
Jianshen Zhu,
Naveed Ahmed Azam,
Kazuya Haraguchi,
Liang Zhao,
Hiroshi Nagamochi,
Tatsuya Akutsu
Abstract:
Recently a novel framework has been proposed for designing the molecular structure of chemical compounds using both artificial neural networks (ANNs) and mixed integer linear programming (MILP). In the framework, we first define a feature vector $f(C)$ of a chemical graph $C$ and construct an ANN that maps $x=f(C)$ to a predicted value $η(x)$ of a chemical property $π$ to $C$. After this, we formulate an MILP that simulates the computation process of $f(C)$ from $C$ and that of $η(x)$ from $x$. Given a target value $y^*$ of the chemical property $π$, we infer a chemical graph $C^\dagger$ such that $η(f(C^\dagger))=y^*$ by solving the MILP. In this paper, we use linear regression to construct a prediction function $η$ instead of ANNs. For this, we derive an MILP formulation that simulates the computation process of a prediction function by linear regression. The results of computational experiments suggest that our method can infer chemical graphs with up to around 50 non-hydrogen atoms.
Submitted 23 August, 2021; v1 submitted 6 July, 2021;
originally announced July 2021.
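The inverse step in the abstract above asks for an input whose linear-regression prediction $η(x)$ equals a target $y^*$. The toy sketch below illustrates that inverse question only: it replaces the MILP solver with brute-force enumeration over small bounded integer vectors and ignores the chemical-graph constraints that the real formulation encodes through $f(C)$. The names `eta`, `infer_features`, and `bounds` are illustrative assumptions.

```python
from itertools import product

def eta(x, w, b):
    """Linear-regression prediction w.x + b for a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def infer_features(w, b, y_target, bounds, tol=1e-9):
    """Enumerate integer vectors inside the given bounds with eta(x) == y_target.

    Brute force stands in for an MILP solver here; it is only practical
    for a handful of features with small ranges.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return [x for x in product(*ranges) if abs(eta(x, w, b) - y_target) <= tol]
```

With weights `(2, 3)`, intercept `1`, and both features restricted to 0..3, the only vector that predicts 13 is `(3, 2)`.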
-
A Novel Method for Inference of Acyclic Chemical Compounds with Bounded Branch-height Based on Artificial Neural Networks and Integer Programming
Authors:
Naveed Ahmed Azam,
Jianshen Zhu,
Yanming Sun,
Yu Shi,
Aleksandar Shurbevski,
Liang Zhao,
Hiroshi Nagamochi,
Tatsuya Akutsu
Abstract:
Analysis of chemical graphs is a major research topic in computational molecular biology due to its potential applications to drug design. One approach is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a framework has been proposed for inverse QSAR/QSPR using artificial neural networks (ANN) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, a feature vector $f(G)$ of a chemical graph $G$ is introduced and a prediction function $ψ$ on a chemical property $π$ is constructed with an ANN. In the second phase, given a target value $y^*$ of property $π$, a feature vector $x^*$ is inferred by solving an MILP formulated from the trained ANN so that $ψ(x^*)$ is close to $y^*$ and then a set of chemical structures $G^*$ such that $f(G^*)= x^*$ is enumerated by a graph search algorithm. The framework has been applied to the case of chemical compounds with cycle index up to 2. The computational experiments conducted on instances with $n$ non-hydrogen atoms show that a feature vector $x^*$ can be inferred for up to around $n=40$ whereas graphs $G^*$ can be enumerated for up to $n=15$. When applied to the case of chemical acyclic graphs, the maximum computable diameter of $G^*$ was up to around 8. We introduce a new characterization of graph structure, "branch-height," based on which an MILP formulation and a graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using properties such as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs $G^*$ with $n=50$ and diameter 30.
Submitted 21 September, 2020;
originally announced September 2020.
-
Efficient and Secure Substitution Box and Random Number Generators Over Mordell Elliptic Curves
Authors:
Ikram Ullah,
Naveed Ahmed Azam,
Umar Hayat
Abstract:
Elliptic curve cryptography has received great attention in recent years due to its high resistance against modern cryptanalysis. The aim of this article is to present efficient generators to generate substitution boxes (S-boxes) and pseudo random numbers which are essential for many well-known cryptosystems. These generators are based on a special class of ordered Mordell elliptic curves. Rigorous analyses are performed to test the security strength of the proposed generators. For a given prime, the experimental results reveal that the proposed generators are capable of generating a large number of distinct, mutually uncorrelated, cryptographically strong S-boxes and sequences of random numbers in low time and space complexity. Furthermore, it is evident from the comparison that the proposed schemes can efficiently generate secure S-boxes and random numbers as compared to some of the well-known existing schemes over different mathematical structures.
Submitted 12 October, 2019;
originally announced October 2019.
-
Efficient Construction of a Substitution Box Based on a Mordell Elliptic Curve Over a Finite Field
Authors:
Naveed Ahmed Azam,
Umar Hayat,
Ikram Ullah
Abstract:
Elliptic curve cryptography (ECC) is used in many security systems due to its small key size and high security as compared to other cryptosystems. In many well-known security systems, the substitution box (S-box) is the only non-linear component. Recently, it has been shown that the security of a cryptosystem can be improved by using dynamic S-boxes instead of a static S-box. This fact necessitates the construction of new secure S-boxes. In this paper, we propose an efficient method for the generation of S-boxes based on a class of Mordell elliptic curves (MECs) over prime fields by defining different total orders. The proposed scheme is developed in such a way that for each input it outputs an S-box in linear time and constant space. Due to this property, our method takes less time and space as compared to all existing S-box construction methods over elliptic curves. Furthermore, it is shown by the computational results that the proposed method is capable of generating cryptographically strong S-boxes with comparable security to some of the existing S-boxes constructed over different mathematical structures.
Submitted 15 January, 2019; v1 submitted 28 September, 2018;
originally announced September 2018.
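The abstract above builds S-boxes from points of a Mordell elliptic curve under a total order. The abstract does not specify the curve class or the orders used, so the sketch below rests on labeled assumptions: the curve is $y^2 = x^3 + b$ over a prime field with $p \equiv 2 \pmod 3$ (where cubing is a bijection, so each $y$ has exactly one $x$), points are sorted by $x$ with ties broken by $y$, and the S-box is the resulting sequence of $y$-coordinates below 256. The function names are hypothetical, primality of `p` is not checked, and this simple version sorts a list rather than achieving the linear-time, constant-space behaviour claimed in the abstract.

```python
def mec_points(p, b, size=256):
    """Points (x, y) with y < size on y^2 = x^3 + b over F_p, sorted by (x, y).

    Assumes p is a prime with p % 3 == 2 and p >= size (primality is not verified).
    """
    if p % 3 != 2 or p < size:
        raise ValueError("need p % 3 == 2 and p >= size")
    # (2p - 1) / 3 is the inverse of 3 modulo p - 1, so a**e is the cube root of a.
    e = (2 * p - 1) // 3
    points = [(pow((y * y - b) % p, e, p), y) for y in range(size)]
    points.sort()
    return points

def mec_sbox(p, b, size=256):
    """S-box as the y-coordinates read off in the sorted point order."""
    return [y for _, y in mec_points(p, b, size)]
```

Because every value of `y` in `range(size)` appears exactly once, the output is always a permutation of `0..size-1`; different choices of `p`, `b`, or the ordering give different S-boxes.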