An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries

Pedawi, Aryan; Gniewek, Pawel; Chang, Chaoyi; Anderson, Brandon M.; Bedem, Henry van den


arXiv:2211.04468 (q-bio)

[Submitted on 19 Oct 2022]


Abstract: Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. Recent years have witnessed rapid growth in these libraries from millions to trillions of compounds, hiding undiscovered, potent hits for a variety of therapeutic targets. However, they are quickly approaching a size beyond that which permits explicit enumeration, presenting new challenges for virtual screening. To overcome these challenges, we propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative model represents such libraries as a differentiable, hierarchically-organized database. Given a compound from the library, the molecular encoder constructs a query for retrieval, which is utilized by the molecular decoder to reconstruct the compound by first decoding its chemical reaction and subsequently decoding its reactants. Our design minimizes autoregression in the decoder, facilitating the generation of large, valid molecular graphs. Our method performs fast and parallel batch inference for ultra-large synthesis libraries, enabling a number of important applications in early-stage drug discovery. Compounds proposed by our method are guaranteed to be in the library, and thus synthetically and cost-effectively accessible. Importantly, CSLVAE can encode out-of-library compounds and search for in-library analogues. In experiments, we demonstrate the capabilities of the proposed method in the navigation of massive combinatorial synthesis libraries.
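The two-stage retrieval described in the abstract (decode the reaction first, then its reactants) can be illustrated with a toy sketch. This is not the authors' implementation: the library contents, the key vectors, the dot-product scoring, the greedy argmax, and the names `TOY_LIBRARY`, `best_match`, and `decode` are all assumptions made for illustration, and a single query vector is reused at both levels for simplicity.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_match(query, keys):
    """Index of the key with the highest dot-product score against the query."""
    scores = [dot(query, key) for key in keys]
    return max(range(len(keys)), key=scores.__getitem__)


# Hypothetical two-level library: each reaction has a key vector and a list of
# reactant slots; each slot holds (reactant name, key vector) candidates.
TOY_LIBRARY = [
    {
        "name": "amide_coupling",
        "key": [1.0, 0.0],
        "slots": [
            [("acid_A", [1.0, 0.0]), ("acid_B", [0.0, 1.0])],
            [("amine_A", [0.5, 0.5]), ("amine_B", [-1.0, 0.0])],
        ],
    },
    {
        "name": "suzuki",
        "key": [0.0, 1.0],
        "slots": [
            [("boronic_A", [1.0, 0.0]), ("boronic_B", [0.0, 1.0])],
            [("halide_A", [0.2, 0.9]), ("halide_B", [0.9, 0.1])],
        ],
    },
]


def decode(query, library):
    """Pick a reaction, then one reactant per slot of that reaction.

    Every choice is made among library entries, so the decoded product is
    in the library by construction.
    """
    # Stage 1: retrieve the reaction whose key best matches the query.
    reaction = library[best_match(query, [rx["key"] for rx in library])]
    # Stage 2: fill each reactant slot independently, given the reaction.
    reactants = []
    for slot in reaction["slots"]:
        index = best_match(query, [key for _, key in slot])
        reactants.append(slot[index][0])
    return reaction["name"], reactants


result_a = decode([1.0, 0.2], TOY_LIBRARY)
result_b = decode([0.0, 1.0], TOY_LIBRARY)
print(result_a)
print(result_b)
```

Because the slots are filled independently once the reaction is fixed, the only sequential dependency in this sketch is reaction before reactants, which loosely mirrors the abstract's point about minimizing autoregression in the decoder.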

Comments:	36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects:	Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:2211.04468 [q-bio.QM]
	(or arXiv:2211.04468v1 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2211.04468

Submission history

From: Aryan Pedawi
[v1] Wed, 19 Oct 2022 15:43:13 UTC (1,600 KB)
