FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction

Morehead, Alex; Cheng, Jianlin

Computer Science > Machine Learning

arXiv:2412.10966 (cs)

[Submitted on 14 Dec 2024 (v1), last revised 24 Mar 2025 (this version, v3)]


Abstract: Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or have been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. For the well-known PoseBusters Benchmark dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind docking success rate using unbound (apo) protein input structures and without any information derived from multiple sequence alignments, and for the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at this https URL.
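To illustrate what "conditional flow matching that maps apo structures to holo counterparts" means, the following is a minimal sketch of generic linear-interpolant conditional flow matching over 3D coordinates. It is not FlowDock's actual parameterization, network, prior, or noise schedule; the function names (`interpolate`, `target_velocity`, `cfm_loss`, `euler_sample`) and the optional noise scale `sigma` are hypothetical and chosen only for illustration. A trained model would replace the oracle velocity used in the usage note.

```python
import numpy as np


def interpolate(x0, x1, t, sigma=0.0, rng=None):
    """Point x_t on the straight path from apo coordinates x0 to holo coordinates x1.

    The optional Gaussian perturbation of scale sigma is an assumption of this sketch.
    """
    xt = (1.0 - t) * x0 + t * x1
    if sigma > 0.0:
        rng = rng if rng is not None else np.random.default_rng()
        xt = xt + sigma * rng.standard_normal(x0.shape)
    return xt


def target_velocity(x0, x1):
    """Conditional vector field u_t = x1 - x0 for the linear (straight-line) path."""
    return x1 - x0


def cfm_loss(v_pred, x0, x1):
    """Mean squared error between a predicted velocity and the conditional target."""
    return float(np.mean((v_pred - target_velocity(x0, x1)) ** 2))


def euler_sample(velocity_fn, x0, num_steps=20):
    """Integrate dx/dt = v(x, t) from t=0 (apo) to t=1 (holo) with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

As a sanity check, plugging in the exact conditional velocity as an "oracle" in place of a learned network should carry the apo coordinates onto the holo coordinates, and the regression loss for that oracle is zero. In training, a network would instead be fit to minimize `cfm_loss` at randomly sampled `t`.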

Comments:	15 pages, 2 tables, 2 algorithms, 11 figures. Code, data, pre-trained models, and baseline method predictions are available at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
ACM classes:	I.2.1; J.3
Cite as:	arXiv:2412.10966 [cs.LG]
	(or arXiv:2412.10966v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.10966

Submission history

From: Alex Morehead
[v1] Sat, 14 Dec 2024 20:54:37 UTC (4,280 KB)
[v2] Wed, 15 Jan 2025 21:20:03 UTC (5,065 KB)
[v3] Mon, 24 Mar 2025 16:50:30 UTC (19,721 KB)
