-
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation
Authors:
Guillaume Huguet,
James Vuckovic,
Kilian Fatras,
Eric Thibodeau-Laufer,
Pablo Lemos,
Riashat Islam,
Cheng-Hao Liu,
Jarrid Rector-Brooks,
Tara Akhound-Sadegh,
Michael Bronstein,
Alexander Tong,
Avishek Joey Bose
Abstract:
Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow-2, a novel sequence-conditioned SE(3)-equivariant flow matching model for protein structure generation. FoldFlow-2 introduces substantial new architectural features over the previous FoldFlow family of models, including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer-based decoder. To increase the diversity and novelty of generated samples, which are crucial for de novo drug design, we train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than the PDB datasets used in prior work, containing both known proteins from the PDB and high-quality synthetic structures obtained through filtering. We further demonstrate the ability to align FoldFlow-2 to arbitrary rewards, e.g., increasing secondary structure diversity, by introducing a Reinforced Finetuning (ReFT) objective. We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in unconditional generation on all metrics, including designability, diversity, and novelty, across all protein lengths, and that it generalizes to the task of equilibrium conformation sampling. Finally, we demonstrate that a fine-tuned FoldFlow-2 makes progress on challenging conditional design tasks such as designing scaffolds for the VHH nanobody.
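FoldFlow models represent each residue's backbone frame as an element of SE(3), that is, a rotation together with a translation. To make the flow matching objective concrete, the minimal sketch below constructs, for a single frame, a conditional interpolant between a noise frame and a data frame together with its target velocity: a straight line for the translation and a geodesic on SO(3) for the rotation, with rotations stored as unit quaternions. It is a generic construction under these assumptions, not FoldFlow-2's implementation; all function names are hypothetical, and the sequence encoder, fusion trunk, decoder, and any time-dependent loss weighting are omitted.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
Vec3 = Tuple[float, float, float]


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product: composing rotations corresponds to multiplying quaternions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def qconj(q: Quat) -> Quat:
    """Conjugate, which is the inverse rotation for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_exp(r: Vec3) -> Quat:
    """Rotation vector (axis * angle) -> unit quaternion."""
    angle = math.sqrt(sum(c * c for c in r))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), r[0] * s, r[1] * s, r[2] * s)


def quat_log(q: Quat) -> Vec3:
    """Unit quaternion -> rotation vector, taking the shortest arc."""
    w, x, y, z = q
    if w < 0:  # q and -q encode the same rotation; use the w >= 0 representative
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return (0.0, 0.0, 0.0)
    angle = 2.0 * math.atan2(s, w)
    return (angle * x / s, angle * y / s, angle * z / s)


def interpolate_frame(x0: Vec3, q0: Quat, x1: Vec3, q1: Quat, t: float):
    """Point on the conditional SE(3) path at time t and its target velocities."""
    # Translation: straight line with constant target velocity x1 - x0
    x_t = tuple((1.0 - t) * a + t * b for a, b in zip(x0, x1))
    v_target = tuple(b - a for a, b in zip(x0, x1))
    # Rotation: geodesic R_t = R0 exp(t * Omega) with Omega = log(R0^T R1)
    omega_target = quat_log(qmul(qconj(q0), q1))
    q_t = qmul(q0, quat_exp(tuple(t * w for w in omega_target)))
    return x_t, q_t, v_target, omega_target


def flow_matching_loss(pred_v: Vec3, pred_omega: Vec3, v_target: Vec3, omega_target: Vec3) -> float:
    """Squared-error regression of predicted velocities onto the conditional targets."""
    pred = tuple(pred_v) + tuple(pred_omega)
    target = tuple(v_target) + tuple(omega_target)
    return sum((p - q) ** 2 for p, q in zip(pred, target))
```

On the geodesic $R_t = R_0\exp(t\Omega)$ the body-frame angular velocity is constant, so the rotational target equals $\log(R_0^\top R_1)$ at every $t$, mirroring the constant translational target $x_1 - x_0$.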
Submitted 11 December, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Optimal control of oscillatory neuronal models with applications to communication through coherence
Authors:
Michael Orieux,
Antoni Guillamon,
Gemma Huguet
Abstract:
Macroscopic oscillations in the brain are involved in various cognitive and physiological processes, yet their precise function is not completely understood. Communication Through Coherence (CTC) theory proposes that these rhythmic electrical patterns might serve to regulate the information flow between neural populations. Thus, to communicate effectively, neural populations must synchronize their oscillatory activity, ensuring that input volleys from the presynaptic population reach the postsynaptic one at its maximum phase of excitability. We consider an Excitatory-Inhibitory (E-I) network whose macroscopic activity is described by an exact mean-field model. The E-I network receives periodic inputs from either one or two external sources, for which effective communication will not be achieved in the absence of control. We explore strategies based on optimal control theory for phase-amplitude dynamics to design a control that sets the target population in the optimal phase to synchronize its activity with a specific presynaptic input signal and establish communication. The control mechanism resembles the role of a higher cortical area in the context of selective attention. To design the control, we use the phase-amplitude reduction of a limit cycle and leverage recent developments in this field to find the most effective control strategy with respect to a defined cost function. Furthermore, we present results that guarantee the local controllability of the system near the limit cycle.
Submitted 15 November, 2023;
originally announced November 2023.
-
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Authors:
Guillaume Huguet,
Alexander Tong,
Edward De Brouwer,
Yanlei Zhang,
Guy Wolf,
Ian Adelstein,
Smita Krishnaswamy
Abstract:
Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high-dimensional, high-throughput, noisy datasets. Such datasets are especially common in fields like biology and physics. While it is thought that these methods preserve the underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat-kernel-based manifold embedding method that we call heat geodesic embeddings. This novel perspective clarifies the choices available in manifold learning and denoising. Results show that our method outperforms existing state-of-the-art methods in preserving ground-truth manifold distances and cluster structure in toy datasets. We also showcase our method on single-cell RNA-sequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. Finally, we show that parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion-based manifold learning method) as well as SNE (an attraction/repulsion neighborhood-based method that forms the basis of t-SNE).
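A classical result connecting heat diffusion to manifold distances is Varadhan's formula, $d(x,y)^2 = \lim_{t\to 0} -4t \log h_t(x,y)$, where $h_t$ is the heat kernel. The minimal sketch below checks it numerically on the unit circle, where the heat kernel has a closed form as a sum over periodic images; the recovered distance tracks the arc length (the geodesic) rather than the chord. It illustrates the formula only, under these assumptions, and is not the authors' heat geodesic embedding, which works on a graph built from data; the function names are hypothetical. The kernel is evaluated in log space so that very small $t$ does not underflow.

```python
import math


def circle_log_heat_kernel(theta: float, t: float, n_images: int = 10) -> float:
    """log h_t on the unit circle for angular separation theta.

    The circle's heat kernel is the 1D Gaussian kernel (4*pi*t)^(-1/2) exp(-d^2 / (4t))
    summed over the periodic images theta + 2*pi*k; the sum is taken with log-sum-exp.
    """
    exponents = [-((theta + 2.0 * math.pi * k) ** 2) / (4.0 * t)
                 for k in range(-n_images, n_images + 1)]
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return -0.5 * math.log(4.0 * math.pi * t) + log_sum


def varadhan_distance(log_heat: float, t: float) -> float:
    """Varadhan's formula: d(x, y)^2 ~ -4 t log h_t(x, y) for small t."""
    return math.sqrt(max(-4.0 * t * log_heat, 0.0))
```

For an angular separation of 2.5 and $t = 0.01$, `varadhan_distance(circle_log_heat_kernel(2.5, 0.01), 0.01)` is close to the arc length 2.5 and far from the chord length $2\sin(1.25) \approx 1.90$; the remaining error comes from the $(4\pi t)^{-1/2}$ prefactor and shrinks as $t \to 0$.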
Submitted 30 May, 2023;
originally announced May 2023.
-
Geodesic Sinkhorn for Fast and Accurate Optimal Transport on Manifolds
Authors:
Guillaume Huguet,
Alexander Tong,
María Ramos Zapatero,
Christopher J. Tape,
Guy Wolf,
Smita Krishnaswamy
Abstract:
Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods are currently the state of the art for such computations, but require $O(n^2)$ computations. In addition, Sinkhorn-based methods commonly use a Euclidean ground distance between datapoints. However, with the prevalence of manifold-structured scientific data, it is often desirable to consider geodesic ground distance. Here, we tackle both issues by proposing Geodesic Sinkhorn, which is based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only $O(n\log n)$ computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high-dimensional single-cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.
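The two ingredients named in this abstract, a heat kernel applied through a Chebyshev expansion of the sparse graph Laplacian and Sinkhorn scaling that needs only kernel-vector products, can be combined as in the following pure-Python sketch. It assumes the combinatorial Laplacian $L = D - A$, the bound $\lambda_{\max} \le 2\,d_{\max}$, a fixed polynomial order, and marginals defined on the graph's nodes; the function names and the toy graph are hypothetical, and the authors' implementation (normalization, choice of $t$, computation of the final distance) may differ. Each heat application costs `order` sparse matrix-vector products, so the dense $n \times n$ kernel is never formed.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Graph = Dict[int, List[int]]  # adjacency list of an undirected, unweighted graph


def laplacian_matvec(adj: Graph, x: Vector) -> Vector:
    """Apply the combinatorial Laplacian L = D - A using only the adjacency list."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(x))]


def chebyshev_heat_operator(adj: Graph, t: float, order: int = 30) -> Callable[[Vector], Vector]:
    """Return a function v -> (approximately) exp(-t L) v."""
    lam_max = 2.0 * max(len(nbrs) for nbrs in adj.values())  # Gershgorin bound for D - A
    n_nodes = order + 1
    # Chebyshev coefficients of exp(-t * lam) after mapping lam in [0, lam_max] to x in [-1, 1]
    coeffs = []
    for k in range(order + 1):
        total = 0.0
        for j in range(n_nodes):
            theta = math.pi * (j + 0.5) / n_nodes
            lam = lam_max * (math.cos(theta) + 1.0) / 2.0
            total += math.exp(-t * lam) * math.cos(k * theta)
        coeffs.append(2.0 * total / n_nodes)

    def shifted(x: Vector) -> Vector:
        # (2 / lam_max) L - I has its spectrum inside [-1, 1]
        return [2.0 * lx / lam_max - xi for lx, xi in zip(laplacian_matvec(adj, x), x)]

    def apply(v: Vector) -> Vector:
        t_prev, t_curr = v, shifted(v)  # T_0 v and T_1 v
        out = [0.5 * coeffs[0] * p + coeffs[1] * c for p, c in zip(t_prev, t_curr)]
        for k in range(2, order + 1):
            # Three-term recurrence T_{k+1} = 2 L~ T_k - T_{k-1}
            t_next = [2.0 * s - p for s, p in zip(shifted(t_curr), t_prev)]
            out = [o + coeffs[k] * nx for o, nx in zip(out, t_next)]
            t_prev, t_curr = t_curr, t_next
        return out

    return apply


def geodesic_sinkhorn(heat: Callable[[Vector], Vector], a: Vector, b: Vector,
                      n_iter: int = 500) -> Tuple[Vector, Vector]:
    """Sinkhorn scaling with the Gibbs kernel replaced by the symmetric heat operator.

    The implied transport plan is P_ij = u_i * H_ij * v_j; it is never formed explicitly.
    """
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        hv = heat(v)
        u = [ai / x if ai > 0 else 0.0 for ai, x in zip(a, hv)]
        hu = heat(u)
        v = [bj / x if bj > 0 else 0.0 for bj, x in zip(b, hu)]
    return u, v
```

For example, on a six-node path graph `adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}` with source mass on the two left nodes and target mass on the two right nodes, `geodesic_sinkhorn(chebyshev_heat_operator(adj, t=2.0), a, b)` returns scalings whose row and column sums $u_i (Hv)_i$ and $v_j (Hu)_j$ reproduce the two marginals.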
Submitted 26 September, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Phase-locked states in oscillating neural networks and their role in neural communication
Authors:
Alberto Pérez-Cervera,
Tere M. Seara,
Gemma Huguet
Abstract:
The theory of communication through coherence (CTC) proposes that brain oscillations reflect changes in the excitability of neurons, and therefore the successful communication between two oscillating neural populations depends not only on the strength of the signal emitted but also on the relative phases between them. More precisely, effective communication occurs when the emitting and receiving populations are properly phase-locked so that the inputs sent by the emitting population arrive at the phases of maximal excitability of the receiving population. To study this setting, we consider a population rate model consisting of excitatory and inhibitory cells modelling the receiving population, and we perturb it with a time-dependent periodic function modelling the input from the emitting population. We consider the stroboscopic map for this system and numerically compute the fixed and periodic points of this map and their bifurcations as the amplitude and the frequency of the perturbation are varied. From the bifurcation diagram, we identify the phase-locked states as well as different regions of bistability. We carefully explore the dynamics, emphasizing their implications for the CTC theory. In particular, we study how the input gain depends on the timing between the input and the inhibitory action of the receiving population. Our results show that an optimal phase locking for CTC emerges naturally, and provide a mechanism by which the receiving population can implement selective communication. Moreover, the presence of bistable regions suggests a mechanism by which different communication regimes between brain areas can be established without changing the structure of the network.
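The stroboscopic map samples the periodically forced system once per forcing period $T = 2\pi/\omega$, so a fixed point of the map is a 1:1 phase-locked state and a period-$k$ point is a $k$:1 one. The minimal sketch below builds such a map for a Wilson-Cowan-type E-I rate model with a cosine input to the excitatory population, integrates one period with RK4, and locates a fixed point with Newton's method on $P(s) - s$, which, unlike plain iteration, would also find unstable fixed points. The model form, the weights `W_EE`, `W_EI`, `W_IE`, `W_II`, and all function names are hypothetical and not taken from the paper; the weak coupling chosen here makes the toy system contracting, so it has a single stable locked state and none of the bistability described in the abstract.

```python
import math
from typing import Tuple

State = Tuple[float, float]

# Hypothetical parameters, not from the paper. The weak coupling keeps this toy model
# contracting, so its stroboscopic map has a single attracting fixed point.
W_EE, W_EI, W_IE, W_II = 1.0, 1.0, 1.0, 0.5
TAU_E, TAU_I = 1.0, 1.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rhs(t: float, s: State, amp: float, omega: float) -> State:
    """E-I rate equations with a periodic input to the excitatory population."""
    e, i = s
    drive = amp * math.cos(omega * t)
    de = (-e + sigmoid(W_EE * e - W_EI * i + drive)) / TAU_E
    di = (-i + sigmoid(W_IE * e - W_II * i)) / TAU_I
    return de, di


def stroboscopic_map(s: State, amp: float, omega: float, steps: int = 400) -> State:
    """Advance the state by one forcing period T = 2*pi/omega with classical RK4."""
    h = (2.0 * math.pi / omega) / steps
    t, (e, i) = 0.0, s
    for _ in range(steps):
        k1 = rhs(t, (e, i), amp, omega)
        k2 = rhs(t + h / 2, (e + h / 2 * k1[0], i + h / 2 * k1[1]), amp, omega)
        k3 = rhs(t + h / 2, (e + h / 2 * k2[0], i + h / 2 * k2[1]), amp, omega)
        k4 = rhs(t + h, (e + h * k3[0], i + h * k3[1]), amp, omega)
        e += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return e, i


def find_fixed_point(s0: State, amp: float, omega: float,
                     tol: float = 1e-10, eps: float = 1e-7) -> State:
    """Newton's method on G(s) = P(s) - s with a finite-difference Jacobian."""
    e, i = s0
    for _ in range(50):
        p_e, p_i = stroboscopic_map((e, i), amp, omega)
        g = (p_e - e, p_i - i)
        if math.hypot(*g) < tol:
            break
        # Finite-difference columns of DG = DP - Id
        pe1, pi1 = stroboscopic_map((e + eps, i), amp, omega)
        pe2, pi2 = stroboscopic_map((e, i + eps), amp, omega)
        a, c = (pe1 - p_e) / eps - 1.0, (pi1 - p_i) / eps
        b, d = (pe2 - p_e) / eps, (pi2 - p_i) / eps - 1.0
        det = a * d - b * c
        # Solve DG * delta = -g by Cramer's rule and take the Newton step
        e -= (d * g[0] - b * g[1]) / det
        i -= (-c * g[0] + a * g[1]) / det
    return e, i
```

Scanning `amp` and `omega` and following the fixed points of `stroboscopic_map` (or of its $k$-fold composition) is the kind of continuation from which a bifurcation diagram of locked states is assembled.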
Submitted 11 September, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.