-
Addressing Model Overcomplexity in Drug-Drug Interaction Prediction With Molecular Fingerprints
Authors:
Manel Gil-Sorribes,
Alexis Molina
Abstract:
Accurately predicting drug-drug interactions (DDIs) is crucial for pharmaceutical research and clinical safety. Recent deep learning models often suffer from high computational costs and limited generalization across datasets. In this study, we investigate a simpler yet effective approach using molecular representations such as Morgan fingerprints (MFPS), graph-based embeddings from graph convolutional networks (GCNs), and transformer-derived embeddings from MoLFormer integrated into a straightforward neural network. We benchmark our implementation on DrugBank DDI splits and a drug-drug affinity (DDA) dataset from the Food and Drug Administration. MFPS along with MoLFormer and GCN representations achieve competitive performance across tasks, even in the more challenging leak-proof split, highlighting the sufficiency of simple molecular representations. Moreover, we are able to identify key molecular motifs and structural patterns relevant to drug interactions via gradient-based analyses using the representations under study. Despite these results, dataset limitations such as insufficient chemical diversity, limited dataset size, and inconsistent labeling impact robust evaluation and challenge the need for more complex approaches. Our work provides a meaningful baseline and emphasizes the need for better dataset curation and progressive complexity scaling.
Submitted 30 March, 2025;
originally announced March 2025.
-
Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models
Authors:
Adrián Morales-Pastor,
Raquel Vázquez-Reza,
Miłosz Wieczór,
Clàudia Valverde,
Manel Gil-Sorribes,
Bertran Miquel-Oliver,
Álvaro Ciudad,
Alexis Molina
Abstract:
RNA is a vital biomolecule with numerous roles and functions within cells, and interest in targeting it for therapeutic purposes has grown significantly in recent years. However, fully understanding and predicting RNA behavior, particularly for applications in drug discovery, remains a challenge due to the complexity of RNA structures and interactions. While foundational models in biology have demonstrated success in modeling several biomolecules, especially proteins, achieving similar breakthroughs for RNA has proven more difficult. Current RNA models have yet to match the performance observed in the protein domain, leaving an important gap in computational biology. In this work, we present ChaRNABERT, a suite of sample- and parameter-efficient RNA foundational models that, through a learnable tokenization process, reach state-of-the-art performance on several tasks in established benchmarks. We extend its testing to relevant downstream tasks such as RNA-protein and aptamer-protein interaction prediction. Weights and inference code for ChaRNABERT-8M will be provided for academic research use. The other models will be available upon request.
Submitted 5 November, 2024;
originally announced November 2024.
-
ICML Topological Deep Learning Challenge 2024: Beyond the Graph Domain
Authors:
Guillermo Bernárdez,
Lev Telyatnikov,
Marco Montagna,
Federica Baccini,
Mathilde Papillon,
Miquel Ferriol-Galmés,
Mustafa Hajij,
Theodore Papamarkou,
Maria Sofia Bucarelli,
Olga Zaghen,
Johan Mathe,
Audun Myers,
Scott Mahan,
Hansen Lillemark,
Sharvaree Vadgama,
Erik Bekkers,
Tim Doster,
Tegan Emerson,
Henry Kvinge,
Katrina Agate,
Nesreen K Ahmed,
Pengfei Bai,
Michael Banf,
Claudio Battiloro,
Maxim Beketov,
et al. (48 additional authors not shown)
Abstract:
This paper describes the 2nd edition of the ICML Topological Deep Learning Challenge that was hosted within the ICML 2024 ELLIS Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). The challenge focused on the problem of representing data in different discrete topological domains in order to bridge the gap between Topological Deep Learning (TDL) and other types of structured datasets (e.g. point clouds, graphs). Specifically, participants were asked to design and implement topological liftings, i.e. mappings between different data structures and topological domains, such as hypergraphs or simplicial, cell, and combinatorial complexes. The challenge received 52 submissions satisfying all the requirements. This paper introduces the main scope of the challenge and summarizes the main results and findings.
Submitted 8 September, 2024;
originally announced September 2024.
-
TopoBench: A Framework for Benchmarking Topological Deep Learning
Authors:
Lev Telyatnikov,
Guillermo Bernardez,
Marco Montagna,
Mustafa Hajij,
Martin Carrasco,
Pavlo Vasylenko,
Mathilde Papillon,
Ghada Zamzmi,
Michael T. Schaub,
Jonas Verhellen,
Pavel Snopov,
Bertran Miquel-Oliver,
Manel Gil-Sorribes,
Alexis Molina,
Victor Guallar,
Theodore Long,
Julian Suk,
Patryk Rygiel,
Alexander Nikitin,
Giordan Escalona,
Michael Banf,
Dominik Filipiak,
Max Schattauer,
Liliya Imasheva,
Alvaro Martinez,
et al. (12 additional authors not shown)
Abstract:
This work introduces TopoBench, an open-source library designed to standardize benchmarking and accelerate research in topological deep learning (TDL). TopoBench decomposes TDL into a sequence of independent modules for data generation, loading, transforming and processing, as well as model training, optimization and evaluation. This modular organization provides flexibility for modifications and facilitates the adaptation and optimization of various TDL pipelines. A key feature of TopoBench is its support for transformations and lifting across topological domains. Mapping the topology and features of a graph to higher-order topological domains, such as simplicial and cell complexes, enables richer data representations and more fine-grained analyses. The applicability of TopoBench is demonstrated by benchmarking several TDL architectures across diverse tasks and datasets.
Submitted 26 March, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.