-
MODIS: Multi-Omics Data Integration for Small and Unpaired Datasets
Authors:
Daniel Lepe-Soltero,
Thierry Artières,
Anaïs Baudot,
Paul Villoutreix
Abstract:
A key challenge today lies in the ability to efficiently handle multi-omics data, since such multimodal data may provide a more comprehensive overview of the underlying processes in a system. Yet it comes with challenges: multi-omics data are most often unpaired and only partially labeled; moreover, only small amounts of data are available in some situations, such as rare diseases. We propose MODIS, which stands for Multi-Omics Data Integration for Small and unpaired datasets, a semi-supervised approach to account for these particular settings. MODIS learns a probabilistic coupling of heterogeneous data modalities and learns a shared latent space where modalities are aligned. We rely on artificial data to build controlled experiments to explore how much supervision is needed for an accurate alignment of modalities, and how our approach enables dealing with new conditions for which few data are available. The code is available at https://github.com/VILLOUTREIXLab/MODIS.
Submitted 24 March, 2025;
originally announced March 2025.
-
Hierarchical novel class discovery for single-cell transcriptomic profiles
Authors:
Malek Senoussi,
Thierry Artières,
Paul Villoutreix
Abstract:
One of the major challenges arising from single-cell transcriptomics experiments is the question of how to annotate the associated single-cell transcriptomic profiles. Because of the large size and the high dimensionality of the data, automated methods for annotation are needed. We focus here on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure. We consider a frequent setting where both labeled and unlabeled data are available at training time, but the label set of the labeled data and the label set of the unlabeled data are disjoint. This is an instance of the Novel Class Discovery problem. The goal is to achieve two objectives: clustering the data and mapping the clusters to labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets. Our approaches take advantage of the hierarchical nature of the data.
Submitted 9 September, 2024;
originally announced September 2024.
-
Random walk informed community detection reveals heterogeneities in the lymph node conduits network
Authors:
Solène Song,
Malek Senoussi,
Paul Escande,
Paul Villoutreix
Abstract:
Random walks on networks are widely used to model stochastic processes such as search strategies, transportation problems or disease propagation. A prominent example of such a process is the guiding of naive T cells by the lymph node conduits network. Here, we propose a general framework to find network heterogeneities, which we define as connectivity patterns that affect the random walk. We propose to characterize and measure these heterogeneities by detecting communities in a way that is interpretable in terms of random walk. Moreover, we use an approximation to accurately and efficiently compute these quantities on large networks. Finally, we propose an interactive data visualization platform to follow the dynamics of the random walks and their characteristics on our datasets, and a ready-to-use pipeline for other datasets upon download. By computing quantitative features of random walk informed communities detected within the network, we show that the lymph node conduit network is spatially coherent but, despite its quasi-regularity, contains some random walk related heterogeneities. To evaluate these characteristics, we applied the same workflow of diffusion based community detection and analysis on the lymph node conduit network (LNCN) and a series of generated toy networks.
Submitted 20 October, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
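As background for the random walk setting in the abstract above, the following is a minimal sketch of a simple random walk on a small undirected network. It is not the authors' pipeline: the function names and the three-node toy graph are illustrative assumptions. It only shows the standard fact that, on a connected undirected graph, the stationary distribution of the walk is proportional to node degree.

```python
import numpy as np


def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into random walk transition probabilities."""
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1)
    return adjacency / degrees[:, None]


def stationary_distribution(adjacency):
    """Stationary distribution of a simple random walk on a connected
    undirected graph: each node's probability is its degree over the total degree."""
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1)
    return degrees / degrees.sum()


# Hypothetical toy network: a path graph 0 - 1 - 2 (not data from the paper).
toy_adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

P = transition_matrix(toy_adjacency)
pi = stationary_distribution(toy_adjacency)
```

The middle node has twice the degree of the end nodes, so the walk spends twice as much time there; `pi @ P` returns `pi` again, which is the defining property of a stationary distribution.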
-
Cross-view kernel transfer
Authors:
Riikka Huusari,
Cécile Capponi,
Paul Villoutreix,
Hachem Kadri
Abstract:
We consider the kernel completion problem in the presence of multiple views in the data. In this context the data samples can be fully missing in some views, creating missing columns and rows in the kernel matrices that are calculated individually for each view. We propose to solve the problem of completing the kernel matrices with the Cross-View Kernel Transfer (CVKT) procedure, in which the features of the other views are transformed to represent the view under consideration. The transformations are learned with kernel alignment to the known part of the kernel matrix, allowing for finding generalizable structures in the kernel matrix under completion. Its missing values can then be predicted with the data available in other views. We illustrate the benefits of our approach with simulated data, a multivariate digits dataset and a multi-view dataset on gesture classification, as well as with real biological datasets from studies of pattern formation in early Drosophila melanogaster embryogenesis.
Submitted 31 May, 2022; v1 submitted 14 October, 2019;
originally announced October 2019.
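The abstract above relies on kernel alignment restricted to the known part of a kernel matrix. The following is a minimal sketch of that quantity only, using the standard uncentered alignment (Frobenius inner product divided by the product of Frobenius norms). It is not the CVKT procedure itself: the function names, the `observed` index argument, and the random toy data are assumptions made for illustration.

```python
import numpy as np


def kernel_alignment(K1, K2):
    """Uncentered kernel alignment: <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    K1 = np.asarray(K1, dtype=float)
    K2 = np.asarray(K2, dtype=float)
    inner = np.sum(K1 * K2)  # Frobenius inner product
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return inner / (np.linalg.norm(K1) * np.linalg.norm(K2))


def alignment_on_observed(K_candidate, K_target, observed):
    """Alignment computed only on the rows and columns listed in `observed`
    (hypothetical helper: samples missing from a view are simply left out)."""
    idx = np.ix_(observed, observed)
    return kernel_alignment(np.asarray(K_candidate)[idx], np.asarray(K_target)[idx])


# Hypothetical toy data: 6 samples, 3 features, linear kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = X @ X.T

# Pretend samples 4 and 5 are missing in this view.
observed = [0, 1, 2, 3]
self_alignment = alignment_on_observed(K, K, observed)
scaled_alignment = alignment_on_observed(3.0 * K, K, observed)
```

Alignment is invariant to positive rescaling of either kernel, so both values above equal 1 up to floating-point error; a candidate kernel built from another view would typically score below 1.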