Neural Networks beyond explainability: Selective inference for sequence motifs
Authors:
Antoine Villié,
Philippe Veber,
Yohann de Castro,
Laurent Jacob
Abstract:
Over the past decade, neural networks have been successful at making predictions from biological sequences, especially in the context of regulatory genomics. As in other fields of deep learning, tools have been devised to extract features such as sequence motifs that can explain the predictions made by a trained network. Here we intend to go beyond explainable machine learning and introduce SEISM, a selective inference procedure to test the association between these extracted features and the predicted phenotype. In particular, we discuss how training a one-layer convolutional network is formally equivalent to selecting motifs maximizing some association score. We adapt existing sampling-based selective inference procedures by quantizing this selection over an infinite set to a large but finite grid. Finally, we show that sampling under a specific choice of parameters is sufficient to characterize the composite null hypothesis typically used for selective inference, a result that goes well beyond our particular framework. We illustrate the behavior of our method in terms of calibration, power and speed and discuss its power/speed trade-off with a simpler data-split strategy. SEISM paves the way to an easier analysis of neural networks used in regulatory genomics, and to more powerful methods for genome-wide association studies (GWAS).
Submitted 23 December, 2022;
originally announced December 2022.
Detecting Inconsistencies in Large Biological Networks with Answer Set Programming
Authors:
Martin Gebser,
Torsten Schaub,
Sven Thiele,
Philippe Veber
Abstract:
We introduce an approach to detecting inconsistencies in large biological networks by using Answer Set Programming (ASP). To this end, we build upon a recently proposed notion of consistency between biochemical/genetic reactions and high-throughput profiles of cell activity. We then present an approach based on ASP to check the consistency of large-scale data sets. Moreover, we extend this methodology to provide explanations for inconsistencies by determining minimal representations of conflicts. In practice, this can be used to identify unreliable data or to indicate missing reactions.
Submitted 1 July, 2010;
originally announced July 2010.