Search | arXiv e-print repository

Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence

Authors: Yingying Sun, Jun A, Zhiwei Liu, Rui Sun, Liujia Qian, Samuel H. Payne, Wout Bittremieux, Markus Ralser, Chen Li, Yi Chen, Zhen Dong, Yasset Perez-Riverol, Asif Khan, Chris Sander, Ruedi Aebersold, Juan Antonio Vizcaíno, Jonathan R Krieger, Jianhua Yao, Han Wen, Linfeng Zhang, Yunping Zhu, Yue Xuan, Benjamin Boyang Sun, Liang Qiao, Henning Hermjakob, et al. (37 additional authors not shown)

Abstract: Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.

Submitted 21 February, 2025; originally announced February 2025.

Comments: 28 pages, 2 figures, perspective in AI proteomics

arXiv:2407.15220 [pdf]

doi 10.1038/s43588-025-00832-7

Privacy-Preserving Multi-Center Differential Protein Abundance Analysis with FedProt

Authors: Yuliya Burankova, Miriam Abele, Mohammad Bakhtiari, Christine von Törne, Teresa Barth, Lisa Schweizer, Pieter Giesbertz, Johannes R. Schmidt, Stefan Kalkhof, Janina Müller-Deile, Peter A van Veelen, Yassene Mohammed, Elke Hammer, Lis Arend, Klaudia Adamowicz, Tanja Laske, Anne Hartebrodt, Tobias Frisch, Chen Meng, Julian Matschinske, Julian Späth, Richard Röttger, Veit Schwämmle, Stefanie M. Hauck, Stefan Lichtenthaler, et al. (6 additional authors not shown)

Abstract: Quantitative mass spectrometry has revolutionized proteomics by enabling simultaneous quantification of thousands of proteins. Pooling patient-derived data from multiple institutions enhances statistical power but raises significant privacy concerns. Here we introduce FedProt, the first privacy-preserving tool for collaborative differential protein abundance analysis of distributed data, which utilizes federated learning and additive secret sharing. In the absence of a multicenter patient-derived dataset for evaluation, we created two: one at five centers from LFQ E. coli experiments and one at three centers from TMT human serum. Evaluations using these datasets confirm that FedProt achieves accuracy equivalent to DEqMS applied to pooled data, with completely negligible absolute differences no greater than $4 \times 10^{-12}$. In contrast, -log10(p-values) computed by the most accurate meta-analysis methods diverged from the centralized analysis results by up to 25-27. FedProt is available as a web tool with detailed documentation as a FeatureCloud App.

Submitted 21 July, 2024; originally announced July 2024.

Comments: 52 pages, 16 figures, 12 tables. Last two authors listed are joint last authors
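The abstract names additive secret sharing as one of FedProt's building blocks. As a generic illustration only (this is not FedProt's code, and which statistics FedProt actually aggregates is not stated here), the sketch below shows how several centers can obtain the sum of their private non-negative integers without any single party seeing another center's input. The modulus `MODULUS` and the helpers `share`, `reconstruct` and `secure_sum` are hypothetical; all parties are simulated in one process, and real-valued statistics would additionally need a fixed-point encoding, which is omitted.

```python
import secrets
from typing import List

# Hypothetical modulus: shares live in the integers modulo a large prime.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int, modulus: int = MODULUS) -> List[int]:
    """Split `value` into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares: List[int], modulus: int = MODULUS) -> int:
    """Recover the secret by adding all shares modulo the modulus."""
    return sum(shares) % modulus


def secure_sum(local_values: List[int], modulus: int = MODULUS) -> int:
    """Simulate n centers jointly summing non-negative integers.

    Center i splits its value into n shares and sends share j to party j.
    Each party sees only one random-looking share per center, adds what it
    received, and publishes that partial sum; the partial sums reveal the total.
    """
    n = len(local_values)
    all_shares = [share(v, n, modulus) for v in local_values]
    partial_sums = [sum(s[j] for s in all_shares) % modulus for j in range(n)]
    return reconstruct(partial_sums, modulus)


# Three hypothetical centers with private counts.
print(secure_sum([12, 7, 30]))  # 49
```

The result is exact as long as the true total stays below the modulus; any single share is uniformly distributed and therefore says nothing about the value it came from.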

arXiv:2212.13543 [pdf]

Democratising Knowledge Representation with BioCypher

Authors: Sebastian Lobentanzer, Patrick Aloy, Jan Baumbach, Balazs Bohar, Pornpimol Charoentong, Katharina Danhauser, Tunca Doğan, Johann Dreo, Ian Dunham, Adrià Fernandez-Torras, Benjamin M. Gyori, Michael Hartung, Charles Tapley Hoyt, Christoph Klein, Tamas Korcsmaros, Andreas Maier, Matthias Mann, David Ochoa, Elena Pareja-Lorente, Ferdinand Popp, Martin Preusse, Niklas Probul, Benno Schwikowski, Bünyamin Sen, Maximilian T. Strauss, et al. (4 additional authors not shown)

Abstract: Standardising the representation of biomedical knowledge among all researchers is an insurmountable task, hindering the effectiveness of many computational methods. To facilitate harmonisation and interoperability despite this fundamental challenge, we propose to standardise the framework of knowledge graph creation instead. We implement this standardisation in BioCypher, a FAIR (findable, accessible, interoperable, reusable) framework to transparently build biomedical knowledge graphs while preserving provenances of the source data. Mapping the knowledge onto biomedical ontologies helps to balance the needs for harmonisation, human and machine readability, and ease of use and accessibility to non-specialist researchers. We demonstrate the usefulness of this framework on a variety of use cases, from maintenance of task-specific knowledge stores, to interoperability between biomedical domains, to on-demand building of task-specific knowledge graphs for federated learning. BioCypher (https://biocypher.org) frees up valuable developer time; we encourage further development and usage by the community.

Submitted 17 January, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: 34 pages, 6 figures; submitted to Nature Biotechnology

arXiv:1404.0270 [pdf, other]

doi 10.1093/bioinformatics/btu337

Memory efficient RNA energy landscape exploration

Authors: Martin Mann, Marcel Kucharik, Christoph Flamm, Michael T. Wolfinger

Abstract: Energy landscapes provide a valuable means for studying the folding dynamics of short RNA molecules in detail by modeling all possible structures and their transitions. Higher abstraction levels based on a macro-state decomposition of the landscape enable the study of larger systems; however, they are still restricted by the huge memory requirements of exact approaches. We present a highly parallelizable local enumeration scheme that enables the computation of exact macro-state transition models with highly reduced memory requirements. The approach is evaluated on RNA secondary structure landscapes using a gradient basin definition for macro-states. Furthermore, we demonstrate the need for exact transition models by comparing two barrier-based approaches and perform a detailed investigation of gradient basins in RNA energy landscapes. Source code is part of the C++ Energy Landscape Library available at http://www.bioinf.uni-freiburg.de/Software/.

Submitted 28 April, 2014; v1 submitted 1 April, 2014; originally announced April 2014.
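The abstract evaluates its scheme using a gradient basin definition of macro-states. A minimal sketch of that definition, assuming a landscape given as an explicit energy table and neighbour lists, with ties between equally low neighbours broken by the smaller state label (an assumption; labels must therefore be comparable): every micro-state is mapped to the local minimum reached by repeated steepest-descent steps. Unlike the paper's local enumeration scheme, this version stores a label for every state, so it illustrates the macro-state definition rather than the memory savings. The function name `gradient_basins` is hypothetical.

```python
from typing import Dict, Hashable, List

State = Hashable


def gradient_basins(
    energy: Dict[State, float], neighbors: Dict[State, List[State]]
) -> Dict[State, State]:
    """Map every micro-state to the local minimum reached by steepest descent."""

    def steepest_step(s: State) -> State:
        # Lowest-energy neighbour; ties broken by the smaller state label.
        best = min(neighbors[s], key=lambda t: (energy[t], t), default=s)
        return best if energy[best] < energy[s] else s

    basin: Dict[State, State] = {}
    for start in energy:
        path = []
        s = start
        # Walk downhill until reaching a state whose basin is known or a local minimum.
        while s not in basin:
            nxt = steepest_step(s)
            if nxt == s:
                basin[s] = s  # a local minimum labels its own basin
                break
            path.append(s)
            s = nxt
        # Every state on the downhill path shares the basin of its endpoint.
        for p in path:
            basin[p] = basin[s]
    return basin


# Toy 1D landscape: states 0..6 on a chain, each linked to its direct neighbours.
energies = [3, 1, 2, 4, 2, 0, 5]
energy = dict(enumerate(energies))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(energies)] for i in energy}
print(dict(sorted(gradient_basins(energy, neighbors).items())))
# {0: 1, 1: 1, 2: 1, 3: 1, 4: 5, 5: 5, 6: 5}
```

Because each step strictly lowers the energy, every walk terminates; memoising the basin of visited states means each state is walked at most once.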

arXiv:1304.1356 [pdf, other]

The Graph Grammar Library - a generic framework for chemical graph rewrite systems

Authors: Martin Mann, Heinz Ekker, Christoph Flamm

Abstract: Graph rewrite systems are powerful tools to model and study complex problems in various fields of research. Their successful application to chemical reaction modelling on a molecular level has been shown, but no appropriate and simple system is available at the moment. The presented Graph Grammar Library (GGL) implements a generic Double Push Out approach for general graph rewrite systems. The framework focuses on a high level of modularity as well as high performance, using state-of-the-art algorithms and data structures, and comes with extensive documentation. The large GGL chemistry module enables extensive and detailed studies of chemical systems. It well meets the requirements and abilities envisioned by Yadav et al. (2004) for such chemical rewrite systems. Here, molecules are represented as undirected labeled graphs, while chemical reactions are described by corresponding graph grammar rules. Besides graph transformation, the GGL offers advanced cheminformatics algorithms, for instance to estimate energies of molecules or for aromaticity perception. These features are illustrated using a set of reactions from polyketide chemistry, a huge class of natural compounds of medical relevance. The graph grammar based simulation of chemical reactions offered by the GGL is a powerful tool for extensive cheminformatics studies on a molecular level. The GGL already provides rewrite rules for all enzymes listed in the KEGG LIGAND database and is freely available at http://www.tbi.univie.ac.at/software/GGL/.

Submitted 4 April, 2013; originally announced April 2013.

Comments: Extended version of an abstract published in proceedings of the International Conference on Model Transformation (ICMT) 2013

arXiv:1005.1853 [pdf, other]

Lattice model refinement of protein structures

Authors: Martin Mann, Alessandro Dal Palù

Abstract: Finding the best lattice model representation of a given full-atom protein structure is a hard computational problem. Several greedy methods have been suggested, whose results are usually biased and leave room for improvement. In this paper we formulate and implement a Constraint Programming method to refine such lattice structure models. We show that the approach is able to provide better quality solutions. The prototype is implemented in COLA and is based on limited discrepancy search. Finally, some promising extensions based on local search are discussed.

Submitted 10 May, 2010; originally announced May 2010.

Comments: In Proceedings of Workshop on Constraint Based Methods for Bioinformatics (WCB 2010); Jul 21, 2010; Edinburgh, UK (co-located with ICLP 2010); 7 pages

arXiv:0910.3880 [pdf, ps, other]

Constraint-based Local Move Definitions for Lattice Protein Models Including Side Chains

Authors: Martin Mann, Mohamed Abou Hamra, Kathleen Steinhöfel, Rolf Backofen

Abstract: The simulation of a protein's folding process is often done via stochastic local search, which requires a procedure to apply structural changes onto a given conformation. Here, we introduce a constraint-based approach to enumerate lattice protein structures according to k-local moves in arbitrary lattices. Our declarative description is much more flexible for extensions than standard operational formulations. It enables a generic calculation of k-local neighbors in backbone-only and side chain models. We exemplify the procedure using a simple hierarchical folding scheme.

Submitted 20 October, 2009; originally announced October 2009.

Comments: Published in Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009, 10 pages

ACM Class: D.1.6; G.2.1; J.2; J.3

Journal ref: In Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009, Lisbon
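The simplest k-local move set (k = 1) relocates a single residue while keeping the chain connected and self-avoiding. The sketch below enumerates these neighbours for a backbone-only conformation on the 2D square lattice by brute force; it is not the constraint-based formulation of the paper, and the side-chain case is not covered. The conformation format (a list of integer `(x, y)` points with at least two residues) and the function names are assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def lattice_neighbors(p: Point) -> List[Point]:
    """The four neighbours of a point on the 2D square lattice."""
    return [(p[0] + dx, p[1] + dy) for dx, dy in STEPS]


def one_local_neighbors(conf: List[Point]) -> List[List[Point]]:
    """All conformations that differ from `conf` in the position of exactly one residue."""
    if len(conf) < 2:
        raise ValueError("need at least two residues")
    occupied = set(conf)
    moves = []
    for i in range(len(conf)):
        # The moved residue must stay adjacent to its chain neighbours (i-1 and/or i+1).
        anchors = [conf[j] for j in (i - 1, i + 1) if 0 <= j < len(conf)]
        candidates = set(lattice_neighbors(anchors[0]))
        for a in anchors[1:]:
            candidates &= set(lattice_neighbors(a))
        # Self-avoidance: the new position must be free (this also excludes conf[i]).
        for p in sorted(candidates - occupied):
            moved = list(conf)
            moved[i] = p
            moves.append(moved)
    return moves


# An L-shaped three-residue chain: two end moves per end plus one corner flip.
print(len(one_local_neighbors([(0, 0), (1, 0), (1, 1)])))  # 5
```

For interior residues the two anchor neighbourhoods intersect only at corners, so the enumeration yields exactly the familiar end moves and corner flips.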

arXiv:0910.3848 [pdf, ps, other]

Equivalence Classes of Optimal Structures in HP Protein Models Including Side Chains

Authors: Martin Mann, Rolf Backofen, Sebastian Will

Abstract: Lattice protein models, such as the Hydrophobic-Polar (HP) model, are a common abstraction to enable exhaustive studies on structure, function, or evolution of proteins. A main issue is the high number of optimal structures, resulting from the hydrophobicity-based energy function applied. We introduce an equivalence relation on protein structures that correlates with the energy function. We discuss the efficient enumeration of optimal representatives of the corresponding equivalence classes and the application of the results.

Submitted 20 October, 2009; originally announced October 2009.

Comments: Published in Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009, 9 pages

ACM Class: D.1.6; G.2.1; J.2; J.3

Journal ref: In Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB09), 2009, Lisbon
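For reference, the standard backbone-only HP energy function scores −1 for every pair of H residues that are lattice neighbours but not consecutive in the chain; optimal structures minimise this score, and many distinct structures typically reach the same optimum. The sketch assumes the 2D square lattice and a conformation given as a list of `(x, y)` points; the function name `hp_energy` is hypothetical, and the side-chain variant studied in the paper is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def hp_energy(sequence: str, conf: List[Point]) -> int:
    """Standard HP energy: -1 per non-consecutive H-H pair that are lattice neighbours."""
    if len(sequence) != len(conf):
        raise ValueError("sequence and conformation lengths differ")
    index_of = {p: i for i, p in enumerate(conf)}
    energy = 0
    for i, (x, y) in enumerate(conf):
        if sequence[i] != "H":
            continue
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index_of.get(q)
            # j > i + 1 counts each contact once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # -1
print(hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)]))  # 0
```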

arXiv:0910.2559 [pdf, ps, other]

doi 10.1103/PhysRevE.83.011113

Efficient exploration of discrete energy landscapes

Authors: Martin Mann, Konstantin Klemm

Abstract: Many physical and chemical processes, such as folding of biopolymers, are best described as dynamics on large combinatorial energy landscapes. A concise approximate description of dynamics is obtained by partitioning the micro-states of the landscape into macro-states. Since most landscapes of interest are not tractable analytically, the probabilities of transitions between macro-states need to be extracted numerically from the microscopic ones, typically by full enumeration of the state space. Here we propose to approximate transition probabilities by a Markov chain Monte-Carlo method. For landscapes of the number partitioning problem and an RNA switch molecule we show that the method allows for accurate probability estimates with significantly reduced computational cost.

Submitted 18 January, 2011; v1 submitted 14 October, 2009; originally announced October 2009.

Comments: 7 pages, 5 figures

Journal ref: Physical Review E 83, 011113 (2011)
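To make the idea of extracting macro-state transition probabilities from sampled micro-state dynamics concrete, here is a generic sketch: a single Metropolis chain is run over the micro-states, and the observed moves between macro-states are counted and row-normalised. This is a simplification under stated assumptions, not the estimator of the paper; the function name, the inverse temperature `beta`, and the toy ring landscape are hypothetical, and the uniform neighbour proposal only yields Boltzmann sampling when all states have the same number of neighbours.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List


def estimate_macro_transitions(
    energy: Dict[Hashable, float],
    neighbors: Dict[Hashable, List[Hashable]],
    macro: Dict[Hashable, Hashable],
    start: Hashable,
    n_steps: int,
    beta: float = 1.0,
    seed: int = 0,
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate P(next macro-state | current macro-state) from one Metropolis run."""
    rng = random.Random(seed)
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    state = start
    for _ in range(n_steps):
        proposal = rng.choice(neighbors[state])
        delta = energy[proposal] - energy[state]
        # Metropolis criterion: always accept downhill, accept uphill with exp(-beta * delta).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            nxt = proposal
        else:
            nxt = state
        counts[macro[state]][macro[nxt]] += 1
        state = nxt
    # Normalise each row of observed counts into transition probabilities.
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


# Toy ring of six micro-states split into two macro-states A = {0, 1, 2}, B = {3, 4, 5}.
ring = list(range(6))
energy = dict(zip(ring, [0, 1, 2, 0, 1, 2]))
neighbors = {s: [(s - 1) % 6, (s + 1) % 6] for s in ring}
macro = {s: "A" if s < 3 else "B" for s in ring}
P = estimate_macro_transitions(energy, neighbors, macro, start=0, n_steps=10_000)
print(P)
```

Each row of `P` sums to one; longer runs reduce the sampling noise of the estimate.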

Showing 1–9 of 9 results for author: Mann, M