FAPS: A Fast Platform for Protein Structureomics Analysis
Authors:
Lucas Wilken,
Nihjum Paul,
Troy Timmerman,
Sara A. Tolba,
Amara Arshad,
Di Wu,
Wenjie Xia,
Bakhtiyor Rasulev,
Rick Jansen,
Dali Sun
Abstract:
Protein quantification and analysis are well-accepted approaches for biomarker discovery but are limited to identification without structural information. High-throughput omics data (i.e., genomics, transcriptomics, and proteomics) have become pervasive in cancer biology studies and reach well beyond more specialized areas such as metabolomics, epigenomics, pharmacogenomics, and interact-omics. However, large-scale analysis based on the structure of biomolecules, namely structure-omics, is still underexplored due to a lack of convenient tools. In response, we developed the Fast Analysis of Protein Structure (FAPS) database, a platform designed to advance quantitative proteomics to structure-omics analysis, which shortens large-scale structure-omics analysis from weeks to seconds. FAPS can serve as a new protein secondary structure database, providing a centralized and functional resource for both simulated and experimentally determined statistics on secondary structure. Stored data come from two sources: structure simulation (currently SWISS-MODEL and AlphaFold) performed on high-performance computers, and the pre-existing UniProt database. FAPS provides user-friendly features that give straightforward access to accurate data on the proportion of secondary structure in different protein chains, offering a fast numerical and visual reference for protein structure calculations and analysis. FAPS is accessible through http://fapsdb.org.
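To make "proportion of secondary structure in a protein chain" concrete, here is a minimal Python sketch. It assumes the chain is already annotated with a per-residue DSSP-style string, and it uses a common three-class reduction of the DSSP codes. The function name, the mapping, and the example string are all illustrative assumptions; the abstract does not say how FAPS computes or stores these values.

```python
from collections import Counter

# Assumed reduction of DSSP 8-state codes to three classes.
# This is a common convention, not necessarily the one FAPS uses.
DSSP_TO_CLASS = {
    "H": "helix", "G": "helix", "I": "helix",
    "E": "strand", "B": "strand",
}


def secondary_structure_fractions(dssp_string):
    """Return the fraction of residues in helix, strand, and coil."""
    if not dssp_string:
        raise ValueError("annotation string must not be empty")
    # Any code not listed in the mapping is counted as coil.
    counts = Counter(DSSP_TO_CLASS.get(code, "coil") for code in dssp_string)
    total = len(dssp_string)
    return {name: counts.get(name, 0) / total
            for name in ("helix", "strand", "coil")}


# Hypothetical 20-residue annotation, for illustration only.
example = "CCHHHHHHHHCCEEEEECCC"
fractions = secondary_structure_fractions(example)
print(fractions)
```

Applied to every chain in a proteomics result list, a per-chain summary of this kind is the sort of numerical reference the abstract describes.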
Submitted 11 June, 2025;
originally announced June 2025.
Computing phylogenetic invariants for time-reversible models: from TN93 to its submodels
Authors:
Marta Casanellas,
Jennifer Garbett,
Roser Homs,
Annachiara Korchmaros,
Niharika Chakrabarty Paul
Abstract:
Phylogenetic invariants are equations that vanish on algebraic varieties associated with Markov processes that model molecular substitutions on phylogenetic trees. For practical applications, it is essential to understand these equations across a wide range of substitution models. Recent work has shown that, for equivariant models, phylogenetic invariants can be derived from those of the general Markov model by restricting to the linear space defined by the model (namely, the space of mixtures of distributions on the model). Following this philosophy, we describe the space of mixtures and phylogenetic invariants for time-reversible models that are not equivariant. Specifically, we study two submodels of the Tamura-Nei nucleotide substitution model (Felsenstein 81 and 84) using an orthogonal change of basis recently introduced for algebraic time-reversible models.
For tripods, we prove that the algebraic variety of each submodel coincides with the variety of Tamura-Nei intersected with the linear space of the submodel. In the case of quartets, we show that it is an irreducible component of this intersection. Moreover, we demonstrate that it suffices to consider only the binomial equations defining the linear space, which correspond to the natural symmetries of the model in the new coordinates. For each submodel, we explicitly provide equations defining a local complete intersection that characterizes the phylogenetic variety on a dense open subset containing the biologically relevant points.
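For readers unfamiliar with the models named above, the following Python sketch builds the Tamura-Nei (TN93) rate matrix in its usual textbook parametrization, checks time-reversibility through detailed balance, and obtains Felsenstein 81 and 84 as parameter restrictions. The function names, the base ordering, and the numeric values are assumptions chosen for illustration. The sketch works in the standard nucleotide basis, not in the orthogonal basis used in the paper, and it does not compute any phylogenetic invariants.

```python
BASES = "AGCT"          # assumed ordering: purines (A, G), then pyrimidines (C, T)
PURINES = {"A", "G"}


def tn93_rate_matrix(pi, kappa_r, kappa_y):
    """TN93 rate matrix with the transversion rate normalized to 1.

    pi      -- dict of stationary base frequencies
    kappa_r -- relative rate of purine transitions (A <-> G)
    kappa_y -- relative rate of pyrimidine transitions (C <-> T)
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i == j:
                continue
            if x in PURINES and y in PURINES:
                rate = kappa_r
            elif x not in PURINES and y not in PURINES:
                rate = kappa_y
            else:
                rate = 1.0  # transversion
            q[i][j] = rate * pi[y]
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def is_time_reversible(q, pi, tol=1e-12):
    """Check detailed balance: pi_x * q_xy == pi_y * q_yx for all pairs."""
    return all(
        abs(pi[x] * q[i][j] - pi[y] * q[j][i]) <= tol
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
    )


def f81_rate_matrix(pi):
    # Felsenstein 81: transitions and transversions share one rate.
    return tn93_rate_matrix(pi, 1.0, 1.0)


def f84_rate_matrix(pi, k):
    # Felsenstein 84: both transition rates are tied to a single parameter k.
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    return tn93_rate_matrix(pi, 1.0 + k / pi_r, 1.0 + k / pi_y)


# Illustrative base frequencies (not taken from the paper).
pi = {"A": 0.1, "G": 0.2, "C": 0.3, "T": 0.4}
models = {
    "TN93": tn93_rate_matrix(pi, 2.0, 3.0),
    "F81": f81_rate_matrix(pi),
    "F84": f84_rate_matrix(pi, 0.5),
}
for name, q in models.items():
    print(name, is_time_reversible(q, pi))
```

Because both submodels are obtained by constraining the two transition parameters, each inherits time-reversibility from TN93.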
Submitted 26 May, 2025;
originally announced May 2025.