Subjects: Quantitative Biology > Biomolecules (q-bio.BM)
[Submitted on 2 Jun 2025 (v1), last revised 16 Jun 2025 (this version, v2)]
Title: Protein folding classes -- High-dimensional geometry of amino acid composition space revisited
Abstract: In this study, the distributions of protein structure classes (or folding types) of experimentally determined structures, taken from a legacy dataset and from the comprehensive database SCOP, are modeled precisely with geometric constructs such as convex polytopes in high-dimensional amino acid composition space. This work follows up on a previous non-statistical, geometry-motivated modeling of protein classes with ellipsoidal models, which the present models supersede in three important respects: (1) as a paradigm shift, the descriptive 'distribution model' of experimental data is decoupled from, and serves as the basis for, a possible future predictive 'domain model' generalizable to proteins in the same class whose 3D structures have yet to be determined experimentally; (2) the geometric and analytic characteristics of the class distributions are obtained via exact computational geometry calculations; and (3) the full data from a comprehensive database are included in these calculations, eschewing training-set selection and the associated biases. In contrast to statistical and machine-learning approaches, the analytical, non-statistical geometry models of protein class distributions demonstrated in this study furnish complete and precise information on their size and relative disposition in the high-dimensional space (vis-à-vis any overlaps, which lead to ambiguity and limit classification). Intended principally as an accurate summary description of the complex relationships between amino acid composition and protein classes, and as a basis for predictive modeling where permissible, the results suggest that these models may ultimately be useful adjuncts for validating sequence-based protein structure predictions and may contribute to the theoretical and fundamental understanding of secondary structure formation and protein folding, demonstrating the role of high-dimensional amino acid composition space in protein studies.
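The two geometric ideas in the abstract, a protein as a point in 20-dimensional amino acid composition space and a protein class as a convex polytope whose overlaps with other classes signal ambiguity, can be illustrated with a minimal Python sketch. It is not the author's method: the paper obtains its polytopes by exact computational geometry on the full datasets, whereas this sketch assumes each polytope is already given in half-space form (a·x ≤ b). The helper names (`composition`, `in_polytope`, `classes_containing`, `unit`) and the two class constraints in `toy_polytopes` are hypothetical toy values, not data from the study.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes), in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list[float]:
    """Map a sequence to a point in 20-dimensional amino acid composition space.

    Each coordinate is the fraction of residues of one amino acid, so the
    coordinates are non-negative and sum to 1. Assumption of this sketch:
    non-standard letters (e.g. X, B, Z) are simply ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def in_polytope(x: list[float], halfspaces, tol: float = 1e-12) -> bool:
    """True if x satisfies every half-space constraint a.x <= b (within tol)."""
    return all(
        sum(ai * xi for ai, xi in zip(a, x)) <= b + tol for a, b in halfspaces
    )


def classes_containing(x: list[float], polytopes) -> list[str]:
    """Names of all classes whose polytope contains x; more than one = overlap."""
    return [name for name, hs in polytopes.items() if in_polytope(x, hs)]


def unit(aa: str, sign: float = 1.0) -> list[float]:
    """Hypothetical helper: coefficient vector picking out one amino acid."""
    return [sign if a == aa else 0.0 for a in AMINO_ACIDS]


# HYPOTHETICAL toy polytopes (one half-space each), not derived from SCOP data.
toy_polytopes = {
    "toy_alpha": [(unit("A"), 0.5)],        # fraction of Ala <= 0.5
    "toy_beta": [(unit("V", -1.0), -0.1)],  # fraction of Val >= 0.1
}

x = composition("MKVVAAL")
print(classes_containing(x, toy_polytopes))  # ['toy_alpha', 'toy_beta']
```

Here the toy sequence satisfies both hypothetical constraints, so it falls in an overlap region and could not be assigned to a single class, which is the kind of ambiguity the abstract refers to. Real class polytopes would need far more half-spaces, obtained from a separate convex-hull computation over the experimental data.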
Submission history
From: Boryeu Mao
[v1] Mon, 2 Jun 2025 16:44:02 UTC (530 KB)
[v2] Mon, 16 Jun 2025 00:18:49 UTC (593 KB)