-
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset
Authors:
Liangrui Pan,
Qingchun Liang,
Shen Zhao,
Songqing Fan,
Shaoliang Peng
Abstract:
Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. Faced with regional disparities in medical resources and the high cost of genomic assays, using artificial intelligence to infer these mutations and exon variants from routine histopathology images could greatly facilitate precision therapy. Although some prior studies have shown that deep learning can accelerate the prediction of key gene mutations from lung cancer pathology slides, their performance remains suboptimal and has so far been limited mainly to early screening tasks. To address these limitations, we have assembled PathGene, which comprises histopathology images paired with next-generation sequencing reports from 1,576 patients at the Second Xiangya Hospital, Central South University, and 448 TCGA-LUAD patients. This multi-center dataset links whole-slide images to driver gene mutation status, mutation subtypes, exons, and tumor mutational burden (TMB) status, with the goal of leveraging pathology images to predict mutations, subtypes, exon locations, and TMB for early genetic screening and to advance precision oncology. Unlike existing datasets, PathGene provides molecular-level information linked to histopathology images to facilitate the development of biomarker prediction models. We benchmarked 11 multiple-instance learning methods on PathGene for mutation, subtype, exon, and TMB prediction tasks. These experimental methods provide valuable alternatives for early genetic screening of lung cancer patients and can help clinicians quickly develop personalized, precisely targeted treatment plans. Code and data are available at https://github.com/panliangrui/NIPS2025/.
Submitted 30 May, 2025;
originally announced June 2025.
-
Quantitative analysis of cell size control mechanisms
Authors:
Shuqi Fan,
Jinzhi Lei
Abstract:
Cell size control is crucial for maintaining cellular function and homeostasis. In this study, we develop a first-order partial differential equation model to examine the effects of three key size control mechanisms: the sizer, timer, and adder. Each mechanism is incorporated into the model through distinct boundary conditions. Exact solutions for these mechanisms are derived using the method of characteristics, allowing us to explore how the steady-state size distribution depends on control parameters. Additionally, individual-cell-based stochastic simulations are performed to validate our theoretical findings and investigate the size distribution under various conditions. This study provides new insights into the quantitative dynamics of cell size regulation, highlighting the underlying mechanisms and laying the groundwork for future theoretical and experimental work on size homeostasis in biological systems.
Submitted 25 May, 2025;
originally announced May 2025.
-
PharMolixFM: All-Atom Foundation Models for Molecular Modeling and Generation
Authors:
Yizhen Luo,
Jiashuo Wang,
Siqi Fan,
Zaiqing Nie
Abstract:
Structural biology relies on accurate three-dimensional biomolecular structures to advance our understanding of biological functions, disease mechanisms, and therapeutics. While recent advances in deep learning have enabled the development of all-atom foundation models for molecular modeling and generation, existing approaches face challenges in generalization due to the multi-modal nature of atomic data and the lack of comprehensive analysis of training and sampling strategies. To address these limitations, we propose PharMolixFM, a unified framework for constructing all-atom foundation models based on multi-modal generative techniques. Our framework includes three variants using state-of-the-art multi-modal generative models. By formulating molecular tasks as a generalized denoising process with task-specific priors, PharMolixFM achieves robust performance across various structural biology applications. Experimental results demonstrate that PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking (83.9% vs. 90.2% RMSD < 2Å, given pocket) with significantly improved inference speed. Moreover, we explore the empirical inference scaling law by introducing more sampling repeats or steps. Our code and model are available at https://github.com/PharMolix/OpenBioMed.
Submitted 31 March, 2025; v1 submitted 12 March, 2025;
originally announced March 2025.
-
The Mitochondrial Genome of Cathaya argyrophylla Reaches 18.99 Mb: Analysis of Super-Large Mitochondrial Genomes in Pinaceae
Authors:
Kerui Huang,
Wenbo Xu,
Haoliang Hu,
Xiaolong Jiang,
Lei Sun,
Wenyan Zhao,
Binbin Long,
Shaogang Fan,
Zhibo Zhou,
Ping Mo,
Xiaocheng Jiang,
Jianhong Tian,
Aihua Deng,
Peng Xie,
Yun Wang
Abstract:
Mitochondrial genomes in the Pinaceae family are notable for their large size and structural complexity. In this study, we sequenced and analyzed the mitochondrial genome of Cathaya argyrophylla, an endangered and endemic Pinaceae species, uncovering a genome size of 18.99 Mb, making it the largest mitochondrial genome reported to date. To investigate the mechanisms behind this exceptional size, we conducted comparative analyses with other Pinaceae species possessing both large and small mitochondrial genomes, as well as with other gymnosperms. We focused on repeat sequences, transposable element activity, RNA editing events, chloroplast-derived sequence transfers (mtpts), and sequence homology with nuclear genomes. Our findings indicate that while Cathaya argyrophylla and other extremely large Pinaceae mitochondrial genomes contain substantial amounts of repeat sequences and show increased activity of LINEs and LTR retrotransposons, these factors alone do not fully account for the genome expansion. Notably, we observed a significant incorporation of chloroplast-derived sequences in Cathaya argyrophylla and other large mitochondrial genomes, suggesting that extensive plastid-to-mitochondrial DNA transfer may play a crucial role in genome enlargement. Additionally, large mitochondrial genomes exhibited distinct patterns of RNA editing and limited similarity with nuclear genomes compared to smaller genomes. These results suggest that the massive mitochondrial genomes in Pinaceae are likely the result of multiple contributing factors, including repeat sequences, transposon activity, and extensive plastid sequence incorporation. Our study enhances the understanding of mitochondrial genome evolution in plants and provides valuable genetic information for the conservation and study of Cathaya argyrophylla.
Submitted 9 October, 2024;
originally announced October 2024.
-
Hypergraph Models of Biological Networks to Identify Genes Critical to Pathogenic Viral Response
Authors:
Song Feng,
Emily Heath,
Brett Jefferson,
Cliff Joslyn,
Henry Kvinge,
Hugh D. Mitchell,
Brenda Praggastis,
Amie J. Eisfeld,
Amy C. Sims,
Larissa B. Thackray,
Shufang Fan,
Kevin B. Walters,
Peter J. Halfmann,
Danielle Westhoff-Smith,
Qing Tan,
Vineet D. Menachery,
Timothy P. Sheahan,
Adam S. Cockrell,
Jacob F. Kocher,
Kelly G. Stratton,
Natalie C. Heller,
Lisa M. Bramer,
Michael S. Diamond,
Ralph S. Baric,
Katrina M. Waters,
et al. (3 additional authors not shown)
Abstract:
Background: Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and have shown promise in modeling systems such as protein complexes and metabolic reactions. In this paper we seek to understand how hypergraphs can more faithfully identify, and potentially predict, important genes based on complex relationships inferred from genomic expression data sets.
Results: We compiled a novel data set of transcriptional host response to pathogenic viral infections and formulated relationships between genes as a hypergraph where hyperedges represent significantly perturbed genes, and vertices represent individual biological samples with specific experimental conditions. We find that hypergraph betweenness centrality is a superior method for identification of genes important to viral response when compared with graph centrality.
Conclusions: Our results demonstrate the utility of using hypergraphs to represent complex biological systems and highlight important central responses common to a variety of highly pathogenic viruses.
Submitted 6 October, 2020;
originally announced October 2020.
-
Ultrasmall Au10-12(SG)10-12 Nanomolecules for High Tumor Specificity and Cancer Radiotherapy
Authors:
Xiao-Dong Zhang,
Zhentao Luo,
Jie Chen,
Xiu Shen,
Shasha Song,
Yuanming Sun,
Saijun Fan,
Feiyue Fan,
David Tai Leong,
Jianping Xie
Abstract:
Radiosensitizers can increase the local treatment efficacy under a relatively low and safe radiation dose, thereby facilitating tumor eradication and minimizing side effects. Here, we report a new class of radiosensitizers that contain several gold (Au) atoms embedded inside a peptide shell (e.g., Au10-12(SG)10-12) and can achieve ultrahigh tumor uptake (10.86 SUV at 24 h post injection) and targeting specificity, efficient renal clearance, and high radiotherapy enhancement.
Submitted 12 May, 2014;
originally announced May 2014.
-
In Vivo Renal Clearance, Biodistribution, Toxicity of Gold nanoclusters
Authors:
Xiao-Dong Zhang,
Di Wu,
Xiu Shen,
Pei-Xun Liu,
Fei-Yue Fan,
Sai-Jun Fan
Abstract:
Gold nanoparticles have shown great promise in cancer diagnosis and therapy, but they cannot be metabolized and tend to accumulate in the liver and spleen due to their large size. Small gold nanoclusters can penetrate kidney tissue and promise to decrease in vivo toxicity through renal clearance. In this work, we explore the in vivo renal clearance, biodistribution, and toxicity responses of BSA- and GSH-protected gold nanoclusters over 24 hours and 28 days. The BSA-protected gold nanoclusters show inefficient renal clearance, with only 1% of the gold cleared, whereas the GSH-protected gold nanoclusters show highly efficient renal clearance, with 36% of the gold cleared after 24 hours. The biodistribution further reveals that 94% of the gold can be metabolized for the GSH-protected nanoclusters, but less than 5% of the gold can be metabolized for the BSA-protected nanoclusters after 28 days. Both the GSH- and BSA-protected gold nanoclusters cause acute infection, inflammation, and kidney function damage after 24 hours, but these toxicity responses are eliminated for the GSH-protected gold nanoclusters after 28 days. The immune system is also affected by both kinds of gold nanoclusters, but the immune response to the GSH-protected gold nanoclusters also recovers after 28 days. These findings show that the GSH-protected gold nanoclusters are small and can be metabolized by renal clearance, and thus their toxicity can be significantly decreased. The BSA-protected gold nanoclusters, however, can form large compounds and further accumulate in the liver and spleen, which can cause an irreparable toxicity response. Therefore, the GSH-protected gold nanoclusters have great potential for in vivo imaging and therapy, and the BSA-protected gold nanoclusters can be used as an agent for liver cancer therapy.
Submitted 27 September, 2012;
originally announced October 2012.
-
Multiplexed five-color molecular imaging of cancer cells and tumor tissues with carbon nanotube Raman tags in the near-infrared
Authors:
Zhuang Liu,
Scott Tabakman,
Sarah Sherlock,
Xiaolin Li,
Zhuo Chen,
Kaili Jiang,
Shoushan Fan,
Hongjie Dai
Abstract:
Single-walled carbon nanotubes (SWNTs) with five different C13/C12 isotope compositions and well-separated Raman peaks have been synthesized and conjugated to five targeting ligands in order to impart molecular specificity. Multiplexed Raman imaging of live cells has been carried out by highly specific staining of cells with a five-color mixture of SWNTs. Ex vivo multiplexed Raman imaging of tumor samples uncovers a surprising up-regulation of epidermal growth factor receptor (EGFR) on LS174T colon cancer cells from cell culture to in vivo tumor growth. This is the first time five-color multiplexed molecular imaging has been performed in the near-infrared (NIR) region under a single laser excitation. A near-zero interfering imaging background is achieved because the sharp Raman peaks unique to nanotubes stand out over the low, smooth autofluorescence background of biological species.
Submitted 30 March, 2010;
originally announced March 2010.