Search | arXiv e-print repository

CaliciBoost: Performance-Driven Evaluation of Molecular Representations for Caco-2 Permeability Prediction

Authors: Huong Van Le, Weibin Ren, Junhong Kim, Yukyung Yun, Young Bin Park, Young Jun Kim, Bok Kyung Han, Inho Choi, Jong IL Park, Hwi-Yeol Yun, Jae-Mun Choi

Abstract: Caco-2 permeability serves as a critical in vitro indicator for predicting the oral absorption of drug candidates during early-stage drug discovery. To enhance the accuracy and efficiency of computational predictions, we systematically investigated the impact of eight molecular feature representation types including 2D/3D descriptors, structural fingerprints, and deep learning-based embeddings com… ▽ More Caco-2 permeability serves as a critical in vitro indicator for predicting the oral absorption of drug candidates during early-stage drug discovery. To enhance the accuracy and efficiency of computational predictions, we systematically investigated the impact of eight molecular feature representation types including 2D/3D descriptors, structural fingerprints, and deep learning-based embeddings combined with automated machine learning techniques to predict Caco-2 permeability. Using two datasets of differing scale and diversity (TDC benchmark and curated OCHEM data), we assessed model performance across representations and identified PaDEL, Mordred, and RDKit descriptors as particularly effective for Caco-2 prediction. Notably, the AutoML-based model CaliciBoost achieved the best MAE performance. Furthermore, for both PaDEL and Mordred representations, the incorporation of 3D descriptors resulted in a 15.73% reduction in MAE compared to using 2D features alone, as confirmed by feature importance analysis. These findings highlight the effectiveness of AutoML approaches in ADMET modeling and offer practical guidance for feature selection in data-limited prediction tasks. △ Less

Submitted 9 June, 2025; originally announced June 2025.

Comments: 49 pages, 11 figures

arXiv:2501.17207 [pdf, other]

Rethinking Functional Brain Connectome Analysis: Do Graph Deep Learning Models Help?

Authors: Keqi Han, Yao Su, Lifang He, Liang Zhan, Sergey Plis, Vince Calhoun, Carl Yang

Abstract: Functional brain connectome is crucial for deciphering the neural mechanisms underlying cognitive functions and neurological disorders. Graph deep learning models have recently gained tremendous popularity in this field. However, their actual effectiveness in modeling the brain connectome remains unclear. In this study, we re-examine graph deep learning models based on four large-scale neuroimagin… ▽ More Functional brain connectome is crucial for deciphering the neural mechanisms underlying cognitive functions and neurological disorders. Graph deep learning models have recently gained tremendous popularity in this field. However, their actual effectiveness in modeling the brain connectome remains unclear. In this study, we re-examine graph deep learning models based on four large-scale neuroimaging studies encompassing diverse cognitive and clinical outcomes. Surprisingly, we find that the message aggregation mechanism, a hallmark of graph deep learning models, does not help with predictive performance as typically assumed, but rather consistently degrades it. To address this issue, we propose a hybrid model combining a linear model with a graph attention network through dual pathways, achieving robust predictions and enhanced interpretability by revealing both localized and global neural connectivity patterns. Our findings urge caution in adopting complex deep learning models for functional brain connectome analysis, emphasizing the need for rigorous experimental designs to establish tangible performance gains and perhaps more importantly, to pursue improvements in model interpretability. △ Less

Submitted 28 January, 2025; originally announced January 2025.

Comments: 22 pages, 6 figures

arXiv:2407.12337 [pdf]

doi 10.1126/sciadv.ads2757

Virtual Gram staining of label-free bacteria using darkfield microscopy and deep learning

Authors: Cagatay Isil, Hatice Ceylan Koydemir, Merve Eryilmaz, Kevin de Haan, Nir Pillar, Koray Mentesoglu, Aras Firat Unal, Yair Rivenson, Sukantha Chandrasekaran, Omai B. Garner, Aydogan Ozcan

Abstract: Gram staining has been one of the most frequently used staining protocols in microbiology for over a century, utilized across various fields, including diagnostics, food safety, and environmental monitoring. Its manual procedures make it vulnerable to staining errors and artifacts due to, e.g., operator inexperience and chemical variations. Here, we introduce virtual Gram staining of label-free ba… ▽ More Gram staining has been one of the most frequently used staining protocols in microbiology for over a century, utilized across various fields, including diagnostics, food safety, and environmental monitoring. Its manual procedures make it vulnerable to staining errors and artifacts due to, e.g., operator inexperience and chemical variations. Here, we introduce virtual Gram staining of label-free bacteria using a trained deep neural network that digitally transforms darkfield images of unstained bacteria into their Gram-stained equivalents matching brightfield image contrast. After a one-time training effort, the virtual Gram staining model processes an axial stack of darkfield microscopy images of label-free bacteria (never seen before) to rapidly generate Gram staining, bypassing several chemical steps involved in the conventional staining process. We demonstrated the success of the virtual Gram staining workflow on label-free bacteria samples containing Escherichia coli and Listeria innocua by quantifying the staining accuracy of the virtual Gram staining model and comparing the chromatic and morphological features of the virtually stained bacteria against their chemically stained counterparts. This virtual bacteria staining framework effectively bypasses the traditional Gram staining protocol and its challenges, including stain standardization, operator errors, and sensitivity to chemical variations. △ Less

Submitted 17 July, 2024; originally announced July 2024.

Comments: 25 Pages, 5 Figures

Journal ref: Science Advances (2025)

arXiv:2311.11046 [pdf]

Classification of Major Depressive Disorder Using Vertex-Wise Brain Sulcal Depth, Curvature, and Thickness with a Deep and a Shallow Learning Model

Authors: Roberto Goya-Maldonado, Tracy Erwin-Grabner, Ling-Li Zeng, Christopher R. K. Ching, Andre Aleman, Alyssa R. Amod, Zeynep Basgoze, Francesco Benedetti, Bianca Besteher, Katharina Brosch, Robin Bülow, Romain Colle, Colm G. Connolly, Emmanuelle Corruble, Baptiste Couvy-Duchesne, Kathryn Cullen, Udo Dannlowski, Christopher G. Davey, Annemiek Dols, Jan Ernsting, Jennifer W. Evans, Lukas Fisch, Paola Fuentes-Claramonte, Ali Saffet Gonul, Ian H. Gotlib , et al. (62 additional authors not shown)

Abstract: Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, h… ▽ More Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. Here, we used globally representative data from the ENIGMA-MDD working group containing 7,012 participants from 30 sites (N=2,772 MDD and N=4,240 HC), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not lead to the differentiability between MDD and HC. Our results support the notion that MDD classification on this combination of such features and classifiers is unfeasible. Perhaps more sophisticated integration of multimodal information may lead to a higher performance in this diagnostic task. △ Less

Submitted 24 January, 2025; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2206.08122

arXiv:2307.04603 [pdf, other]

Solvent: A Framework for Protein Folding

Authors: Jaemyung Lee, Kyeongtak Han, Jaehoon Kim, Hasun Yu, Youhan Lee

Abstract: Consistency and reliability are crucial for conducting AI research. Many famous research fields, such as object detection, have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods are proposed based on the component of AlphaFold2. The importance of a unified research framework in protein folding contains… ▽ More Consistency and reliability are crucial for conducting AI research. Many famous research fields, such as object detection, have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods are proposed based on the component of AlphaFold2. The importance of a unified research framework in protein folding contains implementations and benchmarks to consistently and fairly compare various approaches. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models in the manner of an off-the-shelf interface Solvent contains different models implemented in a unified codebase and supports training and evaluation for defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and give efficiency in both speed and costs, resulting in acceleration on protein folding modeling research. The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed. △ Less

Submitted 31 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: preprint, 9pages

arXiv:2207.06578 [pdf]

doi 10.1021/acsphotonics.2c00932

Virtual stain transfer in histology via cascaded deep neural networks

Authors: Xilin Yang, Bijie Bai, Yijie Zhang, Yuzhu Li, Kevin de Haan, Tairan Liu, Aydogan Ozcan

Abstract: Pathological diagnosis relies on the visual inspection of histologically stained thin tissue specimens, where different types of stains are applied to bring contrast to and highlight various desired histological features. However, the destructive histochemical staining procedures are usually irreversible, making it very difficult to obtain multiple stains on the same tissue section. Here, we demon… ▽ More Pathological diagnosis relies on the visual inspection of histologically stained thin tissue specimens, where different types of stains are applied to bring contrast to and highlight various desired histological features. However, the destructive histochemical staining procedures are usually irreversible, making it very difficult to obtain multiple stains on the same tissue section. Here, we demonstrate a virtual stain transfer framework via a cascaded deep neural network (C-DNN) to digitally transform hematoxylin and eosin (H&E) stained tissue images into other types of histological stains. Unlike a single neural network structure which only takes one stain type as input to digitally output images of another stain type, C-DNN first uses virtual staining to transform autofluorescence microscopy images into H&E and then performs stain transfer from H&E to the domain of the other stain in a cascaded manner. This cascaded structure in the training phase allows the model to directly exploit histochemically stained image data on both H&E and the target special stain of interest. This advantage alleviates the challenge of paired data acquisition and improves the image quality and color accuracy of the virtual stain transfer from H&E to another stain. We validated the superior performance of this C-DNN approach using kidney needle core biopsy tissue sections and successfully transferred the H&E-stained tissue images into virtual PAS (periodic acid-Schiff) stain. This method provides high-quality virtual images of special stains using existing, histochemically stained slides and creates new opportunities in digital pathology by performing highly accurate stain-to-stain transformations. △ Less

Submitted 13 July, 2022; originally announced July 2022.

Comments: 14 Pages, 4 Figures, 1 Table

Journal ref: ACS Photonics (2022)

arXiv:2112.05240 [pdf]

doi 10.34133/2022/9786242

Label-free virtual HER2 immunohistochemical staining of breast tissue using deep learning

Authors: Bijie Bai, Hongda Wang, Yuzhu Li, Kevin de Haan, Francesco Colonnese, Yujie Wan, Jingyi Zuo, Ngan B. Doan, Xiaoran Zhang, Yijie Zhang, Jingxi Li, Wenjie Dong, Morgan Angus Darrow, Elham Kamangar, Han Sung Lee, Yair Rivenson, Aydogan Ozcan

Abstract: The immunohistochemical (IHC) staining of the human epidermal growth factor receptor 2 (HER2) biomarker is widely practiced in breast tissue analysis, preclinical studies and diagnostic decisions, guiding cancer treatment and investigation of pathogenesis. HER2 staining demands laborious tissue treatment and chemical processing performed by a histotechnologist, which typically takes one day to pre… ▽ More The immunohistochemical (IHC) staining of the human epidermal growth factor receptor 2 (HER2) biomarker is widely practiced in breast tissue analysis, preclinical studies and diagnostic decisions, guiding cancer treatment and investigation of pathogenesis. HER2 staining demands laborious tissue treatment and chemical processing performed by a histotechnologist, which typically takes one day to prepare in a laboratory, increasing analysis time and associated costs. Here, we describe a deep learning-based virtual HER2 IHC staining method using a conditional generative adversarial network that is trained to rapidly transform autofluorescence microscopic images of unlabeled/label-free breast tissue sections into bright-field equivalent microscopic images, matching the standard HER2 IHC staining that is chemically performed on the same tissue sections. The efficacy of this virtual HER2 staining framework was demonstrated by quantitative analysis, in which three board-certified breast pathologists blindly graded the HER2 scores of virtually stained and immunohistochemically stained HER2 whole slide images (WSIs) to reveal that the HER2 scores determined by inspecting virtual IHC images are as accurate as their immunohistochemically stained counterparts. A second quantitative blinded study performed by the same diagnosticians further revealed that the virtually stained HER2 images exhibit a comparable staining quality in the level of nuclear detail, membrane clearness, and absence of staining artifacts with respect to their immunohistochemically stained counterparts. This virtual HER2 staining framework bypasses the costly, laborious, and time-consuming IHC staining procedures in laboratory, and can be extended to other types of biomarkers to accelerate the IHC tissue staining used in life sciences and biomedical workflow. △ Less

Submitted 8 December, 2021; originally announced December 2021.

Comments: 26 Pages, 5 Figures

Journal ref: BME Frontiers (2022)

arXiv:2001.07267 [pdf]

doi 10.1038/s41377-020-0315-y

Digital synthesis of histological stains using micro-structured and multiplexed virtual staining of label-free tissue

Authors: Yijie Zhang, Kevin de Haan, Yair Rivenson, Jingxi Li, Apostolos Delis, Aydogan Ozcan

Abstract: Histological staining is a vital step used to diagnose various diseases and has been used for more than a century to provide contrast to tissue sections, rendering the tissue constituents visible for microscopic analysis by medical experts. However, this process is time-consuming, labor-intensive, expensive and destructive to the specimen. Recently, the ability to virtually-stain unlabeled tissue… ▽ More Histological staining is a vital step used to diagnose various diseases and has been used for more than a century to provide contrast to tissue sections, rendering the tissue constituents visible for microscopic analysis by medical experts. However, this process is time-consuming, labor-intensive, expensive and destructive to the specimen. Recently, the ability to virtually-stain unlabeled tissue sections, entirely avoiding the histochemical staining step, has been demonstrated using tissue-stain specific deep neural networks. Here, we present a new deep learning-based framework which generates virtually-stained images using label-free tissue, where different stains are merged following a micro-structure map defined by the user. This approach uses a single deep neural network that receives two different sources of information at its input: (1) autofluorescence images of the label-free tissue sample, and (2) a digital staining matrix which represents the desired microscopic map of different stains to be virtually generated at the same tissue section. This digital staining matrix is also used to virtually blend existing stains, digitally synthesizing new histological stains. We trained and blindly tested this virtual-staining network using unlabeled kidney tissue sections to generate micro-structured combinations of Hematoxylin and Eosin (H&E), Jones silver stain, and Masson's Trichrome stain. Using a single network, this approach multiplexes virtual staining of label-free tissue with multiple types of stains and paves the way for synthesizing new digital histological stains that can be created on the same tissue cross-section, which is currently not feasible with standard histochemical staining methods. △ Less

Submitted 20 January, 2020; originally announced January 2020.

Comments: 19 pages, 5 figures, 2 tables

MSC Class: 68T01; 68T05; 68U10; 62M45; 78M32; 92C50; 92C55; 94A08 ACM Class: I.2; I.2.1; I.2.6; I.2.10; I.3; I.3.3; I.4.3; I.4.4; I.4.9; J.3

Journal ref: Light: Science & Applications (2020)

arXiv:1912.05155 [pdf]

doi 10.1038/s41746-020-0282-y

Automated screening of sickle cells using a smartphone-based microscope and deep learning

Authors: Kevin de Haan, Hatice Ceylan Koydemir, Yair Rivenson, Derek Tseng, Elizabeth Van Dyne, Lissette Bakic, Doruk Karinca, Kyle Liang, Megha Ilango, Esin Gumustekin, Aydogan Ozcan

Abstract: Sickle cell disease (SCD) is a major public health priority throughout much of the world, affecting millions of people. In many regions, particularly those in resource-limited settings, SCD is not consistently diagnosed. In Africa, where the majority of SCD patients reside, more than 50% of the 0.2-0.3 million children born with SCD each year will die from it; many of these deaths are in fact prev… ▽ More Sickle cell disease (SCD) is a major public health priority throughout much of the world, affecting millions of people. In many regions, particularly those in resource-limited settings, SCD is not consistently diagnosed. In Africa, where the majority of SCD patients reside, more than 50% of the 0.2-0.3 million children born with SCD each year will die from it; many of these deaths are in fact preventable with correct diagnosis and treatment. Here we present a deep learning framework which can perform automatic screening of sickle cells in blood smears using a smartphone microscope. This framework uses two distinct, complementary deep neural networks. The first neural network enhances and standardizes the blood smear images captured by the smartphone microscope, spatially and spectrally matching the image quality of a laboratory-grade benchtop microscope. The second network acts on the output of the first image enhancement neural network and is used to perform the semantic segmentation between healthy and sickle cells within a blood smear. These segmented images are then used to rapidly determine the SCD diagnosis per patient. We blindly tested this mobile sickle cell detection method using blood smears from 96 unique patients (including 32 SCD patients) that were imaged by our smartphone microscope, and achieved ~98% accuracy, with an area-under-the-curve (AUC) of 0.998. With its high accuracy, this mobile and cost-effective method has the potential to be used as a screening tool for SCD and other blood cell disorders in resource-limited settings. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: 30 pages, 5 figures

Journal ref: npj Digital Medicine (2020)

arXiv:1910.01091 [pdf, other]

W-Net: A CNN-based Architecture for White Blood Cells Image Classification

Authors: Changhun Jung, Mohammed Abuhamad, Jumabek Alikhanov, Aziz Mohaisen, Kyungja Han, DaeHun Nyang

Abstract: Computer-aided methods for analyzing white blood cells (WBC) have become widely popular due to the complexity of the manual process. Recent works have shown highly accurate segmentation and detection of white blood cells from microscopic blood images. However, the classification of the observed cells is still a challenge and highly demanded as the distribution of the five types reflects on the con… ▽ More Computer-aided methods for analyzing white blood cells (WBC) have become widely popular due to the complexity of the manual process. Recent works have shown highly accurate segmentation and detection of white blood cells from microscopic blood images. However, the classification of the observed cells is still a challenge and highly demanded as the distribution of the five types reflects on the condition of the immune system. This work proposes W-Net, a CNN-based method for WBC classification. We evaluate W-Net on a real-world large-scale dataset, obtained from The Catholic University of Korea, that includes 6,562 real images of the five WBC types. W-Net achieves an average accuracy of 97%. △ Less

Submitted 2 October, 2019; originally announced October 2019.

arXiv:1810.06808 [pdf, other]

doi 10.3938/jkps.74.512

Essentiality landscape of metabolic networks

Authors: P. Kim, K. Han, D. -S. Lee, B. Kahng

Abstract: Local perturbations of individual metabolic reactions may result in different levels of lethality, depending on their roles in metabolism and the size of subsequent cascades induced by their failure. Moreover, essentiality of individual metabolic reactions may show large variations within and across species. Here we quantify their essentialities in hundreds of species by computing the growth rate… ▽ More Local perturbations of individual metabolic reactions may result in different levels of lethality, depending on their roles in metabolism and the size of subsequent cascades induced by their failure. Moreover, essentiality of individual metabolic reactions may show large variations within and across species. Here we quantify their essentialities in hundreds of species by computing the growth rate after removal of individual and pairs of reactions by flux balance analysis. We find that about 10% of reactions are essential, i.e., growth stops without them, and most of the remaining reactions are redundant in the metabolic network of each species. This large-scale and cross-species study allows us to determine ad hoc ages of each reaction and species. We find that when a reaction is older and contained in younger species, the reaction is more likely to be essential. Such correlations of essentiality with the ages of reactions and species may be attributable to the evolution of cellular metabolism, in which alternative pathways are recruited to ensure the stability of important reactions to various degrees across species. △ Less

Submitted 16 October, 2018; originally announced October 2018.

arXiv:nlin/0612004 [pdf, ps, other]

doi 10.1103/PhysRevE.76.065201

Emergence of Chaotic Itinerancy in Simple Ecological Systems

Authors: Pan-Jun Kim, Tae-Wook Ko, Hawoong Jeong, Kyoung J. Lee, Seung Kee Han

Abstract: Chaotic itinerancy is a universal dynamical concept that describes itinerant motion among many different ordered states through chaotic transition in dynamical systems. Unlike the expectation of the prevalence of chaotic itinerancy in high-dimensional systems, we identify chaotic itinerant behavior from a relatively simple ecological system, which consists only of two coupled consumer-resource p… ▽ More Chaotic itinerancy is a universal dynamical concept that describes itinerant motion among many different ordered states through chaotic transition in dynamical systems. Unlike the expectation of the prevalence of chaotic itinerancy in high-dimensional systems, we identify chaotic itinerant behavior from a relatively simple ecological system, which consists only of two coupled consumer-resource pairs. The system exhibits chaotic bursting activity, in which the explosion and the shrinkage of the population alternate indefinitely, while the explosion of one pair co-occurs with the shrinkage of the other pair. We analyze successfully the bursting activity in the framework of chaotic itinerancy, and find that large duration times of bursts tend to cluster in time, allowing the effective burst prognosis. We also investigate the control schemes on the bursting activity, and demonstrate that invoking the competitive rise of the consumer in one pair can even elongate the burst of the other pair rather than shorten it. △ Less

Submitted 16 December, 2007; v1 submitted 3 December, 2006; originally announced December 2006.

Journal ref: Phys. Rev. E 76 065201(R) (2007)

Showing 1–12 of 12 results for author: Han, K