-
Symbiotic Message Passing Model for Transfer Learning between Anti-Fungal and Anti-Bacterial Domains
Authors:
Ronen Taub,
Tanya Wasserman,
Yonatan Savir
Abstract:
Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening billions of compounds. For example, a successful approach is representing the molecules as a graph and utilizing graph neural networks (GNNs). Yet, these approaches still require experimental measurements of thousands of compounds to construct a proper training set. While in some domains it is easier to acquire experimental data, in others it might be more limited. For example, it is easier to test the compounds on bacteria than to perform in vivo experiments. Thus, a key question is how to utilize information from a large available dataset together with a small subset of compounds where both domains are measured to predict compounds' effect on the second, experimentally less available domain. Current transfer learning approaches for drug discovery, including training of pre-trained modules or meta-learning, have limited success. In this work, we develop a novel method, named Symbiotic Message Passing Neural Network (SMPNN), for merging graph-neural-network models from different domains. By routing new message-passing lanes between them, our approach resolves some of the potential conflicts between the different domains, and implicit constraints induced by the larger datasets. By collecting public data and performing additional high-throughput experiments, we demonstrate the advantage of our approach by predicting anti-fungal activity from anti-bacterial activity. We compare our method to the standard transfer learning approach and show that SMPNN provides better and less variable performance. Our approach is general and can be used to facilitate information transfer between any two domains such as different organisms, different organelles, or different environments.
Submitted 14 April, 2023;
originally announced April 2023.
-
Harnessing Artificial Intelligence to Infer Novel Spatial Biomarkers for the Diagnosis of Eosinophilic Esophagitis
Authors:
Ariel Larey,
Eliel Aknin,
Nati Daniel,
Garrett A. Osswald,
Julie M. Caldwell,
Mark Rochman,
Tanya Wasserman,
Margaret H. Collins,
Nicoleta C. Arva,
Guang-Yu Yang,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Eosinophilic esophagitis (EoE) is a chronic allergic inflammatory condition of the esophagus associated with elevated esophageal eosinophils. Second only to gastroesophageal reflux disease, EoE is one of the leading causes of chronic refractory dysphagia in adults and children. EoE diagnosis requires enumerating the density of esophageal eosinophils in esophageal biopsies, a somewhat subjective task that is time-consuming, thus reducing the ability to process the complex tissue structure. Previous artificial intelligence (AI) approaches that aimed to improve histology-based diagnosis focused on recapitulating identification and quantification of the area of maximal eosinophil density. However, this metric does not account for the distribution of eosinophils, or of other histological features, over the whole slide image. Here, we developed an artificial intelligence platform that infers local and spatial biomarkers based on semantic segmentation of intact eosinophils and basal zone distributions. Besides the maximal density of eosinophils (referred to as Peak Eosinophil Count [PEC]) and a maximal basal zone fraction, we identify two additional metrics that reflect the distribution of eosinophils and basal zone fractions. This approach enables a decision support system that predicts EoE activity and classifies the histological severity of EoE patients. We utilized a cohort that includes 1066 biopsy slides from 400 subjects to validate the system's performance and achieved a histological severity classification accuracy of 86.70%, sensitivity of 84.50%, and specificity of 90.09%. Our approach highlights the importance of systematically analyzing the distribution of biopsy features over the entire slide and paves the way towards a personalized decision support system that will assist not only in counting cells but can also potentially improve diagnosis and provide treatment prediction.
Submitted 26 May, 2022;
originally announced May 2022.
-
PECNet: A Deep Multi-Label Segmentation Network for Eosinophilic Esophagitis Biopsy Diagnostics
Authors:
Nati Daniel,
Ariel Larey,
Eliel Aknin,
Garrett A. Osswald,
Julie M. Caldwell,
Mark Rochman,
Margaret H. Collins,
Guang-Yu Yang,
Nicoleta C. Arva,
Kelley E. Capocelli,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Background. Eosinophilic esophagitis (EoE) is an allergic inflammatory condition of the esophagus associated with elevated numbers of eosinophils. Disease diagnosis and monitoring requires determining the concentration of eosinophils in esophageal biopsies, a time-consuming, tedious and somewhat subjective task currently performed by pathologists. Methods. Herein, we aimed to use machine learning to identify, quantitate and diagnose EoE. We labeled more than 100M pixels of 4345 images obtained by scanning whole slides of H&E-stained sections of esophageal biopsies derived from 23 EoE patients. We used this dataset to train a multi-label segmentation deep network. To validate the network, we examined a replication cohort of 1089 whole slide images from 419 patients derived from multiple institutions. Findings. PECNet segmented both intact and not-intact eosinophils with a mean intersection over union (mIoU) of 0.93. This segmentation was able to quantitate intact eosinophils with a mean absolute error of 0.611 eosinophils and classify EoE disease activity with an accuracy of 98.5%. Using whole slide images from the validation cohort, PECNet achieved an accuracy of 94.8%, sensitivity of 94.3%, and specificity of 95.14% in reporting EoE disease activity. Interpretation. We have developed a deep learning multi-label semantic segmentation network that successfully addresses two of the main challenges in EoE diagnostics and digital pathology, the need to detect several types of small features simultaneously and the ability to analyze whole slides efficiently. Our results pave the way for an automated diagnosis of EoE and can be utilized for other conditions with similar challenges.
Submitted 2 March, 2021;
originally announced March 2021.
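The abstract above reports accuracy, sensitivity, specificity, and mean intersection over union (mIoU). As a reminder of how these standard metrics are defined, here is a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, labels are assumed to be binary (1 = active, 0 = inactive), and each pixel is assumed to carry a single integer class, which may differ from the multi-label setup used in the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives).

    Assumes both classes occur in y_true; otherwise a division by zero is raised.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def mean_iou(true_labels, pred_labels, classes):
    """Mean intersection over union across classes, on flattened label maps."""
    pairs = list(zip(true_labels, pred_labels))
    scores = []
    for c in classes:
        inter = sum(1 for t, p in pairs if t == c and p == c)
        union = sum(1 for t, p in pairs if t == c or p == c)
        if union:  # skip classes absent from both maps
            scores.append(inter / union)
    return sum(scores) / len(scores)
```

With toy inputs, `classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])` gives 0.75 for all three metrics, since one of four positives and one of four negatives is misclassified.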
-
Machine learning approach for biopsy-based identification of eosinophilic esophagitis reveals importance of global features
Authors:
Tomer Czyzewski,
Nati Daniel,
Mark Rochman,
Julie M. Caldwell,
Garrett A. Osswald,
Margaret H. Collins,
Marc E. Rothenberg,
Yonatan Savir
Abstract:
Goal: Eosinophilic esophagitis (EoE) is an allergic inflammatory condition characterized by eosinophil accumulation in the esophageal mucosa. EoE diagnosis includes a manual assessment of eosinophil levels in mucosal biopsies - a time-consuming, laborious task that is difficult to standardize. One of the main challenges in automating this process, like many other biopsy-based diagnostics, is detecting features that are small relative to the size of the biopsy. Results: In this work, we utilized hematoxylin- and eosin-stained slides from esophageal biopsies from patients with active EoE and control subjects to develop a platform based on a deep convolutional neural network (DCNN) that can classify esophageal biopsies with an accuracy of 85%, sensitivity of 82.5%, and specificity of 87%. Moreover, by combining several downscaling and cropping strategies, we show that some of the features contributing to the correct classification are global rather than specific, local features. Conclusions: We report the ability of artificial intelligence to identify EoE using computer vision analysis of esophageal biopsy slides. Further, the DCNN features associated with EoE are based on not only local eosinophils but also global histologic changes. Our approach can be used for other conditions that rely on biopsy-based histologic diagnostics.
Submitted 13 January, 2021;
originally announced January 2021.
-
RecA-mediated homology search as a nearly optimal signal detection system
Authors:
Yonatan Savir,
Tsvi Tlusty
Abstract:
Homologous recombination facilitates the exchange of genetic material between homologous DNA molecules. This crucial process requires detecting a specific homologous DNA sequence within a huge variety of heterologous sequences. The detection is mediated by RecA in E. coli, or members of its superfamily in other organisms. Here we examine how well the RecA-DNA interaction is adjusted to its task. By formulating the DNA recognition process as a signal detection problem, we find the optimal value of binding energy that maximizes the ability to detect homologous sequences. We show that the experimentally observed binding energy is nearly optimal. This implies that the RecA-induced deformation and the binding energetics are fine-tuned to ensure optimal sequence detection. Our analysis suggests a possible role for DNA extension by RecA, in which deformation enhances detection. The present signal detection approach provides a general recipe for testing the optimality of other molecular recognition systems.
Submitted 19 November, 2010;
originally announced November 2010.
-
Optimal Design of a Molecular Recognizer: Molecular Recognition as a Bayesian Signal Detection Problem
Authors:
Yonatan Savir,
Tsvi Tlusty
Abstract:
Numerous biological functions, such as enzymatic catalysis, the immune response system, and the DNA-protein regulatory network, rely on the ability of molecules to specifically recognize target molecules within a large pool of similar competitors in a noisy biochemical environment. Using the basic framework of signal detection theory, we treat the molecular recognition process as a signal detection problem and examine its overall performance. Thus, we evaluate the optimal properties of a molecular recognizer in the presence of competition and noise. Our analysis reveals that the optimal design undergoes a "phase transition" as the structural properties of the molecules and interaction energies between them vary. In one phase, the recognizer should be complementary in structure to its target (like a lock and a key), while in the other, conformational changes upon binding, which often accompany molecular recognition, enhance recognition quality. Using this framework, the abundance of conformational changes may be explained as a result of increasing the fitness of the recognizer. Furthermore, this analysis may be used in future design of artificial signal processing devices based on biomolecules.
Submitted 26 July, 2010;
originally announced July 2010.
-
Molecular Recognition as an Information Channel: The Role of Conformational Changes
Authors:
Yonatan Savir,
Tsvi Tlusty
Abstract:
Molecular recognition, which is essential in processing information in biological systems, takes place in a crowded noisy biochemical environment and requires the recognition of a specific target within a background of various similar competing molecules. We consider molecular recognition as a transmission of information via a noisy channel and use this analogy to gain insights on the optimal, or fittest, molecular recognizer. We focus on the optimal structural properties of the molecules such as flexibility and conformation. We show that conformational changes upon binding, which often occur during molecular recognition, may optimize the detection performance of the recognizer. We thus suggest a generic design principle termed 'conformational proofreading' in which deformation enhances detection. We evaluate the optimal flexibility of the molecular recognizer, which is analogous to the stochasticity in a decision unit. In some scenarios, a flexible recognizer, i.e., a stochastic decision unit, performs better than a rigid, deterministic one. As a biological example, we discuss conformational changes during homologous recombination, the process of genetic exchange between two DNA strands.
Submitted 26 July, 2010;
originally announced July 2010.
-
Cross-species analysis traces adaptation of Rubisco towards optimality in a low dimensional landscape
Authors:
Yonatan Savir,
Elad Noor,
Ron Milo,
Tsvi Tlusty
Abstract:
Rubisco, probably the most abundant protein in the biosphere, performs an essential part in the process of carbon fixation through photosynthesis, thus facilitating life on Earth. Despite the significant effect that Rubisco has on the fitness of plants and other photosynthetic organisms, this enzyme is known to have a remarkably low catalytic rate and a tendency to confuse its substrate, carbon dioxide, with oxygen. This apparent inefficiency is puzzling and raises questions regarding the roles of evolution versus biochemical constraints in shaping Rubisco. Here we examine these questions by analyzing the measured kinetic parameters of Rubisco from various organisms in various environments. The analysis presented here suggests that the evolution of Rubisco is confined to an effectively one-dimensional landscape, which is manifested in simple power-law correlations between its kinetic parameters. Within this one-dimensional landscape, which may represent biochemical and structural constraints, Rubisco appears to be tuned to the intracellular environment in which it resides such that the net photosynthesis rate is nearly optimal. Our analysis indicates that the main determinant of Rubisco's efficiency is not its specificity but rather the tradeoff between the carboxylation velocity and CO2 affinity. As a result, the presence of oxygen has only a moderate effect on the optimal performance of Rubisco, which is determined mostly by the local CO2 concentration. Rubisco appears as an experimentally testable example for the evolution of proteins subject both to strong selection pressure and to biochemical constraints which strongly confine the evolutionary plasticity to a low-dimensional landscape.
Submitted 26 July, 2010;
originally announced July 2010.
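The abstract above refers to power-law correlations between kinetic parameters. One common way to estimate a power-law exponent, shown here only as an illustrative sketch and not as the authors' procedure, is ordinary least squares on log-transformed values: if y = a * x^b, then log y = log a + b * log x. The function name `fit_power_law` and the generic variables `xs` and `ys` are hypothetical, and the example values in the usage note are synthetic, not measured Rubisco data.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares in log-log space.

    Returns (b, a): the exponent and the prefactor.
    Assumes all xs and ys are strictly positive and xs are not all equal.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    cov_xy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    var_x = sum((lx - mean_x) ** 2 for lx in log_x)
    b = cov_xy / var_x
    # Intercept is log(a).
    a = math.exp(mean_y - b * mean_x)
    return b, a
```

For noise-free synthetic input such as `xs = [1, 2, 4, 8]` and `ys = [3 * x ** 0.5 for x in xs]`, the fit recovers an exponent close to 0.5 and a prefactor close to 3, up to floating-point error. Real measurements would scatter around the line, and the regression would then give an estimate with some uncertainty.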
-
Conformational Proofreading: The Impact of Conformational Changes on the Specificity of Molecular Recognition
Authors:
Yonatan Savir,
Tsvi Tlusty
Abstract:
To perform recognition, molecules must locate and specifically bind their targets within a noisy biochemical environment with many look-alikes. Molecular recognition processes, especially the induced-fit mechanism, are known to involve conformational changes. This raises a basic question: does molecular recognition gain any advantage by such conformational changes? By introducing a simple statistical-mechanics approach, we study the effect of conformation and flexibility on the quality of recognition processes. Our model relates specificity to the conformation of the participant molecules and thus suggests a possible answer: optimal specificity is achieved when the ligand is slightly off target, that is, a conformational mismatch between the ligand and its main target improves the selectivity of the process. This indicates that deformations upon binding serve as a conformational proofreading mechanism, which may be selected for via evolution.
Submitted 26 July, 2010;
originally announced July 2010.