-
Towards Trustworthy Artificial Intelligence for Equitable Global Health
Authors:
Hong Qin,
Jude Kong,
Wandi Ding,
Ramneek Ahluwalia,
Christo El Morr,
Zeynep Engin,
Jake Okechukwu Effoduh,
Rebecca Hwa,
Serena Jingchuan Guo,
Laleh Seyyed-Kalantari,
Sylvia Kiwuwa Muyingo,
Candace Makeda Moore,
Ravi Parikh,
Reva Schwartz,
Dongxiao Zhu,
Xiaoqian Wang,
Yiye Zhang
Abstract:
Artificial intelligence (AI) can potentially transform global health, but algorithmic bias can exacerbate social inequities and disparity. Trustworthy AI entails the intentional design to ensure equity and mitigate potential biases. To advance trustworthy AI in global health, we convened a workshop on Fairness in Machine Intelligence for Global Health (FairMI4GH). The event brought together a global mix of experts from various disciplines, community health practitioners, policymakers, and more. Topics covered included managing AI bias in socio-technical systems, AI's potential impacts on global health, and balancing data privacy with transparency. Panel discussions examined the cultural, political, and ethical dimensions of AI in global health. FairMI4GH aimed to stimulate dialogue, facilitate knowledge transfer, and spark innovative solutions. Drawing from NIST's AI Risk Management Framework, it provided suggestions for handling AI risks and biases. The need to mitigate data biases from the research design stage, adopt a human-centered approach, and advocate for AI transparency was recognized. Challenges such as updating legal frameworks, managing cross-border data sharing, and motivating developers to reduce bias were acknowledged. The event emphasized the necessity of diverse viewpoints and multi-dimensional dialogue for creating a fair and ethical AI framework for equitable global health.
Submitted 10 September, 2023;
originally announced September 2023.
-
A Diffusion-Based Embedding of the Stochastic Simulation Algorithm in Continuous Space
Authors:
Marcus Thomas,
Russell Schwartz
Abstract:
A variety of simulation methodologies have been used for modeling reaction-diffusion dynamics -- including approaches based on Differential Equations (DE), the Stochastic Simulation Algorithm (SSA), Brownian Dynamics (BD), Green's Function Reaction Dynamics (GFRD), and variations thereon -- each offering trade-offs with respect to the ranges of phenomena they can model, their computational tractability, and the difficulty of fitting them to experimental measurements. Here, we develop a multiscale approach combining efficient SSA-like sampling suitable for well-mixed systems with aspects of the slower but space-aware GFRD model, assuming as with GFRD that reactions occur in a spatially heterogeneous environment that must be explicitly modeled. Our method extends the SSA approach in two major ways. First, we sample bimolecular association reactions following diffusive motion with a time-dependent reaction propensity. Second, reaction locations are sampled from within overlapping diffusion spheres describing the spatial probability densities of individual reactants. We show the approach to provide efficient simulation of spatially heterogeneous biochemistry in comparison to alternative methods via application to a Michaelis-Menten model.
Submitted 20 May, 2021; v1 submitted 3 April, 2020;
originally announced April 2020.
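For orientation, the well-mixed SSA baseline that the abstract above extends can be sketched for a Michaelis-Menten scheme (E + S <-> ES -> E + P). This is the standard Gillespie direct method only, not the authors' diffusion-based spatial embedding; the function name, argument names, and rate values are illustrative assumptions.

```python
import math
import random

def gillespie_michaelis_menten(e, s, k_on, k_off, k_cat, t_end, rng):
    """Well-mixed SSA (Gillespie direct method) for E + S <-> ES -> E + P.

    Hypothetical helper for illustration; molecule counts are integers and
    the rate constants are stochastic (per-molecule) rates.
    """
    es, p, t = 0, 0, 0.0
    while True:
        # Propensities of the three reaction channels in the current state.
        a_bind = k_on * e * s
        a_unbind = k_off * es
        a_cat = k_cat * es
        a_total = a_bind + a_unbind + a_cat
        if a_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t_next = t + (-math.log(1.0 - rng.random()) / a_total)
        if t_next > t_end:
            break
        t = t_next
        # Pick the channel with probability proportional to its propensity.
        u = rng.random() * a_total
        if u < a_bind:
            e, s, es = e - 1, s - 1, es + 1
        elif u < a_bind + a_unbind:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1
    return {"E": e, "S": s, "ES": es, "P": p, "t": t}
```

In this baseline the binding propensity is constant between events; the abstract's method instead uses a time-dependent propensity for bimolecular association and samples reaction locations, neither of which is modeled here.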
-
A neuronal dynamics study on a neuromorphic chip
Authors:
Wenyuan Li,
Igor V. Ovchinnikov,
Honglin Chen,
Zhe Wang,
Albert Lee,
Hochul Lee,
Carlos Cepeda,
Robert N. Schwartz,
Karlheinz Meier,
Kang L. Wang
Abstract:
Neuronal firing activities have attracted a lot of attention since a large population of spatiotemporal patterns in the brain is the basis for adaptive behavior and can also reveal the signs of various neurological disorders including Alzheimer's, schizophrenia, epilepsy and others. Here, we study the dynamics of a simple neuronal network using different sets of settings on a neuromorphic chip. We observed three different types of collective neuronal firing activities, which agree with the clinical data taken from the brain. We constructed a brain phase diagram and showed that within the weak noise region, the brain is operating in an expected noise-induced phase (N-phase) rather than at a so-called self-organized critical boundary. The significance of this study is twofold: first, the deviation of neuronal activities from the normal brain could be symptomatic of diseases of the central nervous system, thus paving the way for new diagnostics and treatments; second, the normal brain states in the N-phase are optimal for computation and information processing. The latter may provide a way to establish a powerful new computing paradigm using the collective behavior of networks of spiking neurons.
Submitted 10 March, 2017;
originally announced March 2017.
-
Criticality or Supersymmetry Breaking?
Authors:
Igor V. Ovchinnikov,
Wenyuan Li,
Yuquan Sun,
Robert N. Schwartz,
Andrew E. Hudson,
Karlheinz Meier,
Kang L. Wang
Abstract:
In many stochastic dynamical systems, ordinary chaotic behavior is preceded by a full-dimensional phase that exhibits 1/f-type power-spectra and/or scale-free statistics of (anti)instantons such as neuroavalanches, earthquakes, etc. In contrast with the phenomenological concept of self-organized criticality, the recently developed approximation-free supersymmetric theory of stochastic differential equations, or stochastics, (STS) identifies this phase as the noise-induced chaos (N-phase), i.e., the phase where the topological supersymmetry pertaining to all stochastic dynamical systems is broken spontaneously by the condensation of the noise-induced (anti-)instantons. Here, we support this picture in the context of neurodynamics. We study a 1D chain of neuron-like elements and find that the dynamics in the N-phase is indeed characterized by positive stochastic Lyapunov exponents and dominated by (anti)instantonic processes of (creation)annihilation of kinks and antikinks, which can be viewed as predecessors of boundaries of neuroavalanches. We also construct the phase diagram of emulated stochastic neurodynamics on Spikey neuromorphic hardware and demonstrate that the width of the N-phase vanishes in the deterministic limit in accordance with STS. As a first result of the application of STS to neurodynamics comes the conclusion that a conscious brain can reside only in the N-phase.
Submitted 6 February, 2020; v1 submitted 30 August, 2016;
originally announced September 2016.
-
Automated deconvolution of structured mixtures from bulk tumor genomic data
Authors:
Theodore Roman,
Lu Xie,
Russell Schwartz
Abstract:
Motivation: As cancer researchers have come to appreciate the importance of intratumor heterogeneity, much attention has focused on the challenges of accurately profiling heterogeneity in individual patients. Experimental technologies for directly profiling genomes of single cells are rapidly improving, but they are still impractical for large-scale sampling. Bulk genomic assays remain the standard for population-scale studies, but conflate the influences of mixtures of genetically distinct tumor, stromal, and infiltrating immune cells. Many computational approaches have been developed to deconvolute these mixed samples and reconstruct the genomics of genetically homogeneous clonal subpopulations. All such methods, however, are limited to reconstructing only coarse approximations to a few major subpopulations. In prior work, we showed that one can improve deconvolution of genomic data by leveraging substructure in cellular mixtures through a strategy called simplicial complex inference. This strategy, however, is also limited by the difficulty of inferring mixture structure from sparse, noisy assays. Results: We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We show that these improvements lead to more accurate inference of cell populations and mixture proportions in simulated scenarios. We further demonstrate their effectiveness in identifying mixture substructure in real tumor CNV data. Availability: Source code is available at http://www.cs.cmu.edu/~russells/software/WSCUnmix.zip
Submitted 8 April, 2016;
originally announced April 2016.
-
Derivative-free optimization of rate parameters of capsid assembly models from bulk in vitro data
Authors:
Lu Xie,
Gregory R. Smith,
Russell Schwartz
Abstract:
The assembly of virus capsids from free coat proteins proceeds by a complicated cascade of association and dissociation steps, the great majority of which cannot be directly experimentally observed. This has made capsid assembly a rich field for computational models to attempt to fill the gaps in what is experimentally observable. Nonetheless, accurate simulation predictions depend on accurate models and there are substantial obstacles to model inference for such systems. Here, we describe progress in learning parameters for capsid assembly systems, particularly kinetic rate constants of coat-coat interactions, by computationally fitting simulations to experimental data. We previously developed an approach to learn rate parameters of coat-coat interactions by minimizing the deviation between real and simulated light scattering data monitoring bulk capsid assembly in vitro. This is a difficult data-fitting problem, however, because of the high computational cost of simulating assembly trajectories, the stochastic noise inherent to the models, and the limited and noisy data available for fitting. Here we show that a newer class of methods, based on derivative-free optimization (DFO), can more quickly and precisely learn physical parameters from static light scattering data. We further explore how the advantages of the approaches might be affected by alternative data sources through simulation of a model of time-resolved mass spectrometry data, an alternative technology for monitoring bulk capsid assembly that can be expected to provide much richer data. The results show that advances in both the data and the algorithms can improve model inference, with rich data leading to high-quality fits for all methods, but DFO methods showing substantial advantages on less informative data sources more representative of current experimental practice.
Submitted 7 July, 2015;
originally announced July 2015.
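As a rough illustration of what "derivative-free" means in the abstract above, the sketch below fits a single rate constant using only objective-function values. It uses compass (coordinate pattern) search, a classic DFO method chosen here for brevity; the abstract does not say which DFO algorithms the authors used, and the first-order kinetics curve standing in for a simulated light scattering signal is a hypothetical toy, not their capsid model.

```python
import math

def compass_search(objective, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Minimize objective(x) over a list of floats using only function values."""
    x, fx = list(x0), objective(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        # Poll +/- step along each coordinate axis.
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5  # no better neighbor: refine the mesh
    return x, fx

# Toy stand-in for "experimental" data: first-order kinetics with a
# hypothetical true rate constant of 0.3 (an assumption for illustration).
times = [float(t) for t in range(11)]
observed = [1.0 - math.exp(-0.3 * t) for t in times]

def squared_deviation(params):
    # Sum of squared differences between the model curve and the data.
    k = params[0]
    return sum((1.0 - math.exp(-k * t) - y) ** 2 for t, y in zip(times, observed))

fitted, residual = compass_search(squared_deviation, [1.0])
```

Because the search never asks for gradients, the same loop works when each objective evaluation is a noisy stochastic simulation, which is the setting the abstract describes; in that case one would typically average several simulation runs per evaluation.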
-
A Composite Genome Approach to Identify Phylogenetically Informative Data from Next-Generation Sequencing
Authors:
Rachel S. Schwartz,
Kelly Harkins,
Anne C. Stone,
Reed A. Cartwright
Abstract:
We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, genome-genome alignment, and annotation. In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered phylogenies from multiple datasets that were consistent with previous conflicting estimates of the relationships among mammals. SISRS is open source and freely available at https://github.com/rachelss/SISRS.
Submitted 12 November, 2014; v1 submitted 15 May, 2013;
originally announced May 2013.
-
CGHTRIMMER: Discretizing noisy Array CGH Data
Authors:
Charalampos E. Tsourakakis,
David Tolliver,
Maria A. Tsiarli,
Stanley Shackney,
Russell Schwartz
Abstract:
The development of cancer is largely driven by the gain or loss of subsets of the genome, promoting uncontrolled growth or disabling defenses against it. Identifying genomic regions whose DNA copy number deviates from the normal is therefore central to understanding cancer evolution. Array-based comparative genomic hybridization (aCGH) is a high-throughput technique for identifying DNA gain or loss by quantifying total amounts of DNA matching defined probes relative to healthy diploid control samples. Due to the high level of noise in microarray data, however, interpretation of aCGH output is a difficult and error-prone task.
In this work, we tackle the computational task of inferring the DNA copy number per genomic position from noisy aCGH data. We propose CGHTRIMMER, a novel segmentation method that uses a fast dynamic programming algorithm to solve for a least-squares objective function for copy number assignment. CGHTRIMMER consistently achieves superior precision and recall to leading competitors on benchmarks of synthetic data and real data from the Coriell cell lines. In addition, it finds several novel markers not recorded in the benchmarks but plausibly supported in the oncology literature. Furthermore, CGHTRIMMER achieves superior results with run-times from 1 to 3 orders of magnitude faster than its state-of-the-art competitors.
CGHTRIMMER provides a new alternative for the problem of aCGH discretization that provides superior detection of fine-scale regions of gain or loss yet is fast enough to process very large data sets in seconds. It thus meets an important need for methods capable of handling the vast amounts of data being accumulated in high-throughput studies of tumor genetics.
Submitted 23 February, 2010;
originally announced February 2010.
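The kind of least-squares segmentation the abstract above refers to can be sketched with a textbook dynamic program. The objective assumed here is squared error plus a constant penalty per segment, solved by a plain O(n^2) recurrence; the exact objective, the penalty, and the faster algorithm used by CGHTRIMMER are not given in the abstract, so `segment_least_squares` and its `penalty` argument are illustrative assumptions.

```python
def segment_least_squares(signal, penalty):
    """Piecewise-constant fit minimizing squared error + penalty per segment.

    Hypothetical illustration; returns a list of (start, end, mean) with
    half-open index ranges signal[start:end].
    """
    n = len(signal)
    # Prefix sums give the squared error of any segment in O(1).
    pre = [0.0] * (n + 1)
    pre_sq = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        pre[i + 1] = pre[i] + x
        pre_sq[i + 1] = pre_sq[i] + x * x

    def sse(j, i):
        # Squared error of fitting signal[j:i] by its mean.
        total = pre[i] - pre[j]
        return (pre_sq[i] - pre_sq[j]) - total * total / (i - j)

    # best[i] = optimal cost for the prefix signal[:i];
    # back[i] = start index of the last segment in that optimum.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            cost = best[j] + sse(j, i) + penalty
            if cost < best[i]:
                best[i], back[i] = cost, j

    # Walk the back-pointers to recover the segments.
    segments = []
    i = n
    while i > 0:
        j = back[i]
        segments.append((j, i, (pre[i] - pre[j]) / (i - j)))
        i = j
    return segments[::-1]
```

A larger `penalty` yields fewer, longer segments and so suppresses probe-level noise at the cost of missing short regions of gain or loss; choosing it is the main tuning decision in a scheme of this form.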
-
Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny
Authors:
Navodit Misra,
Guy Blelloch,
R. Ravi,
Russell Schwartz
Abstract:
Accurate reconstruction of phylogenies remains a key challenge in evolutionary biology. Most biologically plausible formulations of the problem are formally NP-hard, with no known efficient solution. Standard practice relies on fast heuristic methods that are empirically known to work very well in general but can yield results arbitrarily far from optimal. Practical exact methods, which yield exponential worst-case running times but generally much better times in practice, provide an important alternative. We report progress in this direction by introducing a provably optimal method for the weighted multi-state maximum parsimony phylogeny problem. The method is based on generalizing the notion of the Buneman graph, a construction key to efficient exact methods for binary sequences, so as to apply to sequences with arbitrary finite numbers of states with arbitrary state transition weights. We implement an integer linear programming (ILP) method for the multi-state problem using this generalized Buneman graph and demonstrate that the resulting method is able to solve data sets that are intractable by prior exact methods in run times comparable with popular heuristics. Our work provides the first method for provably optimal maximum parsimony phylogeny inference that is practical for multi-state data sets of more than a few characters.
Submitted 14 April, 2010; v1 submitted 9 October, 2009;
originally announced October 2009.
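To make the objective in the abstract above concrete, the sketch below scores one multi-state character on a fixed rooted tree under arbitrary state transition weights, using Sankoff's dynamic program. This only evaluates a given tree; it is not the generalized Buneman graph construction or the ILP search over trees that the abstract describes, and the nested-tuple tree encoding is an assumption made for brevity.

```python
def sankoff_cost(tree, states, weight):
    """Minimum weighted parsimony cost of one character on a fixed rooted tree.

    tree   -- a leaf state (str) or a tuple of child subtrees (hypothetical encoding)
    states -- list of all possible character states
    weight -- weight[a][b] is the cost of changing state a into state b
    """
    def subtree_costs(node):
        # Returns {state: min cost of the subtree if node is assigned state}.
        if not isinstance(node, tuple):
            return {s: (0.0 if s == node else float("inf")) for s in states}
        child_tables = [subtree_costs(child) for child in node]
        return {
            s: sum(min(weight[s][c] + table[c] for c in states)
                   for table in child_tables)
            for s in states
        }
    return min(subtree_costs(tree).values())

# Example weights (an assumption for illustration): every change costs 1.
states = ["A", "C", "G", "T"]
unit = {a: {b: (0.0 if a == b else 1.0) for b in states} for a in states}
cost = sankoff_cost((("A", "A"), ("C", "G")), states, unit)
```

The maximum parsimony problem asks for the tree topology minimizing the sum of such costs over all characters; it is that outer search over topologies that is NP-hard.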
-
Efficient stochastic sampling of first-passage times with applications to self-assembly simulations
Authors:
Navodit Misra,
Russell Schwartz
Abstract:
Models of reaction chemistry based on the stochastic simulation algorithm (SSA) have become a crucial tool for simulating complicated biological reaction networks due to their ability to handle extremely complicated reaction networks and to represent noise in small-scale chemistry. These methods can, however, become highly inefficient for stiff reaction systems, those in which different reaction channels operate on widely varying time scales. In this paper, we develop two methods for accelerating sampling in SSA models: an exact method and a scheme allowing for sampling accuracy up to any arbitrary error bound. Both methods depend on analysis of the eigenvalues of continuous time Markov model graphs that define the behavior of the SSA. We demonstrate these methods for the specific application of sampling breakage times for multiply-connected bond networks, a class of stiff system important to models of self-assembly processes. We show theoretically and empirically that our eigenvalue methods provide substantially reduced sampling times for a wide range of network breakage models. These techniques are also likely to have broad use in accelerating SSA models so as to apply them to systems and parameter ranges that are currently computationally intractable.
Submitted 6 November, 2008; v1 submitted 2 April, 2008;
originally announced April 2008.