-
baker: An R package for Nested Partially-Latent Class Models
Authors:
Irena B Chen,
Qiyuan Shi,
Scott L Zeger,
Zhenke Wu
Abstract:
This paper describes and illustrates the functionality of the baker R package. The package estimates a suite of nested partially-latent class models (NPLCM) for multivariate binary responses that are observed under a case-control design. The baker package allows researchers to flexibly estimate population-level class prevalences and posterior probabilities of class membership for individual cases. Estimation is accomplished by calling JAGS, a cross-platform program for automatic Bayesian inference, through a wrapper R function that parses model specifications and data inputs. The baker package provides many useful features, including data ingestion, exploratory data analyses, model diagnostics, and extensive plotting and visualization options that help catalyze communication between practitioners and domain scientists. Package features and workflows are illustrated using simulated and real data sets. Package URL: https://github.com/zhenkewu/baker
Submitted 23 February, 2022;
originally announced February 2022.
-
Learning and Predicting from Dynamic Models for COVID-19 Patient Monitoring
Authors:
Zitong Wang,
Mary Grace Bowring,
Antony Rosen,
Brian T. Garibaldi,
Akihiko Nishimura,
Scott L. Zeger
Abstract:
COVID-19 has challenged health systems to learn how to learn. This paper describes the context, methods and challenges for learning to improve COVID-19 care at one academic health center. Challenges to learning include: (1) choosing the right clinical target; (2) designing methods for accurate predictions by borrowing strength from prior patients' experiences; (3) communicating the methodology to clinicians so they understand and trust it; (4) communicating the predictions to the patient at the moment of clinical decision; and (5) continuously evaluating and revising the methods so they adapt to changing patients and clinical demands. To illustrate these challenges, this paper contrasts two statistical modeling approaches (prospective longitudinal models in common use, and retrospective analogues that are complementary in the COVID-19 context) for predicting future biomarker trajectories and major clinical events. The methods are applied to and validated on a cohort of 1,678 patients who were hospitalized with COVID-19 during the early months of the pandemic. We emphasize graphical tools to promote physician learning and inform clinical decision making.
Submitted 21 March, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
A hierarchical model for estimating exposure-response curves from multiple studies
Authors:
Joshua P. Keller,
Joanne Katz,
Amid K. Pokhrel,
Michael N. Bates,
James Tielsch,
Scott L. Zeger
Abstract:
Cookstove replacement trials have found mixed results regarding their impact on respiratory health. The limited range of concentrations and the small sample sizes of individual studies are important factors that may limit their statistical power. We present a hierarchical approach to modeling exposure concentrations and pooling data from multiple studies in order to estimate a common exposure-response curve. The exposure concentration model accommodates temporally sparse, clustered longitudinal observations. The exposure-response curve model provides a flexible, semi-parametric estimate of the exposure-response relationship while accommodating heterogeneous clustered data. We apply this model to data from three studies of cookstoves and respiratory infections in children in Nepal, which represent three study types: crossover trial, parallel trial, and case-control study. We find evidence of increased odds of disease for particulate matter concentrations between 50 and 200 $\mu$g/m$^3$ and a flattening of the exposure-response curve for higher exposure concentrations. The model we present can incorporate additional studies and be applied to other settings.
Submitted 14 August, 2019;
originally announced August 2019.
-
A Bayesian Approach to Restricted Latent Class Models for Scientifically-Structured Clustering of Multivariate Binary Outcomes
Authors:
Zhenke Wu,
Livia Casciola-Rosen,
Antony Rosen,
Scott L. Zeger
Abstract:
In this paper, we propose a general framework for combining evidence of varying quality to estimate underlying binary latent variables in the presence of restrictions imposed to respect the scientific context. The resulting algorithms cluster the multivariate binary data in a manner partly guided by prior knowledge. The primary model assumptions are that 1) subjects belong to classes defined by unobserved binary states, such as the true presence or absence of pathogens in epidemiology, or of antibodies in medicine, or the "ability" to correctly answer test questions in psychology, 2) a binary design matrix $\Gamma$ specifies relevant features in each class, and 3) measurements are independent given the latent class but can have different error rates. Conditions ensuring parameter identifiability from the likelihood function are discussed and inform the design of a novel posterior inference algorithm that simultaneously estimates the number of clusters, design matrix $\Gamma$, and model parameters. In finite samples and dimensions, we propose prior assumptions so that the posterior distribution of the number of clusters and the patterns of latent states tend to concentrate on smaller values and sparser patterns, respectively. The model readily extends to studies where some subjects' latent classes are known or important prior knowledge about differential measurement accuracy is available from external sources. The methods are illustrated with an analysis of protein data to detect clusters representing auto-antibody classes among scleroderma patients.
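As a rough illustration of assumptions 1) to 3), the sketch below computes the likelihood of one subject's binary measurements given a latent state vector, and enumerates the posterior over latent states for a very small problem. The orientation of $\Gamma$ (rows as latent states, columns as measured features), the "OR" rule linking active states to expected features, the true/false positive rates `theta`/`psi`, and the independent Bernoulli prior `prior_active` are all assumptions made for this example. It does not implement the paper's posterior algorithm, which also infers the number of clusters and $\Gamma$ itself.

```python
from itertools import product

import numpy as np


def ideal_response(eta, Gamma):
    """Expected feature pattern for a latent state vector eta (length M).

    Assumed "OR" rule: feature l is expected present when at least one
    active latent state m has Gamma[m, l] == 1 (Gamma is M x L here).
    """
    eta = np.asarray(eta, dtype=int)
    Gamma = np.asarray(Gamma, dtype=int)
    return (eta @ Gamma > 0).astype(int)


def likelihood(y, eta, Gamma, theta, psi):
    """P(y | eta), with features conditionally independent given eta.

    theta[l]: P(Y_l = 1) when feature l is expected present (true positive rate).
    psi[l]:   P(Y_l = 1) when feature l is expected absent (false positive rate).
    """
    y = np.asarray(y, dtype=int)
    xi = ideal_response(eta, Gamma)
    p = np.where(xi == 1, theta, psi)  # per-feature probability of a positive
    return float(np.prod(np.where(y == 1, p, 1 - p)))


def posterior_over_states(y, Gamma, theta, psi, prior_active):
    """Enumerate all 2^M latent patterns under an independent Bernoulli prior."""
    Gamma = np.asarray(Gamma, dtype=int)
    prior_active = np.asarray(prior_active, dtype=float)
    states = list(product([0, 1], repeat=Gamma.shape[0]))
    weights = []
    for s in states:
        s_arr = np.array(s)
        prior = np.prod(np.where(s_arr == 1, prior_active, 1 - prior_active))
        weights.append(prior * likelihood(y, s_arr, Gamma, theta, psi))
    weights = np.array(weights)
    return dict(zip(states, weights / weights.sum()))
```

Exhaustive enumeration is only feasible for small `M`; it is used here purely to make the structure of the likelihood visible.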
Submitted 24 August, 2018;
originally announced August 2018.
-
Fast Out-of-Sample Predictions for Bayesian Hierarchical Models of Latent Health States
Authors:
Aaron J Fisher,
R Yates Coley,
Scott L Zeger
Abstract:
Hierarchical Bayesian models can be especially useful in precision medicine settings, where clinicians are interested in estimating the patient-level latent variables associated with an individual's current health state and its trajectory. Such models are often fit using batch Markov chain Monte Carlo (MCMC). However, the slow speed of batch MCMC computation makes it difficult to use in clinical settings, where immediate latent variable estimates are often desired in response to new patient data. In this report, we discuss how importance sampling (IS) can instead be used to obtain fast, in-clinic estimates of patient-level latent variables. We apply IS to the hierarchical model proposed in Coley et al. (2015) for predicting an individual's underlying prostate cancer state. We find that latent variable estimates via IS can typically be obtained in 1-10 seconds per person and have high agreement with estimates coming from longer-running batch MCMC methods. Alternative options for out-of-sample fitting and online updating are also discussed.
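A minimal sketch of the idea, under a deliberately simplified toy model that is not the Coley et al. (2015) model: a binary latent state with prior probability `p`, and a new patient's measurements assumed Gaussian with mean `mu0` or `mu1` depending on that state. Stored batch-MCMC draws of the population parameters (passed here as the hypothetical dictionary `pop_draws`) serve as the proposal, each paired with a latent state drawn from its prior, and the draws are reweighted by the new patient's likelihood (self-normalized importance sampling).

```python
import numpy as np


def normal_logpdf(x, mu, sigma):
    """Elementwise Gaussian log-density."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def latent_state_prob_is(y_new, pop_draws, rng):
    """Self-normalized IS estimate of P(eta = 1 | y_new) in the toy model.

    pop_draws: dict of equal-length arrays "p", "mu0", "mu1", "sigma",
    one entry per stored population-level MCMC draw (assumed format).
    """
    p = pop_draws["p"]
    eta = rng.random(p.shape[0]) < p  # one latent-state proposal per stored draw
    mu = np.where(eta, pop_draws["mu1"], pop_draws["mu0"])
    y = np.asarray(y_new, dtype=float)
    # Log-likelihood of all of the new patient's measurements under each proposal.
    loglik = normal_logpdf(y[None, :], mu[:, None], pop_draws["sigma"][:, None]).sum(axis=1)
    w = np.exp(loglik - loglik.max())  # subtract the max for numerical stability
    w /= w.sum()
    return float(np.sum(w * eta))
```

Only the new patient's likelihood is evaluated for each stored draw, so no new MCMC run is required. The trade-off is that the weights can degenerate when the new patient's data are far from what the population draws predict.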
Submitted 29 October, 2015;
originally announced October 2015.
-
A Bayesian Hierarchical Model for Prediction of Latent Health States from Multiple Data Sources with Application to Active Surveillance of Prostate Cancer
Authors:
R. Yates Coley,
Aaron J. Fisher,
Mufaddal Mamawala,
H. Ballentine Carter,
Kenneth J. Pienta,
Scott L. Zeger
Abstract:
In this article, we present a Bayesian hierarchical model for predicting a latent health state from longitudinal clinical measurements. Model development is motivated by the need to integrate multiple sources of data to improve clinical decisions about whether to remove or irradiate a patient's prostate cancer. Existing modeling approaches are extended to accommodate measurement error in cancer state determinations based on biopsied tissue, clinical measurements that are possibly not missing at random, and informative partial observation of the true state. The proposed model enables estimation of whether an individual's underlying prostate cancer is aggressive, requiring surgery and/or radiation, or indolent, permitting continued surveillance. These individualized predictions can then be communicated to clinicians and patients to inform decision-making. We demonstrate the model with data from a cohort of low-risk prostate cancer patients at Johns Hopkins University and assess predictive accuracy among a subset for whom the true cancer state is observed. Simulation studies confirm model performance and explore the impact of adjusting for informative missingness on true state predictions. R code and simulated data are available at https://github.com/rycoley/prediction-prostate-surveillance.
Submitted 1 June, 2016; v1 submitted 29 August, 2015;
originally announced August 2015.
-
Hiding Symbols and Functions: New Metrics and Constructions for Information-Theoretic Security
Authors:
Flavio du Pin Calmon,
Muriel Médard,
Mayank Varia,
Ken R. Duffy,
Mark M. Christiansen,
Linda M. Zeger
Abstract:
We present information-theoretic definitions and results for analyzing symmetric-key encryption schemes beyond the perfect secrecy regime, i.e., when perfect secrecy is not attained. We adopt two lines of analysis, one based on lossless source coding, and another akin to rate-distortion theory. We start by presenting a new information-theoretic metric for security, called symbol secrecy, and derive associated fundamental bounds. We then introduce list-source codes (LSCs), which are a general framework for mapping a key length (entropy) to a list size that an eavesdropper has to resolve in order to recover a secret message. We provide explicit constructions of LSCs, and demonstrate that, when the source is uniformly distributed, the highest level of symbol secrecy for a fixed key length can be achieved through a construction based on maximum distance separable (MDS) codes. Using an analysis related to rate-distortion theory, we then show how symbol secrecy can be used to determine the probability that an eavesdropper correctly reconstructs functions of the original plaintext. We illustrate how these bounds can be applied to characterize security properties of symmetric-key encryption schemes, and, in particular, extend security claims based on symbol secrecy to a functional setting.
Submitted 29 March, 2015;
originally announced March 2015.
-
Partially-Latent Class Models (pLCM) for Case-Control Studies of Childhood Pneumonia Etiology
Authors:
Zhenke Wu,
Maria Deloria-Knoll,
Laura L. Hammitt,
Scott L. Zeger
Abstract:
In population studies on the etiology of disease, one goal is the estimation of the fraction of cases attributable to each of several causes. For example, pneumonia is a clinical diagnosis of lung infection that may be caused by viral, bacterial, fungal, or other pathogens. The study of pneumonia etiology is challenging because directly sampling from the lung to identify the etiologic pathogen is not standard clinical practice in most settings. Instead, measurements from multiple peripheral specimens are made. This paper introduces the statistical methodology designed for estimating the population etiology distribution and the individual etiology probabilities in the Pneumonia Etiology Research for Child Health (PERCH) study of 9,500 children at 7 sites around the world. We formulate the scientific problem in statistical terms as estimating the mixing weights and latent class indicators under a partially-latent class model (pLCM) that combines heterogeneous measurements with different error rates obtained from a case-control study. We introduce the pLCM as an extension of the latent class model. We also introduce graphical displays of the population data and inferred latent-class frequencies. The methods are tested with simulated data, and then applied to PERCH data. The paper closes with a brief description of extensions of the pLCM to the regression setting and to the case where conditional independence among the measures is relaxed.
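To make the "latent class indicator" concrete, the sketch below computes the posterior probability that each of `K` candidate pathogens caused one case's pneumonia, given binary test results `m`. It assumes a simplified single-pathogen pLCM: one measurement per pathogen, conditional independence given the class, a true positive rate `tpr[k]` for the causative pathogen's test, and a false positive rate `fpr[j]` (the quantity that control data help pin down) for all other tests. The mixing weights `pi`, the rates, and the function name are illustrative; they are not PERCH estimates or the paper's software.

```python
import numpy as np


def plcm_case_posterior(m, pi, tpr, fpr):
    """Posterior P(class = k | m) for one case in a simplified pLCM.

    m:   binary measurements, one per candidate pathogen
    pi:  mixing weights (population etiology fractions), length K
    tpr: true positive rate of each pathogen's test when it is the cause
    fpr: false positive rate of each test when its pathogen is not the cause
    """
    m = np.asarray(m, dtype=int)
    tpr = np.asarray(tpr, dtype=float)
    fpr = np.asarray(fpr, dtype=float)
    log_post = np.log(np.asarray(pi, dtype=float))
    for k in range(len(log_post)):
        p = fpr.copy()
        p[k] = tpr[k]  # only the causative pathogen's test uses its TPR
        log_post[k] += np.sum(m * np.log(p) + (1 - m) * np.log(1 - p))
    post = np.exp(log_post - log_post.max())  # stabilize before normalizing
    return post / post.sum()
```

For example, with three equally likely pathogens, `tpr = 0.8`, `fpr = 0.1`, and only the first test positive, almost all posterior mass falls on the first pathogen.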
Submitted 21 November, 2014;
originally announced November 2014.
-
On Scalability of Wireless Networks: A Practical Primer for Large Scale Cooperation
Authors:
Linda Zeger,
Muriel Médard
Abstract:
An intuitive overview of the scalability of a variety of types of wireless networks is presented. Simple heuristic arguments are given here both for scaling laws presented in other works and for conditions not previously considered in the literature. Unicast and multicast messages, topology, hierarchy, and effects of reliability protocols are discussed. We show how two key factors, bottlenecks and erasures, can often dominate the network scaling behavior. Scaling of throughput or delay with the number of transmitting nodes, the number of receiving nodes, and the file size is described.
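As a small numerical illustration of how erasures can dominate scaling, consider one packet broadcast repeatedly (without coding) until every one of `R` receivers has it, with independent erasures of probability `e` per receiver per slot. This toy model is an assumption made for illustration and is not taken from the paper. The expected number of slots follows from $E[T]=\sum_{n\ge 0}P(T>n)$ with $P(T>n)=1-(1-e^n)^R$.

```python
def expected_broadcast_slots(receivers, erasure, tol=1e-12):
    """Expected slots until all receivers hold one repeatedly broadcast packet.

    Toy model: i.i.d. erasures with probability `erasure` per receiver per slot.
    Sums P(T > n) = 1 - (1 - erasure**n)**receivers until the terms are negligible.
    """
    if not 0 <= erasure < 1:
        raise ValueError("erasure must be in [0, 1)")
    total, n = 0.0, 0
    while True:
        term = 1.0 - (1.0 - erasure**n) ** receivers
        if term < tol:
            break
        total += term
        n += 1
    return total
```

In this model the expected delay grows roughly like $\log R / \log(1/e)$ as the number of receivers increases, which is one simple way to see erasures shaping multicast delay scaling.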
Submitted 7 February, 2014;
originally announced February 2014.
-
Multi-Path TCP with Network Coding for Mobile Devices in Heterogeneous Networks
Authors:
Jason Cloud,
Flavio du Pin Calmon,
Weifei Zeng,
Giovanni Pau,
Linda Zeger,
Muriel Medard
Abstract:
Existing mobile devices have the capability to use multiple network technologies simultaneously to help increase performance, but they rarely, if at all, effectively use these technologies in parallel. We first present empirical data to help understand the mobile environment when three heterogeneous networks are available to the mobile device (i.e., a WiFi network, a WiMax network, and an Iridium satellite network). We then propose a reliable, multi-path protocol called Multi-Path TCP with Network Coding (MPTCP/NC) that utilizes each of these networks in parallel. An analytical model is developed and a mean-field approximation is derived that gives an estimate of the protocol's achievable throughput. Finally, a comparison between MPTCP and MPTCP/NC is presented using both the empirical data and the mean-field approximation. Our results show that network coding can provide users in mobile environments a higher quality of service by enabling the use of multiple network technologies and the capability to overcome packet losses due to lossy wireless network connections.
Submitted 10 June, 2013;
originally announced June 2013.
-
Lists that are smaller than their parts: A coding approach to tunable secrecy
Authors:
Flavio du Pin Calmon,
Muriel Médard,
Linda M. Zeger,
João Barros,
Mark M. Christiansen,
Ken R. Duffy
Abstract:
We present a new information-theoretic definition and associated results, based on list decoding in a source coding setting. We begin by presenting list-source codes, which naturally map a key length (entropy) to list size. We then show that such codes can be analyzed in the context of a novel information-theoretic metric, ε-symbol secrecy, that encompasses both the one-time pad and traditional rate-based asymptotic metrics, but, like most cryptographic constructs, can be applied in non-asymptotic settings. We derive fundamental bounds for ε-symbol secrecy and demonstrate how these bounds can be achieved with MDS codes when the source is uniformly distributed. We discuss applications and implementation issues of our codes.
Submitted 7 October, 2012;
originally announced October 2012.
-
Effects of MAC Approaches on Non-Monotonic Saturation with COPE - A Simple Case Study
Authors:
Jason Cloud,
Linda Zeger,
Muriel Médard
Abstract:
We construct a simple network model to provide insight into network design strategies. We show that the model can be used to address various approaches to network coding, MAC, and multi-packet reception so that their effects on network throughput can be evaluated. We consider several topology components which exhibit the same non-monotonic saturation behavior found within the Katti et al. COPE experiments. We further show that fairness allocation by the MAC can seriously impact performance and cause this non-monotonic saturation. Using our model, we develop a MAC that provides monotonic saturation, higher saturation throughput gains, and fairness among flows rather than nodes. The proposed model provides an estimate of the achievable gains for the cross-layer design of network coding, multi-packet reception, and MAC, showing that super-additive throughput gains on the order of six times that of routing are possible.
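The COPE experiments referred to above exploit opportunistic XOR coding at relays. The sketch below shows the textbook two-way relay case, in which two end nodes exchange packets through a relay: with routing the relay forwards each packet separately (4 transmissions in total), while with coding it broadcasts a single XOR of the two packets and each end node cancels its own packet (3 transmissions). Equal-length packets, a loss-free channel, and the function names are assumptions of this sketch.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("this toy example assumes equal-length packets")
    return bytes(x ^ y for x, y in zip(a, b))


def two_way_exchange(pkt_a: bytes, pkt_b: bytes, network_coding: bool):
    """Return (packet delivered to A, packet delivered to B, transmissions used)."""
    transmissions = 2  # A -> relay and B -> relay
    if network_coding:
        coded = xor_bytes(pkt_a, pkt_b)  # relay broadcasts one XOR-ed packet
        transmissions += 1
        to_a = xor_bytes(coded, pkt_a)  # A cancels its own packet to recover B's
        to_b = xor_bytes(coded, pkt_b)  # B cancels its own packet to recover A's
    else:
        to_a, to_b = pkt_b, pkt_a  # relay forwards each packet separately
        transmissions += 2
    return to_a, to_b, transmissions
```

Saving one relay transmission per exchanged pair is the source of the coding gain in this topology; how much of that gain is realized depends on how the MAC allocates the channel.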
Submitted 11 August, 2011;
originally announced August 2011.
-
MAC Centered Cooperation - Synergistic Design of Network Coding, Multi-Packet Reception, and Improved Fairness to Increase Network Throughput
Authors:
Jason Cloud,
Linda Zeger,
Muriel Médard
Abstract:
We design a cross-layer approach to aid in developing a cooperative solution using multi-packet reception (MPR), network coding (NC), and medium access control (MAC). We construct a model for the behavior of the IEEE 802.11 MAC protocol and apply it to key small canonical topology components and their larger counterparts. The results obtained from this model closely match the available experimental results. Using this model, we show that fairness allocation by the IEEE 802.11 MAC can significantly impede performance; hence, we devise a new MAC that not only substantially improves throughput, but also provides fairness to flows of information rather than to nodes. We show that cooperation between NC, MPR, and our new MAC achieves super-additive gains of up to 6.3 times that of routing with the standard IEEE 802.11 MAC. Furthermore, we extend the model to analyze our MAC's asymptotic and throughput behaviors as the number of nodes increases or the MPR capability is limited to only a single node. Finally, we show that although network performance is reduced under substantial asymmetry or limited implementation of MPR to a central node, there are some important practical cases, even under these conditions, where MPR, NC, and their combination provide significant gains.
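A back-of-the-envelope sketch of why node-level fairness can hurt, using a two-way relay topology (two end nodes exchanging packets through one relay). It assumes one shared channel carrying one transmission per slot, saturated sources, and no losses; these assumptions, and the omission of MPR, make it a toy model rather than the paper's IEEE 802.11 model. A node-fair MAC gives each of the three nodes a third of the slots, leaving the relay as the bottleneck, while the flow-fair allocation assumed here gives each node slots in proportion to the transmissions it must make per exchanged pair.

```python
from fractions import Fraction


def two_way_relay_throughput(network_coding: bool, flow_fair: bool) -> Fraction:
    """Total end-to-end throughput (packets per slot) in the toy two-way relay model."""
    # With XOR coding the relay needs one broadcast per exchanged pair, else two forwards.
    relay_tx_per_pair = 1 if network_coding else 2
    if flow_fair:
        # Slots split in proportion to each node's work: 1 + 1 + relay transmissions.
        pairs_per_slot = Fraction(1, 2 + relay_tx_per_pair)
    else:
        # Node-fair: every node gets 1/3 of the slots; the relay limits the pair rate.
        pairs_per_slot = min(Fraction(1, 3), Fraction(1, 3) / relay_tx_per_pair)
    return 2 * pairs_per_slot  # each exchanged pair delivers one packet per direction
```

Under these assumptions routing yields 1/3 packet per slot with node fairness and 1/2 with flow fairness, while XOR coding at the relay yields 2/3 under either allocation.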
Submitted 19 July, 2011;
originally announced July 2011.
-
Speeding Multicast by Acknowledgment Reduction Technique (SMART)
Authors:
Arman Rezaee,
Linda Zeger,
Muriel Médard
Abstract:
We present a novel feedback protocol for wireless broadcast networks that utilize linear network coding. We consider transmission of packets from one source to many receivers over a single-hop broadcast erasure channel. Our method utilizes a predictive model to request feedback only when the probability that all receivers have completed decoding is significant. In addition, our proposed NACK-based feedback mechanism enables all receivers to request, within a single time slot, the number of retransmissions needed for successful decoding. We present simulation results as well as analytical results that show the favorable scalability of our technique as the number of receivers, file size, and packet erasure probability increase. We also show the robustness of this scheme to uncertainty in the predictive model, including uncertainty in the number of receiving nodes and the packet erasure probability, as well as to losses of the feedback itself. Our scheme, SMART, is shown to perform nearly as well as an omniscient transmitter that requires no feedback. Furthermore, SMART is shown to outperform current state-of-the-art methods at any given erasure probability, file size, and number of receivers.
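One way to read "request feedback only when the probability that all receivers have completed decoding is significant" is sketched below. It assumes a source sending `k` packets with random linear network coding over a field large enough that every received coded packet is innovative, independent erasures with probability `erasure` at each receiver, and a hypothetical decision rule that waits until that probability reaches `threshold`. This illustrates the predictive idea only; it does not reproduce SMART's actual predictive model or its NACK mechanism.

```python
from math import comb


def prob_receiver_decoded(n, k, erasure):
    """P(a receiver holds >= k of n coded packets), assuming i.i.d. erasures
    and that any k received coded packets suffice to decode."""
    q = 1.0 - erasure
    return sum(comb(n, i) * q**i * erasure ** (n - i) for i in range(k, n + 1))


def prob_all_decoded(n, k, erasure, receivers):
    # Receivers are assumed to experience independent erasures.
    return prob_receiver_decoded(n, k, erasure) ** receivers


def first_feedback_slot(k, erasure, receivers, threshold=0.5):
    """Smallest number of coded transmissions n >= k at which the predicted
    probability that every receiver has decoded reaches `threshold`."""
    if not (0 <= erasure < 1 and 0 < threshold < 1):
        raise ValueError("need 0 <= erasure < 1 and 0 < threshold < 1")
    n = k
    while prob_all_decoded(n, k, erasure, receivers) < threshold:
        n += 1
    return n
```

Before that slot, feedback would mostly report that some receiver is still missing packets, so skipping it saves feedback traffic; raising `threshold` delays the first request further.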
Submitted 9 September, 2011; v1 submitted 14 April, 2011;
originally announced April 2011.
-
Co-Designing Multi-Packet Reception, Network Coding, and MAC Using a Simple Predictive Model
Authors:
Jason Cloud,
Linda Zeger,
Muriel Médard
Abstract:
We design a cross-layer approach to optimize the joint use of multi-packet reception and network coding, in order to relieve congestion. We construct a model for the behavior of the 802.11 MAC and apply it to several key canonical topology components and their extensions to any number of nodes. The results obtained from this model closely match the available experimental results, which cover routing and opportunistic network coding. Using this model, we show that fairness allocation by the MAC can seriously impact performance; hence, we devise a new MAC that not only substantially improves throughput relative to the current 802.11 MAC, but also provides fairness to flows of information rather than to nodes. We show that the proper combination of network coding, multi-packet reception, and our new MAC protocol achieves super-additive throughput gains of up to 6.3 times that of routing alone with the use of the standard 802.11 MAC. Finally, we extend the model to analyze the asymptotic behavior of our new MAC as the number of nodes increases.
Submitted 30 January, 2011;
originally announced January 2011.
-
Theoretical Study of Cubic Structures Based on Fullerene Carbon Clusters: C$_{28}$C and (C$_{28}$)$_{2}$
Authors:
Linda M. Zeger,
Yu-Min Juan,
Efthimios Kaxiras,
A. Antonelli
Abstract:
We study a new hypothetical form of solid carbon, C$_{28}$C, whose unit cell is composed of the C$_{28}$ fullerene cluster and an additional single carbon atom arranged in the zincblende structure. Using {\it ab initio} calculations, we show that this new form of solid carbon has lower energy than hyperdiamond, the recently proposed form composed of C$_{28}$ units in the diamond structure. To understand the bonding character of these cluster-based solids, we analyze the electronic structure of C$_{28}$C and of hyperdiamond and compare them to the electronic states of crystalline cubic diamond.
Submitted 13 February, 1995;
originally announced February 1995.