Showing 1–18 of 18 results for author: Williams, M R

Searching in archive stat.
  1. arXiv:2503.21528

    stat.ML cs.CR cs.LG

    Bayesian Pseudo Posterior Mechanism for Differentially Private Machine Learning

    Authors: Robert Chew, Matthew R. Williams, Elan A. Segarra, Alexander J. Preiss, Amanda Konet, Terrance D. Savitsky

    Abstract: Differential privacy (DP) is becoming increasingly important for deployed machine learning applications because it provides strong guarantees for protecting the privacy of individuals whose data is used to train models. However, DP mechanisms commonly used in machine learning tend to struggle on many real-world distributions, including highly imbalanced or small labeled training sets. In this work…

    Submitted 27 March, 2025; originally announced March 2025.

  2. arXiv:2503.07240

    stat.ME stat.AP

    Representative dietary behavior patterns and associations with cardiometabolic outcomes in Puerto Rico using a Bayesian latent class analysis for non-probability samples

    Authors: Stephanie M. Wu, Abrania Marrero, Matthew R. Williams, Terrance D. Savitsky, Josiemer Mattei, José Rodríguez-Orengo, Briana J. K. Stephenson

    Abstract: There is limited understanding of how dietary behaviors cluster together and influence cardiometabolic health at a population level in Puerto Rico. Data availability is scarce, particularly outside of urban areas, and is often limited to non-probability sample (NPS) data where sample inclusion mechanisms are unknown. In order to generalize results to the broader Puerto Rican population, adjustment…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 40 pages, 7 tables, 14 figures

  3. arXiv:2502.09524

    stat.ME

    Thresholding Nonprobability Units in Combined Data for Efficient Domain Estimation

    Authors: Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky

    Abstract: Quasi-randomization approaches estimate latent participation probabilities for units from a nonprobability / convenience sample. Estimation of participation probabilities for convenience units allows their combination with units from the randomized survey sample to form a survey weighted domain estimate. One leverages convenience units for domain estimation under the expectation that estimation pr…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 20 pages, 3 figures

  4. arXiv:2310.01575

    stat.ME stat.AP

    Derivation of outcome-dependent dietary patterns for low-income women obtained from survey data using a Supervised Weighted Overfitted Latent Class Analysis

    Authors: Stephanie M. Wu, Matthew R. Williams, Terrance D. Savitsky, Briana J. K. Stephenson

    Abstract: Poor diet quality is a key modifiable risk factor for hypertension and disproportionately impacts low-income women. Analyzing diet-driven hypertensive outcomes in this demographic is challenging due to the complexity of dietary data and selection bias when the data come from surveys, a main data source for understanding diet-disease relationships in understudied populations. Supervised Bayesia…

    Submitted 28 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 16 pages, 8 tables, 7 figures

  5. arXiv:2308.06845

    stat.CO stat.AP

    csSampling: An R Package for Bayesian Models for Complex Survey Data

    Authors: Ryan Hornby, Matthew R. Williams, Terrance D. Savitsky, Mahmoud Elkasabi

    Abstract: We present csSampling, an R package for estimation of Bayesian models for data collected from complex survey samples. csSampling combines functionality from the probabilistic programming language Stan (via the rstan and brms R packages) and the handling of complex survey data from the survey R package. Under this approach, the user creates a survey-weighted model in brms or provides a custom weigh…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: 22 pages, 5 figures

  6. arXiv:2208.14541

    stat.ME stat.CO

    Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps

    Authors: Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky, Nels G. Johnson

    Abstract: Nonprobability (convenience) samples are increasingly sought to reduce the estimation variance for one or more population variables of interest that are estimated using a randomized survey (reference) sample by increasing the effective sample size. Estimation of a population quantity derived from a convenience sample will typically result in bias since the distribution of variables of interest in…

    Submitted 9 June, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 37 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:2204.02271

  7. arXiv:2205.05003

    stat.ME

    Mechanisms for Global Differential Privacy under Bayesian Data Synthesis

    Authors: Jingchen Hu, Matthew R. Williams, Terrance D. Savitsky

    Abstract: This paper introduces a new method that embeds any Bayesian model used to generate synthetic data and converts it into a differentially private (DP) mechanism. We propose an alteration of the model synthesizer to utilize a censored likelihood that induces upper and lower bounds of $[\exp(-\epsilon/2), \exp(\epsilon/2)]$, where $\epsilon$ denotes the level of the DP guarantee. This censoring mechanism equipped with a…

    Submitted 3 August, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

  8. arXiv:2204.02271

    stat.ME

    Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps

    Authors: Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky, Nels G. Johnson

    Abstract: Nonprobability (convenience) samples are increasingly sought to stabilize estimations for one or more population variables of interest that are performed using a randomized survey (reference) sample by increasing the effective sample size. Estimation of a population quantity derived from a convenience sample will typically result in bias since the distribution of variables of interest in the conve…

    Submitted 9 June, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Duplication with arXiv:2208.14541

  9. arXiv:2101.06188

    stat.ME stat.AP

    Private Tabular Survey Data Products through Synthetic Microdata Generation

    Authors: Jingchen Hu, Terrance D. Savitsky, Matthew R. Williams

    Abstract: We propose two synthetic microdata approaches to generate private tabular survey data products for public release. We adapt a pseudo posterior mechanism that downweights by-record likelihood contributions with weights $\in [0,1]$ based on their identification disclosure risks to producing tabular products for survey data. Our method applied to an observed survey database achieves an asymptotic glo…

    Submitted 3 March, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

  10. arXiv:2006.01230

    stat.ME stat.AP

    Re-weighting of Vector-weighted Mechanisms for Utility Maximization under Differential Privacy

    Authors: Terrance D. Savitsky, Jingchen Hu, Matthew R. Williams

    Abstract: We address practical implementation of a risk-weighted pseudo posterior synthesizer for microdata dissemination with a new re-weighting strategy that maximizes the utility of released synthetic data at any level of formal privacy guarantee. Our re-weighting strategy applies to any vector-weighted pseudo posterior mechanism under which a vector of observation-indexed weights is used to downweigh…

    Submitted 28 April, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

  11. arXiv:2004.06191

    stat.ME

    Pseudo Bayesian Estimation of One-way ANOVA Model in Complex Surveys

    Authors: Terrance D. Savitsky, Matthew R. Williams, Sanvesh Srivastava

    Abstract: We devise survey-weighted pseudo posterior distribution estimators under two-stage informative sampling of both primary clusters and secondary nested units for a one-way analysis of variance (ANOVA) population generating model as a simple canonical case where population model random effects are defined to be coincident with the primary clusters, for example student performance based on a survey of…

    Submitted 12 May, 2023; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: 45 pages, 12 figures

    MSC Class: 62D05; 62F15; 62J05

  12. arXiv:1909.11796

    stat.ME

    Bayesian Pseudo Posterior Mechanism under Asymptotic Differential Privacy

    Authors: Terrance D. Savitsky, Matthew R. Williams, Jingchen Hu

    Abstract: We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic databases equipped with an $(\epsilon, \delta)$-probabilistic differential privacy (pDP) guarantee, where $\delta$ denotes the probability that any observed database exceeds $\epsilon$. The pseudo posterior mechanism employs a data record-indexed, risk-based weight vector with weight values $\in [0, 1]$ that surgically downweight the like…

    Submitted 13 August, 2021; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: 35 pages, 7 figures, 2 tables

  13. arXiv:1908.07639

    stat.ME stat.AP

    Risk-Efficient Bayesian Data Synthesis for Privacy Protection

    Authors: Jingchen Hu, Terrance D. Savitsky, Matthew R. Williams

    Abstract: Statistical agencies utilize models to synthesize respondent-level data for release to the public for privacy protection. In this work, we efficiently induce privacy protection into any Bayesian synthesis model by employing a pseudo likelihood that exponentiates each likelihood contribution by an observation record-indexed weight in [0, 1], defined to be inversely proportional to the identificatio…

    Submitted 8 February, 2021; v1 submitted 20 August, 2019; originally announced August 2019.

    Journal ref: Journal of Survey Statistics and Methodology, 2021

  14. arXiv:1904.07680

    stat.ME

    Pseudo Bayesian Mixed Models under Informative Sampling

    Authors: Terrance D. Savitsky, Matthew R. Williams

    Abstract: When random effects are correlated with sample design variables, the usual approach of employing individual survey weights (constructed to be inversely proportional to the unit survey inclusion probabilities) to form a pseudo-likelihood no longer produces asymptotically unbiased inference. We construct a weight-exponentiated formulation for the random effects distribution that achieves unbiased in…

    Submitted 24 August, 2021; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: 31 pages, 6 figures, 2 tables

    MSC Class: 62F15; 62D05

  15. arXiv:1901.03791

    stat.CO stat.ME

    Optimization of Survey Weights under a Large Number of Conflicting Constraints

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: In the analysis of survey data, sampling weights are needed for consistent estimation of the population. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to increase efficiency by incorporating auxiliary population information, and to reduce the variability in estimates due to extreme weights. It is often the case t…

    Submitted 11 January, 2019; originally announced January 2019.

    Comments: 23 pages, 2 figures, 3 tables

  16. Bayesian Uncertainty Estimation Under Complex Sampling

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: Social and economic studies are often implemented as complex survey designs. For example, multistage, unequal probability sampling designs utilized by federal statistical agencies are typically constructed to maximize the efficiency of the target domain level estimator (e.g., indexed by geographic area) within cost constraints for survey administration. Such designs may induce dependence between t…

    Submitted 29 July, 2019; v1 submitted 31 July, 2018; originally announced July 2018.

    Comments: 45 pages, 4 figures, 1 table

    MSC Class: 62D05; 62F15; 62F12

    Journal ref: International Statistical Review 2020

  17. Bayesian Estimation Under Informative Sampling with Unattenuated Dependence

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: An informative sampling design leads to unit inclusion probabilities that are correlated with the response variable of interest. However, multistage sampling designs may also induce higher order dependencies, which are typically ignored in the literature when establishing consistency of estimators for survey data under a condition requiring asymptotic independence among the unit inclusion probabil…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: 35 pages, 5 figures. arXiv admin note: text overlap with arXiv:1710.10102

    Journal ref: Bayesian Anal., advance publication, 4 January 2019

  18. Bayesian Pairwise Estimation Under Dependent Informative Sampling

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: An informative sampling design leads to the selection of units whose inclusion probabilities are correlated with the response variable of interest. Model inference performed on the resulting observed sample will be biased for the population generative model. One approach that produces asymptotically unbiased inference employs marginal inclusion probabilities to form sampling weights used to expone…

    Submitted 27 October, 2017; originally announced October 2017.

    Comments: 35 pages, 9 figures

    MSC Class: 62D05; 62G20

    Journal ref: Electron. J. Statist. Volume 12, Number 1 (2018), 1631-1661
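
Several abstracts in this listing (entries 9, 10, 12 and 13 on privacy; entries 14 and 18 on informative sampling) describe a pseudo likelihood in which each record's likelihood contribution is exponentiated by a record-indexed weight: weights in [0, 1] that downweight records by disclosure risk, or survey weights constructed to be inversely proportional to inclusion probabilities. The Python sketch below illustrates only that general weighting step on the log scale, assuming a univariate normal model and user-supplied weights. The model, the function names `normal_logpdf` and `weighted_pseudo_loglik`, and the example values are hypothetical; they are not taken from any of the papers or from the csSampling package.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of one observation under a Normal(mu, sigma^2) model (illustrative model choice)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def weighted_pseudo_loglik(ys, weights, mu, sigma):
    """Weight-exponentiated pseudo log-likelihood (hypothetical helper).

    Raising record i's likelihood contribution to the power w_i is the same
    as multiplying its log contribution by w_i, so the pseudo log-likelihood
    is a weighted sum of per-record log densities.
    """
    if len(ys) != len(weights):
        raise ValueError("need exactly one weight per record")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * normal_logpdf(y, mu, sigma) for y, w in zip(ys, weights))


# Example: the second record is partially downweighted (weight 0.5).
print(weighted_pseudo_loglik([0.0, 1.0], [1.0, 0.5], mu=0.0, sigma=1.0))
```

Setting every weight to 1 recovers the ordinary log-likelihood, and a weight of 0 removes a record's contribution entirely. How the weights are actually chosen (from disclosure risks or from the sampling design) is specified in the individual papers and is not modeled here.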