
Showing 1–17 of 17 results for author: Jackson, J

  1. arXiv:2408.02513  [pdf, other]

    stat.ME

    The appeal of the gamma family distribution to protect the confidentiality of contingency tables

    Authors: James Jackson, Robin Mitra, Brian Francis, Iain Dove

    Abstract: Administrative databases, such as the English School Census (ESC), are rich sources of information that are potentially useful for researchers. For such data sources to be made available, however, strict guarantees of privacy would be required. To achieve this, synthetic data methods can be used. Such methods, when protecting the confidentiality of tabular data (contingency tables), often utilise…

    Submitted 5 August, 2024; originally announced August 2024.

  2. arXiv:2407.18572  [pdf, other]

    stat.AP math.ST stat.OT

    Bernoulli amputation

    Authors: Marius Hofert, James Jackson, Niels Hagenbuch

    Abstract: An approach to amputation, the process of introducing missing values to a complete dataset, is presented. It allows to construct missingness indicators in a flexible and principled way via copulas and Bernoulli margins and to incorporate dependence in missingness patterns. Besides more classical missingness models such as missing completely at random, missing at random, and missing not at random,…

    Submitted 26 July, 2024; originally announced July 2024.

    MSC Class: 62D10; 62H99; 65C60

  3. arXiv:2407.04970  [pdf, other]

    cs.LG stat.ML

    Idiographic Personality Gaussian Process for Psychological Assessment

    Authors: Yehu Chen, Muchen Xi, Jacob Montgomery, Joshua Jackson, Roman Garnett

    Abstract: We develop a novel measurement framework based on a Gaussian process coregionalization model to address a long-lasting debate in psychometrics: whether psychological features like personality share a common structure across the population, vary uniquely for individuals, or some combination. We propose the idiographic personality Gaussian process (IPGP) framework, an intermediate model that accommo…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  4. arXiv:2407.00417  [pdf, other]

    cs.CR stat.ME

    Obtaining $(ε,δ)$-differential privacy guarantees when using a Poisson mechanism to synthesize contingency tables

    Authors: James Jackson, Robin Mitra, Brian Francis, Iain Dove

    Abstract: We show that differential privacy type guarantees can be obtained when using a Poisson synthesis mechanism to protect counts in contingency tables. Specifically, we show how to obtain $(ε, δ)$-probabilistic differential privacy guarantees via the Poisson distribution's cumulative distribution function. We demonstrate this empirically with the synthesis of an administrative-type confidential databa…

    Submitted 29 June, 2024; originally announced July 2024.

  5. arXiv:2307.02650  [pdf, other]

    stat.ME stat.AP stat.ML

    A Complete Characterisation of Structured Missingness

    Authors: James Jackson, Robin Mitra, Niels Hagenbuch, Sarah McGough, Chris Harbron

    Abstract: Our capacity to process large complex data sources is ever-increasing, providing us with new, important applied research questions to address, such as how to handle missing values in large-scale databases. Mitra et al. (2023) noted the phenomenon of Structured Missingness (SM), which is where missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically…

    Submitted 5 July, 2023; originally announced July 2023.

  6. arXiv:2207.00530  [pdf]

    stat.ME

    The Target Study: A Conceptual Model and Framework for Measuring Disparity

    Authors: John W. Jackson, Yea-Jen Hsu, Raquel C. Greer, Romsai T. Boonyasai, Chanelle J. Howe

    Abstract: We present a conceptual model to measure disparity--the target study--where social groups may be similarly situated (i.e., balanced) on allowable covariates. Our model, based on a sampling design, does not intervene to assign social group membership or alter allowable covariates. To address non-random sample selection, we extend our model to generalize or transport disparity or to assess disparity…

    Submitted 11 January, 2025; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: Link to sample code added. Minor clarification of Design 4c. Forthcoming in Sociological Methods & Research

  7. arXiv:2205.05993  [pdf, other]

    stat.ME

    On integrating the number of synthetic data sets $m$ into the 'a priori' synthesis approach

    Authors: James Edward Jackson, Robin Mitra, Brian Joseph Francis, Iain Dove

    Abstract: Until recently, multiple synthetic data sets were always released to analysts, to allow valid inferences to be obtained. However, under certain conditions - including when saturated count models are used to synthesize categorical data - single imputation ($m=1$) is sufficient. Nevertheless, increasing $m$ causes utility to improve, but at the expense of higher risk, an example of the risk-utility…

    Submitted 12 May, 2022; originally announced May 2022.

  8. arXiv:2107.08062  [pdf, other]

    stat.ME

    Using saturated count models for user-friendly synthesis of categorical data

    Authors: James Edward Jackson, Robin Mitra, Brian Joseph Francis, Iain Dove

    Abstract: Over the past three decades, synthetic data methods for statistical disclosure control have continually evolved, but mainly within the domain of survey data sets. There are certain characteristics of administrative databases, such as their size, which present challenges from a synthesis perspective and require special attention. This paper, through the fitting of saturated count models, presents a…

    Submitted 12 May, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 37 pages, 6 figures

  9. Increasing the efficiency of Sequential Monte Carlo samplers through the use of approximately optimal L-kernels

    Authors: Peter L Green, Robert E Moore, Ryan J Jackson, Jinglai Li, Simon Maskell

    Abstract: By facilitating the generation of samples from arbitrary probability distributions, Markov Chain Monte Carlo (MCMC) is, arguably, \emph{the} tool for the evaluation of Bayesian inference problems that yield non-standard posterior distributions. In recent years, however, it has become apparent that Sequential Monte Carlo (SMC) samplers have the potential to outperform MCMC in a number of ways. SMC…

    Submitted 24 April, 2020; originally announced April 2020.

    Comments: 29 pages, 14 figures

  10. arXiv:2002.03387  [pdf]

    cs.HC cs.CY cs.LG stat.ML

    Data Vision: Learning to See Through Algorithmic Abstraction

    Authors: Samir Passi, Steven J. Jackson

    Abstract: Learning to see through data is central to contemporary forms of algorithmic knowledge production. While often represented as a mechanical application of rules, making algorithms work with data requires a great deal of situated work. This paper examines how the often-divergent demands of mechanization and discretion manifest in data analytic learning environments. Drawing on research in CSCW and t…

    Submitted 9 February, 2020; originally announced February 2020.

    Journal ref: In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, New York, NY, USA, 2436-2447

  11. arXiv:1909.10060  [pdf]

    stat.ME econ.EM

    Meaningful causal decompositions in health equity research: definition, identification, and estimation through a weighting framework

    Authors: John W. Jackson

    Abstract: Causal decomposition analyses can help build the evidence base for interventions that address health disparities (inequities). They ask how disparities in outcomes may change under hypothetical intervention. Through study design and assumptions, they can rule out alternate explanations such as confounding, selection-bias, and measurement error, thereby identifying potential targets for interventio…

    Submitted 15 September, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

    Comments: 39 pages, 1 Table and 1 Figure in the main text, 1 Figure in the Supplement. Notational system changed for clarity

  12. arXiv:1906.07552  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

    Authors: Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

    Abstract: Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available. Both individual sources and mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synt…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: 7 pages. Accepted by IJCAI 2019

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 2747-2753

  13. arXiv:1906.00431  [pdf, other]

    cs.LG cs.AI stat.ML

    An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

    Authors: Xingyou Song, Yilun Du, Jacob Jackson

    Abstract: Recent results in Reinforcement Learning (RL) have shown that agents with limited training environments are susceptible to a large amount of overfitting across many domains. A key challenge for RL generalization is to quantitatively explain the effects of changing parameters on testing performance. Such parameters include architecture, regularization, and RL-dependent variables such as discount fa…

    Submitted 2 June, 2019; originally announced June 2019.

    Comments: Published in ICML 2019 Workshop "Understanding and Improving Generalization in Deep Learning"

  14. arXiv:1902.02336  [pdf, other]

    cs.LG stat.ML

    Semi-Supervised Learning by Label Gradient Alignment

    Authors: Jacob Jackson, John Schulman

    Abstract: We present label gradient alignment, a novel algorithm for semi-supervised learning which imputes labels for the unlabeled data and trains on the imputed labels. We define a semantically meaningful distance metric on the input space by mapping a point (x, y) to the gradient of the model at (x, y). We then formulate an optimization problem whose objective is to minimize the distance between the lab…

    Submitted 6 February, 2019; originally announced February 2019.

  15. arXiv:1812.03940  [pdf]

    stat.OT

    Rapid Prototyping Model for Healthcare Alternative Payment Models: Replicating the Federally Qualified Health Center Advanced Primary Care Practice Demonstration

    Authors: Jarrod Olson, Amir Rahimi, Po Hsu Allen Chen, J. Elizabeth Jackson, Tyler Coy, Adrienne Cocci, Nancy McMillan, Jeff Geppert

    Abstract: Innovation in healthcare payment and service delivery utilizes high cost, high risk pilots paired with traditional program evaluations. Decision-makers are unable to reliably forecast the impacts of pilot interventions in this complex system, complicating the feasibility assessment of proposed healthcare models. We developed and validated a Discrete Event Simulation (DES) model of primary care for…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: Working paper

    MSC Class: 91C-02

  16. arXiv:1703.05899  [pdf]

    stat.ME

    Decomposition analysis to identify intervention targets for reducing disparities

    Authors: John W. Jackson, Tyler J. VanderWeele

    Abstract: There has been considerable interest in using decomposition methods in epidemiology (mediation analysis) and economics (Oaxaca-Blinder decomposition) to understand how health disparities arise and how they might change upon intervention. It has not been clear when estimates from the Oaxaca-Blinder decomposition can be interpreted causally because its implementation does not explicitly address pote…

    Submitted 17 March, 2017; originally announced March 2017.

    Comments: John Jackson is Assistant Professor in the Departments of Epidemiology and Mental Health at the Johns Hopkins Bloomberg School of Public Health and Tyler VanderWeele is Professor in the Departments of Epidemiology and Biostatistics at the Harvard T.H. Chan School of Public Health. Correspondence to [email protected]

  17. A two-state mixed hidden Markov model for risky teenage driving behavior

    Authors: John C. Jackson, Paul S. Albert, Zhiwei Zhang

    Abstract: This paper proposes a joint model for longitudinal binary and count outcomes. We apply the model to a unique longitudinal study of teen driving where risky driving behavior and the occurrence of crashes or near crashes are measured prospectively over the first 18 months of licensure. Of scientific interest is relating the two processes and predicting crash and near crash outcomes. We propose a two…

    Submitted 16 September, 2015; originally announced September 2015.

    Comments: Published at http://dx.doi.org/10.1214/14-AOAS765 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS765

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 849-865