Search | arXiv e-print repository

Slowly Scaling Per-Record Differential Privacy

Authors: Brian Finley, Anthony M Caruso, Justin C Doty, Ashwin Machanavajjhala, Mikaela R Meyer, David Pujol, William Sexton, Zachary Terner

Abstract: We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records' influence on the statistics being released. Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distribution changes little with the addition or deletion of a single record in the underlying dataset, an attacker looking at this statistic will find it plausible that any particular record was present or absent, preserving the records' privacy. More influential records -- those whose addition or deletion would change the statistics' distribution more -- typically suffer greater privacy loss. The per-record differential privacy framework quantifies these record-specific privacy guarantees, but existing mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records' influence varies widely, as is common in economic data. We develop mechanisms with privacy guarantees that instead degrade as slowly as logarithmically with influence. These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records.
As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.

Submitted 2 May, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: This version fixes a mistaken variance formula in the first column of Table 3 and updates Figure 1 to use this variance formula

arXiv:2406.08168 [pdf, ps, other]

Global Tests for Smoothed Functions in Mean Field Variational Additive Models

Authors: Mark J. Meyer, Junyi Wei

Abstract: Variational regression methods are an increasingly popular tool for their efficient estimation of complex. Given the mixed model representation of penalized effects, additive regression models with smoothed effects and scalar-on-function regression models can be fit relatively efficiently in a variational framework. However, inferential procedures for smoothed and functional effects in such a context are limited. We demonstrate that by using the Mean Field Variational Bayesian (MFVB) approximation to the additive model and the subsequent Coordinate Ascent Variational Inference (CAVI) algorithm, we can obtain a form of the estimated effects required of a Frequentist test for semiparametric curves. We establish MFVB approximations and CAVI algorithms for both Gaussian and binary additive models with an arbitrary number of smoothed and functional effects. We then derive a global testing framework for smoothed and functional effects. Our empirical study demonstrates that the test maintains good Frequentist properties in the variational framework and can be used to directly test results from a converged MFVB approximation and CAVI algorithm. We illustrate the applicability of this approach in a wide range of data illustrations.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2311.18053 [pdf, other]

On Non- and Weakly-Informative Priors for the Conway-Maxwell-Poisson (COM-Poisson) Distribution

Authors: Mark J. Meyer, Amia Graye, Kimberly F. Sellers

Abstract: Previous Bayesian evaluations of the Conway-Maxwell-Poisson (COM-Poisson) distribution have little discussion of non- and weakly-informative priors for the model. While only considering priors with such limited information restricts potential analyses, these priors serve as an important first step in the modeling process and are useful when performing sensitivity analyses. We develop and derive several weakly- and non-informative priors using both the established conjugate prior and Jeffreys' prior. Our evaluation of each prior involves an empirical study under varying dispersion types and sample sizes. In general, we find the weakly informative priors tend to perform better than the non-informative priors. We also consider several data examples for illustration and provide code for implementation of each resulting posterior.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.09961 [pdf, other]

Scan statistics for the detection of anomalies in M-dependent random fields with applications to image data

Authors: Claudia Kirch, Philipp Klein, Marco Meyer

Abstract: Anomaly detection in random fields is an important problem in many applications including the detection of cancerous cells in medicine, obstacles in autonomous driving and cracks in the construction material of buildings. Such anomalies are often visible as areas with different expected values compared to the background noise. Scan statistics based on local means have the potential to detect such local anomalies by enhancing relevant features. We derive limit theorems for a general class of such statistics over M-dependent random fields of arbitrary but fixed dimension. By allowing for a variety of combinations and contrasts of sample means over differently-shaped local windows, this yields a flexible class of scan statistics that can be tailored to the particular application of interest. The latter is demonstrated for crack detection in 2D-images of different types of concrete. Together with a simulation study this indicates the potential of the proposed methodology for the detection of anomalies in a variety of situations.

Submitted 16 November, 2023; originally announced November 2023.

MSC Class: 62G10; 62P30; 60F05

arXiv:2306.14761 [pdf, ps, other]

Doubly ranked tests of location for grouped functional data

Authors: Mark J. Meyer

Abstract: Nonparametric tests for functional data are a challenging class of tests to work with because of the potentially high dimensional nature of the data. One of the main challenges for considering rank-based tests, like the Mann-Whitney or Wilcoxon Rank Sum tests (MWW), is that the unit of observation is typically a curve. Thus any rank-based test must consider ways of ranking curves. While several procedures, including depth-based methods, have recently been used to create scores for rank-based tests, these scores are not constructed under the null and often introduce additional, uncontrolled for variability. We therefore reconsider the problem of rank-based tests for functional data and develop an alternative approach that incorporates the null hypothesis throughout. Our approach first ranks realizations from the curves at each measurement occurrence, then calculates a summary statistic for the ranks of each subject, and finally re-ranks the summary statistic in a procedure we refer to as a doubly ranked test. We propose two summaries for the middle step: a sufficient statistic and the average rank. As we demonstrate, doubly ranked tests are more powerful while maintaining ideal type I error in the two sample, MWW setting. We also extend our framework to more than two samples, developing a Kruskal-Wallis test for functional data which exhibits good test characteristics as well. Finally, we illustrate the use of doubly ranked tests in functional data contexts from material science, climatology, and public health policy.

Submitted 11 July, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2206.04012 [pdf, ps, other]

Model Selection in Variational Mixed Effects Models

Authors: Mark J. Meyer, Selina Carter, Elizabeth J. Malloy

Abstract: Variational inference is an alternative estimation technique for Bayesian models. Recent work shows that variational methods provide consistent estimation via efficient, deterministic algorithms. Other tools, such as model selection using variational AICs (VAIC), have been developed and studied for the linear regression case. While mixed effects models have enjoyed some study in the variational context, tools for model selection are lacking. One important feature of model selection in mixed effects models, particularly longitudinal models, is the selection of the random effects, which in turn determine the covariance structure for the repeatedly sampled outcome. To address this, we derive a VAIC specifically for variational mixed effects (VME) models. We also implement a parameter-efficient VME as part of our study which reduces any general random effects structure down to a single subject-specific score. This model accommodates a wide range of random effect structures including random intercept and slope models as well as random functional effects. Our VAIC can model and perform selection on a variety of VME models including more classic longitudinal models as well as longitudinal scalar-on-function regression. As we demonstrate empirically, our VAIC performs well in discriminating between correctly and incorrectly specified random effects structures. Finally, we illustrate the use of VAICs for VMEs on two datasets: a study of lead levels in children and a study of diffusion tensor imaging.

Submitted 31 July, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.13505 [pdf, other]

doi 10.1145/3531146.3533104

Flipping the Script on Criminal Justice Risk Assessment: An actuarial model for assessing the risk the federal sentencing system poses to defendants

Authors: Mikaela Meyer, Aaron Horowitz, Erica Marshall, Kristian Lum

Abstract: In the criminal justice system, algorithmic risk assessment instruments are used to predict the risk a defendant poses to society; examples include the risk of recidivating or the risk of failing to appear at future court dates. However, defendants are also at risk of harm from the criminal justice system. To date, there exists no risk assessment instrument that considers the risk the system poses to the individual. We develop a risk assessment instrument that "flips the script." Using data about U.S. federal sentencing decisions, we build a risk assessment instrument that predicts the likelihood an individual will receive an especially lengthy sentence given factors that should be legally irrelevant to the sentencing decision. To do this, we develop a two-stage modeling approach. Our first-stage model is used to determine which sentences were "especially lengthy." We then use a second-stage model to predict the defendant's risk of receiving a sentence that is flagged as especially lengthy given factors that should be legally irrelevant. The factors that should be legally irrelevant include, for example, race, court location, and other socio-demographic information about the defendant. Our instrument achieves comparable predictive accuracy to risk assessment instruments used in pretrial and parole contexts.
We discuss the limitations of our modeling approach and use the opportunity to highlight how traditional risk assessment instruments in various criminal justice settings also suffer from many of the same limitations and embedded value systems of their creators.

Submitted 13 July, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Conference on Fairness, Accountability, and Transparency (FAccT 2022)

arXiv:2108.04096 [pdf, ps, other]

doi 10.1007/s12561-023-09368-8

Bayesian Analysis of Multivariate Matched Proportions with Sparse Response

Authors: Mark J. Meyer, Haobo Cheng, Katherine Hobbs Knutson

Abstract: Multivariate matched proportions (MMP) data appears in a variety of contexts including post-market surveillance of adverse events in pharmaceuticals, disease classification, and agreement between care providers. It consists of multiple sets of paired binary measurements taken on the same subject. While recent work proposes non-Bayesian methods to address the complexities of MMP data, the issue of sparse response, where no or very few "yes" responses are recorded for one or more sets, is unaddressed. The presence of sparse response sets results in underestimates of variance, loss of coverage, and lowered power in existing methods. Bayesian methods have not previously been considered for MMP data but provide a useful framework when sparse responses are present. In particular, the Bayesian probit model provides an elegant solution to the problem of variance underestimation. We examine three approaches built on that model: a naive analysis with flat priors, a penalized analysis using half-Cauchy priors on the mean model variances, and a multivariate analysis with a Bayesian functional principal component analysis (FPCA) to model the latent covariance. We show that the multivariate analysis performs well on MMP data with sparse responses and outperforms existing non-Bayesian methods. In a re-analysis of data from a study of the system of care (SOC) framework for children with mental and behavioral disorders, we are able to provide a more complete picture of the relationships in the data.
Our analysis provides additional insights into the functioning of the SOC that a previous univariate analysis missed.

Submitted 7 November, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

arXiv:2106.05403 [pdf, other]

The Attraction Indian Buffet Distribution

Authors: Richard L. Warr, David B. Dahl, Jeremy M. Meyer, Arthur Lui

Abstract: We propose the attraction Indian buffet distribution (AIBD), a distribution for binary feature matrices influenced by pairwise similarity information. Binary feature matrices are used in Bayesian models to uncover latent variables (i.e., features) that explain observed data. The Indian buffet process (IBP) is a popular exchangeable prior distribution for latent feature matrices. In the presence of additional information, however, the exchangeability assumption is not reasonable or desirable. The AIBD can incorporate pairwise similarity information, yet it preserves many properties of the IBP, including the distribution of the total number of features. Thus, much of the interpretation and intuition that one has for the IBP directly carries over to the AIBD. A temperature parameter controls the degree to which the similarity information affects feature-sharing between observations. Unlike other nonexchangeable distributions for feature allocations, the probability mass function of the AIBD has a tractable normalizing constant, making posterior inference on hyperparameters straightforward using standard MCMC methods. A novel posterior sampling algorithm is proposed for the IBP and the AIBD. We demonstrate the feasibility of the AIBD as a prior distribution in feature allocation models and compare the performance of competing methods in simulations and an application.

Submitted 16 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2105.08859 [pdf, other]

Changes in Crime Rates During the COVID-19 Pandemic

Authors: Mikaela Meyer, Ahmed Hassafy, Gina Lewis, Prasun Shrestha, Amelia M. Haviland, Daniel S. Nagin

Abstract: We estimate changes in the rates of five FBI Part 1 crimes (homicide, auto theft, burglary, robbery, and larceny) during the COVID-19 pandemic from March through December 2020. Using publicly available weekly crime count data from 29 of the 70 largest cities in the U.S. from January 2018 through December 2020, three different linear regression model specifications are used to detect changes. One detects whether crime trends in four 2020 pre- and post-pandemic periods differ from those in 2018 and 2019. A second looks in more detail at the spring 2020 lockdowns to detect whether crime trends changed over successive biweekly periods into the lockdown. The third uses a city-level openness index that we created for the purpose of examining whether the degree of openness was associated with changing crime rates. For homicide and auto theft, we find significant increases during all or most of the pandemic. By contrast, we find significant declines in robbery and larceny during all or part of the pandemic and no significant changes in burglary over the course of the pandemic. Only larceny rates fluctuated with the degree of each city's lockdown. It is unusual for crime rates to move in different directions, and the reasons for the mixed findings for these five Part 1 Index crimes, one with no change, two with sustained increases, and two with sustained decreases, are not yet known.
We hypothesize that the reasons may be related to changes in opportunity, and the pandemic provides unique opportunities for future research to better understand the forces impacting crime rates. In the absence of a clear understanding of the mechanisms by which the pandemic affected crime, in the spirit of evidence-based crime policy, we caution against advancing policy at this time based on lessons learned from the pandemic "natural experiment."

Submitted 18 May, 2021; originally announced May 2021.

arXiv:2102.01943 [pdf, ps, other]

A Frequency Domain Bootstrap for General Multivariate Stationary Processes

Authors: Marco Meyer, Efstathios Paparoditis

Abstract: For many relevant statistics of multivariate time series, no valid frequency domain bootstrap procedures exist. This is mainly due to the fact that the distribution of such statistics depends on the fourth-order moment structure of the underlying process in nearly every scenario, except for some special cases like Gaussian time series. In contrast to the univariate case, even additional structural assumptions such as linearity of the multivariate process or a standardization of the statistic of interest do not solve the problem. This paper focuses on integrated periodogram statistics as well as functions thereof and presents a new frequency domain bootstrap procedure for multivariate time series, the multivariate frequency domain hybrid bootstrap (MFHB), to fill this gap. Asymptotic validity of the MFHB procedure is established for general classes of periodogram-based statistics and for stationary multivariate processes satisfying rather weak dependence conditions. A simulation study is carried out which compares the finite sample performance of the MFHB with that of the moving block bootstrap.

Submitted 3 February, 2021; originally announced February 2021.

Comments: 43 pages

MSC Class: 62M10; 62M15

arXiv:2002.00952 [pdf, other]

doi 10.1007/978-3-030-46640-4_10

Improved inter-scanner MS lesion segmentation by adversarial training on longitudinal data

Authors: Mattias Billast, Maria Ines Meyer, Diana M. Sima, David Robben

Abstract: The evaluation of white matter lesion progression is an important biomarker in the follow-up of MS patients and plays a crucial role when deciding the course of treatment. Current automated lesion segmentation algorithms are susceptible to variability in image characteristics related to MRI scanner or protocol differences. We propose a model that improves the consistency of MS lesion segmentations in inter-scanner studies. First, we train a CNN base model to approximate the performance of icobrain, an FDA-approved clinically available lesion segmentation software. A discriminator model is then trained to predict if two lesion segmentations are based on scans acquired using the same scanner type or not, achieving a 78% accuracy in this task. Finally, the base model and the discriminator are trained adversarially on multi-scanner longitudinal data to improve the inter-scanner consistency of the base model. The performance of the models is evaluated on an unseen dataset containing manual delineations. The inter-scanner variability is evaluated on test-retest data, where the adversarial network produces improved results over the base model and the FDA-approved solution.

Submitted 27 October, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: Added link to final authenticated publication (https://doi.org/10.1007/978-3-030-46640-4_10)

Journal ref: Crimi A., Bakas S. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2019. Lecture Notes in Computer Science, vol 11992. Springer, Cham

arXiv:1912.07359 [pdf, other]

Function-on-Function Regression for the Identification of Epigenetic Regions Exhibiting Windows of Susceptibility to Environmental Exposures

Authors: Michele Zemplenyi, Mark J. Meyer, Andres Cardenas, Marie-France Hivert, Sheryl L. Rifas-Shiman, Heike Gibson, Itai Kloog, Joel Schwartz, Emily Oken, Dawn L. DeMeo, Diane R. Gold, Brent A. Coull

Abstract: The ability to identify time periods when individuals are most susceptible to exposures, as well as the biological mechanisms through which these exposures act, is of great public health interest. Growing evidence supports an association between prenatal exposure to air pollution and epigenetic marks, such as DNA methylation, but the timing and gene-specific effects of these epigenetic changes are not well understood. Here, we present the first study that aims to identify prenatal windows of susceptibility to air pollution exposures in cord blood DNA methylation. In particular, we propose a function-on-function regression model that leverages data from nearby DNA methylation probes to identify epigenetic regions that exhibit windows of susceptibility to ambient particulate matter less than 2.5 microns (PM$_{2.5}$). By incorporating the covariance structure among both the multivariate DNA methylation outcome and the time-varying exposure under study, this framework yields greater power to detect windows of susceptibility and greater control of false discoveries than methods that model probes independently. We compare our method to a distributed lag model approach that models DNA methylation in a probe-by-probe manner, both in simulation and by application to motivating data from the Project Viva birth cohort. In two epigenetic regions selected based on prior studies of air pollution effects on epigenome-wide methylation, we identify windows of susceptibility to PM$_{2.5}$ exposure near the beginning and middle of the third trimester of pregnancy.

Submitted 13 December, 2019; originally announced December 2019.

Comments: 20 pages, 10 figures

arXiv:1911.04289 [pdf, other]

doi 10.1007/978-3-030-32695-1_9

Relevance Vector Machines for harmonization of MRI brain volumes using image descriptors

Authors: Maria Ines Meyer, Ezequiel de la Rosa, Koen Van Leemput, Diana M. Sima

Abstract: With the increased need for multi-center magnetic resonance imaging studies, problems arise related to differences in hardware and software between centers. Namely, current algorithms for brain volume quantification are unreliable for the longitudinal assessment of volume changes in this type of setting. Currently most methods attempt to mitigate this issue by regressing the scanner- and/or center-effects from the original data. In this work, we explore a novel approach to harmonize brain volume measurements by using only image descriptors. First, we explore the relationships between volumes and image descriptors. Then, we train a Relevance Vector Machine (RVM) model over a large multi-site dataset of healthy subjects to perform volume harmonization. Finally, we validate the method over two different datasets: i) a subset of unseen healthy controls; and ii) a test-retest dataset of multiple sclerosis (MS) patients. The method decreases scanner and center variability while preserving measurements that did not require correction in MS patient data. We show that image descriptors can be used as input to a machine learning algorithm to improve the reliability of longitudinal volumetric studies.

Submitted 8 November, 2019; originally announced November 2019.

Comments: 9 pages, 4 figures. Presented at the International Workshop on Machine Learning in Clinical Neuroimaging (MLCN) 2019

Journal ref: OR 2.0 Context-Aware Operating Theaters and Machine Learning in Clinical Neuroimaging. OR 2.0 2019, MLCN 2019. Lecture Notes in Computer Science, vol 11796. Springer, Cham

arXiv:1911.00535 [pdf, other]

doi 10.1080/26939169.2022.2063209

Think-aloud interviews: A tool for exploring student statistical reasoning

Authors: Alex Reinhart, Ciaran Evans, Amanda Luby, Josue Orellana, Mikaela Meyer, Jerzy Wieczorek, Peter Elliott, Philipp Burckhardt, Rebecca Nugent

Abstract: Think-aloud interviews have been a valuable but underused tool in statistics education research. Think-alouds, in which students narrate their reasoning in real time while solving problems, differ in important ways from other types of cognitive interviews and related education research methods. Beyond the uses already found in the statistics literature -- mostly validating the wording of statistical concept inventory questions and studying student misconceptions -- we suggest other possible use cases for think-alouds and summarize best-practice guidelines for designing think-aloud interview studies. Using examples from our own experiences studying the local student body for our introductory statistics courses, we illustrate how research goals should inform study-design decisions and what kinds of insights think-alouds can provide. We hope that our overview of think-alouds encourages more statistics educators and researchers to begin using this method.

Submitted 4 April, 2022; v1 submitted 1 November, 2019; originally announced November 2019.

Comments: 29 pages, 2 tables, 2 figures

Journal ref: Journal of Statistics and Data Science Education (2022), 30:2, 100-113

arXiv:1906.02269 [pdf, other]

doi 10.1007/s11222-020-09981-3

Bayesian Wavelet-packet Historical Functional Linear Models

Authors: Mark J. Meyer, Elizabeth J. Malloy, Brent A. Coull

Abstract: Historical Functional Linear Models (HFLM) quantify associations between a functional predictor and functional outcome where the predictor is an exposure variable that occurs before, or at least concurrently with, the outcome. Current work on the HFLM is largely limited to frequentist estimation techniques that employ spline-based basis representations. In this work, we propose a novel use of the discrete wavelet-packet transformation, which has not previously been used in functional models, to estimate historical relationships in a fully Bayesian model. Since inference has not been an emphasis of the existing work on HFLMs, we also employ two established Bayesian inference procedures in this historical functional setting. We investigate the operating characteristics of our wavelet-packet HFLM, as well as the two inference procedures, in simulation and use the model to analyze data on the impact of lagged exposure to particulate matter finer than 2.5$μ$g on heart rate variability in a cohort of journeyman boilermakers over the course of a day's shift.

Submitted 5 June, 2019; originally announced June 2019.

Comments: Submitted for publication in JCGS

arXiv:1901.07976 [pdf, other]

doi 10.1214/21-AOAS1513

Ordinal Probit Functional Outcome Regression with Application to Computer-Use Behavior in Rhesus Monkeys

Authors: Mark J. Meyer, Jeffrey S. Morris, Regina Paxton Gazes, Brent A. Coull

Abstract: Research in functional regression has made great strides in expanding to non-Gaussian functional outcomes, but exploration of ordinal functional outcomes remains limited. Motivated by a study of computer-use behavior in rhesus macaques (Macaca mulatta), we introduce the Ordinal Probit Functional Outcome Regression model (OPFOR). OPFOR models can be fit using one of several basis functions including penalized B-splines, wavelets, and O'Sullivan splines -- the last of which typically performs best. Simulation using a variety of underlying covariance patterns shows that the model performs reasonably well in estimation under multiple basis functions with near nominal coverage for joint credible intervals. Finally, in application, we use Bayesian model selection criteria adapted to functional outcome regression to best characterize the relation between several demographic factors of interest and the monkeys' computer use over the course of a year. In comparison with a standard ordinal longitudinal analysis, OPFOR outperforms a cumulative-link mixed-effects model in simulation and provides additional and more nuanced information on the nature of the monkeys' computer-use behavior.

Submitted 18 March, 2021; v1 submitted 23 January, 2019; originally announced January 2019.

arXiv:1812.07696 [pdf, other]

cgam: An R Package for the Constrained Generalized Additive Model

Authors: Xiyue Liao, Mary C. Meyer

Abstract: The cgam package contains routines to fit the generalized additive model where the components may be modeled with shape and smoothness assumptions. The main routine is cgam and nineteen symbolic routines are provided to indicate the relationship between the response and each predictor, which satisfies constraints such as monotonicity, convexity, their combinations, tree, and umbrella orderings. The user may specify constrained splines to fit the components for continuous predictors, and various types of orderings for the ordinal predictors. In addition, the user may specify parametrically modeled covariates. The set over which the likelihood is maximized is a polyhedral convex cone, and a least-squares solution is obtained by projecting the data vector onto the cone. For generalized models, the fit is obtained through iteratively re-weighted cone projections. The cone information criterion is provided and may be used to compare fits for combinations of variables and shapes. In addition, the routine wps implements monotone regression in two dimensions using warped-plane splines, without an additivity assumption. The graphical routine plotpersp will plot an estimated mean surface for a selected pair of predictors, given an object of either cgam or wps. This package is now available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=cgam.

Submitted 18 December, 2018; originally announced December 2018.

arXiv:1810.09409 [pdf, other]

Event-triggered Natural Hazard Monitoring with Convolutional Neural Networks on the Edge

Authors: Matthias Meyer, Timo Farei-Campagna, Akos Pasztor, Reto Da Forno, Tonio Gsell, Jérome Faillettaz, Andreas Vieli, Samuel Weber, Jan Beutel, Lothar Thiele

Abstract: In natural hazard warning systems, fast decision making is vital to avoid catastrophes. Decision making at the edge of a wireless sensor network promises fast response times but is limited by the availability of energy, data transfer speed, processing and memory constraints. In this work we present a realization of a wireless sensor network for hazard monitoring based on an array of event-triggered single-channel micro-seismic sensors with advanced signal processing and characterization capabilities based on a novel co-detection technique. On the one hand we leverage an ultra-low power, threshold-triggering circuit paired with on-demand digital signal acquisition capable of extracting relevant information exactly and efficiently at times when it matters most and consequently not wasting precious resources when nothing can be observed. On the other hand we utilize machine-learning-based classification implemented on low-power, off-the-shelf microcontrollers to avoid false positive warnings and to actively identify humans in hazard zones. The sensors' response time and memory requirement are substantially improved by quantizing and pipelining the inference of a convolutional neural network. In this way, convolutional neural networks that would not run unmodified on a memory constrained device can be executed in real-time and at scale on low-power embedded devices. A field study with our system has been running on the rockfall scarp of the Matterhorn Hörnligrat at 3500 m a.s.l. since 08/2018.

Submitted 1 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

arXiv:1806.06523 [pdf, other]

A Frequency Domain Bootstrap for General Stationary Processes

Authors: Marco Meyer, Efstathios Paparoditis, Jens-Peter Kreiss

Abstract: Existing frequency domain methods for bootstrapping time series have a limited range. Consider for instance the class of spectral mean statistics (also called integrated periodograms) which includes many important statistics in time series analysis, such as sample autocovariances and autocorrelations among other things. Essentially, such frequency domain bootstrap procedures cover the case of linear time series with independent innovations, and some even require the time series to be Gaussian. In this paper we propose a new, frequency domain bootstrap method which is consistent for a much wider range of stationary processes and can be applied to a large class of periodogram-based statistics. It introduces a new concept of convolved periodograms of smaller samples which uses pseudo periodograms of subsamples generated in a way that correctly imitates the weak dependence structure of the periodogram. We show consistency for this procedure for a general class of stationary time series, ranging clearly beyond linear processes, and for general spectral means and ratio statistics. Furthermore, and for the class of spectral means, we also show, how, using this new approach, existing bootstrap methods, which replicate appropriately only the dominant part of the distribution of interest, can be corrected. The finite sample performance of the new bootstrap procedure is illustrated via simulations.

Submitted 18 June, 2018; originally announced June 2018.

Comments: 38 pages, 4 figures

MSC Class: 62M10; 62M15

arXiv:1804.09285 [pdf, other]

Estimation and inference of domain means subject to shape constraints

Authors: Cristian Oliva-Aviles, Mary C. Meyer, Jean D. Opsomer

Abstract: Population domain means are frequently expected to respect shape or order constraints that arise naturally with survey data. For example, given a job category, mean salaries in big cities might be expected to be higher than those in small cities, but no order might be available to be imposed within big or small cities. A design-based estimator of domain means that imposes constraints on the most common survey estimators is proposed. Inequality restrictions that can be expressed with irreducible matrices are considered, as these cover a broad class of shapes and partial orderings. The constrained estimator is shown to be consistent and asymptotically normally distributed under mild conditions, given that the shape is a reasonable assumption for the population. Further, simulation experiments demonstrate that both estimation and variability of domain means are improved by the constrained estimator in comparison with usual unconstrained estimators, especially for small domains. An application of the proposed estimator to the 2015 U.S. National Survey of College Graduates is shown.

Submitted 24 April, 2018; originally announced April 2018.

arXiv:1711.04749 [pdf, other]

Checking validity of monotone domain mean estimators

Authors: Cristian Oliva-Aviles, Mary C. Meyer, Jean D. Opsomer

Abstract: Estimates of population characteristics such as domain means are often expected to follow monotonicity assumptions. Recently, a method to adaptively pool neighboring domains was proposed, which ensures that the resulting domain mean estimates follow monotone constraints. The method leads to asymptotically valid estimation and inference, and can lead to substantial improvements in efficiency, in comparison with unconstrained domain estimators. However, assuming incorrect shape constraints could lead to biased estimators. Here, we develop the Cone Information Criterion for Survey Data (CICs) as a diagnostic method to measure monotonicity departures on population domain means. We show that the criterion leads to a consistent methodology that makes an asymptotically correct decision choosing between unconstrained and constrained domain mean estimators.

Submitted 24 April, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

arXiv:1701.08974 [pdf, other]

Towards Adversarial Retinal Image Synthesis

Authors: Pedro Costa, Adrian Galdran, Maria Inês Meyer, Michael David Abràmoff, Meindert Niemeijer, Ana Maria Mendonça, Aurélio Campilho

Abstract: Synthesizing images of the eye fundus is a challenging task that has been previously approached by formulating complex models of the anatomy of the eye. New images can then be generated by sampling a suitable parameter space. In this work, we propose a method that learns to synthesize eye fundus images directly from data. For that, we pair true eye fundus images with their respective vessel trees, by means of a vessel segmentation technique. These pairs are then used to learn a mapping from a binary vessel tree to a new retinal image. For this purpose, we use a recent image-to-image translation technique, based on the idea of adversarial learning. Experimental results show that the original and the generated images are visually different in terms of their global appearance, in spite of sharing the same vessel tree. Additionally, a quantitative quality analysis of the synthetic retinal images confirms that the produced images retain a high proportion of the true image set quality.

Submitted 31 January, 2017; originally announced January 2017.

arXiv:1412.4260 [pdf, other]

A Bayesian Nonparametric System Reliability Model which Integrates Multiple Sources of Lifetime Information

Authors: Richard L. Warr, Jeremy M. Meyer, Jackson T. Curtis

Abstract: We present a Bayesian nonparametric system reliability model which scales well and provides a great deal of flexibility in modeling. The Bayesian approach naturally handles the disparate amounts of component and subsystem data that may exist. However, traditional Bayesian reliability models are quite computationally complex, relying on MCMC techniques. Our approach utilizes the conjugate properties of the beta-Stacy process, which is the fundamental building block of our model. These individual models are linked together using a method of moments estimation approach. This model is computationally fast, allows for right-censored data, and is used for estimating and predicting system reliability.

Submitted 21 March, 2022; v1 submitted 13 December, 2014; originally announced December 2014.

arXiv:1311.6849 [pdf, other]

Testing against a linear regression model using ideas from shape-restricted estimation

Authors: Bodhisattva Sen, Mary Meyer

Abstract: A formal likelihood ratio hypothesis test for the validity of a parametric regression function is proposed, using a large-dimensional, nonparametric double cone alternative. For example, the test against a constant function uses the alternative of increasing or decreasing regression functions, and the test against a linear function uses the convex or concave alternative. The proposed test is exact, unbiased and the critical value is easily computed. The power of the test increases to one as the sample size increases, under very mild assumptions -- even when the alternative is mis-specified. That is, the power of the test converges to one for any true regression function that deviates (in a non-degenerate way) from the parametric null hypothesis. We also formulate tests for the linear versus partial linear model, and consider the special case of the additive model. Simulations show that our procedure behaves well consistently when compared with other methods. Although the alternative fit is non-parametric, no tuning parameters are involved.

Submitted 26 June, 2014; v1 submitted 26 November, 2013; originally announced November 2013.

Comments: 38 pages, 7 figures

arXiv:0811.1705 [pdf, ps, other]

doi 10.1214/08-AOAS167

Inference using shape-restricted regression splines

Authors: Mary C. Meyer

Abstract: Regression splines are smooth, flexible, and parsimonious nonparametric function estimators. They are known to be sensitive to knot number and placement, but if assumptions such as monotonicity or convexity may be imposed on the regression function, the shape-restricted regression splines are robust to knot choices. Monotone regression splines were introduced by Ramsay [Statist. Sci. 3 (1988) 425--461], but were limited to quadratic and lower order. In this paper an algorithm for the cubic monotone case is proposed, and the method is extended to convex constraints and variants such as increasing-concave. The restricted versions have smaller squared error loss than the unrestricted splines, although they have the same convergence rates. The relatively small degrees of freedom of the model and the insensitivity of the fits to the knot choices allow for practical inference methods; the computational efficiency allows for back-fitting of additive models. Tests of constant versus increasing and linear versus convex regression function, when implemented with shape-restricted regression splines, have higher power than the standard version using ordinary shape-restricted regression.

Submitted 11 November, 2008; originally announced November 2008.

Comments: Published at http://dx.doi.org/10.1214/08-AOAS167 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS167

Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 3, 1013-1033

Showing 1–26 of 26 results for author: Meyer, M