-
Toward a Principled Framework for Disclosure Avoidance
Authors:
Michael B Hawes,
Evan M Brassell,
Anthony Caruso,
Ryan Cumings-Menon,
Jason Devine,
Cassandra Dorius,
David Evans,
Kenneth Haase,
Michele C Hedrick,
Alexandra Krause,
Philip Leclerc,
James Livsey,
Rolando A Rodriguez,
Luke T Rogers,
Matthew Spence,
Victoria Velkoff,
Michael Walsh,
James Whitehorne,
Sallie Ann Keller
Abstract:
Responsible disclosure limitation is an iterative exercise in risk assessment and mitigation. From time to time, as disclosure risks grow and evolve and as data users' needs change, agencies must consider redesigning the disclosure avoidance system(s) they use. Discussions about candidate systems often conflate inherent features of those systems with implementation decisions independent of those s…
▽ More
Responsible disclosure limitation is an iterative exercise in risk assessment and mitigation. From time to time, as disclosure risks grow and evolve and as data users' needs change, agencies must consider redesigning the disclosure avoidance system(s) they use. Discussions about candidate systems often conflate inherent features of those systems with implementation decisions independent of those systems. For example, a system's ability to calibrate the strength of protection to suit the underlying disclosure risk of the data (e.g., by varying suppression thresholds), is a worthwhile feature regardless of the independent decision about how much protection is actually necessary. Having a principled discussion of candidate disclosure avoidance systems requires a framework for distinguishing these inherent features of the systems from the implementation decisions that need to be made independent of the system selected. For statistical agencies, this framework must also reflect the applied nature of these systems, acknowledging that candidate systems need to be adaptable to requirements stemming from the legal, scientific, resource, and stakeholder environments within which they would be operating. This paper proposes such a framework. No approach will be perfectly adaptable to every potential system requirement. Because the selection of some methodologies over others may constrain the resulting systems' efficiency and flexibility to adapt to particular statistical product specifications, data user needs, or disclosure risks, agencies may approach these choices in an iterative fashion, adapting system requirements, product specifications, and implementation parameters as necessary to ensure the resulting quality of the statistical product.
△ Less
Submitted 29 May, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
A Plant Root System Algorithm Based on Swarm Intelligence for One-dimensional Biomedical Signal Feature Engineering
Authors:
Rui Gong,
Kazunori Hase
Abstract:
To date, very few biomedical signals have transitioned from research applications to clinical applications. This is largely due to the lack of trust in the diagnostic ability of non-stationary signals. To reach the level of clinical diagnostic application, classification using high-quality signal features is necessary. While there has been considerable progress in machine learning in recent years,…
▽ More
To date, very few biomedical signals have transitioned from research applications to clinical applications. This is largely due to the lack of trust in the diagnostic ability of non-stationary signals. To reach the level of clinical diagnostic application, classification using high-quality signal features is necessary. While there has been considerable progress in machine learning in recent years, especially deep learning, progress has been quite limited in the field of feature engineering. This study proposes a feature extraction algorithm based on group intelligence which we call a Plant Root System (PRS) algorithm. Importantly, the correlation between features produced by this PRS algorithm and traditional features is low, and the accuracy of several widely-used classifiers was found to be substantially improved with the addition of PRS features. It is expected that more biomedical signals can be applied to clinical diagnosis using the proposed algorithm.
△ Less
Submitted 31 July, 2021;
originally announced August 2021.
-
Neural Model-based Optimization with Right-Censored Observations
Authors:
Katharina Eggensperger,
Kai Haase,
Philipp Müller,
Marius Lindauer,
Frank Hutter
Abstract:
In many fields of study, we only observe lower bounds on the true response value of some experiments. When fitting a regression model to predict the distribution of the outcomes, we cannot simply drop these right-censored observations, but need to properly model them. In this work, we focus on the concept of censored data in the light of model-based optimization where prematurely terminating evalu…
▽ More
In many fields of study, we only observe lower bounds on the true response value of some experiments. When fitting a regression model to predict the distribution of the outcomes, we cannot simply drop these right-censored observations, but need to properly model them. In this work, we focus on the concept of censored data in the light of model-based optimization where prematurely terminating evaluations (and thus generating right-censored data) is a key factor for efficiency, e.g., when searching for an algorithm configuration that minimizes runtime of the algorithm at hand. Neural networks (NNs) have been demonstrated to work well at the core of model-based optimization procedures and here we extend them to handle these censored observations. We propose (i)~a loss function based on the Tobit model to incorporate censored samples into training and (ii) use an ensemble of networks to model the posterior distribution. To nevertheless be efficient in terms of optimization-overhead, we propose to use Thompson sampling s.t. we only need to train a single NN in each iteration. Our experiments show that our trained regression models achieve a better predictive quality than several baselines and that our approach achieves new state-of-the-art performance for model-based optimization on two optimization problems: minimizing the solution time of a SAT solver and the time-to-accuracy of neural networks.
△ Less
Submitted 29 September, 2020;
originally announced September 2020.