-
A Refreshment Stirred, Not Shaken (III): Can Swapping Be Differentially Private?
Authors:
James Bailie,
Ruobin Gong,
Xiao-Li Meng
Abstract:
The quest for a precise and contextually grounded answer to the question in the present paper's title resulted in this stirred-not-shaken triptych, a phrase that reflects our desire to deepen the theoretical basis, broaden the practical applicability, and reduce the misperception of differential privacy (DP), all without shaking its core foundations. Indeed, given the existence of more than 200 formulations of DP (and counting), before even attempting to answer the titular question one must first precisely specify what it actually means to be DP. Motivated by this observation, a theoretical investigation into DP's fundamental essence resulted in Part I of this trio, which introduces a five-building-block system explicating the who, where, what, how and how much aspects of DP. Instantiating this system in the context of the United States Decennial Census, Part II then demonstrates the broader applicability and relevance of DP by comparing a swapping strategy like that used in 2010 with the TopDown Algorithm, a DP method adopted in the 2020 Census. This paper provides nontechnical summaries of the preceding two parts as well as new discussion, for example, on how greater awareness of the five building blocks can thwart privacy theatrics; how our results bridging traditional statistical disclosure control (SDC) and DP allow a data custodian to reap the benefits of both fields; how invariants impact disclosure risk; and how removing the implicit reliance on aleatoric uncertainty could lead to new generalizations of DP.
Submitted 21 April, 2025;
originally announced April 2025.
-
A Refreshment Stirred, Not Shaken (II): Invariant-Preserving Deployments of Differential Privacy for the US Decennial Census
Authors:
James Bailie,
Ruobin Gong,
Xiao-Li Meng
Abstract:
Through the lens of the system of differential privacy specifications developed in Part I of a trio of articles, this second paper examines two statistical disclosure control (SDC) methods for the United States Decennial Census: the Permutation Swapping Algorithm (PSA), which is similar to the 2010 Census's disclosure avoidance system (DAS), and the TopDown Algorithm (TDA), which was used in the 2020 DAS. To varying degrees, both methods leave unaltered some statistics of the confidential data (which are called the method's invariants), and hence neither can be readily reconciled with differential privacy (DP), at least as it was originally conceived. Nevertheless, we establish that the PSA satisfies $\varepsilon$-DP subject to the invariants it necessarily induces, thereby showing that this traditional SDC method can in fact still be understood within our more-general system of DP specifications. By a similar modification to $\rho$-zero concentrated DP, we also provide a DP specification for the TDA. Finally, as a point of comparison, we consider the counterfactual scenario in which the PSA was adopted for the 2020 Census, resulting in a reduction in the nominal privacy loss, but at the cost of releasing many more invariants. Therefore, while our results explicate the mathematical guarantees of SDC provided by the PSA, the TDA and the 2020 DAS in general, care must be taken in their translation to actual privacy protection, just as is the case for any DP deployment.
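For readers comparing the $\varepsilon$-DP and $\rho$-zCDP statements above, the standard conversions between the two original (unconstrained) definitions are easy to compute. $\varepsilon$-DP implies $(\varepsilon^2/2)$-zCDP. $\rho$-zCDP implies $(\varepsilon, \delta)$-DP with $\varepsilon = \rho + 2\sqrt{\rho \ln(1/\delta)}$ for every $\delta \in (0, 1)$. This is only a sketch of those textbook conversions. Whether and how they carry over to the invariant-modified specifications in the paper is not asserted here, and the function names are hypothetical.

```python
import math


def pure_dp_to_zcdp(epsilon):
    """Standard conversion: epsilon-DP implies (epsilon**2 / 2)-zCDP."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return epsilon ** 2 / 2.0


def zcdp_to_approx_dp(rho, delta):
    """Standard conversion: rho-zCDP implies (epsilon, delta)-DP with
    epsilon = rho + 2 * sqrt(rho * ln(1 / delta)), for any 0 < delta < 1.

    Applies to the original, unconstrained definitions only (assumption
    for this sketch; invariant-modified specifications are not covered).
    """
    if rho < 0 or not (0.0 < delta < 1.0):
        raise ValueError("need rho >= 0 and 0 < delta < 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))
```

For instance, `zcdp_to_approx_dp(0.5, 1e-10)` gives the $\varepsilon$ attained at $\delta = 10^{-10}$ by a 0.5-zCDP mechanism. Smaller $\delta$ always yields a larger $\varepsilon$.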
Submitted 14 January, 2025;
originally announced January 2025.
-
dapper: Data Augmentation for Private Posterior Estimation in R
Authors:
Kevin Eng,
Jordan A. Awan,
Nianqiao Phyllis Ju,
Vinayak A. Rao,
Ruobin Gong
Abstract:
This paper serves as a reference and introduction to using the R package dapper. dapper encodes a sampling framework which allows exact Markov chain Monte Carlo simulation of parameters and latent variables in a statistical model given privatized data. The goal of this package is to fill an urgent need by providing applied researchers with a flexible tool to perform valid Bayesian inference on data protected by differential privacy, allowing them to properly account for the noise introduced for privacy protection in their statistical analysis. dapper offers a significant step forward in providing general-purpose statistical inference tools for privatized data.
Submitted 18 December, 2024;
originally announced December 2024.
-
Differentially Private Range Queries with Correlated Input Perturbation
Authors:
Prathamesh Dharangutte,
Jie Gao,
Ruobin Gong,
Guanyang Wang
Abstract:
This work proposes a class of differentially private mechanisms for linear queries, in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements in terms of accuracy targets expressed either in certain query margins or as implied by the hierarchical database structure. The proposed Cascade Sampling algorithm instantiates the mechanism exactly and efficiently. Our theoretical and empirical analysis demonstrates that we achieve near-optimal utility, effectively compete with other methods, and retain all the favorable statistical properties discussed earlier.
Submitted 6 November, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
Integer Subspace Differential Privacy
Authors:
Prathamesh Dharangutte,
Jie Gao,
Ruobin Gong,
Fang-Yi Yu
Abstract:
We propose new differential privacy solutions for when external invariants and integer constraints are simultaneously enforced on the data product. These requirements arise in real-world applications of private data curation, including the public release of the 2020 U.S. Decennial Census. They pose a great challenge to the production of provably private data products with adequate statistical usability. We propose integer subspace differential privacy to rigorously articulate the privacy guarantee when data products maintain both the invariants and integer characteristics, and demonstrate the composition and post-processing properties of our proposal. To address the challenge of sampling from a potentially highly restricted discrete space, we devise a pair of unbiased additive mechanisms, the generalized Laplace and the generalized Gaussian mechanisms, by solving the Diophantine equations as defined by the constraints. The proposed mechanisms have good accuracy, with errors exhibiting sub-exponential and sub-Gaussian tail probabilities respectively. To implement our proposal, we design an MCMC algorithm and supply empirical convergence assessment using estimated upper bounds on the total variation distance via $L$-lag coupling. We demonstrate the efficacy of our proposal with applications to a synthetic problem with intersecting invariants, a sensitive contingency table with known margins, and the 2010 Census county-level demonstration data with mandated fixed state population totals.
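The sampling problem described above can be illustrated with a toy version: integer-valued noise that must satisfy a linear invariant. Here the invariant is, hypothetically, that the noisy counts keep the original total. The sketch runs a random-walk Metropolis chain on the integer lattice $\{z : z_1 + \dots + z_k = 0\}$. Each move adds 1 to one coordinate and subtracts 1 from another, so every state stays on the constraint set.

A few assumptions and omissions apply:

- The target density proportional to $\exp(-\varepsilon \lVert z \rVert_1)$ is chosen for illustration. It is not claimed to be the paper's generalized Laplace mechanism.
- No privacy accounting, non-negativity constraint, or convergence diagnostic (such as $L$-lag coupling) is included.
- All names are hypothetical.

```python
import math
import random


def l1(z):
    """L1 norm of an integer vector."""
    return sum(abs(v) for v in z)


def integer_zero_sum_noise(k, epsilon, steps=2000, seed=None):
    """Metropolis sampler for integer noise z in Z^k with sum(z) == 0.

    Hypothetical target: pi(z) proportional to exp(-epsilon * ||z||_1),
    restricted to the lattice {z : z_1 + ... + z_k = 0}.
    """
    if k < 2:
        return [0] * k  # only the zero vector satisfies the constraint
    rng = random.Random(seed)
    z = [0] * k
    for _ in range(steps):
        # Ordered pair of distinct coordinates; the reverse move (j, i) has
        # the same proposal probability, so the proposal is symmetric.
        i, j = rng.sample(range(k), 2)
        prop = z[:]
        prop[i] += 1
        prop[j] -= 1  # +1 and -1 keep the sum at zero
        log_ratio = -epsilon * (l1(prop) - l1(z))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            z = prop
    return z


def privatize_counts(counts, epsilon, seed=None):
    """Add zero-sum integer noise so the total of `counts` is preserved."""
    noise = integer_zero_sum_noise(len(counts), epsilon, seed=seed)
    return [c + e for c, e in zip(counts, noise)]
```

The released counts are integers and keep the same total. Because non-negativity is not enforced, small counts can turn negative. The paper's setting additionally handles intersecting invariants and known margins.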
Submitted 1 December, 2022;
originally announced December 2022.
-
Data Augmentation MCMC for Bayesian Inference from Privatized Data
Authors:
Nianqiao Ju,
Jordan A. Awan,
Ruobin Gong,
Vinayak A. Rao
Abstract:
Differentially private mechanisms protect privacy by introducing additional randomness into the data. Restricting access to only the privatized data makes it challenging to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typically intractable. For Bayesian analysis, this results in a posterior distribution that is doubly intractable, rendering traditional MCMC techniques inapplicable. We propose an MCMC framework to perform Bayesian inference from the privatized data, which is applicable to a wide range of statistical models and privacy mechanisms. Our MCMC algorithm augments the model parameters with the unobserved confidential data, and alternately updates each one conditional on the other. For the potentially challenging step of updating the confidential data, we propose a generic approach that exploits the privacy guarantee of the mechanism to ensure efficiency. We give results on the computational complexity, acceptance rate, and mixing properties of our MCMC. We illustrate the efficacy and applicability of our methods on a naïve-Bayes log-linear model as well as on a linear regression model.
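The alternating scheme can be made concrete with a minimal sketch on a deliberately tiny, hypothetical model: a binomial count released with Laplace noise. The model and function names are assumptions for illustration. They are not the paper's algorithm or the API of any package.

The sampler alternates two steps:

- It draws $\theta$ from its conjugate Beta conditional.
- It updates the latent confidential count with a Metropolis step that proposes from the model given $\theta$.

With this proposal, the acceptance ratio reduces to the ratio of privacy-mechanism densities, a simplified echo of using the mechanism itself in the confidential-data update.

```python
import math
import random


def laplace_logpdf(s, x, scale):
    """Log-density of releasing s when the confidential count is x (Laplace noise)."""
    return -math.log(2.0 * scale) - abs(s - x) / scale


def binomial_draw(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli trials (standard library only)."""
    return sum(rng.random() < p for _ in range(n))


def da_mcmc(s, n, epsilon, a=1.0, b=1.0, iters=5000, seed=0):
    """Toy data-augmentation sampler for theta given a privatized count s.

    Hypothetical model: theta ~ Beta(a, b), X ~ Binomial(n, theta),
    released s = X + Laplace(0, 1/epsilon). The latent confidential count X
    is carried along as an auxiliary variable.
    """
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    x = min(max(int(round(s)), 0), n)  # start the latent count near the release
    thetas, accepted = [], 0
    for _ in range(iters):
        # Update theta given the latent count: conjugate Beta draw.
        theta = rng.betavariate(a + x, b + n - x)
        # Update the latent count given theta and s: propose from the model,
        # so the Metropolis-Hastings ratio is the mechanism density ratio.
        x_prop = binomial_draw(n, theta, rng)
        log_ratio = laplace_logpdf(s, x_prop, scale) - laplace_logpdf(s, x, scale)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_prop
            accepted += 1
        thetas.append(theta)
    return thetas, accepted / iters
```

For example, `thetas, rate = da_mcmc(s=30.4, n=100, epsilon=1.0)` returns the $\theta$ draws and the acceptance rate of the latent-count step. Discard an initial burn-in before summarizing.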
Submitted 7 December, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Six Statistical Senses
Authors:
Radu V. Craiu,
Ruobin Gong,
Xiao-Li Meng
Abstract:
This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a "sense" because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem.
Submitted 18 September, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Subspace Differential Privacy
Authors:
Jie Gao,
Ruobin Gong,
Fang-Yi Yu
Abstract:
Many data applications have certain invariant constraints due to practical needs. Data curators who employ differential privacy need to respect such constraints on the sanitized data product as a primary utility requirement. Invariants challenge the formulation, implementation, and interpretation of privacy guarantees.
We propose subspace differential privacy to honestly characterize the dependence of the sanitized output on confidential aspects of the data. We discuss two design frameworks that convert well-known differentially private mechanisms, such as the Gaussian and the Laplace mechanisms, to subspace differentially private ones that respect the invariants specified by the curator. For linear queries, we discuss the design of near-optimal mechanisms that minimize the mean squared error. Subspace differentially private mechanisms eliminate the need for post-processing due to invariants, preserve transparency and statistical intelligibility of the output, and can be suitable for distributed implementation. We showcase the proposed mechanisms on the 2020 Census Disclosure Avoidance demonstration data, and a spatio-temporal dataset of mobile access point connections on a large university campus.
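One simple way to respect a linear invariant with Gaussian noise is to project the noise onto the subspace the invariant leaves free. The sketch below assumes, for illustration only, that the single invariant is the grand total of the counts.

Subtracting the mean from an i.i.d. Gaussian noise vector projects it onto the orthogonal complement of $(1, \dots, 1)$. As a result, the released counts keep the same total and each released count remains unbiased. The sketch does not show how $\sigma$ should be calibrated to a privacy budget, and the function names are hypothetical.

```python
import random


def project_out_total(noise):
    """Project a noise vector onto the subspace orthogonal to (1, ..., 1),
    i.e. remove its mean so the entries sum to zero."""
    mean = sum(noise) / len(noise)
    return [z - mean for z in noise]


def gaussian_with_total_invariant(counts, sigma, seed=None):
    """Add mean-zero Gaussian noise that leaves sum(counts) unchanged.

    Assumption for this sketch: the only invariant is the grand total.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in counts]
    return [c + z for c, z in zip(counts, project_out_total(noise))]
```

For example, `gaussian_with_total_invariant([120, 45, 300, 8], sigma=5.0, seed=42)` returns four noisy values whose sum equals 473 up to floating-point rounding. For a general invariant $Ax = c$, the same idea projects the noise onto the null space of $A$.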
Submitted 29 April, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Bayesian Poisson Mortality Projections with Incomplete Data
Authors:
Rui Gong,
Xiaoqian Sun,
Leping Liu,
Yu-Bo Wang
Abstract:
The missing data problem is pervasive in statistical applications. Even data as simple as the counts used in mortality projections may be unavailable for certain age-and-year groups because of budget limitations or difficulties in tracing research units, leading to inaccuracies in subsequent estimation and prediction. To circumvent this data-driven challenge, we extend the Poisson log-normal Lee-Carter model to accommodate a more flexible time structure, and develop a new sampling algorithm that improves MCMC convergence when dealing with incomplete mortality data. Via the overdispersion term and Gibbs sampler, the extended model can be rewritten as a dynamic linear model so that both Kalman and sequential Kalman filters can be incorporated into the sampling scheme. Additionally, our carefully chosen prior settings avoid the rescaling step in each MCMC iteration and allow model selection to be conducted simultaneously with estimation and prediction. The proposed method is applied to the mortality data of Chinese males during the period 1995-2016 to yield mortality rate forecasts for 2017-2039. The results are comparable to those based on the imputed data set, suggesting that our approach handles incomplete data well.
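As background on why a dynamic linear model form helps with missing years, here is a minimal sketch of a scalar Kalman filter for a random walk with drift observed with noise. This is the kind of time index used in Lee-Carter models. When a year's observation is missing, the filter simply skips the update step and carries the prediction forward.

The model, parameter names, and function name are assumptions for illustration. The sketch does not reproduce the paper's Gibbs sampler or its sequential Kalman filtering.

```python
def kalman_filter_rw_drift(ys, drift, V, W, m0, C0):
    """Filter a random walk with drift observed with noise; None marks a missing year.

    Model (assumed for illustration):
        mu_t = mu_{t-1} + drift + w_t,  w_t ~ N(0, W)
        y_t  = mu_t + v_t,              v_t ~ N(0, V)
    Returns the filtered means and variances of mu_t.
    """
    m, C = m0, C0
    means, variances = [], []
    for y in ys:
        # Predict step: propagate the previous filtered state one year ahead.
        a = m + drift
        R = C + W
        if y is None:
            # Missing observation: the filtered state is just the prediction.
            m, C = a, R
        else:
            # Update step with Kalman gain K = R / (R + V).
            K = R / (R + V)
            m = a + K * (y - a)
            C = R - K * R
        means.append(m)
        variances.append(C)
    return means, variances
```

Across consecutive missing years, the filtered variance grows by $W$ each year. The next observed year then shrinks it back below $V$.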
Submitted 9 March, 2021;
originally announced March 2021.
-
Estimation of Tempered Stable Lévy Models of Infinite Variation
Authors:
José E. Figueroa-López,
Ruoting Gong,
Yuchen Han
Abstract:
We propose a new method for the estimation of a semiparametric tempered stable Lévy model. The estimation procedure iteratively combines an approximate semiparametric method-of-moments estimator, Truncated Realized Quadratic Variations (TRQV), with a newly found small-time high-order approximation for the optimal threshold of the TRQV of tempered stable processes. The method is tested via simulations to estimate the volatility and the Blumenthal-Getoor index of the generalized CGMY model as well as the integrated volatility of a Heston-type model with CGMY jumps. The method outperforms other efficient alternatives proposed in the literature when working with a Lévy process (i.e., the volatility is constant), or when the index of jump intensity $Y$ is larger than $3/2$ in the presence of stochastic volatility.
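The TRQV statistic itself is simple to compute. Squared increments are summed only when their absolute size falls below a threshold, so that large increments (attributed to jumps) are excluded. In this sketch the threshold is a plain user-supplied parameter. Choosing it well is exactly what the paper's small-time approximation addresses, and that step is not reproduced. The function name is hypothetical.

```python
def truncated_realized_qv(path, threshold):
    """Sum of squared increments whose absolute size is at most `threshold`.

    Increments larger than the threshold are treated as jumps and discarded,
    so the statistic targets the continuous (volatility) part of the path.
    """
    increments = [b - a for a, b in zip(path, path[1:])]
    return sum(d * d for d in increments if abs(d) <= threshold)
```

On a sampled path with one large jump, a moderate threshold drops the jump's squared increment. A very large threshold reduces TRQV to the ordinary realized quadratic variation.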
Submitted 24 February, 2022; v1 submitted 3 January, 2021;
originally announced January 2021.
-
Transparent Privacy is Principled Privacy
Authors:
Ruobin Gong
Abstract:
In a technical treatment, this article establishes the necessity of transparent privacy for drawing unbiased statistical inference for a wide range of scientific questions. Transparency is a distinct feature enjoyed by differential privacy: the probabilistic mechanism with which the data are privatized can be made public without sabotaging the privacy guarantee. Uncertainty due to transparent privacy may be conceived as a dynamic and controllable component from the total survey error perspective. As the 2020 U.S. Decennial Census adopts differential privacy, constraints imposed on the privatized data products through optimization constitute a threat to transparency and result in limited statistical usability. Transparent privacy presents a viable path toward principled inference from privatized data releases, and shows great promise toward improved reproducibility, accountability, and public trust in modern data curation.
Submitted 18 September, 2022; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Efficient Bitwidth Search for Practical Mixed Precision Neural Network
Authors:
Yuhang Li,
Wei Wang,
Haoli Bai,
Ruihao Gong,
Xin Dong,
Fengwei Yu
Abstract:
Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to improve the overall performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for weights and activations of each layer efficiently. Meanwhile, it is still unclear how to perform convolution for weights and activations of different precision efficiently on generic hardware platforms. To resolve these two issues, in this paper, we first propose an Efficient Bitwidth Search (EBS) algorithm, which reuses the meta weights for different quantization bitwidths, so that the strength of each candidate precision can be optimized directly w.r.t. the objective without superfluous copies, significantly reducing both the memory and computational cost. Second, we propose a binary decomposition algorithm that converts weights and activations of different precision into binary matrices to make mixed precision convolution efficient and practical. Experimental results on the CIFAR10 and ImageNet datasets demonstrate that our mixed precision quantized neural network (QNN) outperforms its handcrafted uniform-bitwidth counterparts and other mixed precision techniques.
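The identity behind binary decomposition can be shown in a minimal sketch. Split $w_b$-bit weights and $a_b$-bit activations into bit-planes $W_i$ and $A_j$. The dot product then becomes a sum of AND-plus-popcount terms: $w \cdot a = \sum_{i,j} 2^{i+j}\,\mathrm{popcount}(W_i \wedge A_j)$.

The sketch assumes unsigned integer codes. Signed or offset quantizers need extra correction terms. It is not claimed to be the paper's exact algorithm, and the names are hypothetical.

```python
def bit_planes(values, bits):
    """Pack bit b of every value into one integer mask, for b = 0..bits-1."""
    planes = []
    for b in range(bits):
        mask = 0
        for pos, v in enumerate(values):
            if (v >> b) & 1:
                mask |= 1 << pos
        planes.append(mask)
    return planes


def mixed_precision_dot(weights, acts, w_bits, a_bits):
    """Dot product of unsigned integer vectors via binary decomposition.

    w . a = sum_{i,j} 2^(i+j) * popcount(W_i AND A_j),
    where W_i and A_j are the bit-planes of the weights and activations.
    """
    w_planes = bit_planes(weights, w_bits)
    a_planes = bit_planes(acts, a_bits)
    total = 0
    for i, wp in enumerate(w_planes):
        for j, ap in enumerate(a_planes):
            total += bin(wp & ap).count("1") << (i + j)
    return total
```

Each (weight bit, activation bit) pair costs one AND and one popcount regardless of the other layer's precision. This makes it possible to mix bitwidths on generic hardware.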
Submitted 17 March, 2020;
originally announced March 2020.
-
Geometric Conditions for the Discrepant Posterior Phenomenon and Connections to Simpson's Paradox
Authors:
Yang Chen,
Ruobin Gong,
Min-ge Xie
Abstract:
The discrepant posterior phenomenon (DPP) is a counter-intuitive phenomenon that can frequently occur in a Bayesian analysis of multivariate parameters. It refers to the situation in which a parameter estimate based on the posterior is more extreme than both the estimate based on the prior alone and the estimate based on the likelihood alone. Inferential claims that exhibit DPP defy the common intuition that the posterior is a prior-data compromise, and the phenomenon can be surprisingly ubiquitous in well-behaved Bayesian models. In this paper we revisit this phenomenon and, using point estimation as an example, derive conditions under which the DPP occurs in Bayesian models with exponential quadratic likelihoods and conjugate multivariate Gaussian priors. The family of exponential quadratic likelihood models includes Gaussian models and those models with the local asymptotic normality property. We provide an intuitive geometric interpretation of the phenomenon and show that there exists a nontrivial space of marginal directions such that the DPP occurs. We further relate the phenomenon to Simpson's paradox and uncover a deep-rooted connection between the two that is associated with marginalization. We also draw connections with Bayesian computational algorithms when difficult geometry exists. Our discovery demonstrates that DPP is more prevalent than previously understood and anticipated. Theoretical results are complemented by numerical illustrations. Scenarios covered in this study have implications for parameterization, sensitivity analysis, and prior choice for Bayesian modeling.
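A small numerical illustration of the DPP, using made-up numbers that are not taken from the paper, works in a conjugate bivariate Gaussian setting:

- The prior centres $\theta = (\theta_1, \theta_2)$ at $(0, 0)$ and pins $\theta_2$ tightly.
- The likelihood is centred at $(1, 3)$ with a strongly correlated precision, so it determines $\theta_1 - \theta_2 \approx -2$ sharply but $\theta_1 + \theta_2$ only loosely.

The posterior mean $(P_0 + P_1)^{-1}(P_0 m_0 + P_1 m_1)$ puts $\theta_1$ near $-1.83$. That lies outside the interval between the prior estimate 0 and the likelihood estimate 1. Function names are hypothetical.

```python
def solve2(P, v):
    """Solve the 2x2 linear system P x = v."""
    (p11, p12), (p21, p22) = P
    det = p11 * p22 - p12 * p21
    return [(p22 * v[0] - p12 * v[1]) / det, (-p21 * v[0] + p11 * v[1]) / det]


def gaussian_posterior_mean(m0, P0, m1, P1):
    """Posterior mean for a Gaussian prior with mean m0 and precision P0
    combined with a Gaussian likelihood centred at m1 with precision P1."""
    P = [[P0[r][c] + P1[r][c] for c in range(2)] for r in range(2)]
    b = [sum(P0[r][c] * m0[c] + P1[r][c] * m1[c] for c in range(2)) for r in range(2)]
    return solve2(P, b)


prior_mean, prior_prec = [0.0, 0.0], [[1.0, 0.0], [0.0, 100.0]]
lik_mean, lik_prec = [1.0, 3.0], [[50.0, -49.0], [-49.0, 50.0]]
post = gaussian_posterior_mean(prior_mean, prior_prec, lik_mean, lik_prec)
print(post)  # first coordinate is about -1.83
```

The marginal direction $\theta_1$ is discrepant, while $\theta_2$ (about 0.076) sits between its prior and likelihood values. This is consistent with DPP occurring along some marginal directions but not others.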
Submitted 12 January, 2022; v1 submitted 22 January, 2020;
originally announced January 2020.
-
A Gibbs sampler for a class of random convex polytopes
Authors:
Pierre E. Jacob,
Ruobin Gong,
Paul T. Edlefsen,
Arthur P. Dempster
Abstract:
We present a Gibbs sampler for the Dempster-Shafer (DS) approach to statistical inference for Categorical distributions. The DS framework extends the Bayesian approach, allows in particular the use of partial prior information, and yields three-valued uncertainty assessments representing probabilities "for", "against", and "don't know" about formal assertions of interest. The proposed algorithm targets the distribution of a class of random convex polytopes which encapsulate the DS inference. The sampler relies on an equivalence between the iterative constraints of the vertex configuration and the non-negativity of cycles in a fully connected directed graph. Illustrations include the testing of independence in 2x2 contingency tables and parameter estimation of the linkage model.
Submitted 21 January, 2021; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Exact Inference with Approximate Computation for Differentially Private Data via Perturbations
Authors:
Ruobin Gong
Abstract:
This paper discusses how two classes of approximate computation algorithms can be adapted, in a modular fashion, to achieve exact statistical inference from differentially private data products. Considered are approximate Bayesian computation for Bayesian inference, and Monte Carlo Expectation-Maximization for likelihood inference. Up to Monte Carlo error, inference from these algorithms is exact with respect to the joint specification of both the analyst's original data model, and the curator's differential privacy mechanism. Highlighted is a duality between approximate computation on exact data, and exact computation on approximate data, which can be leveraged by a well-designed computational procedure for statistical inference.
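The "exact computation on approximate data" side of this duality can be sketched with rejection ABC whose acceptance kernel is the privacy mechanism itself. Each simulated confidential statistic is accepted with probability equal to the Laplace mechanism's density at the released value, divided by its maximum. The accepted parameter draws then follow the posterior given the privatized release, up to Monte Carlo error.

The binomial model, uniform prior, and function name below are hypothetical choices for illustration. They are not the paper's examples.

```python
import math
import random


def abc_private_posterior(s, n, epsilon, num_draws=20000, seed=0):
    """Rejection sampler for theta given a Laplace-privatized count s.

    Hypothetical setup: theta ~ Uniform(0, 1), X ~ Binomial(n, theta),
    released s = X + Laplace(0, 1/epsilon). A simulated X is accepted with
    probability exp(-epsilon * |s - X|), the Laplace density divided by its
    maximum, so accepted thetas target p(theta | s).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(num_draws):
        theta = rng.random()                              # draw from the prior
        x = sum(rng.random() < theta for _ in range(n))   # simulate confidential data
        if rng.random() < math.exp(-epsilon * abs(s - x)):
            kept.append(theta)
    return kept
```

Because the kernel is the mechanism, no tolerance parameter needs tuning. The price is a low acceptance rate when $\varepsilon$ is large or the data model rarely reproduces the release.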
Submitted 26 September, 2022; v1 submitted 26 September, 2019;
originally announced September 2019.
-
MarmoNet: a pipeline for automated projection mapping of the common marmoset brain from whole-brain serial two-photon tomography
Authors:
Henrik Skibbe,
Akiya Watakabe,
Ken Nakae,
Carlos Enrique Gutierrez,
Hiromichi Tsukada,
Junichi Hata,
Takashi Kawase,
Rui Gong,
Alexander Woodward,
Kenji Doya,
Hideyuki Okano,
Tetsuo Yamamori,
Shin Ishii
Abstract:
Understanding the connectivity in the brain is an important prerequisite for understanding how the brain processes information. In the Brain/MINDS project, a connectivity study on marmoset brains uses two-photon microscopy fluorescence images of axonal projections to collect the neuron connectivity from defined brain regions at the mesoscopic scale. The processing of the images requires the detection and segmentation of the axonal tracer signal. The objective is to detect as much tracer signal as possible while not misclassifying other background structures as signal. This is challenging because imaging noise, cluttered image backgrounds, distortions, and varying image contrast all cause problems.
We are developing MarmoNet, a pipeline that processes and analyzes tracer image data of the common marmoset brain. The pipeline incorporates state-of-the-art machine learning techniques based on artificial convolutional neural networks (CNN) and image registration techniques to extract and map all relevant information in a robust manner. The pipeline processes new images in a fully automated way.
This report introduces the current state of the tracer signal analysis part of the pipeline.
Submitted 2 August, 2019;
originally announced August 2019.
-
Simultaneous Inference Under the Vacuous Orientation Assumption
Authors:
Ruobin Gong
Abstract:
I propose a novel approach to simultaneous inference that alleviates the need to specify a correlational structure among marginal errors. The vacuous orientation assumption retains what the normal i.i.d. assumption implies about the distribution of error configuration, but relaxes the implication that the error orientation is isotropic. When a large number of highly dependent hypotheses are tested simultaneously, the proposed model produces calibrated posterior inference by leveraging the logical relationship among them. This stands in contrast to the conservative performance of the Bonferroni correction, even though neither approach makes assumptions about error dependence. The proposed model employs the Dempster-Shafer Extended Calculus of Probability, and delivers posterior inference in the form of stochastic three-valued logic.
Submitted 14 May, 2019;
originally announced May 2019.
-
A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification
Authors:
Eduardo Fonseca,
Rong Gong,
Xavier Serra
Abstract:
In the past, Acoustic Scene Classification systems have been based on hand-crafted audio features that are input to a classifier. Nowadays, the common trend is to adopt data-driven techniques, e.g., deep learning, where audio representations are learned from data. In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine. We first show that both methods provide complementary information to some extent. Then, we use a simple late fusion strategy to combine both methods. We report classification accuracy of each method individually and the combined system on the TUT Acoustic Scenes 2017 dataset. The proposed fused system outperforms each of the individual methods and attains a classification accuracy of 72.8% on the evaluation set, improving the baseline system by 11.8%.
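The abstract describes "a simple late fusion strategy" without stating the exact rule. One common choice is a weighted average of the two classifiers' class-probability vectors followed by an argmax. The sketch assumes that rule. The `weight` parameter and the function name are hypothetical.

```python
def late_fusion(prob_a, prob_b, weight=0.5):
    """Combine two classifiers' class-probability vectors by weighted averaging.

    Returns (predicted_class_index, fused_probabilities). `weight` is the share
    given to the first classifier (hypothetical parameter for this sketch).
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    fused = [weight * pa + (1.0 - weight) * pb for pa, pb in zip(prob_a, prob_b)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

For example, fusing CNN scores `[0.6, 0.3, 0.1]` with gradient-boosting scores `[0.1, 0.7, 0.2]` at equal weight yields `[0.35, 0.5, 0.15]`. The predicted class is therefore index 1, even though the CNN alone would have chosen index 0.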
Submitted 27 June, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
Judicious Judgment Meets Unsettling Updating: Dilation, Sure Loss, and Simpson's Paradox
Authors:
Ruobin Gong,
Xiao-Li Meng
Abstract:
Statistical learning using imprecise probabilities is gaining more attention because it presents an alternative strategy for reducing irreplicable findings by freeing the user from the task of making up unwarranted high-resolution assumptions. However, model updating as a mathematical operation is inherently exact, hence updating imprecise models requires the user's judgment in choosing among competing updating rules. These rules often lead to incompatible inferences, and can exhibit unsettling phenomena like dilation, contraction and sure loss, which cannot occur with the Bayes rule and precise probabilities. We revisit a number of famous "paradoxes", including the three prisoners/Monty Hall problem, revealing that a logical fallacy arises from a set of marginally plausible yet jointly incommensurable assumptions when updating the underlying imprecise model. We establish an equivalence between Simpson's paradox and an implicit adoption of a pair of aggregation rules that induce sure loss. We also explore behavioral discrepancies between the generalized Bayes rule, Dempster's rule and the Geometric rule as alternative posterior updating rules for Choquet capacities of order 2. We show that both the generalized Bayes rule and Geometric rule are incapable of updating without prior information regardless of how strong the information in our data is, and that Dempster's rule and the Geometric rule can mathematically contradict each other with respect to dilation and contraction. Our findings show that unsettling updates reflect a collision between the rules' assumptions and the inexactness allowed by the model itself, highlighting the invaluable role of judicious judgment in handling low-resolution information, and the care we must take when applying learning rules to update imprecise probabilities.
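For reference alongside the comparison of updating rules above, here is a minimal sketch of Dempster's rule of combination for mass functions on a finite frame.

The rule works in three steps:

- Multiply the masses of every pair of focal sets.
- Assign each product to the intersection of the pair; pairs with an empty intersection form the conflict $K$.
- Renormalize the remaining mass by $1 - K$.

Dempster's rule of conditioning on an event $B$ is the special case of combining with a mass function that puts mass 1 on $B$. The generalized Bayes and Geometric rules are not shown, and the function name is hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule of combination for mass functions on a finite frame.

    Mass functions map frozenset focal elements to masses summing to 1.
    Products of masses whose focal sets do not intersect form the conflict K,
    and the remaining mass is renormalized by 1 - K.
    """
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}
```

With frame {H, T}, combining {H}: 0.6, {H, T}: 0.4 with {T}: 0.5, {H, T}: 0.5 gives conflict 0.3 and renormalized masses 0.3/0.7, 0.2/0.7 and 0.2/0.7 on {H}, {T} and {H, T}. Conditioning on B uses `{B: 1.0}` as the second argument.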
Submitted 24 December, 2017;
originally announced December 2017.