-
Mixing Samples to Address Weak Overlap in Causal Inference
Authors:
Jaehyuk Jang,
Suehyun Kim,
Kwonsang Lee
Abstract:
In observational studies, the assumption of sufficient overlap (positivity) is fundamental for the identification and estimation of causal effects. Failing to account for this assumption yields inaccurate and potentially infeasible estimators. To address this issue, we introduce a simple yet novel approach, \textit{mixing}, which mitigates overlap violations by constructing a synthetic treated gro…
▽ More
In observational studies, the assumption of sufficient overlap (positivity) is fundamental for the identification and estimation of causal effects. Failing to account for this assumption yields inaccurate and potentially infeasible estimators. To address this issue, we introduce a simple yet novel approach, \textit{mixing}, which mitigates overlap violations by constructing a synthetic treated group that combines treated and control units. Our strategy offers three key advantages. First, it improves the accuracy of the estimator by preserving unbiasedness while reducing variance. The benefit is particularly significant in settings with weak overlap, though the method remains effective regardless of the overlap level. This phenomenon results from the shrinkage of propensity scores in the mixed sample, which enhances robustness to poor overlap. Second, it enables direct estimation of the target estimand without discarding extreme observations or modifying the target population, thus facilitating a straightforward interpretation of the results. Third, the mixing approach is highly adaptable to various weighting schemes, including contemporary methods such as entropy balancing. The estimation of the Mixed IPW (MIPW) estimator is done via M-estimation, and the method extends to a broader class of weighting estimators through a resampling algorithm. We illustrate the mixing approach through extensive simulation studies and provide practical guidance with a real-data analysis.
△ Less
Submitted 4 April, 2025; v1 submitted 16 November, 2024;
originally announced November 2024.
-
Fast and Optimal Changepoint Detection and Localization using Bonferroni Triplets
Authors:
Jayoon Jang,
Guenther Walther
Abstract:
The paper considers the problem of detecting and localizing changepoints in a sequence of independent observations. We propose to evaluate a local test statistic on a triplet of time points, for each such triplet in a particular collection. This collection is sparse enough so that the results of the local tests can simply be combined with a weighted Bonferroni correction. This results in a simple…
▽ More
The paper considers the problem of detecting and localizing changepoints in a sequence of independent observations. We propose to evaluate a local test statistic on a triplet of time points, for each such triplet in a particular collection. This collection is sparse enough so that the results of the local tests can simply be combined with a weighted Bonferroni correction. This results in a simple and fast method, {\sl Lean Bonferroni Changepoint detection} (LBD), that provides finite sample guarantees for the existance of changepoints as well as simultaneous confidence intervals for their locations. LBD is free of tuning parameters, and we show that LBD allows optimal inference for the detection of changepoints. To this end, we provide a lower bound for the critical constant that measures the difficulty of the changepoint detection problem, and we show that LBD attains this critical constant. We illustrate LBD for a number of distributional settings, namely when the observations are homoscedastic normal with known or unknown variance, for observations from a natural exponential family, and in a nonparametric setting where we assume only exchangeability for segments without a changepoint.
△ Less
Submitted 18 October, 2024;
originally announced October 2024.
-
Tight Distribution-Free Confidence Intervals for Local Quantile Regression
Authors:
Jayoon Jang,
Emmanuel Candès
Abstract:
It is well known that it is impossible to construct useful confidence intervals (CIs) about the mean or median of a response $Y$ conditional on features $X = x$ without making strong assumptions about the joint distribution of $X$ and $Y$. This paper introduces a new framework for reasoning about problems of this kind by casting the conditional problem at different levels of resolution, ranging fr…
▽ More
It is well known that it is impossible to construct useful confidence intervals (CIs) about the mean or median of a response $Y$ conditional on features $X = x$ without making strong assumptions about the joint distribution of $X$ and $Y$. This paper introduces a new framework for reasoning about problems of this kind by casting the conditional problem at different levels of resolution, ranging from coarse to fine localization. In each of these problems, we consider local quantiles defined as the marginal quantiles of $Y$ when $(X,Y)$ is resampled in such a way that samples $X$ near $x$ are up-weighted while the conditional distribution $Y \mid X$ does not change. We then introduce the Weighted Quantile method, which asymptotically produces the uniformly most accurate confidence intervals for these local quantiles no matter the (unknown) underlying distribution. Another method, namely, the Quantile Rejection method, achieves finite sample validity under no assumption whatsoever. We conduct extensive numerical studies demonstrating that both of these methods are valid. In particular, we show that the Weighted Quantile procedure achieves nominal coverage as soon as the effective sample size is in the range of 10 to 20.
△ Less
Submitted 26 January, 2024; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Credal Bayesian Deep Learning
Authors:
Michele Caprio,
Souradeep Dutta,
Kuk Jin Jang,
Vivian Lin,
Radoslav Ivanov,
Oleg Sokolsky,
Insup Lee
Abstract:
Uncertainty quantification and robustness to distribution shifts are important goals in machine learning and artificial intelligence. Although Bayesian Neural Networks (BNNs) allow for uncertainty in the predictions to be assessed, different sources of predictive uncertainty cannot be distinguished properly. We present Credal Bayesian Deep Learning (CBDL). Heuristically, CBDL allows to train an (u…
▽ More
Uncertainty quantification and robustness to distribution shifts are important goals in machine learning and artificial intelligence. Although Bayesian Neural Networks (BNNs) allow for uncertainty in the predictions to be assessed, different sources of predictive uncertainty cannot be distinguished properly. We present Credal Bayesian Deep Learning (CBDL). Heuristically, CBDL allows to train an (uncountably) infinite ensemble of BNNs, using only finitely many elements. This is possible thanks to prior and likelihood finitely generated credal sets (FGCSs), a concept from the imprecise probability literature. Intuitively, convex combinations of a finite collection of prior-likelihood pairs are able to represent infinitely many such pairs. After training, CBDL outputs a set of posteriors on the parameters of the neural network. At inference time, such posterior set is used to derive a set of predictive distributions that is in turn utilized to distinguish between (predictive) aleatoric and epistemic uncertainties, and to quantify them. The predictive set also produces either (i) a collection of outputs enjoying desirable probabilistic guarantees, or (ii) the single output that is deemed the best, that is, the one having the highest predictive lower probability -- another imprecise-probabilistic concept. CBDL is more robust than single BNNs to prior and likelihood misspecification, and to distribution shift. We show that CBDL is better at quantifying and disentangling different types of (predictive) uncertainties than single BNNs and ensemble of BNNs. In addition, we apply CBDL to two case studies to demonstrate its downstream tasks capabilities: one, for motion prediction in autonomous driving scenarios, and two, to model blood glucose and insulin dynamics for artificial pancreas control. We show that CBDL performs better when compared to an ensemble of BNNs baseline.
△ Less
Submitted 22 October, 2024; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Time-Aware Tensor Decomposition for Missing Entry Prediction
Authors:
Dawon Ahn,
Jun-Gi Jang,
U Kang
Abstract:
Given a time-evolving tensor with missing entries, how can we effectively factorize it for precisely predicting the missing entries? Tensor factorization has been extensively utilized for analyzing various multi-dimensional real-world data. However, existing models for tensor factorization have disregarded the temporal property for tensor factorization while most real-world data are closely relate…
▽ More
Given a time-evolving tensor with missing entries, how can we effectively factorize it for precisely predicting the missing entries? Tensor factorization has been extensively utilized for analyzing various multi-dimensional real-world data. However, existing models for tensor factorization have disregarded the temporal property for tensor factorization while most real-world data are closely related to time. Moreover, they do not address accuracy degradation due to the sparsity of time slices. The essential problems of how to exploit the temporal property for tensor decomposition and consider the sparsity of time slices remain unresolved. In this paper, we propose TATD (Time-Aware Tensor Decomposition), a novel tensor decomposition method for real-world temporal tensors. TATD is designed to exploit temporal dependency and time-varying sparsity of real-world temporal tensors. We propose a new smoothing regularization with Gaussian kernel for modeling time dependency. Moreover, we improve the performance of TATD by considering time-varying sparsity. We design an alternating optimization scheme suitable for temporal tensor factorization with our smoothing regularization. Extensive experiments show that TATD provides the state-of-the-art accuracy for decomposing temporal tensors.
△ Less
Submitted 16 December, 2020;
originally announced December 2020.
-
Systemic Risk in Market Microstructure of Crude Oil and Gasoline Futures Prices: A Hawkes Flocking Model Approach
Authors:
Hyun Jin Jang,
Kiseop Lee,
Kyungsub Lee
Abstract:
We propose the Hawkes flocking model that assesses systemic risk in high-frequency processes at the two perspectives -- endogeneity and interactivity. We examine the futures markets of WTI crude oil and gasoline for the past decade, and perform a comparative analysis with conditional value-at-risk as a benchmark measure. In terms of high-frequency structure, we derive the empirical findings. The e…
▽ More
We propose the Hawkes flocking model that assesses systemic risk in high-frequency processes at the two perspectives -- endogeneity and interactivity. We examine the futures markets of WTI crude oil and gasoline for the past decade, and perform a comparative analysis with conditional value-at-risk as a benchmark measure. In terms of high-frequency structure, we derive the empirical findings. The endogenous systemic risk in WTI was significantly higher than that in gasoline, and the level at which gasoline affects WTI was constantly higher than in the opposite case. Moreover, although the relative influence's degree was asymmetric, its difference has gradually reduced.
△ Less
Submitted 7 December, 2020;
originally announced December 2020.
-
Online Class-Incremental Continual Learning with Adversarial Shapley Value
Authors:
Dongsub Shim,
Zheda Mai,
Jihwan Jeong,
Scott Sanner,
Hyunwoo Kim,
Jongseong Jang
Abstract:
As image-based deep learning becomes pervasive on every device, from cell phones to smart watches, there is a growing need to develop methods that continually learn from data while minimizing memory footprint and power consumption. While memory replay techniques have shown exceptional promise for this task of continual learning, the best method for selecting which buffered images to replay is stil…
▽ More
As image-based deep learning becomes pervasive on every device, from cell phones to smart watches, there is a growing need to develop methods that continually learn from data while minimizing memory footprint and power consumption. While memory replay techniques have shown exceptional promise for this task of continual learning, the best method for selecting which buffered images to replay is still an open question. In this paper, we specifically focus on the online class-incremental setting where a model needs to learn new classes continually from an online data stream. To this end, we contribute a novel Adversarial Shapley value scoring method that scores memory data samples according to their ability to preserve latent decision boundaries for previously observed classes (to maintain learning stability and avoid forgetting) while interfering with latent decision boundaries of current classes being learned (to encourage plasticity and optimal learning of new class boundaries). Overall, we observe that our proposed ASER method provides competitive or improved performance compared to state-of-the-art replay-based continual learning methods on a variety of datasets.
△ Less
Submitted 22 March, 2021; v1 submitted 31 August, 2020;
originally announced September 2020.
-
Fast Partial Fourier Transform
Authors:
Yong-chan Park,
Jun-Gi Jang,
U Kang
Abstract:
Given a time series vector, how can we efficiently compute a specified part of Fourier coefficients? Fast Fourier transform (FFT) is a widely used algorithm that computes the discrete Fourier transform in many machine learning applications. Despite its pervasive use, all known FFT algorithms do not provide a fine-tuning option for the user to specify one's demand, that is, the output size (the num…
▽ More
Given a time series vector, how can we efficiently compute a specified part of Fourier coefficients? Fast Fourier transform (FFT) is a widely used algorithm that computes the discrete Fourier transform in many machine learning applications. Despite its pervasive use, all known FFT algorithms do not provide a fine-tuning option for the user to specify one's demand, that is, the output size (the number of Fourier coefficients to be computed) is algorithmically determined by the input size. This matters because not every application using FFT requires the whole spectrum of the frequency domain, resulting in an inefficiency due to extra computation. In this paper, we propose a fast Partial Fourier Transform (PFT), a careful modification of the Cooley-Tukey algorithm that enables one to specify an arbitrary consecutive range where the coefficients should be computed. We derive the asymptotic time complexity of PFT with respect to input and output sizes, as well as its numerical accuracy. Experimental results show that our algorithm outperforms the state-of-the-art FFT algorithms, with an order of magnitude of speedup for sufficiently small output sizes without sacrificing accuracy.
△ Less
Submitted 28 August, 2020;
originally announced August 2020.
-
A Bivariate Compound Dynamic Contagion Process for Cyber Insurance
Authors:
Jiwook Jang,
Rosy Oh
Abstract:
As corporates and governments become more digital, they become vulnerable to various forms of cyber attack. Cyber insurance products have been used as risk management tools, yet their pricing does not reflect actual risk, including that of multiple, catastrophic and contagious losses. For the modelling of aggregate losses from cyber events, in this paper we introduce a bivariate compound dynamic c…
▽ More
As corporates and governments become more digital, they become vulnerable to various forms of cyber attack. Cyber insurance products have been used as risk management tools, yet their pricing does not reflect actual risk, including that of multiple, catastrophic and contagious losses. For the modelling of aggregate losses from cyber events, in this paper we introduce a bivariate compound dynamic contagion process, where the bivariate dynamic contagion process is a point process that includes both externally excited joint jumps, which are distributed according to a shot noise Cox process and two separate self-excited jumps, which are distributed according to the branching structure of a Hawkes process with an exponential fertility rate, respectively. We analyse the theoretical distributional properties for these processes systematically, based on the piecewise deterministic Markov process developed by Davis (1984) and the univariate dynamic contagion process theory developed by Dassios and Zhao (2011). The analytic expression of the Laplace transform of the compound process and its moments are presented, which have the potential to be applicable to a variety of problems in credit, insurance, market and other operational risks. As an application of this process, we provide insurance premium calculations based on its moments. Numerical examples show that this compound process can be used for the modelling of aggregate losses from cyber events. We also provide the simulation algorithm for statistical analysis, further business applications and research.
△ Less
Submitted 12 June, 2020;
originally announced July 2020.
-
Faster DBSCAN via subsampled similarity queries
Authors:
Heinrich Jiang,
Jennifer Jang,
Jakub Łącki
Abstract:
DBSCAN is a popular density-based clustering algorithm. It computes the $ε$-neighborhood graph of a dataset and uses the connected components of the high-degree nodes to decide the clusters. However, the full neighborhood graph may be too costly to compute with a worst-case complexity of $O(n^2)$. In this paper, we propose a simple variant called SNG-DBSCAN, which clusters based on a subsampled…
▽ More
DBSCAN is a popular density-based clustering algorithm. It computes the $ε$-neighborhood graph of a dataset and uses the connected components of the high-degree nodes to decide the clusters. However, the full neighborhood graph may be too costly to compute with a worst-case complexity of $O(n^2)$. In this paper, we propose a simple variant called SNG-DBSCAN, which clusters based on a subsampled $ε$-neighborhood graph, only requires access to similarity queries for pairs of points and in particular avoids any complex data structures which need the embeddings of the data points themselves. The runtime of the procedure is $O(sn^2)$, where $s$ is the sampling rate. We show under some natural theoretical assumptions that $s \approx \log n/n$ is sufficient for statistical cluster recovery guarantees leading to an $O(n\log n)$ complexity. We provide an extensive experimental analysis showing that on large datasets, one can subsample as little as $0.1\%$ of the neighborhood graph, leading to as much as over 200x speedup and 250x reduction in RAM consumption compared to scikit-learn's implementation of DBSCAN, while still maintaining competitive clustering performance.
△ Less
Submitted 21 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation
Authors:
Seungjae Shin,
Kyungwoo Song,
JoonHo Jang,
Hyemi Kim,
Weonyoung Joo,
Il-Chul Moon
Abstract:
Recent research demonstrates that word embeddings, trained on the human-generated corpus, have strong gender biases in embedding spaces, and these biases can result in the discriminative results from the various downstream tasks. Whereas the previous methods project word embeddings into a linear subspace for debiasing, we introduce a \textit{Latent Disentanglement} method with a siamese auto-encod…
▽ More
Recent research demonstrates that word embeddings, trained on the human-generated corpus, have strong gender biases in embedding spaces, and these biases can result in the discriminative results from the various downstream tasks. Whereas the previous methods project word embeddings into a linear subspace for debiasing, we introduce a \textit{Latent Disentanglement} method with a siamese auto-encoder structure with an adapted gradient reversal layer. Our structure enables the separation of the semantic latent information and gender latent information of given word into the disjoint latent dimensions. Afterwards, we introduce a \textit{Counterfactual Generation} to convert the gender information of words, so the original and the modified embeddings can produce a gender-neutralized word embedding after geometric alignment regularization, without loss of semantic information. From the various quantitative and qualitative debiasing experiments, our method shows to be better than existing debiasing methods in debiasing word embeddings. In addition, Our method shows the ability to preserve semantic information during debiasing by minimizing the semantic information losses for extrinsic NLP downstream tasks.
△ Less
Submitted 3 November, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Orthogonality Constrained Multi-Head Attention For Keyword Spotting
Authors:
Mingu Lee,
Jinkyu Lee,
Hye Jin Jang,
Byeonggeun Kim,
Wonil Chang,
Kyuwoong Hwang
Abstract:
Multi-head attention mechanism is capable of learning various representations from sequential data while paying attention to different subsequences, e.g., word-pieces or syllables in a spoken word. From the subsequences, it retrieves richer information than a single-head attention which only summarizes the whole sequence into one context vector. However, a naive use of the multi-head attention doe…
▽ More
Multi-head attention mechanism is capable of learning various representations from sequential data while paying attention to different subsequences, e.g., word-pieces or syllables in a spoken word. From the subsequences, it retrieves richer information than a single-head attention which only summarizes the whole sequence into one context vector. However, a naive use of the multi-head attention does not guarantee such richness as the attention heads may have positional and representational redundancy. In this paper, we propose a regularization technique for multi-head attention mechanism in an end-to-end neural keyword spotting system. Augmenting regularization terms which penalize positional and contextual non-orthogonality between the attention heads encourages to output different representations from separate subsequences, which in turn enables leveraging structured information without explicit sequence models such as hidden Markov models. In addition, intra-head contextual non-orthogonality regularization encourages each attention head to have similar representations across keyword examples, which helps classification by reducing feature variability. The experimental results demonstrate that the proposed regularization technique significantly improves the keyword spotting performance for the keyword "Hey Snapdragon".
△ Less
Submitted 10 October, 2019;
originally announced October 2019.
-
Bivariate Beta-LSTM
Authors:
Kyungwoo Song,
JoonHo Jang,
Seung jae Shin,
Il-Chul Moon
Abstract:
Long Short-Term Memory (LSTM) infers the long term dependency through a cell state maintained by the input and the forget gate structures, which models a gate output as a value in [0,1] through a sigmoid function. However, due to the graduality of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. Besides, the previous models lack modeling on the cor…
▽ More
Long Short-Term Memory (LSTM) infers the long term dependency through a cell state maintained by the input and the forget gate structures, which models a gate output as a value in [0,1] through a sigmoid function. However, due to the graduality of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. Besides, the previous models lack modeling on the correlation between the gates, which would be a new method to adopt inductive bias for a relationship between previous and current input. This paper proposes a new gate structure with the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling on the gates within the LSTM cell so that the modelers can customize the cell state flow with priors and distributions. Moreover, we theoretically show the higher upper bound of the gradient compared to the sigmoid function, and we empirically observed that the bivariate Beta distribution gate structure provides higher gradient values in training. We demonstrate the effectiveness of bivariate Beta gate structure on the sentence classification, image classification, polyphonic music modeling, and image caption generation.
△ Less
Submitted 16 November, 2019; v1 submitted 25 May, 2019;
originally announced May 2019.
-
k-Same-Siamese-GAN: k-Same Algorithm with Generative Adversarial Network for Facial Image De-identification with Hyperparameter Tuning and Mixed Precision Training
Authors:
Yi-Lun Pan,
Min-Jhih Huang,
Kuo-Teng Ding,
Ja-Ling Wu,
Jyh-Shing Jang
Abstract:
For a data holder, such as a hospital or a government entity, who has a privately held collection of personal data, in which the revealing and/or processing of the personal identifiable data is restricted and prohibited by law. Then, "how can we ensure the data holder does conceal the identity of each individual in the imagery of personal data while still preserving certain useful aspects of the d…
▽ More
For a data holder, such as a hospital or a government entity, who has a privately held collection of personal data, in which the revealing and/or processing of the personal identifiable data is restricted and prohibited by law. Then, "how can we ensure the data holder does conceal the identity of each individual in the imagery of personal data while still preserving certain useful aspects of the data after de-identification?" becomes a challenge issue. In this work, we propose an approach towards high-resolution facial image de-identification, called k-Same-Siamese-GAN, which leverages the k-Same-Anonymity mechanism, the Generative Adversarial Network, and the hyperparameter tuning methods. Moreover, to speed up model training and reduce memory consumption, the mixed precision training technique is also applied to make kSS-GAN provide guarantees regarding privacy protection on close-form identities and be trained much more efficiently as well. Finally, to validate its applicability, the proposed work has been applied to actual datasets - RafD and CelebA for performance testing. Besides protecting privacy of high-resolution facial images, the proposed system is also justified for its ability in automating parameter tuning and breaking through the limitation of the number of adjustable parameters.
△ Less
Submitted 17 September, 2019; v1 submitted 27 March, 2019;
originally announced April 2019.
-
DBSCAN++: Towards fast and scalable density clustering
Authors:
Jennifer Jang,
Heinrich Jiang
Abstract:
DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. W…
▽ More
DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the trade-off between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.
△ Less
Submitted 17 May, 2019; v1 submitted 31 October, 2018;
originally announced October 2018.
-
Deep Asymmetric Networks with a Set of Node-wise Variant Activation Functions
Authors:
Jinhyeok Jang,
Hyunjoong Cho,
Jaehong Kim,
Jaeyeon Lee,
Seungjoon Yang
Abstract:
This work presents deep asymmetric networks with a set of node-wise variant activation functions. The nodes' sensitivities are affected by activation function selections such that the nodes with smaller indices become increasingly more sensitive. As a result, features learned by the nodes are sorted by the node indices in the order of their importance. Asymmetric networks not only learn input feat…
▽ More
This work presents deep asymmetric networks with a set of node-wise variant activation functions. The nodes' sensitivities are affected by activation function selections such that the nodes with smaller indices become increasingly more sensitive. As a result, features learned by the nodes are sorted by the node indices in the order of their importance. Asymmetric networks not only learn input features but also the importance of those features. Nodes of lesser importance in asymmetric networks can be pruned to reduce the complexity of the networks, and the pruned networks can be retrained without incurring performance losses. We validate the feature-sorting property using both shallow and deep asymmetric networks as well as deep asymmetric networks transferred from famous networks.
△ Less
Submitted 17 May, 2019; v1 submitted 11 September, 2018;
originally announced September 2018.
-
Security and Privacy Issues in Deep Learning
Authors:
Ho Bae,
Jaehee Jang,
Dahuin Jung,
Hyemi Jang,
Heonseok Ha,
Hyungyu Lee,
Sungroh Yoon
Abstract:
To promote secure and private artificial intelligence (SPAI), we review studies on the model security and data privacy of DNNs. Model security allows system to behave as intended without being affected by malicious external influences that can compromise its integrity and efficiency. Security attacks can be divided based on when they occur: if an attack occurs during training, it is known as a poi…
▽ More
To promote secure and private artificial intelligence (SPAI), we review studies on the model security and data privacy of DNNs. Model security allows system to behave as intended without being affected by malicious external influences that can compromise its integrity and efficiency. Security attacks can be divided based on when they occur: if an attack occurs during training, it is known as a poisoning attack, and if it occurs during inference (after training) it is termed an evasion attack. Poisoning attacks compromise the training process by corrupting the data with malicious examples, while evasion attacks use adversarial examples to disrupt entire classification process. Defenses proposed against such attacks include techniques to recognize and remove malicious data, train a model to be insensitive to such data, and mask the model's structure and parameters to render attacks more challenging to implement. Furthermore, the privacy of the data involved in model training is also threatened by attacks such as the model-inversion attack, or by dishonest service providers of AI applications. To maintain data privacy, several solutions that combine existing data-privacy techniques have been proposed, including differential privacy and modern cryptography techniques. In this paper, we describe the notions of some of methods, e.g., homomorphic encryption, and review their advantages and challenges when implemented in deep-learning models.
△ Less
Submitted 9 March, 2021; v1 submitted 31 July, 2018;
originally announced July 2018.
-
Backpropagation with N-D Vector-Valued Neurons Using Arbitrary Bilinear Products
Authors:
Zhe-Cheng Fan,
Tak-Shing T. Chan,
Yi-Hsuan Yang,
Jyh-Shing R. Jang
Abstract:
Vector-valued neural learning has emerged as a promising direction in deep learning recently. Traditionally, training data for neural networks (NNs) are formulated as a vector of scalars; however, its performance may not be optimal since associations among adjacent scalars are not modeled. In this paper, we propose a new vector neural architecture called the Arbitrary BIlinear Product Neural Netwo…
▽ More
Vector-valued neural learning has emerged as a promising direction in deep learning recently. Traditionally, training data for neural networks (NNs) are formulated as a vector of scalars; however, its performance may not be optimal since associations among adjacent scalars are not modeled. In this paper, we propose a new vector neural architecture called the Arbitrary BIlinear Product Neural Network (ABIPNN), which processes information as vectors in each neuron, and the feedforward projections are defined using arbitrary bilinear products. Such bilinear products can include circular convolution, seven-dimensional vector product, skew circular convolution, reversed- time circular convolution, or other new products not seen in previous work. As a proof-of-concept, we apply our proposed network to multispectral image denoising and singing voice sepa- ration. Experimental results show that ABIPNN gains substantial improvements when compared to conventional NNs, suggesting that associations are learned during training.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA
Authors:
Seongsik Park,
Jaehee Jang,
Seijoon Kim,
Sungroh Yoon
Abstract:
Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array…
▽ More
Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding
△ Less
Submitted 11 February, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Quickshift++: Provably Good Initializations for Sample-Based Mean Shift
Authors:
Heinrich Jiang,
Jennifer Jang,
Samory Kpotufe
Abstract:
We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data. Such seedings act as more stable and expressive cluster-cores than the singleton modes found by Quick Shift. We establish statistical consistency guarantees for this modification. We then show strong clustering performance on real datasets as well as promising applic…
▽ More
We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data. Such seedings act as more stable and expressive cluster-cores than the singleton modes found by Quick Shift. We establish statistical consistency guarantees for this modification. We then show strong clustering performance on real datasets as well as promising applications to image segmentation.
△ Less
Submitted 21 May, 2018;
originally announced May 2018.
-
Automatic Estimation of Fetal Abdominal Circumference from Ultrasound Images
Authors:
Jaeseong Jang,
Yejin Park,
Bukweon Kim,
Sung Min Lee,
Ja-Young Kwon,
Jin Keun Seo
Abstract:
Ultrasound diagnosis is routinely used in obstetrics and gynecology for fetal biometry, and owing to its time-consuming process, there has been a great demand for automatic estimation. However, the automated analysis of ultrasound images is complicated because they are patient-specific, operator-dependent, and machine-specific. Among various types of fetal biometry, the accurate estimation of abdo…
▽ More
Ultrasound diagnosis is routinely used in obstetrics and gynecology for fetal biometry, and owing to its time-consuming process, there has been a great demand for automatic estimation. However, the automated analysis of ultrasound images is complicated because they are patient-specific, operator-dependent, and machine-specific. Among various types of fetal biometry, the accurate estimation of abdominal circumference (AC) is especially difficult to perform automatically because the abdomen has low contrast against surroundings, non-uniform contrast, and irregular shape compared to other parameters.We propose a method for the automatic estimation of the fetal AC from 2D ultrasound data through a specially designed convolutional neural network (CNN), which takes account of doctors' decision process, anatomical structure, and the characteristics of the ultrasound image. The proposed method uses CNN to classify ultrasound images (stomach bubble, amniotic fluid, and umbilical vein) and Hough transformation for measuring AC. We test the proposed method using clinical ultrasound data acquired from 56 pregnant women. Experimental results show that, with relatively small training samples, the proposed CNN provides sufficient classification results for AC estimation through the Hough transformation. The proposed method automatically estimates AC from ultrasound images. The method is quantitatively evaluated, and shows stable performance in most cases and even for ultrasound images deteriorated by shadowing artifacts. As a result of experiments for our acceptance check, the accuracies are 0.809 and 0.771 with the expert 1 and expert 2, respectively, while the accuracy between the two experts is 0.905. However, for cases of oversized fetus, when the amniotic fluid is not observed or the abdominal area is distorted, it could not correctly estimate AC.
△ Less
Submitted 2 November, 2017; v1 submitted 9 February, 2017;
originally announced February 2017.