
fastrerandomize: Fast Rerandomization Using Accelerated Computing

Authors: Rebecca Goldstein, Connor T. Jerzak, Aniket Kamat, Fucheng Warren Zhu

Abstract: We introduce fastrerandomize, an R package that implements novel algorithmic approaches to rerandomization in experimental design. Rerandomization improves precision by discarding treatment assignments until covariate balance meets predefined thresholds, but existing implementations often struggle with computational demands in large-scale settings. fastrerandomize addresses these limitations through three key innovations: (1) optional hardware acceleration via GPU/TPU backends, (2) memory-efficient key-based storage that avoids explicit randomization storage, and (3) computational optimizations through auto-vectorization and just-in-time compilation. This algorithmic framework enables exact or Monte Carlo generation of rerandomized designs even with billions of candidate randomizations and stringent balance thresholds. Simulations demonstrate substantial performance gains over existing implementations (greater than 34-fold speedup reported here), particularly in high-dimensional settings. By integrating modern computational techniques with principled statistical methods, fastrerandomize extends the practical applicability of rerandomization to experimental designs that were previously computationally intractable.
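To make the accept/reject idea and the key-based storage idea concrete, here is a minimal Monte Carlo rerandomization sketch in Python. It is not the fastrerandomize API: `draw_assignment`, `max_abs_smd`, and `rerandomize` are hypothetical names. The balance criterion is assumed to be the maximum absolute standardized mean difference across covariates, a simplification of the Mahalanobis-distance criterion more often used. Python's `random.Random` seeds stand in for accelerator PRNG keys. Only the integer key of each accepted candidate is stored, and the assignment is regenerated from its key on demand.

```python
import random
from statistics import mean, pstdev

def draw_assignment(key, n_units, n_treated):
    """Regenerate a complete-randomization assignment from an integer key."""
    rng = random.Random(key)
    treated = set(rng.sample(range(n_units), n_treated))
    return [1 if i in treated else 0 for i in range(n_units)]

def max_abs_smd(X, z):
    """Largest absolute standardized mean difference across covariate columns."""
    worst = 0.0
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        sd = pstdev(col)
        if sd == 0:
            continue  # a constant covariate cannot be imbalanced
        treated = [c for c, zi in zip(col, z) if zi == 1]
        control = [c for c, zi in zip(col, z) if zi == 0]
        worst = max(worst, abs(mean(treated) - mean(control)) / sd)
    return worst

def rerandomize(X, n_treated, threshold, n_candidates, seed=0):
    """Return the keys (not the assignments) of candidates meeting the threshold."""
    key_rng = random.Random(seed)
    accepted = []
    for _ in range(n_candidates):
        key = key_rng.getrandbits(63)
        z = draw_assignment(key, len(X), n_treated)
        if max_abs_smd(X, z) <= threshold:
            accepted.append(key)
    return accepted

# Usage: 20 units with two Gaussian covariates, 10 treated per design.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
keys = rerandomize(X, n_treated=10, threshold=0.2, n_candidates=2000)
designs = [draw_assignment(k, len(X), 10) for k in keys]  # regenerated on demand
```

Storing one integer per accepted design instead of a length-n vector is what keeps memory flat as the number of candidates grows. Regeneration is cheap because the assignment is a deterministic function of its key.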

Submitted 14 April, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

Comments: 38 pages, 10 figures

MSC Class: 62K10; 65C60 ACM Class: G.3; G.4

arXiv:2411.02134 [pdf, other]

Optimizing Multi-Scale Representations to Detect Effect Heterogeneity Using Earth Observation and Computer Vision: Applications to Two Anti-Poverty RCTs

Authors: Fucheng Warren Zhu, Connor T. Jerzak, Adel Daoud

Abstract: Earth Observation (EO) data are increasingly used in policy analysis by enabling granular estimation of conditional average treatment effects (CATE). However, a challenge in EO-based causal inference is determining the scale of the input satellite imagery -- balancing the trade-off between capturing fine-grained individual heterogeneity in smaller images and broader contextual information in larger ones. This paper introduces Multi-Scale Representation Concatenation, a set of composable procedures that transform arbitrary single-scale EO-based CATE estimation algorithms into multi-scale ones. We benchmark the performance of Multi-Scale Representation Concatenation on a CATE estimation pipeline that combines Vision Transformer (ViT) models (which encode images) with Causal Forests (CFs) to obtain CATE estimates from those encodings. We first perform simulation studies where the causal mechanism is known, showing that our multi-scale approach captures information relevant to effect heterogeneity that single-scale ViT models fail to capture as measured by $R^2$. We then apply the multi-scale method to two randomized controlled trials (RCTs) conducted in Peru and Uganda using Landsat satellite imagery. As we do not have access to ground truth CATEs in the RCT analysis, the Rank Average Treatment Effect Ratio (RATE Ratio) measure is employed to assess performance.
Results indicate that Multi-Scale Representation Concatenation improves the performance of deep learning models in EO-based CATE estimation without the complexity of designing new multi-scale architectures for a specific use case. The application of Multi-Scale Representation Concatenation could have meaningful policy benefits -- e.g., potentially increasing the impact of poverty alleviation programs without additional resource expenditure.
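The core concatenation step can be illustrated with a small Python sketch: encode the same image at several crop sizes with one single-scale encoder, then join the embeddings into one feature vector. This is an illustration under stated assumptions, not the authors' code. Centered square crops are assumed as the notion of "scale". `toy_encoder` is a hypothetical stand-in for a ViT that returns the mean and standard deviation of pixel values. `center_crop` and `multi_scale_embedding` are hypothetical helper names.

```python
from statistics import mean, pstdev

def center_crop(image, size):
    """Return the central size x size window of a square 2D list."""
    n = len(image)
    start = (n - size) // 2
    return [row[start:start + size] for row in image[start:start + size]]

def toy_encoder(patch):
    """Stand-in for a ViT: summarize a patch by the mean and std of its pixels."""
    pixels = [p for row in patch for p in row]
    return [mean(pixels), pstdev(pixels)]

def multi_scale_embedding(image, sizes, encoder=toy_encoder):
    """Concatenate single-scale embeddings computed at several crop sizes."""
    embedding = []
    for size in sizes:
        embedding.extend(encoder(center_crop(image, size)))
    return embedding

# Usage: an 8x8 synthetic "image" encoded at three scales.
image = [[(r * 8 + c) % 7 for c in range(8)] for r in range(8)]
features = multi_scale_embedding(image, sizes=[2, 4, 8])  # 3 scales x 2 dims
```

In the pipeline described above, a vector like `features` would be passed to a downstream CATE learner such as a causal forest. The single-scale encoder itself is left unchanged, which is what makes the procedure composable.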

Submitted 15 March, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

Comments: To appear in: Conference on Causal Learning and Reasoning, 2025

ACM Class: I.4.7; I.4.9

arXiv:2403.00224 [pdf, other]

Tobit models for count time series

Authors: Christian H. Weiß, Fukang Zhu

Abstract: Several models for count time series have been developed during the last decades, often inspired by traditional autoregressive moving average (ARMA) models for real-valued time series, including integer-valued ARMA (INARMA) and integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH) models. Both INARMA and INGARCH models exhibit an ARMA-like autocorrelation function (ACF). To achieve negative ACF values within the class of INGARCH models, log and softplus link functions are suggested in the literature, where the softplus approach leads to conditional linearity in good approximation. However, the softplus approach is limited to the INGARCH family for unbounded counts, i.e. it can neither be used for bounded counts, nor for count processes from the INARMA family. In this paper, we present an alternative solution, named the Tobit approach, for achieving approximate linearity together with negative ACF values, which is more generally applicable than the softplus approach. A Skellam--Tobit INGARCH model for unbounded counts is studied in detail, including stationarity, approximate computation of moments, maximum likelihood and censored least absolute deviations estimation for unknown parameters and corresponding simulations. Extensions of the Tobit approach to other situations are also discussed, including underlying discrete distributions, INAR models, and bounded counts. Three real-data examples are considered to illustrate the usefulness of the new approach.
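The sketch below shows why censoring can give approximate linearity, the role the softplus link plays in the literature. It computes $E[\max(0, Y)]$ for a Skellam variable $Y = N_1 - N_2$ with independent Poisson $N_1, N_2$. Two points are assumptions, not details from the paper: censoring is taken to happen at zero, and only this censored mean is shown, not the Skellam--Tobit INGARCH recursion. The helper names and the truncation level `max_count` are hypothetical.

```python
from math import exp, lgamma, log

def poisson_pmf(k, lam):
    """Poisson probability mass at k, computed on the log scale for stability."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def censored_skellam_mean(lam1, lam2, max_count=200):
    """E[max(0, N1 - N2)] for independent N1 ~ Poi(lam1), N2 ~ Poi(lam2).

    Both sums are truncated at max_count, which is ample for moderate rates.
    """
    p2 = [poisson_pmf(k, lam2) for k in range(max_count + 1)]
    total = 0.0
    for n1 in range(1, max_count + 1):
        p1 = poisson_pmf(n1, lam1)
        for n2 in range(n1):  # only n1 > n2 yields a positive censored value
            total += (n1 - n2) * p1 * p2[n2]
    return total

# Usage: the latent mean mu = lam1 - lam2 ranges from negative to large.
curve = {mu: censored_skellam_mean(3 + mu, 3) for mu in (-2, 0, 2, 10)}
```

For a large positive latent mean, censoring rarely binds and the output is close to linear in $\mu$. For a negative latent mean, the output stays strictly positive, much as a softplus link behaves.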

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.15772 [pdf, other]

Mean-preserving rounding integer-valued ARMA models

Authors: Christian H. Weiß, Fukang Zhu

Abstract: In the past four decades, research on count time series has made significant progress, but research on $\mathbb{Z}$-valued time series is relatively rare. Existing $\mathbb{Z}$-valued models are mainly of autoregressive structure, where the use of the rounding operator is very natural. Because of the discontinuity of the rounding operator, the formulation of the corresponding model identifiability conditions and the computation of parameter estimators need special attention. It is also difficult to derive closed-form formulae for crucial stochastic properties. We rediscover a stochastic rounding operator, referred to as mean-preserving rounding, which overcomes the above drawbacks. Then, a novel class of $\mathbb{Z}$-valued ARMA models based on the new operator is proposed, and the existence of stationary solutions of the models is established. Stochastic properties including closed-form formulae for (conditional) moments, autocorrelation function, and conditional distributions are obtained. The advantages of our novel model class compared to existing ones are demonstrated. In particular, our model construction avoids identifiability issues such that maximum likelihood estimation is possible. A simulation study is provided, and the appealing performance of the new models is shown by several real-world data sets.
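The following Python sketch assumes that mean-preserving rounding is the standard stochastic rounding rule. Under that rule, $x$ is rounded up to $\lfloor x \rfloor + 1$ with probability $x - \lfloor x \rfloor$ and down to $\lfloor x \rfloor$ otherwise, so the expected rounded value equals $x$ exactly. Deterministic rounding does not have this property. The function names are hypothetical.

```python
import math
import random

def mp_round_distribution(x):
    """Outcomes and probabilities: floor(x) w.p. 1 - frac, floor(x) + 1 w.p. frac."""
    lower = math.floor(x)
    frac = x - lower
    return {lower: 1.0 - frac, lower + 1: frac}

def mp_round(x, rng=random):
    """Draw one stochastic rounding of x; its expectation is exactly x."""
    lower = math.floor(x)
    return lower + (1 if rng.random() < x - lower else 0)

# Usage: -1.3 rounds to -2 w.p. 0.3 and to -1 w.p. 0.7, so the mean is -1.3.
dist = mp_round_distribution(-1.3)
expected = sum(k * p for k, p in dist.items())  # -1.3 up to float error
```

Because the operator is unbiased and its output distribution has a closed form, conditional moments of a model built on it can be written down directly. This fits the abstract's point about closed-form formulae.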

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.11425 [pdf, ps, other]

Online Resource Allocation with Average Budget Constraints

Authors: Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu

Abstract: We consider the problem of online resource allocation with average budget constraints. At each time point the decision maker makes an irrevocable decision of whether to accept or reject a request before the next request arrives, with the goal of maximizing the cumulative rewards. In contrast to existing literature, which requires the total resource consumption to be below a certain level, we require that the average resource consumption per accepted request not exceed a given threshold. This problem can be cast as an online knapsack problem with exogenous random budget replenishment, and can find applications in various fields such as online anomaly detection, sequential advertising, and per-capita public service providers. We start with general arrival distributions and show that a simple policy achieves a $O(\sqrt{T})$ regret. We complement the result by showing that such a regret growth rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, albeit achieving bounded loss in canonical settings, may incur a $\Omega(\sqrt{T})$ or even a $\Omega(T)$ regret. With the observation that canonical policies tend to be too optimistic and over-accept arrivals, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency -- small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$.
From a practical perspective, we extend the policy to the scenario with continuous arrival distributions, time-dependent information structures, as well as unknown $T$. We conduct both synthetic experiments and empirical applications on time series data of New York City taxi passengers to validate the performance of our proposed policies.
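The knapsack reformulation rests on a simple identity. The average consumption per accepted request stays at or below a threshold $c$ exactly when the running budget $\sum_{\text{accepted}} (c - a_i)$ stays nonnegative. Accepting a request with $a_i < c$ therefore replenishes the budget. The Python sketch below applies this bookkeeping in a hypothetical greedy rule with a constant safety buffer. It is not the paper's re-solving policy with logarithmic buffers. `run_buffered_greedy` and its parameters are illustrative names, and the guarantee requires `buffer >= 0`.

```python
import random

def run_buffered_greedy(requests, threshold, buffer=0.0):
    """Accept a request only if the running budget stays at or above `buffer`.

    requests: iterable of (reward, consumption) pairs, seen one at a time.
    The budget sum(threshold - consumption) over accepted requests is >= 0
    exactly when the average consumption per accepted request is <= threshold.
    """
    budget = 0.0
    accepted = []
    for reward, consumption in requests:
        slack = threshold - consumption  # positive slack replenishes the budget
        if reward > 0 and budget + slack >= buffer:
            budget += slack
            accepted.append((reward, consumption))
    return accepted, budget

# Usage: 1000 random requests with consumption in [0, 2] and threshold 1.
rng = random.Random(7)
stream = [(rng.uniform(0, 1), rng.uniform(0, 2)) for _ in range(1000)]
accepted, budget = run_buffered_greedy(stream, threshold=1.0, buffer=0.5)
avg_use = sum(c for _, c in accepted) / len(accepted)
```

A larger buffer makes the rule more conservative, because it rejects more high-consumption requests early on. That is the optimism-versus-safety trade-off the abstract describes, here reduced to a single fixed parameter.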

Submitted 25 September, 2025; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2304.04341 [pdf, ps, other]

Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk

Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

Abstract: We study the trade-off between expectation and tail risk for regret distribution in the stochastic multi-armed bandit problem. We fully characterize the interplay among three desired properties for policy design: worst-case optimality, instance-dependent consistency, and light-tailed risk. We show how the order of expected regret exactly affects the decaying rate of the regret tail probability for both the worst-case and instance-dependent scenarios. A novel policy is proposed to characterize the optimal regret tail probability for any regret threshold. Concretely, for any given $\alpha\in[1/2, 1)$ and $\beta\in[0, \alpha]$, our policy achieves a worst-case expected regret of $\tilde O(T^\alpha)$ (we call it $\alpha$-optimal) and an instance-dependent expected regret of $\tilde O(T^\beta)$ (we call it $\beta$-consistent), while enjoying a probability of incurring an $\tilde O(T^\delta)$ regret ($\delta\geq\alpha$ in the worst-case scenario and $\delta\geq\beta$ in the instance-dependent scenario) that decays exponentially with a polynomial $T$ term. Such a decaying rate is proved to be the best achievable. Moreover, we discover an intrinsic gap in the optimal tail rate under the instance-dependent scenario between whether the time horizon $T$ is known a priori or not. Interestingly, when it comes to the worst-case scenario, this gap disappears. Finally, we extend our proposed policy design to (1) a stochastic multi-armed bandit setting with non-stationary baseline rewards, and (2) a stochastic linear bandit setting.
Our results reveal insights on the trade-off between regret expectation and regret tail risk for both worst-case and instance-dependent scenarios, indicating that more sub-optimality and inconsistency leave space for more light-tailed risk of incurring a large regret, and that knowing the planning horizon in advance can make a difference in alleviating tail risks.

Submitted 9 April, 2023; originally announced April 2023.

Comments: arXiv admin note: text overlap with arXiv:2206.02969

arXiv:2301.06658 [pdf, other]

Statistical inference for the logarithmic spatial heteroskedasticity model with exogenous variables

Authors: Bing Su, Fukang Zhu, Ke Zhu

Abstract: The spatial dependence in mean has been well studied by plenty of models in a large strand of literature; however, the investigation of spatial dependence in variance lags significantly behind. The existing models for the spatial dependence in variance are scarce, with neither probabilistic structure nor statistical inference procedure being explored. To circumvent this deficiency, this paper proposes a new generalized logarithmic spatial heteroscedasticity model with exogenous variables (denoted by the log-SHE model) to study the spatial dependence in variance. For the log-SHE model, its spatial near-epoch dependence (NED) property is investigated, and a systematic statistical inference procedure is provided, including the maximum likelihood and generalized method of moments estimators, the Wald, Lagrange multiplier and likelihood-ratio-type D tests for model parameter constraints, and the overidentification test for the model diagnostic checking. Using the tool of spatial NED, the asymptotics of all proposed estimators and tests are established under regular conditions. The usefulness of the proposed methodology is illustrated by simulation results and a real data example on the house selling price.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2212.05831 [pdf, other]

doi 10.1016/j.csda.2023.107885

Conditional-mean Multiplicative Operator Models for Count Time Series

Authors: Christian H. Weiß, Fukang Zhu

Abstract: Multiplicative error models (MEMs) are commonly used for real-valued time series, but they cannot be applied to discrete-valued count time series as the involved multiplication would not preserve the integer nature of the data. Thus, the concept of a multiplicative operator for counts is proposed (as well as several specific instances thereof), which are then used to develop a kind of MEMs for count time series (CMEMs). If equipped with a linear conditional mean, the resulting CMEMs are closely related to the class of so-called integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH) models and might be used as a semi-parametric extension thereof. Important stochastic properties of different types of INGARCH-CMEM as well as relevant estimation approaches are derived, namely types of quasi-maximum likelihood and weighted least squares estimation. The performance and application are demonstrated with simulations as well as with two real-world data examples.

Submitted 27 November, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: 45 pages

Journal ref: Computational Statistics & Data Analysis, 2024, 191, 107885

arXiv:2206.02969 [pdf, other]

A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits

Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

Abstract: We study the stochastic multi-armed bandit problem and design new policies that enjoy both worst-case optimality for expected regret and light-tailed risk for regret distribution. Specifically, our policy design (i) enjoys the worst-case optimality for the expected regret at order $O(\sqrt{KT\ln T})$ and (ii) has the worst-case tail probability of incurring a regret larger than any $x>0$ being upper bounded by $\exp(-\Omega(x/\sqrt{KT}))$, a rate that we prove to be best achievable with respect to $T$ for all worst-case optimal policies. Our proposed policy achieves a delicate balance between doing more exploration at the beginning of the time horizon and doing more exploitation when approaching the end, compared to standard confidence-bound-based policies. We also enhance the policy design to accommodate the "any-time" setting where $T$ is unknown a priori, and prove equivalently desired policy performances as compared to the "fixed-time" setting with known $T$. Numerical experiments are conducted to illustrate the theoretical findings. We find that from a managerial perspective, our new policy design yields better tail distributions and is preferable to celebrated policies especially when (i) there is a risk of under-estimating the volatility profile, or (ii) there is a challenge of tuning policy hyper-parameters. We conclude by extending our proposed policy design to the stochastic linear bandit setting that leads to both worst-case optimality in terms of expected regret and light-tailed risk on the regret distribution.

Submitted 22 July, 2024; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Preliminary version appeared in NeurIPS 2022

arXiv:2109.11929 [pdf, other]

Deep Bayesian Estimation for Dynamic Treatment Regimes with a Long Follow-up Time

Authors: Adi Lin, Jie Lu, Junyu Xuan, Fujin Zhu, Guangquan Zhang

Abstract: Causal effect estimation for dynamic treatment regimes (DTRs) contributes to sequential decision making. However, censoring and time-dependent confounding under DTRs are challenging, because the amount of observational data declines over time as the sample size shrinks, while the feature dimension increases over time. Long-term follow-up compounds these challenges. Another challenge is the highly complex relationships between confounders, treatments, and outcomes, which cause the traditional and commonly used linear methods to fail. We combine outcome regression models with treatment models for high dimensional features using uncensored subjects that are small in sample size and we fit deep Bayesian models for outcome regression models to reveal the complex relationships between confounders, treatments, and outcomes. Also, the developed deep Bayesian models can model uncertainty and output the prediction variance which is essential for safety-aware applications, such as self-driving cars and medical treatment design. The experimental results on medical simulations of HIV treatment show the ability of the proposed method to obtain stable and accurate dynamic causal effect estimation from observational data, especially with long-term follow-up. Our technique provides practical guidance for sequential decision making and policy-making.

Submitted 20 September, 2021; originally announced September 2021.

arXiv:2106.14813 [pdf, other]

Offline Planning and Online Learning under Recovering Rewards

Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

Abstract: Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops after it is pulled, and then non-parametrically recovers as the arm's idle time increases. With the objective of maximizing the expected cumulative reward over $T$ time periods, we design a class of "Purely Periodic Policies" that jointly set a period to pull each arm. For the proposed policies, we prove performance guarantees for both the offline problem and the online problem. For the offline problem when all model parameters are known, the proposed periodic policy obtains an approximation ratio that is at the order of $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal when $K$ grows to infinity. For the online problem when the model parameters are unknown and need to be dynamically learned, we integrate the offline periodic policy with the upper confidence bound procedure to construct an online policy. The proposed online policy is proved to approximately have $\widetilde{\mathcal O}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may shed light on broader offline planning and online learning applications with non-stationary and recovering rewards.
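Given a fixed purely periodic schedule, the following Python sketch evaluates its expected reward in a recovering-reward setting. The exponential recovery curve `recovery_reward` is a hypothetical parametric stand-in, since the paper's model is non-parametric. The function only scores a given schedule of periods and offsets; it does not choose the periods, as the paper's policy does. Each arm is assumed to start as if it had been idle for one full period, and offsets must lie in `[0, period)`.

```python
import math

def recovery_reward(mu, tau, idle):
    """Hypothetical recovery curve: expected reward rises toward mu with idle time."""
    return mu * (1.0 - math.exp(-idle / tau))

def simulate_periodic(arms, periods, offsets, horizon, k_max):
    """Total expected reward of pulling arm i whenever t % periods[i] == offsets[i].

    arms: list of (mu, tau) pairs. Raises ValueError if more than k_max arms
    are due in the same period, since at most k_max pulls are allowed.
    """
    last_pull = [offsets[i] - periods[i] for i in range(len(arms))]
    total = 0.0
    for t in range(horizon):
        due = [i for i in range(len(arms)) if t % periods[i] == offsets[i]]
        if len(due) > k_max:
            raise ValueError(f"{len(due)} pulls scheduled at t={t}, limit is {k_max}")
        for i in due:
            mu, tau = arms[i]
            total += recovery_reward(mu, tau, t - last_pull[i])
            last_pull[i] = t
    return total

# Usage: two arms alternating (arm 0 on even periods, arm 1 on odd), K = 1.
schedule_value = simulate_periodic(
    [(1.0, 2.0), (0.5, 1.0)], periods=[2, 2], offsets=[0, 1], horizon=10, k_max=1
)
```

Under a periodic schedule, each arm's idle time at every pull equals its period. The schedule's value therefore depends only on the chosen periods, which is what makes the offline problem tractable to analyze.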

Submitted 21 December, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: v1 accepted by ICML 2021

arXiv:2009.13333 [pdf, other]

Group Whitening: Balancing Learning Efficiency and Representational Capacity

Authors: Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao

Abstract: Batch normalization (BN) is an important technique commonly incorporated into deep learning models to perform standardization within mini-batches. The merits of BN in improving a model's learning efficiency can be further amplified by applying whitening, while its drawbacks in estimating population statistics for inference can be avoided through group normalization (GN). This paper proposes group whitening (GW), which exploits the advantages of the whitening operation and avoids the disadvantages of normalization within mini-batches. In addition, we analyze the constraints imposed on features by normalization, and show how the batch size (group number) affects the performance of batch (group) normalized networks, from the perspective of model's representational capacity. This analysis provides theoretical guidance for applying GW in practice. Finally, we apply the proposed GW to ResNet and ResNeXt architectures and conduct experiments on the ImageNet and COCO benchmarks. Results show that GW consistently improves the performance of different architectures, with absolute gains of $1.02\%$ $\sim$ $1.49\%$ in top-1 accuracy on ImageNet and $1.82\%$ $\sim$ $3.21\%$ in bounding box AP on COCO.

Submitted 6 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: V4: camera version of CVPR 2021. Code available at: https://github.com/huangleiBuaa/GroupWhitening

arXiv:2009.12836 [pdf, other]

Normalization Techniques in Training DNNs: Methodology, Analysis and Application

Authors: Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao

Abstract: Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative normalizing activation methods into three components: the normalization area partitioning, normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization techniques. Finally, we discuss the current progress in understanding normalization methods, and provide a comprehensive review of the applications of normalization for particular tasks, in which it can effectively solve the key issues.
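The following pure-Python sketch makes the three-component decomposition concrete. It uses group normalization (GN) on a single sample as one familiar example; choosing GN is our own decision, not a detail from this abstract. The group of channels is the partitioned area, standardization is the normalization operation, and a per-channel affine map (`gamma`, `beta`) is the representation recovery. The function name and input layout (a list of channels, each a list of activations) are assumptions.

```python
from statistics import fmean, pvariance

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group normalization of one sample, written as three explicit stages.

    x: list of C channels, each a list of spatial activations.
    gamma, beta: per-channel scale and shift (length C).
    """
    c = len(x)
    if c % num_groups:
        raise ValueError("number of channels must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        # 1) Area partitioning: pool channels g*size .. (g+1)*size - 1 into one set.
        channels = x[g * size:(g + 1) * size]
        values = [v for ch in channels for v in ch]
        # 2) Normalization operation: standardize with the group's mean and variance.
        mu = fmean(values)
        var = pvariance(values, mu)
        for offset, ch in enumerate(channels):
            idx = g * size + offset
            # 3) Representation recovery: learnable per-channel affine transform.
            out.append([gamma[idx] * (v - mu) / (var + eps) ** 0.5 + beta[idx] for v in ch])
    return out

# Usage: 4 channels with 3 activations each, split into 2 groups.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [10.0, 20.0, 30.0], [0.0, 0.0, 0.0]]
normalized = group_norm(x, num_groups=2, gamma=[1.0] * 4, beta=[0.0] * 4)
```

Changing only stage 1 changes the method. Pooling over a mini-batch per channel gives batch normalization, and pooling all channels of a sample gives layer normalization. This is the kind of comparison the taxonomy is meant to support.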

Submitted 27 September, 2020; originally announced September 2020.

Comments: 20 pages

arXiv:2007.04431 [pdf]

doi 10.1016/j.cad.2021.103013

Understanding the effect of hyperparameter optimization on machine learning models for structure design problems

Authors: Xianping Du, Hongyi Xu, Feng Zhu

Abstract: To relieve the computational cost of design evaluations using expensive finite element simulations, surrogate models have been widely applied in computer-aided engineering design. Machine learning algorithms (MLAs) have been implemented as surrogate models due to their capability of learning the complex interrelations between the design variables and the response from big datasets. Typically, an MLA regression model contains model parameters and hyperparameters. The model parameters are obtained by fitting the training data. Hyperparameters, which govern the model structures and the training processes, are assigned by users before training. There is a lack of systematic studies on the effect of hyperparameters on the accuracy and robustness of the surrogate model. In this work, we propose a hyperparameter optimization (HOpt) framework to deepen our understanding of this effect. Four frequently used MLAs, namely Gaussian Process Regression (GPR), Support Vector Machine (SVM), Random Forest Regression (RFR), and Artificial Neural Network (ANN), are tested on four benchmark examples. For each MLA model, the model accuracy and robustness before and after HOpt are compared. The results show that HOpt can generally improve the performance of the MLA models. HOpt leads to few improvements in the MLAs' accuracy and robustness for complex problems, which are featured by high-dimensional mixed-variable design space. HOpt is recommended for design problems with intermediate complexity.
We also investigated the additional computational costs incurred by HOpt. The training cost is closely related to the MLA architecture. After HOpt, the training cost of ANN and RFR increases more than that of GPR and SVM. To sum up, this study benefits the selection of HOpt methods for different types of design problems based on their complexity.

Submitted 15 March, 2021; v1 submitted 4 July, 2020; originally announced July 2020.

Comments: 43 pages, 15 figures, 8 tables. Accepted by Computer-Aided Design

Journal ref: Computer-Aided Design (2021): 103013

arXiv:2006.05554 [pdf, other]

Causal Discovery from Incomplete Data using An Encoder and Reinforcement Learning

Authors: Xiaoshui Huang, Fujin Zhu, Lois Holloway, Ali Haidar

Abstract: Discovering causal structure among a set of variables is a fundamental problem in many domains. However, state-of-the-art methods seldom consider the possibility that the observational data has missing values (incomplete data), which is ubiquitous in many real-world situations. Missing values can significantly impair the performance of causal discovery algorithms and can even make them fail. In this paper, we propose an approach to discover causal structures from incomplete data by using a novel encoder and reinforcement learning (RL). The encoder is designed for missing data imputation as well as feature extraction. In particular, it learns to encode the currently available information (with missing values) into a robust feature representation which is then used to determine where to search for the best graph. The encoder is integrated into an RL framework that can be optimized using the actor-critic algorithm. Our method takes the incomplete observational data as input and generates a causal structure graph. Experimental results on synthetic and real data demonstrate that our method can robustly generate causal structures from incomplete data. Compared with the direct combination of data imputation and causal discovery methods, our method generally performs better and can even obtain a performance gain of as much as 43.2%.

Submitted 9 June, 2020; originally announced June 2020.

arXiv:2006.00978 [pdf, ps, other]

On the Number of Linear Regions of Convolutional Neural Networks

Authors: H. Xiong, L. Huang, M. Yu, L. Liu, F. Zhu, L. Shao

Abstract: One fundamental problem in deep learning is understanding the outstanding performance of deep Neural Networks (NNs) in practice. One explanation for the superiority of NNs is that they can realize a large class of complicated functions, i.e., they have powerful expressivity. The expressivity of a ReLU NN can be quantified by the maximal number of linear regions it can separate its input space into. In this paper, we provide several mathematical results needed for studying the linear regions of CNNs, and use them to derive the maximal and average numbers of linear regions for one-layer ReLU CNNs. Furthermore, we obtain upper and lower bounds for the number of linear regions of multi-layer ReLU CNNs. Our results suggest that deeper CNNs have more powerful expressivity than their shallow counterparts, while CNNs have more expressivity than fully-connected NNs per parameter.

Submitted 27 June, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: International Conference on Machine Learning (ICML) 2020
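To make the notion of linear regions concrete, the sketch below counts the regions of a small fully connected one-hidden-layer ReLU network on a one-dimensional input by tracking where the hidden-unit on/off pattern changes along a dense grid. It is an illustration under simplifying assumptions (a fully connected layer rather than a CNN, 1-D input, a finite interval whose grid is fine enough to separate the breakpoints, and hypothetical weights `w`, `b`); it is not the counting technique used in the paper.

```python
import numpy as np

def count_regions_1d(w, b, lo=-10.0, hi=10.0, num=200_001):
    """Count activation-pattern regions of x -> relu(w * x + b) on [lo, hi].

    Unit i switches on or off at x = -b[i] / w[i]; along a line each unit
    switches at most once, so every change in the on/off pattern between
    neighboring grid points starts a new linear region.
    """
    xs = np.linspace(lo, hi, num)
    patterns = (np.outer(xs, w) + b) > 0          # (num, n_units) on/off flags
    changes = np.any(patterns[1:] != patterns[:-1], axis=1).sum()
    return int(changes) + 1

w = np.array([1.0, -2.0, 0.5])
b = np.array([0.0, 1.0, 2.0])                     # breakpoints at 0, 0.5 and -4
print(count_regions_1d(w, b))                     # 4
```

With three units whose breakpoints are distinct and lie inside the interval, the count is one more than the number of units, which is the maximum possible for a single layer on a 1-D input; coincident breakpoints would merge regions.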

arXiv:2004.09161 [pdf, ps, other]

Multi-frequency-band tests for white noise under heteroskedasticity

Authors: Mengya Liu, Fukan Zhu, Ke Zhu

Abstract: This paper proposes a new family of multi-frequency-band (MFB) tests for the white noise hypothesis by using the maximum overlap discrete wavelet packet transform (MODWPT). The MODWPT allows the variance of a process to be decomposed into the variance of its components on different equal-length frequency sub-bands, and the MFB tests then measure the distance between the MODWPT-based variance ratio and its theoretical null value jointly over several frequency sub-bands. The resulting MFB tests have chi-squared asymptotic null distributions under mild conditions that allow the data to be heteroskedastic. Simulation studies show that the MFB tests have desirable size and power performance, and their usefulness is further illustrated by two applications.

Submitted 20 April, 2020; originally announced April 2020.
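The idea of comparing band-wise variance shares against a null value can be illustrated with a simple stand-in. The sketch below replaces the MODWPT with an FFT periodogram (an assumption made for brevity), splits the frequencies in (0, π] into equal-width bands, and reports each band's share of the variance; under white noise the spectrum is flat, so each share should be near 1/n_bands. It does not construct the paper's test statistic or its chi-squared calibration, and the function name is hypothetical.

```python
import numpy as np

def band_variance_shares(x, n_bands=4):
    """Fraction of the (periodogram) variance in each equal-width frequency band.

    Stand-in for MODWPT: uses the FFT periodogram on (0, pi]. For white noise
    the spectrum is flat, so every share should be close to 1 / n_bands.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x))[1:] ** 2     # drop the zero-frequency term
    bands = np.array_split(power, n_bands)      # consecutive equal-width bands
    totals = np.array([band.sum() for band in bands])
    return totals / totals.sum()

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
ar = np.zeros_like(noise)
for t in range(1, len(noise)):                  # AR(1) with coefficient 0.9
    ar[t] = 0.9 * ar[t - 1] + noise[t]
print(band_variance_shares(noise).round(3))     # each share near 0.25
print(band_variance_shares(ar).round(3))        # lowest band dominates
```

A strongly autocorrelated series concentrates its variance in the lowest band, so its shares move far from the white-noise value; a formal test would turn that distance into a calibrated statistic.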

arXiv:1906.09205 [pdf, other]

Continual Reinforcement Learning with Diversity Exploration and Adversarial Self-Correction

Authors: Fengda Zhu, Xiaojun Chang, Runhao Zeng, Mingkui Tan

Abstract: Deep reinforcement learning has made significant progress in the field of continuous control, such as physical control and autonomous driving. However, it is challenging for a reinforcement learning model to learn a policy for each task sequentially because of catastrophic forgetting: the model forgets knowledge it learned in the past when trained on a new task. We consider this challenge from two perspectives: i) acquiring task-specific skills is difficult since task information and rewards are not highly related; ii) learning knowledge from previous experience is difficult in continuous control domains. In this paper, we introduce an end-to-end framework, namely the Continual Diversity Adversarial Network (CDAN). We first develop an unsupervised diversity exploration method to learn task-specific skills using an unsupervised objective. Then, we propose an adversarial self-correction mechanism to learn knowledge by exploiting past experience. The two learning procedures are presumably reciprocal. To evaluate the proposed method, we propose a new continuous reinforcement learning environment named Continual Ant Maze (CAM) and a new metric termed Normalized Shorten Distance (NSD). The experimental results confirm the effectiveness of diversity exploration and self-correction. Notably, our final result outperforms the baseline by 18.35% in terms of NSD and by 0.61 in average reward.

Submitted 21 June, 2019; originally announced June 2019.

arXiv:1903.06258 [pdf, ps, other]

doi 10.1109/LGRS.2019.2939356

Hyperspectral Image Classification with Deep Metric Learning and Conditional Random Field

Authors: Yi Liang, Xin Zhao, Alan J. X. Guo, Fei Zhu

Abstract: To improve classification performance in hyperspectral image processing, many works have been developed based on two common strategies, namely spatial-spectral information integration and the use of neural networks. However, both strategies typically require more training data than classical algorithms, aggravating the shortage of labeled samples. In this letter, we propose a novel framework that organically combines a spectrum-based deep metric learning model and the conditional random field algorithm. The deep metric learning model is supervised by the center loss to produce spectrum-based features that gather more tightly within classes in Euclidean space. The conditional random field with Gaussian edge potentials, which was first proposed for image segmentation tasks, is introduced to give the pixel-wise classification over the hyperspectral image by utilizing both the geographical distances between pixels and the Euclidean distances between the features produced by the deep metric learning model. The proposed framework is trained on spectral pixels at the deep metric learning stage and utilizes half-handcrafted spatial features at the conditional random field stage. This design alleviates the shortage of training data to some extent. Experiments on two real hyperspectral images demonstrate the advantages of the proposed method in terms of both classification accuracy and computation cost.

Submitted 15 July, 2019; v1 submitted 4 March, 2019; originally announced March 2019.
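The center loss mentioned in this abstract penalizes the squared distance between each feature vector and the center of its class, which is what pulls same-class spectral features together. A minimal NumPy sketch of the standard formula, $\frac{1}{2}\sum_i \lVert x_i - c_{y_i} \rVert^2$, follows; as a simplifying assumption, each center is taken as the batch mean of its class rather than being updated incrementally during training, and the function name and toy features are hypothetical.

```python
import numpy as np

def center_loss(features, labels):
    """Center loss 0.5 * sum_i ||x_i - c_{y_i}||^2.

    Simplifying assumption: each class center c_k is the mean of the
    features labeled k in this batch (training code usually updates the
    centers incrementally instead).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    loss = 0.0
    for k in np.unique(labels):
        members = features[labels == k]
        center = members.mean(axis=0)
        loss += 0.5 * np.sum((members - center) ** 2)
    return float(loss)

labels = [0, 0, 1, 1]
tight = [[0.0, 0.0], [0.0, 0.2], [5.0, 5.0], [5.0, 5.2]]
loose = [[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 7.0]]
print(center_loss(tight, labels), center_loss(loose, labels))
```

The tightly clustered features give a loss of about 0.02, the loose ones 2.0, so minimizing this term rewards compact classes regardless of where the clusters sit.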

arXiv:1801.03226 [pdf, other]

Adaptive Graph Convolutional Neural Networks

Authors: Ruoyu Li, Sheng Wang, Feiyun Zhu, Junzhou Huang

Abstract: Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point clouds and social networks. Current filters in graph CNNs are built for a fixed and shared graph structure. However, for most real data, the graph structures vary in both size and connectivity. The paper proposes a generalized and flexible graph CNN that takes data of arbitrary graph structure as input. In that way, a task-driven adaptive graph is learned for each graph data sample during training. To learn the graph efficiently, a distance metric learning scheme is proposed. Extensive experiments on nine graph-structured datasets demonstrate superior performance in both convergence speed and predictive accuracy.

Submitted 9 January, 2018; originally announced January 2018.

Comments: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 8 pages
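One common way to turn a learned distance metric into a graph, sketched below under stated assumptions, is to parameterize a Mahalanobis matrix as $M = W W^{\top}$ (so it stays positive semidefinite) and map pairwise distances to edge weights with a Gaussian kernel. The projection `W`, the bandwidth `sigma`, the kernel choice, and the function name are illustrative assumptions, not the exact construction or training procedure of the paper.

```python
import numpy as np

def metric_graph(X, W, sigma=1.0):
    """Dense adjacency matrix built from a Mahalanobis metric M = W @ W.T.

    X: (n_nodes, n_features) node features; W: (n_features, k) projection.
    Since (x_i - x_j)^T M (x_i - x_j) = ||(x_i - x_j) @ W||^2, the squared
    distances can be computed in the projected space.
    """
    Z = X @ W                                     # project every node once
    diff = Z[:, None, :] - Z[None, :, :]          # pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)          # squared Mahalanobis distances
    return np.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian-kernel edge weights

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(metric_graph(X, np.eye(2)).round(3))
print(metric_graph(X, np.array([[1.0], [0.0]])).round(3))
```

With the identity metric, nodes 1 and 2 are each equally close to node 0; projecting onto the first feature makes nodes 0 and 2 indistinguishable (edge weight 1). Learning `W` therefore decides which feature directions shape the graph.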

arXiv:1708.05446 [pdf, other]

Robust Contextual Bandit via the Capped-$\ell_{2}$ norm

Authors: Feiyun Zhu, Xinliang Zhu, Sheng Wang, Jiawen Yao, Junzhou Huang

Abstract: This paper considers the actor-critic contextual bandit for mobile health (mHealth) intervention. The state-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. Those methods use least-squares-based algorithms to estimate the expected reward, which are sensitive to outliers. To deal with the issue of outliers, we propose a novel robust actor-critic contextual bandit method for mHealth intervention. In the critic update, the capped-$\ell_{2}$ norm is used to measure the approximation error, which prevents outliers from dominating our objective. The critic update yields a set of sample weights, which define a weighted objective for the actor update; samples that are badly corrupted by noise in the critic update receive zero weight in the actor update. As a result, the robustness of both the actor and critic updates is enhanced. The capped-$\ell_{2}$ norm has a key parameter, and we provide a reliable method to set it properly by making use of one of the most fundamental definitions of outliers in statistics. Extensive experimental results demonstrate that our method achieves almost identical results to the state-of-the-art methods on the dataset without outliers and dramatically outperforms them on the datasets corrupted by outliers.

Submitted 17 August, 2017; originally announced August 2017.
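The zero-weight mechanism described in this abstract can be illustrated outside the bandit setting. The sketch below fits a linear reward model under a capped squared loss $\sum_i \min(r_i^2, \varepsilon)$ by alternating between a least-squares fit on the current inliers and re-flagging samples whose squared residual exceeds the cap; flagged samples get weight zero. The linear model, the squared (rather than unsquared) capped residual, the fixed cap, and the function name are all assumptions; this is not the paper's actor-critic algorithm or its rule for choosing the cap.

```python
import numpy as np

def capped_least_squares(X, y, cap, n_iter=20):
    """Fit y ~ X @ theta under the capped loss sum_i min(r_i**2, cap).

    A sample whose squared residual exceeds `cap` adds a constant to the
    loss (zero gradient), so it is excluded (weight 0) from the next fit.
    Returns theta and the boolean inlier mask used to fit it.
    """
    inliers = np.ones(len(y), dtype=bool)
    for _ in range(n_iter):
        theta, *_ = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)
        updated = (y - X @ theta) ** 2 <= cap
        if np.array_equal(updated, inliers) or not updated.any():
            break                      # mask is stable (or would be empty)
        inliers = updated
    else:
        # Iteration budget exhausted: refit so theta matches the final mask.
        theta, *_ = np.linalg.lstsq(X[inliers], y[inliers], rcond=None)
    return theta, inliers

x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
y = 2.0 * x + 1.0
y[3] += 50.0                                # one grossly corrupted reward
theta, inliers = capped_least_squares(X, y, cap=100.0)
print(theta.round(6), inliers.sum())        # [1. 2.] 9
```

Ordinary least squares on the same data is pulled toward the corrupted point; after one reweighting step the capped fit drops it and recovers the clean line exactly.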

arXiv:1602.01729 [pdf, ps, other]

doi 10.1109/TGRS.2017.2696262

Correntropy Maximization via ADMM - Application to Robust Hyperspectral Unmixing

Authors: Fei Zhu, Abderrahim Halimi, Paul Honeine, Badong Chen, Nanning Zheng

Abstract: In hyperspectral images, some spectral bands suffer from a low signal-to-noise ratio due to noisy acquisition and atmospheric effects, thus requiring robust techniques for the unmixing problem. This paper presents a robust supervised spectral unmixing approach for hyperspectral images. The robustness is achieved by writing the unmixing problem as the maximization of the correntropy criterion subject to the most commonly used constraints. Two unmixing problems are derived: the first considers fully-constrained unmixing, with both the non-negativity and sum-to-one constraints, while the second deals with non-negativity and sparsity promotion of the abundances. The corresponding optimization problems are solved efficiently using an alternating direction method of multipliers (ADMM) approach. Experiments on synthetic and real hyperspectral images validate the performance of the proposed algorithms in different scenarios, demonstrating that correntropy-based unmixing is robust to outlier bands.

Submitted 4 February, 2016; originally announced February 2016.

Comments: 23 pages
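The correntropy criterion that this approach maximizes can be computed empirically as the average of a Gaussian kernel applied to the residuals. The sketch below (unnormalized kernel, an assumed bandwidth `sigma`, hypothetical function names) contrasts it with the mean squared error when one band is grossly corrupted; it does not implement the constrained ADMM solver.

```python
import numpy as np

def correntropy(y, y_hat, sigma=1.0):
    """Empirical correntropy with a Gaussian kernel (higher means a better fit).

    Each band contributes exp(-e**2 / (2 sigma**2)), which lies in (0, 1],
    so a single grossly corrupted band cannot dominate the criterion.
    """
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.exp(-e ** 2 / (2.0 * sigma ** 2))))

def mse(y, y_hat):
    """Mean squared error, for comparison."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(e ** 2))

y = np.ones(10)              # reconstructed spectrum over 10 bands
corrupted = y.copy()
corrupted[0] = 100.0         # one outlier band in the observation
print(mse(corrupted, y))          # 980.1 (dominated by the single bad band)
print(correntropy(corrupted, y))  # 0.9 (the bad band just stops contributing)
```

Because the kernel saturates, maximizing correntropy effectively ignores bands whose residuals are far beyond `sigma`, which is the robustness to outlier bands that the abstract reports.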

arXiv:1601.03124 [pdf, other]

Online Prediction of Dyadic Data with Heterogeneous Matrix Factorization

Authors: Guangyong Chen, Fengyuan Zhu, Pheng Ann Heng

Abstract: Dyadic Data Prediction (DDP) is an important problem in many research areas. This paper develops a novel fully Bayesian nonparametric framework that integrates two popular and complementary approaches, discrete mixed-membership modeling and continuous latent factor modeling, into a unified Heterogeneous Matrix Factorization (HeMF) model, which can accurately predict unobserved dyadic data. The HeMF can determine the number of communities automatically and efficiently exploit the latent linear structure of each bicluster. We propose a Variational Bayesian method to estimate the parameters and missing data. We further develop a novel online learning approach for variational inference and use it for the online learning of HeMF, which can efficiently cope with the important large-scale DDP problem. We evaluate the performance of our method on the EachMovie, MovieLens and Netflix Prize collaborative filtering datasets. The experiments show that our model outperforms state-of-the-art methods on all benchmarks. Compared with the Stochastic Gradient Method (SGD), our online learning approach achieves significant improvements in estimation accuracy and robustness.

Submitted 12 January, 2016; originally announced January 2016.

Comments: 26 pages, 10 figures

arXiv:1601.03117 [pdf, other]

Blind Image Denoising via Dependent Dirichlet Process Tree

Authors: Fengyuan Zhu, Guangyong Chen, Jianye Hao, Pheng-Ann Heng

Abstract: Most existing image denoising approaches assume the noise to be homogeneous white Gaussian noise with known intensity. However, in real noisy images, the noise models are usually unknown beforehand and can be much more complex. This paper addresses this problem and proposes a novel blind image denoising algorithm to recover the clean image from a noisy one with an unknown noise model. To model the empirical noise of an image, our method introduces a mixture of Gaussian distributions, which is flexible enough to approximate different continuous distributions. The problem of blind image denoising is reformulated as a learning problem. The procedure first builds a two-layer structural model for noisy patches and treats the clean ones as latent variables. To control the complexity of the noisy patch model, this work proposes a novel Bayesian nonparametric prior called "Dependent Dirichlet Process Tree" to build the model. Then, this study derives a variational inference algorithm to estimate model parameters and recover clean patches. We apply our method to synthetic and real noisy images with different noise models. Compared with previous approaches, ours achieves better performance. The experimental results indicate the efficiency of the proposed algorithm in coping with practical image denoising tasks.

Submitted 12 January, 2016; originally announced January 2016.

Comments: 25 pages, 11 figures

arXiv:1501.05684 [pdf, ps, other]

Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models

Authors: Paul Honeine, Fei Zhu

Abstract: Nonnegative matrix factorization (NMF) is a powerful class of feature extraction techniques that has been successfully applied in many fields, notably in signal and image processing. Current NMF techniques have been limited to a single-objective problem in either its linear or nonlinear kernel-based formulation. In this paper, we propose to revisit the NMF as a multi-objective problem, in particular a bi-objective one, where the objective functions defined in both input and feature spaces are taken into account. By taking advantage of the weighted-sum method from the multi-objective optimization literature, the proposed bi-objective NMF determines a set of nondominated (Pareto-optimal) solutions instead of a single optimal decomposition. Moreover, the corresponding Pareto front is studied and approximated. Experimental results on unmixing real hyperspectral images confirm the efficiency of the proposed bi-objective NMF compared with state-of-the-art methods.

Submitted 22 January, 2015; originally announced January 2015.
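The weighted-sum idea can be shown on a toy problem: minimizing $\alpha J_1 + (1-\alpha) J_2$ for a sweep of weights $\alpha \in [0, 1]$ traces out nondominated trade-offs between the two objectives. The sketch below uses two hypothetical quadratic objectives, $J_1(x) = \lVert x - a \rVert^2$ and $J_2(x) = \lVert x - b \rVert^2$, whose weighted minimizer has the closed form $x = \alpha a + (1-\alpha) b$; they stand in for the paper's input-space and feature-space NMF objectives, which are not reproduced here.

```python
import numpy as np

def weighted_sum_front(a, b, alphas):
    """Return (J1, J2) pairs obtained by minimizing alpha*J1 + (1-alpha)*J2.

    J1(x) = ||x - a||^2 and J2(x) = ||x - b||^2; setting the gradient
    2*alpha*(x - a) + 2*(1 - alpha)*(x - b) to zero gives
    x = alpha*a + (1 - alpha)*b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    front = []
    for alpha in alphas:
        x = alpha * a + (1.0 - alpha) * b
        front.append((float(np.sum((x - a) ** 2)), float(np.sum((x - b) ** 2))))
    return front

def dominates(p, q):
    """True if p is no worse than q in both objectives and strictly better in one."""
    return all(pi <= qi for pi, qi in zip(p, q)) and any(pi < qi for pi, qi in zip(p, q))

front = weighted_sum_front([0.0], [1.0], np.linspace(0.0, 1.0, 5))
print(front)
print(any(dominates(p, q) for p in front for q in front if p != q))  # False
```

As $\alpha$ grows, $J_1$ falls while $J_2$ rises, so no point on the sampled front dominates another. For convex objectives like these the sweep can reach every Pareto-optimal point; for nonconvex problems the weighted-sum method may miss parts of the front.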

arXiv:1409.3660 [pdf, other]

10,000+ Times Accelerated Robust Subset Selection (ARSS)

Authors: Feiyun Zhu, Bin Fan, Xinliang Zhu, Ying Wang, Shiming Xiang, Chunhong Pan

Abstract: Subset selection from massive noisy data is increasingly popular in various applications. The problem remains highly challenging because current methods are generally slow and sensitive to outliers. To address these two issues, we propose an accelerated robust subset selection (ARSS) method. In the subset selection area, this is the first attempt to employ an $\ell_{p}(0<p\leq1)$-norm based measure for the representation loss, preventing large errors from dominating our objective. As a result, the robustness against outlier elements is greatly enhanced. In practice, the data size is generally much larger than the feature length, i.e. $N\gg L$. Based on this observation, we propose a speedup solver (via ALM and equivalent derivations) that greatly reduces the computational cost, theoretically from $O(N^{4})$ to $O(N^{2}L)$. Extensive experiments on ten benchmark datasets verify that our method not only outperforms state-of-the-art methods but also runs 10,000+ times faster than the most related method.

Submitted 17 November, 2014; v1 submitted 12 September, 2014; originally announced September 2014.
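Why an $\ell_{p}$ loss with $0<p\leq1$ keeps large errors from dominating can be seen numerically: the sketch below measures the fraction of an elementwise loss $\sum_i |r_i|^p$ contributed by a single large residual for several values of $p$. The residual values and function name are hypothetical, and the sketch says nothing about the paper's ALM-based solver or its complexity.

```python
import numpy as np

def outlier_share(residuals, p, idx):
    """Fraction of the elementwise loss sum_i |r_i|**p contributed by residual idx."""
    r = np.abs(np.asarray(residuals, dtype=float))
    contributions = r ** p
    return float(contributions[idx] / contributions.sum())

residuals = [1.0, 1.0, 1.0, 1.0, 100.0]   # four small errors, one outlier
for p in (2.0, 1.0, 0.5):
    print(p, round(outlier_share(residuals, p, idx=4), 4))
```

With $p = 2$ the outlier accounts for about 99.96% of the loss, with $p = 1$ about 96%, and with $p = 0.5$ its share drops to about 71%, leaving the remaining residuals a meaningful say in the fit.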

arXiv:1407.4420 [pdf, ps, other]

Kernel Nonnegative Matrix Factorization Without the Curse of the Pre-image - Application to Unmixing Hyperspectral Images

Authors: Fei Zhu, Paul Honeine, Maya Kallas

Abstract: Nonnegative matrix factorization (NMF) is widely used in signal and image processing, including bioinformatics, blind source separation and hyperspectral image analysis in remote sensing. A great challenge arises when dealing with a nonlinear formulation of the NMF. Within the framework of kernel machines, the models suggested in the literature do not allow the representation of the factorization matrices, which is a fallout of the curse of the pre-image. In this paper, we propose a novel kernel-based model for the NMF that does not suffer from the pre-image problem, by investigating the estimation of the factorization matrices directly in the input space. For different kernel functions, we describe two schemes for iterative algorithms: an additive update rule based on a gradient descent scheme and a multiplicative update rule in the same spirit as the Lee and Seung algorithm. Within the proposed framework, we develop several extensions to incorporate constraints, including sparseness, smoothness, and spatial regularization with a total-variation-like penalty. The effectiveness of the proposed method is demonstrated on the problem of unmixing hyperspectral images, using well-known real images and comparing the results with state-of-the-art techniques.

Submitted 27 March, 2016; v1 submitted 16 July, 2014; originally announced July 2014.

Comments: 13 pages, 12 figures
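For reference, the linear Lee and Seung multiplicative updates that the kernel-based rule in this abstract is modeled on can be sketched in a few lines of NumPy for the squared Frobenius loss $\lVert V - WH \rVert_F^2$. This is the classical linear baseline only, not the paper's kernel-based NMF; the small constant `eps` (to avoid division by zero), the random initialization, and the function name are assumptions.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Linear NMF, V ~= W @ H, with Lee-Seung multiplicative updates.

    Each update multiplies the current factor by a ratio of nonnegative
    terms, so W and H stay nonnegative if V and the initialization are.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))     # exactly rank-2, nonnegative
for n_iter in (5, 500):
    W, H = nmf_multiplicative(V, rank=2, n_iter=n_iter)
    print(n_iter, np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

The relative reconstruction error should shrink as the iteration count grows, since these updates do not increase the Frobenius objective; the appeal of the multiplicative form is that it needs no step size and preserves nonnegativity automatically.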

arXiv:1404.7642 [pdf, ps, other]

doi 10.1214/13-AOAS708

Predictive regressions for macroeconomic data

Authors: Fukang Zhu, Zongwu Cai, Liang Peng

Abstract: Researchers have long asked whether stock returns can be predicted by macroeconomic data. However, it is known that macroeconomic data may exhibit nonstationarity and/or heavy tails, which complicates existing testing procedures for predictability. In this paper we propose novel empirical likelihood methods based on weighted score equations to test whether the monthly CRSP value-weighted index can be predicted by the log dividend-price ratio or the log earnings-price ratio. The new methods work well both theoretically and empirically, regardless of whether the predicting variables are stationary or nonstationary or have infinite variance.

Submitted 30 April, 2014; originally announced April 2014.

Comments: Published at http://dx.doi.org/10.1214/13-AOAS708 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS708

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 1, 577-594

Showing 1–28 of 28 results for author: Zhu, F