-
Modeling large dimensional matrix time series with partially known and latent factors
Authors:
Yongchang Hui,
Yuteng Zhang,
Siting Huang
Abstract:
This article models large-dimensional matrix time series by introducing a regression term into the matrix factor model. This extends the classic matrix factor model to incorporate information from known factors or useful covariates. We establish the convergence rates of the estimated coefficient matrix, loading matrices, and signal part. The theoretical results coincide with the rates in Wang et al. (2019). We conduct numerical studies to verify the finite-sample performance of our estimation procedure. Finally, we demonstrate the superiority of our proposed model using daily stock return data.
Submitted 25 November, 2024;
originally announced November 2024.
-
Multilevel Matrix Factor Model
Authors:
Yuteng Zhang,
Yongchang Hui,
Junrong Song,
Shurong Zheng
Abstract:
Large-scale matrix data have been widely observed and studied in various fields recently. Exploiting both the multilevel factor structure and the matrix structure, we propose a multilevel matrix factor model with both global and local factors. The global factors can affect all matrix time series, whereas the local factors are only allowed to affect a specific matrix time series. The estimation procedures can consistently estimate the factor loadings and determine the number of factors. We establish the asymptotic properties of the estimators. A simulation study illustrates the performance of the proposed estimation method. We use the model to analyze eight indicators across 200 stocks from ten distinct industries, demonstrating the empirical utility of our proposed approach.
Submitted 21 October, 2023;
originally announced October 2023.
-
Group Personalized Federated Learning
Authors:
Zhe Liu,
Yue Hui,
Fuchun Peng
Abstract:
Federated learning (FL) can help promote data privacy by training a shared model in a decentralized manner on the physical devices of clients. In the presence of highly heterogeneous distributions of local data, personalized FL strategies seek to mitigate potential client drift. In this paper, we present a group personalization approach for applications of FL in which there exist inherent partitions among clients that are significantly distinct. In our method, the global FL model is fine-tuned through another FL training process over each homogeneous group of clients, after which each group-specific FL model is further adapted and personalized for any client. The proposed method can be well interpreted from a Bayesian hierarchical modeling perspective. With experiments on two real-world datasets, we demonstrate that this approach achieves better personalization performance than other FL counterparts.
Submitted 11 October, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
-
On the geometry of generalization and memorization in deep neural networks
Authors:
Cory Stephenson,
Suchismita Padhy,
Abhinav Ganesh,
Yue Hui,
Hanlin Tang,
SueYeon Chung
Abstract:
Understanding how large neural networks avoid memorizing training data is key to explaining their high generalization performance. To examine the structure of when and where memorization occurs in a deep network, we use a recently developed replica-based mean field theoretic geometric analysis method. We find that all layers preferentially learn from examples which share features, and link this behavior to generalization performance. Memorization predominantly occurs in the deeper layers, due to decreasing object manifolds' radius and dimension, whereas early layers are minimally affected. This predicts that generalization can be restored by reverting the final few layer weights to earlier epochs before significant memorization occurred, which is confirmed by the experiments. Additionally, by studying generalization under different model sizes, we reveal the connection between the double descent phenomenon and the underlying model geometry. Finally, an analytical treatment shows that networks avoid memorization early in training because, close to initialization, the gradient contribution from permuted examples is small. These findings provide quantitative evidence for the structure of memorization across layers of a deep neural network, the drivers for such structure, and its connection to manifold geometric properties.
Submitted 30 May, 2021;
originally announced May 2021.
-
A Benchmark of Medical Out of Distribution Detection
Authors:
Tianshi Cao,
Chin-Wei Huang,
David Yu-Tung Hui,
Joseph Paul Cohen
Abstract:
Motivation: Deep learning models deployed for use on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However, it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be flagged by an OoDD method prior to diagnosis. Our approach: This paper defines three categories of OoD examples and benchmarks popular OoDD methods in three domains of medical imaging: chest X-ray, fundus imaging, and histology slides. Results: Our experiments show that although methods yield good results on some categories of out-of-distribution samples, they fail to recognize images close to the training distribution. Conclusion: We find that a simple binary classifier on the feature representation has the best accuracy and AUPRC on average. Users of diagnostic tools which employ these OoDD methods should still remain vigilant that images very close to the training distribution, yet not in it, could yield unexpected results.
Submitted 4 August, 2020; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Combating False Negatives in Adversarial Imitation Learning
Authors:
Konrad Zolna,
Chitwan Saharia,
Leonard Boussioux,
David Yu-Tung Hui,
Maxime Chevalier-Boisvert,
Dzmitry Bahdanau,
Yoshua Bengio
Abstract:
In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning and consequently lead to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the 'False Negatives' (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.
Submitted 2 February, 2020;
originally announced February 2020.
-
A New Test of Multivariate Nonlinear Causality
Authors:
Zhidong Bai,
Yongchang Hui,
Zhihui Lv,
Wing-Keung Wong,
Shurong Zheng,
Zhenzhen Zhu
Abstract:
The multivariate nonlinear Granger causality developed by Bai et al. (2010) plays an important role in detecting the dynamic interrelationships between two groups of variables. Following the idea of the Hiemstra-Jones (HJ) test proposed by Hiemstra and Jones (1994), they attempt to establish a central limit theorem (CLT) for their test statistic by applying the asymptotic properties of multivariate $U$-statistics. However, Bai et al. (2016) revisit the HJ test and find that the test statistic given by HJ is not a function of $U$-statistics, which implies that neither the CLT proposed by Hiemstra and Jones (1994) nor the one extended by Bai et al. (2010) is valid for statistical inference. In this paper, we re-estimate the probabilities and reestablish the CLT of the new test statistic. Numerical simulations show that our new estimates are consistent and that our new test has decent size and power.
Submitted 3 March, 2017;
originally announced March 2017.
-
The Hiemstra-Jones Test Revisited
Authors:
Zhidong Bai,
Yongchang Hui,
Zhihui Lv,
Wing-Keung Wong,
Zhen-Zhen Zhu
Abstract:
The famous Hiemstra-Jones (HJ) test developed by Hiemstra and Jones (1994) plays a significant role in studying nonlinear causality. Over the last two decades, there have been numerous applications and theoretical extensions based on this pioneering work. However, several works note that counterintuitive results are obtained from the HJ test, and some researchers find that the HJ test seriously over-rejects in simulation studies. In this paper, we reinvestigate HJ's creative 1994 work and find that their proposed estimators of the probabilities over different time intervals were not consistent with the target ones proposed in their criterion. To test HJ's novel hypothesis on Granger causality, we propose new estimators of the probabilities defined in their paper and reestablish the asymptotic properties to induce new tests similar to those of HJ. Simulations are also presented to support our findings.
Submitted 14 January, 2017;
originally announced January 2017.