Showing 1–50 of 68 results for author: Zheng, S

Searching in archive stat. Search in all archives.
  1. arXiv:2506.02410  [pdf, ps, other]

    stat.ME

    Testing for large-dimensional covariance matrix under differential privacy

    Authors: Shiwei Sang, Yicheng Zeng, Xuehu Zhu, Shurong Zheng

    Abstract: The increasing prevalence of high-dimensional data across various applications has raised significant privacy concerns in statistical inference. In this paper, we propose a differentially private integrated statistic for testing large-dimensional covariance structures, enabling accurate statistical insights while safeguarding privacy. First, we analyze the global sensitivity of sample eigenvalues…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2504.01432  [pdf, other]

    stat.ME

    Adaptive adequacy testing of high-dimensional factor-augmented regression model

    Authors: Yanmei Shi, Leheng Cai, Xu Guo, Shurong Zheng

    Abstract: In this paper, we investigate the adequacy testing problem of the high-dimensional factor-augmented regression model. Existing test procedures do not perform well under dense alternatives. To address this critical issue, we introduce a novel quadratic-type test statistic which can efficiently detect dense alternative hypotheses. We further propose an adaptive test procedure to remain powerful under both…

    Submitted 3 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  3. arXiv:2502.13117  [pdf, other]

    stat.AP cs.AI

    Performance Evaluation of Large Language Models in Statistical Programming

    Authors: Xinyi Song, Kexin Xie, Lina Lee, Ruizhe Chen, Jared M. Clark, Hao He, Haoran He, Jie Min, Xinlei Zhang, Simin Zheng, Zhiyang Zhang, Xinwei Deng, Yili Hong

    Abstract: The programming capabilities of large language models (LLMs) have revolutionized automatic code generation and opened new avenues for automatic statistical analysis. However, the validity and quality of these generated codes need to be systematically evaluated before they can be widely adopted. Despite their growing prominence, a comprehensive evaluation of statistical code generated by LLMs remai…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 27 pages, 8 figures

  4. arXiv:2502.12386  [pdf, other]

    stat.AP cs.AI

    Bridging the Data Gap in AI Reliability Research and Establishing DR-AIR, a Comprehensive Data Repository for AI Reliability

    Authors: Simin Zheng, Jared M. Clark, Fatemeh Salboukh, Priscila Silva, Karen da Mata, Fenglian Pan, Jie Min, Jiayi Lian, Caleb B. King, Lance Fiondella, Jian Liu, Xinwei Deng, Yili Hong

    Abstract: Artificial intelligence (AI) technology and systems have been advancing rapidly. However, ensuring the reliability of these systems is crucial for fostering public confidence in their use. This necessitates the modeling and analysis of reliability data specific to AI systems. A major challenge in AI reliability research, particularly for those in academia, is the lack of readily available AI relia…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 34 pages, 12 figures

  5. arXiv:2501.02353  [pdf, other]

    cs.LG stat.ML

    Reweighting Improves Conditional Risk Bounds

    Authors: Yikai Zhang, Jiahe Lin, Fengpei Li, Songzhu Zheng, Anant Raj, Anderson Schneider, Yuriy Nevmyvaka

    Abstract: In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from sta…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: 33 pages

    ACM Class: G.3; I.3

  6. arXiv:2412.10331  [pdf, other]

    stat.AP cs.SI

    Applied Statistics in the Era of Artificial Intelligence: A Review and Vision

    Authors: Jie Min, Xinyi Song, Simin Zheng, Caleb B. King, Xinwei Deng, Yili Hong

    Abstract: The advent of artificial intelligence (AI) technologies has significantly changed many domains, including applied statistics. This review and vision paper explores the evolving role of applied statistics in the AI era, drawing from our experiences in engineering statistics. We begin by outlining the fundamental concepts and historical developments in applied statistics and tracing the rise of AI t…

    Submitted 13 December, 2024; originally announced December 2024.

  7. arXiv:2411.19647  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    CAdam: Confidence-Based Optimization for Online Learning

    Authors: Shaowen Wang, Anan Liu, Jian Xiao, Huan Liu, Yuekui Yang, Cong Xu, Qianqian Pu, Suncong Zheng, Wei Zhang, Di Wang, Jie Jiang, Jian Li

    Abstract: Modern recommendation systems frequently employ online learning to dynamically update their models with freshly collected data. The most commonly used optimizer for updating neural networks in these contexts is the Adam optimizer, which integrates momentum ($m_t$) and adaptive learning rate ($v_t$). However, the volatile nature of online learning data, characterized by its frequent distribution sh…

    Submitted 4 June, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

  8. arXiv:2411.01842  [pdf, other]

    cs.LG stat.ML

    ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer

    Authors: Jiawen Zhang, Shun Zheng, Xumeng Wen, Xiaofang Zhou, Jiang Bian, Jia Li

    Abstract: Numerous industrial sectors necessitate models capable of providing robust forecasts across various horizons. Despite the recent strides in crafting specific architectures for time-series forecasting and developing pre-trained universal models, a comprehensive examination of their capability in accommodating varied-horizon forecasting during inference is still lacking. This paper bridges this gap…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  9. arXiv:2410.24220  [pdf, ps, other]

    cs.LG cs.AI q-bio.QM stat.ML

    Bridging Geometric States via Geometric Diffusion Bridge

    Authors: Shengjie Luo, Yixian Xu, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang

    Abstract: The accurate prediction of geometric state evolution in complex systems is critical for advancing scientific domains such as quantum chemistry and material modeling. Traditional experimental and computational methods face challenges in terms of environmental constraints and computational demands, while current deep learning approaches still fall short in terms of precision and generality. In this…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: 33 pages, 5 tables; NeurIPS 2024 Camera Ready version

  10. arXiv:2404.00551  [pdf, other]

    stat.ML cs.LG

    Convergence of Continuous Normalizing Flows for Learning Probability Distributions

    Authors: Yuan Gao, Jian Huang, Yuling Jiao, Shurong Zheng

    Abstract: Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear inter…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 60 pages, 3 tables, and 3 figures

    MSC Class: 62G05; 68T07

  11. arXiv:2402.14090  [pdf, other]

    cs.AI econ.GN stat.ML

    Social Environment Design

    Authors: Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen

    Abstract: Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The fra…

    Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024 Position Paper. Website at https://sed.eddie.win

  12. arXiv:2312.00186  [pdf, other]

    stat.AP cs.AI

    Planning Reliability Assurance Tests for Autonomous Vehicles

    Authors: Simin Zheng, Lu Lu, Yili Hong, Jian Liu

    Abstract: Artificial intelligence (AI) technology has become increasingly prevalent and transforms our everyday life. One important application of AI technology is the development of autonomous vehicles (AV). However, the reliability of an AV needs to be carefully demonstrated via an assurance test so that the product can be used with confidence in the field. To plan for an assurance test, one needs to dete…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 29 pages, 5 figures

  13. arXiv:2311.07972  [pdf, other]

    stat.ME

    Residual Importance Weighted Transfer Learning For High-dimensional Linear Regression

    Authors: Junlong Zhao, Shengbin Zheng, Chenlei Leng

    Abstract: Transfer learning is an emerging paradigm for leveraging multiple sources to improve the statistical inference on a single target. In this paper, we propose a novel approach named residual importance weighted transfer learning (RIW-TL) for high-dimensional linear models built on penalized likelihood. Compared to existing methods such as Trans-Lasso that selects sources in an all-in-all-out manner,…

    Submitted 3 January, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  14. arXiv:2310.13911  [pdf, other]

    stat.ME stat.AP

    Multilevel Matrix Factor Model

    Authors: Yuteng Zhang, Yongchang Hui, Junrong Song, Shurong Zheng

    Abstract: Large-scale matrix data has recently been widely encountered and studied in various fields. Considering the multi-level factor structure and utilizing the matrix structure, we propose a multilevel matrix factor model with both global and local factors. The global factors can affect all matrix time series, whereas the local factors are only allowed to affect within each specific matrix t…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 47 pages, 22 figures

  15. arXiv:2309.16578  [pdf, other]

    stat.ML cs.LG physics.chem-ph

    Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

    Authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

    Abstract: Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT…

    Submitted 9 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published in Nature Computational Science, March 2024. Full paper with supplementary information

  16. arXiv:2309.04072  [pdf, ps, other]

    math.NA cs.LG stat.ML

    Riemannian Langevin Monte Carlo schemes for sampling PSD matrices with fixed rank

    Authors: Tianmin Yu, Shixin Zheng, Jianfeng Lu, Govind Menon, Xiangxiong Zhang

    Abstract: This paper introduces two explicit schemes to sample matrices from Gibbs distributions on $\mathcal S^{n,p}_+$, the manifold of real positive semi-definite (PSD) matrices of size $n\times n$ and rank $p$. Given an energy function $\mathcal E:\mathcal S^{n,p}_+\to \mathbb{R}$ and certain Riemannian metrics $g$ on $\mathcal S^{n,p}_+$, these schemes rely on an Euler-Maruyama discretization of the Ri…

    Submitted 7 September, 2023; originally announced September 2023.

  17. arXiv:2308.08364  [pdf, other]

    stat.AP stat.ME

    False Discovery Rate Control for Lesion-Symptom Mapping with Heterogeneous data via Weighted P-values

    Authors: Siyu Zheng, Alexander C. McLain, Joshua Habiger, Christopher Rorden, Julius Fridriksson

    Abstract: Lesion-symptom mapping studies provide insight into what areas of the brain are involved in different aspects of cognition. This is commonly done via behavioral testing in patients with a naturally occurring brain injury or lesions (e.g., strokes or brain tumors). This results in high-dimensional observational data where lesion status (present/absent) is non-uniformly distributed with some voxels…

    Submitted 16 August, 2023; originally announced August 2023.

    MSC Class: 62J15

  18. arXiv:2305.18258  [pdf, other]

    cs.LG cs.AI cs.GT math.OC stat.ML

    Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

    Authors: Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

    Abstract: In online reinforcement learning (online RL), balancing exploration and exploitation is crucial for finding an optimal policy in a sample-efficient way. To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration. However, in order to cope with general function approximators, most of them involve impractical algorithm…

    Submitted 25 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  19. arXiv:2211.01962  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

    Authors: Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

    Abstract: We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generali…

    Submitted 30 June, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: We changed the title from the first version. We fixed a technical issue in the first version regarding the $\ell_2$ eluder technique (Lemma D.2)

  20. arXiv:2210.16350  [pdf, other]

    stat.OT

    A Comparison of Reproducibility Guidelines and Its Implications on Undergraduate Statistical Education

    Authors: Siqi Zheng

    Abstract: In this paper, we replicated a Bayesian educational research project, which explores the association between broadband access and online course enrollment in the US. We summarized key findings from our replication and compared them with the original project. Based on our replication experience, we aim to demonstrate the challenges of research reproduction, even when codes and data are shared openly…

    Submitted 28 October, 2022; originally announced October 2022.

  21. arXiv:2210.01765  [pdf, other]

    cs.LG q-bio.BM stat.ML

    One Transformer Can Understand Both 2D & 3D Molecular Data

    Authors: Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

    Abstract: Unlike vision and language data which usually has a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in a 3D space. For molecular representation learning, most previous works designed neural networks only for a particular data format, making the learned models likely to…

    Submitted 27 March, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: 20 pages; ICLR 2023, Camera Ready Version; Code: https://github.com/lsj2408/Transformer-M

  22. arXiv:2207.10772  [pdf, other]

    stat.ML cs.LG

    Deep Sufficient Representation Learning via Mutual Information

    Authors: Siming Zheng, Yuanyuan Lin, Jian Huang

    Abstract: We propose a mutual information-based sufficient representation learning (MSRL) approach, which uses the variational formulation of the mutual information and leverages the approximation power of deep neural networks. MSRL learns a sufficient representation with the maximum mutual information with the response and a user-selected distribution. It can easily handle multi-dimensional continuous or c…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: 43 pages, 6 figures and 5 tables

    MSC Class: 62G05; 68T07

  23. arXiv:2205.13401  [pdf, other]

    cs.LG cs.CL stat.ML

    Your Transformer May Not be as Powerful as You Expect

    Authors: Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He

    Abstract: Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based Transformers is largely unexplored. In this work, we mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximati…

    Submitted 28 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 22 pages; NeurIPS 2022, Camera Ready Version

  24. arXiv:2204.11155  [pdf, ps, other]

    stat.ME

    Adaptive Tests for Bandedness of High-dimensional Covariance Matrices

    Authors: Xiaoyi Wang, Gongjun Xu, Shurong Zheng

    Abstract: Estimation of the high-dimensional banded covariance matrix is widely used in multivariate statistical analysis. To ensure the validity of estimation, we aim to test the hypothesis that the covariance matrix is banded with a certain bandwidth under the high-dimensional framework. Though several testing methods have been proposed in the literature, the existing tests are only powerful for some alte…

    Submitted 23 April, 2022; originally announced April 2022.

  25. arXiv:2203.12003  [pdf, ps, other]

    stat.ME

    On block-wise and reference panel-based estimators for genetic data prediction in high dimensions

    Authors: Bingxin Zhao, Shurong Zheng, Hongtu Zhu

    Abstract: Genetic prediction of complex traits and diseases has attracted enormous attention in precision medicine, mainly because it has the potential to translate discoveries from genome-wide association studies (GWAS) into medical advances. As the high dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants has a block-diagonal structure, many existing methods attem…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: 27 pages, 5 figures

  26. arXiv:2203.07681  [pdf, other]

    cs.LG cs.AI stat.ML

    DEPTS: Deep Expansion Learning for Periodic Time Series Forecasting

    Authors: Wei Fan, Shun Zheng, Xiaohan Yi, Wei Cao, Yanjie Fu, Jiang Bian, Tie-Yan Liu

    Abstract: Periodic time series (PTS) forecasting plays a crucial role in a variety of industries to foster critical tasks, such as early warning, pre-planning, resource scheduling, etc. However, the complicated dependencies of the PTS signal on its inherent periodicity as well as the sophisticated composition of various periods hinder the performance of PTS forecasting. In this paper, we introduce a deep ex…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: ICLR22 Spotlight

  27. arXiv:2202.09784  [pdf, other]

    cs.LG cs.AI cs.CV stat.ME

    Clustering by the Probability Distributions from Extreme Value Theory

    Authors: Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, Yanwei Fu

    Abstract: Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, while it does not utilize the information of sample distribution or density. By comparison, it would potentially be more beneficial to consider the probabili…

    Submitted 20 February, 2022; originally announced February 2022.

    Comments: IEEE Transactions on Artificial Intelligence

  28. arXiv:2106.12566  [pdf, other]

    cs.LG cs.CL stat.ML

    Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

    Authors: Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

    Abstract: The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful atte…

    Submitted 2 November, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021, camera ready version

  29. arXiv:2105.07829  [pdf, other]

    cs.DC cs.LG stat.ML

    Compressed Communication for Distributed Training: Adaptive Methods and System

    Authors: Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

    Abstract: Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training. However, there is little understanding of applying gradient compression to adaptive gradient methods. Moreover, its performance benefits are often limited by the n…

    Submitted 17 May, 2021; originally announced May 2021.

  30. arXiv:2103.07756  [pdf, other]

    cs.LG cs.CV stat.AP stat.ML

    Learning with Feature-Dependent Label Noise: A Progressive Approach

    Authors: Yikai Zhang, Songzhu Zheng, Pengxiang Wu, Mayank Goswami, Chao Chen

    Abstract: Label noise is frequently observed in real-world large-scale datasets. The noise is introduced due to a variety of reasons; it is heterogeneous and feature-dependent. Most existing approaches to handling noisy labels fall into two categories: they either assume an ideal feature-independent noise, or remain heuristic without theoretical guarantees. In this paper, we propose to target a new family o…

    Submitted 27 March, 2021; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: ICLR 2021 (Spotlight)

  31. arXiv:2012.11100  [pdf, other]

    stat.ME

    Two-directional simultaneous inference for high-dimensional models

    Authors: Wei Liu, Huazhen Lin, Jin Liu, Shurong Zheng

    Abstract: This paper proposes a general two directional simultaneous inference (TOSI) framework for high-dimensional models with a manifest variable or latent variable structure, for example, high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factors models. TOSI performs simultaneous inference on a set of parameters from two directions, one to test whether…

    Submitted 6 February, 2023; v1 submitted 20 December, 2020; originally announced December 2020.

  32. arXiv:2007.13221  [pdf, other]

    cs.LG cs.DC stat.ML

    CSER: Communication-efficient SGD with Error Reset

    Authors: Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

    Abstract: The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is first a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of resulting local residual errors. S…

    Submitted 4 December, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

  33. arXiv:2006.13484  [pdf, other]

    cs.LG cs.CL cs.DC stat.ML

    Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

    Authors: Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

    Abstract: BERT has recently attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art results in various NLU tasks. However, its success requires large deep neural networks and a huge amount of data, which result in long training time and impede development progress. Using stochastic gradient methods with large mini-batches has been advocated as an efficient tool to redu…

    Submitted 18 September, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: Technical Report (not under review at any venue)

  34. arXiv:2004.13332  [pdf, other]

    econ.GN cs.LG stat.ML

    The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies

    Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher

    Abstract: Tackling real-world socio-economic challenges requires designing and testing economic policies. However, this is hard in practice, due to a lack of appropriate (micro-level) economic data and limited opportunity to experiment. In this work, we train social planners that discover tax policies in dynamic economies that can effectively trade-off economic equality and productivity. We propose a two-le…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 46 pages, 21 figures

  35. arXiv:2002.05712  [pdf, other]

    cs.LG cs.CV stat.ML

    Cross-Iteration Batch Normalization

    Authors: Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin

    Abstract: A well-known issue of Batch Normalization is its significantly reduced effectiveness in the case of small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, we present Cross-Iteration Batch Normalization (CBN), in which examples from multiple rec…

    Submitted 25 March, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to CVPR 2021

  36. arXiv:2002.05578  [pdf, other]

    cs.LG stat.ML

    Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis

    Authors: Jung Yeon Park, Kenneth Theo Carr, Stephan Zheng, Yisong Yue, Rose Yu

    Abstract: Efficient and interpretable spatial analysis is crucial in many fields such as geology, sports, and climate science. Tensor latent factor models can describe higher-order correlations for spatial data. However, they are computationally expensive to train and are sensitive to initialization, leading to spatially incoherent, uninterpretable results. We develop a novel Multiresolution Tensor Learning…

    Submitted 14 August, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  37. arXiv:2002.04745  [pdf, other]

    cs.LG cs.CL stat.ML

    On Layer Normalization in the Transformer Architecture

    Authors: Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu

    Abstract: The Transformer is widely used in natural language processing tasks. To train a Transformer, however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but will slow down the optimization and bring more hyper-parameter tuning. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show…

    Submitted 29 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Journal ref: Published at ICML 2020

  38. arXiv:1912.01899  [pdf, other]

    cs.LG stat.ML

    Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning

    Authors: Shuai Zheng, Zhenfeng Zhu, Xingxing Zhang, Zhizhe Liu, Jian Cheng, Yao Zhao

    Abstract: Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that will serve as input to many computer vision tasks. However, most existing algorithms ignore the existence of inherent data distribution and even noise. This may significantly increase the phenomenon of over-fitting and deteriorate the testing accuracy. In this paper, we propose a Distribution-induce…

    Submitted 2 August, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted to CVPR 2020. 10 pages, 5 figures, 4 tables; fixed an error in Figure 1

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7224-7233, 2020

  39. arXiv:1910.02035  [pdf, other]

    cs.LG cs.AI stat.ML

    Manufacturing Dispatching using Reinforcement and Transfer Learning

    Authors: Shuai Zheng, Chetan Gupta, Susumu Serita

    Abstract: An efficient dispatching rule in the manufacturing industry is key to ensuring on-time product delivery and minimum past-due and inventory cost. Manufacturing, especially in the developed world, is moving towards on-demand manufacturing, meaning a high mix, low volume product mix. This requires efficient dispatching that can work in dynamic and stochastic environments, meaning it allows for quick response t…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: ECML PKDD 2019 (The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019)

  40. arXiv:1910.02034  [pdf, other]

    cs.LG cs.CV stat.ML

    Generative Adversarial Networks for Failure Prediction

    Authors: Shuai Zheng, Ahmed Farahat, Chetan Gupta

    Abstract: Prognostics and Health Management (PHM) is an emerging engineering discipline which is concerned with the analysis and prediction of equipment health and performance. One of the key challenges in PHM is to accurately predict impending failures in the equipment. In recent years, solutions for failure prediction have evolved from building complex physical models to the use of machine learning algori…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: ECML PKDD 2019 (The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019)

  41. arXiv:1909.10710  [pdf, other]

    stat.ME

    Estimating Number of Factors by Adjusted Eigenvalues Thresholding

    Authors: Jianqing Fan, Jianhua Guo, Shurong Zheng

    Abstract: Determining the number of common factors is an important and practical topic in high dimensional factor models. The existing literature is mainly based on the eigenvalues of the covariance matrix. Due to the incomparability of the eigenvalues of the covariance matrix caused by heterogeneous scales of observed variables, it is very difficult to give an accurate relationship between these eigenval…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: 35 pages; 4 figures

  42. arXiv:1907.04433  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

    Authors: Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

    Abstract: We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customiza…

    Submitted 12 February, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-7

  43. arXiv:1907.00664  [pdf, other]

    cs.LG stat.ML

    Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

    Authors: Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train…

    Submitted 1 July, 2019; originally announced July 2019.

  44. arXiv:1906.06713  [pdf, ps, other]

    math.ST stat.AP stat.ME

    Community Detection Based on the $L_\infty$ convergence of eigenvectors in DCBM

    Authors: Yan Liu, Zhiqiang Hou, Zhigang Yao, Zhidong Bai, Jiang Hu, Shurong Zheng

    Abstract: Spectral clustering is one of the most popular algorithms for community detection in network analysis. Based on this rationale, in this paper we give the convergence rate of eigenvectors for the adjacency matrix in the $l_\infty$ norm, under the stochastic block model (BM) and degree-corrected stochastic block model (DCBM), with some mild and reasonable additional conditions. We also extend this result to a…

    Submitted 16 June, 2019; originally announced June 2019.

    Comments: 28 pages, 2 figures

  45. arXiv:1906.00562  [pdf, ps, other]

    cs.CV cs.LG stat.ML

    Learning to Self-Train for Semi-Supervised Few-Shot Classification

    Authors: Xinzhe Li, Qianru Sun, Yaoyao Liu, Shibao Zheng, Qin Zhou, Tat-Seng Chua, Bernt Schiele

    Abstract: Few-shot classification (FSC) is challenging due to the scarcity of labeled training data (e.g. only one labeled data point per class). Meta-learning has been shown to achieve promising results by learning to initialize a classification model for FSC. In this paper we propose a novel semi-supervised meta-learning method called learning to self-train (LST) that leverages unlabeled data and specifically…

    Submitted 29 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  46. arXiv:1905.12654  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Generalization Gap in Reparameterizable Reinforcement Learning

    Authors: Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: Understanding generalization in reinforcement learning (RL) is a significant challenge, as many common assumptions of traditional supervised learning theory do not apply. We focus on the special class of reparameterizable RL problems, where the trajectory distribution can be decomposed using the reparameterization trick. For this problem class, estimating the expected return is efficient and the tr…

    Submitted 29 May, 2019; originally announced May 2019.

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019

  47. arXiv:1905.10936  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

    Authors: Shuai Zheng, Ziyue Huang, James T. Kwok

    Abstract: Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence is base…

    Submitted 28 October, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019

  48. arXiv:1905.09899  [pdf, other]

    cs.LG math.OC stat.ML

    Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning

    Authors: Shuai Zheng, James T. Kwok

    Abstract: Stochastic methods with coordinate-wise adaptive stepsize (such as RMSprop and Adam) have been widely used in training deep neural networks. Despite their fast convergence, they can generalize worse than stochastic gradient descent. In this paper, by revisiting the design of Adagrad, we propose to split the network parameters into blocks, and use a blockwise adaptive stepsize. Intuitively, blockwi…

    Submitted 23 May, 2019; originally announced May 2019.

  49. arXiv:1904.06442  [pdf, other]

    cs.LG stat.ML

    Remaining Useful Life Estimation Using Functional Data Analysis

    Authors: Qiyao Wang, Shuai Zheng, Ahmed Farahat, Susumu Serita, Chetan Gupta

    Abstract: The Remaining Useful Life (RUL) of a piece of equipment or one of its components is defined as the time left until the equipment or component reaches its end of useful life. Accurate RUL estimation is exceptionally beneficial to Predictive Maintenance, and Prognostics and Health Management (PHM). Data-driven approaches which leverage the power of algorithms for RUL estimation using sensor and operational time…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: Accepted by IEEE International Conference on Prognostics and Health Management 2019

  50. arXiv:1901.10946  [pdf, other]

    cs.LG stat.ML

    NAOMI: Non-Autoregressive Multiresolution Sequence Imputation

    Authors: Yukai Liu, Rose Yu, Stephan Zheng, Eric Zhan, Yisong Yue

    Abstract: Missing value imputation is a fundamental problem in spatiotemporal modeling, from motion tracking to the dynamics of physical systems. Deep autoregressive models suffer from error propagation which becomes catastrophic for imputing long-range sequences. In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NA…

    Submitted 29 October, 2019; v1 submitted 30 January, 2019; originally announced January 2019.