-
Heteroscedastic Growth Curve Modeling with Shape-Restricted Splines
Authors:
Jieying Jiao,
Wenling Song,
Yishu Xue,
Jun Yan
Abstract:
Growth curve analysis (GCA) has a wide range of applications in various fields where growth trajectories need to be modeled. Heteroscedasticity is often present in the error term, and standard linear fixed- or mixed-effects models cannot handle it with sufficient flexibility. One situation that has been addressed is one in which the error variance is characterized by a linear predictor with certain covariates. A frequently encountered scenario in GCA, however, is one in which the variance is a smooth function of the mean with known shape restrictions. A naive application of standard linear mixed-effects models would underestimate the variance of the fixed-effects estimators and, consequently, the uncertainty of the estimated growth curve. We propose to model the variance of the response variable as a shape-restricted (increasing/decreasing; convex/concave) function of the marginal or conditional mean using shape-restricted splines. A simple iteratively reweighted fitting algorithm that takes advantage of existing software for linear mixed-effects models is developed. For inference, a parametric bootstrap procedure is recommended. Our simulation study shows that the proposed method gives satisfactory inference with moderate sample sizes. The utility of the method is demonstrated using two real-world applications.
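The alternating structure of an iteratively reweighted fit can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a fixed-effects-only mean model fitted by weighted least squares (no random effects), and it swaps the shape-restricted spline for isotonic (increasing) regression of squared residuals on the fitted mean, computed with the pool-adjacent-violators algorithm. The names `pava` and `fit_irls` are hypothetical.

```python
import numpy as np

def pava(y):
    """Isotonic (non-decreasing) least-squares fit via pool-adjacent-violators."""
    means, weights, counts = [], [], []
    for v in y:
        means.append(float(v)); weights.append(1.0); counts.append(1)
        # Merge the last two blocks while they violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            w = weights[-2] + weights[-1]
            m = (means[-2] * weights[-2] + means[-1] * weights[-1]) / w
            c = counts[-2] + counts[-1]
            means.pop(); weights.pop(); counts.pop()
            means[-1], weights[-1], counts[-1] = m, w, c
    return np.repeat(means, counts)

def fit_irls(X, y, n_iter=20, floor=1e-6):
    """Alternate a weighted least-squares mean fit with an increasing variance fit."""
    w = np.ones(len(y))
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum_i w_i (y_i - x_i beta)^2
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        mu = X @ beta
        r2 = (y - mu) ** 2
        order = np.argsort(mu)
        var = np.empty_like(mu)
        var[order] = pava(r2[order])   # variance as an increasing function of the mean
        var = np.maximum(var, floor)   # guard against zero variance estimates
        w = 1.0 / var                  # reweight by inverse estimated variance
    return beta, var
```

Calling `beta, var = fit_irls(X, y)` with a design matrix `X` returns the mean coefficients and fitted variances; per the abstract, inference would then rely on a parametric bootstrap rather than on these point estimates alone.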
Submitted 28 February, 2025;
originally announced March 2025.
-
Data Jamboree: A Party of Open-Source Software Solving Real-World Data Science Problems
Authors:
Lucy D'Agostino McGowan,
Shannon Tass,
Sam Tyner,
HaiYing Wang,
Jun Yan
Abstract:
The evolving focus in statistics and data science education highlights the growing importance of computing. This paper presents the Data Jamboree, a live event that combines computational methods with traditional statistical techniques to address real-world data science problems. Participants, ranging from novices to experienced users, followed workshop leaders in using open-source tools like Julia, Python, and R to perform tasks such as data cleaning, manipulation, and predictive modeling. The Jamboree showcased the educational benefits of working with open data, providing participants with practical, hands-on experience. We compared the tools in terms of efficiency, flexibility, and statistical power, with Julia excelling in performance, Python in versatility, and R in statistical analysis and visualization. The paper concludes with recommendations for designing similar events to encourage collaborative learning and critical thinking in data science.
Submitted 27 February, 2025;
originally announced February 2025.
-
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Authors:
Jinbo Wang,
Mingze Wang,
Zhanpeng Zhou,
Junchi Yan,
Weinan E,
Lei Wu
Abstract:
Transformers consist of diverse building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feedforward networks. Thus, understanding the differences and interactions among these blocks is important. In this paper, we uncover a clear Sharpness Disparity across these blocks, which emerges early in training and intriguingly persists throughout the training process. Motivated by this finding, we propose Blockwise Learning Rate (LR), a strategy that tailors the LR to each block's sharpness, accelerating large language model (LLM) pre-training. By integrating Blockwise LR into AdamW, we consistently achieve lower terminal loss and nearly $2\times$ speedup compared to vanilla AdamW. We demonstrate this acceleration across GPT-2 and LLaMA, with model sizes ranging from 0.12B to 1.1B and datasets of OpenWebText and MiniPile. Finally, we incorporate Blockwise LR into Adam-mini (Zhang et al., 2024), a recently proposed memory-efficient variant of Adam, achieving a combined $2\times$ speedup and $2\times$ memory saving. These results underscore the potential of exploiting the sharpness disparity to improve LLM training.
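Blockwise LR amounts to giving each block its own learning rate inside AdamW. The NumPy sketch below shows only that mechanism: a textbook AdamW step applied per parameter with a learning rate of `base_lr` times a per-block multiplier. The block names and the values in `LR_MULTIPLIER` are hypothetical placeholders; the abstract does not state the sharpness-based ratios the authors use.

```python
import numpy as np

def adamw_step(p, g, state, lr, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One AdamW update for a single parameter array (modified in place)."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    p -= lr * weight_decay * p                        # decoupled weight decay
    p -= lr * m_hat / (np.sqrt(v_hat) + eps)          # adaptive gradient step

# Hypothetical per-block multipliers on the base learning rate
LR_MULTIPLIER = {"embedding": 1.0, "attention": 1.0, "ffn": 2.0, "norm": 1.0}

def blockwise_step(params, grads, states, base_lr, **kw):
    """Apply AdamW to every parameter, scaling the LR by its block's multiplier."""
    for name, p in params.items():
        block = name.split(".")[0]                    # e.g. "ffn.w1" -> "ffn"
        adamw_step(p, grads[name], states[name], base_lr * LR_MULTIPLIER[block], **kw)
```

In PyTorch the same effect comes from passing parameter groups, each with its own `lr`, to `torch.optim.AdamW`; the sketch simply makes the per-block scaling explicit.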
Submitted 26 February, 2025;
originally announced February 2025.
-
Principles for Open Data Curation: A Case Study with the New York City 311 Service Request Data
Authors:
David Tussey,
Jun Yan
Abstract:
In the early 21st century, the open data movement began to transform societies and governments by promoting transparency, innovation, and public engagement. The City of New York (NYC) has been at the forefront of this movement since the enactment of the Open Data Law in 2012, creating the NYC Open Data portal. The portal currently hosts 2,700 datasets, serving as a crucial resource for research across various domains, including health, urban development, and transportation. However, the effective use of open data relies heavily on data quality and usability, challenges that remain insufficiently addressed in the literature. This paper examines these challenges via a case study of the NYC 311 Service Request dataset, identifying key issues in data validity, consistency, and curation efficiency. We propose a set of data curation principles, tailored for government-released open data, to address these challenges. Our findings highlight the importance of harmonized field definitions, streamlined storage, and automated quality checks, offering practical guidelines for improving the reliability and utility of open datasets.
Submitted 7 March, 2025; v1 submitted 14 January, 2025;
originally announced February 2025.
-
CALF-SBM: A Covariate-Assisted Latent Factor Stochastic Block Model
Authors:
Sydney Louit,
Evan Clark,
Alexander Gelbard,
Niketna Vivek,
Jun Yan,
Panpan Zhang
Abstract:
We propose a novel network generative model that extends the standard stochastic block model by concurrently utilizing observed node-level information and accounting for network-enabled nodal heterogeneity. The proposed model is called the covariate-assisted latent factor stochastic block model (CALF-SBM). Inference for the proposed model is carried out in a fully Bayesian framework. The primary application of CALF-SBM in the present research is community detection, where a model-selection-based approach is employed to estimate the number of communities, which in practice is unknown. To assess the performance of CALF-SBM, an extensive simulation study is carried out, including comparisons with multiple classical and modern network clustering algorithms. Lastly, the paper presents two real data applications, based respectively on a very recent network dataset describing collaborative relationships among otolaryngologists in the United States and a traditional aviation network dataset containing information about direct flights between airports in the United States and Canada.
Submitted 7 February, 2025;
originally announced February 2025.
-
Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series
Authors:
Yuxiao Hu,
Qian Li,
Dongxiao Zhang,
Jinyue Yan,
Yuntian Chen
Abstract:
Recently, leveraging pre-trained Large Language Models (LLMs) for time series (TS) tasks has gained increasing attention; this involves activating and enhancing LLMs' capabilities. Many methods aim to activate LLMs' capabilities through token-level alignment but overlook LLMs' inherent strength in natural language processing: their deep understanding of linguistic logic and structure rather than superficial embedding processing. We propose Context-Alignment, a new paradigm that aligns TS with a linguistic component in the language environments familiar to LLMs, enabling LLMs to contextualize and comprehend TS data and thereby activating their capabilities. Specifically, such context-level alignment comprises structural alignment and logical alignment, which are achieved by Dual-Scale Context-Alignment GNNs (DSCA-GNNs) applied to TS-language multimodal inputs. Structural alignment uses dual-scale nodes to describe the hierarchical structure in TS-language, enabling LLMs to treat long TS data as a whole linguistic component while preserving intrinsic token features. Logical alignment uses directed edges to guide logical relationships, ensuring coherence in the contextual semantics. Demonstration-example prompts are employed to construct Demonstration Examples based Context-Alignment (DECA) following the DSCA-GNNs framework. DECA can be flexibly and repeatedly integrated into various layers of pre-trained LLMs to improve awareness of logic and structure, thereby enhancing performance. Extensive experiments show the effectiveness of DECA and the importance of Context-Alignment across tasks, particularly in few-shot and zero-shot forecasting, confirming that Context-Alignment provides powerful prior knowledge on context.
Submitted 5 April, 2025; v1 submitted 7 January, 2025;
originally announced January 2025.
-
On the Unknowable Limits to Prediction
Authors:
Jiani Yan,
Charles Rahal
Abstract:
We propose a rigorous decomposition of predictive error, highlighting that not all 'irreducible' error is genuinely immutable. Many domains stand to benefit from iterative enhancements in measurement, construct validity, and modeling. Our approach demonstrates how apparently 'unpredictable' outcomes can become more tractable with improved data (across both target and features) and refined algorithms. By distinguishing aleatoric from epistemic error, we delineate how accuracy may asymptotically improve--though inherent stochasticity may remain--and offer a robust framework for advancing computational research.
Submitted 10 February, 2025; v1 submitted 28 November, 2024;
originally announced November 2024.
-
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Authors:
Zhanpeng Zhou,
Mingze Wang,
Yuchen Mao,
Bingrui Li,
Junchi Yan
Abstract:
Sharpness-Aware Minimization (SAM) has substantially improved the generalization of neural networks under various settings. Despite this success, the reasons for its effectiveness remain poorly understood. In this work, we discover an intriguing phenomenon in the training dynamics of SAM, shedding light on its implicit bias towards flatter minima compared with Stochastic Gradient Descent (SGD). Specifically, we find that SAM efficiently selects flatter minima late in training. Remarkably, even a few epochs of SAM applied at the end of training yield nearly the same generalization and solution sharpness as full SAM training. Subsequently, we delve deeper into the underlying mechanism behind this phenomenon. Theoretically, we identify two phases in the learning dynamics after applying SAM late in training: i) SAM first escapes the minimum found by SGD exponentially fast; and ii) it then rapidly converges to a flatter minimum within the same valley. Furthermore, we empirically investigate the role of SAM during the early training phase. We conjecture that the optimization method chosen in the late phase is more crucial in shaping the final solution's properties. Based on this viewpoint, we extend our findings from SAM to Adversarial Training.
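For readers new to SAM, the basic update (from the original SAM formulation, not a contribution of this paper) first takes an ascent step of radius `rho` along the normalized gradient and then descends using the gradient evaluated at that perturbed point. The sketch below applies it to a toy quadratic loss with full-batch gradients instead of stochastic ones, and switches from plain gradient descent to SAM only for the final steps, echoing the late-phase schedule in the abstract. The loss, step size, `rho`, and step counts are illustrative assumptions; a convex toy problem has a single minimum, so it shows the mechanics only, not flat-minimum selection.

```python
import numpy as np

A = np.diag([1.0, 10.0])                 # toy quadratic loss 0.5 * w^T A w

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def sam_step(w, lr, rho):
    """One SAM update: perturb toward the local worst case, then descend."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step of radius rho
    return w - lr * grad(w + eps)                 # gradient evaluated at w + eps

def train(w, steps=300, sam_last=50, lr=0.05, rho=0.05):
    """Gradient descent, switching to SAM only for the final `sam_last` steps."""
    for t in range(steps):
        use_sam = t >= steps - sam_last
        w = sam_step(w, lr, rho) if use_sam else w - lr * grad(w)
    return w
```

Setting `rho=0` recovers an ordinary gradient step, which makes the extra cost of SAM (a second gradient evaluation per step) easy to see.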
Submitted 20 February, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Dynamical causality under invisible confounders
Authors:
Jinling Yan,
Shao-Wu Zhang,
Chihao Zhang,
Weitian Huang,
Jifan Shi,
Luonan Chen
Abstract:
Causal inference is prone to spurious causal interactions due to the substantial confounders in a complex system. While many existing statistical or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result, accurately inferring causation with invisible confounders remains a largely unexplored and outstanding issue in data science and AI. In this work, we propose a method to overcome such challenges to infer dynamical causality under invisible confounders (CIC method) and further reconstruct the invisible confounders from time-series data by developing an orthogonal decomposition theorem in a delay embedding space. The core of our CIC method lies in its ability to decompose the observed variables, not in their original space but in their delay embedding space, into common and private subspaces, thereby quantifying causality between those variables both theoretically and computationally. This theoretical foundation ensures causal detection for any high-dimensional system even with only two observed variables under many invisible confounders, which is a long-standing problem in the field. In addition to the invisible confounder problem, such a decomposition makes the intertwined variables separable in the embedding space, thus also solving the non-separability problem of causal inference. Extensive validation of the CIC method is carried out using various real datasets, and the experimental results demonstrate its effectiveness in reconstructing real biological networks even with unobserved confounders.
Submitted 10 August, 2024;
originally announced August 2024.
-
Recurrent Events Modeling Based on a Reflected Brownian Motion with Application to Hypoglycemia
Authors:
Yingfa Xie,
Haoda Fu,
Yuan Huang,
Vladimir Pozdnyakov,
Jun Yan
Abstract:
Patients with type 2 diabetes need to closely monitor blood sugar levels as part of routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often occurs as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient's observed hypoglycemic events as lower-boundary crossing events for a reflected Brownian motion with an upper reflection barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient-level frailty are incorporated into the volatility and the upper reflection barrier. This framework quantifies the underlying glucose level variability, patient heterogeneity, and risk factors' impact on glucose. We make inferences in a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the Deviance Information Criterion and the Logarithm of the Pseudo-Marginal Likelihood, are used for model selection. The methodology is validated in simulation studies. In analyzing a dataset from diabetic patients in the DURABLE trial, our model provides an adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.
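The event mechanism described above, a crossing of a lower boundary by a Brownian motion reflected at an upper barrier, can be simulated with a short sketch. It assumes a driftless process with constant volatility `sigma` and omits the covariates, frailty, recurrence, and Bayesian inference entirely; the functions `hitting_times` and `expected_hitting_time` and all parameter values are hypothetical. The closed form for the mean crossing time follows from a standard first-passage calculation, solving `(sigma**2 / 2) u'' = -1` with `u(lower) = 0` and `u'(upper) = 0`, and serves as a check on the Euler scheme.

```python
import numpy as np

def hitting_times(x0, lower, upper, sigma, dt=1e-3, n_paths=2000, t_max=50.0, seed=0):
    """Simulate first times a driftless Brownian motion, reflected at `upper`,
    falls to `lower`, using an Euler scheme with mirror reflection."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    t_hit = np.full(n_paths, np.nan)          # NaN = not yet crossed
    alive = np.ones(n_paths, dtype=bool)
    n_steps = int(t_max / dt)
    for k in range(1, n_steps + 1):
        x[alive] += sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        over = x > upper
        x[over] = 2 * upper - x[over]         # reflect at the upper barrier
        crossed = alive & (x <= lower)
        t_hit[crossed] = k * dt               # record the lower-boundary crossing
        alive &= ~crossed
        if not alive.any():
            break
    return t_hit

def expected_hitting_time(x0, lower, upper, sigma):
    """Closed-form E[T] for the same driftless reflected process."""
    return (x0 - lower) * (2 * upper - lower - x0) / sigma ** 2
```

Because the boundary is only checked on the time grid, the simulated mean slightly overestimates the exact value; shrinking `dt` reduces that bias.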
Submitted 13 March, 2024;
originally announced March 2024.
-
Simple rejection Monte Carlo algorithm and its application to multivariate statistical inference
Authors:
Fengyu Li,
Huijiao Yu,
Jun Yan,
Xianyong Meng
Abstract:
The Monte Carlo algorithm is increasingly utilized, with its central step involving computer-based random sampling from stochastic models. While both Markov chain Monte Carlo (MCMC) and rejection Monte Carlo serve as sampling methods, the latter finds fewer applications than the former. Hence, this paper first provides a concise introduction to the theory of the rejection Monte Carlo algorithm and its implementation techniques, aiming to enhance conceptual understanding and program implementation. Subsequently, a simplified rejection Monte Carlo algorithm is formulated. Furthermore, using multivariate distribution sampling and multivariate integration as examples, this study explores the specific application of the algorithm in statistical inference.
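The basic accept-reject idea can be sketched as follows: draw a candidate from a proposal density g, then accept it with probability f(x) / (M g(x)), where M bounds f/g. This is the textbook algorithm, not necessarily the simplified variant the paper formulates. As an assumed multivariate example, the target is the density f(x, y) = x + y on the unit square, sampled with a uniform proposal and bound M = 2; the accepted draws then give a Monte Carlo estimate of the integral E[X] = 7/12. The function name `rejection_sample` is hypothetical.

```python
import numpy as np

def rejection_sample(density, bound, dim, n, rng):
    """Draw n points from `density` on the unit cube [0, 1]^dim by accept-reject
    with a uniform proposal; requires density(x) <= bound everywhere."""
    batches = []
    n_proposed = 0
    while sum(len(b) for b in batches) < n:
        x = rng.random((n, dim))            # uniform candidates
        u = rng.random(n)
        keep = u * bound <= density(x)      # accept with probability density(x) / bound
        batches.append(x[keep])
        n_proposed += n
    n_accepted = sum(len(b) for b in batches)
    return np.concatenate(batches)[:n], n_accepted / n_proposed

rng = np.random.default_rng(1)
points, accept_rate = rejection_sample(lambda x: x[:, 0] + x[:, 1], 2.0, 2, 20000, rng)
estimate = points[:, 0].mean()              # Monte Carlo estimate of E[X] = 7/12
```

The acceptance rate is about 1/M, so a tighter bound M wastes fewer proposals; this trade-off is what makes proposal choice central to rejection sampling.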
Submitted 26 February, 2024;
originally announced February 2024.
-
Comparison of sectoral structures between China and Japan: A network perspective
Authors:
Tao Wang,
Shiying Xiao,
Jun Yan
Abstract:
Economic structure comparisons between China and Japan have long captivated development economists. To examine their sectoral differences from 1995 to 2018, we used the annual input-output tables (IOTs) of both nations to construct weighted and directed input-output networks (IONs), which enabled deeper network analyses. Strength distributions underscored variations in inter-sector economic interactions. Weighted, directed assortativity coefficients captured the homophily among connecting sectors' features. By adjusting emphasis in PageRank centrality, key sectors were identified. Community detection revealed clustering tendencies among the sectors. As anticipated, the analysis pinpointed manufacturing as China's central sector, while Japan favored services. Yet, at the finer level of specific sectors, the two nations exhibited different structural evolutions. In contrast, sectoral communities in both China and Japan remained notably stable over the examined period.
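To make PageRank centrality on a weighted, directed input-output network concrete, the sketch below runs power iteration on a row-normalized weight matrix. It assumes that `W[i, j]` is the flow from sector i to sector j and treats the damping factor `alpha` as the tunable "emphasis"; the abstract does not say which knob the authors adjust, so that mapping, like the function name `weighted_pagerank`, is an assumption.

```python
import numpy as np

def weighted_pagerank(W, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank on a weighted, directed network by power iteration.
    W[i, j] is the weight of edge i -> j; alpha is the damping factor."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    out = W.sum(axis=1)
    # Row-normalize by out-strength; dangling nodes jump uniformly
    P = np.where(out[:, None] > 0, W / np.where(out > 0, out, 1.0)[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = alpha * r @ P + (1 - alpha) / n   # follow edges or teleport
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

Lowering `alpha` pulls scores toward uniform (more teleportation), while raising it lets the weighted link structure dominate the ranking.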
Submitted 23 February, 2024;
originally announced February 2024.
-
Poisson Process for Bayesian Optimization
Authors:
Xiaoxing Wang,
Jiaxing Li,
Chao Xue,
Wei Liu,
Weifeng Liu,
Xiaokang Yang,
Junchi Yan,
Dacheng Tao
Abstract:
Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and many methods have been proposed to model the absolute function response of the black-box function through a probabilistic surrogate model, including the Tree-structured Parzen Estimator (TPE), random forests (SMAC), and Gaussian processes (GP). However, few methods have been explored to estimate the relative rankings of candidates, which can be more robust to noise and more practical than absolute function responses, especially when the function responses are intractable but preferences can be acquired. To this end, we propose a novel ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO). Two tailored acquisition functions are further derived from the classic LCB and EI to accommodate it. Compared to the classic GP-BO method, our PoPBO has lower computational cost and better robustness to noise, which is verified by extensive experiments. The results on both simulated and real-world benchmarks, including hyperparameter optimization (HPO) and neural architecture search (NAS), show the effectiveness of PoPBO.
Submitted 4 February, 2024;
originally announced February 2024.
-
On GEE for Mean-Variance-Correlation Models: Variance Estimation and Model Selection
Authors:
Zhenyu Xu,
Jason P. Fine,
Wenling Song,
Jun Yan
Abstract:
Generalized estimating equations (GEE) are of great importance in analyzing clustered data without full specification of multivariate distributions. A recent approach jointly models the mean, variance, and correlation coefficients of clustered data through three sets of regressions (Luo and Pan, 2022). We observe, however, that these estimating equations are a special case of those of Yan and Fine (2004), which further allow the variance to depend on the mean through a variance function. Their proposed variance estimators may be incorrect for the variance and correlation parameters because of a subtle dependence induced by the nested structure of the estimating equations. We characterize model settings where their variance estimation is invalid and show that the variance estimators in Yan and Fine (2004) correctly account for such dependence. In addition, we introduce a novel model selection criterion that enables the simultaneous selection of the mean-scale-correlation model. The sandwich variance estimator and the proposed model selection criterion are evaluated in several simulation studies and a real data analysis, which validate their effectiveness in variance estimation and model selection. Our work also extends the R package geepack with the flexibility to apply different working covariance matrices for the variance and correlation structures.
Submitted 9 January, 2025; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data
Authors:
Yuxiao Hu,
Qian Li,
Xiaodan Shi,
Jinyue Yan,
Yuntian Chen
Abstract:
Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is often overlooked or inadequately considered. To address these limitations, we present a novel Multi-spatial Multi-temporal air quality forecasting method based on Graph Convolutional Networks and Gated Recurrent Units (M2G2), bridging the gap in air quality forecasting across spatial and temporal scales. The proposed framework consists of two modules: Multi-scale Spatial GCN (MS-GCN) for spatial information fusion and Multi-scale Temporal GRU (MT-GRU) for temporal information integration. In the spatial dimension, the MS-GCN module employs a bidirectional learnable structure and a residual structure, enabling comprehensive information exchange between individual monitoring stations and the city-scale graph. In the temporal dimension, the MT-GRU module adaptively combines information from different temporal scales through parallel hidden states. Leveraging meteorological indicators and four air quality indicators, we present comprehensive comparative analyses and ablation experiments, showcasing the higher accuracy of M2G2 compared with nine currently available advanced approaches across all aspects. The improvements of M2G2 over the second-best method on RMSE of the 24h/48h/72h are as follows: PM2.5: (7.72%, 6.67%, 10.45%); PM10: (6.43%, 5.68%, 7.73%); NO2: (5.07%, 7.76%, 16.60%); O3: (6.46%, 6.86%, 9.79%). Furthermore, we demonstrate the effectiveness of each module of M2G2 through an ablation study.
Submitted 31 December, 2023;
originally announced January 2024.
-
Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets
Authors:
Jiale Yan,
Hiroaki Ito,
Ángel López García-Arias,
Yasuyuki Okoshi,
Hikari Otsuka,
Kazushi Kawamura,
Thiem Van Chu,
Masato Motomura
Abstract:
The Strong Lottery Ticket Hypothesis (SLTH) demonstrates the existence of high-performing subnetworks within a randomly initialized model, discoverable by pruning a convolutional neural network (CNN) without any weight training. A recent study, called Untrained GNNs Tickets (UGT), extended SLTH from CNNs to shallow graph neural networks (GNNs). However, discrepancies persist when comparing against baseline models with learned dense weights. Additionally, applying SLTH to deeper GNNs remains unexplored; such networks deliver improved accuracy with additional layers but suffer from excessive memory requirements. To address these challenges, this work utilizes Multicoated Supermasks (M-Sup), a scalar pruning mask method, and implements it in GNNs by proposing a strategy for setting its pruning thresholds adaptively. In the context of deep GNNs, this research uncovers the existence of untrained recurrent networks that perform on par with their trained feed-forward counterparts. This paper also introduces the Multi-Stage Folding and Unshared Masks methods to expand the search space in terms of both architecture and parameters. Through evaluation on various datasets, including the Open Graph Benchmark (OGB), this work establishes a triple-win scenario for SLTH-based GNNs: by achieving high sparsity, competitive performance, and high memory efficiency with up to a 98.7% reduction, it demonstrates suitability for energy-efficient graph processing.
Submitted 5 December, 2023;
originally announced December 2023.
-
The Blessings of Multiple Treatments and Outcomes in Treatment Effect Estimation
Authors:
Yong Wu,
Mingzhou Liu,
Jing Yan,
Yanwei Fu,
Shouyan Wang,
Yizhou Wang,
Xinwei Sun
Abstract:
Assessing causal effects in the presence of unobserved confounding is a challenging problem. Existing studies leveraged proxy variables or multiple treatments to adjust for the confounding bias. In particular, the latter approach attributes the impact on a single outcome to multiple treatments, allowing latent variables to be estimated for confounding control. Nevertheless, these methods primarily focus on a single outcome, whereas in many real-world scenarios there is greater interest in studying the effects on multiple outcomes. Moreover, these outcomes are often coupled with multiple treatments. Examples include the intensive care unit (ICU), where health providers evaluate the effectiveness of therapies on multiple health indicators. To accommodate these scenarios, we consider a new setting dubbed multiple treatments and multiple outcomes. We then show that parallel studies of multiple outcomes in this setting can assist each other in causal identification, in the sense that we can exploit other treatments and outcomes as proxies for each treatment effect under study. We proceed with a causal discovery method that can effectively identify such proxies for causal estimation. The utility of our method is demonstrated on synthetic data and an application to sepsis.
Submitted 14 October, 2023; v1 submitted 29 September, 2023;
originally announced September 2023.
-
A Strength and Sparsity Preserving Algorithm for Generating Weighted, Directed Networks with Predetermined Assortativity
Authors:
Yelie Yuan,
Jun Yan,
Panpan Zhang
Abstract:
Degree-preserving rewiring is a widely used technique for generating unweighted networks with given assortativity, but for weighted networks, it is unclear how an analog would preserve the strengths and other critical network features such as sparsity level. This study introduces a novel approach for rewiring weighted networks to achieve desired directed assortativity. The method utilizes a mixed integer programming framework to establish a target network with predetermined assortativity coefficients, followed by an efficient rewiring algorithm termed "strength and sparsity preserving rewiring" (SSPR). SSPR retains the node strength distributions and network sparsity after rewiring. Additional properties, such as the edge weight distribution, can also be accommodated at extra computational cost. The optimization scheme can be used to determine feasible assortativity ranges for an initial network. The effectiveness of the proposed SSPR algorithm is demonstrated through its application to two classes of popular network models.
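To make the target quantity concrete, the sketch below computes the four directed assortativity coefficients of a weighted edge list as weighted Pearson correlations of node strengths across edges (out-out, out-in, in-out, in-in), using edge weights as correlation weights. It assumes this common weighted generalization of directed assortativity; the exact definition used for SSPR may differ, the function names are hypothetical, and neither the mixed integer program nor the rewiring step is implemented.

```python
from collections import defaultdict
from math import sqrt


def strengths(edges):
    """Out- and in-strengths of every node in a weighted edge list (u, v, w)."""
    s_out, s_in = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        s_out[u] += w
        s_in[v] += w
    return s_out, s_in


def weighted_pearson(xs, ys, ws):
    """Pearson correlation of paired values xs, ys with non-negative weights ws."""
    total = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / total
    my = sum(w * y for y, w in zip(ys, ws)) / total
    cov = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    vx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    vy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws))
    return cov / sqrt(vx * vy)


def directed_assortativity(edges):
    """Four coefficients keyed by (source side, target side), e.g. ("out", "in").

    For each edge u -> v, pair the source's out- or in-strength with the
    target's out- or in-strength and correlate these pairs across all edges.
    """
    s_out, s_in = strengths(edges)
    side = {"out": s_out, "in": s_in}
    ws = [w for _, _, w in edges]
    coefs = {}
    for a in ("out", "in"):
        for b in ("out", "in"):
            xs = [side[a][u] for u, _, _ in edges]
            ys = [side[b][v] for _, v, _ in edges]
            coefs[(a, b)] = weighted_pearson(xs, ys, ws)
    return coefs
```

With unit weights, strengths reduce to degrees and the four values reduce to the unweighted directed assortativity coefficients.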
Submitted 24 August, 2023;
originally announced August 2023.
-
A general model-checking procedure for semiparametric accelerated failure time models
Authors:
Dongrak Choi,
Woojung Bae,
Jun Yan,
Sangwook Kang
Abstract:
We propose a set of goodness-of-fit tests for the semiparametric accelerated failure time (AFT) model, including an omnibus test, a link function test, and a functional form test. This set of tests is derived from a multi-parameter cumulative sum process shown to asymptotically follow a zero-mean Gaussian process. Its evaluation is based on the asymptotically equivalent perturbed version, which enables both graphical and numerical evaluations of the assumed AFT model. Empirical p-values are obtained using the Kolmogorov-type supremum test, which provides a reliable approach for estimating the significance of both proposed un-standardized and standardized test statistics. The proposed procedure is illustrated using the induced smoothed rank-based estimator but is directly applicable to other popular estimators such as the non-smooth rank-based estimator or the least-squares estimator. Our proposed methods are rigorously evaluated using extensive simulation experiments that demonstrate their effectiveness in controlling the Type I error rate and detecting departures from the assumed AFT model in practical sample sizes and censoring rates. Furthermore, the proposed approach is applied to the analysis of the Primary Biliary Cirrhosis data, a widely studied dataset in survival analysis, providing further evidence of the practical usefulness of the proposed methods in real-world scenarios. To make the proposed methods more accessible to researchers, we have implemented them in the R package afttest, which is publicly available on the Comprehensive R Archive Network.
Submitted 19 May, 2023;
originally announced May 2023.
-
Sample-efficient Multi-objective Molecular Optimization with GFlowNets
Authors:
Yiheng Zhu,
Jialu Wu,
Chaowen Hu,
Jiahuan Yan,
Chang-Yu Hsieh,
Tingjun Hou,
Jian Wu
Abstract:
Many crucial scientific problems involve designing novel molecules with desired properties, which can be formulated as a black-box optimization problem over the discrete chemical space. In practice, multiple conflicting objectives and costly evaluations (e.g., wet-lab experiments) make the diversity of candidates paramount. Computational methods have achieved initial success but still struggle with considering diversity in both objective and search space. To fill this gap, we propose a multi-objective Bayesian optimization (MOBO) algorithm leveraging the hypernetwork-based GFlowNets (HN-GFN) as an acquisition function optimizer, with the purpose of sampling a diverse batch of candidate molecular graphs from an approximate Pareto front. Using a single preference-conditioned hypernetwork, HN-GFN learns to explore various trade-offs between objectives. We further propose a hindsight-like off-policy strategy to share high-performing molecules among different preferences in order to speed up learning for HN-GFN. We empirically illustrate that HN-GFN has adequate capacity to generalize over preferences. Moreover, experiments in various real-world MOBO settings demonstrate that our framework predominantly outperforms existing methods in terms of candidate quality and sample efficiency. The code is available at https://github.com/violet-sto/HN-GFN.
Submitted 2 November, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Generating General Preferential Attachment Networks with R Package wdnet
Authors:
Yelie Yuan,
Tiandong Wang,
Jun Yan,
Panpan Zhang
Abstract:
Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitates the development of associated analytical methodologies. Existing software packages provide only limited functionality for this purpose, with restricted configurations and efficiency. We present a generic, user-friendly implementation of weighted, directed PA network generation with R package wdnet. The core algorithm is based on an efficient binary tree approach. The package further allows adding multiple edges at a time, heterogeneous reciprocal edges, and user-specified preference functions. The engine under the hood is implemented in C++. Usage of the package is illustrated with detailed explanations. A benchmark study shows that wdnet is efficient for generating general PA networks not available in other packages. In restricted settings that can be handled by existing packages, wdnet provides comparable efficiency.
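The abstract says the core generator relies on a binary tree. As a rough illustration of why a tree helps, the sketch below uses a Fenwick (binary indexed) tree to draw a target node with probability proportional to its in-degree plus an offset, in O(log n) per draw. This is an assumption-laden stand-in: wdnet's actual tree structure, preference functions, and edge schemes are not reproduced, weights are restricted to integers so that sampling is exact, and `FenwickSampler` and `pa_network` are hypothetical names.

```python
import random


class FenwickSampler:
    """Binary indexed (Fenwick) tree over integer weights for O(log n) sampling."""

    def __init__(self, capacity):
        self.n = capacity
        self.tree = [0] * (capacity + 1)  # 1-based internal array
        self.total = 0

    def add(self, i, delta):
        """Add delta to the weight of item i (0-based)."""
        self.total += delta
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sample(self, rng):
        """Draw item i with probability weight[i] / total (total must be > 0)."""
        target = rng.randrange(self.total)  # uniform integer in [0, total)
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= target:
                pos = nxt
                target -= self.tree[nxt]
            step >>= 1
        # pos is the largest prefix length whose sum is <= target,
        # so the selected item is pos in 0-based indexing.
        return pos


def pa_network(n_nodes, delta=1, seed=None):
    """Directed PA sketch: node t sends one edge to an earlier node chosen with
    probability proportional to in_degree + delta (delta a positive integer)."""
    rng = random.Random(seed)
    sampler = FenwickSampler(n_nodes)
    in_deg = [0] * n_nodes
    edges = []
    sampler.add(0, delta)  # node 0 starts with in-degree 0
    for new in range(1, n_nodes):
        target = sampler.sample(rng)
        edges.append((new, target))
        in_deg[target] += 1
        sampler.add(target, 1)   # target's preference grows with its in-degree
        sampler.add(new, delta)  # the new node becomes available as a target
    return edges, in_deg
```

Updating one node's preference touches O(log n) tree cells, so each new edge costs logarithmic rather than linear time in the number of nodes.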
Submitted 15 October, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Transfer Learning with Large-Scale Quantile Regression
Authors:
Jun Jin,
Jun Yan,
Robert H. Aseltine,
Kun Chen
Abstract:
Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may go beyond the target and be supplemented from other sources that possibly share similarities with the target. A crucial question is how to properly distinguish and utilize useful information from other sources to improve the quantile estimation and inference at the target. We develop transfer learning methods for high-dimensional quantile regression by detecting informative sources whose models are similar to the target and utilizing them to improve the target model. We show that under reasonable conditions, the detection of the informative sources based on sample splitting is consistent. Compared to the naive estimator with only the target data, the transfer learning estimator achieves a much lower error rate as a function of the sample sizes, the signal-to-noise ratios, and the similarity measures among the target and the source models. Extensive simulation studies demonstrate the superiority of our proposed approach. We apply our methods to tackle the problem of detecting hard-landing risk for flight safety and show the benefits and insights gained from transfer learning of three different types of airplanes: Boeing 737, Airbus A320, and Airbus A380.
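Quantile regression estimators are built on the check (pinball) loss. The short sketch below is a generic textbook illustration, not the transfer learning estimator of the paper: it shows that minimizing the average check loss over a constant recovers a sample quantile. Function names are hypothetical, and the high-dimensional penalization and informative-source detection steps are not shown.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(c, ys, tau):
    """Average check loss of the constant fit c."""
    return sum(check_loss(y - c, tau) for y in ys) / len(ys)


def constant_quantile(ys, tau):
    """Minimize the empirical risk over constants. The risk is piecewise linear
    in c with kinks at the data points, so searching the data values suffices."""
    return min(sorted(ys), key=lambda c: empirical_risk(c, ys, tau))
```

Replacing the constant c with a linear predictor (plus a sparsity penalty) gives the kind of high-dimensional quantile regression the paper builds on.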
Submitted 25 February, 2024; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Distance and Kernel-Based Measures for Global and Local Two-Sample Conditional Distribution Testing
Authors:
Jian Yan,
Zhuoxi Li,
Xianyang Zhang
Abstract:
Testing the equality of two conditional distributions is crucial in various modern applications, including transfer learning and causal inference. Despite its importance, this fundamental problem has received surprisingly little attention in the literature. This work aims to present a unified framework based on distance and kernel methods for both global and local two-sample conditional distribution testing. To this end, we introduce distance and kernel-based measures that characterize the homogeneity of two conditional distributions. Drawing from the concept of conditional U-statistics, we propose consistent estimators for these measures. Theoretically, we derive the convergence rates and the asymptotic distributions of the estimators under both the null and alternative hypotheses. Utilizing these measures, along with a local bootstrap approach, we develop global and local tests that can detect discrepancies between two conditional distributions at global and local levels, respectively. Our tests demonstrate reliable performance through simulations and real data analyses.
Submitted 24 October, 2024; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Statistical Inferences and Predictions for Areal Data and Spatial Data Fusion with Hausdorff-Gaussian Processes
Authors:
Lucas da Cunha Godoy,
Marcos Oliveira Prates,
Jun Yan
Abstract:
Accurate modeling of spatial dependence is pivotal in analyzing spatial data, influencing parameter estimation and predictions. The spatial structure of the data significantly impacts valid statistical inference. Existing models for areal data often rely on adjacency matrices and therefore struggle to differentiate between polygons of varying sizes and shapes. Conversely, data fusion models rely on computationally intensive numerical integrals, presenting challenges for moderately large datasets. In response to these issues, we propose the Hausdorff-Gaussian process (HGP), a versatile model utilizing the Hausdorff distance to capture spatial dependence in both point and areal data. Integration into generalized linear mixed-effects models enhances its applicability, particularly in addressing data fusion challenges. We validate our approach through a comprehensive simulation study and applications to two real-world scenarios: one involving areal data and another demonstrating its effectiveness in data fusion. The results suggest that the HGP is competitive with specialized models in goodness of fit and predictive performance. In summary, the HGP offers a flexible and robust solution for modeling spatial data of various types and shapes, with potential applications spanning fields such as public health and climate science.
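A minimal sketch of the distance at the heart of the HGP, assuming each geometry is approximated by a finite set of points (for example, polygon vertices or a sampling grid): the Hausdorff distance is the larger of the two directed max-min distances, and an exponential correlation is then formed from it. The exponential form, the range parameter `phi`, and the function names are illustrative assumptions; we do not claim the resulting matrix is positive definite for arbitrary collections of geometries, and the paper's own covariance construction should be consulted.

```python
from math import dist, exp


def hausdorff(a, b):
    """Hausdorff distance between finite point sets a and b (lists of coordinates)."""
    d_ab = max(min(dist(p, q) for q in b) for p in a)  # farthest a-point from b
    d_ba = max(min(dist(q, p) for p in a) for q in b)  # farthest b-point from a
    return max(d_ab, d_ba)


def exp_corr_matrix(regions, phi):
    """Illustrative correlation exp(-h / phi) built from pairwise Hausdorff distances."""
    return [[exp(-hausdorff(r, s) / phi) for s in regions] for r in regions]
```

Because a single point is a valid "region", the same distance covers point-referenced and areal data, which is the property the abstract highlights.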
Submitted 21 February, 2025; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Targeted learning in observational studies with multi-valued treatments: An evaluation of antipsychotic drug treatment safety
Authors:
Jason Poulos,
Marcela Horvitz-Lennon,
Katya Zelevinsky,
Tudor Cristea-Platon,
Thomas Huijskens,
Pooja Tyagi,
Jiaju Yan,
Jordi Diaz,
Sharon-Lise Normand
Abstract:
We investigate estimation of causal effects of multiple competing (multi-valued) treatments in the absence of randomization. Our work is motivated by an intention-to-treat study of the relative cardiometabolic risk of assignment to one of six commonly prescribed antipsychotic drugs in a cohort of nearly 39,000 adults with serious mental illnesses. Doubly-robust estimators, such as targeted minimum loss-based estimation (TMLE), require correct specification of either the treatment model or outcome model to ensure consistent estimation; however, common TMLE implementations estimate treatment probabilities using multiple binomial regressions rather than multinomial regression. We implement a TMLE estimator that uses multinomial treatment assignment and ensemble machine learning to estimate average treatment effects. Our multinomial implementation improves coverage, but does not necessarily reduce bias, relative to the binomial implementation in simulation experiments with varying treatment propensity overlap and event rates. Evaluating the causal effects of the antipsychotics on three-year diabetes risk or death, we find a safety benefit of moving from a second-generation drug considered among the safest of the second-generation drugs to an infrequently prescribed first-generation drug known for having low cardiometabolic risk.
Submitted 28 November, 2023; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Transformers in Time Series: A Survey
Authors:
Qingsong Wen,
Tian Zhou,
Chaoli Zhang,
Weiqi Chen,
Ziqing Ma,
Junchi Yan,
Liang Sun
Abstract:
Transformers have achieved superior performance in many tasks in natural language processing and computer vision, which has also triggered great interest in the time series community. Among the multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling by highlighting their strengths as well as limitations. In particular, we examine the development of time series Transformers from two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers in order to accommodate the challenges in time series analysis. From the perspective of applications, we categorize time series Transformers based on common tasks including forecasting, anomaly detection, and classification. Empirically, we perform robustness analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform in time series. Finally, we discuss and suggest future directions to provide useful research guidance. To the best of our knowledge, this paper is the first work to comprehensively and systematically summarize the recent advances of Transformers for modeling time series data. We hope this survey will ignite further research interest in time series Transformers.
Submitted 11 May, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks
Authors:
Yixuan He,
Quan Gan,
David Wipf,
Gesine Reinert,
Junchi Yan,
Mihai Cucuringu
Abstract:
Recovering global rankings from pairwise comparisons has wide applications from time synchronization to sports team ranking. Pairwise comparisons corresponding to matches in a competition can be construed as edges in a directed graph (digraph), whose nodes represent, e.g., competitors with an unknown rank. In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. Moreover, new objectives are devised to encode ranking upsets/violations. The framework involves a ranking score estimation approach, and adds an inductive bias by unfolding the Fiedler vector computation of the graph constructed from a learnable similarity matrix. Experimental results on a wide range of data sets show that our methods attain competitive and often superior performance against baselines and show promising transfer ability. Code and preprocessed data are available at https://github.com/SherylHYX/GNNRank.
Submitted 19 July, 2022; v1 submitted 31 January, 2022;
originally announced February 2022.
-
An Efficient Algorithm for Generating Directed Networks with Predetermined Assortativity Measures
Authors:
Tiandong Wang,
Jun Yan,
Yelie Yuan,
Panpan Zhang
Abstract:
Assortativity coefficients are important metrics to analyze both directed and undirected networks. In general, it is not guaranteed that the fitted model will always agree with the assortativity coefficients in the given network, and the structure of directed networks is more complicated than that of undirected ones. Therefore, we provide a remedy by proposing a degree-preserving rewiring algorithm, called DiDPR, for generating directed networks with given directed assortativity coefficients. We construct the joint edge distribution of the target network by accounting for the four directed assortativity coefficients simultaneously, provided that they are attainable, and obtain the desired network by solving a convex optimization problem. Our algorithm also helps check the attainability of the given assortativity coefficients. We assess the performance of the proposed algorithm by simulation studies with a focus on two different network models, namely Erdős–Rényi and preferential attachment random networks. We then apply the algorithm to a Facebook wall post network as a real data example. The code for implementing our algorithm is publicly available in the R package wdnet.
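The basic move behind degree-preserving rewiring of a directed network can be sketched as a double-edge swap: edges a -> b and c -> d become a -> d and c -> b, which leaves every out-degree and in-degree unchanged. The sketch below performs random swaps and rejects any that would create self-loops or duplicate edges. It assumes a simple directed graph (no duplicate edges, at least two edges) and is not DiDPR itself, which steers the rewiring toward a target joint edge distribution obtained from a convex program; the function names are hypothetical.

```python
import random
from collections import Counter


def directed_swap(edges, n_iter, seed=None):
    """Randomly rewire a simple directed edge list while preserving all degrees."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_iter):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present.difference_update({edges[i], edges[j]})
        present.update({new1, new2})
        edges[i], edges[j] = new1, new2
    return edges


def degree_sequences(edges):
    """Out-degree and in-degree counts of a directed edge list."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)
```

Each accepted swap removes one out-stub and one in-stub from the same nodes it adds them to, which is why the degree sequences are invariant.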
Submitted 10 January, 2022;
originally announced January 2022.
-
Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders
Authors:
Jian Yan,
Xianyang Zhang
Abstract:
Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with isotropic kernels, including MMD with the Gaussian kernel and the Laplace kernel, and the energy distance as special cases. We derive asymptotic expansions of the kernel two-sample statistics, based on which we establish the central limit theorem (CLT) under both the null hypothesis and the local and fixed alternatives. The new non-null CLT results allow us to perform an asymptotically exact power analysis, which reveals a delicate interplay between the moment discrepancy that can be detected by the kernel two-sample tests and the dimension-and-sample orders. The asymptotic theory is further corroborated through numerical studies.
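For reference, the statistic under study can be computed with the standard unbiased estimator of squared MMD. The sketch below uses a Gaussian kernel with a fixed bandwidth (an assumption; the paper treats general isotropic kernels) and plain Python loops, so it is meant only for small samples. Function names are hypothetical.

```python
from math import exp


def gaussian_kernel(x, y, bandwidth):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples xs and ys (lists of vectors)."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal i == j.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy
```

Because the within-sample averages exclude the diagonal, the estimate can be slightly negative when both samples come from the same distribution.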
Submitted 30 October, 2024; v1 submitted 31 December, 2021;
originally announced January 2022.
-
Principal Component Pursuit for Pattern Identification in Environmental Mixtures
Authors:
Elizabeth A. Gibson,
Junhui Zhang,
Jingkai Yan,
Lawrence Chillrud,
Jaime Benavides,
Yanelli Nunez,
Julie B. Herbstman,
Jeff Goldsmith,
John Wright,
Marianthi-Anna Kioumourtzoglou
Abstract:
Environmental health researchers often aim to identify sources/behaviors that give rise to potentially harmful exposures. We adapted principal component pursuit (PCP), a robust technique for dimensionality reduction in computer vision and signal processing, to identify patterns in environmental mixtures. PCP decomposes the exposure mixture into a low-rank matrix containing consistent exposure patterns across pollutants and a sparse matrix isolating unique exposure events. We adapted PCP to accommodate non-negative and missing data, and values below a given limit of detection (LOD). We simulated data to represent environmental mixtures of two sizes with increasing proportions of values <LOD and three noise structures. We compared PCP-LOD to principal component analysis (PCA) to evaluate performance. We next applied PCP-LOD to a mixture of 21 persistent organic pollutants (POPs) measured in 1,000 U.S. adults from the 2001-2002 National Health and Nutrition Examination Survey. We applied singular value decomposition to the estimated low-rank matrix to characterize the patterns. PCP-LOD recovered the true number of patterns through cross-validation for all simulations; based on an a priori specified criterion, PCA recovered the true number of patterns in 32% of simulations. PCP-LOD achieved lower relative predictive error than PCA for all simulated datasets with up to 50% of the data <LOD. When 75% of values were <LOD, PCP-LOD outperformed PCA only when noise was low. In the POP mixture, PCP-LOD identified a rank-three underlying structure and separated 6% of values as unique events. One pattern represented comprehensive exposure to all POPs. The other patterns grouped chemicals based on known structure and toxicity. PCP-LOD serves as a useful tool to express multi-dimensional exposures as consistent patterns that, if found to be related to adverse health, are amenable to targeted interventions.
Submitted 29 October, 2021;
originally announced November 2021.
-
Estimating a distribution function for discrete data subject to random truncation with an application to structured finance
Authors:
Jackson P. Lautier,
Vladimir Pozdnyakov,
Jun Yan
Abstract:
Proper econometric analysis should be informed by data structure. Many forms of financial data are recorded in discrete time and relate to products of a finite term. If the data come from a financial trust, they will often be further subject to random left-truncation. While the literature for estimating a distribution function from left-truncated data is extensive, a thorough literature search reveals that the case of discrete data over a finite number of possible values has received little attention. A precise discrete framework and a suitable sampling procedure for the Woodroofe-type estimator for discrete data over a finite number of possible values are therefore established. Subsequently, the resulting vector of hazard rate estimators is proved to be asymptotically normal with independent components. Asymptotic normality of the survival function estimator is then established. Sister results for the left-truncating random variable are also proved. Taken together, the resulting joint vector of hazard rate estimates for the lifetime and left-truncation random variables is proved to be the maximum likelihood estimate of the parameters of the conditional joint lifetime and left-truncation distribution given the lifetime has not been left-truncated. A hypothesis test for the shape of the distribution function based on our asymptotic results is derived. Such a test is useful to formally assess the plausibility of the stationarity assumption in length-biased sampling. The finite sample performance of the estimators is investigated in a simulation study. Applicability of the theoretical results in an econometric setting is demonstrated with a subset of data from the Mercedes-Benz 2017-A securitized bond.
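As a rough illustration of the estimator family, the sketch below computes classical discrete product-limit (Lynden-Bell/Woodroofe-type) hazard estimates h(x) = #{X_i = x} / #{Y_i <= x <= X_i} and the implied survival function S(x) = P(X > x) over a finite support. It assumes a pair (x_i, y_i) is observed only when y_i <= x_i; the paper's precise sampling framework and truncation convention may differ, and the function names are hypothetical. Exact fractions are used so that small examples can be checked by hand.

```python
from fractions import Fraction


def discrete_hazards(pairs, support):
    """Product-limit hazard estimates for left-truncated discrete data.

    pairs: observed (x_i, y_i) with lifetime x_i and truncation time y_i, y_i <= x_i.
    support: the finite set of possible lifetime values.
    Returns {x: #{x_i == x} / #{y_i <= x <= x_i}}, or None where no pair is at risk.
    """
    hazards = {}
    for x in support:
        events = sum(1 for xi, _ in pairs if xi == x)
        at_risk = sum(1 for xi, yi in pairs if yi <= x <= xi)
        hazards[x] = Fraction(events, at_risk) if at_risk else None
    return hazards


def survival(hazards):
    """S(x) = prod_{k <= x} (1 - h(k)), stopping at the first undefined hazard."""
    s, out = Fraction(1), {}
    for x in sorted(hazards):
        if hazards[x] is None:
            break
        s *= 1 - hazards[x]
        out[x] = s
    return out
```

The risk set at x counts only pairs already "entered" (y_i <= x) and not yet failed (x_i >= x), which is how left truncation is accounted for.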
Submitted 22 November, 2022; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Regression Modeling for Recurrent Events Using R Package reReg
Authors:
Sy Han Chiou,
Gongjun Xu,
Jun Yan,
Chiung-Yu Huang
Abstract:
Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among others, where study subjects may experience a sequence of events of interest during follow-up. The R package reReg (Chiou and Huang 2021) offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly in the presence of an informative terminal event. The regression framework is a general scale-change model which encompasses the popular Cox-type model, the accelerated rate model, and the accelerated mean model as special cases. Informative censoring is accommodated through a subject-specific frailty with no need for parametric specification. Different regression models are allowed for the recurrent event process and the terminal event. Also included are visualization and simulation tools.
Submitted 20 August, 2022; v1 submitted 23 April, 2021;
originally announced April 2021.
-
PageRank centrality and algorithms for weighted, directed networks with applications to World Input-Output Tables
Authors:
Panpan Zhang,
Tiandong Wang,
Jun Yan
Abstract:
PageRank (PR) is a fundamental tool for assessing the relative importance of the nodes in a network. In this paper, we propose a measure, weighted PageRank (WPR), extended from the classical PR for weighted, directed networks with possible non-uniform node-specific information that is dependent or independent of network structure. A tuning parameter leveraging node degree and strength is introduced. An efficient algorithm implemented in R has been developed for computing WPR in large-scale networks. We have tested the proposed WPR on widely used simulated network models and found that it outperformed other competing measures in the literature. By applying the proposed WPR to the real network data generated from World Input-Output Tables, we obtained results that are consistent with global economic trends, which renders it a preferred measure in the analysis.
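For comparison with WPR, the sketch below computes classical PageRank on a weighted, directed network by power iteration, with transition probabilities proportional to edge weights and an optional teleport vector. It does not implement WPR: the degree/strength tuning parameter is absent, and using the teleport vector to carry node-specific information is our assumption. Mass sitting on dangling nodes (no out-edges) is redistributed through the teleport vector. Function and parameter names are hypothetical.

```python
def weighted_pagerank(n, edges, damping=0.85, teleport=None, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a weighted, directed graph.

    edges: list of (u, v, w) with w > 0; nodes are labeled 0..n-1.
    teleport: optional probability vector (defaults to uniform).
    """
    if teleport is None:
        teleport = [1.0 / n] * n
    out_strength = [0.0] * n
    for u, _, w in edges:
        out_strength[u] += w
    rank = list(teleport)
    for _ in range(max_iter):
        new = [0.0] * n
        # Follow an out-edge with probability proportional to its weight.
        for u, v, w in edges:
            new[v] += damping * rank[u] * w / out_strength[u]
        # Dangling mass plus the teleport share is spread by the teleport vector.
        dangling = sum(rank[u] for u in range(n) if out_strength[u] == 0)
        for v in range(n):
            new[v] += (1 - damping + damping * dangling) * teleport[v]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank
```

Each iteration preserves total mass, so the returned scores sum to one and can be read as a stationary distribution.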
Submitted 15 May, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
The Effects of the NBA COVID Bubble on the NBA Playoffs: A Case Study for Home-Court Advantage
Authors:
Michael Price,
Jun Yan
Abstract:
The 2020 NBA playoffs were played inside a bubble in Disney World because of the COVID-19 pandemic. This meant that there were no fans in attendance, games were played on neutral courts, and teams did not travel, which in theory removes home-court advantage from the games. This setting attracted much discussion as analysts and fans debated its possible effects on the outcome of games. Home-court advantage has historically played an influential role in NBA playoff series outcomes. The 2020 playoffs provided a unique opportunity to study the effects of the bubble and home-court advantage by comparing the 2020 season with past seasons. While many factors contribute to the outcome of games, points scored decide who wins, so scoring is the primary focus of this study. The specific measures of interest are team scoring totals and team shooting percentages on two-pointers, three-pointers, and free throws. Comparing these measures for home teams and away teams in 2020 vs. 2017-2019 shows that the 2020 playoffs favored away teams more than usual, particularly in two-point shooting and total scoring.
Submitted 3 March, 2021;
originally announced March 2021.
-
Regional and Sectoral Structures and Their Dynamics of Chinese Economy: A Network Perspective from Multi-Regional Input-Output Tables
Authors:
Tao Wang,
Shiying Xiao,
Jun Yan,
Panpan Zhang
Abstract:
A multi-regional input-output table (MRIOT) containing the transactions among the region-sectors in an economy defines a weighted and directed network. Using network analysis tools, we analyze the regional and sectoral structures of the Chinese economy and their temporal dynamics from 2007 to 2012 via the MRIOTs of China. Global analyses are done with network topology measures. Growth-driving province-sector clusters are identified with community detection methods. Influential province-sectors are ranked by weighted PageRank scores. The results reveal several interesting and telling insights. The level of inter-province-sector activities increased with the rapid growth of the national economy, but not as fast as that of intra-province economic activities. Regional community structures were closely associated with geographical factors. Community heterogeneity across the regions was high, and regional fragmentation increased during the study period. Quantified metrics assessing the relative importance of the province-sectors in the national economy echo national and regional economic development policies to a certain extent.
Submitted 24 February, 2021;
originally announced February 2021.
-
Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting
Authors:
Longyuan Li,
Jihai Zhang,
Junchi Yan,
Yaohui Jin,
Yunhao Zhang,
Yanjie Duan,
Guangjian Tian
Abstract:
Time series are ubiquitous across applications such as transportation, finance, and healthcare. Time series are often influenced by external factors, especially in the form of asynchronous events, which makes forecasting difficult. However, existing models are mainly designed for either synchronous time series or asynchronous event sequences, and can hardly provide a unified way to capture the relation between them. We propose the Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model. To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine advances in deep point process models and variational recurrent neural networks. In addition, an aligned time coding and an auxiliary transition scheme are carefully devised for batched training on unaligned sequences. Our model can be trained effectively using stochastic variational inference and generates probabilistic predictions with Monte Carlo simulation. Furthermore, our model produces accurate, sharp, and more realistic probabilistic forecasts. We also show that modeling asynchronous event sequences is crucial for multi-horizon time-series forecasting.
Submitted 31 January, 2021;
originally announced February 2021.
-
Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting
Authors:
Longyuan Li,
Junchi Yan,
Xiaokang Yang,
Yaohui Jin
Abstract:
Probabilistic time series forecasting involves estimating the distribution of future values based on history, which is essential for risk management in downstream decision-making. We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by neural networks and the dependency is modeled by recurrent neural nets. We take the automatic relevance determination (ARD) view and devise a network to exploit the exogenous variables in addition to the time series. In particular, our ARD network can incorporate the uncertainty of the exogenous variables and eventually helps identify useful exogenous variables and suppress those irrelevant for forecasting. The distribution of multi-step-ahead forecasts is approximated by Monte Carlo simulation. We show in experiments that our model produces accurate and sharp probabilistic forecasts. The estimated uncertainty of our forecasts also realistically increases over time, in a spontaneous manner.
Submitted 31 January, 2021;
originally announced February 2021.
-
Assortativity measures for weighted and directed networks
Authors:
Yelie Yuan,
Jun Yan,
Panpan Zhang
Abstract:
Assortativity measures the tendency of vertices in a network to connect to other vertices that are similar (or dissimilar) with respect to some vertex-specific feature. Classical assortativity coefficients are defined for unweighted and undirected networks with respect to vertex degree. We propose a class of assortativity coefficients that capture the assortative characteristics and structure of weighted and directed networks more precisely. The vertex-to-vertex strength correlation is used as an example, but the proposed measure can be applied to any pair of vertex-specific features. The effectiveness of the proposed measure is assessed through extensive simulations based on prevalent random network models, in comparison with existing assortativity measures. In an application to World Input-Output Networks, the new measures reveal interesting insights that would not be obtained with existing ones. An implementation is publicly available in the R package "wdnet".
Submitted 13 January, 2021;
originally announced January 2021.
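One way to read a strength-based assortativity coefficient for a weighted, directed network is as an edge-weighted Pearson correlation between a feature of each edge's source and a feature of its target, for example the source's out-strength and the target's in-strength. The sketch below assumes exactly that construction for illustration; the precise coefficients defined in the paper and implemented in `wdnet` may differ in detail, and the helper names `weighted_assortativity` and `strengths` are hypothetical.

```python
import math


def weighted_assortativity(edges, src_feature, tgt_feature):
    """Edge-weighted Pearson correlation between src_feature[source] and
    tgt_feature[target] over all edges (source, target, weight).

    Heavier edges count more. Not meaningful if either feature is constant
    across edges (zero variance).
    """
    total = sum(w for _, _, w in edges)
    mx = sum(w * src_feature[s] for s, _, w in edges) / total
    my = sum(w * tgt_feature[t] for _, t, w in edges) / total
    cov = sum(w * (src_feature[s] - mx) * (tgt_feature[t] - my) for s, t, w in edges)
    vx = sum(w * (src_feature[s] - mx) ** 2 for s, _, w in edges)
    vy = sum(w * (tgt_feature[t] - my) ** 2 for _, t, w in edges)
    return cov / math.sqrt(vx * vy)


def strengths(edges, n):
    """Out-strength and in-strength (sums of edge weights) of nodes 0..n-1."""
    out_s, in_s = [0.0] * n, [0.0] * n
    for s, t, w in edges:
        out_s[s] += w
        in_s[t] += w
    return out_s, in_s


# Example: correlate the source's out-strength with the target's in-strength.
edges = [(0, 1, 2.0), (1, 2, 0.5), (2, 0, 1.0), (0, 2, 3.0)]
out_s, in_s = strengths(edges, 3)
print(weighted_assortativity(edges, out_s, in_s))
```

Because the features are passed in as plain lookups, the same function covers the other pairings (out-out, in-in, in-out) or any non-strength vertex feature, which mirrors the abstract's point that the measure applies to any pair of vertex-specific features.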
-
Regularized Fingerprinting in Detection and Attribution of Climate Change with Weight Matrix Optimizing the Efficiency in Scaling Factor Estimation
Authors:
Yan Li,
Kun Chen,
Jun Yan,
Xuebin Zhang
Abstract:
The optimal fingerprinting method for detection and attribution of climate change is based on a multiple regression where each covariate has measurement error whose covariance matrix is the same as that of the regression error up to a known scale. Inferences about the regression coefficients are critical not only for making statements about detection and attribution but also for quantifying the uncertainty in important outcomes derived from detection and attribution analyses. When there are no errors-in-variables (EIV), the optimal weight matrix in estimating the regression coefficients is the precision matrix of the regression error, which, in practice, is never known and has to be estimated from climate model simulations. We construct a weight matrix by inverting a nonlinear shrinkage estimate of the error covariance matrix that minimizes loss functions directly targeting the uncertainty of the resulting regression coefficient estimator. The resulting estimator of the regression coefficients is asymptotically optimal as the sample size of the climate model simulations and the matrix dimension go to infinity together with a limiting ratio. When EIVs are present, the estimator of the regression coefficients based on the proposed weight matrix is asymptotically more efficient than that based on the inverse of the existing linear shrinkage estimator of the error covariance matrix. The performance of the method is confirmed in finite-sample simulation studies mimicking realistic situations in terms of the length of the confidence intervals and empirical coverage rates for the regression coefficients. An application to detection and attribution analyses of the mean temperature at different spatial scales illustrates the utility of the method.
Submitted 7 December, 2020;
originally announced December 2020.
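The role of the weight matrix in the no-EIV case can be made explicit with the standard weighted least squares formulas below. The notation (y, X, Σ, W, S, p, λ, μ) is assumed for illustration rather than taken from the paper; W is treated as fixed (in practice it comes from climate model runs independent of y), and the last line shows the familiar linear shrinkage form of the existing baseline, not the paper's proposed nonlinear shrinkage estimator.

```latex
% Regression with no errors-in-variables: y = X\beta + \varepsilon, \operatorname{Var}(\varepsilon) = \Sigma
\hat\beta_W = (X^\top W X)^{-1} X^\top W y, \qquad
\operatorname{Var}(\hat\beta_W) = (X^\top W X)^{-1} X^\top W \Sigma W X \,(X^\top W X)^{-1}

% The optimal choice W = \Sigma^{-1} (the precision matrix) collapses the sandwich:
\operatorname{Var}(\hat\beta_{\Sigma^{-1}}) = (X^\top \Sigma^{-1} X)^{-1}

% Linear shrinkage of a p x p sample covariance S from control simulations:
\hat\Sigma_{\mathrm{LS}} = (1-\lambda)\, S + \lambda \mu I_p, \qquad
\mu = \operatorname{tr}(S)/p, \qquad W = \hat\Sigma_{\mathrm{LS}}^{-1}
```

The sandwich expression is what a loss function "directly targeting the uncertainty of the resulting regression coefficient estimator" would be built around: a poor estimate of Σ inflates it relative to the optimal bound.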
-
Clustering US States by Time Series of COVID-19 New Case Counts with Non-negative Matrix Factorization
Authors:
Jianmin Chen,
Jun Yan,
Panpan Zhang
Abstract:
The spreading patterns of COVID-19 differ considerably across the US states under different quarantine measures and reopening policies. We proposed to cluster the US states into distinct communities based on daily new confirmed case counts via a nonnegative matrix factorization (NMF) followed by a k-means clustering procedure on the coefficients of the NMF basis. A cross-validation method was employed to select the rank of the NMF. Applying the method to the entire study period from March 22 to July 25, we clustered the 49 continental states (including the District of Columbia) into 7 groups, two of which contained a single state. To investigate the dynamics of the clustering results over time, the same method was successively applied to time periods in increments of one week, starting from the period of March 22 to March 28. The results suggested a change point in the clustering in the week starting on May 30, which might be explained by the combined impact of quarantine measures and reopening policies.
Submitted 15 January, 2021; v1 submitted 29 November, 2020;
originally announced November 2020.
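The two-stage procedure can be sketched in plain Python: factor the states-by-days matrix of new case counts V as W H with nonnegative factors (Lee-Seung multiplicative updates for the Frobenius loss), then run k-means on each state's row of coefficients in W. Assumptions for illustration: the rank is fixed by hand rather than chosen by cross-validation, each state's coefficients are rescaled to sum to 1 so clusters reflect curve shape rather than epidemic size, and initializations are seeded. All function names are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=1000, seed=0, eps=1e-12):
    """Factor V (states x days) as W (states x rank) @ H (rank x days) with
    non-negative entries, via Lee-Seung multiplicative updates for ||V - WH||_F^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * (W'V) / (W'WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * (VH') / (WHH')
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return W, H


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_series(V, rank, k):
    W, _ = nmf(V, rank)
    # Assumption: rescale each state's coefficients to sum to 1 (shape, not size).
    coeffs = [[w / sum(row) for w in row] for row in W]
    return kmeans(coeffs, k)


# Toy data: three "early wave" states and three "late wave" states of different sizes.
early = [5, 8, 6, 3, 1, 0.5, 0.2, 0.1]
late = [0.1, 0.2, 0.5, 1, 3, 6, 8, 5]
V = [[s * x for x in early] for s in (1.0, 1.5, 2.0)] + \
    [[s * x for x in late] for s in (1.0, 1.5, 2.0)]
print(cluster_series(V, rank=2, k=2))
```

The rescaling step is a design choice: without it, k-means on raw coefficients can also group states by the size of their outbreak rather than by its timing.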
-
Bayesian Nonparametric Estimation for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations
Authors:
Fan Yin,
Jieying Jiao,
Guanyu Hu,
Jun Yan
Abstract:
Basketball shot location data provide valuable summary information regarding players to coaches, sports analysts, fans, statisticians, as well as players themselves. Represented by spatial points, such data are naturally analyzed with spatial point process models. We present a novel nonparametric Bayesian method for learning the underlying intensity surface built upon a combination of a Dirichlet process and a Markov random field. Our method has the advantage of effectively encouraging local spatial homogeneity when estimating a globally heterogeneous intensity surface. Posterior inferences are performed with an efficient Markov chain Monte Carlo (MCMC) algorithm. Simulation studies show that the inferences are accurate and that the method is superior to competing methods. Application to the shot location data of 20 representative NBA players in the 2017-2018 regular season offers interesting insights into the shooting patterns of these players. A comparison against the competing method shows that the proposed method can effectively incorporate spatial contiguity into the estimation of intensity surfaces.
Submitted 22 November, 2020;
originally announced November 2020.
-
Improving Auto-Augment via Augmentation-Wise Weight Sharing
Authors:
Keyu Tian,
Chen Lin,
Ming Sun,
Luping Zhou,
Junjie Yan,
Wanli Ouyang
Abstract:
Recent progress in automatically searching augmentation policies has substantially boosted performance on various tasks. A key component of automatic augmentation search is the evaluation process for a particular augmentation policy, which returns the reward and usually runs thousands of times. A plain evaluation process, which includes full model training and validation, would be time-consuming. To achieve efficiency, many methods sacrifice evaluation reliability for speed. In this paper, we dive into the dynamics of augmented training of the model. This inspires us to design a powerful and efficient proxy task based on Augmentation-Wise Weight Sharing (AWS) to form a fast yet accurate evaluation process in an elegant way. Comprehensive analysis verifies the superiority of this approach in terms of effectiveness and efficiency. The augmentation policies found by our method achieve superior accuracies compared with existing auto-augmentation search methods. On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best result for a single model without extra training data. On ImageNet, we get a top-1 error rate of 20.36% for ResNet-50, a 3.34% absolute error rate reduction over the baseline augmentation.
Submitted 22 October, 2020; v1 submitted 30 September, 2020;
originally announced September 2020.
-
Survival Modeling of Suicide Risk with Rare and Uncertain Diagnoses
Authors:
Wenjie Wang,
Chongliang Luo,
Robert H. Aseltine,
Fei Wang,
Jun Yan,
Kun Chen
Abstract:
Motivated by the pressing need for suicide prevention through improving behavioral healthcare, we use medical claims data to study the risk of subsequent suicide attempts for patients who were hospitalized due to suicide attempts and later discharged. Understanding the risk behaviors of such patients at elevated suicide risk is an important step toward the goal of "Zero Suicide." An immediate and unconventional challenge is that the identification of suicide attempts from medical claims contains substantial uncertainty: almost 20% of "suspected" suicide attempts are identified from diagnosis codes indicating external causes of injury and poisoning with undetermined intent. It is thus of great interest to learn which of these undetermined events are more likely actual suicide attempts and how to properly utilize them in survival analysis with severe censoring. To tackle these interrelated problems, we develop an integrative Cox cure model with regularization to perform survival regression with uncertain events and a latent cure fraction. We apply the proposed approach to study the risk of subsequent suicide attempts after suicide-related hospitalization for the adolescent and young adult population, using medical claims data from Connecticut. The identified risk factors are highly interpretable; more intriguingly, our method distinguishes the risk factors that are most helpful in assessing either susceptibility or timing of subsequent attempts. The predicted statuses of the uncertain attempts are further investigated, leading to several new insights into suicide event identification.
Submitted 7 May, 2023; v1 submitted 5 September, 2020;
originally announced September 2020.
-
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators
Authors:
Xiangxiang Chu,
Xiaoxing Wang,
Bo Zhang,
Shun Lu,
Xiaolin Wei,
Junchi Yan
Abstract:
Despite the fast development of differentiable architecture search (DARTS), it suffers from long-standing performance instability, which severely limits its application. Existing robustifying methods draw clues from the resulting deteriorated behavior instead of identifying its cause. Various indicators such as Hessian eigenvalues are proposed as a signal to stop searching before the performance collapses. However, these indicator-based methods tend to easily reject good architectures if the thresholds are inappropriately set, not to mention that the search is intrinsically noisy. In this paper, we undertake a more subtle and direct approach to resolve the collapse. We first demonstrate that skip connections have a clear advantage over other candidate operations, in that they can easily recover from a disadvantageous state and become dominant. We conjecture that this privilege causes the degraded performance. Therefore, we propose to factor out this benefit with an auxiliary skip connection, ensuring a fairer competition for all operations. We call this approach DARTS-. Extensive experiments on various datasets verify that it can substantially improve robustness. Our code is available at https://github.com/Meituan-AutoML/DARTS- .
Submitted 15 January, 2021; v1 submitted 2 September, 2020;
originally announced September 2020.
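The auxiliary skip connection can be illustrated on a single edge of a DARTS cell: candidate operations are mixed by softmax weights over architecture parameters, and an extra term beta * x is added outside that softmax, so the residual-like benefit is supplied by a path that does not compete for softmax weight. This is a toy, framework-free sketch with feature maps as Python lists; the linear decay of beta to zero, the toy operations, and the function names are assumptions for illustration, not the authors' code.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas, ops, beta):
    """One DARTS edge: softmax(alphas)-weighted sum of candidate ops applied
    to x, plus an auxiliary skip term beta * x outside the softmax."""
    weights = softmax(alphas)
    out = [0.0] * len(x)
    for w, op in zip(weights, ops):
        out = [o + w * v for o, v in zip(out, op(x))]
    return [o + beta * xi for o, xi in zip(out, x)]


def beta_schedule(epoch, total_epochs, beta0=1.0):
    """Assumed linear decay of the auxiliary-skip coefficient to 0."""
    return beta0 * (1.0 - epoch / total_epochs)


# Toy candidate operations on a list-valued "feature map" (hypothetical).
ops = [
    lambda v: list(v),                          # skip_connect (identity)
    lambda v: [0.0] * len(v),                   # none (zero op)
    lambda v: [max(0.0, 2.0 * t) for t in v],   # stand-in for a conv block
]
x = [1.0, -2.0]
for epoch in (0, 25, 50):
    print(epoch, mixed_op(x, [0.5, -1.0, 0.2], ops, beta_schedule(epoch, 50)))
```

Setting beta to 0 recovers the ordinary DARTS mixed operation, so at the end of the decay the searched architecture is evaluated on the same footing as in standard DARTS.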
-
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning
Authors:
Wenhao Li,
Bo Jin,
Xiangfeng Wang,
Junchi Yan,
Hongyuan Zha
Abstract:
Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications due to non-interactivity between agents, the curse of dimensionality, and computational complexity. This has motivated several decentralized MARL algorithms. However, existing decentralized methods only handle the fully cooperative setting, where massive information needs to be transmitted in training. The block coordinate gradient descent scheme they use for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can be combined with most actor-critic methods and handles large-scale general cooperative multi-agent settings. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability in large-scale environments and reduce information transmission through a parameter-sharing mechanism and a novel modeling-other-agents method based on theory of mind and online supervised learning. Extensive experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.
Submitted 7 July, 2023; v1 submitted 17 April, 2020;
originally announced April 2020.
-
Moving-Resting Process with Measurement Error in Animal Movement Modeling
Authors:
Chaoran Hu,
Mark Elbroch,
Thomas Meyer,
Vladimir Pozdnyakov,
Jun Yan
Abstract:
Statistical modeling of animal movement is of critical importance. The continuous trajectory of an animal's movements is only observed at discrete, often irregularly spaced time points. Most existing models cannot handle unequal sampling intervals naturally and/or do not allow inactivity periods such as resting or sleeping. The recently proposed moving-resting (MR) model is a Brownian motion governed by a telegraph process, which allows periods of inactivity in one state of the telegraph process. The MR model shows promise in modeling the movements of predators with long inactive periods, such as many felids, but its lack of accommodation for measurement errors seriously limits its application in practice. Here we incorporate measurement errors into the MR model and derive basic properties of the model. Inferences are based on a composite likelihood using the Markov property of the chain composed of every other observed increment. The performance of the method is validated in finite-sample simulation studies. Application to the movement data of a mountain lion in Wyoming illustrates the utility of the method.
Submitted 23 August, 2020; v1 submitted 31 March, 2020;
originally announced April 2020.
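To make the model concrete, the sketch below simulates one coordinate of a moving-resting path: a telegraph process alternates between moving and resting with exponential holding times, the position follows Brownian motion only while moving, and each irregularly timed observation adds independent measurement error. Assumptions not stated in the abstract: Gaussian measurement error, a single coordinate (a 2-D track could use independent coordinates), and an explicitly chosen starting state. Only simulation is shown, not the composite-likelihood inference, and `simulate_mr` is a hypothetical helper.

```python
import random


def simulate_mr(obs_times, lam_move, lam_rest, sigma, err_sd, seed=0, start_moving=True):
    """Simulate one coordinate of a moving-resting path observed with error.

    obs_times: increasing, non-negative observation times (may be irregular).
    lam_move: rate of leaving the moving state (mean moving spell 1/lam_move).
    lam_rest: rate of leaving the resting state (mean resting spell 1/lam_rest).
    While moving, position is Brownian motion with variance sigma**2 per unit
    time; while resting it stays put. Measurement error is assumed Gaussian.
    """
    rng = random.Random(seed)
    moving, t, x = start_moving, 0.0, 0.0
    next_switch = rng.expovariate(lam_move if moving else lam_rest)
    observed = []
    for t_obs in obs_times:
        moving_time = 0.0  # total time spent moving in (t, t_obs]
        while next_switch < t_obs:
            if moving:
                moving_time += next_switch - t
            t, moving = next_switch, not moving
            next_switch = t + rng.expovariate(lam_move if moving else lam_rest)
        if moving:
            moving_time += t_obs - t
        t = t_obs
        # Brownian increments over disjoint moving spells add up in variance.
        x += rng.gauss(0.0, sigma * moving_time ** 0.5)
        observed.append(x + rng.gauss(0.0, err_sd))
    return observed


times = sorted(random.Random(1).uniform(0, 100) for _ in range(50))
track = simulate_mr(times, lam_move=0.5, lam_rest=0.2, sigma=1.0, err_sd=0.3)
print(track[:5])
```

Because the measurement error is added to each observation separately, consecutive observed increments share an error term and are correlated, which is one reason inference for this model needs more than treating increments as independent.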
-
Heterogeneity Pursuit for Spatial Point Pattern with Application to Tree Locations: A Bayesian Semiparametric Recourse
Authors:
Jieying Jiao,
Guanyu Hu,
Jun Yan
Abstract:
Spatial point pattern data are routinely encountered. A flexible regression model for the underlying intensity is essential to characterizing the spatial point pattern and understanding the impacts of potential risk factors on such a pattern. We propose a Bayesian semiparametric regression model where the observed spatial points follow a spatial Poisson process with an intensity function that adjusts a nonparametric baseline intensity with multiplicative covariate effects. The baseline intensity is piecewise constant and is modeled with a powered Chinese restaurant process prior, which prevents an unnecessarily large number of pieces. The parametric regression part allows for variable selection through a spike-and-slab prior on the regression coefficients. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for the proposed methods. The performance of the methods is validated in an extensive simulation study. In an application to the locations of Beilschmiedia pendula trees in the Barro Colorado Island forest dynamics research plot in central Panama, the spatial heterogeneity is attributed to a subset of soil measurements, in addition to geographic measurements, with a spatially varying baseline intensity.
Submitted 23 March, 2020; v1 submitted 22 March, 2020;
originally announced March 2020.
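The likelihood that sits underneath this model can be sketched once the study region is discretized into grid cells: with covariates held constant within each cell, the intensity in cell c is the baseline level of the piece that cell belongs to times exp(x_c · beta), and the Poisson process log-likelihood, the sum of log intensities at the points minus the integrated intensity, reduces to a sum over cells. The grid discretization is an assumption for illustration; the powered Chinese restaurant process prior, the spike-and-slab prior, and the MCMC sampler are not shown, and `poisson_grid_loglik` is a hypothetical name.

```python
import math


def poisson_grid_loglik(counts, areas, covariates, labels, baseline, beta):
    """Log-likelihood (up to an additive constant) of a spatial Poisson process
    whose intensity in cell c is baseline[labels[c]] * exp(covariates[c] . beta).

    counts[c]: number of points in cell c; areas[c]: cell area.
    Assumes covariates are constant within each cell, so the integral of the
    intensity over a cell is intensity * area.
    """
    ll = 0.0
    for n, a, x, z in zip(counts, areas, covariates, labels):
        lam = baseline[z] * math.exp(sum(xi * b for xi, b in zip(x, beta)))
        ll += n * math.log(lam) - a * lam
    return ll


# Three cells, one covariate, two baseline pieces (cells 0-1 in piece 0).
counts, areas = [4, 1, 6], [1.0, 2.0, 1.5]
cov, labels, beta = [[0.2], [-0.5], [1.0]], [0, 0, 1], [0.3]
print(poisson_grid_loglik(counts, areas, cov, labels, [1.2, 2.0], beta))
```

For fixed beta, each baseline level has a closed-form maximizer: the piece's total count divided by the sum of area times exp(x_c · beta) over its cells. A piecewise-constant baseline therefore adds little computational cost per piece, and the prior on the partition is what controls how many pieces are used.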
-
HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem
Authors:
Yun Hua,
Xiangfeng Wang,
Bo Jin,
Wenhao Li,
Junchi Yan,
Xiaofeng He,
Hongyuan Zha
Abstract:
In spite of the success of existing meta reinforcement learning methods, they still have difficulty learning a meta policy effectively for RL problems with sparse reward. In this respect, we develop a novel meta reinforcement learning framework called Hyper-Meta RL (HMRL) for sparse-reward RL problems. It consists of three modules: the cross-environment meta state embedding module, which constructs a common meta state space to adapt to different environments; the meta-state-based environment-specific meta reward shaping, which effectively extends the original sparse reward trajectory through cross-environmental knowledge complementarity; and the meta policy, which consequently achieves better generalization and efficiency with the shaped meta reward. Experiments with sparse-reward environments show the superiority of HMRL in both transferability and policy learning efficiency.
Submitted 5 June, 2021; v1 submitted 11 February, 2020;
originally announced February 2020.
-
Learning Structured Communication for Multi-agent Reinforcement Learning
Authors:
Junjie Sheng,
Xiangfeng Wang,
Bo Jin,
Junchi Yan,
Wenhao Li,
Tsung-Hui Chang,
Jun Wang,
Hongyuan Zha
Abstract:
This work explores large-scale multi-agent communication mechanisms in a multi-agent reinforcement learning (MARL) setting. We summarize the general categories of communication-structure topologies in the MARL literature, which are often manually specified. We then propose a novel framework termed Learning Structured Communication (LSC) that uses a more flexible and efficient communication topology. Our framework allows adaptive agent grouping to form different hierarchical formations over episodes, generated by an auxiliary task combined with a hierarchical routing protocol. Given each formed topology, a hierarchical graph neural network is learned to enable effective message generation and propagation for both inter- and intra-group communication. In contrast to existing communication mechanisms, our method has an explicit yet learnable design for hierarchical communication. Experiments on challenging tasks show that the proposed LSC enjoys high communication efficiency, scalability, and global cooperation capability.
Submitted 11 February, 2020;
originally announced February 2020.
-
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
Authors:
Junjie Yan,
Ruosi Wan,
Xiangyu Zhang,
Wei Zhang,
Yichen Wei,
Jian Sun
Abstract:
Batch Normalization (BN) is one of the most widely used techniques in deep learning, but its performance can degrade badly with insufficient batch size. This weakness limits the usage of BN in many computer vision tasks like detection or segmentation, where batch size is usually small due to memory constraints. Therefore, many modified normalization techniques have been proposed, which either fail to restore the performance of BN completely or have to introduce additional nonlinear operations in the inference procedure, greatly increasing computational cost. In this paper, we reveal that there are two extra batch statistics involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics associated with gradients can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases without introducing any additional nonlinear operations in the inference procedure. We establish the benefits of MABN through both theoretical analysis and experiments. Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks, including ImageNet and COCO. The code has been released at https://github.com/megvii-model/MABN.
Submitted 8 April, 2020; v1 submitted 19 January, 2020;
originally announced January 2020.