Showing 1–50 of 91 results for author: Yan, J

Searching in archive stat.
  1. arXiv:2503.00254  [pdf, other]

    stat.ME stat.AP

    Heteroscedastic Growth Curve Modeling with Shape-Restricted Splines

    Authors: Jieying Jiao, Wenling Song, Yishu Xue, Jun Yan

    Abstract: Growth curve analysis (GCA) has a wide range of applications in various fields where growth trajectories need to be modeled. Heteroscedasticity is often present in the error term, which cannot be handled with sufficient flexibility by standard linear fixed or mixed-effects models. One situation that has been addressed is where the error variance is characterized by a linear predictor with certain…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: 16 pages, 6 figures

    Journal ref: The New England Journal of Statistics in Data Science, 2025

  2. Data Jamboree: A Party of Open-Source Software Solving Real-World Data Science Problems

    Authors: Lucy D'Agostino McGowan, Shannon Tass, Sam Tyner, HaiYing Wang, Jun Yan

    Abstract: The evolving focus in statistics and data science education highlights the growing importance of computing. This paper presents the Data Jamboree, a live event that combines computational methods with traditional statistical techniques to address real-world data science problems. Participants, ranging from novices to experienced users, followed workshop leaders in using open-source tools like Juli…

    Submitted 27 February, 2025; originally announced February 2025.

    Journal ref: The New England Journal of Statistics in Data Science 2025

  3. arXiv:2502.19002  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training

    Authors: Jinbo Wang, Mingze Wang, Zhanpeng Zhou, Junchi Yan, Weinan E, Lei Wu

    Abstract: Transformers consist of diverse building blocks, such as embedding layers, normalization layers, self-attention mechanisms, and point-wise feedforward networks. Thus, understanding the differences and interactions among these blocks is important. In this paper, we uncover a clear Sharpness Disparity across these blocks, which emerges early in training and intriguingly persists throughout the train…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 23 pages

  4. arXiv:2502.08649  [pdf, other]

    cs.DB cs.CY stat.ME

    Principles for Open Data Curation: A Case Study with the New York City 311 Service Request Data

    Authors: David Tussey, Jun Yan

    Abstract: In the early 21st century, the open data movement began to transform societies and governments by promoting transparency, innovation, and public engagement. The City of New York (NYC) has been at the forefront of this movement since the enactment of the Open Data Law in 2012, creating the NYC Open Data portal. The portal currently hosts 2,700 datasets, serving as a crucial resource for research ac…

    Submitted 7 March, 2025; v1 submitted 14 January, 2025; originally announced February 2025.

  5. arXiv:2502.04681  [pdf, other]

    stat.ME

    CALF-SBM: A Covariate-Assisted Latent Factor Stochastic Block Model

    Authors: Sydney Louit, Evan Clark, Alexander Gelbard, Niketna Vivek, Jun Yan, Panpan Zhang

    Abstract: We propose a novel network generative model extended from the standard stochastic block model by concurrently utilizing observed node-level information and accounting for network-enabled nodal heterogeneity. The proposed model is the so-called covariate-assisted latent factor stochastic block model (CALF-SBM). The inference for the proposed model is done in a fully Bayesian framework. The primary a…

    Submitted 7 February, 2025; originally announced February 2025.

  6. arXiv:2501.03747  [pdf, other]

    cs.LG cs.CL stat.AP

    Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series

    Authors: Yuxiao Hu, Qian Li, Dongxiao Zhang, Jinyue Yan, Yuntian Chen

    Abstract: Recently, leveraging pre-trained Large Language Models (LLMs) for time series (TS) tasks has gained increasing attention, which involves activating and enhancing LLMs' capabilities. Many methods aim to activate LLMs' capabilities based on token-level alignment but overlook LLMs' inherent strength on natural language processing -- their deep understanding of linguistic logic and structure rather th…

    Submitted 5 April, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: no comment

  7. arXiv:2411.19223  [pdf, other]

    cs.LG cs.AI cs.CY stat.ME

    On the Unknowable Limits to Prediction

    Authors: Jiani Yan, Charles Rahal

    Abstract: We propose a rigorous decomposition of predictive error, highlighting that not all 'irreducible' error is genuinely immutable. Many domains stand to benefit from iterative enhancements in measurement, construct validity, and modeling. Our approach demonstrates how apparently 'unpredictable' outcomes can become more tractable with improved data (across both target and features) and refined algorith…

    Submitted 10 February, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

  8. arXiv:2410.10373  [pdf, other]

    cs.LG stat.ML

    Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training

    Authors: Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan

    Abstract: Sharpness-Aware Minimization (SAM) has substantially improved the generalization of neural networks under various settings. Despite the success, its effectiveness remains poorly understood. In this work, we discover an intriguing phenomenon in the training dynamics of SAM, shedding light on understanding its implicit bias towards flatter minima over Stochastic Gradient Descent (SGD). Specifically,…

    Submitted 20 February, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: 32 pages, 16 figures, ICLR 2025 Spotlight

  9. arXiv:2408.05584  [pdf]

    cs.LG stat.ME

    Dynamical causality under invisible confounders

    Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

    Abstract: Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on the statistical methods or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result,…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures

  10. arXiv:2403.09042  [pdf, other]

    stat.ME

    Recurrent Events Modeling Based on a Reflected Brownian Motion with Application to Hypoglycemia

    Authors: Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan

    Abstract: Patients with type 2 diabetes need to closely monitor blood sugar levels as their routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often stands as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient…

    Submitted 13 March, 2024; originally announced March 2024.

  11. arXiv:2402.17096  [pdf, other]

    stat.CO

    Simple rejection Monte Carlo algorithm and its application to multivariate statistical inference

    Authors: Fengyu Li, Huijiao Yu, Jun Yan, Xianyong Meng

    Abstract: The Monte Carlo algorithm is increasingly utilized, with its central step involving computer-based random sampling from stochastic models. While both Markov Chain Monte Carlo (MCMC) and Reject Monte Carlo serve as sampling methods, the latter finds fewer applications compared to the former. Hence, this paper initially provides a concise introduction to the theory of the Reject Monte Carlo algorith…

    Submitted 26 February, 2024; originally announced February 2024.

  12. arXiv:2402.15620  [pdf, other]

    stat.AP econ.GN physics.soc-ph

    Comparison of sectoral structures between China and Japan: A network perspective

    Authors: Tao Wang, Shiying Xiao, Jun Yan

    Abstract: Economic structure comparisons between China and Japan have long captivated development economists. To delve deeper into their sectoral differences from 1995 to 2018, we used the annual input-output tables (IOTs) of both nations to construct weighted and directed input-output networks (IONs). This facilitated deeper network analyses. Strength distributions underscored variations in inter-sector ec…

    Submitted 23 February, 2024; originally announced February 2024.

  13. arXiv:2402.02687  [pdf, other]

    cs.LG cs.AI stat.ML

    Poisson Process for Bayesian Optimization

    Authors: Xiaoxing Wang, Jiaxing Li, Chao Xue, Wei Liu, Weifeng Liu, Xiaokang Yang, Junchi Yan, Dacheng Tao

    Abstract: Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidat…

    Submitted 4 February, 2024; originally announced February 2024.

  14. arXiv:2401.08172  [pdf, other]

    stat.ME

    On GEE for Mean-Variance-Correlation Models: Variance Estimation and Model Selection

    Authors: Zhenyu Xu, Jason P. Fine, Wenling Song, Jun Yan

    Abstract: Generalized estimating equations (GEE) are of great importance in analyzing clustered data without full specification of multivariate distributions. A recent approach jointly models the mean, variance, and correlation coefficients of clustered data through three sets of regressions (Luo and Pan, 2022). We observe that these estimating equations, however, are a special case of those of Yan and Fine…

    Submitted 9 January, 2025; v1 submitted 16 January, 2024; originally announced January 2024.

  15. arXiv:2401.00521  [pdf, other]

    cs.LG cs.AI stat.AP

    Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data

    Authors: Yuxiao Hu, Qian Li, Xiaodan Shi, Jinyue Yan, Yuntian Chen

    Abstract: Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is of…

    Submitted 31 December, 2023; originally announced January 2024.

  16. arXiv:2312.03236  [pdf, other]

    cs.LG cs.AI stat.ML

    Multicoated and Folded Graph Neural Networks with Strong Lottery Tickets

    Authors: Jiale Yan, Hiroaki Ito, Ángel López García-Arias, Yasuyuki Okoshi, Hikari Otsuka, Kazushi Kawamura, Thiem Van Chu, Masato Motomura

    Abstract: The Strong Lottery Ticket Hypothesis (SLTH) demonstrates the existence of high-performing subnetworks within a randomly initialized model, discoverable through pruning a convolutional neural network (CNN) without any weight training. A recent study, called Untrained GNNs Tickets (UGT), expanded SLTH from CNNs to shallow graph neural networks (GNNs). However, discrepancies persist when comparing ba…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 9 pages, accepted in the Second Learning on Graphs Conference (LoG 2023)

    Journal ref: Proceedings of the Second Learning on Graphs Conference (LoG 2023), PMLR 231

  17. arXiv:2309.17283  [pdf, other]

    stat.ME stat.ML

    The Blessings of Multiple Treatments and Outcomes in Treatment Effect Estimation

    Authors: Yong Wu, Mingzhou Liu, Jing Yan, Yanwei Fu, Shouyan Wang, Yizhou Wang, Xinwei Sun

    Abstract: Assessing causal effects in the presence of unobserved confounding is a challenging problem. Existing studies leveraged proxy variables or multiple treatments to adjust for the confounding bias. In particular, the latter approach attributes the impact on a single outcome to multiple treatments, allowing estimating latent variables for confounding control. Nevertheless, these methods primarily focu…

    Submitted 14 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Preprint, under review

  18. arXiv:2308.13033  [pdf, other]

    stat.CO

    A Strength and Sparsity Preserving Algorithm for Generating Weighted, Directed Networks with Predetermined Assortativity

    Authors: Yelie Yuan, Jun Yan, Panpan Zhang

    Abstract: Degree-preserving rewiring is a widely used technique for generating unweighted networks with given assortativity, but for weighted networks, it is unclear how an analog would preserve the strengths and other critical network features such as sparsity level. This study introduces a novel approach for rewiring weighted networks to achieve desired directed assortativity. The method utilizes a mixed…

    Submitted 24 August, 2023; originally announced August 2023.

  19. arXiv:2305.11445  [pdf, ps, other]

    stat.ME math.ST stat.AP stat.CO

    A general model-checking procedure for semiparametric accelerated failure time models

    Authors: Dongrak Choi, Woojung Bae, Jun Yan, Sangwook Kang

    Abstract: We propose a set of goodness-of-fit tests for the semiparametric accelerated failure time (AFT) model, including an omnibus test, a link function test, and a functional form test. This set of tests is derived from a multi-parameter cumulative sum process shown to follow asymptotically a zero-mean Gaussian process. Its evaluation is based on the asymptotically equivalent perturbed version, which en…

    Submitted 19 May, 2023; originally announced May 2023.

  20. arXiv:2302.04040  [pdf, other]

    cs.LG stat.ML

    Sample-efficient Multi-objective Molecular Optimization with GFlowNets

    Authors: Yiheng Zhu, Jialu Wu, Chaowen Hu, Jiahuan Yan, Chang-Yu Hsieh, Tingjun Hou, Jian Wu

    Abstract: Many crucial scientific problems involve designing novel molecules with desired properties, which can be formulated as a black-box optimization problem over the discrete chemical space. In practice, multiple conflicting objectives and costly evaluations (e.g., wet-lab experiments) make the diversity of candidates paramount. Computational methods have achieved initial success but still struggle wit…

    Submitted 2 November, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  21. Generating General Preferential Attachment Networks with R Package wdnet

    Authors: Yelie Yuan, Tiandong Wang, Jun Yan, Panpan Zhang

    Abstract: Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitate the development of associated analytical methodologies. Existing software packages only provide limited functions for this purpose with restricted configurations and efficiency. We p…

    Submitted 15 October, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: 19 pages, 4 figures

    Journal ref: J. data sci. 21(2023), no. 3, 538-556

  22. arXiv:2212.06693  [pdf, other]

    math.ST stat.CO stat.ME

    Transfer Learning with Large-Scale Quantile Regression

    Authors: Jun Jin, Jun Yan, Robert H. Aseltine, Kun Chen

    Abstract: Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may go beyond the target and be supplemented from other sources that possibly share similarities with the target. A crucial question is how to properly distinguis…

    Submitted 25 February, 2024; v1 submitted 13 December, 2022; originally announced December 2022.

  23. arXiv:2210.08149  [pdf, other]

    stat.ME math.ST stat.ML

    Distance and Kernel-Based Measures for Global and Local Two-Sample Conditional Distribution Testing

    Authors: Jian Yan, Zhuoxi Li, Xianyang Zhang

    Abstract: Testing the equality of two conditional distributions is crucial in various modern applications, including transfer learning and causal inference. Despite its importance, this fundamental problem has received surprisingly little attention in the literature. This work aims to present a unified framework based on distance and kernel methods for both global and local two-sample conditional distributi…

    Submitted 24 October, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Extensively revised version

  24. arXiv:2208.07900  [pdf, other]

    stat.ME

    Statistical Inferences and Predictions for Areal Data and Spatial Data Fusion with Hausdorff--Gaussian Processes

    Authors: Lucas da Cunha Godoy, Marcos Oliveira Prates, Jun Yan

    Abstract: Accurate modeling of spatial dependence is pivotal in analyzing spatial data, influencing parameter estimation and predictions. The spatial structure of the data significantly impacts valid statistical inference. Existing models for areal data often rely on adjacency matrices, struggling to differentiate between polygons of varying sizes and shapes. Conversely, data fusion models rely on computati…

    Submitted 21 February, 2025; v1 submitted 16 August, 2022; originally announced August 2022.

    MSC Class: 62; ACM Class: G.3

  25. arXiv:2206.15367  [pdf, other]

    stat.AP stat.ME

    Targeted learning in observational studies with multi-valued treatments: An evaluation of antipsychotic drug treatment safety

    Authors: Jason Poulos, Marcela Horvitz-Lennon, Katya Zelevinsky, Tudor Cristea-Platon, Thomas Huijskens, Pooja Tyagi, Jiaju Yan, Jordi Diaz, Sharon-Lise Normand

    Abstract: We investigate estimation of causal effects of multiple competing (multi-valued) treatments in the absence of randomization. Our work is motivated by an intention-to-treat study of the relative cardiometabolic risk of assignment to one of six commonly prescribed antipsychotic drugs in a cohort of nearly 39,000 adults with serious mental illnesses. Doubly-robust estimators, such as targeted minimum…

    Submitted 28 November, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

  26. arXiv:2202.07125  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Transformers in Time Series: A Survey

    Authors: Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, Liang Sun

    Abstract: Transformers have achieved superior performances in many tasks in natural language processing and computer vision, which also triggered great interest in the time series community. Among multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applicati…

    Submitted 11 May, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted by 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023). 9 pages. The first work to comprehensively and systematically summarize time series Transformers. The GitHub repository is https://github.com/qingsongedu/time-series-transformers-review

    Journal ref: In the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023)

  27. arXiv:2202.00211  [pdf, other]

    cs.LG math.OC stat.ML

    GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks

    Authors: Yixuan He, Quan Gan, David Wipf, Gesine Reinert, Junchi Yan, Mihai Cucuringu

    Abstract: Recovering global rankings from pairwise comparisons has wide applications from time synchronization to sports team ranking. Pairwise comparisons corresponding to matches in a competition can be construed as edges in a directed graph (digraph), whose nodes represent e.g. competitors with an unknown rank. In this paper, we introduce neural networks into the ranking recovery problem by proposing the…

    Submitted 19 July, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: ICML 2022 spotlight; 32 pages (9 pages for main text)

  28. arXiv:2201.03451  [pdf, other]

    stat.CO

    An Efficient Algorithm for Generating Directed Networks with Predetermined Assortativity Measures

    Authors: Tiandong Wang, Jun Yan, Yelie Yuan, Panpan Zhang

    Abstract: Assortativity coefficients are important metrics to analyze both directed and undirected networks. In general, it is not guaranteed that the fitted model will always agree with the assortativity coefficients in the given network, and the structure of directed networks is more complicated than the undirected ones. Therefore, we provide a remedy by proposing a degree-preserving rewiring algorithm, c…

    Submitted 10 January, 2022; originally announced January 2022.

  29. arXiv:2201.00073  [pdf, ps, other]

    math.ST stat.ML

    Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders

    Authors: Jian Yan, Xianyang Zhang

    Abstract: Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) using isotropic kernel, including MMD with the Gaussian kernel and the Laplace kernel, and the energy distance as special cases. We…

    Submitted 30 October, 2024; v1 submitted 31 December, 2021; originally announced January 2022.

    Comments: Minor changes are made (and highlighted in red) for clarity

  30. arXiv:2111.00104  [pdf, other]

    stat.AP eess.SP stat.ME stat.OT

    Principal Component Pursuit for Pattern Identification in Environmental Mixtures

    Authors: Elizabeth A. Gibson, Junhui Zhang, Jingkai Yan, Lawrence Chillrud, Jaime Benavides, Yanelli Nunez, Julie B. Herbstman, Jeff Goldsmith, John Wright, Marianthi-Anna Kioumourtzoglou

    Abstract: Environmental health researchers often aim to identify sources/behaviors that give rise to potentially harmful exposures. We adapted principal component pursuit (PCP), a robust technique for dimensionality reduction in computer vision and signal processing, to identify patterns in environmental mixtures. PCP decomposes the exposure mixture into a low-rank matrix containing consistent exposure patter…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: 32 pages, 11 figures, 4 tables

  31. Estimating a distribution function for discrete data subject to random truncation with an application to structured finance

    Authors: Jackson P. Lautier, Vladimir Pozdnyakov, Jun Yan

    Abstract: Proper econometric analysis should be informed by data structure. Many forms of financial data are recorded in discrete-time and relate to products of a finite term. If the data comes from a financial trust, it will often be further subject to random left-truncation. While the literature for estimating a distribution function from left-truncated data is extensive, a thorough literature search reve…

    Submitted 22 November, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: 56 pages, 5 figures, 2 tables

  32. arXiv:2104.11708  [pdf]

    stat.CO

    Regression Modeling for Recurrent Events Using R Package reReg

    Authors: Sy Han Chiou, Gongjun Xu, Jun Yan, Chiung-Yu Huang

    Abstract: Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among others, where study subjects may experience a sequence of events of interest during follow-up. The R package reReg (Chiou and Huang 2021) offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly with the presence of…

    Submitted 20 August, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

  33. arXiv:2104.02764  [pdf, other]

    physics.soc-ph stat.CO

    PageRank centrality and algorithms for weighted, directed networks with applications to World Input-Output Tables

    Authors: Panpan Zhang, Tiandong Wang, Jun Yan

    Abstract: PageRank (PR) is a fundamental tool for assessing the relative importance of the nodes in a network. In this paper, we propose a measure, weighted PageRank (WPR), extended from the classical PR for weighted, directed networks with possible non-uniform node-specific information that is dependent or independent of network structure. A tuning parameter leveraging node degree and strength is introduce…

    Submitted 15 May, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

  34. The Effects of the NBA COVID Bubble on the NBA Playoffs: A Case Study for Home-Court Advantage

    Authors: Michael Price, Jun Yan

    Abstract: The 2020 NBA playoffs were played inside of a bubble in Disney World because of the COVID-19 pandemic. This meant that there were no fans in attendance, games played on neutral courts and no traveling for teams, which in theory removes home-court advantage from the games. This setting has attracted much discussion as analysts and fans debated the possible effects it may have on the outcome of game…

    Submitted 3 March, 2021; originally announced March 2021.

    Journal ref: American Journal of Undergraduate Research, 18(4), 2021

  35. arXiv:2102.12454  [pdf, other]

    physics.soc-ph econ.GN stat.AP

    Regional and Sectoral Structures and Their Dynamics of Chinese Economy: A Network Perspective from Multi-Regional Input-Output Tables

    Authors: Tao Wang, Shiying Xiao, Jun Yan, Panpan Zhang

    Abstract: A multi-regional input-output table (MRIOT) containing the transactions among the region-sectors in an economy defines a weighted and directed network. Using network analysis tools, we analyze the regional and sectoral structure of the Chinese economy and their temporal dynamics from 2007 to 2012 via the MRIOTs of China. Global analyses are done with network topology measures. Growth-driving provi…

    Submitted 24 February, 2021; originally announced February 2021.

  36. arXiv:2102.00431  [pdf, other]

    cs.LG cs.AI stat.ML

    Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

    Authors: Longyuan Li, Jihai Zhang, Junchi Yan, Yaohui Jin, Yunhao Zhang, Yanjie Duan, Guangjian Tian

    Abstract: Time-series is ubiquitous across applications, such as transportation, finance and healthcare. Time-series is often influenced by external factors, especially in the form of asynchronous events, making forecasting difficult. However, existing models are mainly designated for either synchronous time-series or asynchronous event sequence, and can hardly provide a synthetic way to capture the relatio…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted by AAAI 2021 conference

  37. arXiv:2102.00397  [pdf, other]

    cs.LG stat.ML

    Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting

    Authors: Longyuan Li, Junchi Yan, Xiaokang Yang, Yaohui Jin

    Abstract: Probabilistic time series forecasting involves estimating the distribution of the future based on its history, which is essential for risk management in downstream decision-making. We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by networks and the dependency is modeled by recurrent neural nets.…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: IJCAI 2019

  38. arXiv:2101.05389  [pdf, other]

    stat.AP

    Assortativity measures for weighted and directed networks

    Authors: Yelie Yuan, Jun Yan, Panpan Zhang

    Abstract: Assortativity measures the tendency of a vertex in a network being connected by other vertexes with respect to some vertex-specific features. Classical assortativity coefficients are defined for unweighted and undirected networks with respect to vertex degree. We propose a class of assortativity coefficients that capture the assortative characteristics and structure of weighted and directed networ…

    Submitted 13 January, 2021; originally announced January 2021.

  39. arXiv:2012.04200  [pdf, other]

    stat.ME stat.AP

    Regularized Fingerprinting in Detection and Attribution of Climate Change with Weight Matrix Optimizing the Efficiency in Scaling Factor Estimation

    Authors: Yan Li, Kun Chen, Jun Yan, Xuebin Zhang

    Abstract: The optimal fingerprinting method for detection and attribution of climate change is based on a multiple regression where each covariate has measurement error whose covariance matrix is the same as that of the regression error up to a known scale. Inferences about the regression coefficients are critical not only for making statements about detection and attribution but also for quantifying the un…

    Submitted 7 December, 2020; originally announced December 2020.

  40. arXiv:2011.14412  [pdf, other]

    stat.AP

    Clustering US States by Time Series of COVID-19 New Case Counts with Non-negative Matrix Factorization

    Authors: Jianmin Chen, Jun Yan, Panpan Zhang

    Abstract: The spreading patterns of COVID-19 differ a lot across the US states under different quarantine measures and reopening policies. We proposed to cluster the US states into distinct communities based on the daily new confirmed case counts via a nonnegative matrix factorization (NMF) followed by a k-means clustering procedure on the coefficients of the NMF basis. A cross-validation method was employed…

    Submitted 15 January, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

    MSC Class: 62H30; 62H12; 62M10

  41. arXiv:2011.11178  [pdf, other]

    stat.ME stat.AP

    Bayesian Nonparametric Estimation for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations

    Authors: Fan Yin, Jieying Jiao, Guanyu Hu, Jun Yan

    Abstract: Basketball shot location data provide valuable summary information regarding players to coaches, sports analysts, fans, statisticians, as well as players themselves. Represented by spatial points, such data are naturally analyzed with spatial point process models. We present a novel nonparametric Bayesian method for learning the underlying intensity surface built upon a combination of Dirichlet pr…

    Submitted 22 November, 2020; originally announced November 2020.

  42. arXiv:2009.14737  [pdf, other]

    cs.LG cs.CV stat.ML

    Improving Auto-Augment via Augmentation-Wise Weight Sharing

    Authors: Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang

    Abstract: The recent progress on automatically searching augmentation policies has boosted the performance substantially for various tasks. A key component of automatic augmentation search is the evaluation process for a particular augmentation policy, which is utilized to return reward and usually runs thousands of times. A plain evaluation process, which includes full model training and validation, would…

    Submitted 22 October, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Accepted to NeurIPS 2020 (Poster)

  43. arXiv:2009.02597  [pdf, other]

    stat.AP stat.ME stat.ML

    Survival Modeling of Suicide Risk with Rare and Uncertain Diagnoses

    Authors: Wenjie Wang, Chongliang Luo, Robert H. Aseltine, Fei Wang, Jun Yan, Kun Chen

    Abstract: Motivated by the pressing need for suicide prevention through improving behavioral healthcare, we use medical claims data to study the risk of subsequent suicide attempts for patients who were hospitalized due to suicide attempts and later discharged. Understanding the risk behaviors of such patients at elevated suicide risk is an important step toward the goal of "Zero Suicide." An immediate and…

    Submitted 7 May, 2023; v1 submitted 5 September, 2020; originally announced September 2020.

  44. arXiv:2009.01027  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

    Authors: Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun Lu, Xiaolin Wei, Junchi Yan

    Abstract: Despite the fast development of differentiable architecture search (DARTS), it suffers from long-standing performance instability, which extremely limits its application. Existing robustifying methods draw clues from the resulting deteriorated behavior instead of finding out its causing factor. Various indicators such as Hessian eigenvalues are proposed as a signal to stop searching before the per…

    Submitted 15 January, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: Accepted to ICLR2021

  45. arXiv:2004.11145  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning

    Authors: Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha

    Abstract: Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes unpractical in complicated applications, due to non-interactivity between agents, curse of dimensionality and computation complexity. Hence, several decentralized MARL algorithms are motivated. However, existing decentralized methods only handle the fully cooperative setting where massive information needs t…

    Submitted 7 July, 2023; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: 75 pages, 10 figures, JMLR camera ready

  46. Moving-Resting Process with Measurement Error in Animal Movement Modeling

    Authors: Chaoran Hu, Mark Elbroch, Thomas Meyer, Vladimir Pozdnyakov, Jun Yan

    Abstract: Statistical modeling of animal movement is of critical importance. The continuous trajectory of an animal's movements is only observed at discrete, often irregularly spaced time points. Most existing models cannot handle the unequal sampling interval naturally and/or do not allow inactivity periods such as resting or sleeping. The recently proposed moving-resting (MR) model is a Brownian motion go…

    Submitted 23 August, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: 26 pages, 3 figures, 3 tables

    MSC Class: 62M05

  47. Heterogeneity Pursuit for Spatial Point Pattern with Application to Tree Locations: A Bayesian Semiparametric Recourse

    Authors: Jieying Jiao, Guanyu Hu, Jun Yan

    Abstract: Spatial point pattern data are routinely encountered. A flexible regression model for the underlying intensity is essential to characterizing the spatial point pattern and understanding the impacts of potential risk factors on such pattern. We propose a Bayesian semiparametric regression model where the observed spatial points follow a spatial Poisson process with an intensity function which adjus…

    Submitted 23 March, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

    Comments: 21 pages, 7 figures

    Journal ref: Environmetrics 2021: 32(7) e2694

  48. arXiv:2002.04238  [pdf, other]

    cs.LG cs.AI stat.ML

    HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

    Authors: Yun Hua, Xiangfeng Wang, Bo Jin, Wenhao Li, Junchi Yan, Xiaofeng He, Hongyuan Zha

    Abstract: In spite of the success of existing meta reinforcement learning methods, they still have difficulty in learning a meta policy effectively for RL problems with sparse reward. In this respect, we develop a novel meta reinforcement learning framework called Hyper-Meta RL(HMRL), for sparse reward RL problems. It is consisted with three modules including the cross-environment meta state embedding modul…

    Submitted 5 June, 2021; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: 13 pages

    Journal ref: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2021

  49. arXiv:2002.04235  [pdf, other]

    cs.LG stat.ML

    Learning Structured Communication for Multi-agent Reinforcement Learning

    Authors: Junjie Sheng, Xiangfeng Wang, Bo Jin, Junchi Yan, Wenhao Li, Tsung-Hui Chang, Jun Wang, Hongyuan Zha

    Abstract: This work explores the large-scale multi-agent communication mechanism under a multi-agent reinforcement learning (MARL) setting. We summarize the general categories of topology for communication structures in MARL literature, which are often manually specified. Then we propose a novel framework termed as Learning Structured Communication (LSC) by using a more flexible and efficient communication…

    Submitted 11 February, 2020; originally announced February 2020.

  50. arXiv:2001.06838  [pdf, other]

    cs.CV cs.LG stat.ML

    Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

    Authors: Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

    Abstract: Batch Normalization (BN) is one of the most widely used techniques in Deep Learning field. But its performance can awfully degrade with insufficient batch size. This weakness limits the usage of BN on many computer vision tasks like detection or segmentation, where batch size is usually small due to the constraint of memory consumption. Therefore many modified normalization techniques have been pr…

    Submitted 8 April, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

    Comments: ICLR2020; https://github.com/megvii-model/MABN