-
Candidate Dark Galaxy-2: Validation and Analysis of an Almost Dark Galaxy in the Perseus Cluster
Authors:
Dayi Li,
Qing Liu,
Gwendolyn Eadie,
Roberto Abraham,
Francine Marleau,
William Harris,
Pieter van Dokkum,
Aaron Romanowsky,
Shany Danieli,
Patrick Brown,
Alex Stringer
Abstract:
Candidate Dark Galaxy-2 (CDG-2) is a potential dark galaxy consisting of four globular clusters (GCs) in the Perseus cluster, first identified in Li et al. (2025) through a sophisticated statistical method. The method searched for over-densities of GCs from a \textit{Hubble Space Telescope} (\textit{HST}) survey targeting Perseus. Using the same \textit{HST} images and new imaging data from the \textit{Euclid} survey, we report the detection of extremely faint but significant diffuse emission around the four GCs of CDG-2. We thus have exceptionally strong evidence that CDG-2 is a galaxy. This is the first galaxy detected purely through its GC population. Under the conservative assumption that the four GCs make up the entire GC population, preliminary analysis shows that CDG-2 has a total luminosity of $L_{V, \mathrm{gal}} = (6.2 \pm 3.0) \times 10^6 L_{\odot}$ and a minimum GC luminosity of $L_{V, \mathrm{GC}} = (1.03 \pm 0.2) \times 10^6 L_{\odot}$. Our results indicate that CDG-2 is one of the faintest galaxies having associated GCs, while at least $\sim 16.6\%$ of its light is contained in its GC population. This ratio is likely to be much higher ($\sim 33\%$) if CDG-2 has a canonical GC luminosity function (GCLF). In addition, if the previously observed GC-to-halo mass relations apply to CDG-2, it would have a minimum dark matter halo mass fraction of $99.94\%$ to $99.98\%$. If it has a canonical GCLF, then the dark matter halo mass fraction is $\gtrsim 99.99\%$. Therefore, CDG-2 may be the most GC-dominated galaxy and potentially one of the most dark-matter-dominated galaxies ever discovered.
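The quoted lower bound on the GC light fraction appears to follow directly from the two central luminosity values in the abstract. The short check below is only an illustrative sketch: the function name is hypothetical, and the quoted uncertainties are ignored.

```python
# Central values quoted in the abstract, in solar V-band luminosities.
L_GAL = 6.2e6   # total luminosity of CDG-2
L_GC = 1.03e6   # minimum luminosity of the four GCs


def gc_light_fraction(l_gc: float, l_gal: float) -> float:
    """Fraction of the galaxy's light contained in its globular clusters."""
    return l_gc / l_gal


# Hypothetical helper applied to the central values only (no error propagation).
fraction = gc_light_fraction(L_GC, L_GAL)
print(f"GC light fraction: {fraction:.1%}")
```

Because the GC luminosity is a minimum, the resulting ratio is a lower bound, consistent with the "at least" wording of the abstract.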
Submitted 18 June, 2025;
originally announced June 2025.
-
Strongly Consistent Community Detection in Popularity Adjusted Block Models
Authors:
Quan Yuan,
Binghui Liu,
Danning Li,
Lingzhou Xue
Abstract:
The Popularity Adjusted Block Model (PABM) provides a flexible framework for community detection in network data by allowing heterogeneous node popularity across communities. However, this flexibility increases model complexity and raises key unresolved challenges, particularly in effectively adapting spectral clustering techniques and efficiently achieving strong consistency in label recovery. To address these challenges, we first propose the Thresholded Cosine Spectral Clustering (TCSC) algorithm and establish its weak consistency under the PABM. We then introduce the one-step Refined TCSC algorithm and prove that it achieves strong consistency under the PABM, correctly recovering all community labels with high probability. We further show that the two-step Refined TCSC accelerates clustering error convergence, especially with small sample sizes. Additionally, we propose a data-driven approach for selecting the number of communities, which outperforms existing methods under the PABM. The effectiveness and robustness of our methods are validated through extensive simulations and real-world applications.
Submitted 8 June, 2025;
originally announced June 2025.
-
Lower Ricci Curvature for Hypergraphs
Authors:
Shiyi Yang,
Can Chen,
Didong Li
Abstract:
Networks with higher-order interactions, prevalent in biological, social, and information systems, are naturally represented as hypergraphs, yet their structural complexity poses fundamental challenges for geometric characterization. While curvature-based methods offer powerful insights in graph analysis, existing extensions to hypergraphs suffer from critical trade-offs: combinatorial approaches such as Forman-Ricci curvature capture only coarse features, whereas geometric methods like Ollivier-Ricci curvature offer richer expressivity but demand costly optimal transport computations. To address these challenges, we introduce hypergraph lower Ricci curvature (HLRC), a novel curvature metric defined in closed form that achieves a principled balance between interpretability and efficiency. Evaluated across diverse synthetic and real-world hypergraph datasets, HLRC consistently reveals meaningful higher-order organization, distinguishing intra- from inter-community hyperedges, uncovering latent semantic labels, tracking temporal dynamics, and supporting robust clustering of hypergraphs based on global structure. By unifying geometric sensitivity with algorithmic simplicity, HLRC provides a versatile foundation for hypergraph analytics, with broad implications for tasks including node classification, anomaly detection, and generative modeling in complex systems.
Submitted 4 June, 2025;
originally announced June 2025.
-
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity
Authors:
Xuejun Sun,
Yiran Song,
Xiaochen Zhou,
Ruilie Cai,
Yu Zhang,
Xinyi Li,
Rui Peng,
Jialiu Xie,
Yuanyuan Yan,
Muyao Tang,
Prem Lakshmanane,
Baiming Zou,
James S. Hagood,
Raymond J. Pickles,
Didong Li,
Fei Zou,
Xiaojing Zheng
Abstract:
Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms, along with inconsistent metadata and preprocessing procedures, hinders AI-driven discovery. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready, rigorously curated dataset that integrates 14,136 RNA-seq profiles from 3,178 subjects across 66 studies encompassing over 2.56 million cells. Spanning vaccination, inoculation, and mixed exposures, the dataset includes microarray, bulk RNA-seq, and single-cell RNA-seq from whole blood, PBMCs, and nasal swabs, sourced from GEO, ImmPort, and ArrayExpress. We harmonized subject-level metadata, standardized outcome measures, applied unified preprocessing pipelines with rigorous quality control, and aligned all data to official gene symbols. To demonstrate the utility of HR-VILAGE-3K3M, we performed predictive modeling of vaccine responders and evaluated batch-effect correction methods. Beyond these initial demonstrations, it supports diverse systems immunology applications and benchmarking of feature selection and transfer learning algorithms. Its scale and heterogeneity also make it ideal for pretraining foundation models of the human immune response and for advancing multimodal learning frameworks. As the largest longitudinal transcriptomic resource for human respiratory viral immunization, it provides an accessible platform for reproducible AI-driven research, accelerating systems immunology and vaccine development against emerging viral threats.
Submitted 19 May, 2025;
originally announced May 2025.
-
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Authors:
Ruiquan Huang,
Donghao Li,
Chengshuai Shi,
Cong Shen,
Jing Yang
Abstract:
This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(\pi^*|\rho)+N_1)})$, where $\mathtt{C}(\pi^*|\rho)$ is a new concentrability coefficient, $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}(\sqrt{N_1/(N_0/\mathtt{C}(\pi^{-}|\rho)+N_1)})$ speed-up compared to pure online learning, where $\mathtt{C}(\pi^{-}|\rho)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation on the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs).
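To get a feel for how the two rates in the abstract scale with the offline and online sample sizes, they can be evaluated numerically with constants and logarithmic factors dropped. The sketch below is illustrative only: the function names, the sample sizes, and the coefficient value are assumptions, and a single generic coefficient stands in for the two distinct concentrability coefficients defined in the paper.

```python
import math


def suboptimality_rate(n_offline: float, n_online: float, concentrability: float) -> float:
    """Evaluate sqrt(1 / (N0 / C + N1)), ignoring constants and log factors."""
    effective_samples = n_offline / concentrability + n_online
    return math.sqrt(1.0 / effective_samples)


def regret_speedup_factor(n_offline: float, n_online: float, concentrability: float) -> float:
    """Evaluate sqrt(N1 / (N0 / C + N1)); 1.0 means no gain over pure online learning."""
    effective_samples = n_offline / concentrability + n_online
    return math.sqrt(n_online / effective_samples)


# Hypothetical values chosen only for illustration.
N0, N1, C = 3000, 1000, 3.0
gap = suboptimality_rate(N0, N1, C)
factor = regret_speedup_factor(N0, N1, C)
print(f"sub-optimality rate ~ {gap:.4f}, regret factor ~ {factor:.4f}")
```

With no offline data the factor is exactly 1, and it shrinks as the offline dataset grows or as its coverage improves (smaller coefficient).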
Submitted 27 June, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
Target Prediction Under Deceptive Switching Strategies via Outlier-Robust Filtering of Partially Observed Incomplete Trajectories
Authors:
Yiming Meng,
Dongchang Li,
Melkior Ornik
Abstract:
Motivated by a study on deception and counter-deception, this paper addresses the problem of identifying an agent's target as it seeks to reach one of two targets in a given environment. In practice, an agent may initially follow a strategy to aim at one target but decide to switch to another midway. Such a strategy can be deceptive when the counterpart only has access to imperfect observations, which include heavily corrupted sensor noise and possible outliers, making it difficult to visually identify the agent's true intent. To counter deception and identify the true target, we utilize prior knowledge of the agent's dynamics and the imprecisely observed partial trajectory of the agent's states to dynamically update the estimation of the posterior probability of whether a deceptive switch has taken place. However, existing methods in the literature have not achieved effective deception identification within a reasonable computation time. We propose a set of outlier-robust change detection methods to track relevant change-related statistics efficiently, enabling the detection of deceptive strategies in hidden nonlinear dynamics with reasonable computational effort. The performance of the proposed framework is examined for Weapon-Target Assignment (WTA) detection under deceptive strategies using random simulations in the kinematics model with external forcing.
Submitted 4 April, 2025;
originally announced April 2025.
-
Deep Generative Models: Complexity, Dimensionality, and Approximation
Authors:
Kevin Wang,
Hongqian Niu,
Yixin Wang,
Didong Li
Abstract:
Generative networks have shown remarkable success in learning complex data distributions, particularly in generating high-dimensional data from lower-dimensional inputs. While this capability is well-documented empirically, its theoretical underpinning remains unclear. One common theoretical explanation appeals to the widely accepted manifold hypothesis, which suggests that many real-world datasets, such as images and signals, often possess intrinsic low-dimensional geometric structures. Under this manifold hypothesis, it is widely believed that to approximate a distribution on a $d$-dimensional Riemannian manifold, the latent dimension needs to be at least $d$ or $d+1$. In this work, we show that this requirement on the latent dimension is not necessary by demonstrating that generative networks can approximate distributions on $d$-dimensional Riemannian manifolds from inputs of any arbitrary dimension, even lower than $d$, taking inspiration from the concept of space-filling curves. This approach, in turn, leads to a super-exponential complexity bound of the deep neural networks through expanded neurons. Our findings thus challenge the conventional belief on the relationship between input dimensionality and the ability of generative networks to model data distributions. This novel insight not only corroborates the practical effectiveness of generative networks in handling complex data structures, but also underscores a critical trade-off between approximation error, dimensionality, and model complexity.
Submitted 1 April, 2025;
originally announced April 2025.
-
On a new robust method of inference for general time series models
Authors:
Zihan Wang,
Xinghao Qiao,
Dong Li,
Howell Tong
Abstract:
In this article, we propose a novel logistic quasi-maximum likelihood estimation (LQMLE) for general parametric time series models. Compared to the classical Gaussian QMLE and existing robust estimations, it enjoys many distinctive advantages, such as robustness in respect of distributional misspecification and heavy-tailedness of the innovation, more resiliency to outliers, smoothness and strict concavity of the log logistic quasi-likelihood function, and boundedness of the influence function among others. Under some mild conditions, we establish the strong consistency and asymptotic normality of the LQMLE. Moreover, we propose a new and vital parameter identifiability condition to ensure desirable asymptotics of the LQMLE. Further, based on the LQMLE, we consider the Wald test and the Lagrange multiplier test for the unknown parameters, and derive the limiting distributions of the corresponding test statistics. The applicability of our methodology is demonstrated by several time series models, including DAR, GARCH, ARMA-GARCH, DTARMACH, and EXPAR. Numerical simulation studies are carried out to assess the finite-sample performance of our methodology, and an empirical example is analyzed to illustrate its usefulness.
Submitted 11 March, 2025;
originally announced March 2025.
-
Max-Linear Tail Regression
Authors:
Liujun Chen,
Deyuan Li,
Zhengjun Zhang
Abstract:
The relationship between a response variable and its covariates can vary significantly, especially in scenarios where covariates take on extremely high or low values. This paper introduces a max-linear tail regression model specifically designed to capture such extreme relationships. To estimate the regression coefficients within this framework, we propose a novel M-estimator based on extreme value theory. The consistency and asymptotic normality of our proposed estimator are rigorously established under mild conditions. Simulation results demonstrate that our estimation method outperforms the conditional least squares approach. We validate the practical applicability of our model through two case studies: one using financial data and the other using rainfall data.
Submitted 21 February, 2025;
originally announced February 2025.
-
Revisiting Dynamic Graph Clustering via Matrix Factorization
Authors:
Dongyuan Li,
Satoshi Kosugi,
Ying Zhang,
Manabu Okumura,
Feng Xia,
Renhe Jiang
Abstract:
Dynamic graph clustering aims to detect and track time-varying clusters in dynamic graphs, revealing the evolutionary mechanisms of complex real-world dynamic systems. Matrix factorization-based methods are promising approaches for this task; however, these methods often struggle with scalability and can be time-consuming when applied to large-scale dynamic graphs. Moreover, they tend to lack robustness and are vulnerable to real-world noisy data. To address these issues, we make three key contributions. First, to improve scalability, we propose temporal separated matrix factorization, where a single matrix is divided into multiple smaller matrices for independent factorization, resulting in faster computation. Second, to improve robustness, we introduce bi-clustering regularization, which jointly optimizes graph embedding and clustering, thereby filtering out noisy features from the graph embeddings. Third, to further enhance effectiveness and efficiency, we propose selective embedding updating, where we update only the embeddings of dynamic nodes while the embeddings of static nodes are fixed among different timestamps. Experimental results on six synthetic and five real-world benchmarks demonstrate the scalability, robustness and effectiveness of our proposed method. Source code is available at https://github.com/Clearloveyuan/DyG-MF.
Submitted 9 February, 2025;
originally announced February 2025.
-
Estimating causal effects using difference-in-differences under network dependency and interference
Authors:
Michael Jetsupphasuk,
Didong Li,
Michael G. Hudgens
Abstract:
Difference-in-differences (DiD) is a causal inference method for observational longitudinal data that assumes parallel expected outcome trajectories between treatment groups under the (possible) counterfactual of receiving a specific treatment. In this paper, DiD is extended to allow for (i) network dependency, where outcomes, treatments, and covariates may exhibit between-unit latent correlation, and (ii) interference, where treatments can affect outcomes in neighboring units. In this setting, the causal estimand of interest is the average exposure effect among units with a specific exposure level, where the exposure is a function of treatments from potentially many units. Under a conditional parallel trends assumption and suitable network dependency conditions, a doubly robust estimator allowing for data-adaptive nuisance function estimation is proposed and shown to be consistent and asymptotically normal with variance reaching the semiparametric efficiency bound. The proposed methods are evaluated in simulations and applied to study the effects of adopting emission control technologies in coal power plants on county-level mortality due to cardiovascular disease.
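For readers unfamiliar with the baseline method being extended, the canonical two-group, two-period DiD estimate can be sketched as follows. This is the textbook estimator under parallel trends with independent units, not the doubly robust network estimator proposed in the paper; the function name and all data values are hypothetical.

```python
from statistics import mean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Classic DiD: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes for two treated and two control units, before and after treatment.
treated_pre, treated_post = [10.0, 12.0], [15.0, 17.0]
control_pre, control_post = [8.0, 10.0], [10.0, 12.0]

estimate = did_2x2(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {estimate}")
```

The control group's change serves as the counterfactual trend for the treated group, which is exactly what the parallel trends assumption licenses.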
Submitted 5 February, 2025;
originally announced February 2025.
-
Physics-Informed Machine Learning for Efficient Reconfigurable Intelligent Surface Design
Authors:
Zhen Zhang,
Jun Hui Qiu,
Jun Wei Zhang,
Hui Dong Li,
Dong Tang,
Qiang Cheng,
Wei Lin
Abstract:
A reconfigurable intelligent surface (RIS) is a two-dimensional periodic structure integrated with a large number of reflective elements, which can manipulate electromagnetic waves in a digital way, offering great potential for wireless communication and radar detection applications. However, conventional RIS designs rely heavily on extensive full-wave EM simulations that are extremely time-consuming. To address this challenge, we propose a machine-learning-assisted approach for efficient RIS design. An accurate and fast model to predict the reflection coefficient of an RIS element is developed by combining a multi-layer perceptron (MLP) neural network and a dual-port network, which can significantly reduce tedious EM simulations in the network training. An RIS has been designed in practice based on the proposed method. To verify the proposed method, the RIS has also been fabricated and measured. The experimental results are in good agreement with the simulation results, which validates the efficacy of the proposed method in RIS design.
Submitted 20 January, 2025;
originally announced January 2025.
-
Large covariance matrix estimation with factor-assisted variable clustering
Authors:
Dong Li,
Xinghao Qiao,
Cheng Yu
Abstract:
This paper studies covariance matrix estimation for high-dimensional time series within a new framework that combines low-rank factor and latent variable-specific cluster structures. The popular methods that assume a sparse error covariance matrix after taking out common factors may be invalid for many financial applications. Our formulation postulates a latent model-based error cluster structure after removing observable factors, which not only leads to more interpretable cluster patterns but also accounts for non-sparse cross-sectional correlations among the variable-specific residuals. Our method begins by using least squares to estimate the factor loadings, followed by identifying the latent cluster structure by thresholding the scaled covariance difference measures of residuals. A novel ratio-based criterion is introduced to determine the threshold parameter when performing the developed clustering algorithm. We then establish the cluster recovery consistency of our method and derive the convergence rates of our proposed covariance matrix estimators under different norms. Finally, we demonstrate the superior finite-sample performance of our proposal over the competing methods through both extensive simulations and a real data application to minimum variance portfolios.
Submitted 24 February, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
-
A note on local parameter orthogonality for multivariate data and the Whittle algorithm for multivariate autoregressive models
Authors:
Changle Shen,
Dong Li,
Howell Tong
Abstract:
This article extends the Cox--Reid local parameter orthogonality to a multivariate setting, gives an affirmative reply to one of Cox and Reid's questions, and shows that the extension can lead to efficient computational algorithms with the celebrated Whittle algorithm for multivariate autoregressive modeling as a showcase.
Submitted 21 January, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
Hypothesis Testing for High-Dimensional Matrix-Valued Data
Authors:
Shijie Cui,
Danning Li,
Runze Li,
Lingzhou Xue
Abstract:
This paper addresses hypothesis testing for the mean of matrix-valued data in high-dimensional settings. We investigate the minimum discrepancy test, originally proposed by Cragg (1997), which serves as a rank test for lower-dimensional matrices. We evaluate the performance of this test as the matrix dimensions increase proportionally with the sample size, and identify its limitations when matrix dimensions significantly exceed the sample size. To address these challenges, we propose a new test statistic tailored for high-dimensional matrix rank testing. The oracle version of this statistic is analyzed to highlight its theoretical properties. Additionally, we develop a novel approach for constructing a sparse singular value decomposition (SVD) estimator for singular vectors, providing a comprehensive examination of its theoretical aspects. Using the sparse SVD estimator, we explore the properties of the sample version of our proposed statistic. The paper concludes with simulation studies and two case studies involving surveillance video data, demonstrating the practical utility of our proposed methods.
Submitted 10 December, 2024;
originally announced December 2024.
-
High-Dimensional Extreme Quantile Regression
Authors:
Yiwei Tang,
Judy Huixia Wang,
Deyuan Li
Abstract:
The estimation of conditional quantiles at extreme tails is of great interest in numerous applications. Various methods that integrate regression analysis with an extrapolation strategy derived from extreme value theory have been proposed to estimate extreme conditional quantiles in scenarios with a fixed number of covariates. However, these methods prove ineffective in high-dimensional settings, where the number of covariates increases with the sample size. In this article, we develop new estimation methods tailored for extreme conditional quantiles with high-dimensional covariates. We establish the asymptotic properties of the proposed estimators and demonstrate their superior performance through simulation studies, particularly in scenarios of growing dimension and high dimension where existing methods may fail. Furthermore, the analysis of auto insurance data validates the efficacy of our methods in estimating extreme conditional insurance claims and selecting important variables.
Submitted 20 November, 2024;
originally announced November 2024.
-
Constrained Multi-objective Bayesian Optimization through Optimistic Constraints Estimation
Authors:
Diantong Li,
Fengxue Zhang,
Chong Liu,
Yuxin Chen
Abstract:
Multi-objective Bayesian optimization has been widely adopted in scientific experiment design, including drug discovery and hyperparameter optimization. In practice, regulatory or safety concerns often impose additional thresholds on certain attributes of the experimental outcomes. Previous work has primarily focused on constrained single-objective optimization tasks or active search under constraints. The existing constrained multi-objective algorithms address the issue with heuristics and approximations, posing challenges to the analysis of the sample efficiency. We propose a novel constrained multi-objective Bayesian optimization algorithm COMBOO that balances active learning of the level-set defined on multiple unknowns with multi-objective optimization within the feasible region. We provide both theoretical analysis and empirical evidence, demonstrating the efficacy of our approach on various synthetic benchmarks and real-world applications.
Submitted 21 April, 2025; v1 submitted 5 November, 2024;
originally announced November 2024.
-
K-Contact Distance for Noisy Nonhomogeneous Spatial Point Data with application to Repeating Fast Radio Burst sources
Authors:
A. M. Cook,
Dayi Li,
Gwendolyn M. Eadie,
David C. Stenning,
Paul Scholz,
Derek Bingham,
Radu Craiu,
B. M. Gaensler,
Kiyoshi W. Masui,
Ziggy Pleunis,
Antonio Herrera-Martin,
Ronniy C. Joseph,
Ayush Pandhi,
Aaron B. Pearlman,
J. Xavier Prochaska
Abstract:
This paper introduces an approach to analyze nonhomogeneous Poisson processes (NHPP) observed with noise, focusing on previously unstudied second-order characteristics of the noisy process. Utilizing a hierarchical Bayesian model with noisy data, we estimate hyperparameters governing a physically motivated NHPP intensity. Simulation studies demonstrate the reliability of this methodology in accurately estimating hyperparameters. Leveraging the posterior distribution, we then infer the probability of detecting a certain number of events within a given radius, the $k$-contact distance. We demonstrate our methodology with an application to observations of fast radio bursts (FRBs) detected by the Canadian Hydrogen Intensity Mapping Experiment's FRB Project (CHIME/FRB). This approach allows us to identify repeating FRB sources by bounding or directly simulating the probability of observing $k$ physically independent sources within some radius in the detection domain, or the $\textit{probability of coincidence}$ ($P_{\text{C}}$). The new methodology improves the repeater detection $P_{\text{C}}$ in 86% of cases when applied to the largest sample of previously classified observations, with a median improvement factor (existing metric over $P_{\text{C}}$ from our methodology) of $\sim$ 3000.
Submitted 15 October, 2024;
originally announced October 2024.
-
Integrating Expert Judgment and Algorithmic Decision Making: An Indistinguishability Framework
Authors:
Rohan Alur,
Loren Laine,
Darrick K. Li,
Dennis Shung,
Manish Raghavan,
Devavrat Shah
Abstract:
We introduce a novel framework for human-AI collaboration in prediction and decision tasks. Our approach leverages human judgment to distinguish inputs which are algorithmically indistinguishable, or "look the same" to any feasible predictive algorithm. We argue that this framing clarifies the problem of human-AI collaboration in prediction and decision tasks, as experts often form judgments by drawing on information which is not encoded in an algorithm's training data. Algorithmic indistinguishability yields a natural test for assessing whether experts incorporate this kind of "side information", and further provides a simple but principled method for selectively incorporating human feedback into algorithmic predictions. We show that this method provably improves the performance of any feasible algorithmic predictor and precisely quantify this improvement. We demonstrate the utility of our framework in a case study of emergency room triage decisions, where we find that although algorithmic risk scores are highly competitive with physicians, there is strong evidence that physician judgments provide signal which could not be replicated by any predictive algorithm. This insight yields a range of natural decision rules which leverage the complementary strengths of human experts and predictive algorithms.
Submitted 17 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Asymmetric GARCH modelling without moment conditions
Authors:
Yuxin Tao,
Dong Li
Abstract:
There is a serious and long-standing restriction in the literature on heavy-tailed phenomena in that moment conditions, which are unrealistic, are almost always assumed in modelling such phenomena. Further, the issue of stability is often insufficiently addressed. To this end, we develop a comprehensive statistical inference for an asymmetric generalized autoregressive conditional heteroskedasticity model with standardized non-Gaussian symmetric stable innovation (sAGARCH) in a unified framework, covering both the stationary case and the explosive case. We consider first the maximum likelihood estimation of the model including the asymptotic properties of the estimator of the stable exponent parameter among others. We then propose a modified Kolmogorov-type test statistic for diagnostic checking, as well as those for strict stationarity and asymmetry testing. We conduct Monte Carlo simulation studies to examine the finite-sample performance of our entire statistical inference procedure. We include empirical examples of stock returns to highlight the usefulness and merits of our sAGARCH model.
Submitted 1 October, 2024;
originally announced October 2024.
-
An adaptive Gaussian process method for multi-modal Bayesian inverse problems
Authors:
Zhihang Xu,
Xiaoyu Zhu,
Daoji Li,
Qifeng Liao
Abstract:
Inverse problems are prevalent in both scientific research and engineering applications. In the context of Bayesian inverse problems, sampling from the posterior distribution is particularly challenging when the forward models are computationally expensive. This challenge escalates further when the posterior distribution is multimodal. To address this, we propose a Gaussian process (GP) based method to indirectly build surrogates for the forward model. Specifically, the unnormalized posterior density is expressed as a product of an auxiliary density and an exponential GP surrogate. In an iterative way, the auxiliary density will converge to the posterior distribution starting from an arbitrary initial density. However, the efficiency of the GP regression is highly influenced by the quality of the training data. Therefore, we utilize the iterative local updating ensemble smoother (ILUES) to generate high-quality samples that are concentrated in regions with high posterior probability. Subsequently, based on the surrogate model and the mode information that is extracted by using a clustering method, MCMC with a Gaussian mixed (GM) proposal is used to draw samples from the auxiliary density. Through numerical examples, we demonstrate that the proposed method can accurately and efficiently represent the posterior with a limited number of forward simulations.
Submitted 4 September, 2024;
originally announced September 2024.
-
Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity
Authors:
Dongyue Li,
Aneesh Sharma,
Hongyang R. Zhang
Abstract:
Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of tasks. Naively computing either of them requires repeatedly training on data from various task combinations, which is computationally intensive. We present a new algorithm Grad-TAG that can estimate task affinities without this repeated training.
The key idea of Grad-TAG is to train a "base" model for all tasks and then use a linearization technique to estimate the loss of the model for a specific task combination. The linearization works by computing a gradient-based approximation of the loss, using low-dimensional projections of gradients as features in a logistic regression to predict labels for the task combination. We show that the linearized model can provably approximate the loss when the gradient-based approximation is accurate, and also empirically verify that on several large models. Then, given the estimated task affinity, we design a semi-definite program for clustering similar tasks by maximizing the average density of clusters.
We evaluate Grad-TAG's performance across seven datasets, including multi-label classification on graphs, and instruction fine-tuning of language models. Our task affinity estimates are within 2.7% distance to the true affinities while needing only 3% of FLOPs in full training. On our largest graph with 21M edges and 500 labeling tasks, our algorithm delivers estimates within 5% distance to the true affinities, using only 112 GPU hours. Our results show that Grad-TAG achieves excellent performance and runtime tradeoffs compared to existing approaches.
Submitted 20 November, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Discovery of Two Ultra-Diffuse Galaxies with Unusually Bright Globular Cluster Luminosity Functions via a Mark-Dependently Thinned Point Process (MATHPOP)
Authors:
Dayi Li,
Gwendolyn Eadie,
Patrick Brown,
William Harris,
Roberto Abraham,
Pieter van Dokkum,
Steven Janssens,
Samantha Berek,
Shany Danieli,
Aaron Romanowsky,
Joshua Speagle
Abstract:
We present \textsc{Mathpop}, a novel method to infer the globular cluster (GC) counts in ultra-diffuse galaxies (UDGs) and low-surface brightness galaxies (LSBGs). Many known UDGs have a surprisingly high ratio of GC number to surface brightness. However, standard methods to infer GC counts in UDGs face various challenges, such as photometric measurement uncertainties, GC membership uncertainties, and assumptions about the GC luminosity functions (GCLFs). \textsc{Mathpop} tackles these challenges using the mark-dependent thinned point process, enabling joint inference of the spatial and magnitude distributions of GCs. In doing so, \textsc{Mathpop} allows us to infer and quantify the uncertainties in both GC counts and GCLFs with minimal assumptions. As a precursor to \textsc{Mathpop}, we also address the data uncertainties coming from the selection process of GC candidates: we obtain probabilistic GC candidates instead of the traditional binary classification based on the color--magnitude diagram. We apply \textsc{Mathpop} to 40 LSBGs in the Perseus cluster using GC catalogs from a \textit{Hubble Space Telescope} imaging program. We then compare our results to those from an independent study using the standard method. We further calibrate and validate our approach through extensive simulations. Our approach reveals two LSBGs having GCLF turnover points much brighter than the canonical value with Bayes' factor being $\sim4.5$ and $\sim2.5$, respectively. An additional crude maximum-likelihood estimation shows that their GCLF TO points are approximately $0.9$~mag and $1.1$~mag brighter than the canonical value, with $p$-value $\sim 10^{-8}$ and $\sim 10^{-5}$, respectively.
△ Less
Submitted 12 September, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Resultant: Incremental Effectiveness on Likelihood for Unsupervised Out-of-Distribution Detection
Authors:
Yewen Li,
Chaojie Wang,
Xiaobo Xia,
Xu He,
Ruyi An,
Dong Li,
Tongliang Liu,
Bo An,
Xinrun Wang
Abstract:
Unsupervised out-of-distribution (U-OOD) detection is to identify OOD data samples with a detector trained solely on unlabeled in-distribution (ID) data. The likelihood function estimated by a deep generative model (DGM) could be a natural detector, but its performance is limited in some popular "hard" benchmarks, such as FashionMNIST (ID) vs. MNIST (OOD). Recent studies have developed various detectors based on DGMs to move beyond likelihood. However, despite their success on "hard" benchmarks, most of them struggle to consistently surpass or match the performance of likelihood on some "non-hard" cases, such as SVHN (ID) vs. CIFAR10 (OOD) where likelihood could be a nearly perfect detector. Therefore, we appeal for more attention to incremental effectiveness on likelihood, i.e., whether a method could always surpass or at least match the performance of likelihood in U-OOD detection. We first investigate the likelihood of variational DGMs and find its detection performance could be improved in two directions: i) alleviating latent distribution mismatch, and ii) calibrating the dataset entropy-mutual integration. Then, we apply two techniques for each direction, specifically post-hoc prior and dataset entropy-mutual calibration. The final method, named Resultant, combines these two directions for better incremental effectiveness compared to either technique alone. Experimental results demonstrate that the Resultant could be a new state-of-the-art U-OOD detector while maintaining incremental effectiveness on likelihood in a wide range of tasks.
△ Less
Submitted 4 September, 2024;
originally announced September 2024.
-
The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review
Authors:
Buxin Su,
Jiayao Zhang,
Natalie Collina,
Yuling Yan,
Didong Li,
Kyunghyun Cho,
Jianqing Fan,
Aaron Roth,
Weijie Su
Abstract:
We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived quality. In total, we received 1,342 rankings, each from a different author, covering 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using the author-provided rankings. Our analysis shows that these ranking-calibrated scores outperform the raw review scores in estimating the ground truth ``expected review scores'' in terms of both squared and absolute error metrics. Furthermore, we propose several cautious, low-risk applications of the Isotonic Mechanism and author-provided rankings in peer review, including supporting senior area chairs in overseeing area chairs' recommendations, assisting in the selection of paper awards, and guiding the recruitment of emergency reviewers.
△ Less
Submitted 17 May, 2025; v1 submitted 23 August, 2024;
originally announced August 2024.
-
Bayesian Spatiotemporal Wombling
Authors:
Aritra Halder,
Didong Li,
Sudipto Banerjee
Abstract:
Stochastic process models for spatiotemporal data underlying random fields find substantial utility in a range of scientific disciplines. Subsequent to predictive inference on the values of the random field (or spatial surface indexed continuously over time) at arbitrary space-time coordinates, scientific interest often turns to gleaning information regarding zones of rapid spatial-temporal change. We develop Bayesian modeling and inference for directional rates of change along a given surface. These surfaces, which demarcate regions of rapid change, are referred to as ``wombling'' surface boundaries. Existing methods for studying such changes have often been associated with curves and are not easily extendable to surfaces resulting from curves evolving over time. Our current contribution devises a fully model-based inferential framework for analyzing differential behavior in spatiotemporal responses by formalizing the notion of a ``wombling'' surface boundary using conventional multi-linear vector analytic frameworks and geometry followed by posterior predictive computations using triangulated surface approximations. We illustrate our methodology with comprehensive simulation experiments followed by multiple applications in environmental and climate science; pollutant analysis in environmental health; and brain imaging.
△ Less
Submitted 25 July, 2024;
originally announced July 2024.
-
Two-way Matrix Autoregressive Model with Thresholds
Authors:
Cheng Yu,
Dong Li,
Xinyu Zhang,
Howell Tong
Abstract:
Recently, matrix-valued time series data have attracted significant attention in the literature with the recognition of threshold nonlinearity representing a significant advance. However, given the fact that a matrix is a two-array structure, it is unfortunate, perhaps even unusual, for the threshold literature to focus on using the same threshold variable for the rows and the columns. In fact, evidence in economic, financial, environmental and other data shows advantages of allowing the possibilities of two different threshold variables (with possibly different threshold parameters for rows and columns), hence the need for a Two-way Matrix AutoRegressive model with Thresholds (2-MART). Naturally, two threshold variables pose new and perhaps even fierce challenges, which might be the reason behind the adoption of only one threshold variable in the literature up to now. In this paper, we develop a comprehensive methodology for the 2-MART model, by overcoming various challenges. Compared with existing models in the literature, the new model can achieve greater dimension reduction, much better model fitting, more accurate predictions, and more plausible interpretations.
△ Less
Submitted 21 January, 2025; v1 submitted 14 July, 2024;
originally announced July 2024.
-
Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics
Authors:
Deyuan Li,
Taesoo Daniel Lee,
Marynel Vázquez,
Nathan Tsoi
Abstract:
Multiclass neural network classifiers are typically trained using cross-entropy loss but evaluated using metrics derived from the confusion matrix, such as Accuracy, $F_β$-Score, and Matthews Correlation Coefficient. This mismatch between the training objective and evaluation metric can lead to suboptimal performance, particularly when the user's priorities differ from what cross-entropy implicitly optimizes. For example, in the presence of class imbalance, $F_1$-Score may be preferred over Accuracy. Similarly, given a preference towards precision, the $F_{β=0.25}$-Score will better reflect this preference than $F_1$-Score. However, standard cross-entropy loss does not accommodate such a preference. Building on prior work leveraging soft-set confusion matrices and a continuous piecewise-linear Heaviside approximation, we propose Evaluation Aligned Surrogate Training (EAST), a novel approach to train multiclass classifiers using close surrogates of confusion-matrix based metrics, thereby aligning a neural network classifier's predictions more closely to a target evaluation metric than typical cross-entropy loss. EAST introduces three key innovations: First, we propose a novel dynamic thresholding approach during training. Second, we propose using a multiclass soft-set confusion matrix. Third, we introduce an annealing process that gradually aligns the surrogate loss with the target evaluation metric. Our theoretical analysis shows that EAST results in consistent estimators of the target evaluation metric. Furthermore, we show that the learned network parameters converge asymptotically to values that optimize for the target evaluation metric. Extensive experiments validate the effectiveness of our approach, demonstrating improved alignment between training objectives and evaluation metrics, while outperforming existing methods across many datasets.
△ Less
Submitted 26 May, 2025; v1 submitted 31 May, 2024;
originally announced May 2024.
-
A Preferential Latent Space Model for Text Networks
Authors:
Maoyu Zhang,
Biao Cai,
Dong Li,
Xiaoyue Niu,
Jingfei Zhang
Abstract:
Network data enriched with textual information, referred to as text networks, arise in a wide range of applications, including email communications, scientific collaborations, and legal contracts. In such settings, both the structure of interactions (i.e., who connects with whom) and their content (i.e., what is communicated) are useful for understanding network relations. Traditional network analyses often focus only on the structure of the network and discard the rich textual information, resulting in an incomplete or inaccurate view of interactions. In this paper, we introduce a new modeling approach that incorporates texts into the analysis of networks using topic-aware text embedding, representing the text network as a generalized multi-layer network where each layer corresponds to a topic extracted from the data. We develop a new and flexible latent space network model that captures how node-topic preferences directly modulate edge formation, and establish identifiability conditions for the proposed model. We tackle model estimation with a projected gradient descent algorithm, and further discuss its theoretical properties. The efficacy of our proposed method is demonstrated through simulations and an analysis of an email network.
△ Less
Submitted 7 May, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Power-Enhanced Two-Sample Mean Tests for High-Dimensional Compositional Data with Application to Microbiome Data Analysis
Authors:
Danning Li,
Lingzhou Xue,
Haoyi Yang,
Xiufan Yu
Abstract:
Testing differences in mean vectors is a fundamental task in the analysis of high-dimensional compositional data. Existing methods may suffer from low power if the underlying signal pattern is in a situation that does not favor the deployed test. In this work, we develop two-sample power-enhanced mean tests for high-dimensional compositional data based on the combination of $p$-values, which integrates strengths from two popular types of tests: the maximum-type test and the quadratic-type test. We provide rigorous theoretical guarantees on the proposed tests, showing accurate Type-I error rate control and enhanced testing power. Our method boosts the testing power towards a broader alternative space, which yields robust performance across a wide range of signal pattern settings. Our theory also contributes to the literature on power enhancement and Gaussian approximation for high-dimensional hypothesis testing. We demonstrate the performance of our method on both simulated data and real-world microbiome data, showing that our proposed approach improves the testing power substantially compared to existing methods.
△ Less
Submitted 7 March, 2025; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Percentage Coefficient (bp) -- Effect Size Analysis (Theory Paper 1)
Authors:
Xinshu Zhao,
Dianshi Moses Li,
Ze Zack Lai,
Piper Liping Liu,
Song Harris Ao,
Fei You
Abstract:
Percentage coefficient (bp) has emerged in recent publications as an additional and alternative estimator of effect size for regression analysis. This paper retraces the theory behind the estimator. It's posited that an estimator must first serve the fundamental function of enabling researchers and readers to comprehend an estimand, the target of estimation. It may then serve the instrumental function of enabling researchers and readers to compare two or more estimands. Defined as the regression coefficient when dependent variable (DV) and independent variable (IV) are both on conceptual 0-1 percentage scales, percentage coefficients (bp) feature 1) clearly comprehendible interpretation and 2) equitable scales for comparison. The coefficient (bp) serves the two functions effectively and efficiently. It thus serves needs unserved by other indicators, such as raw coefficient (bw) and standardized beta.
Another premise of the functionalist theory is that "effect" is not a monolithic concept. Rather, it is a collection of concepts, each of which measures a component of the conglomerate called "effect", thereby serving a subfunction. Regression coefficient (b), for example, indicates the unit change in DV associated with a one-unit increase in IV, thereby measuring one aspect called unit effect, aka efficiency. Percentage coefficient (bp) indicates the percentage change in DV associated with a whole scale increase in IV. It is not meant to be an all-encompassing indicator of an all-encompassing concept, but rather a comprehendible and comparable indicator of efficiency, a key aspect of effect.
△ Less
Submitted 6 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Subscedastic weighted least squares estimates
Authors:
Jordan Bryan,
Haibo Zhou,
Didong Li
Abstract:
In the heteroscedastic linear model, the weighted least squares (WLS) estimate of the model coefficients is more efficient than the ordinary least squares (OLS) estimate. However, the practical application of WLS is challenging because it requires knowledge of the error variances. Feasible weighted least squares (FLS) estimates, which use approximations of the variances when they are unknown, may either be more or less efficient than the OLS estimate depending on the quality of the approximation. A direct comparison between FLS and OLS has significant implications for the application of regression analysis in varied fields, yet such a comparison remains an unresolved challenge. In this study, we address this challenge by identifying the conditions under which FLS estimates using fixed weights demonstrate greater efficiency than the OLS estimate. These conditions provide guidance for the design of feasible estimates using random weights. They also shed light on how certain robust regression estimates behave with respect to the linear model with normal errors of unequal variance.
△ Less
Submitted 27 May, 2025; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Bayesian Optimization Sequential Surrogate (BOSS) Algorithm: Fast Bayesian Inference for a Broad Class of Bayesian Hierarchical Models
Authors:
Dayi Li,
Ziang Zhang
Abstract:
Approximate Bayesian inference methods based on Laplace approximation and quadrature have become increasingly popular for their efficiency at fitting latent Gaussian models (LGM), which encompass popular models such as Bayesian generalized linear models, survival models, and spatio-temporal models. However, many useful models fall under the LGM framework only if some conditioning parameters are fixed, as the design matrix would vary with these parameters otherwise. Such models are termed the conditional LGMs with examples in change-point detection, non-linear regression, etc. Existing methods for fitting conditional LGMs rely on grid search or Markov-chain Monte Carlo (MCMC); both require a large number of evaluations of the unnormalized posterior density of the conditioning parameters. As each evaluation of the density requires fitting a separate LGM, these methods become computationally prohibitive beyond simple scenarios. In this work, we introduce the Bayesian optimization sequential surrogate (BOSS) algorithm, which combines Bayesian optimization with approximate Bayesian inference methods to significantly reduce the computational resources required for fitting conditional LGMs. With orders of magnitude fewer evaluations compared to grid or MCMC methods, Bayesian optimization provides us with sequential design points that capture the majority of the posterior mass of the conditioning parameters, which subsequently yields an accurate surrogate posterior distribution that can be easily normalized. We illustrate the efficiency, accuracy, and practical utility of the proposed method through extensive simulation studies and real-world applications in epidemiology, environmental sciences, and astrophysics.
Submitted 18 March, 2024;
originally announced March 2024.
-
Estimating Factor-Based Spot Volatility Matrices with Noisy and Asynchronous High-Frequency Data
Authors:
Degui Li,
Oliver Linton,
Haoxuan Zhang
Abstract:
We propose a new estimator of high-dimensional spot volatility matrices satisfying a low-rank plus sparse structure from noisy and asynchronous high-frequency data collected for an ultra-large number of assets. The noise processes are allowed to be temporally correlated, heteroskedastic, asymptotically vanishing and dependent on the efficient prices. We define a kernel-weighted pre-averaging method to jointly tackle the microstructure noise and asynchronicity issues, and we obtain uniformly consistent estimates for latent prices. We impose a continuous-time factor model with time-varying factor loadings on the price processes, and estimate the common factors and loadings via a local principal component analysis. Assuming a uniform sparsity condition on the idiosyncratic volatility structure, we combine the POET and kernel-smoothing techniques to estimate the spot volatility matrices for both the latent prices and idiosyncratic errors. Under some mild restrictions, the estimated spot volatility matrices are shown to be uniformly consistent under various matrix norms. We provide Monte-Carlo simulation and empirical studies to examine the numerical performance of the developed estimation methodology.
Submitted 10 March, 2024;
originally announced March 2024.
-
Multi-class Temporal Logic Neural Networks
Authors:
Danyang Li,
Roberto Tron
Abstract:
Time-series data can represent the behaviors of autonomous systems, such as drones and self-driving cars. The task of binary and multi-class classification for time-series data has become a prominent area of research. Neural networks represent a popular approach to classifying data; however, they lack interpretability, which poses a significant challenge in extracting meaningful information from them. Signal Temporal Logic (STL) is a formalism that describes the properties of timed behaviors. We propose a method that combines all of the above: neural networks that represent STL specifications for multi-class classification of time-series data. We offer two key contributions: 1) we introduce a notion of margin for multi-class classification, and 2) we introduce STL-based attributes for enhancing the interpretability of the results. We evaluate our method on two datasets and compare it with state-of-the-art baselines.
Submitted 24 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Lower Ricci Curvature for Efficient Community Detection
Authors:
Yun Jin Park,
Didong Li
Abstract:
This study introduces the Lower Ricci Curvature (LRC), a novel, scalable, and scale-free discrete curvature designed to enhance community detection in networks. Addressing the computational challenges posed by existing curvature-based methods, LRC offers a streamlined approach with linear computational complexity, making it well-suited for large-scale network analysis. We further develop an LRC-based preprocessing method that effectively augments popular community detection algorithms. Through comprehensive simulations and applications on real-world datasets, including the NCAA football league network, the DBLP collaboration network, the Amazon product co-purchasing network, and the YouTube social network, we demonstrate the efficacy of our method in significantly improving the performance of various community detection algorithms.
Submitted 27 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Gaussian Processes for Time Series with Lead-Lag Effects with applications to biology data
Authors:
Wancen Mu,
Jiawen Chen,
Eric S. Davis,
Kathleen Reed,
Douglas Phanstiel,
Michael I. Love,
Didong Li
Abstract:
Investigating the relationship, particularly the lead-lag effect, between time series is a common question across various disciplines, especially when uncovering biological processes. However, analyzing time series presents several challenges. Firstly, due to technical reasons, the time points at which observations are made are not at uniform intervals. Secondly, some lead-lag effects are transient, necessitating time-lag estimation based on a limited number of time points. Thirdly, external factors also impact these time series, requiring a similarity metric to assess the lead-lag relationship. To counter these issues, we introduce a model grounded in the Gaussian process, affording the flexibility to estimate lead-lag effects for irregular time series. In addition, our method outputs dissimilarity scores, thereby broadening its applications to include tasks such as ranking or clustering multiple pair-wise time series when considering their strength of lead-lag effects with external factors. Crucially, we offer a series of theoretical proofs to substantiate the validity of our proposed kernels and the identifiability of kernel parameters. Our model demonstrates advances in various simulations and real-world applications, particularly in the study of dynamic chromatin interactions, compared to other leading methods.
Submitted 25 September, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Covariance Function Estimation for High-Dimensional Functional Time Series with Dual Factor Structures
Authors:
Chenlei Leng,
Degui Li,
Hanlin Shang,
Yingcun Xia
Abstract:
We propose a flexible dual functional factor model for modelling high-dimensional functional time series. In this model, a high-dimensional fully functional factor parametrisation is imposed on the observed functional processes, whereas a low-dimensional version (via series approximation) is assumed for the latent functional factors. We extend the classic principal component analysis technique for the estimation of a low-rank structure to the estimation of a large covariance matrix of random functions that satisfies a notion of (approximate) functional "low-rank plus sparse" structure; and generalise the matrix shrinkage method to functional shrinkage in order to estimate the sparse structure of functional idiosyncratic components. Under appropriate regularity conditions, we derive the large sample theory of the developed estimators, including the consistency of the estimated factors and functional factor loadings and the convergence rates of the estimated matrices of covariance functions measured by various (functional) matrix norms. Consistent selection of the number of factors and a data-driven rule to choose the shrinkage parameter are discussed. Simulation and empirical studies are provided to demonstrate the finite-sample performance of the developed model and estimation methodology.
Submitted 12 January, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Contrastive linear regression
Authors:
Boyang Zhang,
Sarah Nyquist,
Andrew Jones,
Barbara E. Engelhardt,
Didong Li
Abstract:
Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here, we develop contrastive regression for the setting when there is a response variable r associated with each foreground observation. This situation occurs frequently when, for example, the unaffected controls do not have a disease grade or intervention dosage but the affected cases have a disease grade or intervention dosage, as in autism severity, solid tumor stages, polyp sizes, or warfarin dosages. Our contrastive regression model captures shared low-dimensional variation between the predictors in the case and control groups, and then explains the case-specific response variables through the variance that remains in the predictors after shared variation is removed. We show that, in one single-nucleus RNA sequencing dataset on autism severity in postmortem brain samples from donors with and without autism and in another single-cell RNA sequencing dataset on cellular differentiation in chronic rhinosinusitis with and without nasal polyps, our contrastive linear regression performs feature ranking and identifies biologically informative predictors associated with response that cannot be identified using other approaches.
Submitted 5 January, 2024;
originally announced January 2024.
-
Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks
Authors:
Jianhong Liu,
Dianshi Li
Abstract:
The paper addresses some fundamental and hotly debated issues for high-stakes event predictions underpinning the computational approach to social sciences. We question several prevalent views against machine learning and outline a new paradigm that highlights the promises and promotes the infusion of computational methods and conventional social science approaches.
Submitted 11 November, 2023;
originally announced November 2023.
-
Nonparametric Screening for Additive Quantile Regression in Ultra-high Dimension
Authors:
Daoji Li,
Yinfei Kong,
Dawit Zerom
Abstract:
In practical applications, one often does not know the "true" structure of the underlying conditional quantile function, especially in the ultra-high dimensional setting. To deal with ultra-high dimensionality, quantile-adaptive marginal nonparametric screening methods have been recently developed. However, these approaches may miss important covariates that are marginally independent of the response, or may select unimportant covariates due to their high correlations with important covariates. To mitigate such shortcomings, we develop a conditional nonparametric quantile screening procedure (complemented by subsequent selection) for nonparametric additive quantile regression models. Under some mild conditions, we show that the proposed screening method can identify all relevant covariates in a small number of steps with probability approaching one. The subsequent narrowed best subset (via a modified Bayesian information criterion) also contains all the relevant covariates with overwhelming probability. The advantages of our proposed procedure are demonstrated through simulation studies and a real data example.
Submitted 24 April, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Factor-guided estimation of large covariance matrix function with conditional functional sparsity
Authors:
Dong Li,
Xinghao Qiao,
Zihan Wang
Abstract:
This paper addresses the fundamental task of estimating covariance matrix functions for high-dimensional functional data/functional time series. We consider two functional factor structures encompassing either functional factors with scalar loadings or scalar factors with functional loadings, and postulate functional sparsity on the covariance of idiosyncratic errors after taking out the common unobserved factors. To facilitate estimation, we rely on the spiked matrix model and its functional generalization, and derive some novel asymptotic identifiability results, based on which we develop DIGIT and FPOET estimators under two functional factor models, respectively. Both estimators involve performing associated eigenanalysis to estimate the covariance of common components, followed by adaptive functional thresholding applied to the residual covariance. We also develop functional information criteria for the purpose of model selection. The convergence rates of estimated factors, loadings, and conditional sparse covariance matrix functions under various functional matrix norms, are respectively established for DIGIT and FPOET estimators. Numerical studies including extensive simulations and two real data applications on mortality rates and functional portfolio allocation are conducted to examine the finite-sample performance of the proposed methodology.
Submitted 4 November, 2023;
originally announced November 2023.
-
On the Identifiability and Interpretability of Gaussian Process Models
Authors:
Jiawen Chen,
Wancen Mu,
Yun Li,
Didong Li
Abstract:
In this paper, we critically examine the prevalent practice of using additive mixtures of Matérn kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Matérn kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Matérn kernels is determined by the least smooth component and that a GP with such a kernel is effectively equivalent to the least smooth kernel component. Furthermore, we demonstrate that none of the mixing weights or parameters within individual kernel components are identifiable. We then turn our attention to multi-output GP models and analyze the identifiability of the covariance matrix $A$ in the multiplicative kernel $K(x,y) = AK_0(x,y)$, where $K_0$ is a standard single output kernel such as Matérn. We show that $A$ is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited for multi-output tasks. Our findings are supported by extensive simulations and real applications for both single- and multi-output settings. This work provides insight into kernel selection and interpretation for GP models, emphasizing the importance of choosing appropriate kernel structures for different tasks.
Submitted 25 October, 2023;
originally announced October 2023.
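As a short illustration of why the abstract above says $A$ is identifiable only "up to a multiplicative constant": assuming $K_0$ carries its own free scale (variance) parameter, which is an assumption made here for illustration and not a detail stated in the abstract, rescaling $A$ and $K_0$ in opposite directions leaves the kernel unchanged for any constant $c > 0$:

$$K(x,y) = A\,K_0(x,y) = (cA)\left(\tfrac{1}{c}K_0(x,y)\right).$$

Under that assumption the data cannot distinguish $(A, K_0)$ from $(cA, K_0/c)$, so at most the ratios between entries of $A$ can be recovered.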
-
Tail Gini Functional under Asymptotic Independence
Authors:
Zhaowen Wang,
Liujun Chen,
Deyuan Li
Abstract:
The tail Gini functional is a measure of tail risk variability for systemic risks, and has many applications in banking, finance and insurance. Meanwhile, there is growing attention on asymptotically independent pairs in quantitative risk management. This paper addresses the estimation of the tail Gini functional under asymptotic independence. We first estimate the tail Gini functional at an intermediate level and then extrapolate it to the extreme tails. The asymptotic normalities of both the intermediate and extreme estimators are established. The simulation study shows that our estimator performs comparatively well in view of both bias and variance. The application to measure the tail variability of weekly loss of individual stocks given the occurrence of extreme events in the market index on the Hong Kong Stock Exchange provides meaningful results, and leads to new insights in risk management.
Submitted 12 September, 2023;
originally announced September 2023.
-
Studying Large Language Model Generalization with Influence Functions
Authors:
Roger Grosse,
Juhan Bae,
Cem Anil,
Nelson Elhage,
Alex Tamkin,
Amirhossein Tajdini,
Benoit Steiner,
Dustin Li,
Esin Durmus,
Ethan Perez,
Evan Hubinger,
Kamilė Lukošiūtė,
Karina Nguyen,
Nicholas Joseph,
Sam McCandlish,
Jared Kaplan,
Samuel R. Bowman
Abstract:
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
Submitted 7 August, 2023;
originally announced August 2023.
-
CoxKnockoff: Controlled Feature Selection for the Cox Model Using Knockoffs
Authors:
Daoji Li,
Jinzhao Yu,
Hui Zhao
Abstract:
Although there is a huge literature on feature selection for the Cox model, none of the existing approaches can control the false discovery rate (FDR) unless the sample size tends to infinity. In addition, there is no formal power analysis of the knockoffs framework for survival data in the literature. To address those issues, in this paper, we propose a novel controlled feature selection approach using knockoffs for the Cox model. We establish that the proposed method enjoys the FDR control in finite samples regardless of the number of covariates. Moreover, under mild regularity conditions, we also show that the power of our method is asymptotically one as sample size tends to infinity. To the best of our knowledge, this is the first formal theoretical result on the power for the knockoffs procedure in the survival setting. Simulation studies confirm that our method has appealing finite-sample performance with desired FDR control and high power. We further demonstrate the performance of our method through a real data example.
Submitted 1 August, 2023;
originally announced August 2023.
-
Label Calibration for Semantic Segmentation Under Domain Shift
Authors:
Ondrej Bohdal,
Da Li,
Timothy Hospedales
Abstract:
Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.
Submitted 20 July, 2023;
originally announced July 2023.
-
Feed-Forward Source-Free Domain Adaptation via Class Prototypes
Authors:
Ondrej Bohdal,
Da Li,
Timothy Hospedales
Abstract:
Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of time of existing domain adaptation methods.
Submitted 20 July, 2023;
originally announced July 2023.
-
Nonparametric Estimation of Large Spot Volatility Matrices for High-Frequency Financial Data
Authors:
Ruijun Bu,
Degui Li,
Oliver Linton,
Hanchao Wang
Abstract:
In this paper, we consider estimating spot/instantaneous volatility matrices of high-frequency data collected for a large number of assets. We first combine classic nonparametric kernel-based smoothing with a generalised shrinkage technique in the matrix estimation for noise-free data under a uniform sparsity assumption, a natural extension of the approximate sparsity commonly used in the literature. The uniform consistency property is derived for the proposed spot volatility matrix estimator with convergence rates comparable to the optimal minimax one. For the high-frequency data contaminated by microstructure noise, we introduce a localised pre-averaging estimation method that reduces the effective magnitude of the noise. We then use the estimation tool developed in the noise-free scenario, and derive the uniform convergence rates for the developed spot volatility matrix estimator. We further combine the kernel smoothing with the shrinkage technique to estimate the time-varying volatility matrix of the high-dimensional noise vector. In addition, we consider large spot volatility matrix estimation in time-varying factor models with observable risk factors and derive the uniform convergence property. We provide numerical studies including simulation and empirical application to examine the performance of the proposed estimation methods in finite samples.
Submitted 3 July, 2023;
originally announced July 2023.
-
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Authors:
Hongyang R. Zhang,
Dongyue Li,
Haotian Ju
Abstract:
The training of over-parameterized neural networks has received much study in recent literature. An important consideration is the regularization of over-parameterized networks due to their highly nonconvex and nonlinear geometry. In this paper, we study noise injection algorithms, which can regularize the Hessian of the loss, leading to regions with flat loss surfaces. Specifically, by injecting isotropic Gaussian noise into the weight matrices of a neural network, we can obtain an approximately unbiased estimate of the trace of the Hessian. However, naively implementing the noise injection via adding noise to the weight matrices before backpropagation presents limited empirical improvements. To address this limitation, we design a two-point estimate of the Hessian penalty, which injects noise into the weight matrices along both positive and negative directions of the random noise. In particular, this two-point estimate eliminates the variance of the first-order Taylor's expansion term on the Hessian. We show a PAC-Bayes generalization bound that depends on the trace of the Hessian (and the radius of the weight space), which can be measured from data.
We conduct a detailed experimental study to validate our approach and show that it can effectively regularize the Hessian and improve generalization. First, our algorithm can outperform prior approaches on sharpness-reduced training, delivering up to a 2.4% test accuracy increase for fine-tuning ResNets on six image classification datasets. Moreover, the trace of the Hessian reduces by 15.8%, and the largest eigenvalue is reduced by 9.7% with our approach. We also find that the regularization of the Hessian can be combined with weight decay and data augmentation, leading to stronger regularization. Second, our approach remains effective for improving generalization in pretraining multimodal CLIP models and chain-of-thought fine-tuning.
Submitted 23 September, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
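The two-point noise injection idea described in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the quadratic loss, the names `two_point_penalty` and `quadratic_loss`, and the values of `sigma` and `n_samples` are all assumptions chosen for illustration. It evaluates the loss at `w + u` and `w - u` for Gaussian noise `u`, so the first-order Taylor term cancels and the average approximates `(sigma**2 / 2) * trace(H)`.

```python
import numpy as np


def two_point_penalty(loss, w, sigma, rng, n_samples=1):
    """Average of [loss(w + u) + loss(w - u)] / 2 - loss(w) over u ~ N(0, sigma^2 I).

    The +u and -u evaluations cancel the first-order (gradient) term, leaving
    approximately 0.5 * u^T H u, whose expectation is 0.5 * sigma^2 * trace(H).
    """
    total = 0.0
    for _ in range(n_samples):
        u = rng.normal(0.0, sigma, size=w.shape)
        total += 0.5 * (loss(w + u) + loss(w - u)) - loss(w)
    return total / n_samples


# Hypothetical toy loss with a known Hessian H (trace = 6) and a nonzero gradient term.
H = np.diag([1.0, 2.0, 3.0])
g = np.array([5.0, -4.0, 3.0])


def quadratic_loss(w):
    return 0.5 * w @ H @ w + g @ w


rng = np.random.default_rng(0)
w0 = np.array([0.3, -0.2, 0.1])
sigma = 0.1

penalty = two_point_penalty(quadratic_loss, w0, sigma, rng, n_samples=20000)
trace_estimate = 2.0 * penalty / sigma**2
print(f"estimated trace of Hessian: {trace_estimate:.3f} (true value 6.0)")
```

For a quadratic loss the cancellation is exact for every single draw of `u`; for a neural network loss it holds only up to higher-order Taylor terms, which is why a small `sigma` is assumed here.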