
Showing 1–50 of 191 results for author: Li, D

Searching in archive stat.
  1. arXiv:2506.15644

    astro-ph.GA stat.AP

    Candidate Dark Galaxy-2: Validation and Analysis of an Almost Dark Galaxy in the Perseus Cluster

    Authors: Dayi Li, Qing Liu, Gwendolyn Eadie, Roberto Abraham, Francine Marleau, William Harris, Pieter van Dokkum, Aaron Romanowsky, Shany Danieli, Patrick Brown, Alex Stringer

    Abstract: Candidate Dark Galaxy-2 (CDG-2) is a potential dark galaxy consisting of four globular clusters (GCs) in the Perseus cluster, first identified in Li et al. (2025) through a sophisticated statistical method. The method searched for over-densities of GCs from a Hubble Space Telescope (HST) survey targeting Perseus. Using the same HST images and the new imaging data from th…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures, 1 table. Published in ApJL

    Journal ref: The Astrophysical Journal Letters, 986 (2), L18 (2025)

  2. arXiv:2506.07224

    stat.ME math.ST stat.ML

    Strongly Consistent Community Detection in Popularity Adjusted Block Models

    Authors: Quan Yuan, Binghui Liu, Danning Li, Lingzhou Xue

    Abstract: The Popularity Adjusted Block Model (PABM) provides a flexible framework for community detection in network data by allowing heterogeneous node popularity across communities. However, this flexibility increases model complexity and raises key unresolved challenges, particularly in effectively adapting spectral clustering techniques and efficiently achieving strong consistency in label recovery. To…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 11 figures

  3. arXiv:2506.03943

    cs.LG stat.ML

    Lower Ricci Curvature for Hypergraphs

    Authors: Shiyi Yang, Can Chen, Didong Li

    Abstract: Networks with higher-order interactions, prevalent in biological, social, and information systems, are naturally represented as hypergraphs, yet their structural complexity poses fundamental challenges for geometric characterization. While curvature-based methods offer powerful insights in graph analysis, existing extensions to hypergraphs suffer from critical trade-offs: combinatorial approaches…

    Submitted 4 June, 2025; originally announced June 2025.

  4. arXiv:2505.14725

    q-bio.GN cs.LG stat.AP

    HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

    Authors: Xuejun Sun, Yiran Song, Xiaochen Zhou, Ruilie Cai, Yu Zhang, Xinyi Li, Rui Peng, Jialiu Xie, Yuanyuan Yan, Muyao Tang, Prem Lakshmanane, Baiming Zou, James S. Hagood, Raymond J. Pickles, Didong Li, Fei Zou, Xiaojing Zheng

    Abstract: Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms,…

    Submitted 19 May, 2025; originally announced May 2025.

  5. arXiv:2505.13768

    cs.LG stat.ML

    Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis

    Authors: Ruiquan Huang, Donghao Li, Chengshuai Shi, Cong Shen, Jing Yang

    Abstract: This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-o…

    Submitted 27 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by UAI2025

  6. arXiv:2504.03502

    stat.AP

    Target Prediction Under Deceptive Switching Strategies via Outlier-Robust Filtering of Partially Observed Incomplete Trajectories

    Authors: Yiming Meng, Dongchang Li, Melkior Ornik

    Abstract: Motivated by a study on deception and counter-deception, this paper addresses the problem of identifying an agent's target as it seeks to reach one of two targets in a given environment. In practice, an agent may initially follow a strategy to aim at one target but decide to switch to another midway. Such a strategy can be deceptive when the counterpart only has access to imperfect observations, w…

    Submitted 4 April, 2025; originally announced April 2025.

  7. arXiv:2504.00820

    cs.LG math.DG stat.ML

    Deep Generative Models: Complexity, Dimensionality, and Approximation

    Authors: Kevin Wang, Hongqian Niu, Yixin Wang, Didong Li

    Abstract: Generative networks have shown remarkable success in learning complex data distributions, particularly in generating high-dimensional data from lower-dimensional inputs. While this capability is well-documented empirically, its theoretical underpinning remains unclear. One common theoretical explanation appeals to the widely accepted manifold hypothesis, which suggests that many real-world dataset…

    Submitted 1 April, 2025; originally announced April 2025.

  8. arXiv:2503.08655

    stat.ME econ.EM

    On a new robust method of inference for general time series models

    Authors: Zihan Wang, Xinghao Qiao, Dong Li, Howell Tong

    Abstract: In this article, we propose a novel logistic quasi-maximum likelihood estimation (LQMLE) for general parametric time series models. Compared to the classical Gaussian QMLE and existing robust estimations, it enjoys many distinctive advantages, such as robustness in respect of distributional misspecification and heavy-tailedness of the innovation, more resiliency to outliers, smoothness and strict…

    Submitted 11 March, 2025; originally announced March 2025.

  9. arXiv:2502.15310

    stat.ME

    Max-Linear Tail Regression

    Authors: Liujun Chen, Deyuan Li, Zhengjun Zhang

    Abstract: The relationship between a response variable and its covariates can vary significantly, especially in scenarios where covariates take on extremely high or low values. This paper introduces a max-linear tail regression model specifically designed to capture such extreme relationships. To estimate the regression coefficients within this framework, we propose a novel M-estimator based on extreme valu…

    Submitted 21 February, 2025; originally announced February 2025.

  10. arXiv:2502.06117

    cs.LG cs.AI stat.ML

    Revisiting Dynamic Graph Clustering via Matrix Factorization

    Authors: Dongyuan Li, Satoshi Kosugi, Ying Zhang, Manabu Okumura, Feng Xia, Renhe Jiang

    Abstract: Dynamic graph clustering aims to detect and track time-varying clusters in dynamic graphs, revealing the evolutionary mechanisms of complex real-world dynamic systems. Matrix factorization-based methods are promising approaches for this task; however, these methods often struggle with scalability and can be time-consuming when applied to large-scale dynamic graphs. Moreover, they tend to lack robu…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted by TheWebConf 2025 (Oral)

  11. arXiv:2502.03414

    stat.ME

    Estimating causal effects using difference-in-differences under network dependency and interference

    Authors: Michael Jetsupphasuk, Didong Li, Michael G. Hudgens

    Abstract: Difference-in-differences (DiD) is a causal inference method for observational longitudinal data that assumes parallel expected outcome trajectories between treatment groups under the (possible) counterfactual of receiving a specific treatment. In this paper DiD is extended to allow for (i) network dependency where outcomes, treatments, and covariates may exhibit between-unit latent correlation,…

    Submitted 5 February, 2025; originally announced February 2025.

  12. arXiv:2501.11323

    cs.LG eess.SP physics.app-ph stat.ML

    Physics-Informed Machine Learning for Efficient Reconfigurable Intelligent Surface Design

    Authors: Zhen Zhang, Jun Hui Qiu, Jun Wei Zhang, Hui Dong Li, Dong Tang, Qiang Cheng, Wei Lin

    Abstract: Reconfigurable intelligent surface (RIS) is a two-dimensional periodic structure integrated with a large number of reflective elements, which can manipulate electromagnetic waves in a digital way, offering great potentials for wireless communication and radar detection applications. However, conventional RIS designs highly rely on extensive full-wave EM simulations that are extremely time-consumin…

    Submitted 20 January, 2025; originally announced January 2025.

  13. arXiv:2501.10942

    stat.ME

    Large covariance matrix estimation with factor-assisted variable clustering

    Authors: Dong Li, Xinghao Qiao, Cheng Yu

    Abstract: This paper studies the covariance matrix estimation for high-dimensional time series within a new framework that combines low-rank factor and latent variable-specific cluster structures. The popular methods based on assuming the sparse error covariance matrix after taking out common factors may be invalid for many financial applications. Our formulation postulates a latent model-based error cluste…

    Submitted 24 February, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

    Comments: We have corrected some inaccurate descriptions

  14. arXiv:2501.08093

    stat.ME

    A note on local parameter orthogonality for multivariate data and the Whittle algorithm for multivariate autoregressive models

    Authors: Changle Shen, Dong Li, Howell Tong

    Abstract: This article extends the Cox--Reid local parameter orthogonality to a multivariate setting, gives an affirmative reply to one of Cox and Reid's questions, and shows that the extension can lead to efficient computational algorithms with the celebrated Whittle algorithm for multivariate autoregressive modeling as a showcase.

    Submitted 21 January, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

  15. arXiv:2412.07987

    stat.ME math.ST stat.ML

    Hypothesis Testing for High-Dimensional Matrix-Valued Data

    Authors: Shijie Cui, Danning Li, Runze Li, Lingzhou Xue

    Abstract: This paper addresses hypothesis testing for the mean of matrix-valued data in high-dimensional settings. We investigate the minimum discrepancy test, originally proposed by Cragg (1997), which serves as a rank test for lower-dimensional matrices. We evaluate the performance of this test as the matrix dimensions increase proportionally with the sample size, and identify its limitations when matrix…

    Submitted 10 December, 2024; originally announced December 2024.

  16. arXiv:2411.13822

    stat.ME

    High-Dimensional Extreme Quantile Regression

    Authors: Yiwei Tang, Judy Huixia Wang, Deyuan Li

    Abstract: The estimation of conditional quantiles at extreme tails is of great interest in numerous applications. Various methods that integrate regression analysis with an extrapolation strategy derived from extreme value theory have been proposed to estimate extreme conditional quantiles in scenarios with a fixed number of covariates. However, these methods prove ineffective in high-dimensional settings,…

    Submitted 20 November, 2024; originally announced November 2024.

  17. arXiv:2411.03641

    cs.LG stat.ME

    Constrained Multi-objective Bayesian Optimization through Optimistic Constraints Estimation

    Authors: Diantong Li, Fengxue Zhang, Chong Liu, Yuxin Chen

    Abstract: Multi-objective Bayesian optimization has been widely adopted in scientific experiment design, including drug discovery and hyperparameter optimization. In practice, regulatory or safety concerns often impose additional thresholds on certain attributes of the experimental outcomes. Previous work has primarily focused on constrained single-objective optimization tasks or active search under constra…

    Submitted 21 April, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

    Comments: This paper is accepted to AISTATS 2025

  18. arXiv:2410.12146

    stat.AP astro-ph.IM

    K-Contact Distance for Noisy Nonhomogeneous Spatial Point Data with application to Repeating Fast Radio Burst sources

    Authors: A. M. Cook, Dayi Li, Gwendolyn M. Eadie, David C. Stenning, Paul Scholz, Derek Bingham, Radu Craiu, B. M. Gaensler, Kiyoshi W. Masui, Ziggy Pleunis, Antonio Herrera-Martin, Ronniy C. Joseph, Ayush Pandhi, Aaron B. Pearlman, J. Xavier Prochaska

    Abstract: This paper introduces an approach to analyze nonhomogeneous Poisson processes (NHPP) observed with noise, focusing on previously unstudied second-order characteristics of the noisy process. Utilizing a hierarchical Bayesian model with noisy data, we estimate hyperparameters governing a physically motivated NHPP intensity. Simulation studies demonstrate the reliability of this methodology in accura…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 24 pages, 8 figures, submitted to the Annals of Applied Statistics. Feedback/comments welcome

  19. arXiv:2410.08783

    cs.LG cs.CY cs.HC stat.ML

    Integrating Expert Judgment and Algorithmic Decision Making: An Indistinguishability Framework

    Authors: Rohan Alur, Loren Laine, Darrick K. Li, Dennis Shung, Manish Raghavan, Devavrat Shah

    Abstract: We introduce a novel framework for human-AI collaboration in prediction and decision tasks. Our approach leverages human judgment to distinguish inputs which are algorithmically indistinguishable, or "look the same" to any feasible predictive algorithm. We argue that this framing clarifies the problem of human-AI collaboration in prediction and decision tasks, as experts often form judgments by dr…

    Submitted 17 October, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.00793

  20. arXiv:2410.00574

    stat.ME math.ST

    Asymmetric GARCH modelling without moment conditions

    Authors: Yuxin Tao, Dong Li

    Abstract: There is a serious and long-standing restriction in the literature on heavy-tailed phenomena in that moment conditions, which are unrealistic, are almost always assumed in modelling such phenomena. Further, the issue of stability is often insufficiently addressed. To this end, we develop a comprehensive statistical inference for an asymmetric generalized autoregressive conditional heteroskedastici…

    Submitted 1 October, 2024; originally announced October 2024.

  21. arXiv:2409.15307

    stat.CO physics.comp-ph

    An adaptive Gaussian process method for multi-modal Bayesian inverse problems

    Authors: Zhihang Xu, Xiaoyu Zhu, Daoji Li, Qifeng Liao

    Abstract: Inverse problems are prevalent in both scientific research and engineering applications. In the context of Bayesian inverse problems, sampling from the posterior distribution is particularly challenging when the forward models are computationally expensive. This challenge escalates further when the posterior distribution is multimodal. To address this, we propose a Gaussian process (GP) based meth…

    Submitted 4 September, 2024; originally announced September 2024.

  22. arXiv:2409.06091

    cs.LG cs.AI cs.SI stat.ML

    Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity

    Authors: Dongyue Li, Aneesh Sharma, Hongyang R. Zhang

    Abstract: Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of task…

    Submitted 20 November, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 16 pages. Appeared in KDD 2024

  23. arXiv:2409.06040

    astro-ph.GA stat.AP

    Discovery of Two Ultra-Diffuse Galaxies with Unusually Bright Globular Cluster Luminosity Functions via a Mark-Dependently Thinned Point Process (MATHPOP)

    Authors: Dayi Li, Gwendolyn Eadie, Patrick Brown, William Harris, Roberto Abraham, Pieter van Dokkum, Steven Janssens, Samantha Berek, Shany Danieli, Aaron Romanowsky, Joshua Speagle

    Abstract: We present MATHPOP, a novel method to infer the globular cluster (GC) counts in ultra-diffuse galaxies (UDGs) and low-surface brightness galaxies (LSBGs). Many known UDGs have a surprisingly high ratio of GC number to surface brightness. However, standard methods to infer GC counts in UDGs face various challenges, such as photometric measurement uncertainties, GC membership uncertainties,…

    Submitted 12 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 8 figures, 5 tables; submitted to ApJ, comments are welcome

    Journal ref: The Astrophysical Journal, 984(2), 147, 2025

  24. arXiv:2409.03801

    stat.ML cs.LG

    Resultant: Incremental Effectiveness on Likelihood for Unsupervised Out-of-Distribution Detection

    Authors: Yewen Li, Chaojie Wang, Xiaobo Xia, Xu He, Ruyi An, Dong Li, Tongliang Liu, Bo An, Xinrun Wang

    Abstract: Unsupervised out-of-distribution (U-OOD) detection is to identify OOD data samples with a detector trained solely on unlabeled in-distribution (ID) data. The likelihood function estimated by a deep generative model (DGM) could be a natural detector, but its performance is limited in some popular "hard" benchmarks, such as FashionMNIST (ID) vs. MNIST (OOD). Recent studies have developed various det…

    Submitted 4 September, 2024; originally announced September 2024.

  25. arXiv:2408.13430

    stat.AP cs.DL cs.GT cs.LG stat.ML

    The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

    Authors: Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie Su

    Abstract: We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML), asking authors with multiple submissions to rank their papers based on perceived quality. In total, we received 1,342 rankings, each from a different author, covering 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be leverag…

    Submitted 17 May, 2025; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: To appear in Journal of the American Statistical Association (JASA) as a Discussion Paper

  26. arXiv:2407.17804

    stat.ME

    Bayesian Spatiotemporal Wombling

    Authors: Aritra Halder, Didong Li, Sudipto Banerjee

    Abstract: Stochastic process models for spatiotemporal data underlying random fields find substantial utility in a range of scientific disciplines. Subsequent to predictive inference on the values of the random field (or spatial surface indexed continuously over time) at arbitrary space-time coordinates, scientific interest often turns to gleaning information regarding zones of rapid spatial-temporal change…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 198 pages

  27. arXiv:2407.10272

    stat.ME

    Two-way Matrix Autoregressive Model with Thresholds

    Authors: Cheng Yu, Dong Li, Xinyu Zhang, Howell Tong

    Abstract: Recently, matrix-valued time series data have attracted significant attention in the literature with the recognition of threshold nonlinearity representing a significant advance. However, given the fact that a matrix is a two-array structure, it is unfortunate, perhaps even unusual, for the threshold literature to focus on using the same threshold variable for the rows and the columns. In fact, ev…

    Submitted 21 January, 2025; v1 submitted 14 July, 2024; originally announced July 2024.

  28. arXiv:2405.20954

    cs.LG stat.ML

    Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics

    Authors: Deyuan Li, Taesoo Daniel Lee, Marynel Vázquez, Nathan Tsoi

    Abstract: Multiclass neural network classifiers are typically trained using cross-entropy loss but evaluated using metrics derived from the confusion matrix, such as Accuracy, $F_β$-Score, and Matthews Correlation Coefficient. This mismatch between the training objective and evaluation metric can lead to suboptimal performance, particularly when the user's priorities differ from what cross-entropy implicitl…

    Submitted 26 May, 2025; v1 submitted 31 May, 2024; originally announced May 2024.

  29. arXiv:2405.15038

    stat.ME

    A Preferential Latent Space Model for Text Networks

    Authors: Maoyu Zhang, Biao Cai, Dong Li, Xiaoyue Niu, Jingfei Zhang

    Abstract: Network data enriched with textual information, referred to as text networks, arise in a wide range of applications, including email communications, scientific collaborations, and legal contracts. In such settings, both the structure of interactions (i.e., who connects with whom) and their content (i.e., what is communicated) are useful for understanding network relations. Traditional network anal…

    Submitted 7 May, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 31 pages

    MSC Class: G.3; F.2

    ACM Class: G.3

  30. arXiv:2405.02551

    stat.ME math.ST stat.AP

    Power-Enhanced Two-Sample Mean Tests for High-Dimensional Compositional Data with Application to Microbiome Data Analysis

    Authors: Danning Li, Lingzhou Xue, Haoyi Yang, Xiufan Yu

    Abstract: Testing differences in mean vectors is a fundamental task in the analysis of high-dimensional compositional data. Existing methods may suffer from low power if the underlying signal pattern is in a situation that does not favor the deployed test. In this work, we develop two-sample power-enhanced mean tests for high-dimensional compositional data based on the combination of $p$-values, which integ…

    Submitted 7 March, 2025; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 31 pages

  31. arXiv:2404.19495

    stat.AP econ.EM stat.ME stat.OT

    Percentage Coefficient (bp) -- Effect Size Analysis (Theory Paper 1)

    Authors: Xinshu Zhao, Dianshi Moses Li, Ze Zack Lai, Piper Liping Liu, Song Harris Ao, Fei You

    Abstract: Percentage coefficient (bp) has emerged in recent publications as an additional and alternative estimator of effect size for regression analysis. This paper retraces the theory behind the estimator. It's posited that an estimator must first serve the fundamental function of enabling researchers and readers to comprehend an estimand, the target of estimation. It may then serve the instrumental func…

    Submitted 6 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  32. arXiv:2404.00753

    math.ST stat.ME

    Subscedastic weighted least squares estimates

    Authors: Jordan Bryan, Haibo Zhou, Didong Li

    Abstract: In the heteroscedastic linear model, the weighted least squares (WLS) estimate of the model coefficients is more efficient than the ordinary least squares (OLS) estimate. However, the practical application of WLS is challenging because it requires knowledge of the error variances. Feasible weighted least squares (FLS) estimates, which use approximations of the variances when they are unknown, ma…

    Submitted 27 May, 2025; v1 submitted 31 March, 2024; originally announced April 2024.

  33. arXiv:2403.12250

    stat.ME stat.AP stat.CO

    Bayesian Optimization Sequential Surrogate (BOSS) Algorithm: Fast Bayesian Inference for a Broad Class of Bayesian Hierarchical Models

    Authors: Dayi Li, Ziang Zhang

    Abstract: Approximate Bayesian inference based on Laplace approximation and quadrature methods have become increasingly popular for their efficiency at fitting latent Gaussian models (LGM), which encompass popular models such as Bayesian generalized linear models, survival models, and spatio-temporal models. However, many useful models fall under the LGM framework only if some conditioning parameters are fi…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: The authors contributed equally to this work. The names are listed alphabetically

  34. arXiv:2403.06246

    econ.EM stat.ME

    Estimating Factor-Based Spot Volatility Matrices with Noisy and Asynchronous High-Frequency Data

    Authors: Degui Li, Oliver Linton, Haoxuan Zhang

    Abstract: We propose a new estimator of high-dimensional spot volatility matrices satisfying a low-rank plus sparse structure from noisy and asynchronous high-frequency data collected for an ultra-large number of assets. The noise processes are allowed to be temporally correlated, heteroskedastic, asymptotically vanishing and dependent on the efficient prices. We define a kernel-weighted pre-averaging metho…

    Submitted 10 March, 2024; originally announced March 2024.

  35. arXiv:2402.12397

    stat.ML cs.LG

    Multi-class Temporal Logic Neural Networks

    Authors: Danyang Li, Roberto Tron

    Abstract: Time-series data can represent the behaviors of autonomous systems, such as drones and self-driving cars. The task of binary and multi-class classification for time-series data has become a prominent area of research. Neural networks represent a popular approach to classifying data; however, they lack interpretability, which poses a significant challenge in extracting meaningful information from t…

    Submitted 24 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  36. arXiv:2401.10124

    stat.ME cs.SI physics.soc-ph stat.AP

    Lower Ricci Curvature for Efficient Community Detection

    Authors: Yun Jin Park, Didong Li

    Abstract: This study introduces the Lower Ricci Curvature (LRC), a novel, scalable, and scale-free discrete curvature designed to enhance community detection in networks. Addressing the computational challenges posed by existing curvature-based methods, LRC offers a streamlined approach with linear computational complexity, making it well-suited for large-scale network analysis. We further develop an LRC-ba…

    Submitted 27 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  37. arXiv:2401.07400

    stat.ME

    Gaussian Processes for Time Series with Lead-Lag Effects with applications to biology data

    Authors: Wancen Mu, Jiawen Chen, Eric S. Davis, Kathleen Reed, Douglas Phanstiel, Michael I. Love, Didong Li

    Abstract: Investigating the relationship, particularly the lead-lag effect, between time series is a common question across various disciplines, especially when uncovering biological processes. However, analyzing time series presents several challenges. Firstly, due to technical reasons, the time points at which observations are made are not at uniform intervals. Secondly, some lead-lag effects are transien…

    Submitted 25 September, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  38. arXiv:2401.05784

    econ.EM stat.ME

    Covariance Function Estimation for High-Dimensional Functional Time Series with Dual Factor Structures

    Authors: Chenlei Leng, Degui Li, Hanlin Shang, Yingcun Xia

    Abstract: We propose a flexible dual functional factor model for modelling high-dimensional functional time series. In this model, a high-dimensional fully functional factor parametrisation is imposed on the observed functional processes, whereas a low-dimensional version (via series approximation) is assumed for the latent functional factors. We extend the classic principal component analysis technique for…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  39. arXiv:2401.03106

    stat.ME

    Contrastive linear regression

    Authors: Boyang Zhang, Sarah Nyquist, Andrew Jones, Barbara E. Engelhardt, Didong Li

    Abstract: Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here, we develop contrastive regression for the setting when there is a response variable r associated with each foreground observation. This situation occurs frequently when, for example, the una…

    Submitted 5 January, 2024; originally announced January 2024.

  40. arXiv:2311.06537

    cs.CY cs.AI stat.AP stat.ME

    Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks

    Authors: Jianhong Liu, Dianshi Li

    Abstract: The paper addresses some fundamental and hotly debated issues for high-stakes event predictions underpinning the computational approach to social sciences. We question several prevalent views against machine learning and outline a new paradigm that highlights the promises and promotes the infusion of computational methods and conventional social science approaches.

    Submitted 11 November, 2023; originally announced November 2023.

    Journal ref: Asian J Criminol (2024)

  41. arXiv:2311.03769

    stat.ME

    Nonparametric Screening for Additive Quantile Regression in Ultra-high Dimension

    Authors: Daoji Li, Yinfei Kong, Dawit Zerom

    Abstract: In practical applications, one often does not know the "true" structure of the underlying conditional quantile function, especially in the ultra-high dimensional setting. To deal with ultra-high dimensionality, quantile-adaptive marginal nonparametric screening methods have been recently developed. However, these approaches may miss important covariates that are marginally independent of the respo…

    Submitted 24 April, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: 39 pages, 7 Tables, 2 Figures

  42. arXiv:2311.02450

    stat.ME

    Factor-guided estimation of large covariance matrix function with conditional functional sparsity

    Authors: Dong Li, Xinghao Qiao, Zihan Wang

    Abstract: This paper addresses the fundamental task of estimating covariance matrix functions for high-dimensional functional data/functional time series. We consider two functional factor structures encompassing either functional factors with scalar loadings or scalar factors with functional loadings, and postulate functional sparsity on the covariance of idiosyncratic errors after taking out the common un…

    Submitted 4 November, 2023; originally announced November 2023.

  43. arXiv:2310.17023

    stat.ML cs.LG

    On the Identifiability and Interpretability of Gaussian Process Models

    Authors: Jiawen Chen, Wancen Mu, Yun Li, Didong Li

    Abstract: In this paper, we critically examine the prevalent practice of using additive mixtures of Matérn kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Matérn kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Matérn kernels is determined by the…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

    MSC Class: 62M30

  44. arXiv:2309.06428  [pdf, other]

    stat.ME

    Tail Gini Functional under Asymptotic Independence

    Authors: Zhaowen Wang, Liujun Chen, Deyuan Li

    Abstract: Tail Gini functional is a measure of tail risk variability for systemic risks, and has many applications in banking, finance and insurance. Meanwhile, there is growing attention on asymptotic independent pairs in quantitative risk management. This paper addresses the estimation of the tail Gini functional under asymptotic independence. We first estimate the tail Gini functional at an intermediate l…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 22 pages, 5 figures

  45. arXiv:2308.03296  [pdf, other]

    cs.LG cs.CL stat.ML

    Studying Large Language Model Generalization with Influence Functions

    Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

    Abstract: When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set?…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 119 pages, 47 figures, 22 tables

  46. CoxKnockoff: Controlled Feature Selection for the Cox Model Using Knockoffs

    Authors: Daoji Li, Jinzhao Yu, Hui Zhao

    Abstract: Although there is a huge literature on feature selection for the Cox model, none of the existing approaches can control the false discovery rate (FDR) unless the sample size tends to infinity. In addition, there is no formal power analysis of the knockoffs framework for survival data in the literature. To address those issues, in this paper, we propose a novel controlled feature selection approach…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 22 pages including the Supporting Information

    Journal ref: Stat, 12(1), e607 (2023)

  47. arXiv:2307.10842  [pdf, other]

    cs.CV cs.LG stat.ML

    Label Calibration for Semantic Segmentation Under Domain Shift

    Authors: Ondrej Bohdal, Da Li, Timothy Hospedales

    Abstract: Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: ICLR 2023 Workshop on Pitfalls of Limited Data and Computation for Trustworthy ML

  48. arXiv:2307.10787  [pdf, other]

    cs.CV cs.LG stat.ML

    Feed-Forward Source-Free Domain Adaptation via Class Prototypes

    Authors: Ondrej Bohdal, Da Li, Timothy Hospedales

    Abstract: Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approac…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: ECCV 2022 Workshop on Out of Distribution Generalization in Computer Vision (OOD-CV)

  49. Nonparametric Estimation of Large Spot Volatility Matrices for High-Frequency Financial Data

    Authors: Ruijun Bu, Degui Li, Oliver Linton, Hanchao Wang

    Abstract: In this paper, we consider estimating spot/instantaneous volatility matrices of high-frequency data collected for a large number of assets. We first combine classic nonparametric kernel-based smoothing with a generalised shrinkage technique in the matrix estimation for noise-free data under a uniform sparsity assumption, a natural extension of the approximate sparsity commonly used in the literatu…

    Submitted 3 July, 2023; originally announced July 2023.

  50. arXiv:2306.08553  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

    Authors: Hongyang R. Zhang, Dongyue Li, Haotian Ju

    Abstract: The training of over-parameterized neural networks has received much study in recent literature. An important consideration is the regularization of over-parameterized networks due to their highly nonconvex and nonlinear geometry. In this paper, we study noise injection algorithms, which can regularize the Hessian of the loss, leading to regions with flat loss surfaces. Specifically, by injecting…

    Submitted 23 September, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 39 pages

    Journal ref: Trans. Mach. Learn. Res. 2024