Showing 1–50 of 158 results for author: Song, Y

Searching in archive stat.
  1. arXiv:2505.22594  [pdf, ps, other]

    math.ST stat.ML

    Multi-Environment GLAMP: Approximate Message Passing for Transfer Learning with Applications to Lasso-based Estimators

    Authors: Longlin Wang, Yanke Song, Kuanhao Jiang, Pragya Sur

    Abstract: Approximate Message Passing (AMP) algorithms enable precise characterization of certain classes of random objects in the high-dimensional limit, and have found widespread applications in fields such as signal processing, statistics, and communications. In this work, we introduce Multi-Environment Generalized Long AMP, a novel AMP framework that applies to transfer learning problems with multiple d…

    Submitted 29 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Restructured the previous Section 3 and included reference to Gerbelot and Berthier (Information and Inference, 2023). 85 pages, 3 figures

  2. arXiv:2505.20330  [pdf, other]

    cs.LG stat.ML

    Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning

    Authors: Yunfu Song, Zhijian Ou

    Abstract: Our examination of deep generative models (DGMs) developed for semi-supervised learning (SSL), mainly GANs and VAEs, reveals two problems. First, mode-missing and mode-covering phenomena are observed in generation with GANs and VAEs. Second, there exists an awkward conflict between good classification and good generation in SSL by employing directed generative models. To address these problems, w…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: ICML 2018 submission. arXiv admin note: text overlap with arXiv:1808.01630, arXiv:2505.18558

  3. arXiv:2505.17400  [pdf, ps, other]

    math.ST stat.ML

    Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits

    Authors: Jingyu Liu, Yanglei Song

    Abstract: We study the stochastic linear bandit problem with multiple arms over $T$ rounds, where the covariate dimension $d$ may exceed $T$, but each arm-specific parameter vector is $s$-sparse. We begin by analyzing the sequential estimation problem in the single-arm setting, focusing on cumulative mean-squared error. We show that Lasso estimators are provably suboptimal in the sequential setting, exhibit…

    Submitted 22 May, 2025; originally announced May 2025.

  4. arXiv:2505.14725  [pdf, ps, other]

    q-bio.GN cs.LG stat.AP

    HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

    Authors: Xuejun Sun, Yiran Song, Xiaochen Zhou, Ruilie Cai, Yu Zhang, Xinyi Li, Rui Peng, Jialiu Xie, Yuanyuan Yan, Muyao Tang, Prem Lakshmanane, Baiming Zou, James S. Hagood, Raymond J. Pickles, Didong Li, Fei Zou, Xiaojing Zheng

    Abstract: Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms,…

    Submitted 19 May, 2025; originally announced May 2025.

  5. arXiv:2505.00822  [pdf, other]

    stat.ME cs.LG stat.ML

    Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions

    Authors: Yao Song, Kelly Speth, Amy Kilbourne, Andrew Quanbeck, Daniel Almirall, Lu Wang

    Abstract: A clustered adaptive intervention (cAI) is a pre-specified sequence of decision rules that guides practitioners on how best - and based on which measures - to tailor cluster-level intervention to improve outcomes at the level of individuals within the clusters. A clustered sequential multiple assignment randomized trial (cSMART) is a type of trial that is used to inform the empirical development o…

    Submitted 1 May, 2025; originally announced May 2025.

  6. arXiv:2503.23430  [pdf, ps, other]

    stat.ML cs.LG math.OC stat.AP

    DGSAM: Domain Generalization via Individual Sharpness-Aware Minimization

    Authors: Youngjun Song, Youngsik Hwang, Jonghun Lee, Heechang Lee, Dong-Young Lim

    Abstract: Domain generalization (DG) aims to learn models that perform well on unseen target domains by training on multiple source domains. Sharpness-Aware Minimization (SAM), known for finding flat minima that improve generalization, has therefore been widely adopted in DG. However, our analysis reveals that SAM in DG may converge to fake flat minima, where the total loss surface appears flat in…

    Submitted 30 June, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  7. arXiv:2503.07824  [pdf, other]

    stat.ML cs.LG

    Pure Exploration with Feedback Graphs

    Authors: Alessio Russo, Yichen Song, Aldo Pacchiano

    Abstract: We study the sample complexity of pure exploration in an online learning problem with a feedback graph. This graph dictates the feedback available to the learner, covering scenarios between full-information, pure bandit feedback, and settings with no feedback on the chosen action. While variants of this problem have been investigated for regret minimization, no prior work has addressed the pure ex…

    Submitted 10 March, 2025; originally announced March 2025.

  8. arXiv:2502.01627  [pdf, other]

    astro-ph.IM astro-ph.HE cs.LG stat.AP

    A Poisson Process AutoDecoder for X-ray Sources

    Authors: Yanke Song, Victoria Ashley Villar, Juan Rafael Martinez-Galarza, Steven Dillmann

    Abstract: X-ray observing facilities, such as the Chandra X-ray Observatory and the eROSITA, have detected millions of astronomical sources associated with high-energy phenomena. The arrival of photons as a function of time follows a Poisson process and can vary by orders of magnitude, presenting obstacles for common tasks such as source classification, physical property derivation, and anomaly detection. P…

    Submitted 4 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 13 pages, 5 figures

  9. arXiv:2411.02544  [pdf, other]

    cs.LG stat.ML

    Pretrained transformer efficiently learns low-dimensional target functions in-context

    Authors: Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

    Abstract: Transformers can efficiently learn in-context from example demonstrations. Most existing theoretical analyses studied the in-context learning (ICL) ability of transformers for linear function classes, where it is typically shown that the minimizer of the pretraining loss implements one gradient descent step on the least squares objective. However, this simplified linear setting arguably does not d…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  10. arXiv:2411.01780  [pdf, other]

    cs.LG stat.ML

    Clustering Based on Density Propagation and Subcluster Merging

    Authors: Feiping Nie, Yitao Song, Jingjing Xue, Rong Wang, Xuelong Li

    Abstract: We propose the DPSM method, a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space. Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process, thereby making it suitable for…

    Submitted 3 November, 2024; originally announced November 2024.

  11. arXiv:2411.01319  [pdf, ps, other]

    q-fin.RM stat.ME

    Efficient Nested Estimation of CoVaR: A Decoupled Approach

    Authors: Nifei Lin, Yingda Song, L. Jeff Hong

    Abstract: This paper addresses the estimation of the systemic risk measure known as CoVaR, which quantifies the risk of a financial portfolio conditional on another portfolio being at risk. We identify two principal challenges: conditioning on a zero-probability event and the repricing of portfolios. To tackle these issues, we propose a decoupled approach utilizing smoothing techniques and develop a model-i…

    Submitted 2 November, 2024; originally announced November 2024.

  12. arXiv:2410.16722  [pdf]

    stat.ME stat.ML

    Robust Variable Selection for High-dimensional Regression with Missing Data and Measurement Errors

    Authors: Zhenhao Zhang, Yunquan Song

    Abstract: In this paper, we focus on robust variable selection in the presence of missing data and measurement errors. Missing data and measurement errors can distort the data distribution. We propose an exponential loss function with a tuning parameter for data with missing values and measurement errors. By adjusting the parameter, the loss function can perform better and be more robust under various data distributions. We use i…

    Submitted 29 June, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: I finished this work in 2023 when I was an undergraduate student intern in the Department of Data Science and Statistics

  13. arXiv:2410.11081  [pdf, other]

    cs.LG stat.ML

    Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

    Authors: Cheng Lu, Yang Song

    Abstract: Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce additional hyperparameters and are prone to discretization errors. While continuous-time formulations can mitigate these issues, their success has been limited by training instability. To address this, we propose…

    Submitted 1 March, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 Oral

  14. arXiv:2410.09741  [pdf, other]

    cs.LG stat.ML

    Real-time Fuel Leakage Detection via Online Change Point Detection

    Authors: Ruimin Chu, Li Chik, Yiliao Song, Jeffrey Chan, Xiaodong Li

    Abstract: Early detection of fuel leakage at service stations with underground petroleum storage systems is a crucial task to prevent catastrophic hazards. Current data-driven fuel leakage detection methods employ offline statistical inventory reconciliation, leading to significant detection delays. Consequently, this can result in substantial financial loss and environmental impact on the surrounding commu…

    Submitted 13 October, 2024; originally announced October 2024.

  15. arXiv:2410.08945  [pdf, other]

    stat.AP

    Online stochastic generators using Slepian bases for regional bivariate wind speed ensembles from ERA5

    Authors: Yan Song, Zubair Khalid, Marc G. Genton

    Abstract: Reanalysis data, such as ERA5, provide a comprehensive and detailed representation of the Earth's system by assimilating observations into climate models. While crucial for climate research, they pose significant challenges in terms of generation, storage, and management. For 3-hourly bivariate wind speed ensembles from ERA5, which face these challenges, this paper proposes an online stochastic ge…

    Submitted 11 October, 2024; originally announced October 2024.

  16. arXiv:2410.06586  [pdf]

    stat.AP

    Use of Real-World Data and Real-World Evidence in Rare Disease Drug Development: A Statistical Perspective

    Authors: Jie Chen, Susan Gruber, Hana Lee, Haitao Chu, Shiowjen Lee, Haijun Tian, Yan Wang, Weili He, Thomas Jemielita, Yang Song, Roy Tamura, Lu Tian, Yihua Zhao, Yong Chen, Mark van der Laan, Lei Nie

    Abstract: Real-world data (RWD) and real-world evidence (RWE) have been increasingly used in medical product development and regulatory decision-making, especially for rare diseases. After outlining the challenges and possible strategies to address the challenges in rare disease drug development (see the accompanying paper), the Real-World Evidence (RWE) Scientific Working Group of the American Statistical…

    Submitted 9 October, 2024; originally announced October 2024.

  17. arXiv:2410.06585  [pdf]

    stat.AP

    Challenges and Possible Strategies to Address Them in Rare Disease Drug Development: A Statistical Perspective

    Authors: Jie Chen, Lei Nie, Shiowjen Lee, Haitao Chu, Haijun Tian, Yan Wang, Weili He, Thomas Jemielita, Susan Gruber, Yang Song, Roy Tamura, Lu Tian, Yihua Zhao, Yong Chen, Mark van der Laan, Hana Lee

    Abstract: Developing drugs for rare diseases presents unique challenges from a statistical perspective. These challenges may include slowly progressive diseases with unmet medical needs, poorly understood natural history, small population size, diversified phenotypes and genotypes within a disorder, and lack of appropriate surrogate endpoints to measure clinical benefits. The Real-World Evidence (RWE) Scie…

    Submitted 9 October, 2024; originally announced October 2024.

  18. arXiv:2408.11808  [pdf, ps, other]

    stat.ME math.ST

    Distance Correlation in Multiple Biased Sampling Models

    Authors: Yuwei Ke, Hok Kan Ling, Yanglei Song

    Abstract: Testing the independence between random vectors is a fundamental problem in statistics. Distance correlation, a recently popular dependence measure, is universally consistent for testing independence against all distributions with finite moments. However, when data are subject to selection bias or collected from multiple sources or schemes, spurious dependence may arise. This creates a need for me…

    Submitted 21 August, 2024; originally announced August 2024.

  19. arXiv:2408.04440  [pdf, other]

    stat.CO

    Boosting Earth System Model Outputs And Saving PetaBytes in their Storage Using Exascale Climate Emulators

    Authors: Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun

    Abstract: We present the design and scalable implementation of an exascale climate emulator for addressing the escalating computational and storage requirements of high-resolution Earth System Model simulations. We utilize the spherical harmonic transform to stochastically model spatio-temporal variations in climate data. This provides tunable spatio-temporal resolution and significantly improves the fideli…

    Submitted 11 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  20. arXiv:2407.11435  [pdf, other]

    q-bio.GN cs.LG stat.ML

    Genomic Language Models: Opportunities and Challenges

    Authors: Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song

    Abstract: Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to signif…

    Submitted 22 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Review article; 26 pages, 3 figures, 1 table

    MSC Class: 92-08; 92B20; 68T50; 68T07

  21. arXiv:2406.13944  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Generalization error of min-norm interpolators in transfer learning

    Authors: Yanke Song, Sohom Bhattacharya, Pragya Sur

    Abstract: This paper establishes the generalization error of pooled min-$\ell_2$-norm interpolation in transfer learning where data from diverse distributions are available. Min-norm interpolators emerge naturally as implicitly regularized limits of modern machine learning algorithms. Previous work characterized their out-of-distribution risk when samples from the test distribution are unavailable during trai…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 53 pages, 2 figures

  22. arXiv:2406.11828  [pdf, other]

    cs.LG stat.ML

    Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

    Authors: Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

    Abstract: We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt{M}}\sum_{m=1}^M f_m(\langle x, v_m\rangle)$, where $f_1,f_2,...,f_M:\mathbb{R}\to\mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_m\}_{m=1}^M$,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  23. arXiv:2406.11184  [pdf, other]

    stat.ME math.ST

    HEDE: Heritability estimation in high dimensions by Ensembling Debiased Estimators

    Authors: Yanke Song, Xihong Lin, Pragya Sur

    Abstract: Estimating heritability remains a significant challenge in statistical genetics. Diverse approaches have emerged over the years that are broadly categorized as either random effects or fixed effects heritability methods. In this work, we focus on the latter. We propose HEDE, an ensemble approach to estimate heritability or the signal-to-noise ratio in high-dimensional linear models where the sampl…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 58 pages, 7 figures

  24. arXiv:2406.06149  [pdf, other]

    cs.LG stat.ML

    Decoupled Marked Temporal Point Process using Neural Ordinary Differential Equations

    Authors: Yujee Song, Donghyun Lee, Rui Meng, Won Hwa Kim

    Abstract: A Marked Temporal Point Process (MTPP) is a stochastic process whose realization is a set of event-time data. MTPP is often used to understand complex dynamics of asynchronous temporal events such as money transactions, social media, and healthcare. Recent studies have utilized deep neural networks to capture complex temporal dependencies of events and generate embeddings that aptly represent the o…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 8 figures, The Twelfth International Conference on Learning Representations (ICLR 2024)

  25. arXiv:2406.00396  [pdf, other]

    cs.LG cond-mat.stat-mech cs.AI stat.ML

    Stochastic Resetting Mitigates Latent Gradient Bias of SGD from Label Noise

    Authors: Youngkyoung Bae, Yeongwoo Song, Hawoong Jeong

    Abstract: Giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that resetting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually m…

    Submitted 4 March, 2025; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: 30 pages, 14 figures

    Journal ref: Mach. Learn.: Sci. Technol. 6 (2025) 015062

  26. arXiv:2405.07220  [pdf, other]

    cs.LG cs.AI stat.ML

    On Discovery of Local Independence over Continuous Variables via Neural Contextual Decomposition

    Authors: Inwoo Hwang, Yunhyeok Kwak, Yeon-Ji Song, Byoung-Tak Zhang, Sanghack Lee

    Abstract: Conditional independence provides a way to understand causal relationships among the variables of interest. An underlying system may exhibit more fine-grained causal relationships especially between a variable and its parents, which will be called the local independence relationships. One of the most widely studied local relationships is Context-Specific Independence (CSI), which holds in a specif…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Conference on Causal Learning and Reasoning (CLeaR), 2023

  27. arXiv:2402.13259  [pdf, other]

    stat.ME cs.CE math.NA math.PR

    Fast Discrete-Event Simulation of Markovian Queueing Networks through Euler Approximation

    Authors: L. Jeff Hong, Yingda Song, Tan Wang

    Abstract: The efficient management of large-scale queueing networks is critical for a variety of sectors, including healthcare, logistics, and customer service, where system performance has profound implications for operational effectiveness and cost management. To address this key challenge, our paper introduces simulation techniques tailored for complex, large-scale Markovian queueing networks. We develop…

    Submitted 2 February, 2024; originally announced February 2024.

  28. MAPPING: Debiasing Graph Neural Networks for Fair Node Classification with Limited Sensitive Information Leakage

    Authors: Ying Song, Balaji Palanisamy

    Abstract: Despite remarkable success in diverse web-based applications, Graph Neural Networks (GNNs) inherit and further exacerbate historical discrimination and social stereotypes, which critically hinder their deployment in high-stakes domains such as online clinical diagnosis and financial crediting. However, current fairness research, which primarily focuses on i.i.d. data, cannot be trivially replicated to…

    Submitted 26 January, 2025; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted by WWW Journal. Code is available at https://github.com/yings0930/MAPPING

    Journal ref: World Wide Web 27, 74 (2024)

  29. arXiv:2311.08384  [pdf, other]

    cs.LG cs.AI stat.ML

    Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

    Authors: Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun

    Abstract: Hybrid RL is the setting where an RL agent has access to both offline data and online data by interacting with the real-world environment. In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data. On-policy methods such as policy gradient and natural policy gradient (NPG) have been shown to be more robust to model misspecification, though somet…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: The first two authors contributed equally

  30. arXiv:2310.04367  [pdf]

    stat.ML cs.LG

    A Marketplace Price Anomaly Detection System at Scale

    Authors: Akshit Sarpal, Qiwen Kang, Fangping Huang, Yang Song, Lijie Wan

    Abstract: Online marketplaces execute a large volume of price updates each day that are initiated by individual marketplace sellers on the platform. This price democratization brings increasing data-quality challenges. The lack of the centralized guardrails available to a traditional online retailer makes inaccurate prices more likely to be published on the website, leading to poor cu…

    Submitted 9 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, 4 figures, 7 tables

  31. arXiv:2310.02216  [pdf, other]

    stat.AP

    Efficient stochastic generators with spherical harmonic transformation for high-resolution global climate simulations from CESM2-LENS2

    Authors: Yan Song, Zubair Khalid, Marc G. Genton

    Abstract: Earth system models (ESMs) are fundamental for understanding Earth's complex climate system. However, the computational demands and storage requirements of ESM simulations limit their utility. For the newly published CESM2-LENS2 data, which suffer from this issue, we propose a novel stochastic generator (SG) as a practical complement to the CESM2, capable of rapidly producing emulations closely mi…

    Submitted 24 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  32. arXiv:2307.00190  [pdf]

    stat.AP

    Estimands in Real-World Evidence Studies

    Authors: Jie Chen, Daniel Scharfstein, Hongwei Wang, Binbing Yu, Yang Song, Weili He, John Scott, Xiwu Lin, Hana Lee

    Abstract: A Real-World Evidence (RWE) Scientific Working Group (SWG) of the American Statistical Association Biopharmaceutical Section (ASA BIOP) has been reviewing statistical considerations for the generation of RWE to support regulatory decision-making. As part of the effort, the working group is addressing estimands in RWE studies. Constructing the right estimand -- the target of estimation -- which ref…

    Submitted 30 June, 2023; originally announced July 2023.

  33. arXiv:2305.07813  [pdf, other]

    stat.ME stat.CO

    Fast robust location and scatter estimation: a depth-based method

    Authors: Maoyu Zhang, Yan Song, Wenlin Dai

    Abstract: The minimum covariance determinant (MCD) estimator is ubiquitous in multivariate analysis, the critical step of which is to select a subset of a given size with the lowest sample covariance determinant. The concentration step (C-step) is a common tool for subset-seeking; however, it becomes computationally demanding for high-dimensional data. To alleviate the challenge, we propose a depth-based al…

    Submitted 12 May, 2023; originally announced May 2023.

  34. arXiv:2305.01188  [pdf, other]

    stat.AP stat.ME

    Advancing inverse scattering with surrogate modeling and Bayesian inference for functional inputs

    Authors: Chih-Li Sung, Yao Song, Ying Hung

    Abstract: Inverse scattering aims to infer information about a hidden object by using the received scattered waves and training data collected from forward mathematical models. Recent advances in computing have led to increasing attention towards functional inverse inference, which can reveal more detailed properties of a hidden object. However, rigorous studies on functional inverse, including the reconstr…

    Submitted 1 May, 2023; originally announced May 2023.

  35. arXiv:2304.09868  [pdf, other]

    cs.LG cs.AI stat.ML

    Accelerate Support Vector Clustering via Spectrum-Preserving Data Compression

    Authors: Yuxuan Song, Yongyu Wang

    Abstract: This paper proposes a novel framework for accelerating support vector clustering. The proposed method first computes much smaller compressed data sets while preserving the key cluster properties of the original data sets based on a novel spectral data compression approach. Then, the resultant spectrally-compressed data sets are leveraged for the development of fast and high-quality algorithms for s…

    Submitted 14 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  36. arXiv:2304.09132  [pdf, other]

    stat.ME

    Independence testing for inhomogeneous random graphs

    Authors: Yukun Song, Carey E. Priebe, Minh Tang

    Abstract: Testing for independence between graphs is a problem that arises naturally in social network analysis and neuroscience. In this paper, we address independence testing for inhomogeneous Erdős-Rényi random graphs on the same vertex set. We first formulate a notion of pairwise correlations between the edges of these graphs and derive a necessary condition for their detectability. We next show that th…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 24 pages, 2 figures

  37. arXiv:2303.01469  [pdf, other]

    cs.LG cs.CV stat.ML

    Consistency Models

    Authors: Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever

    Abstract: Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing mult…

    Submitted 31 May, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: ICML 2023

  38. Impact of Event Encoding and Dissimilarity Measures on Traffic Crash Characterization Based on Sequence of Events

    Authors: Yu Song, Madhav V. Chitturi, David A. Noyce

    Abstract: Crash sequence analysis has been shown in prior studies to be useful for characterizing crashes and identifying safety countermeasures. Sequence analysis is highly domain-specific, but its various techniques have not been evaluated for adaptation to crash sequences. This paper evaluates the impact of encoding and dissimilarity measures on crash sequence analysis and clustering. Sequence data of in…

    Submitted 21 February, 2023; originally announced February 2023.

  39. arXiv:2302.01269  [pdf, other]

    stat.ME math.ST

    Adjusting for Incomplete Baseline Covariates in Randomized Controlled Trials: A Cross-World Imputation Framework

    Authors: Yilin Song, James P. Hughes, Ting Ye

    Abstract: In randomized controlled trials, adjusting for baseline covariates is often applied to improve the precision of treatment effect estimation. However, missingness in covariates is common. Recently, Zhao & Ding (2022) studied two simple strategies, the single imputation method and missingness indicator method (MIM), to deal with missing covariates, and showed that both methods can provide efficiency…

    Submitted 26 August, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

  40. arXiv:2212.01168  [pdf, other]

    cs.LG cs.AI physics.comp-ph stat.ML

    Towards Cross Domain Generalization of Hamiltonian Representation via Meta Learning

    Authors: Yeongwoo Song, Hawoong Jeong

    Abstract: Recent advances in deep learning for physics have focused on discovering shared representations of target systems by incorporating physics priors or inductive biases into neural networks. While effective, these methods are limited to the system domain, where the type of system remains consistent, and thus cannot ensure adaptation to new or unseen physical systems governed by different laws. Fo…

    Submitted 27 April, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Conference paper at ICLR 2024

  41. arXiv:2210.16976  [pdf, other]

    cs.LG stat.ML

    Representation Learning for General-sum Low-rank Markov Games

    Authors: Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

    Abstract: We study multi-agent general-sum Markov games with nonlinear function approximation. We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation. The goal is to design an algorithm that (1) finds an $\varepsilon$-equilibrium policy sample efficiently without prior knowledge of the environment or the representation, and…

    Submitted 30 October, 2022; originally announced October 2022.

  42. Consistent Covariance estimation for stratum imbalances under minimization method for covariate-adaptive randomization

    Authors: Zixuan Zhao, Yanglei Song, Wenyu Jiang, Dongsheng Tu

    Abstract: Pocock and Simon's minimization method is a popular approach for covariate-adaptive randomization in clinical trials. Valid statistical inference with data collected under the minimization method requires the knowledge of the limiting covariance matrix of within-stratum imbalances, whose existence was only recently established. In this work, we propose a bootstrap-based estimator for this limit and…

    Submitted 26 December, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 29 pages, peer reviewed version, will appear in Scandinavian Journal of Statistics

  43. Intersection Two-Vehicle Crash Scenario Specification for Automated Vehicle Safety Evaluation Using Sequence Analysis and Bayesian Networks

    Authors: Yu Song, Madhav V. Chitturi, David A. Noyce

    Abstract: This paper develops a test scenario specification procedure using crash sequence analysis and Bayesian network modeling. Intersection two-vehicle crash data was obtained from the 2016 to 2018 National Highway Traffic Safety Administration Crash Report Sampling System database. Vehicles involved in the crashes are specifically renumbered based on their initial positions and trajectories. Crash sequ…

    Submitted 18 August, 2022; originally announced August 2022.

  44. arXiv:2207.12804  [pdf, other]

    stat.ME

    Large-Scale Low-Rank Gaussian Process Prediction with Support Points

    Authors: Yan Song, Wenlin Dai, Marc G. Genton

    Abstract: Low-rank approximation is a popular strategy to tackle the "big n problem" associated with large-scale Gaussian process regressions. Basis functions for developing low-rank structures are crucial and should be carefully specified. Predictive processes simplify the problem by inducing basis functions with a covariance function and a set of knots. The existing literature suggests certain practical i…

    Submitted 3 September, 2024; v1 submitted 26 July, 2022; originally announced July 2022.

  45. Covariate Adjustment in Randomized Clinical Trials with Missing Covariate and Outcome Data

    Authors: Chia-Rui Chang, Yue Song, Fan Li, Rui Wang

    Abstract: When analyzing data from randomized clinical trials, covariate adjustment can be used to account for chance imbalance in baseline covariates and to increase precision of the treatment effect estimate. A practical barrier to covariate adjustment is the presence of missing data. In this paper, in the light of recent theoretical advancement, we first review several covariate adjustment methods with i…

    Submitted 16 May, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

  46. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  47. arXiv:2202.11735  [pdf, other]

    stat.ML cs.LG math.ST

    Truncated LinUCB for Stochastic Linear Bandits

    Authors: Yanglei Song, Meng zhou

    Abstract: This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension…

    Submitted 5 May, 2025; v1 submitted 23 February, 2022; originally announced February 2022.

  48. arXiv:2112.10992  [pdf, other]

    cs.CV stat.ML

    Expansion-Squeeze-Excitation Fusion Network for Elderly Activity Recognition

    Authors: Xiangbo Shu, Jiawen Yang, Rui Yan, Yan Song

    Abstract: This work focuses on the task of elderly activity recognition, which is a challenging task due to the existence of individual actions and human-object interactions in elderly activities. Thus, we attempt to effectively aggregate the discriminative information of actions and interactions from both RGB videos and skeleton sequences by attentively fusing multi-modal features. Recently, some nonlinear…

    Submitted 24 April, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

  49. arXiv:2111.11010  [pdf, other]

    cs.LG stat.ML

    Density Ratio Estimation via Infinitesimal Classification

    Authors: Kristy Choi, Chenlin Meng, Yang Song, Stefano Ermon

    Abstract: Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, as it is difficult to accurately compare probability distributions based on finite samples. In this work we propose DRE-$\infty$, a divide-and-conquer approach to reduce DRE to a series of easier subproblems. Inspired…

    Submitted 12 March, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: First two authors contributed equally

  50. arXiv:2111.08005  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Solving Inverse Problems in Medical Imaging with Score-Based Generative Models

    Authors: Yang Song, Liyue Shen, Lei Xing, Stefano Ermon

    Abstract: Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). Existing solutions based on machine learning typically train a model to directly map measurements to medical images, leveraging a training dataset of paired images and measurements. These measurements are typically synthesized from images using a…

    Submitted 15 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: Published at ICLR 2022