Showing 1–50 of 85 results for author: Nguyen, Q

  1. arXiv:2505.11014  [pdf, ps, other]

    stat.ME cs.LG econ.EM

    A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference

    Authors: Harsh Parikh, Trang Quynh Nguyen, Elizabeth A. Stuart, Kara E. Rudolph, Caleb H. Miles

    Abstract: Data integration approaches are increasingly used to enhance the efficiency and generalizability of studies. However, a key limitation of these methods is the assumption that outcome measures are identical across datasets -- an assumption that often does not hold in practice. Consider the following opioid use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating the effect of…

    Submitted 16 May, 2025; originally announced May 2025.

  2. arXiv:2502.16313  [pdf, other]

    stat.AP

    A multilevel model with heterogeneous variances for snap timing in the National Football League

    Authors: Quang Nguyen, Ronald Yurko

    Abstract: Player tracking data have provided great opportunities to generate novel insights into understudied areas of American football, such as pre-snap motion. Using a Bayesian multilevel model with heterogeneous variances, we provide an assessment of NFL quarterbacks and their ability to synchronize the timing of the ball snap with pre-snap movement from their teammates. We focus on passing plays with r…

    Submitted 22 February, 2025; originally announced February 2025.

  3. arXiv:2502.02121  [pdf, other]

    cs.LG stat.ML

    BILBO: BILevel Bayesian Optimization

    Authors: Ruth Wan Theng Chew, Quoc Phong Nguyen, Bryan Kian Hsiang Low

    Abstract: Bilevel optimization is characterized by a two-level optimization structure, where the upper-level problem is constrained by optimal lower-level solutions, and such structures are prevalent in real-world problems. The constraint by optimal lower-level solutions poses significant challenges, especially in noisy, constrained, and derivative-free settings, as repeating lower-level optimizations is sa…

    Submitted 28 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  4. arXiv:2501.08149  [pdf, other]

    cs.AI cs.LG stat.ML

    Multiple-Input Variational Auto-Encoder for Anomaly Detection in Heterogeneous Data

    Authors: Phai Vu Dinh, Diep N. Nguyen, Dinh Thai Hoang, Quang Uy Nguyen, Eryk Dutkiewicz

    Abstract: Anomaly detection (AD) plays a pivotal role in AI applications, e.g., in classification, and intrusion/threat detection in cybersecurity. However, most existing methods face challenges of heterogeneity amongst feature subsets posed by non-independent and identically distributed (non-IID) data. We propose a novel neural network model called Multiple-Input Auto-Encoder for AD (MIAEAD) to address thi…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: 16 pages

  5. arXiv:2412.17312  [pdf, other]

    cs.LG cs.NE stat.ML

    Improving Pareto Set Learning for Expensive Multi-objective Optimization via Stein Variational Hypernetworks

    Authors: Minh-Duc Nguyen, Phuong Mai Dinh, Quang-Huy Nguyen, Long P. Hoang, Dung D. Le

    Abstract: Expensive multi-objective optimization problems (EMOPs) are common in real-world scenarios where evaluating objective functions is costly and involves extensive computations or physical experiments. Current Pareto set learning methods for such problems often rely on surrogate models like Gaussian processes to approximate the objective functions. These surrogate models can become fragmented, result…

    Submitted 15 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: Accepted to AAAI-25

  6. High-Dimensional Bayesian Optimization via Random Projection of Manifold Subspaces

    Authors: Quoc-Anh Hoang Nguyen, The Hung Tran

    Abstract: Bayesian Optimization (BO) is a popular approach to optimizing expensive-to-evaluate black-box functions. Despite the success of BO, its performance may decrease exponentially as the dimensionality increases. A common framework to tackle this problem is to assume that the objective function depends on a limited set of features that lie on a low-dimensional manifold embedded in the high-dimensional…

    Submitted 21 December, 2024; originally announced December 2024.

  7. arXiv:2411.07221  [pdf, ps, other]

    stat.ME

    Self-separated and self-connected models for mediator and outcome missingness in mediation analysis

    Authors: Trang Quynh Nguyen, Razieh Nabi, Fan Yang, Elizabeth A. Stuart

    Abstract: Missing data is a common problem that challenges the study of effects of treatments. In the context of mediation analysis, this paper addresses missingness in the two key variables, mediator and outcome, focusing on identification. We consider self-separated missingness models where identification is achieved by conditional independence assumptions only and self-connected missingness models where…

    Submitted 11 November, 2024; originally announced November 2024.

  8. arXiv:2407.13904  [pdf, other]

    stat.ME

    In defense of MAR over latent ignorability (or latent MAR) for outcome missingness in studying principal causal effects: a causal graph view

    Authors: Trang Quynh Nguyen

    Abstract: This paper concerns outcome missingness in principal stratification analysis. We revisit a common assumption known as latent ignorability or latent missing-at-random (LMAR), often considered a relaxation of missing-at-random (MAR). LMAR posits that the outcome is independent of its missingness if one conditions on principal stratum (which is partially unobservable) in addition to observed variable…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.03665  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI stat.ML

    Heterogeneous Hypergraph Embedding for Recommendation Systems

    Authors: Darnbi Sakong, Viet Hung Vu, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

    Abstract: Recent advancements in recommender systems have focused on integrating knowledge graphs (KGs) to leverage their auxiliary information. The core idea of KG-enhanced recommenders is to incorporate rich semantic information for more accurate recommendations. However, two main challenges persist: i) Neglecting complex higher-order interactions in the KG-based user-item network, potentially leading to…

    Submitted 4 July, 2024; originally announced July 2024.

  10. arXiv:2406.17220  [pdf, ps, other]

    stat.AP

    NFL Ghosts: A framework for evaluating defender positioning with conditional density estimation

    Authors: Ronald Yurko, Quang Nguyen, Konstantinos Pelechrinis

    Abstract: Player attribution in American football remains an open problem due to the complex nature of twenty-two players interacting on the field, but the granularity of player tracking data provides ample opportunity for novel approaches. In this work, we introduce the first public framework to evaluate spatial and trajectory tracking data of players relative to a baseline distribution of "ghost" defender…

    Submitted 22 June, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 34 pages, 13 figures

  11. arXiv:2405.02449  [pdf, other]

    stat.ML cond-mat.mtrl-sci cs.LG q-bio.BM

    Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design

    Authors: Quan Nguyen, Adji Bousso Dieng

    Abstract: Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Published in International Conference on Machine Learning, ICML 2024. Code can be found in the Vertaix GitHub: https://github.com/vertaix/Quality-Weighted-Vendi-Score. Paper dedicated to Kwame Nkrumah

  12. arXiv:2403.14769  [pdf, other]

    stat.AP

    Fractional Tackles: Leveraging Player Tracking Data for Within-Play Tackling Evaluation in American Football

    Authors: Quang Nguyen, Ruitong Jiang, Meg Ellingwood, Ronald Yurko

    Abstract: Tackling is a fundamental defensive move in American football, with the main purpose of stopping the forward motion of the ball-carrier. However, current tackling metrics are manually recorded outcomes that are inherently flawed due to their discrete and subjective nature. Using player tracking data, we present a novel framework for assessing tackling contribution in a continuous and objective man…

    Submitted 7 January, 2025; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 23 pages, 7 figures, 4 tables

  13. arXiv:2403.01315  [pdf, ps, other]

    cs.LG stat.ML

    Near-optimal Per-Action Regret Bounds for Sleeping Bandits

    Authors: Quan Nguyen, Nishant A. Mehta

    Abstract: We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the min…

    Submitted 29 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: V2: corrected Theorem 8 (FTARL's high probability bound) from log(1/delta) to log(K/delta)

  14. arXiv:2402.01003  [pdf]

    stat.AP stat.ME

    Practical challenges in mediation analysis: A guide for applied researchers

    Authors: Megan S. Schuler, Donna L. Coffman, Elizabeth A. Stuart, Trang Q. Nguyen, Brian Vegetabile, Daniel F. McCaffrey

    Abstract: Mediation analysis is a statistical approach that can provide insights regarding the intermediary processes by which an intervention or exposure affects a given outcome. Mediation analyses rose to prominence, particularly in social science research, with the publication of the seminal paper by Baron and Kenny and is now commonly applied in many research disciplines, including health services resea…

    Submitted 1 February, 2024; originally announced February 2024.

  15. Identification of complier and noncomplier average causal effects in the presence of latent missing-at-random (LMAR) outcomes: a unifying view and choices of assumptions

    Authors: Trang Quynh Nguyen, Michelle C. Carlson, Elizabeth A. Stuart

    Abstract: The study of treatment effects is often complicated by noncompliance and missing data. In the one-sided noncompliance setting where of interest are the complier and noncomplier average causal effects (CACE and NACE), we address outcome missingness of the \textit{latent missing at random} type (LMAR, also known as \textit{latent ignorability}). That is, conditional on covariates and treatment assig…

    Submitted 18 December, 2023; originally announced December 2023.

    Journal ref: Biostatistics, 2024

  16. Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data

    Authors: Quang Nguyen, Ronald Yurko, Gregory J. Matthews

    Abstract: In American football, a pass rush is an attempt by the defensive team to disrupt the offense and prevent the quarterback (QB) from completing a pass. Existing metrics for assessing pass rush performance are either discrete-time quantities or based on subjective judgment. Using player tracking data, we propose STRAIN, a novel metric for evaluating pass rushers in the National Football League (NFL)…

    Submitted 30 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: 12 figures, 7 tables

  17. arXiv:2303.16299  [pdf, other]

    stat.ME stat.ML

    Comparison of Methods that Combine Multiple Randomized Trials to Estimate Heterogeneous Treatment Effects

    Authors: Carly Lupton Brantner, Trang Quynh Nguyen, Tengjie Tang, Congwen Zhao, Hwanhee Hong, Elizabeth A. Stuart

    Abstract: Individualized treatment decisions can improve health outcomes, but using data to make these decisions in a reliable, precise, and generalizable way is challenging with a single dataset. Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment to better estimate heterogeneous treatment effects. This paper discusses several non-p…

    Submitted 15 November, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  18. arXiv:2303.05032  [pdf, other]

    stat.ME

    Sensitivity analysis for principal ignorability violation in estimating complier and noncomplier average causal effects

    Authors: Trang Quynh Nguyen, Elizabeth A. Stuart, Daniel O. Scharfstein, Elizabeth L. Ogburn

    Abstract: An important strategy for identifying principal causal effects, which are often used in settings with noncompliance, is to invoke the principal ignorability (PI) assumption. As PI is untestable, it is important to gauge how sensitive effect estimates are to its violation. We focus on this task for the common one-sided noncompliance setting where there are two principal strata, compliers and noncom…

    Submitted 28 March, 2024; v1 submitted 8 March, 2023; originally announced March 2023.

  19. arXiv:2302.13428  [pdf, ps, other]

    stat.ME

    Methods for Integrating Trials and Non-Experimental Data to Examine Treatment Effect Heterogeneity

    Authors: Carly Lupton Brantner, Ting-Hsuan Chang, Trang Quynh Nguyen, Hwanhee Hong, Leon Di Stefano, Elizabeth A. Stuart

    Abstract: Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials…

    Submitted 28 March, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

  20. arXiv:2302.04713  [pdf, other]

    stat.ME

    Platform Trials: the Impact of common Controls on Type One Error and Power

    Authors: Quynh Nguyen, Katharina Hees, Benjamin Hofner

    Abstract: Platform trials offer a framework to study multiple interventions in a single trial with the opportunity of opening and closing arms. The use of a common control in platform trials can increase efficiency as compared to individual control arms or separate trials per treatment. However, the need for multiplicity adjustment as a consequence of common controls is currently a controversial debate amon…

    Submitted 9 February, 2023; originally announced February 2023.

  21. arXiv:2301.07066  [pdf, ps, other]

    stat.ME

    Multiple imputation for propensity score analysis with covariates missing at random: some clarity on within and across methods

    Authors: Trang Quynh Nguyen, Elizabeth A. Stuart

    Abstract: In epidemiology and social sciences, propensity score methods are popular for estimating treatment effects using observational data, and multiple imputation is popular for handling covariate missingness. However, how to appropriately use multiple imputation for propensity score analysis is not completely clear. This paper aims to bring clarity on the consistency (or lack thereof) of methods that h…

    Submitted 28 August, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  22. arXiv:2301.04268  [pdf, other]

    cs.LG cs.AI stat.ML

    Adversarial Online Multi-Task Reinforcement Learning

    Authors: Quan Nguyen, Nishant A. Mehta

    Abstract: We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $λ$-separability…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: To appear at the 34th International Conference on Algorithmic Learning Theory (ALT 2023)

  23. arXiv:2301.04001  [pdf, other]

    stat.AP cs.CE

    Big Ideas in Sports Analytics and Statistical Tools for their Investigation

    Authors: Benjamin S. Baumer, Gregory J. Matthews, Quang Nguyen

    Abstract: Sports analytics -- broadly defined as the pursuit of improvement in athletic performance through the analysis of data -- has expanded its footprint both in the professional sports industry and in academia over the past 30 years. In this paper, we connect four big ideas that are common across multiple sports: the expected value of a game state, win probability, measures of team strength, and the u…

    Submitted 10 January, 2023; originally announced January 2023.

    MSC Class: 62P99 ACM Class: J.2

  24. On the use of non-concurrent controls in platform trials: A scoping review

    Authors: Marta Bofill Roig, Cora Burgwinkel, Ursula Garczarek, Franz Koenig, Martin Posch, Quynh Nguyen, Katharina Hees

    Abstract: Platform trials gained popularity during the last few years as they increase flexibility compared to multi-arm trials by allowing new experimental arms entering when the trial already started. Using a shared control group in platform trials increases the trial efficiency compared to separate trials. Because of the later entry of some of the experimental treatment arms, the shared control group inc…

    Submitted 17 November, 2022; originally announced November 2022.

    Journal ref: Trials 2023

  25. arXiv:2210.09974  [pdf, other]

    quant-ph cs.LG stat.ML

    Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks

    Authors: Louis Schatzki, Martin Larocca, Quynh T. Nguyen, Frederic Sauvage, M. Cerezo

    Abstract: Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has emerged as a potential so…

    Submitted 13 February, 2024; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 15+21 pages, 5 + 5 figures. Prior generalization bounds replaced with more general theorem. Comments added about hardness of simulation and narrow gorges

    Report number: LA-UR-22-29899

    Journal ref: npj Quantum Inf 10, 12 (2024)

  26. arXiv:2210.08566  [pdf, other]

    quant-ph cs.LG stat.ML

    Theory for Equivariant Quantum Neural Networks

    Authors: Quynh T. Nguyen, Louis Schatzki, Paolo Braccia, Michael Ragone, Patrick J. Coles, Frederic Sauvage, Martin Larocca, M. Cerezo

    Abstract: Quantum neural network architectures that have little-to-no inductive biases are known to face trainability and generalization issues. Inspired by a similar problem, recent breakthroughs in machine learning address this challenge by creating models encoding the symmetries of the learning task. This is materialized through the usage of equivariant neural networks whose action commutes with that of…

    Submitted 10 May, 2024; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: 26+21 pages, 12 + 2 figures; journal version with new numerics section

    Report number: LA-UR-22-30859

    Journal ref: PRX Quantum 5, 020328 (2024)

  27. arXiv:2210.07980  [pdf, other]

    quant-ph cs.LG math.RT stat.ML

    Representation Theory for Geometric Quantum Machine Learning

    Authors: Michael Ragone, Paolo Braccia, Quynh T. Nguyen, Louis Schatzki, Patrick J. Coles, Frederic Sauvage, Martin Larocca, M. Cerezo

    Abstract: Recent advances in classical machine learning have shown that creating models with inductive biases encoding the symmetries of a problem can greatly improve performance. Importation of these ideas, combined with an existing rich body of work at the nexus of quantum theory and symmetry, has given rise to the field of Geometric Quantum Machine Learning (GQML). Following the success of its classical…

    Submitted 7 February, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 43 pages, 10 figures. Updated to add relevant references

    Report number: LA-UR-22-30670

  28. arXiv:2210.02383  [pdf, other]

    stat.AP

    Filling the Gaps: A Multiple Imputation Approach to Estimating Aging Curves in Baseball

    Authors: Quang Nguyen, Gregory J. Matthews

    Abstract: In sports, an aging curve depicts the relationship between average performance and age in athletes' careers. This paper investigates the aging curves for offensive players in Major League Baseball. We study this problem in a missing data context and account for different types of dropouts of baseball players during their careers. We employ a multiple imputation framework for multilevel data to imp…

    Submitted 11 March, 2024; v1 submitted 5 October, 2022; originally announced October 2022.

  29. arXiv:2209.07306  [pdf, other]

    stat.AP cs.CY math.ST

    Statistical Modeling of Data Breach Risks: Time to Identification and Notification

    Authors: Maochao Xu, Quynh Nhu Nguyen

    Abstract: It is very challenging to predict the cost of a cyber incident owing to the complex nature of cyber risk. However, it is inevitable for insurance companies who offer cyber insurance policies. The time to identify an incident and the time to notify the affected individuals are two important components in determining the cost of a cyber incident. In this work, we initialize the study on those t…

    Submitted 24 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  30. arXiv:2202.13597  [pdf, other]

    cs.LG stat.ML

    Rectified Max-Value Entropy Search for Bayesian Optimization

    Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they…

    Submitted 28 February, 2022; originally announced February 2022.

  31. An Examination of Olympic Sport Climbing Competition Format and Scoring System

    Authors: Quang Nguyen, Hannah Butler, Gregory J. Matthews

    Abstract: Sport climbing, which made its Olympic debut at the 2020 Summer Games, generally consists of three separate disciplines: speed climbing, bouldering, and lead climbing. However, the International Olympic Committee (IOC) only allowed one set of medals each for men and women in sport climbing. As a result, the governing body of sport climbing, rather than choosing only one of the three disciplines to…

    Submitted 28 March, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: 17 pages, 7 figures

  32. arXiv:2107.14465  [pdf, other]

    cs.LG cs.AI stat.ML

    Trusted-Maximizers Entropy Search for Efficient Bayesian Optimization

    Authors: Quoc Phong Nguyen, Zhaoxuan Wu, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch si…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: Published as a conference paper at UAI 2021

  33. Poisson Modeling and Predicting English Premier League Goal Scoring

    Authors: Quang Nguyen

    Abstract: The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match a…

    Submitted 7 November, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

  34. arXiv:2102.09671  [pdf, other]

    cs.LG stat.ML

    When Are Solutions Connected in Deep Networks?

    Authors: Quynh Nguyen, Pierre Brechet, Marco Mondelli

    Abstract: The question of how and why the phenomenon of mode connectivity occurs in training deep neural networks has gained remarkable attention in the research community. From a theoretical perspective, two possible explanations have been proposed: (i) the loss function has connected sublevel sets, and (ii) the solutions found by stochastic gradient descent are dropout stable. While these explanations pro…

    Submitted 21 October, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: Accepted at NeurIPS 2021

  35. arXiv:2102.06857  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    On Robust Optimal Transport: Computational Complexity and Barycenter Computation

    Authors: Khang Le, Huy Nguyen, Quang Nguyen, Tung Pham, Hung Bui, Nhat Ho

    Abstract: We consider robust variants of the standard optimal transport, named robust optimal transport, where marginal constraints are relaxed via Kullback-Leibler divergence. We show that Sinkhorn-based algorithms can approximate the optimal cost of robust optimal transport in $\widetilde{\mathcal{O}}(\frac{n^2}{\varepsilon})$ time, in which $n$ is the number of supports of the probability distributions a…

    Submitted 27 October, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Advances in NeurIPS, 2021; 52 pages, 10 figures; Khang Le and Huy Nguyen contributed equally to this work

  36. Causal mediation analysis: From simple to more robust strategies for estimation of marginal natural (in)direct effects

    Authors: Trang Quynh Nguyen, Elizabeth L. Ogburn, Ian Schmid, Elizabeth B. Sarker, Noah Greifer, Ina M. Koning, Elizabeth A. Stuart

    Abstract: This paper aims to provide practitioners of causal mediation analysis with a better understanding of estimation options. We take as inputs two familiar strategies (weighting and model-based prediction) and a simple way of combining them (weighted models), and show how a range of estimators can be generated, with different modeling requirements and robustness properties. The primary goal is to help…

    Submitted 13 January, 2023; v1 submitted 11 February, 2021; originally announced February 2021.

    MSC Class: 62D20

    Journal ref: Statistics Surveys. 2023. 17:1-41

  37. arXiv:2102.05912  [pdf, other]

    stat.ML cs.LG

    On Transportation of Mini-batches: A Hierarchical Approach

    Authors: Khai Nguyen, Dang Nguyen, Quoc Nguyen, Tung Pham, Hung Bui, Dinh Phung, Trung Le, Nhat Ho

    Abstract: Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with a very high number of supports. The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches which leads…

    Submitted 6 June, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Accepted to ICML 2022, 34 pages, 16 figures, 9 tables

  38. arXiv:2101.09612  [pdf, ps, other]

    cs.LG stat.ML

    On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths

    Authors: Quynh Nguyen

    Abstract: We give a simple proof for the global convergence of gradient descent in training deep ReLU networks with the standard square loss, and show some of its improvements over the state-of-the-art. In particular, while prior works require all the hidden layers to be wide with width at least $Ω(N^8)$ ($N$ being the number of training samples), we require a single wide layer of linear, quadratic or cubic…

    Submitted 11 June, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

    Comments: ICML 2021

  39. arXiv:2101.08576  [pdf, ps, other]

    cs.LG stat.ML

    A Note on Connectivity of Sublevel Sets in Deep Learning

    Authors: Quynh Nguyen

    Abstract: It is shown that for deep neural networks, a single wide layer of width $N+1$ ($N$ being the number of training samples) suffices to prove the connectivity of sublevel sets of the training loss function. In the two-layer setting, the same property may not hold even if one has just one neuron less (i.e. width $N$ can lead to disconnected sublevel sets).

    Submitted 21 January, 2021; originally announced January 2021.

  40. arXiv:2012.11654  [pdf, other]

    stat.ML cs.LG

    Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks

    Authors: Quynh Nguyen, Marco Mondelli, Guido Montufar

    Abstract: A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that t…

    Submitted 21 August, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: appeared at ICML 2021, this version corrects a mistake in Lemma 5.4 which also affects Lemma 5.5. These two Lemmas have been edited and the corresponding proofs corrected. All the other results remain untouched

  41. arXiv:2012.10695  [pdf, other]

    cs.LG stat.ML

    An Information-Theoretic Framework for Unifying Active Learning Problems

    Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-the-art performance in LSE problems with a continuous input domain. Then, by exploiting the relationship…

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: 35th AAAI Conference on Artificial Intelligence (AAAI 2021), Extended version with derivations, 12 pages

  42. arXiv:2012.10688  [pdf, other]

    cs.LG stat.ML

    Top-$k$ Ranking Bayesian Optimization

    Authors: Quoc Phong Nguyen, Sebastian Tay, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: This paper presents a novel approach to top-$k$ ranking Bayesian optimization (top-$k$ ranking BO) which is a practical and significant generalization of preferential BO to handle top-$k$ ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equall…

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: 35th AAAI Conference on Artificial Intelligence (AAAI 2021), Extended version with derivations, 13 pages

  43. Sensitivity analyses for effect modifiers not observed in the target population when generalizing treatment effects from a randomized controlled trial: Assumptions, models, effect scales, data scenarios, and implementation details

    Authors: Trang Quynh Nguyen, Benjamin Ackerman, Ian Schmid, Stephen R. Cole, Elizabeth A. Stuart

    Abstract: Background: Randomized controlled trials are often used to inform policy and practice for broad populations. The average treatment effect (ATE) for a target population, however, may be different from the ATE observed in a trial if there are effect modifiers whose distribution in the target population is different from that in the trial. Methods exist to use trial data to estimate the target p…

    Submitted 25 November, 2020; originally announced November 2020.

    Journal ref: PLOS ONE. 2018. 13(12): e0208795

  44. Clarifying causal mediation analysis: Effect identification via three assumptions and five potential outcomes

    Authors: Trang Quynh Nguyen, Ian Schmid, Elizabeth L. Ogburn, Elizabeth A. Stuart

    Abstract: Causal mediation analysis is complicated with multiple effect definitions that require different sets of assumptions for identification. This paper provides a systematic explanation of such assumptions. We define five potential outcome types whose means are involved in various effect definitions. We tackle their mean/distribution's identification, starting with the one that requires the weakest as…

    Submitted 7 July, 2022; v1 submitted 18 November, 2020; originally announced November 2020.

    Journal ref: Journal of Causal Inference. 2022. 10:246-279

  45. arXiv:2010.12883  [pdf, other]

    cs.LG stat.ML

    Variational Bayesian Unlearning

    Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters after directly unlearning from erased data vs. the exact posterior belief from retraining with remaining data. Using the variational…

    Submitted 24 October, 2020; originally announced October 2020.

    Comments: 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020), Extended version with proofs, 22 pages

  46. arXiv:2010.00994  [pdf, other]

    cs.LG cs.SI stat.AP

    A local geometry of hyperedges in hypergraphs, and its applications to social networks

    Authors: Dong Quan Ngoc Nguyen, Lin Xing

    Abstract: In many real-world datasets arising from social networks, there are hidden higher order relations among data points which cannot be captured using graph modeling. It is natural to use a more general notion of hypergraphs to model such social networks. In this paper, we introduce a new local geometry of hyperedges in hypergraphs which allows us to capture higher order relations among data points. Furth…

    Submitted 29 September, 2020; originally announced October 2020.

  47. arXiv:2010.00435  [pdf, other]

    cs.SI cs.LG stat.AP stat.ML

    Community detection, pattern recognition, and hypergraph-based learning: approaches using metric geometry and persistent homology

    Authors: Dong Quan Ngoc Nguyen, Lin Xing, Lizhen Lin

    Abstract: Hypergraph data appear and are hidden in many places in the modern age. They are data structures that can be used to model many real data examples since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structu…

    Submitted 29 September, 2020; originally announced October 2020.

  48. arXiv:2009.14311  [pdf, other]

    cs.SI cs.LG stat.ML

    Weight Prediction for Variants of Weighted Directed Networks

    Authors: Dong Quan Ngoc Nguyen, Lin Xing, Lizhen Lin

    Abstract: A weighted directed network (WDN) is a directed graph in which each edge is associated with a unique value called its weight. These networks are very suitable for modeling real-world social networks in which there is an assessment of one vertex toward other vertices. One of the main problems studied in this paper is prediction of edge weights in such networks. We introduce, for the first time, a metric…

    Submitted 29 September, 2020; originally announced September 2020.

  49. arXiv:2008.05089  [pdf, other]

    cs.LG stat.ML

    Quaternion Graph Neural Networks

    Authors: Dai Quoc Nguyen, Tu Dinh Nguyen, Dinh Phung

    Abstract: Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representations and embeddings in hyper-complex space has also been shown to be a promising and effective approach. To this end, w…

    Submitted 6 October, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

    Comments: Camera-ready for ACML 2021. Additional implementations for Gated QGNNs, Dual QGNNs, Simplifying QGNNs

  50. arXiv:2006.12100  [pdf, other]

    cs.LG cs.CL cs.SI stat.ML

    A Self-Attention Network based Node Embedding Model

    Authors: Dai Quoc Nguyen, Tu Dinh Nguyen, Dinh Phung

    Abstract: Although several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extracti…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted version, ECML-PKDD 2020