-
A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference
Authors:
Harsh Parikh,
Trang Quynh Nguyen,
Elizabeth A. Stuart,
Kara E. Rudolph,
Caleb H. Miles
Abstract:
Data integration approaches are increasingly used to enhance the efficiency and generalizability of studies. However, a key limitation of these methods is the assumption that outcome measures are identical across datasets -- an assumption that often does not hold in practice. Consider the following opioid use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating the effect of…
▽ More
Data integration approaches are increasingly used to enhance the efficiency and generalizability of studies. However, a key limitation of these methods is the assumption that outcome measures are identical across datasets -- an assumption that often does not hold in practice. Consider the following opioid use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating the effect of medications for OUD on withdrawal symptom severity (not the primary outcome of either trial). While XBOT measures withdrawal severity using the subjective opiate withdrawal scale, POAT uses the clinical opiate withdrawal scale. We analyze this realistic yet challenging setting where outcome measures differ across studies and where neither study records both types of outcomes. Our paper studies whether and when integrating studies with disparate outcome measures leads to efficiency gains. We introduce three sets of assumptions -- with varying degrees of strength -- linking both outcome measures. Our theoretical and empirical results highlight a cautionary tale: integration can improve asymptotic efficiency only under the strongest assumption linking the outcomes. However, misspecification of this assumption leads to bias. In contrast, a milder assumption may yield finite-sample efficiency gains, yet these benefits diminish as sample size increases. We illustrate these trade-offs via a case study integrating the XBOT and POAT datasets to estimate the comparative effect of two medications for opioid use disorder on withdrawal symptoms. By systematically varying the assumptions linking the SOW and COW scales, we show potential efficiency gains and the risks of bias. Our findings emphasize the need for careful assumption selection when fusing datasets with differing outcome measures, offering guidance for researchers navigating this common challenge in modern data integration.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
A multilevel model with heterogeneous variances for snap timing in the National Football League
Authors:
Quang Nguyen,
Ronald Yurko
Abstract:
Player tracking data have provided great opportunities to generate novel insights into understudied areas of American football, such as pre-snap motion. Using a Bayesian multilevel model with heterogeneous variances, we provide an assessment of NFL quarterbacks and their ability to synchronize the timing of the ball snap with pre-snap movement from their teammates. We focus on passing plays with r…
▽ More
Player tracking data have provided great opportunities to generate novel insights into understudied areas of American football, such as pre-snap motion. Using a Bayesian multilevel model with heterogeneous variances, we provide an assessment of NFL quarterbacks and their ability to synchronize the timing of the ball snap with pre-snap movement from their teammates. We focus on passing plays with receivers in motion at the snap and running a route, and define the snap timing as the time between the moment a receiver begins motioning and the ball snap event. We assume a Gamma distribution for the play-level snap timing and model the mean parameter with player and team random effects, along with relevant fixed effects such as the motion type identified via a Gaussian mixture model. Most importantly, we model the shape parameter with quarterback random effects, which enables us to estimate the differences in snap timing variability among NFL quarterbacks. We demonstrate that higher variability in snap timing is beneficial for the passing game, as it relates to facing less havoc created by the opposing defense. We also obtain a quarterback leaderboard based on our snap timing variability measure, and Patrick Mahomes stands out as the top player.
△ Less
Submitted 22 February, 2025;
originally announced February 2025.
-
BILBO: BILevel Bayesian Optimization
Authors:
Ruth Wan Theng Chew,
Quoc Phong Nguyen,
Bryan Kian Hsiang Low
Abstract:
Bilevel optimization is characterized by a two-level optimization structure, where the upper-level problem is constrained by optimal lower-level solutions, and such structures are prevalent in real-world problems. The constraint by optimal lower-level solutions poses significant challenges, especially in noisy, constrained, and derivative-free settings, as repeating lower-level optimizations is sa…
▽ More
Bilevel optimization is characterized by a two-level optimization structure, where the upper-level problem is constrained by optimal lower-level solutions, and such structures are prevalent in real-world problems. The constraint by optimal lower-level solutions poses significant challenges, especially in noisy, constrained, and derivative-free settings, as repeating lower-level optimizations is sample inefficient and predicted lower-level solutions may be suboptimal. We present BILevel Bayesian Optimization (BILBO), a novel Bayesian optimization algorithm for general bilevel problems with blackbox functions, which optimizes both upper- and lower-level problems simultaneously, without the repeated lower-level optimization required by existing methods. BILBO samples from confidence-bounds based trusted sets, which bounds the suboptimality on the lower level. Moreover, BILBO selects only one function query per iteration, where the function query selection strategy incorporates the uncertainty of estimated lower-level solutions and includes a conditional reassignment of the query to encourage exploration of the lower-level objective. The performance of BILBO is theoretically guaranteed with a sublinear regret bound for commonly used kernels and is empirically evaluated on several synthetic and real-world problems.
△ Less
Submitted 28 May, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Multiple-Input Variational Auto-Encoder for Anomaly Detection in Heterogeneous Data
Authors:
Phai Vu Dinh,
Diep N. Nguyen,
Dinh Thai Hoang,
Quang Uy Nguyen,
Eryk Dutkiewicz
Abstract:
Anomaly detection (AD) plays a pivotal role in AI applications, e.g., in classification, and intrusion/threat detection in cybersecurity. However, most existing methods face challenges of heterogeneity amongst feature subsets posed by non-independent and identically distributed (non-IID) data. We propose a novel neural network model called Multiple-Input Auto-Encoder for AD (MIAEAD) to address thi…
▽ More
Anomaly detection (AD) plays a pivotal role in AI applications, e.g., in classification, and intrusion/threat detection in cybersecurity. However, most existing methods face challenges of heterogeneity amongst feature subsets posed by non-independent and identically distributed (non-IID) data. We propose a novel neural network model called Multiple-Input Auto-Encoder for AD (MIAEAD) to address this. MIAEAD assigns an anomaly score to each feature subset of a data sample to indicate its likelihood of being an anomaly. This is done by using the reconstruction error of its sub-encoder as the anomaly score. All sub-encoders are then simultaneously trained using unsupervised learning to determine the anomaly scores of feature subsets. The final AUC of MIAEAD is calculated for each sub-dataset, and the maximum AUC obtained among the sub-datasets is selected. To leverage the modelling of the distribution of normal data to identify anomalies of the generative models, we develop a novel neural network architecture/model called Multiple-Input Variational Auto-Encoder (MIVAE). MIVAE can process feature subsets through its sub-encoders before learning distribution of normal data in the latent space. This allows MIVAE to identify anomalies that deviate from the learned distribution. We theoretically prove that the difference in the average anomaly score between normal samples and anomalies obtained by the proposed MIVAE is greater than that of the Variational Auto-Encoder (VAEAD), resulting in a higher AUC for MIVAE. Extensive experiments on eight real-world anomaly datasets demonstrate the superior performance of MIAEAD and MIVAE over conventional methods and the state-of-the-art unsupervised models, by up to 6% in terms of AUC score. Alternatively, MIAEAD and MIVAE have a high AUC when applied to feature subsets with low heterogeneity based on the coefficient of variation (CV) score.
△ Less
Submitted 14 January, 2025;
originally announced January 2025.
-
Improving Pareto Set Learning for Expensive Multi-objective Optimization via Stein Variational Hypernetworks
Authors:
Minh-Duc Nguyen,
Phuong Mai Dinh,
Quang-Huy Nguyen,
Long P. Hoang,
Dung D. Le
Abstract:
Expensive multi-objective optimization problems (EMOPs) are common in real-world scenarios where evaluating objective functions is costly and involves extensive computations or physical experiments. Current Pareto set learning methods for such problems often rely on surrogate models like Gaussian processes to approximate the objective functions. These surrogate models can become fragmented, result…
▽ More
Expensive multi-objective optimization problems (EMOPs) are common in real-world scenarios where evaluating objective functions is costly and involves extensive computations or physical experiments. Current Pareto set learning methods for such problems often rely on surrogate models like Gaussian processes to approximate the objective functions. These surrogate models can become fragmented, resulting in numerous small uncertain regions between explored solutions. When using acquisition functions such as the Lower Confidence Bound (LCB), these uncertain regions can turn into pseudo-local optima, complicating the search for globally optimal solutions. To address these challenges, we propose a novel approach called SVH-PSL, which integrates Stein Variational Gradient Descent (SVGD) with Hypernetworks for efficient Pareto set learning. Our method addresses the issues of fragmented surrogate models and pseudo-local optima by collectively moving particles in a manner that smooths out the solution space. The particles interact with each other through a kernel function, which helps maintain diversity and encourages the exploration of underexplored regions. This kernel-based interaction prevents particles from clustering around pseudo-local optima and promotes convergence towards globally optimal solutions. Our approach aims to establish robust relationships between trade-off reference vectors and their corresponding true Pareto solutions, overcoming the limitations of existing methods. Through extensive experiments across both synthetic and real-world MOO benchmarks, we demonstrate that SVH-PSL significantly improves the quality of the learned Pareto set, offering a promising solution for expensive multi-objective optimization problems.
△ Less
Submitted 15 March, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
High-Dimensional Bayesian Optimization via Random Projection of Manifold Subspaces
Authors:
Quoc-Anh Hoang Nguyen,
The Hung Tran
Abstract:
Bayesian Optimization (BO) is a popular approach to optimizing expensive-to-evaluate black-box functions. Despite the success of BO, its performance may decrease exponentially as the dimensionality increases. A common framework to tackle this problem is to assume that the objective function depends on a limited set of features that lie on a low-dimensional manifold embedded in the high-dimensional…
▽ More
Bayesian Optimization (BO) is a popular approach to optimizing expensive-to-evaluate black-box functions. Despite the success of BO, its performance may decrease exponentially as the dimensionality increases. A common framework to tackle this problem is to assume that the objective function depends on a limited set of features that lie on a low-dimensional manifold embedded in the high-dimensional ambient space. The latent space can be linear or more generally nonlinear. To learn feature mapping, existing works usually use an encode-decoder framework which is either computationally expensive or susceptible to overfittting when the labeled data is limited. This paper proposes a new approach for BO in high dimensions by exploiting a new representation of the objective function. Our approach combines a random linear projection to reduce the dimensionality, with a representation learning of the nonlinear manifold. When the geometry of the latent manifold is available, a solution to exploit this geometry is proposed for representation learning. In contrast, we use a neural network. To mitigate overfitting by using the neural network, we train the feature mapping in a geometry-aware semi-supervised manner. Our approach enables efficient optimizing of BO's acquisition function in the low-dimensional space, with the advantage of projecting back to the original high-dimensional space compared to existing works in the same setting. Finally, we show empirically that our algorithm outperforms other high-dimensional BO baselines in various synthetic functions and real applications.
△ Less
Submitted 21 December, 2024;
originally announced December 2024.
-
Self-separated and self-connected models for mediator and outcome missingness in mediation analysis
Authors:
Trang Quynh Nguyen,
Razieh Nabi,
Fan Yang,
Elizabeth A. Stuart
Abstract:
Missing data is a common problem that challenges the study of effects of treatments. In the context of mediation analysis, this paper addresses missingness in the two key variables, mediator and outcome, focusing on identification. We consider self-separated missingness models where identification is achieved by conditional independence assumptions only and self-connected missingness models where…
▽ More
Missing data is a common problem that challenges the study of effects of treatments. In the context of mediation analysis, this paper addresses missingness in the two key variables, mediator and outcome, focusing on identification. We consider self-separated missingness models where identification is achieved by conditional independence assumptions only and self-connected missingness models where identification relies on so-called shadow variables. The first class is somewhat limited as it is constrained by the need to remove a certain number of connections from the model. The second class turns out to include substantial variation in the position of the shadow variable in the causal structure (vis-a-vis the mediator and outcome) and the corresponding implications for the model. In constructing the models, to improve plausibility, we pay close attention to allowing, where possible, dependencies due to unobserved causes of the missingness. In this exploration, we develop theory where needed. This results in templates for identification in this mediation setting, generally useful identification techniques, and perhaps most significantly, synthesis and substantial expansion of shadow variable theory.
△ Less
Submitted 11 November, 2024;
originally announced November 2024.
-
In defense of MAR over latent ignorability (or latent MAR) for outcome missingness in studying principal causal effects: a causal graph view
Authors:
Trang Quynh Nguyen
Abstract:
This paper concerns outcome missingness in principal stratification analysis. We revisit a common assumption known as latent ignorability or latent missing-at-random (LMAR), often considered a relaxation of missing-at-random (MAR). LMAR posits that the outcome is independent of its missingness if one conditions on principal stratum (which is partially unobservable) in addition to observed variable…
▽ More
This paper concerns outcome missingness in principal stratification analysis. We revisit a common assumption known as latent ignorability or latent missing-at-random (LMAR), often considered a relaxation of missing-at-random (MAR). LMAR posits that the outcome is independent of its missingness if one conditions on principal stratum (which is partially unobservable) in addition to observed variables. The literature has focused on methods assuming LMAR (usually supplemented with a more specific assumption about the missingness), without considering the theoretical plausibility and necessity of LMAR. In this paper, we devise a way to represent principal stratum in causal graphs, and use causal graphs to examine this assumption. We find that LMAR is harder to satisfy than MAR, and for the purpose of breaking the dependence between the outcome and its missingness, no benefit is gained from conditioning on principal stratum on top of conditioning on observed variables. This finding has an important implication: MAR should be preferred over LMAR. This is convenient because MAR is easier to handle and (unlike LMAR) if MAR is assumed no additional assumption is needed. We thus turn to focus on the plausibility of MAR and its implications, with a view to facilitate appropriate use of this assumption. We clarify conditions on the causal structure and on auxiliary variables (if available) that need to hold for MAR to hold, and we use MAR to recover effect identification under two dominant identification assumptions (exclusion restriction and principal ignorability). We briefly comment on cases where MAR does not hold. In terms of broader connections, most of the MAR findings are also relevant to classic instrumental variable analysis that targets the local average treatment effect; and the LMAR finding suggests general caution with assumptions that condition on principal stratum.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Heterogeneous Hypergraph Embedding for Recommendation Systems
Authors:
Darnbi Sakong,
Viet Hung Vu,
Thanh Trung Huynh,
Phi Le Nguyen,
Hongzhi Yin,
Quoc Viet Hung Nguyen,
Thanh Tam Nguyen
Abstract:
Recent advancements in recommender systems have focused on integrating knowledge graphs (KGs) to leverage their auxiliary information. The core idea of KG-enhanced recommenders is to incorporate rich semantic information for more accurate recommendations. However, two main challenges persist: i) Neglecting complex higher-order interactions in the KG-based user-item network, potentially leading to…
▽ More
Recent advancements in recommender systems have focused on integrating knowledge graphs (KGs) to leverage their auxiliary information. The core idea of KG-enhanced recommenders is to incorporate rich semantic information for more accurate recommendations. However, two main challenges persist: i) Neglecting complex higher-order interactions in the KG-based user-item network, potentially leading to sub-optimal recommendations, and ii) Dealing with the heterogeneous modalities of input sources, such as user-item bipartite graphs and KGs, which may introduce noise and inaccuracies. To address these issues, we present a novel Knowledge-enhanced Heterogeneous Hypergraph Recommender System (KHGRec). KHGRec captures group-wise characteristics of both the interaction network and the KG, modeling complex connections in the KG. Using a collaborative knowledge heterogeneous hypergraph (CKHG), it employs two hypergraph encoders to model group-wise interdependencies and ensure explainability. Additionally, it fuses signals from the input graphs with cross-view self-supervised learning and attention mechanisms. Extensive experiments on four real-world datasets show our model's superiority over various state-of-the-art baselines, with an average 5.18\% relative improvement. Additional tests on noise resilience, missing data, and cold-start problems demonstrate the robustness of our KHGRec framework. Our model and evaluation datasets are publicly available at \url{https://github.com/viethungvu1998/KHGRec}.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
NFL Ghosts: A framework for evaluating defender positioning with conditional density estimation
Authors:
Ronald Yurko,
Quang Nguyen,
Konstantinos Pelechrinis
Abstract:
Player attribution in American football remains an open problem due to the complex nature of twenty-two players interacting on the field, but the granularity of player tracking data provides ample opportunity for novel approaches. In this work, we introduce the first public framework to evaluate spatial and trajectory tracking data of players relative to a baseline distribution of "ghost" defender…
▽ More
Player attribution in American football remains an open problem due to the complex nature of twenty-two players interacting on the field, but the granularity of player tracking data provides ample opportunity for novel approaches. In this work, we introduce the first public framework to evaluate spatial and trajectory tracking data of players relative to a baseline distribution of "ghost" defenders. We demonstrate our framework in the context of modeling the nearest defender positioning at the moment of catch. In particular, we provide estimates of how much better or worse their observed positioning and trajectory compared to the expected play value of ghost defenders. Our framework leverages multi-dimensional tracking data features through flexible random forests for conditional density estimation in two ways: (1) to model the distribution of receiver yards gained enabling the estimation of within-play expected value, and (2) to model the 2D spatial distribution of baseline ghost defenders. We present novel metrics for measuring player and team performance based on tracking data, and discuss challenges that remain in extending our framework to other aspects of American football.
△ Less
Submitted 22 June, 2025; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
Authors:
Quan Nguyen,
Adji Bousso Dieng
Abstract:
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This ``collapse" problem prevents experimental design algorithms from yielding diverse high-quality data…
▽ More
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This ``collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
Fractional Tackles: Leveraging Player Tracking Data for Within-Play Tackling Evaluation in American Football
Authors:
Quang Nguyen,
Ruitong Jiang,
Meg Ellingwood,
Ronald Yurko
Abstract:
Tackling is a fundamental defensive move in American football, with the main purpose of stopping the forward motion of the ball-carrier. However, current tackling metrics are manually recorded outcomes that are inherently flawed due to their discrete and subjective nature. Using player tracking data, we present a novel framework for assessing tackling contribution in a continuous and objective man…
▽ More
Tackling is a fundamental defensive move in American football, with the main purpose of stopping the forward motion of the ball-carrier. However, current tackling metrics are manually recorded outcomes that are inherently flawed due to their discrete and subjective nature. Using player tracking data, we present a novel framework for assessing tackling contribution in a continuous and objective manner. Our approach first identifies when a defender is in a ``contact window'' of the ball-carrier during a play, before assigning value to each window and the players involved. This enables us to devise a new metric called fractional tackles, which credits defenders for halting the ball-carrier's forward motion toward the end zone. We demonstrate that fractional tackles overcome the shortcomings of traditional metrics such as tackles and assists, by providing greater variation and measurable information for players lacking recorded statistics like defensive linemen. We view our contribution as a significant step forward in measuring defensive performance in American football and a clear demonstration of the capabilities of player tracking data.
△ Less
Submitted 7 January, 2025; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Near-optimal Per-Action Regret Bounds for Sleeping Bandits
Authors:
Quan Nguyen,
Nishant A. Mehta
Abstract:
We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the min…
▽ More
We derive near-optimal per-action regret bounds for sleeping bandits, in which both the sets of available arms and their losses in every round are chosen by an adversary. In a setting with $K$ total arms and at most $A$ available arms in each round over $T$ rounds, the best known upper bound is $O(K\sqrt{TA\ln{K}})$, obtained indirectly via minimizing internal sleeping regrets. Compared to the minimax $Ω(\sqrt{TA})$ lower bound, this upper bound contains an extra multiplicative factor of $K\ln{K}$. We address this gap by directly minimizing the per-action regret using generalized versions of EXP3, EXP3-IX and FTRL with Tsallis entropy, thereby obtaining near-optimal bounds of order $O(\sqrt{TA\ln{K}})$ and $O(\sqrt{T\sqrt{AK}})$. We extend our results to the setting of bandits with advice from sleeping experts, generalizing EXP4 along the way. This leads to new proofs for a number of existing adaptive and tracking regret bounds for standard non-sleeping bandits. Extending our results to the bandit version of experts that report their confidences leads to new bounds for the confidence regret that depends primarily on the sum of experts' confidences. We prove a lower bound, showing that for any minimax optimal algorithms, there exists an action whose regret is sublinear in $T$ but linear in the number of its active rounds.
△ Less
Submitted 29 May, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
Practical challenges in mediation analysis: A guide for applied researchers
Authors:
Megan S. Schuler,
Donna L. Coffman,
Elizabeth A. Stuart,
Trang Q. Nguyen,
Brian Vegetabile,
Daniel F. McCaffrey
Abstract:
Mediation analysis is a statistical approach that can provide insights regarding the intermediary processes by which an intervention or exposure affects a given outcome. Mediation analyses rose to prominence, particularly in social science research, with the publication of the seminal paper by Baron and Kenny and is now commonly applied in many research disciplines, including health services resea…
▽ More
Mediation analysis is a statistical approach that can provide insights regarding the intermediary processes by which an intervention or exposure affects a given outcome. Mediation analyses rose to prominence, particularly in social science research, with the publication of the seminal paper by Baron and Kenny and is now commonly applied in many research disciplines, including health services research. Despite the growth in popularity, applied researchers may still encounter challenges in terms of conducting mediation analyses in practice. In this paper, we provide an overview of conceptual and methodological challenges that researchers face when conducting mediation analyses. Specifically, we discuss the following key challenges: (1) Conceptually differentiating mediators from other third variables, (2) Extending beyond the single mediator context, (3) Identifying appropriate datasets in which measurement and temporal ordering supports the hypothesized mediation model, (4) Selecting mediation effects that reflect the scientific question of interest, (5) Assessing the validity of underlying assumptions of no omitted confounders, (6) Addressing measurement error regarding the mediator, and (7) Clearly reporting results from mediation analyses. We discuss each challenge and highlight ways in which the applied researcher can approach these challenges.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Identification of complier and noncomplier average causal effects in the presence of latent missing-at-random (LMAR) outcomes: a unifying view and choices of assumptions
Authors:
Trang Quynh Nguyen,
Michelle C. Carlson,
Elizabeth A. Stuart
Abstract:
The study of treatment effects is often complicated by noncompliance and missing data. In the one-sided noncompliance setting where of interest are the complier and noncomplier average causal effects (CACE and NACE), we address outcome missingness of the \textit{latent missing at random} type (LMAR, also known as \textit{latent ignorability}). That is, conditional on covariates and treatment assig…
▽ More
The study of treatment effects is often complicated by noncompliance and missing data. In the one-sided noncompliance setting where of interest are the complier and noncomplier average causal effects (CACE and NACE), we address outcome missingness of the \textit{latent missing at random} type (LMAR, also known as \textit{latent ignorability}). That is, conditional on covariates and treatment assigned, the missingness may depend on compliance type. Within the instrumental variable (IV) approach to noncompliance, methods have been proposed for handling LMAR outcome that additionally invoke an exclusion restriction type assumption on missingness, but no solution has been proposed for when a non-IV approach is used. This paper focuses on effect identification in the presence of LMAR outcome, with a view to flexibly accommodate different principal identification approaches. We show that under treatment assignment ignorability and LMAR only, effect nonidentifiability boils down to a set of two connected mixture equations involving unidentified stratum-specific response probabilities and outcome means. This clarifies that (except for a special case) effect identification generally requires two additional assumptions: a \textit{specific missingness mechanism} assumption and a \textit{principal identification} assumption. This provides a template for identifying effects based on separate choices of these assumptions. We consider a range of specific missingness assumptions, including those that have appeared in the literature and some new ones. Incidentally, we find an issue in the existing assumptions, and propose a modification of the assumptions to avoid the issue. Results under different assumptions are illustrated using data from the Baltimore Experience Corps Trial.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data
Authors:
Quang Nguyen,
Ronald Yurko,
Gregory J. Matthews
Abstract:
In American football, a pass rush is an attempt by the defensive team to disrupt the offense and prevent the quarterback (QB) from completing a pass. Existing metrics for assessing pass rush performance are either discrete-time quantities or based on subjective judgment. Using player tracking data, we propose STRAIN, a novel metric for evaluating pass rushers in the National Football League (NFL)…
▽ More
In American football, a pass rush is an attempt by the defensive team to disrupt the offense and prevent the quarterback (QB) from completing a pass. Existing metrics for assessing pass rush performance are either discrete-time quantities or based on subjective judgment. Using player tracking data, we propose STRAIN, a novel metric for evaluating pass rushers in the National Football League (NFL) at the continuous-time within-play level. Inspired by the concept of strain rate in materials science, STRAIN is a simple and interpretable means for measuring defensive pressure in football. It is a directly-observed statistic as a function of two features: the distance between the pass rusher and QB, and the rate at which this distance is being reduced. Our metric possesses great predictability of pressure and stability over time. We also fit a multilevel model for STRAIN to understand the defensive pressure contribution of every pass rusher at the play-level. We apply our approach to NFL data and present results for the first eight weeks of the 2021 regular season. In particular, we provide comparisons of STRAIN for different defensive positions and play outcomes, and rankings of the NFL's best pass rushers according to our metric.
△ Less
Submitted 30 July, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Comparison of Methods that Combine Multiple Randomized Trials to Estimate Heterogeneous Treatment Effects
Authors:
Carly Lupton Brantner,
Trang Quynh Nguyen,
Tengjie Tang,
Congwen Zhao,
Hwanhee Hong,
Elizabeth A. Stuart
Abstract:
Individualized treatment decisions can improve health outcomes, but using data to make these decisions in a reliable, precise, and generalizable way is challenging with a single dataset. Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment to better estimate heterogeneous treatment effects. This paper discusses several non-p…
▽ More
Individualized treatment decisions can improve health outcomes, but using data to make these decisions in a reliable, precise, and generalizable way is challenging with a single dataset. Leveraging multiple randomized controlled trials allows for the combination of datasets with unconfounded treatment assignment to better estimate heterogeneous treatment effects. This paper discusses several non-parametric approaches for estimating heterogeneous treatment effects using data from multiple trials. We extend single-study methods to a scenario with multiple trials and explore their performance through a simulation study, with data generation scenarios that have differing levels of cross-trial heterogeneity. The simulations demonstrate that methods that directly allow for heterogeneity of the treatment effect across trials perform better than methods that do not, and that the choice of single-study method matters based on the functional form of the treatment effect. Finally, we discuss which methods perform well in each setting and then apply them to four randomized controlled trials to examine effect heterogeneity of treatments for major depressive disorder.
△ Less
Submitted 15 November, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Sensitivity analysis for principal ignorability violation in estimating complier and noncomplier average causal effects
Authors:
Trang Quynh Nguyen,
Elizabeth A. Stuart,
Daniel O. Scharfstein,
Elizabeth L. Ogburn
Abstract:
An important strategy for identifying principal causal effects, which are often used in settings with noncompliance, is to invoke the principal ignorability (PI) assumption. As PI is untestable, it is important to gauge how sensitive effect estimates are to its violation. We focus on this task for the common one-sided noncompliance setting where there are two principal strata, compliers and noncom…
▽ More
An important strategy for identifying principal causal effects, which are often used in settings with noncompliance, is to invoke the principal ignorability (PI) assumption. As PI is untestable, it is important to gauge how sensitive effect estimates are to its violation. We focus on this task for the common one-sided noncompliance setting where there are two principal strata, compliers and noncompliers. Under PI, compliers and noncompliers share the same outcome-mean-given-covariates function under the control condition. For sensitivity analysis, we allow this function to differ between compliers and noncompliers in several ways, indexed by an odds ratio, a generalized odds ratio, a mean ratio, or a standardized mean difference sensitivity parameter. We tailor sensitivity analysis techniques (with any sensitivity parameter choice) to several types of PI-based main analysis methods, including outcome regression, influence function (IF) based and weighting methods. We illustrate the proposed sensitivity analyses using several outcome types from the JOBS II study. This application estimates nuisance functions parametrically -- for simplicity and accessibility. In addition, we establish rate conditions on nonparametric nuisance estimation for IF-based estimators to be asymptotically normal -- with a view to inform nonparametric inference.
△ Less
Submitted 28 March, 2024; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Methods for Integrating Trials and Non-Experimental Data to Examine Treatment Effect Heterogeneity
Authors:
Carly Lupton Brantner,
Ting-Hsuan Chang,
Trang Quynh Nguyen,
Hwanhee Hong,
Leon Di Stefano,
Elizabeth A. Stuart
Abstract:
Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials…
▽ More
Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods broken down by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect and discuss differences between parametric and nonparametric estimators, and we list key assumptions, both those that are required within a single study and those that are necessary for data combination. After describing existing approaches, we compare and contrast them and reveal open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that there is substantial work to be done to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.
△ Less
Submitted 28 March, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
-
Platform Trials: the Impact of common Controls on Type One Error and Power
Authors:
Quynh Nguyen,
Katharina Hees,
Benjamin Hofner
Abstract:
Platform trials offer a framework to study multiple interventions in a single trial with the opportunity of opening and closing arms. The use of a common control in platform trials can increase efficiency as compared to individual control arms or separate trials per treatment. However, the need for multiplicity adjustment as a consequence of common controls is currently a controversial debate amon…
▽ More
Platform trials offer a framework to study multiple interventions in a single trial with the opportunity of opening and closing arms. The use of a common control in platform trials can increase efficiency as compared to individual control arms or separate trials per treatment. However, the need for multiplicity adjustment as a consequence of common controls is currently a controversial debate among researchers, pharmaceutical companies, as well as regulators. We investigate the impact of a common control arm in platform trials on the type one error and power in comparison to what would have been obtained with a platform trial with individual control arms in a simulation study. Furthermore, we evaluate the impact on power in case multiplicity adjustment is required in a platform trial. In both study designs, the family-wise error rate (FWER) is inflated compared to a standard, two-armed randomized controlled trial when no multiplicity adjustment is applied. In case of a common control, the FWER inflation is smaller. In most circumstances, a platform trial with a common control is still beneficial in terms of sample size and power after multiplicity adjustment, whereas in some cases, the platform trial with a common control loses the efficiency gain. Therefore, we further discuss the need for adjustment in terms of a family definition or hypotheses dependencies.
△ Less
Submitted 9 February, 2023;
originally announced February 2023.
-
Multiple imputation for propensity score analysis with covariates missing at random: some clarity on within and across methods
Authors:
Trang Quynh Nguyen,
Elizabeth A. Stuart
Abstract:
In epidemiology and social sciences, propensity score methods are popular for estimating treatment effects using observational data, and multiple imputation is popular for handling covariate missingness. However, how to appropriately use multiple imputation for propensity score analysis is not completely clear. This paper aims to bring clarity on the consistency (or lack thereof) of methods that h…
▽ More
In epidemiology and social sciences, propensity score methods are popular for estimating treatment effects using observational data, and multiple imputation is popular for handling covariate missingness. However, how to appropriately use multiple imputation for propensity score analysis is not completely clear. This paper aims to bring clarity on the consistency (or lack thereof) of methods that have been proposed, focusing on the within approach (where the effect is estimated separately in each imputed dataset and then the multiple estimates are combined) and the across approach (where typically propensity scores are averaged across imputed datasets before being used for effect estimation). We show that the within method is valid and can be used with any causal effect estimator that is consistent in the full-data setting. Existing across methods are inconsistent, but a different across method that averages the inverse probability weights across imputed datasets is consistent for propensity score weighting. We also comment on methods that rely on imputing a function of the missing covariate rather than the covariate itself, including imputation of the propensity score and of the probability weight. Based on consistency results and practical flexibility, we recommend generally using the standard within method. Throughout, we provide intuition to make the results meaningful to the broad audience of applied researchers.
△ Less
Submitted 28 August, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Adversarial Online Multi-Task Reinforcement Learning
Authors:
Quan Nguyen,
Nishant A. Mehta
Abstract:
We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $λ$-separability…
▽ More
We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $λ$-separability, and show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of $Ω(K\sqrt{DSAH})$ on the regret of any learning algorithm and an instance-specific lower bound of $Ω(\frac{K}{λ^2})$ in sample complexity for a class of uniformly-good cluster-then-learn algorithms. We use a novel construction called 2-JAO MDP for proving the instance-specific lower bound. The lower bounds are complemented with a polynomial time algorithm that obtains $\tilde{O}(\frac{K}{λ^2})$ sample complexity guarantee for the clustering phase and $\tilde{O}(\sqrt{MK})$ regret guarantee for the learning phase, indicating that the dependency on $K$ and $\frac{1}{λ^2}$ is tight.
△ Less
Submitted 10 January, 2023;
originally announced January 2023.
-
Big Ideas in Sports Analytics and Statistical Tools for their Investigation
Authors:
Benjamin S. Baumer,
Gregory J. Matthews,
Quang Nguyen
Abstract:
Sports analytics -- broadly defined as the pursuit of improvement in athletic performance through the analysis of data -- has expanded its footprint both in the professional sports industry and in academia over the past 30 years. In this paper, we connect four big ideas that are common across multiple sports: the expected value of a game state, win probability, measures of team strength, and the u…
▽ More
Sports analytics -- broadly defined as the pursuit of improvement in athletic performance through the analysis of data -- has expanded its footprint both in the professional sports industry and in academia over the past 30 years. In this paper, we connect four big ideas that are common across multiple sports: the expected value of a game state, win probability, measures of team strength, and the use of sports betting market data. For each, we explore both the shared similarities and individual idiosyncrasies of analytical approaches in each sport. While our focus is on the concepts underlying each type of analysis, any implementation necessarily involves statistical methodologies, computational tools, and data sources. Where appropriate, we outline how data, models, tools, and knowledge of the sport combine to generate actionable insights. We also describe opportunities to share analytical work, but omit an in-depth discussion of individual player evaluation as beyond our scope. This paper should serve as a useful overview for anyone becoming interested in the study of sports analytics.
△ Less
Submitted 10 January, 2023;
originally announced January 2023.
-
On the use of non-concurrent controls in platform trials: A scoping review
Authors:
Marta Bofill Roig,
Cora Burgwinkel,
Ursula Garczarek,
Franz Koenig,
Martin Posch,
Quynh Nguyen,
Katharina Hees
Abstract:
Platform trials gained popularity during the last few years as they increase flexibility compared to multi-arm trials by allowing new experimental arms entering when the trial already started. Using a shared control group in platform trials increases the trial efficiency compared to separate trials. Because of the later entry of some of the experimental treatment arms, the shared control group inc…
▽ More
Platform trials gained popularity during the last few years as they increase flexibility compared to multi-arm trials by allowing new experimental arms entering when the trial already started. Using a shared control group in platform trials increases the trial efficiency compared to separate trials. Because of the later entry of some of the experimental treatment arms, the shared control group includes concurrent and non-concurrent control data. For a given experimental arm, non-concurrent controls refer to patients allocated to the control arm before the arm enters the trial, while concurrent controls refer to control patients that are randomised concurrently to the experimental arm. Using non-concurrent controls can result in bias in the estimate in case of time trends if the appropriate methodology is not used and the assumptions are not met. In this paper, we faced two main objectives. In the first, we aimed to identify the methods currently available for incorporating non-concurrent controls, clarify the key concepts and assumptions, and name the main characteristics of each method. For this purpose, we systematically searched research articles on methods to include non-concurrent controls. The second objective is to summarise the current regulatory view on non-concurrent controls to clarify the key concepts and current guidance. Therefore, we conducted a systematic search in regulatory guidelines regarding using non-concurrent controls and summarised the most relevant arguments and recommended methods. Finally, we discuss the advantages and potential caveats of using non-concurrent controls.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks
Authors:
Louis Schatzki,
Martin Larocca,
Quynh T. Nguyen,
Frederic Sauvage,
M. Cerezo
Abstract:
Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has emerged as a potential so…
▽ More
Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has emerged as a potential solution to some of those issues. The key insight of GQML is that one should design architectures, such as equivariant QNNs, encoding the symmetries of the problem at hand. Here, we focus on problems with permutation symmetry (i.e., the group of symmetry $S_n$), and show how to build $S_n$-equivariant QNNs. We provide an analytical study of their performance, proving that they do not suffer from barren plateaus, quickly reach overparametrization, and generalize well from small amounts of data. To verify our results, we perform numerical simulations for a graph state classification task. Our work provides the first theoretical guarantees for equivariant QNNs, thus indicating the extreme power and potential of GQML.
△ Less
Submitted 13 February, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Theory for Equivariant Quantum Neural Networks
Authors:
Quynh T. Nguyen,
Louis Schatzki,
Paolo Braccia,
Michael Ragone,
Patrick J. Coles,
Frederic Sauvage,
Martin Larocca,
M. Cerezo
Abstract:
Quantum neural network architectures that have little-to-no inductive biases are known to face trainability and generalization issues. Inspired by a similar problem, recent breakthroughs in machine learning address this challenge by creating models encoding the symmetries of the learning task. This is materialized through the usage of equivariant neural networks whose action commutes with that of…
▽ More
Quantum neural network architectures that have little-to-no inductive biases are known to face trainability and generalization issues. Inspired by a similar problem, recent breakthroughs in machine learning address this challenge by creating models encoding the symmetries of the learning task. This is materialized through the usage of equivariant neural networks whose action commutes with that of the symmetry. In this work, we import these ideas to the quantum realm by presenting a comprehensive theoretical framework to design equivariant quantum neural networks (EQNN) for essentially any relevant symmetry group. We develop multiple methods to construct equivariant layers for EQNNs and analyze their advantages and drawbacks. Our methods can find unitary or general equivariant quantum channels efficiently even when the symmetry group is exponentially large or continuous. As a special implementation, we show how standard quantum convolutional neural networks (QCNN) can be generalized to group-equivariant QCNNs where both the convolution and pooling layers are equivariant to the symmetry group. We then numerically demonstrate the effectiveness of a SU(2)-equivariant QCNN over symmetry-agnostic QCNN on a classification task of phases of matter in the bond-alternating Heisenberg model. Our framework can be readily applied to virtually all areas of quantum machine learning. Lastly, we discuss about how symmetry-informed models such as EQNNs provide hopes to alleviate central challenges such as barren plateaus, poor local minima, and sample complexity.
△ Less
Submitted 10 May, 2024; v1 submitted 16 October, 2022;
originally announced October 2022.
-
Representation Theory for Geometric Quantum Machine Learning
Authors:
Michael Ragone,
Paolo Braccia,
Quynh T. Nguyen,
Louis Schatzki,
Patrick J. Coles,
Frederic Sauvage,
Martin Larocca,
M. Cerezo
Abstract:
Recent advances in classical machine learning have shown that creating models with inductive biases encoding the symmetries of a problem can greatly improve performance. Importation of these ideas, combined with an existing rich body of work at the nexus of quantum theory and symmetry, has given rise to the field of Geometric Quantum Machine Learning (GQML). Following the success of its classical…
▽ More
Recent advances in classical machine learning have shown that creating models with inductive biases encoding the symmetries of a problem can greatly improve performance. Importation of these ideas, combined with an existing rich body of work at the nexus of quantum theory and symmetry, has given rise to the field of Geometric Quantum Machine Learning (GQML). Following the success of its classical counterpart, it is reasonable to expect that GQML will play a crucial role in developing problem-specific and quantum-aware models capable of achieving a computational advantage. Despite the simplicity of the main idea of GQML -- create architectures respecting the symmetries of the data -- its practical implementation requires a significant amount of knowledge of group representation theory. We present an introduction to representation theory tools from the optics of quantum learning, driven by key examples involving discrete and continuous groups. These examples are sewn together by an exposition outlining the formal capture of GQML symmetries via "label invariance under the action of a group representation", a brief (but rigorous) tour through finite and compact Lie group representation theory, a reexamination of ubiquitous tools like Haar integration and twirling, and an overview of some successful strategies for detecting symmetries.
△ Less
Submitted 7 February, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Filling the Gaps: A Multiple Imputation Approach to Estimating Aging Curves in Baseball
Authors:
Quang Nguyen,
Gregory J. Matthews
Abstract:
In sports, an aging curve depicts the relationship between average performance and age in athletes' careers. This paper investigates the aging curves for offensive players in Major League Baseball. We study this problem in a missing data context and account for different types of dropouts of baseball players during their careers. We employ a multiple imputation framework for multilevel data to imp…
▽ More
In sports, an aging curve depicts the relationship between average performance and age in athletes' careers. This paper investigates the aging curves for offensive players in Major League Baseball. We study this problem in a missing data context and account for different types of dropouts of baseball players during their careers. We employ a multiple imputation framework for multilevel data to impute the player performance associated with the missing seasons, and estimate the aging curves based on the imputed datasets. We then evaluate the effects of different dropout mechanisms on the aging curves through simulation, before applying our method to analyze MLB player data from past seasons. Results suggest an overestimation of the aging curves constructed without considering the unobserved seasons, whereas estimates obtained from multiple imputation address this shortcoming.
△ Less
Submitted 11 March, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Statistical Modeling of Data Breach Risks: Time to Identification and Notification
Authors:
Maochao Xu,
Quynh Nhu Nguyen
Abstract:
It is very challenging to predict the cost of a cyber incident owing to the complex nature of cyber risk. However, it is inevitable for insurance companies who offer cyber insurance policies. The time to identifying an incident and the time to noticing the affected individuals are two important components in determining the cost of a cyber incident. In this work, we initialize the study on those t…
▽ More
It is very challenging to predict the cost of a cyber incident owing to the complex nature of cyber risk. However, it is inevitable for insurance companies who offer cyber insurance policies. The time to identifying an incident and the time to noticing the affected individuals are two important components in determining the cost of a cyber incident. In this work, we initialize the study on those two metrics via statistical modeling approaches. Particularly, we propose a novel approach to imputing the missing data, and further develop a dependence model to capture the complex pattern exhibited by those two metrics. The empirical study shows that the proposed approach has a satisfactory predictive performance and is superior to other commonly used models.
△ Less
Submitted 24 September, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Rectified Max-Value Entropy Search for Bayesian Optimization
Authors:
Quoc Phong Nguyen,
Bryan Kian Hsiang Low,
Patrick Jaillet
Abstract:
Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they…
▽ More
Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they encourage an accurate measure of the mutual information such as the rectified MES (RMES) acquisition function we develop in this work. Unlike the evaluation of MES, we derive a closed-form probability density for the observation conditioned on the max-value and employ stochastic gradient ascent with reparameterization to efficiently optimize RMES. As a result of a more principled acquisition function, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.
△ Less
Submitted 28 February, 2022;
originally announced February 2022.
-
An Examination of Olympic Sport Climbing Competition Format and Scoring System
Authors:
Quang Nguyen,
Hannah Butler,
Gregory J. Matthews
Abstract:
Sport climbing, which made its Olympic debut at the 2020 Summer Games, generally consists of three separate disciplines: speed climbing, bouldering, and lead climbing. However, the International Olympic Committee (IOC) only allowed one set of medals each for men and women in sport climbing. As a result, the governing body of sport climbing, rather than choosing only one of the three disciplines to…
▽ More
Sport climbing, which made its Olympic debut at the 2020 Summer Games, generally consists of three separate disciplines: speed climbing, bouldering, and lead climbing. However, the International Olympic Committee (IOC) only allowed one set of medals each for men and women in sport climbing. As a result, the governing body of sport climbing, rather than choosing only one of the three disciplines to include in the Olympics, decided to create a competition combining all three disciplines. In order to determine a winner, a combined scoring system was created using the product of the ranks across the three disciplines to determine an overall score for each climber. In this work, the rank-product scoring system of sport climbing is evaluated through simulation to investigate its general features, specifically, the advancement probabilities and scores for climbers given certain placements. Additionally, analyses of historical climbing contest results are presented and real examples of violations of the independence of irrelevant alternatives are illustrated. Finally, this work finds evidence that the current competition format is putting speed climbers at a disadvantage.
△ Less
Submitted 28 March, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Trusted-Maximizers Entropy Search for Efficient Bayesian Optimization
Authors:
Quoc Phong Nguyen,
Zhaoxuan Wu,
Bryan Kian Hsiang Low,
Patrick Jaillet
Abstract:
Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch si…
▽ More
Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch size. To alleviate these issues, this paper presents a novel trusted-maximizers entropy search (TES) acquisition function: It measures how much an input query contributes to the information gain on the maximizer over a finite set of trusted maximizers, i.e., inputs optimizing functions that are sampled from the Gaussian process posterior belief of the objective function. Evaluating TES requires either only a stochastic approximation with sampling or a deterministic approximation with expectation propagation, both of which are investigated and empirically evaluated using synthetic benchmark objective functions and real-world optimization problems, e.g., hyperparameter tuning of a convolutional neural network and synthesizing 'physically realizable' faces to fool a black-box face recognition system. Though TES can naturally be generalized to a batch variant with either approximation, the latter is amenable to be scaled to a much larger batch size in our experiments.
△ Less
Submitted 30 July, 2021;
originally announced July 2021.
-
Poisson Modeling and Predicting English Premier League Goal Scoring
Authors:
Quang Nguyen
Abstract:
The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match a…
▽ More
The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match and the Poisson distribution, the time between goals throughout the course of a season and the exponential distribution, and the time location of goals during football games and the continuous uniform distribution. We found that the Poisson process and the three probability distributions accurately describe Premier League goal scoring. In addition, Poisson regression was utilized to predict outcomes for a Premier League season, using different sets of season data and with a large number of simulations being involved. We examined and compared various soccer metrics from our simulation results, including an English club's chances of being the champions, finishing in the top four and bottom three, and relegation points.
△ Less
Submitted 7 November, 2021; v1 submitted 20 May, 2021;
originally announced May 2021.
-
When Are Solutions Connected in Deep Networks?
Authors:
Quynh Nguyen,
Pierre Brechet,
Marco Mondelli
Abstract:
The question of how and why the phenomenon of mode connectivity occurs in training deep neural networks has gained remarkable attention in the research community. From a theoretical perspective, two possible explanations have been proposed: (i) the loss function has connected sublevel sets, and (ii) the solutions found by stochastic gradient descent are dropout stable. While these explanations pro…
▽ More
The question of how and why the phenomenon of mode connectivity occurs in training deep neural networks has gained remarkable attention in the research community. From a theoretical perspective, two possible explanations have been proposed: (i) the loss function has connected sublevel sets, and (ii) the solutions found by stochastic gradient descent are dropout stable. While these explanations provide insights into the phenomenon, their assumptions are not always satisfied in practice. In particular, the first approach requires the network to have one layer with order of $N$ neurons ($N$ being the number of training samples), while the second one requires the loss to be almost invariant after removing half of the neurons at each layer (up to some rescaling of the remaining ones). In this work, we improve both conditions by exploiting the quality of the features at every intermediate layer together with a milder over-parameterization condition. More specifically, we show that: (i) under generic assumptions on the features of intermediate layers, it suffices that the last two hidden layers have order of $\sqrt{N}$ neurons, and (ii) if subsets of features at each layer are linearly separable, then no over-parameterization is needed to show the connectivity. Our experiments confirm that the proposed condition ensures the connectivity of solutions found by stochastic gradient descent, even in settings where the previous requirements do not hold.
△ Less
Submitted 21 October, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
-
On Robust Optimal Transport: Computational Complexity and Barycenter Computation
Authors:
Khang Le,
Huy Nguyen,
Quang Nguyen,
Tung Pham,
Hung Bui,
Nhat Ho
Abstract:
We consider robust variants of the standard optimal transport, named robust optimal transport, where marginal constraints are relaxed via Kullback-Leibler divergence. We show that Sinkhorn-based algorithms can approximate the optimal cost of robust optimal transport in $\widetilde{\mathcal{O}}(\frac{n^2}{\varepsilon})$ time, in which $n$ is the number of supports of the probability distributions a…
▽ More
We consider robust variants of the standard optimal transport, named robust optimal transport, where marginal constraints are relaxed via Kullback-Leibler divergence. We show that Sinkhorn-based algorithms can approximate the optimal cost of robust optimal transport in $\widetilde{\mathcal{O}}(\frac{n^2}{\varepsilon})$ time, in which $n$ is the number of supports of the probability distributions and $\varepsilon$ is the desired error. Furthermore, we investigate a fixed-support robust barycenter problem between $m$ discrete probability distributions with at most $n$ number of supports and develop an approximating algorithm based on iterative Bregman projections (IBP). For the specific case $m = 2$, we show that this algorithm can approximate the optimal barycenter value in $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon})$ time, thus being better than the previous complexity $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon^2})$ of the IBP algorithm for approximating the Wasserstein barycenter.
△ Less
Submitted 27 October, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Causal mediation analysis: From simple to more robust strategies for estimation of marginal natural (in)direct effects
Authors:
Trang Quynh Nguyen,
Elizabeth L. Ogburn,
Ian Schmid,
Elizabeth B. Sarker,
Noah Greifer,
Ina M. Koning,
Elizabeth A. Stuart
Abstract:
This paper aims to provide practitioners of causal mediation analysis with a better understanding of estimation options. We take as inputs two familiar strategies (weighting and model-based prediction) and a simple way of combining them (weighted models), and show how a range of estimators can be generated, with different modeling requirements and robustness properties. The primary goal is to help…
▽ More
This paper aims to provide practitioners of causal mediation analysis with a better understanding of estimation options. We take as inputs two familiar strategies (weighting and model-based prediction) and a simple way of combining them (weighted models), and show how a range of estimators can be generated, with different modeling requirements and robustness properties. The primary goal is to help build intuitive appreciation for robust estimation that is conducive to sound practice. A second goal is to provide a "menu" of estimators that practitioners can choose from for the estimation of marginal natural (in)direct effects. The estimators generated from this exercise include some that coincide or are similar to existing estimators and others that have not previously appeared in the literature. We note several different ways to estimate the weights for cross-world weighting based on three expressions of the weighting function, including one that is novel; and show how to check the resulting covariate and mediator balance. We use a random continuous weights bootstrap to obtain confidence intervals, and also derive general asymptotic variance formulas for the estimators. The estimators are illustrated using data from an adolescent alcohol use prevention study.
△ Less
Submitted 13 January, 2023; v1 submitted 11 February, 2021;
originally announced February 2021.
-
On Transportation of Mini-batches: A Hierarchical Approach
Authors:
Khai Nguyen,
Dang Nguyen,
Quoc Nguyen,
Tung Pham,
Hung Bui,
Dinh Phung,
Trung Le,
Nhat Ho
Abstract:
Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with a very high number of supports. The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches which leads…
▽ More
Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with a very high number of supports. The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches which leads to undesirable estimation. Moreover, the m-OT does not approximate a proper metric between probability measures since the identity property is not satisfied. To address these problems, we propose a novel mini-batch scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), that finds the optimal coupling between mini-batches and it can be seen as an approximation to a well-defined distance on the space of probability measures. Furthermore, we show that the m-OT is a limit of the entropic regularized version of the BoMb-OT when the regularized parameter goes to infinity. Finally, we carry out experiments on various applications including deep generative models, deep domain adaptation, approximate Bayesian computation, color transfer, and gradient flow to show that the BoMb-OT can be widely applied and performs well in various applications.
△ Less
Submitted 6 June, 2022; v1 submitted 11 February, 2021;
originally announced February 2021.
-
On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
Authors:
Quynh Nguyen
Abstract:
We give a simple proof for the global convergence of gradient descent in training deep ReLU networks with the standard square loss, and show some of its improvements over the state-of-the-art. In particular, while prior works require all the hidden layers to be wide with width at least $Ω(N^8)$ ($N$ being the number of training samples), we require a single wide layer of linear, quadratic or cubic…
▽ More
We give a simple proof for the global convergence of gradient descent in training deep ReLU networks with the standard square loss, and show some of its improvements over the state-of-the-art. In particular, while prior works require all the hidden layers to be wide with width at least $Ω(N^8)$ ($N$ being the number of training samples), we require a single wide layer of linear, quadratic or cubic width depending on the type of initialization. Unlike many recent proofs based on the Neural Tangent Kernel (NTK), our proof need not track the evolution of the entire NTK matrix, or more generally, any quantities related to the changes of activation patterns during training. Instead, we only need to track the evolution of the output at the last hidden layer, which can be done much more easily thanks to the Lipschitz property of ReLU. Some highlights of our setting: (i) all the layers are trained with standard gradient descent, (ii) the network has standard parameterization as opposed to the NTK one, and (iii) the network has a single wide layer as opposed to having all wide hidden layers as in most of NTK-related results.
△ Less
Submitted 11 June, 2021; v1 submitted 23 January, 2021;
originally announced January 2021.
-
A Note on Connectivity of Sublevel Sets in Deep Learning
Authors:
Quynh Nguyen
Abstract:
It is shown that for deep neural networks, a single wide layer of width $N+1$ ($N$ being the number of training samples) suffices to prove the connectivity of sublevel sets of the training loss function. In the two-layer setting, the same property may not hold even if one has just one neuron less (i.e. width $N$ can lead to disconnected sublevel sets).
It is shown that for deep neural networks, a single wide layer of width $N+1$ ($N$ being the number of training samples) suffices to prove the connectivity of sublevel sets of the training loss function. In the two-layer setting, the same property may not hold even if one has just one neuron less (i.e. width $N$ can lead to disconnected sublevel sets).
△ Less
Submitted 21 January, 2021;
originally announced January 2021.
-
Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks
Authors:
Quynh Nguyen,
Marco Mondelli,
Guido Montufar
Abstract:
A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that t…
▽ More
A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. However, existing results either provide bounds in the two-layer setting or assume that the spectrum of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths. In the finite-width setting, the network architectures we consider are fairly general: we require the existence of a wide layer with roughly order of $N$ neurons, $N$ being the number of data samples; and the scaling of the remaining layer widths is arbitrary (up to logarithmic factors). To obtain our results, we analyze various quantities of independent interest: we give lower bounds on the smallest singular value of hidden feature matrices, and upper bounds on the Lipschitz constant of input-output feature maps.
△ Less
Submitted 21 August, 2022; v1 submitted 21 December, 2020;
originally announced December 2020.
-
An Information-Theoretic Framework for Unifying Active Learning Problems
Authors:
Quoc Phong Nguyen,
Bryan Kian Hsiang Low,
Patrick Jaillet
Abstract:
This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-the-art performance in LSE problems with a continuous input domain. Then, by exploiting the relationship…
▽ More
This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-the-art performance in LSE problems with a continuous input domain. Then, by exploiting the relationship between LSE and BO, we design a competitive information-theoretic acquisition function for BO that has interesting connections to upper confidence bound and max-value entropy search (MES). The latter connection reveals a drawback of MES which has important implications on not only MES but also on other MES-based acquisition functions. Finally, our unifying information-theoretic framework can be applied to solve a generalized problem of LSE and BO involving multiple level sets in a data-efficient manner. We empirically evaluate the performance of our proposed algorithms using synthetic benchmark functions, a real-world dataset, and in hyperparameter tuning of machine learning models.
△ Less
Submitted 19 December, 2020;
originally announced December 2020.
-
Top-$k$ Ranking Bayesian Optimization
Authors:
Quoc Phong Nguyen,
Sebastian Tay,
Bryan Kian Hsiang Low,
Patrick Jaillet
Abstract:
This paper presents a novel approach to top-$k$ ranking Bayesian optimization (top-$k$ ranking BO) which is a practical and significant generalization of preferential BO to handle top-$k$ ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equall…
▽ More
This paper presents a novel approach to top-$k$ ranking Bayesian optimization (top-$k$ ranking BO) which is a practical and significant generalization of preferential BO to handle top-$k$ ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equally important contribution is the introduction of the first information-theoretic acquisition function in BO with preferential observation called multinomial predictive entropy search (MPES) which is flexible in handling these observations and optimized for all inputs of a query jointly. MPES possesses superior performance compared with existing acquisition functions that select the inputs of a query one at a time greedily. We empirically evaluate the performance of MPES using several synthetic benchmark functions, CIFAR-$10$ dataset, and SUSHI preference dataset.
△ Less
Submitted 19 December, 2020;
originally announced December 2020.
-
Sensitivity analyses for effect modifiers not observed in the target population when generalizing treatment effects from a randomized controlled trial: Assumptions, models, effect scales, data scenarios, and implementation details
Authors:
Trang Quynh Nguyen,
Benjamin Ackerman,
Ian Schmid,
Stephen R. Cole,
Elizabeth A. Stuart
Abstract:
Background: Randomized controlled trials are often used to inform policy and practice for broad populations. The average treatment effect (ATE) for a target population, however, may be different from the ATE observed in a trial if there are effect modifiers whose distribution in the target population is different that from that in the trial. Methods exist to use trial data to estimate the target p…
▽ More
Background: Randomized controlled trials are often used to inform policy and practice for broad populations. The average treatment effect (ATE) for a target population, however, may be different from the ATE observed in a trial if there are effect modifiers whose distribution in the target population is different that from that in the trial. Methods exist to use trial data to estimate the target population ATE, provided the distributions of treatment effect modifiers are observed in both the trial and target population -- an assumption that may not hold in practice.
Methods: The proposed sensitivity analyses address the situation where a treatment effect modifier is observed in the trial but not the target population. These methods are based on an outcome model or the combination of such a model and weighting adjustment for observed differences between the trial sample and target population. They accommodate several types of outcome models: linear models (including single time outcome and pre- and post-treatment outcomes) for additive effects, and models with log or logit link for multiplicative effects. We clarify the methods' assumptions and provide detailed implementation instructions.
Illustration: We illustrate the methods using an example generalizing the effects of an HIV treatment regimen from a randomized trial to a relevant target population.
Conclusion: These methods allow researchers and decision-makers to have more appropriate confidence when drawing conclusions about target population effects.
△ Less
Submitted 25 November, 2020;
originally announced November 2020.
-
Clarifying causal mediation analysis: Effect identification via three assumptions and five potential outcomes
Authors:
Trang Quynh Nguyen,
Ian Schmid,
Elizabeth L. Ogburn,
Elizabeth A. Stuart
Abstract:
Causal mediation analysis is complicated with multiple effect definitions that require different sets of assumptions for identification. This paper provides a systematic explanation of such assumptions. We define five potential outcome types whose means are involved in various effect definitions. We tackle their mean/distribution's identification, starting with the one that requires the weakest as…
▽ More
Causal mediation analysis is complicated with multiple effect definitions that require different sets of assumptions for identification. This paper provides a systematic explanation of such assumptions. We define five potential outcome types whose means are involved in various effect definitions. We tackle their mean/distribution's identification, starting with the one that requires the weakest assumptions and gradually building up to the one that requires the strongest assumptions. This presentation shows clearly why an assumption is required for one estimand and not another, and provides a succinct table from which an applied researcher could pick out the assumptions required for identifying the causal effects they target. Using a running example, the paper illustrates the assembling and consideration of identifying assumptions for a range of causal contrasts. For several that are commonly encountered in the literature, this exercise clarifies that identification requires weaker assumptions than those often stated in the literature. This attention to the details also draws attention to the differences in the positivity assumption for different estimands, with practical implications. Clarity on the identifying assumptions of these various estimands will help researchers conduct appropriate mediation analyses and interpret the results with appropriate caution given the plausibility of the assumptions.
△ Less
Submitted 7 July, 2022; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Variational Bayesian Unlearning
Authors:
Quoc Phong Nguyen,
Bryan Kian Hsiang Low,
Patrick Jaillet
Abstract:
This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters after directly unlearning from erased data vs. the exact posterior belief from retraining with remaining data. Using the variational…
▽ More
This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters after directly unlearning from erased data vs. the exact posterior belief from retraining with remaining data. Using the variational inference (VI) framework, we show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief given the full data (i.e., including the remaining data); the latter prevents catastrophic unlearning that can render the model useless. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging. We propose two novel tricks to tackle this challenge. We empirically demonstrate our unlearning methods on Bayesian models such as sparse Gaussian process and logistic regression using synthetic and real-world datasets.
△ Less
Submitted 24 October, 2020;
originally announced October 2020.
-
A local geometry of hyperedges in hypergraphs, and its applications to social networks
Authors:
Dong Quan Ngoc Nguyen,
Lin Xing
Abstract:
In many real world datasets arising from social networks, there are hidden higher order relations among data points which cannot be captured using graph modeling. It is natural to use a more general notion of hypergraphs to model such social networks. In this paper, we introduce a new local geometry of hyperdges in hypergraphs which allows to capture higher order relations among data points. Furth…
▽ More
In many real world datasets arising from social networks, there are hidden higher order relations among data points which cannot be captured using graph modeling. It is natural to use a more general notion of hypergraphs to model such social networks. In this paper, we introduce a new local geometry of hyperdges in hypergraphs which allows to capture higher order relations among data points. Furthermore based on this new geometry, we also introduce new methodology--the nearest neighbors method in hypergraphs--for analyzing datasets arising from sociology.
△ Less
Submitted 29 September, 2020;
originally announced October 2020.
-
Community detection, pattern recognition, and hypergraph-based learning: approaches using metric geometry and persistent homology
Authors:
Dong Quan Ngoc Nguyen,
Lin Xing,
Lizhen Lin
Abstract:
Hypergraph data appear and are hidden in many places in the modern age. They are data structure that can be used to model many real data examples since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structu…
▽ More
Hypergraph data appear and are hidden in many places in the modern age. They are data structure that can be used to model many real data examples since their structures contain information about higher order relations among data points. One of the main contributions of our paper is to introduce a new topological structure to hypergraph data which bears a resemblance to a usual metric space structure. Using this new topological space structure of hypergraph data, we propose several approaches to study community detection problem, detecting persistent features arising from homological structure of hypergraph data. Also based on the topological space structure of hypergraph data introduced in our paper, we introduce a modified nearest neighbors methods which is a generalization of the classical nearest neighbors methods from machine learning. Our modified nearest neighbors methods have an advantage of being very flexible and applicable even for discrete structures as in hypergraphs. We then apply our modified nearest neighbors methods to study sign prediction problem in hypegraph data constructed using our method.
△ Less
Submitted 29 September, 2020;
originally announced October 2020.
-
Weight Prediction for Variants of Weighted Directed Networks
Authors:
Dong Quan Ngoc Nguyen,
Lin Xing,
Lizhen Lin
Abstract:
A weighted directed network (WDN) is a directed graph in which each edge is associated to a unique value called weight. These networks are very suitable for modeling real-world social networks in which there is an assessment of one vertex toward other vertices. One of the main problems studied in this paper is prediction of edge weights in such networks. We introduce, for the first time, a metric…
▽ More
A weighted directed network (WDN) is a directed graph in which each edge is associated to a unique value called weight. These networks are very suitable for modeling real-world social networks in which there is an assessment of one vertex toward other vertices. One of the main problems studied in this paper is prediction of edge weights in such networks. We introduce, for the first time, a metric geometry approach to studying edge weight prediction in WDNs. We modify a usual notion of WDNs, and introduce a new type of WDNs which we coin the term \textit{almost-weighted directed networks} (AWDNs). AWDNs can capture the weight information of a network from a given training set. We then construct a class of metrics (or distances) for AWDNs which equips such networks with a metric space structure. Using the metric geometry structure of AWDNs, we propose modified $k$ nearest neighbors (kNN) methods and modified support-vector machine (SVM) methods which will then be used to predict edge weights in AWDNs. In many real-world datasets, in addition to edge weights, one can also associate weights to vertices which capture information of vertices; association of weights to vertices especially plays an important role in graph embedding problems. Adopting a similar approach, we introduce two new types of directed networks in which weights are associated to either a subset of origin vertices or a subset of terminal vertices . We, for the first time, construct novel classes of metrics on such networks, and based on these new metrics propose modified $k$NN and SVM methods for predicting weights of origins and terminals in these networks. We provide experimental results on several real-world datasets, using our geometric methodologies.
△ Less
Submitted 29 September, 2020;
originally announced September 2020.
-
Quaternion Graph Neural Networks
Authors:
Dai Quoc Nguyen,
Tu Dinh Nguyen,
Dinh Phung
Abstract:
Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, w…
▽ More
Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}.
△ Less
Submitted 6 October, 2021; v1 submitted 11 August, 2020;
originally announced August 2020.
-
A Self-Attention Network based Node Embedding Model
Authors:
Dai Quoc Nguyen,
Tu Dinh Nguyen,
Dinh Phung
Abstract:
Despite several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performances of downstream tasks such as node classification, link prediction or community extracti…
▽ More
Despite several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performances of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE -- a novel unsupervised embedding model -- whose central idea is to employ a transformer self-attention network to iteratively aggregate vector representations of nodes in random walks. Our SANNE aims to produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that the proposed SANNE obtains state-of-the-art results for the node classification task on well-known benchmark datasets.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.