-
Mediation Analysis for Sparse and Irregularly Spaced Longitudinal Outcomes with Application to the MrOS Sleep Study
Authors:
Rui Ren,
Haoyi Yang,
Qian Xiao,
Lingzhou Xue,
Yuan Huang
Abstract:
Mediation analysis has become a widely used method for identifying the pathways through which an independent variable influences a dependent variable via intermediate mediators. However, limited research addresses the case where mediators are high-dimensional and the outcome is represented by sparse, irregularly spaced longitudinal data. To address these challenges, we propose a mediation analysis approach for scalar exposures, high-dimensional mediators, and sparse longitudinal outcomes. This approach effectively identifies significant mediators by addressing two key issues: (i) the underlying correlation structure within the sparse and irregular cognitive measurements, and (ii) adjusting mediation effects to handle the high-dimensional set of candidate mediators. In the MrOS Sleep study, our primary objective is to explore lipid pathways that may mediate the relationship between rest-activity rhythms and longitudinal cognitive decline in older men. Our findings suggest a potential mechanism involving rest-activity rhythms, lipid metabolites, and cognitive decline, and highlight significant mediators identified through multiple testing procedures.
Submitted 9 June, 2025;
originally announced June 2025.
-
A First-order Generative Bilevel Optimization Framework for Diffusion Models
Authors:
Quan Xiao,
Hui Yuan,
A F M Saif,
Gaowen Liu,
Ramana Kompella,
Mengdi Wang,
Tianyi Chen
Abstract:
Diffusion models, which iteratively denoise data samples to synthesize high-quality outputs, have achieved empirical success across domains. However, optimizing these models for downstream tasks often involves nested bilevel structures, such as tuning hyperparameters for fine-tuning tasks or noise schedules in training dynamics, where traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem and address two key scenarios: (1) fine-tuning pre-trained models via an inference-only lower-level solver paired with a sample-efficient gradient estimator for the upper level, and (2) training diffusion models from scratch with noise schedule optimization by reparameterizing the lower-level problem and designing a computationally tractable gradient estimator. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes, offering theoretical grounding and computational practicality. Experiments demonstrate that our method outperforms existing fine-tuning and hyperparameter search baselines.
Submitted 12 February, 2025;
originally announced February 2025.
-
Optimal design of experiments with quantitative-sequence factors
Authors:
Yaping Wang,
Sixu Liu,
Qian Xiao
Abstract:
A new type of experiment with joint considerations of quantitative and sequence factors is recently drawing much attention in medical science, bio-engineering, and many other disciplines. The input spaces of such experiments are semi-discrete and often very large. Thus, efficient and economical experimental designs are required. Based on the transformations and aggregations of good lattice point sets, we construct a new class of optimal quantitative-sequence (QS) designs that are marginally coupled, pair-balanced, space-filling, and asymptotically orthogonal. The proposed QS designs have a certain flexibility in run and factor sizes and are especially appealing for high-dimensional cases.
Submitted 5 February, 2025;
originally announced February 2025.
-
Determining The Number of Factors in Two-Way Factor Model of High-Dimensional Matrix-Variate Time Series: A White-Noise based Method for Serial Correlation Models
Authors:
Qiang Xia
Abstract:
In this paper, we study a new two-way factor model for high-dimensional matrix-variate time series. To estimate the number of factors in this two-way factor model, we decompose the series into two parts: one being a non-weakly correlated series and the other being a weakly correlated noise. By comparing the difference between the two series, we can construct white-noise based signal statistics to determine the number of factors in the row loading matrix (column loading matrix). Furthermore, to mitigate the loss of estimation accuracy caused by the interaction between the row loading matrix and the column loading matrix, we propose a transformation so that the transformed model only contains the row loading matrix (column loading matrix). We define sequences of ratios of two test statistics as signal statistics to determine the number of factors and derive the consistency of the estimation. We conduct numerical studies to examine the performance of the new methods.
Submitted 25 January, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints
Authors:
Liuyuan Jiang,
Quan Xiao,
Victor M. Tenorio,
Fernando Real-Rojas,
Antonio G. Marques,
Tianyi Chen
Abstract:
Interest in bilevel optimization has grown in recent years, partially due to its applications to tackle challenging machine-learning problems. Several exciting recent works have been centered around developing efficient gradient-based algorithms that can solve bilevel optimization problems with provable guarantees. However, the existing literature mainly focuses on bilevel problems either without constraints, or featuring only simple constraints that do not couple variables across the upper and lower levels, excluding a range of complex applications. Our paper studies this challenging but less explored scenario and develops a (fully) first-order algorithm, which we term BLOCC, to tackle BiLevel Optimization problems with Coupled Constraints. We establish rigorous convergence theory for the proposed algorithm and demonstrate its effectiveness on two well-known real-world applications - hyperparameter selection in support vector machine (SVM) and infrastructure planning in transportation networks using the real data from the city of Seville.
Submitted 25 August, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
A Bayesian Circadian Hidden Markov Model to Infer Rest-Activity Rhythms Using 24-hour Actigraphy Data
Authors:
Jiachen Lu,
Qian Xiao,
Cici Bauer
Abstract:
24-hour actigraphy data collected by wearable devices offer valuable insights into physical activity types, intensity levels, and rest-activity rhythms (RAR). RARs, or patterns of rest and activity exhibited over a 24-hour period, are regulated by the body's circadian system, synchronizing physiological processes with external cues like the light-dark cycle. Disruptions to these rhythms, such as irregular sleep patterns, daytime drowsiness or shift work, have been linked to adverse health outcomes including metabolic disorders, cardiovascular disease, depression, and even cancer, making RARs a critical area of health research.
In this study, we propose a Bayesian Circadian Hidden Markov Model (BCHMM) that explicitly incorporates 24-hour circadian oscillators mirroring human biological rhythms. The model assumes that observed activity counts are conditional on hidden activity states through Gaussian emission densities, with transition probabilities modeled by state-specific sinusoidal functions. Our comprehensive simulation study reveals that BCHMM outperforms frequentist approaches in identifying the underlying hidden states, particularly when the activity states are difficult to separate. BCHMM also excels with smaller Kullback-Leibler divergence on estimated densities. With the Bayesian framework, we address the label-switching problem inherent to hidden Markov models via a positive constraint on mean parameters. From the proposed BCHMM, we can infer the 24-hour rest-activity profile via time-varying state probabilities, to characterize the person-level RAR. We demonstrate the utility of the proposed BCHMM using 2011-2014 National Health and Nutrition Examination Survey (NHANES) data, where worsened RAR, indicated by lower probabilities in low-activity state during the day and higher probabilities in high-activity state at night, is associated with an increased risk of diabetes.
Submitted 7 July, 2023;
originally announced July 2023.
-
Practice with Graph-based ANN Algorithms on Sparse Data: Chi-square Two-tower model, HNSW, Sign Cauchy Projections
Authors:
Ping Li,
Weijie Zhao,
Chao Wang,
Qi Xia,
Alice Wu,
Lijun Peng
Abstract:
Sparse data are common. The traditional ``handcrafted'' features are often sparse. Embedding vectors from trained models can also be very sparse, for example, embeddings trained via the ``ReLu'' activation function. In this paper, we report our exploration of efficient search in sparse data with graph-based ANN algorithms (e.g., HNSW, or SONG which is the GPU version of HNSW), which are popular in industrial practice, e.g., search and ads (advertising).
We experiment with the proprietary ads targeting application, as well as benchmark public datasets. For ads targeting, we train embeddings with the standard ``cosine two-tower'' model and we also develop the ``chi-square two-tower'' model. Both models produce (highly) sparse embeddings when they are integrated with the ``ReLu'' activation function. In EBR (embedding-based retrieval) applications, after the embeddings are trained, the next crucial task is the approximate near neighbor (ANN) search for serving. While there are many ANN algorithms we can choose from, in this study, we focus on the graph-based ANN algorithm (e.g., HNSW-type).
Sparse embeddings should help improve the efficiency of EBR. One benefit is the reduced memory cost for the embeddings. The other obvious benefit is the reduced computational time for evaluating similarities, because, for graph-based ANN algorithms such as HNSW, computing similarities is often the dominating cost. In addition to the effort on leveraging data sparsity for storage and computation, we also integrate ``sign cauchy random projections'' (SignCRP) to hash vectors to bits, to further reduce the memory cost and speed up the ANN search. In NIPS'13, SignCRP was proposed to hash the chi-square similarity, which is a well-adopted nonlinear kernel in NLP and computer vision. Therefore, the chi-square two-tower model, SignCRP, and HNSW are now tightly integrated.
Submitted 13 June, 2023;
originally announced June 2023.
-
Adaptive Testing for Alphas in High-dimensional Factor Pricing Models
Authors:
Qiang Xia,
Xianyang Zhang
Abstract:
This paper proposes a new procedure to validate the multi-factor pricing theory by testing the presence of alpha in linear factor pricing models with a large number of assets. Because the market's inefficient pricing is likely to occur to a small fraction of exceptional assets, we develop a testing procedure that is particularly powerful against sparse signals. Based on the high-dimensional Gaussian approximation theory, we propose a simulation-based approach to approximate the limiting null distribution of the test. Our numerical studies show that the new procedure can deliver a reasonable size and achieve substantial power improvement compared to the existing tests under sparse alternatives, and especially for weak signals.
Submitted 22 May, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
On Penalty-based Bilevel Gradient Descent Method
Authors:
Han Shen,
Quan Xiao,
Tianyi Chen
Abstract:
Bilevel optimization enjoys a wide range of applications in emerging machine learning and signal processing problems such as hyper-parameter optimization, image reconstruction, meta-learning, adversarial training, and reinforcement learning. However, bilevel optimization problems are traditionally known to be difficult to solve. Recent progress on bilevel algorithms mainly focuses on bilevel optimization problems through the lens of the implicit-gradient method, where the lower-level objective is either strongly convex or unconstrained. In this work, we tackle a challenging class of bilevel problems through the lens of the penalty method. We show that under certain conditions, the penalty reformulation recovers the (local) solutions of the original bilevel problem. Further, we propose the penalty-based bilevel gradient descent (PBGD) algorithm and establish its finite-time convergence for the constrained bilevel problem with lower-level constraints yet without lower-level strong convexity. Experiments on synthetic and real datasets showcase the efficiency of the proposed PBGD algorithm.
Submitted 6 January, 2025; v1 submitted 10 February, 2023;
originally announced February 2023.
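The penalty idea in the abstract above can be illustrated on a toy problem. The sketch below is not the paper's PBGD algorithm; it only runs plain gradient descent on a penalized objective, and the toy functions, the penalty weight `gamma`, and the step size are all assumptions chosen for illustration. The upper-level objective is f(x, y) = (x - 1)^2 + (y + 1)^2, the lower-level objective is g(x, y) = (y - x)^2 with minimum value 0, and the bilevel solution is (0, 0).

```python
def penalized_gradient(x, y, gamma):
    """Gradient of F(x, y) = (x - 1)^2 + (y + 1)^2 + gamma * (y - x)^2.

    The last term is the penalty gamma * (g(x, y) - min_y g(x, y)),
    where g(x, y) = (y - x)^2 and its minimum over y is 0 (hypothetical toy choice).
    """
    dx = 2.0 * (x - 1.0) - 2.0 * gamma * (y - x)
    dy = 2.0 * (y + 1.0) + 2.0 * gamma * (y - x)
    return dx, dy


def solve_penalized(gamma, step=0.005, iters=5000, x0=2.0, y0=2.0):
    """Plain gradient descent on the penalized objective.

    The step must stay below 2 / (2 + 4 * gamma) for stability;
    0.005 is safe for the gamma = 50 used below.
    """
    x, y = x0, y0
    for _ in range(iters):
        dx, dy = penalized_gradient(x, y, gamma)
        x, y = x - step * dx, y - step * dy
    return x, y


# Minimizer of the penalized problem is (1 / (1 + 2 * gamma), -1 / (1 + 2 * gamma)),
# which approaches the bilevel solution (0, 0) as gamma grows.
x_hat, y_hat = solve_penalized(gamma=50.0)
```

For this toy problem, a larger `gamma` pushes the penalized minimizer closer to the bilevel solution, at the cost of a smaller admissible step size.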
-
A Scalable Gaussian Process for Large-Scale Periodic Data
Authors:
Yongxiang Li,
Yuting Pu,
Changming Cheng,
Qian Xiao
Abstract:
The periodic Gaussian process (PGP) has been increasingly used to model periodic data due to its high accuracy. Yet, computing the likelihood of PGP has a high computational complexity of $\mathcal{O}\left(n^{3}\right)$ ($n$ is the data size), which hinders its wide application. To address this issue, we propose a novel circulant PGP (CPGP) model for large-scale periodic data collected at grids that are commonly seen in signal processing applications. The proposed CPGP decomposes the log-likelihood of PGP into the sum of two computationally scalable composite log-likelihoods, which do not involve any approximations. Computing the likelihood of CPGP requires only $\mathcal{O}\left(p^{2}\right)$ (or $\mathcal{O}\left(p\log p\right)$ in some special cases) time for grid observations, where the segment length $p$ is independent of and much smaller than $n$. Simulations and real case studies are presented to show the superiority of CPGP over some state-of-the-art methods, especially for applications requiring periodicity estimation. This new modeling technique can greatly advance the applicability of PGP in many areas and allow the modeling of many previously intractable problems.
Submitted 8 February, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Alternating Implicit Projected SGD and Its Efficient Variants for Equality-constrained Bilevel Optimization
Authors:
Quan Xiao,
Han Shen,
Wotao Yin,
Tianyi Chen
Abstract:
Stochastic bilevel optimization, which captures the inherent nested structure of machine learning problems, is gaining popularity in many recent applications. Existing works on bilevel optimization mostly consider either unconstrained problems or constrained upper-level problems. This paper considers the stochastic bilevel optimization problems with equality constraints both in the upper and lower levels. By leveraging the special structure of the equality constraints problem, the paper first presents an alternating implicit projected SGD approach and establishes the $\tilde{\cal O}(ε^{-2})$ sample complexity that matches the state-of-the-art complexity of ALSET \citep{chen2021closing} for unconstrained bilevel problems. To further save the cost of projection, the paper presents two alternating implicit projection-efficient SGD approaches, where one algorithm enjoys the $\tilde{\cal O}(ε^{-2}/T)$ upper-level and $\tilde{\cal O}(ε^{-1.5}/T^{\frac{3}{4}})$ lower-level projection complexity with ${\cal O}(T)$ lower-level batch size, and the other one enjoys $\tilde{\cal O}(ε^{-1.5})$ upper-level and lower-level projection complexity with ${\cal O}(1)$ batch size. Application to federated bilevel optimization has been presented to showcase the empirical performance of our algorithms. Our results demonstrate that equality-constrained bilevel optimization with strongly-convex lower-level problems can be solved as efficiently as stochastic single-level optimization problems.
Submitted 12 February, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Modeling and Active Learning for Experiments with Quantitative-Sequence Factors
Authors:
Qian Xiao,
Yaping Wang,
Abhyuday Mandal,
Xinwei Deng
Abstract:
A new type of experiment that aims to determine the optimal quantities of a sequence of factors is eliciting considerable attention in medical science, bioengineering, and many other disciplines. Such studies require the simultaneous optimization of both quantities and the sequence orders of several components which are called quantitative-sequence (QS) factors. Given the large and semi-discrete solution spaces in such experiments, efficiently identifying optimal or near-optimal solutions by using a small number of experimental trials is a nontrivial task. To address this challenge, we propose a novel active learning approach, called QS-learning, to enable effective modeling and efficient optimization for experiments with QS factors. QS-learning consists of three parts: a novel mapping-based additive Gaussian process (MaGP) model, an efficient global optimization scheme (QS-EGO), and a new class of optimal designs (QS-design). The theoretical properties of the proposed method are investigated, and optimization techniques using analytical gradients are developed. The performance of the proposed method is demonstrated via a real drug experiment on lymphoma treatment and several simulation studies.
Submitted 12 September, 2022; v1 submitted 6 September, 2022;
originally announced September 2022.
-
A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling
Authors:
Xudong Pan,
Qifan Xiao,
Mi Zhang,
Min Yang
Abstract:
Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, only very recently have attackers started to exploit the vulnerability of the tracking module. Compared with solely attacking the object detectors, this new attack strategy influences the driving decision more effectively with a smaller attack budget. However, little is known about whether the revealed vulnerability remains effective in end-to-end self-driving systems and, if so, how to mitigate the threat.
In this paper, we present the first systematic research on the security of object tracking in SDC. Through a comprehensive case study on the full perception pipeline of a popular open-sourced self-driving system, Baidu's Apollo, we prove that the mainstream multi-object tracker (MOT) based on Kalman Filter (KF) is unsafe even with an enabled multi-sensor fusion mechanism. Our root cause analysis reveals that the vulnerability is innate to the design of KF-based MOT, which shall error-handle the prediction results from the object detectors, yet the adopted KF algorithm is prone to trust the observation more when its deviation from the prediction is larger. To address this design flaw, we propose a simple yet effective security patch for KF-based MOT, the core of which is an adaptive strategy to balance the focus of KF on observations and predictions according to the anomaly index of the observation-prediction deviation, and which has certified effectiveness against a generalized hijacking attack model. Extensive evaluation on $4$ existing KF-based MOT implementations (including 2D and 3D, academic and Apollo ones) validates the defense effectiveness and the trivial performance overhead of our approach.
Submitted 18 July, 2022;
originally announced July 2022.
-
Lazy Queries Can Reduce Variance in Zeroth-order Optimization
Authors:
Quan Xiao,
Qing Ling,
Tianyi Chen
Abstract:
A major challenge of applying zeroth-order (ZO) methods is the high query complexity, especially when queries are costly. We propose a novel gradient estimation technique for ZO methods based on adaptive lazy queries that we term as LAZO. Different from the classic one-point or two-point gradient estimation methods, LAZO develops two alternative ways to check the usefulness of old queries from previous iterations, and then adaptively reuses them to construct the low-variance gradient estimates. We rigorously establish that through judiciously reusing the old queries, LAZO can reduce the variance of stochastic gradient estimates so that it not only saves queries per iteration but also achieves the regret bound for the symmetric two-point method. We evaluate the numerical performance of LAZO, and demonstrate the low-variance property and the performance gain of LAZO in both regret and query complexity relative to several existing ZO methods. The idea of LAZO is general, and can be applied to other variants of ZO methods.
Submitted 14 June, 2022;
originally announced June 2022.
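For context on the abstract above, the classic symmetric two-point estimator that it contrasts LAZO with can be sketched as follows. This is not LAZO and does not reuse old queries; the function names, the smoothing radius `delta`, the number of random directions, and the quadratic test function are assumptions made for illustration.

```python
import random


def two_point_gradient(f, x, delta=1e-4, num_dirs=2000, rng=None):
    """Symmetric two-point zeroth-order gradient estimate of f at x.

    Each random Gaussian direction u costs two function queries,
    f(x + delta * u) and f(x - delta * u); the estimate averages
    the directional slopes times u over num_dirs directions.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
        x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
        slope = (f(x_plus) - f(x_minus)) / (2.0 * delta)
        for i in range(d):
            grad[i] += slope * u[i] / num_dirs
    return grad


def quadratic(x):
    """Hypothetical test function with true gradient 2 * x."""
    return sum(xi * xi for xi in x)


# The true gradient at this point is [2.0, -4.0, 1.0]; the estimate is noisy.
estimate = two_point_gradient(quadratic, [1.0, -2.0, 0.5])
```

The estimate uses 2 * `num_dirs` function queries in total, which is the query cost that lazy-query schemes aim to reduce.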
-
Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning
Authors:
Momin Abbas,
Quan Xiao,
Lisha Chen,
Pin-Yu Chen,
Tianyi Chen
Abstract:
Model-agnostic meta learning (MAML) is currently one of the dominating approaches for few-shot meta-learning. Despite its effectiveness, the optimization of MAML can be challenging due to the innate bilevel problem structure. Specifically, the loss landscape of MAML is much more complex with possibly more saddle points and local minimizers than its empirical risk minimization counterpart. To address this challenge, we leverage the recently invented sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML. We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., $+3\%$ accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning. The code is available at https://github.com/mominabbass/Sharp-MAML.
Submitted 14 August, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Modality-Balanced Embedding for Video Retrieval
Authors:
Xun Wang,
Bingqing Ke,
Xuanping Li,
Fangyu Liu,
Mingyu Zhang,
Xiao Liang,
Qiushi Xiao,
Cheng Luo,
Yue Yu
Abstract:
Video search has become the main routine for users to discover videos relevant to a text query on large short-video sharing platforms. While training a query-video bi-encoder model using online search logs, we identify a modality bias phenomenon in which the video encoder almost entirely relies on text matching, neglecting other modalities of the videos such as vision and audio. This modality imbalance results from a) modality gap: the relevance between a query and a video text is much easier to learn as the query is also a piece of text, with the same modality as the video text; b) data bias: most training samples can be solved solely by text matching. Here we share our practices to improve the first retrieval stage including our solution for the modality imbalance issue. We propose MBVR (short for Modality Balanced Video Retrieval) with two key components: manually generated modality-shuffled (MS) samples and a dynamic margin (DM) based on visual relevance. They can encourage the video encoder to pay balanced attention to each modality. Through extensive experiments on a real world dataset, we show empirically that our method is both effective and efficient in solving the modality bias problem. We have also deployed our MBVR in a large video platform and observed a statistically significant boost over a highly optimized baseline in an A/B test and manual GSB evaluations.
Submitted 17 May, 2022; v1 submitted 18 April, 2022;
originally announced April 2022.
-
EzGP: Easy-to-Interpret Gaussian Process Models for Computer Experiments with Both Quantitative and Qualitative Factors
Authors:
Qian Xiao,
Abhyuday Mandal,
C. Devon Lin,
Xinwei Deng
Abstract:
Computer experiments with both quantitative and qualitative (QQ) inputs are commonly used in science and engineering applications. Constructing desirable emulators for such computer experiments remains a challenging problem. In this article, we propose an easy-to-interpret Gaussian process (EzGP) model for computer experiments to reflect the change of the computer model under the different level combinations of qualitative factors. The proposed modeling strategy, based on an additive Gaussian process, is flexible to address the heterogeneity of computer models involving multiple qualitative factors. We also develop two useful variants of the EzGP model to achieve computational efficiency for data with high dimensionality and large sizes. The merits of these models are illustrated by several numerical examples and a real data application.
Submitted 18 March, 2022;
originally announced March 2022.
-
Estimating Demand Flexibility Using Siamese LSTM Neural Networks
Authors:
Guangchun Ruan,
Daniel S. Kirschen,
Haiwang Zhong,
Qing Xia,
Chongqing Kang
Abstract:
There is an opportunity in modern power systems to explore demand flexibility by incentivizing consumers with dynamic prices. In this paper, we quantify demand flexibility using an efficient tool called time-varying elasticity, whose value may change depending on the prices and decision dynamics. This tool is particularly useful for evaluating the demand response potential and system reliability. Recent empirical evidence has highlighted some abnormal features when studying demand flexibility, such as delayed responses and vanishing elasticities after price spikes. Existing methods fail to capture these complicated features because they rely heavily on some predefined (often over-simplified) regression expressions. Instead, this paper proposes a model-free methodology to automatically and accurately derive the optimal estimation pattern. We further develop a two-stage estimation process with Siamese long short-term memory (LSTM) networks. Here, one LSTM network encodes the price response, while the other network estimates the time-varying elasticities. In the case study, the proposed framework and models are validated to achieve higher overall estimation accuracy and a better description of various abnormal features when compared with the state-of-the-art methods.
Submitted 2 September, 2021;
originally announced September 2021.
-
Quantitative Assessment of U.S. Bulk Power Systems and Market Operations during COVID-19
Authors:
Guangchun Ruan,
Jiahan Wu,
Haiwang Zhong,
Qing Xia,
Le Xie
Abstract:
Starting in early 2020, the novel coronavirus disease (COVID-19) severely affected the U.S., causing substantial changes in the operations of bulk power systems and electricity markets. In this paper, we develop a data-driven analysis to substantiate the pandemic's impacts from the perspectives of power system security, electric power generation, electric power demand and electricity prices. Our results suggest that both electric power demand and electricity prices have discernibly dropped during the COVID-19 pandemic. Geographical variances in the impact are observed and quantified, and the bulk power market and power system operations in the northeast region are most severely affected. All the data sources, assessment criteria, and analysis codes reported in this paper are available on a GitHub repository.
Submitted 30 August, 2020;
originally announced March 2021.
-
A Single-Timescale Method for Stochastic Bilevel Optimization
Authors:
Tianyi Chen,
Yuejiao Sun,
Quan Xiao,
Wotao Yin
Abstract:
Stochastic bilevel optimization generalizes classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has been regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion and uses a single-timescale update with a fixed batch size. To achieve an $ε$-stationary point of the bilevel problem, STABLE requires ${\cal O}(ε^{-2})$ samples in total; and to achieve an $ε$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(ε^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for single-level stochastic optimization.
Submitted 30 March, 2022; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Musings about Constructions of Efficient Latin Hypercube Designs with Flexible Run-sizes
Authors:
Hongzhi Wang,
Qian Xiao,
Abhyuday Mandal
Abstract:
Efficient Latin hypercube designs (LHDs), including maximin distance LHDs, maximum projection LHDs and orthogonal LHDs, are widely used in computer experiments. It is challenging to construct such designs with flexible sizes, especially for large ones. In the current literature, various algebraic methods and search algorithms have been proposed for identifying efficient LHDs, each having its own pros and cons. In this paper, we review, summarize and compare some currently popular methods, aiming to provide guidance for experimenters on which method should be used in practice. With the R package we developed, which integrates and improves various algebraic and search methods, many of the designs found in this paper are better than the existing ones. They are easy for practitioners to use and can serve as benchmarks for future developments on LHDs.
Submitted 8 January, 2021; v1 submitted 18 October, 2020;
originally announced October 2020.
-
Distant Transfer Learning via Deep Random Walk
Authors:
Qiao Xiao,
Yu Zhang
Abstract:
Transfer learning, which aims to improve the learning performance in the target domain by leveraging useful knowledge from the source domain, often requires that those two domains be very close, which limits its application scope. Recently, distant transfer learning has been studied to transfer knowledge between two distant or even totally unrelated domains via auxiliary domains that are usually unlabeled, acting as a bridge in the spirit of human transitive inference, whereby it is possible to connect two completely unrelated concepts through gradual knowledge transfer. In this paper, we study distant transfer learning by proposing a DeEp Random Walk basEd distaNt Transfer (DERWENT) method. Different from existing distant transfer learning models that implicitly identify the path of knowledge transfer between the source and target instances through auxiliary instances, the proposed DERWENT model can explicitly learn such paths via the deep random walk technique. Specifically, based on sequences identified by the random walk technique on a data graph where source and target data have no direct edges, the proposed DERWENT model enforces adjacent data points in a sequence to be similar, makes the ending data point be represented by other data points in the same sequence, and considers weighted training losses of source data. Empirical studies on several benchmark datasets demonstrate that the proposed DERWENT algorithm yields state-of-the-art performance.
Submitted 13 June, 2020;
originally announced June 2020.
-
Supervised Whole DAG Causal Discovery
Authors:
Hebi Li,
Qi Xiao,
Jin Tian
Abstract:
We propose to address the task of causal structure learning from data in a supervised manner. Existing work on learning causal directions by supervised learning is restricted to learning pairwise relations and is not well suited for whole DAG discovery. We propose a novel approach that models whole DAG structure discovery as a supervised learning problem. To fit the problem at hand, we propose to use permutation equivariant models that align well with the problem domain. We evaluate the proposed approach extensively on synthetic graphs of size 10, 20, 50, and 100 and on real data, and show promising results compared with a variety of previous approaches.
Submitted 8 June, 2020;
originally announced June 2020.
-
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging
Authors:
Zhaohan Xiong,
Qing Xia,
Zhiqiang Hu,
Ning Huang,
Cheng Bian,
Yefeng Zheng,
Sulaiman Vesal,
Nishant Ravikumar,
Andreas Maier,
Xin Yang,
Pheng-Ann Heng,
Dong Ni,
Caizi Li,
Qianqian Tong,
Weixin Si,
Elodie Puybareau,
Younes Khoudli,
Thierry Geraud,
Chen Chen,
Wenjia Bai,
Daniel Rueckert,
Lingchao Xu,
Xiahai Zhuang,
Xinzhe Luo,
Shuman Jia
, et al. (19 additional authors not shown)
Abstract:
Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to their attenuated contrast. Since most clinical studies have relied on manual and labor-intensive approaches, automatic methods are of high interest, particularly optimized machine learning approaches. To address this, we organized the "2018 Left Atrium Segmentation Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset, and associated labels of the left atrium segmented by three medical experts, ultimately attracting the participation of 27 international teams. In this paper, extensive analysis of the submitted algorithms using technical and biological metrics was performed through subgroup analysis and hyper-parameter analysis, offering an overall picture of the major design choices of convolutional neural networks (CNNs) and practical considerations for achieving state-of-the-art left atrium segmentation. Results show the top method achieved a Dice score of 93.2% and a mean surface-to-surface distance of 0.7 mm, significantly outperforming the prior state-of-the-art. In particular, our analysis demonstrated that double, sequentially used CNNs, in which a first CNN is used for automatic region-of-interest localization and a subsequent CNN is used for refined regional segmentation, achieved far superior results to traditional methods and pipelines containing single CNNs. This large-scale benchmarking study makes a significant step towards much-improved segmentation methods for cardiac LGE-MRIs, and will serve as an important benchmark for evaluating and comparing future work in the field.
Submitted 7 May, 2020; v1 submitted 26 April, 2020;
originally announced April 2020.
-
Image denoising via K-SVD with primal-dual active set algorithm
Authors:
Quan Xiao,
Canhong Wen,
Zirui Yan
Abstract:
The K-SVD algorithm has been successfully applied to image denoising tasks for many years, but its bottleneck in speed and accuracy still needs to be addressed. For the sparse coding stage in K-SVD, which involves an $\ell_{0}$ constraint, prevailing methods usually seek approximate solutions greedily but are less effective once the noise level is high. The alternative $\ell_{1}$ optimization has been shown to be more powerful than $\ell_{0}$; however, its time consumption prevents it from being implemented. In this paper, we propose a new K-SVD framework called K-SVD$_P$ by applying the primal-dual active set (PDAS) algorithm to it. Different from K-SVD based on greedy algorithms, the K-SVD$_P$ algorithm develops a selection strategy motivated by the KKT (Karush-Kuhn-Tucker) condition and yields an efficient update in the sparse coding stage. Since the K-SVD$_P$ algorithm iteratively seeks an equivalent solution to the dual problem with a simple explicit expression in this denoising problem, speed and quality of denoising can be achieved simultaneously. Experiments are carried out and demonstrate that the denoising performance of our K-SVD$_P$ is comparable with state-of-the-art methods.
Submitted 19 January, 2020;
originally announced January 2020.
-
Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders
Authors:
Hebi Li,
Qi Xiao,
Shixin Tian,
Jin Tian
Abstract:
Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and every time a model needs to be protected, such an expensive training scheme needs to be performed. In this paper, we propose to apply an iterative adversarial training scheme to an external auto-encoder, which once trained can be used to protect other models directly. We empirically show that our model outperforms other purifying-based methods against white-box attacks, and transfers well to directly protect other base models with different architectures.
Submitted 26 May, 2019;
originally announced May 2019.
-
Application of Kriging Models for a Drug Combination Experiment on Lung Cancer
Authors:
Qian Xiao,
Lin Wang,
Hongquan Xu
Abstract:
Combinatorial drugs have been widely applied in disease treatment, especially chemotherapy for cancer, due to their improved efficacy and reduced toxicity compared with individual drugs. The study of combinatorial drugs requires efficient experimental designs and proper follow-up statistical modelling techniques. Linear and non-linear models are often used in the response surface modelling for such experiments. We propose the use of Kriging models to better depict the response surfaces of combinatorial drugs and take into account the measurement error. We further study how proper experimental designs can reduce the required number of runs. We illustrate our method via a combinatorial drug experiment on lung cancer. We demonstrate that only 27 runs are needed to predict all 512 runs in the original experiment and achieve better precision than the existing analysis.
Submitted 28 January, 2018;
originally announced January 2018.
-
Calculating correlation coefficient for Gaussian copula
Authors:
Qing Xiao
Abstract:
When a Gaussian copula with a linear correlation coefficient is used to model correlated random variables, one crucial issue is to determine a suitable correlation coefficient $ρ_z$ in normal space for two variables with correlation coefficient $ρ_x$. This paper attempts to address this problem. For two continuous variables, the marginal transformation is approximated by a weighted sum of Hermite polynomials; then, with Mehler's formula, a polynomial of $ρ_z$ is derived to approximate the functional relationship between $ρ_x$ and $ρ_z$. If a discrete variable is involved, the marginal transformation is decomposed into piecewise continuous ones, and $ρ_x$ is expressed as a polynomial of $ρ_z$ by Taylor expansion. For a given $ρ_x$, $ρ_z$ can be efficiently determined by solving a polynomial equation.
Submitted 2 August, 2016;
originally announced August 2016.
-
Generating correlated random vector by polynomial normal transformation
Authors:
Qing Xiao
Abstract:
This paper develops a polynomial normal transformation model, whereby various non-normal probability distributions can be simulated by the standard normal distribution. Two methods are presented to determine the coefficients of the polynomial model: (1) probability weighted moment (PWM) matching and (2) percentile matching. Compared to the existing raw moment or L-moment matching, the proposed methods are more computationally convenient, and can be used to estimate the coefficients of a polynomial model with a higher degree. Furthermore, for two correlated random variables, a polynomial equation is derived to estimate the equivalent correlation coefficient in standard normal space, and a random vector with non-normal marginal distributions and a prescribed correlation matrix can be generated. Finally, numerical examples are worked to demonstrate the proposed method.
Submitted 26 August, 2015;
originally announced August 2015.
-
A method for calculating quantile function and its further use for data fitting
Authors:
Qing Xiao
Abstract:
This paper introduces a polynomial transformation model based on the Weibull distribution, whereby the analytical representation of the quantile function for many probability distributions can be obtained. First, the target random variable $x$ with a specified distribution is expressed as a polynomial of a Weibull random variable $z$, whose coefficients are conveniently determined by the percentile matching method. Then, substituting $z$ with its quantile function $z=λ[-\ln(1-u)]^{1/k}$ gives the analytical expression of the quantile function of $x$. Furthermore, using the probability weighted moments matching method, this polynomial transformation model can be used for data fitting. Numerical experiments show that the proposed model is capable of handling some distributions close to binomial which are difficult for the extant approaches, and that the quantile functions of various distributions are accurately approximated within the probit range $[10^{-4},1-10^{-4}]$.
Submitted 25 August, 2015;
originally announced August 2015.
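As a minimal sketch of the Weibull quantile function quoted in the abstract above, $z=λ[-\ln(1-u)]^{1/k}$, the following Python snippet evaluates it and checks that it inverts the standard Weibull CDF $F(z)=1-e^{-(z/λ)^k}$. The function names and the parameter values are illustrative assumptions and are not taken from the paper; the polynomial transformation of $z$ into the target variable $x$ is not reproduced here.

```python
import math


def weibull_quantile(u, lam, k):
    """Quantile of a Weibull(scale=lam, shape=k) variable at probability u in [0, 1)."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    # log1p(-u) computes ln(1 - u) accurately when u is close to 0.
    return lam * (-math.log1p(-u)) ** (1.0 / k)


def weibull_cdf(z, lam, k):
    """Standard Weibull CDF, F(z) = 1 - exp(-(z / lam)^k), for z >= 0."""
    return -math.expm1(-((z / lam) ** k))


# Hypothetical scale and shape, chosen only for illustration.
lam, k = 2.0, 1.5
for u in (1e-4, 0.25, 0.5, 0.75, 1 - 1e-4):
    z = weibull_quantile(u, lam, k)
    # Round trip: F(z) should recover u up to floating-point error.
    print(f"u={u:.4f}  z={z:.6f}  F(z)={weibull_cdf(z, lam, k):.6f}")
```

The probabilities in the loop span the range $[10^{-4},1-10^{-4}]$ mentioned in the abstract; with $k=1$ and $λ=1$ the expression reduces to the exponential quantile $-\ln(1-u)$.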
-
Towards a Mathematical Foundation of Immunology and Amino Acid Chains
Authors:
Wen-Jun Shen,
Hau-San Wong,
Quan-Wu Xiao,
Xin Guo,
Stephen Smale
Abstract:
We attempt to set a mathematical foundation of immunology and amino acid chains. To measure the similarities of these chains, a kernel on strings is defined using only the sequence of the chains and a good amino acid substitution matrix (e.g. BLOSUM62). The kernel is used in learning machines to predict binding affinities of peptides to human leukocyte antigen DR (HLA-DR) molecules. On both fixed allele (Nielsen and Lund 2009) and pan-allele (Nielsen et al. 2010) benchmark databases, our algorithm achieves state-of-the-art performance. The kernel is also used to define a distance on an HLA-DR allele set, based on which a clustering analysis precisely recovers the serotype classifications assigned by WHO (Nielsen and Lund 2009, and Marsh et al. 2010). These results suggest that our kernel relates well the chain structure of both peptides and HLA-DR molecules to their biological functions, and that it offers a simple, powerful and promising methodology for immunology and amino acid chain studies.
Submitted 25 June, 2012; v1 submitted 28 May, 2012;
originally announced May 2012.