-
Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain
Authors:
Shengbo Wang,
Nian Si
Abstract:
Learning and optimal control under robust Markov decision processes (MDPs) have received increasing attention, yet most existing theory, algorithms, and applications focus on finite-horizon or discounted models. The average-reward formulation, while natural in many operations research and management contexts, remains underexplored. This is primarily because the dynamic programming foundations are…
▽ More
Learning and optimal control under robust Markov decision processes (MDPs) have received increasing attention, yet most existing theory, algorithms, and applications focus on finite-horizon or discounted models. The average-reward formulation, while natural in many operations research and management contexts, remains underexplored. This is primarily because the dynamic programming foundations are technically challenging and only partially understood, with several fundamental questions remaining open. This paper steps toward a general framework for average-reward robust MDPs by analyzing the constant-gain setting. We study the average-reward robust control problem with possible information asymmetries between the controller and an S-rectangular adversary. Our analysis centers on the constant-gain robust Bellman equation, examining both the existence of solutions and their relationship to the optimal average reward. Specifically, we identify when solutions to the robust Bellman equation characterize the optimal average reward and stationary policies, and we provide sufficient conditions ensuring solutions' existence. These findings expand the dynamic programming theory for average-reward robust MDPs and lay a foundation for robust dynamic decision making under long-run average criteria in operational environments.
△ Less
Submitted 17 September, 2025;
originally announced September 2025.
-
Representation-Aware Distributionally Robust Optimization: A Knowledge Transfer Framework
Authors:
Zitao Wang,
Nian Si,
Molei Liu
Abstract:
We propose REpresentation-Aware Distributionally Robust Estimation (READ), a novel framework for Wasserstein distributionally robust learning that accounts for predictive representations when guarding against distributional shifts. Unlike classical approaches that treat all feature perturbations equally, READ embeds a multidimensional alignment parameter into the transport cost, allowing the model…
▽ More
We propose REpresentation-Aware Distributionally Robust Estimation (READ), a novel framework for Wasserstein distributionally robust learning that accounts for predictive representations when guarding against distributional shifts. Unlike classical approaches that treat all feature perturbations equally, READ embeds a multidimensional alignment parameter into the transport cost, allowing the model to differentially discourage perturbations along directions associated with informative representations. This yields robustness to feature variation while preserving invariant structure. Our first contribution is a theoretical foundation: we show that seminorm regularizations for linear regression and binary classification arise as Wasserstein distributionally robust objectives, thereby providing tractable reformulations of READ and unifying a broad class of regularized estimators under the DRO lens. Second, we adopt a principled procedure for selecting the Wasserstein radius using the techniques of robust Wasserstein profile inference. This further enables the construction of valid, representation-aware confidence regions for model parameters with distinct geometric features. Finally, we analyze the geometry of READ estimators as the alignment parameters vary and propose an optimization algorithm to estimate the projection of the global optimum onto this solution surface. This procedure selects among equally robust estimators while optimally constructing a representation structure. We conclude by demonstrating the effectiveness of our framework through extensive simulations and a real-world study, providing a powerful robust estimation grounded in learning representation.
△ Less
Submitted 11 September, 2025;
originally announced September 2025.
-
Near-Optimal Sample Complexities of Divergence-based S-rectangular Distributionally Robust Reinforcement Learning
Authors:
Zhenghao Li,
Shengbo Wang,
Nian Si
Abstract:
Distributionally robust reinforcement learning (DR-RL) has recently gained significant attention as a principled approach that addresses discrepancies between training and testing environments. To balance robustness, conservatism, and computational traceability, the literature has introduced DR-RL models with SA-rectangular and S-rectangular adversaries. While most existing statistical analyses fo…
▽ More
Distributionally robust reinforcement learning (DR-RL) has recently gained significant attention as a principled approach that addresses discrepancies between training and testing environments. To balance robustness, conservatism, and computational traceability, the literature has introduced DR-RL models with SA-rectangular and S-rectangular adversaries. While most existing statistical analyses focus on SA-rectangular models, owing to their algorithmic simplicity and the optimality of deterministic policies, S-rectangular models more accurately capture distributional discrepancies in many real-world applications and often yield more effective robust randomized policies. In this paper, we study the empirical value iteration algorithm for divergence-based S-rectangular DR-RL and establish near-optimal sample complexity bounds of $\widetilde{O}(|\mathcal{S}||\mathcal{A}|(1-γ)^{-4}\varepsilon^{-2})$, where $\varepsilon$ is the target accuracy, $|\mathcal{S}|$ and $|\mathcal{A}|$ denote the cardinalities of the state and action spaces, and $γ$ is the discount factor. To the best of our knowledge, these are the first sample complexity results for divergence-based S-rectangular models that achieve optimal dependence on $|\mathcal{S}|$, $|\mathcal{A}|$, and $\varepsilon$ simultaneously. We further validate this theoretical dependence through numerical experiments on a robust inventory control problem and a theoretical worst-case example, demonstrating the fast learning performance of our proposed algorithm.
△ Less
Submitted 2 October, 2025; v1 submitted 17 May, 2025;
originally announced May 2025.
-
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning
Authors:
Zijun Chen,
Shengbo Wang,
Nian Si
Abstract:
Motivated by practical applications where stable long-term performance is critical-such as robotics, operations research, and healthcare-we study the problem of distributionally robust (DR) average-reward reinforcement learning. We propose two algorithms that achieve near-optimal sample complexity. The first reduces the problem to a DR discounted Markov decision process (MDP), while the second, An…
▽ More
Motivated by practical applications where stable long-term performance is critical-such as robotics, operations research, and healthcare-we study the problem of distributionally robust (DR) average-reward reinforcement learning. We propose two algorithms that achieve near-optimal sample complexity. The first reduces the problem to a DR discounted Markov decision process (MDP), while the second, Anchored DR Average-Reward MDP, introduces an anchoring state to stabilize the controlled transition kernels within the uncertainty set. Assuming the nominal MDP is uniformly ergodic, we prove that both algorithms attain a sample complexity of $\widetilde{O}\left(|\mathbf{S}||\mathbf{A}| t_{\mathrm{mix}}^2\varepsilon^{-2}\right)$ for estimating the optimal policy as well as the robust average reward under KL and $f_k$-divergence-based uncertainty sets, provided the uncertainty radius is sufficiently small. Here, $\varepsilon$ is the target accuracy, $|\mathbf{S}|$ and $|\mathbf{A}|$ denote the sizes of the state and action spaces, and $t_{\mathrm{mix}}$ is the mixing time of the nominal MDP. This represents the first finite-sample convergence guarantee for DR average-reward reinforcement learning. We further validate the convergence rates of our algorithms through numerical experiments.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
RemoteChess: Enhancing Older Adults' Social Connectedness via Designing a Virtual Reality Chinese Chess (Xiangqi) Community
Authors:
Qianjie Wei,
Xiaoying Wei,
Yiqi Liang,
Fan Lin,
Nuonan Si,
Mingming Fan
Abstract:
The decline of social connectedness caused by distance and physical limitations severely affects older adults' well-being and mental health. While virtual reality (VR) is promising for older adults to socialize remotely, existing social VR designs primarily focus on verbal communication (e.g., reminiscent, chat). Actively engaging in shared activities is also an important aspect of social connecti…
▽ More
The decline of social connectedness caused by distance and physical limitations severely affects older adults' well-being and mental health. While virtual reality (VR) is promising for older adults to socialize remotely, existing social VR designs primarily focus on verbal communication (e.g., reminiscent, chat). Actively engaging in shared activities is also an important aspect of social connection. We designed RemoteChess, which constructs a social community and a culturally relevant activity (i.e., Chinese chess) for older adults to play while engaging in social interaction. We conducted a user study with groups of older adults interacting with each other through RemoteChess. Our findings indicate that RemoteChess enhanced participants' social connectedness by offering familiar environments, culturally relevant social catalysts, and asymmetric interactions. We further discussed design guidelines for designing culturally relevant social activities in VR to promote social connectedness for older adults.
△ Less
Submitted 17 February, 2025;
originally announced February 2025.
-
Knowledge-Guided Wasserstein Distributionally Robust Optimization
Authors:
Zitao Wang,
Ziyuan Wang,
Molei Liu,
Nian Si
Abstract:
Transfer learning is a popular strategy to leverage external knowledge and improve statistical efficiency, particularly with a limited target sample. We propose a novel knowledge-guided Wasserstein Distributionally Robust Optimization (KG-WDRO) framework that adaptively incorporates multiple sources of external knowledge to overcome the conservativeness of vanilla WDRO, which often results in over…
▽ More
Transfer learning is a popular strategy to leverage external knowledge and improve statistical efficiency, particularly with a limited target sample. We propose a novel knowledge-guided Wasserstein Distributionally Robust Optimization (KG-WDRO) framework that adaptively incorporates multiple sources of external knowledge to overcome the conservativeness of vanilla WDRO, which often results in overly pessimistic shrinkage toward zero. Our method constructs smaller Wasserstein ambiguity sets by controlling the transportation along directions informed by the source knowledge. This strategy can alleviate perturbations on the predictive projection of the covariates and protect against information loss. Theoretically, we establish the equivalence between our WDRO formulation and the knowledge-guided shrinkage estimation based on collinear similarity, ensuring tractability and geometrizing the feasible set. This also reveals a novel and general interpretation for recent shrinkage-based transfer learning approaches from the perspective of distributional robustness. In addition, our framework can adjust for scaling differences in the regression models between the source and target and accommodates general types of regularization such as lasso and ridge. Extensive simulations demonstrate the superior performance and adaptivity of KG-WDRO in enhancing small-sample transfer learning.
△ Less
Submitted 12 February, 2025;
originally announced February 2025.
-
ScoreFusion: Fusing Score-based Generative Models via Kullback-Leibler Barycenters
Authors:
Hao Liu,
Junze Tony Ye,
Jose Blanchet,
Nian Si
Abstract:
We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point considers the family of KL barycenters of the auxiliary populations, which is proven to be an…
▽ More
We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point considers the family of KL barycenters of the auxiliary populations, which is proven to be an optimal parametric class in the KL sense, but difficult to learn. Nevertheless, by recasting the learning problem as score matching in denoising diffusion, we obtain a tractable way of computing the optimal KL barycenter weights. We prove a dimension-free sample complexity bound in total variation distance, provided that the auxiliary models are well-fitted for their own task and the auxiliary tasks combined capture the target well. The sample efficiency of ScoreFusion is demonstrated by learning handwritten digits. We also provide a simple adaptation of a Stable Diffusion denoising pipeline that enables sampling from the KL barycenter of two auxiliary checkpoints; on a portrait generation task, our method produces faces that enhance population heterogeneity relative to the auxiliary distributions.
△ Less
Submitted 16 April, 2025; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces
Authors:
Shengbo Wang,
Nian Si,
Jose Blanchet,
Zhengyuan Zhou
Abstract:
We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$. Here, $X$, $A$, and $W$ represent the state, action, and exogenous random noise processes, respectively, with $f$ denoting a known function that describes state transitions. Traditionally, the noise process $\{W_t, t \geq 0\}$ is assume…
▽ More
We explore the control of stochastic systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$. Here, $X$, $A$, and $W$ represent the state, action, and exogenous random noise processes, respectively, with $f$ denoting a known function that describes state transitions. Traditionally, the noise process $\{W_t, t \geq 0\}$ is assumed to be independent and identically distributed, with a distribution that is either fully known or can be consistently estimated. However, the occurrence of distributional shifts, typical in engineering settings, necessitates the consideration of the robustness of the policy. This paper introduces a distributionally robust stochastic control paradigm that accommodates possibly adaptive adversarial perturbation to the noise distribution within a prescribed ambiguity set. We examine two adversary models: current-action-aware and current-action-unaware, leading to different dynamic programming equations. Furthermore, we characterize the optimal finite sample minimax rates for achieving uniform learning of the robust value function across continuum states under both adversary types, considering ambiguity sets defined by $f_k$-divergence and Wasserstein distance. Finally, we demonstrate the applicability of our framework across various real-world settings.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Calibration-then-Calculation: A Variance Reduced Metric Framework in Deep Click-Through Rate Prediction Models
Authors:
Yewen Fan,
Nian Si,
Xiangchen Song,
Kun Zhang
Abstract:
The adoption of deep learning across various fields has been extensive, yet there is a lack of focus on evaluating the performance of deep learning pipelines. Typically, with the increased use of large datasets and complex models, the training process is run only once and the result is compared to previous benchmarks. This practice can lead to imprecise comparisons due to the variance in neural ne…
▽ More
The adoption of deep learning across various fields has been extensive, yet there is a lack of focus on evaluating the performance of deep learning pipelines. Typically, with the increased use of large datasets and complex models, the training process is run only once and the result is compared to previous benchmarks. This practice can lead to imprecise comparisons due to the variance in neural network evaluation metrics, which stems from the inherent randomness in the training process. Traditional solutions, such as running the training process multiple times, are often infeasible due to computational constraints. In this paper, we introduce a novel metric framework, the Calibrated Loss Metric, designed to address this issue by reducing the variance present in its conventional counterpart. Consequently, this new metric enhances the accuracy in detecting effective modeling improvements. Our approach is substantiated by theoretical justifications and extensive experimental validations within the context of Deep Click-Through Rate Prediction Models.
△ Less
Submitted 17 May, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Seller-Side Experiments under Interference Induced by Feedback Loops in Two-Sided Platforms
Authors:
Zhihua Zhu,
Zheng Cai,
Liang Zheng,
Nian Si
Abstract:
Two-sided platforms are central to modern commerce and content sharing and often utilize A/B testing for developing new features. While user-side experiments are common, seller-side experiments become crucial for specific interventions and metrics. This paper investigates the effects of interference caused by feedback loops on seller-side experiments in two-sided platforms, with a particular focus…
▽ More
Two-sided platforms are central to modern commerce and content sharing and often utilize A/B testing for developing new features. While user-side experiments are common, seller-side experiments become crucial for specific interventions and metrics. This paper investigates the effects of interference caused by feedback loops on seller-side experiments in two-sided platforms, with a particular focus on the counterfactual interleaving design, proposed in \citet{ha2020counterfactual,nandy2021b}. These feedback loops, often generated by pacing algorithms, cause outcomes from earlier sessions to influence subsequent ones. This paper contributes by creating a mathematical framework to analyze this interference, theoretically estimating its impact, and conducting empirical evaluations of the counterfactual interleaving design in real-world scenarios. Our research shows that feedback loops can result in misleading conclusions about the treatment effects.
△ Less
Submitted 9 February, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model Editing
Authors:
Nianwen Si,
Hao Zhang,
Weiqiang Zhang
Abstract:
Large language models are known for encoding a vast amount of factual knowledge, but they often becomes outdated due to the ever-changing nature of external information. A promising solution to this challenge is the utilization of model editing methods to update the knowledge in an efficient manner. However, the majority of existing model editing techniques are limited to monolingual frameworks, t…
▽ More
Large language models are known for encoding a vast amount of factual knowledge, but they often becomes outdated due to the ever-changing nature of external information. A promising solution to this challenge is the utilization of model editing methods to update the knowledge in an efficient manner. However, the majority of existing model editing techniques are limited to monolingual frameworks, thus failing to address the crucial issue of cross-lingual knowledge synchronization for multilingual models. To tackle this problem, we propose a simple yet effective method that trains multilingual patch neuron to store cross-lingual knowledge. It can be easily adapted to existing approaches to enhance their cross-lingual editing capabilities. To evaluate our method, we conduct experiments using both the XNLI dataset and a self-constructed XFEVER dataset. Experimental results demonstrate that our proposed method achieves improved performance in cross-lingual editing tasks without requiring excessive modifications to the original methodology, thereby showcasing its user-friendly characteristics. Codes will be released soon.
△ Less
Submitted 6 January, 2024;
originally announced January 2024.
-
Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
Authors:
Nianwen Si,
Hao Zhang,
Heyu Chang,
Wenlin Zhang,
Dan Qu,
Weiqiang Zhang
Abstract:
In recent years, large language models (LLMs) have spurred a new research paradigm in natural language processing. Despite their excellent capability in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application. The challenge of mitigating this issue and transforming these models into purer assistants is crucia…
▽ More
In recent years, large language models (LLMs) have spurred a new research paradigm in natural language processing. Despite their excellent capability in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application. The challenge of mitigating this issue and transforming these models into purer assistants is crucial for their widespread applicability. Unfortunately, Retraining LLMs repeatedly to eliminate undesirable knowledge is impractical due to their immense parameters. Knowledge unlearning, derived from analogous studies on machine unlearning, presents a promising avenue to address this concern and is notably advantageous in the context of LLMs. It allows for the removal of harmful knowledge in an efficient manner, without affecting unrelated knowledge in the model. To this end, we provide a survey of knowledge unlearning in the era of LLMs. Firstly, we formally define the knowledge unlearning problem and distinguish it from related works. Subsequently, we categorize existing knowledge unlearning methods into three classes: those based on parameter optimization, parameter merging, and in-context learning, and introduce details of these unlearning methods. We further present evaluation datasets used in existing methods, and finally conclude this survey by presenting the ongoing challenges and future directions.
△ Less
Submitted 7 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
On the Foundation of Distributionally Robust Reinforcement Learning
Authors:
Shengbo Wang,
Nian Si,
Jose Blanchet,
Zhengyuan Zhou
Abstract:
Motivated by the need for a robust policy in the face of environment shifts between training and deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around robust Markov decision processes (RMDPs). This framework obliges the decision maker to choose an optimal poli…
▽ More
Motivated by the need for a robust policy in the face of environment shifts between training and deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around robust Markov decision processes (RMDPs). This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct RMDPs that embrace various modeling attributes for both the decision maker and the adversary. These attributes include the structure of information availability-covering history-dependent, Markov, and Markov time-homogeneous dynamics-as well as constraints on the shifts induced by the adversary, with a focus on SA- and S-rectangularity. Within this RMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of DPP holds significant implications, as the vast majority of existing data and computationally efficient DRRL algorithms are reliant on the DPP. To investigate its existence, we systematically analyze various combinations of controller and adversary attributes, presenting streamlined proofs based on a unified methodology. We then construct counterexamples for settings where a fully general DPP fails to hold and establish asymptotically optimal history-dependent policies for key scenarios where the DPP is absent.
△ Less
Submitted 24 August, 2025; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Tackling Interference Induced by Data Training Loops in A/B Tests: A Weighted Training Approach
Authors:
Nian Si
Abstract:
In modern recommendation systems, the standard pipeline involves training machine learning models on historical data to predict user behaviors and improve recommendations continuously. However, these data training loops can introduce interference in A/B tests, where data generated by control and treatment algorithms, potentially with different distributions, are combined. To address these challeng…
▽ More
In modern recommendation systems, the standard pipeline involves training machine learning models on historical data to predict user behaviors and improve recommendations continuously. However, these data training loops can introduce interference in A/B tests, where data generated by control and treatment algorithms, potentially with different distributions, are combined. To address these challenges, we introduce a novel approach called weighted training. This approach entails training a model to predict the probability of each data point appearing in either the treatment or control data and subsequently applying weighted losses during model training. We demonstrate that this approach achieves the least variance among all estimators that do not cause shifts in the training distributions. Through simulation studies, we demonstrate the lower bias and variance of our approach compared to other methods.
△ Less
Submitted 4 April, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Tuning Large language model for End-to-end Speech Translation
Authors:
Hao Zhang,
Nianwen Si,
Yaqi Chen,
Wenlin Zhang,
Xukui Yang,
Dan Qu,
Xiaolin Jiao
Abstract:
With the emergence of large language models (LLMs), multimodal models based on LLMs have demonstrated significant potential. Models such as LLaSM, X-LLM, and SpeechGPT exhibit an impressive ability to comprehend and generate human instructions. However, their performance often falters when faced with complex tasks like end-to-end speech translation (E2E-ST), a cross-language and cross-modal transl…
▽ More
With the emergence of large language models (LLMs), multimodal models based on LLMs have demonstrated significant potential. Models such as LLaSM, X-LLM, and SpeechGPT exhibit an impressive ability to comprehend and generate human instructions. However, their performance often falters when faced with complex tasks like end-to-end speech translation (E2E-ST), a cross-language and cross-modal translation task. In comparison to single-modal models, multimodal models lag behind in these scenarios. This paper introduces LST, a Large multimodal model designed to excel at the E2E-ST task. LST consists of a speech frontend, an adapter, and a LLM backend. The training of LST consists of two stages: (1) Modality adjustment, where the adapter is tuned to align speech representation with text embedding space, and (2) Downstream task fine-tuning, where both the adapter and LLM model are trained to optimize performance on the E2EST task. Experimental results on the MuST-C speech translation benchmark demonstrate that LST-13B achieves BLEU scores of 30.39/41.55/35.33 on En-De/En-Fr/En-Es language pairs, surpassing previous models and establishing a new state-of-the-art. Additionally, we conduct an in-depth analysis of single-modal model selection and the impact of training strategies, which lays the foundation for future research. We will open up our code and models after review.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks
Authors:
Baris Ata,
J. Michael Harrison,
Nian Si
Abstract:
Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $θ(t)$ at each time $t$…
▽ More
Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $θ(t)$ at each time $t$ based on the history of $Z$, and the cost rate at time $t$ depends on both $Z(t)$ and $θ(t)$. In our initial problem formulation, the objective is to minimize expected discounted cost over an infinite planning horizon, after which we treat the corresponding ergodic control problem. Extending earlier work by Han et al. (Proceedings of the National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a simulation-based computational method that relies heavily on deep neural network technology. For test problems studied thus far, our method is accurate to within a fraction of one percent, and is computationally feasible in dimensions up to at least $d=30$.
△ Less
Submitted 7 August, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Sample Complexity of Variance-reduced Distributionally Robust Q-learning
Authors:
Shengbo Wang,
Nian Si,
Jose Blanchet,
Zhengyuan Zhou
Abstract:
Dynamic decision-making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment in which the data is collected can differ from that of the environment in which the model is deployed. This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced coun…
▽ More
Dynamic decision-making under distributional shifts is of fundamental interest in theory and applications of reinforcement learning: The distribution of the environment in which the data is collected can differ from that of the environment in which the model is deployed. This paper presents two novel model-free algorithms, namely the distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the $q$-function of an infinite-horizon $γ$-discounted robust Markov decision process with Kullback-Leibler ambiguity set to an entry-wise $ε$-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines the synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minimax sample complexity upper bound of $\tilde O(|\mathbf{S}||\mathbf{A}|(1-γ)^{-4}ε^{-2})$, where $\mathbf{S}$ and $\mathbf{A}$ denote the state and action spaces. This is the first complexity result that is independent of the ambiguity size $δ$, thereby providing new complexity theoretic insights. Additionally, a series of numerical experiments confirm the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
△ Less
Submitted 4 September, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Authors:
Hao Zhang,
Nianwen Si,
Yaqi Chen,
Wenlin Zhang,
Xukui Yang,
Dan Qu,
Wei-Qiang Zhang
Abstract:
The end-to-end speech translation (E2E-ST) model has gradually become a mainstream paradigm due to its low latency and less error propagation. However, it is non-trivial to train such a model well due to the task complexity and data scarcity. The speech-and-text modality differences result in the E2E-ST model performance usually inferior to the corresponding machine translation (MT) model. Based o…
▽ More
The end-to-end speech translation (E2E-ST) model has gradually become a mainstream paradigm due to its low latency and less error propagation. However, it is non-trivial to train such a model well due to the task complexity and data scarcity. The speech-and-text modality differences result in the E2E-ST model performance usually inferior to the corresponding machine translation (MT) model. Based on the above observation, existing methods often use sharingmechanisms to carry out implicit knowledge transfer by imposing various constraints. However, the final model often performs worse on the MT task than the MT model trained alone, which means that the knowledge transfer ability of this method is also limited. To deal with these problems, we propose the FCCL (Fine- and Coarse- Granularity Contrastive Learning) approach for E2E-ST, which makes explicit knowledge transfer through cross-modal multi-grained contrastive learning. A key ingredient of our approach is applying contrastive learning at both sentence- and frame-level to give the comprehensive guide for extracting speech representations containing rich semantic information.In addition, we adopt a simple whitening method to alleviate the representation degeneration in the MT model, which adversely affects contrast learning. Experiments on the MuST-C benchmark show that our proposed approach significantly outperforms the state-of-the-art E2E-ST baselines on all eight language pairs. Further analysis indicates that FCCL can free up its capacity from learning grammatical structure information and force more layers to learn semantic information.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
Authors:
Hao Zhang,
Nianwen Si,
Yaqi Chen,
Wenlin Zhang,
Xukui Yang,
Dan Qu,
Zhen Li
Abstract:
Existing techniques often attempt to make knowledge transfer from a powerful machine translation (MT) to speech translation (ST) model with some elaborate techniques, which often requires transcription as extra input during training. However, transcriptions are not always available, and how to improve the ST model performance without transcription, i.e., data efficiency, has rarely been studied in…
▽ More
Existing techniques often attempt to make knowledge transfer from a powerful machine translation (MT) to speech translation (ST) model with some elaborate techniques, which often requires transcription as extra input during training. However, transcriptions are not always available, and how to improve the ST model performance without transcription, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD) from data perspective to improve the data efficiency. Our method follows the knowledge distillation paradigm. However, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcription and MT model. Then we decouple the classic knowledge distillation loss into target and non-target distillation to enhance the effect of the knowledge among non-target logits, which is the prominent "dark knowledge". Experiments on MuST-C corpus show that, the proposed method can achieve consistent improvement over the strong baseline without requiring any transcription.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
A Finite Sample Complexity Bound for Distributionally Robust Q-learning
Authors:
Shengbo Wang,
Nian Si,
Jose Blanchet,
Zhengyuan Zhou
Abstract:
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prov…
▽ More
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $ε$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-γ)^{-5}ε^{-2}p_{\wedge}^{-6}δ^{-4})$, where $γ$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $δ$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
△ Less
Submitted 31 July, 2024; v1 submitted 25 February, 2023;
originally announced February 2023.
-
Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
Authors:
Yewen Fan,
Nian Si,
Kun Zhang
Abstract:
Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximizatio…
▽ More
Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximization bias". Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions can be achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we theorize the quantification of maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problems under covariate shifts, neither incurring additional online serving costs nor compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
△ Less
Submitted 21 March, 2023; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Testing Group Fairness via Optimal Transport Projections
Authors:
Nian Si,
Karthyek Murthy,
Jose Blanchet,
Viet Anh Nguyen
Abstract:
We present a statistical testing framework to detect if a given machine learning classifier fails to satisfy a wide range of group fairness notions. The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data. The statistical challenges, which may arise from multiple impact…
▽ More
We present a statistical testing framework to detect if a given machine learning classifier fails to satisfy a wide range of group fairness notions. The proposed test is a flexible, interpretable, and statistically rigorous tool for auditing whether exhibited biases are intrinsic to the algorithm or due to the randomness in the data. The statistical challenges, which may arise from multiple impact criteria that define group fairness and which are discontinuous on model parameters, are conveniently tackled by projecting the empirical measure onto the set of group-fair probability models using optimal transport. This statistic is efficiently computed using linear programming and its asymptotic distribution is explicitly obtained. The proposed framework can also be used to test for testing composite fairness hypotheses and fairness with multiple sensitive attributes. The optimal transport testing formulation improves interpretability by characterizing the minimal covariate perturbations that eliminate the bias observed in the audit.
△ Less
Submitted 2 June, 2021;
originally announced June 2021.
-
Robust Bayesian Classification Using an Optimistic Score Ratio
Authors:
Viet Anh Nguyen,
Nian Si,
Jose Blanchet
Abstract:
We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that is most plausible to explain the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set…
▽ More
We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that is most plausible to explain the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set which is prescribed using a limited structural constraint on the mean vector and the covariance matrix of the underlying contextual distribution. We show that the Bayesian classifier using the optimistic score ratio is conceptually attractive, delivers solid statistical guarantees and is computationally tractable. We showcase the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.
△ Less
Submitted 8 July, 2020;
originally announced July 2020.
-
Distributionally Robust Batch Contextual Bandits
Authors:
Nian Si,
Fan Zhang,
Zhengyuan Zhou,
Jose Blanchet
Abstract:
Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the s…
▽ More
Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
△ Less
Submitted 11 September, 2023; v1 submitted 9 June, 2020;
originally announced June 2020.