Showing 1–8 of 8 results for author: Ying, D

  1. arXiv:2502.12391  [pdf, other]

    cs.LG

    Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

    Authors: Junyu Guo, Zhi Zheng, Donghao Ying, Ming Jin, Shangding Gu, Costas Spanos, Javad Lavaei

    Abstract: Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset -- common in realistic tasks to prevent unsafe exploration. To address this, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy…

    Submitted 17 February, 2025; originally announced February 2025.

  2. arXiv:2405.14741  [pdf, other]

    math.OC cs.LG stat.ML

    Subsampled Ensemble Can Improve Generalization Tail Exponentially

    Authors: Huajie Qian, Donghao Ying, Henry Lam, Wotao Yin

    Abstract: Ensemble learning is a popular technique to improve the accuracy of machine learning models. It traditionally hinges on the rationale that aggregating multiple weak models can lead to better models with lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on ensembling. By selecting the best model trained on subsamples v…

    Submitted 1 February, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 42 pages, 18 figures

  3. arXiv:2305.17568  [pdf, other]

    cs.LG math.OC

    Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

    Authors: Donghao Ying, Yunkai Zhang, Yuhao Ding, Alec Koppel, Javad Lavaei

    Abstract: We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by general utilities, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploratio…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: 50 pages

  4. arXiv:2305.17567  [pdf, other]

    cs.GT math.OC

    No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand

    Authors: Mengzi Amy Guo, Donghao Ying, Javad Lavaei, Zuo-Jun Max Shen

    Abstract: This work is dedicated to the algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider the dynamic price competition between two firms operating within an opaque marketplace, where each firm lacks information about its competitor. The demand follows the multinomial logit (MNL) choice model, which depends on the consumers' observed price and t…

    Submitted 27 May, 2023; originally announced May 2023.

  5. arXiv:2302.07938  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Scalable Multi-Agent Reinforcement Learning with General Utilities

    Authors: Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei

    Abstract: We study the scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of the team's local utility functions without the full observability of each agent in the team. By exploiting the spatial correlation decay property of th…

    Submitted 26 August, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Supplementary material for the contribution to American Control Conference 2023 under the same title

  6. arXiv:2205.10715  [pdf, other]

    cs.LG math.OC

    Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

    Authors: Donghao Ying, Mengzi Amy Guo, Hyunin Lee, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen

    Abstract: We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the chal…

    Submitted 26 May, 2024; v1 submitted 21 May, 2022; originally announced May 2022.

  7. arXiv:2110.08923  [pdf, ps, other]

    cs.LG math.OC

    A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

    Authors: Donghao Ying, Yuhao Ding, Javad Lavaei

    Abstract: We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be…

    Submitted 7 April, 2023; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: 24 pages, AISTATS 2022

  8. Kronecker Product Correlation Model and Limited Feedback Codebook Design in a 3D Channel Model

    Authors: Dawei Ying, Frederick W. Vook, Timothy A. Thomas, David J. Love, Amitava Ghosh

    Abstract: A 2D antenna array introduces a new level of control and additional degrees of freedom in multiple-input-multiple-output (MIMO) systems particularly for the so-called "massive MIMO" systems. To accurately assess the performance gains of these large arrays, existing azimuth-only channel models have been extended to handle 3D channels by modeling both the elevation and azimuth dimensions. In this pa…

    Submitted 13 January, 2014; originally announced January 2014.

    Comments: 6 pages, 5 figures, to appear at IEEE ICC 2014