Showing 1–10 of 10 results for author: Zhan, S S

  1. arXiv:2506.00131  [pdf, ps, other]

    cs.LG cs.AI

    Adapting Offline Reinforcement Learning with Online Delays

    Authors: Simon Sinong Zhan, Qingyuan Wu, Frank Yang, Xiangyu Shi, Chao Huang, Qi Zhu

    Abstract: Offline-to-online deployment of reinforcement-learning (RL) agents must bridge two gaps: (1) the sim-to-real gap, where real systems add latency and other imperfections not present in simulation, and (2) the interaction gap, where policies trained purely offline face out-of-distribution states during online execution because gathering new interaction data is costly or risky. Agents therefore have…

    Submitted 30 May, 2025; originally announced June 2025.

  2. arXiv:2505.00546  [pdf, ps, other]

    cs.LG

    Directly Forecasting Belief for Reinforcement Learning with Delays

    Authors: Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

    Abstract: Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. This can cause the accumulation of compounding errors. To tackle this problem, our novel belief estimat…

    Submitted 7 June, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Journal ref: 42nd International Conference on Machine Learning, ICML 2025

  3. arXiv:2412.02931  [pdf, other]

    cs.LG cs.AI eess.SY

    Inverse Delayed Reinforcement Learning

    Authors: Simon Sinong Zhan, Qingyuan Wu, Zhian Ruan, Frank Yang, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

    Abstract: Inverse Reinforcement Learning (IRL) has demonstrated effectiveness in a variety of imitation tasks. In this paper, we introduce an IRL framework designed to extract rewarding features from expert trajectories affected by delayed disturbances. Instead of relying on direct observations, our approach employs an efficient off-policy adversarial training framework to derive expert features and recover…

    Submitted 3 December, 2024; originally announced December 2024.

  4. arXiv:2410.03847  [pdf, other]

    cs.LG cs.AI

    Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments

    Authors: Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

    Abstract: In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments where theoretical results cannot hold and performance is degraded. To address this issue, we propose a novel method which infuses the dynamics information into the reward shaping with the theoretical guarantee for the induced optimal policy in the stochastic env…

    Submitted 4 October, 2024; originally announced October 2024.

  5. arXiv:2408.08592  [pdf, other]

    cs.RO

    Case Study: Runtime Safety Verification of Neural Network Controlled System

    Authors: Frank Yang, Sinong Simon Zhan, Yixuan Wang, Chao Huang, Qi Zhu

    Abstract: Neural networks are increasingly used in safety-critical applications such as robotics and autonomous vehicles. However, the deployment of neural-network-controlled systems (NNCSs) raises significant safety concerns. Many recent advances overlook critical aspects of verifying control and ensuring safety in real-time scenarios. This paper presents a case study on using POLAR-Express, a state-of-the…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 15 pages, 5 figures, submitted to Runtime Verification 2024

  6. arXiv:2405.14226  [pdf, other]

    cs.LG cs.AI

    Variational Delayed Policy Optimization

    Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang

    Abstract: In environments with delayed observation, state augmentation by including actions within the delay window is adopted to retrieve Markovian property to enable reinforcement learning (RL). However, state-of-the-art (SOTA) RL techniques with Temporal-Difference (TD) learning frameworks often suffer from learning inefficiency, due to the significant expansion of the augmented state space with the dela…

    Submitted 21 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024 (Spotlight)

  7. arXiv:2402.03141  [pdf, other]

    cs.LG cs.AI eess.SY

    Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

    Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

    Abstract: Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary ta…

    Submitted 5 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  8. arXiv:2312.00812  [pdf, other]

    cs.AI cs.LG eess.SY

    Empowering Autonomous Driving with Large Language Models: A Safety Perspective

    Authors: Yixuan Wang, Ruochen Jiao, Sinong Simon Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

    Abstract: Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data. To this end, this paper explores the integration of Large Language Models (LLMs) into AD systems, leveraging their rob…

    Submitted 22 March, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

    Comments: Accepted to LLMAgent workshop @ ICLR 2024

  9. arXiv:2311.02227  [pdf, other]

    cs.LG cs.AI eess.SY

    State-Wise Safe Reinforcement Learning With Pixel Observations

    Authors: Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu

    Abstract: In the context of safe exploration, Reinforcement Learning (RL) has long grappled with the challenges of balancing the tradeoff between maximizing rewards and minimizing safety violations, particularly in complex environments with contact-rich or non-smooth dynamics, and when dealing with high-dimensional pixel observations. Furthermore, incorporating state-wise safety constraints in the explorati…

    Submitted 11 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  10. arXiv:2209.15090  [pdf, other]

    eess.SY cs.LG

    Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

    Authors: Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu

    Abstract: It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions. Many popular safe RL methods such as those based on the Constrained Markov Decision Process (CMDP) paradigm formulate safety violations in a cost function and try to constrain…

    Submitted 13 June, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted to ICML 2023