Search | arXiv e-print repository

arXiv:1902.09467

Reinforcement Learning to Minimize Age of Information with an Energy Harvesting Sensor with HARQ and Sensing Cost

Authors: Elif Tuğçe Ceran, Deniz Gündüz, András György

Abstract: The time average expected age of information (AoI) is studied for status updates sent from an energy-harvesting transmitter with a finite-capacity battery. The optimal scheduling policy is first studied under different feedback mechanisms when the channel and energy harvesting statistics are known. For the case of unknown environments, an average-cost reinforcement learning algorithm is proposed that learns the system parameters and the status update policy in real time. The effectiveness of the proposed methods is verified through numerical results.

Submitted 24 January, 2019; originally announced February 2019.

arXiv:1712.07084

A Reinforcement-Learning Approach to Proactive Caching in Wireless Networks

Authors: Samuel O. Somuyiwa, Andras Gyorgy, Deniz Gunduz

Abstract: We consider a mobile user accessing contents in a dynamic environment, where new contents are generated over time (by the user's contacts), and remain relevant to the user for random lifetimes. The user, equipped with a finite-capacity cache memory, randomly accesses the system, and requests all the relevant contents at the time of access. The system incurs an energy cost associated with the number of contents downloaded and the channel quality at that time. Assuming causal knowledge of the channel quality, the content profile, and the user-access behavior, we model the proactive caching problem as a Markov decision process with the goal of minimizing the long-term average energy cost. We first prove the optimality of a threshold-based proactive caching scheme, which dynamically caches or removes appropriate contents from the memory, prior to being requested by the user, depending on the channel state. The optimal threshold values depend on the system state, and hence, are computationally intractable. Therefore, we propose parametric representations for the threshold values, and use reinforcement-learning algorithms to find near-optimal parametrizations. We demonstrate through simulations that the proposed schemes significantly outperform classical reactive downloading, and perform very close to a genie-aided lower bound.

Submitted 19 December, 2017; originally announced December 2017.

arXiv:1609.06331

Max-affine estimators for convex stochastic programming

Authors: Gábor Balázs, András György, Csaba Szepesvári

Abstract: In this paper, we consider two sequential decision making problems with a convexity structure, namely an energy storage optimization task and a multi-product assembly example. We formulate these problems in the stochastic programming framework and discuss an approximate dynamic programming technique for their solutions. As the cost-to-go functions are convex in these cases, we use max-affine estimates for their approximations. To train such a max-affine estimate, we provide a new convex regression algorithm, and evaluate it empirically for these planning scenarios.

Submitted 18 September, 2016; originally announced September 2016.

Showing 1–3 of 3 results for author: György, A