-
Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems
Authors:
Kesav Kaza,
Rahul Meshram,
Varun Mehta,
S. N. Merchant
Abstract:
This paper studies a class of constrained restless multi-armed bandits (CRMAB). The constraints take the form of a time-varying set of actions (the set of available arms). This variation can be either stochastic or semi-deterministic. Given a set of arms, a fixed number of them can be chosen to be played in each decision interval. The play of each arm yields a state-dependent reward. The current states of arms are partially observable through binary feedback signals from the arms that are played. The current availability of arms is fully observable. The objective is to maximize the long-term cumulative reward. Uncertainty about the future availability of arms, together with partial state information, makes this objective challenging. Applications of CRMAB can be found in resource allocation in cyber-physical systems involving components with time-varying availability.
First, this optimization problem is analyzed using Whittle's index policy. To this end, a constrained restless single-armed bandit is studied. It is shown to admit a threshold-type optimal policy and to be indexable. An algorithm to compute Whittle's index is presented. An alternative solution method with lower complexity is also presented in the form of an online rollout policy. A detailed discussion of the complexity of both schemes is also presented, which suggests that the online rollout policy with a short lookahead is simpler to implement than Whittle's index computation. Further, upper bounds on the value function are derived in order to estimate the degree of sub-optimality of the various solutions. The simulation study compares the performance of the Whittle's index, online rollout, myopic and modified Whittle's index policies.
Submitted 6 September, 2021; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Sequential Decision Making with Limited Observation Capability: Application to Wireless Networks
Authors:
Kesav Kaza,
Rahul Meshram,
Varun Mehta,
S. N. Merchant
Abstract:
This work studies a generalized class of restless multi-armed bandits with hidden states that allows cumulative feedback, as opposed to the conventional instantaneous feedback. We call them lazy restless bandits (LRB), as the events of decision-making are sparser than the events of state transition. Hence, the feedback after each decision event is the cumulative effect of the following state transition events. The states of arms are hidden from the decision-maker and the rewards for actions are state-dependent. The decision-maker needs to choose one arm in each decision interval such that the long-term cumulative reward is maximized.
As the states are hidden, the decision-maker maintains and updates its belief about them. It is shown that LRBs admit an optimal policy which has a threshold structure in belief space. The Whittle-index policy for solving the LRB problem is analyzed; indexability of LRBs is shown. Further, closed-form index expressions are provided for two sets of special cases; for more general cases, an algorithm for index computation is provided. An extensive simulation study is presented; Whittle-index, modified Whittle-index and myopic policies are compared. Lagrangian relaxation of the problem provides an upper bound on the optimal value function; it is used to assess the degree of sub-optimality of the various policies.
Submitted 29 January, 2019; v1 submitted 4 January, 2018;
originally announced January 2018.
-
Symbiotic Cognitive Relaying with mobile Secondary nodes in Cognitive Radio Networks
Authors:
Prakash Chaki,
Gouri Nawathe,
Aaqib Patel,
S. N. Merchant,
U. B. Desai
Abstract:
In a Symbiotic Cognitive Relaying (SCR) scenario, the Secondary User (SU) nodes can act as multihop relays to assist the communication between Primary User (PU) nodes in the case of a weak direct link. In return, the SU nodes are incentivised with the right to carry out SU-SU communication using the licensed PU band for a fixed amount of time, referred to as the 'Time Incentive'. Existing work on SCR is constrained to a fixed ad-hoc SU network. In this paper, we introduce mobility in SCR by considering mobile SU nodes while keeping the PU nodes fixed. This paper uses a specific mobility pattern and routing strategy for the SU nodes to propose theoretical bounds on the throughput and delay of PU-PU transmission. We analytically derive the least throughput and maximum delay possible in our model.
Submitted 18 December, 2012;
originally announced December 2012.
-
Capacity and Spectral Efficiency of Interference Avoiding Cognitive Radio with Imperfect Detection
Authors:
Aaqib Patel,
Md. Zafar Ali Khan,
S. N. Merchant,
U. B. Desai
Abstract:
In this paper, we consider a model in which the unlicensed or Secondary User (SU) equipped with a Cognitive Radio (CR) (together referred to as CR) interweaves its transmission with that of the licensed or Primary User (PU). In this model, when the CR detects the PU to be (i) busy, it does not transmit, and (ii) idle, it transmits. Two situations based on the CR's detection of the PU are considered, where the CR detects the PU (i) perfectly, referred to as the "ideal case", and (ii) imperfectly, referred to as the "non-ideal case". For both cases we bring out the rate region, the sum capacity of PU and CR, and the spectral efficiency factor, defined as the ratio of the sum capacity of PU and CR to the capacity of PU without CR. We consider the Rayleigh fading channel to provide insight into our results. For the ideal case we study the effect of PU occupancy on the spectral efficiency factor. For the non-ideal case, in addition to the effect of occupancy, we study the effect of false alarm and missed detection on the rate region and spectral efficiency factor. We characterize the set of values of false alarm and missed detection probabilities for which the system benefits, in the form of admissible regions. We show that false alarm has a more profound effect on the spectral efficiency factor than missed detection. We also show that when PU occupancy is small, the effects of both false alarm and missed detection decrease. Finally, for the standard detection techniques, viz. energy detection, matched filter and magnitude squared coherence, we show that the matched filter performs best, followed by magnitude squared coherence, followed by energy detection, with respect to the spectral efficiency factor.
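As a notational sketch of the spectral efficiency factor described in this abstract: the symbols $\eta$, $C_{\mathrm{sum}}$ and $C_{\mathrm{PU}}$ are assumed here for illustration and may differ from the paper's own notation.

```latex
% Assumed notation (not taken from the paper):
%   C_sum : sum capacity of PU and CR operating together
%   C_PU  : capacity of PU without CR
\[
  \eta \;=\; \frac{C_{\mathrm{sum}}}{C_{\mathrm{PU}}}
\]
```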
Submitted 15 May, 2012;
originally announced May 2012.