Showing 1–50 of 66 results for author: Proutiere, A

  1. arXiv:2506.01324

    stat.ML cs.IT cs.LG math.PR

    Near-Optimal Clustering in Mixture of Markov Chains

    Authors: Junghyun Lee, Yassir Jedra, Alexandre Proutière, Se-Young Yun

    Abstract: We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$. The goal is to accurately group trajectories according to their underlying generative model. We begin by deriving an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 36 pages

  2. arXiv:2505.15342

    stat.ML cs.LG math.ST

    Policy Testing in Markov Decision Processes

    Authors: Kaito Ariu, Po-An Wang, Alexandre Proutiere, Kenshi Abe

    Abstract: We study the policy testing problem in discounted Markov decision processes (MDPs) under the fixed-confidence setting. The goal is to determine whether the value of a given policy exceeds a specified threshold while minimizing the number of observations. We begin by deriving an instance-specific lower bound that any algorithm must satisfy. This lower bound is characterized as the solution to an op…

    Submitted 21 May, 2025; originally announced May 2025.

  3. arXiv:2410.23434

    cs.LG

    Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

    Authors: Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere

    Abstract: We consider the problem of learning an $\varepsilon$-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between policy improvement and policy evaluation steps. In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value…

    Submitted 10 November, 2024; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: Accepted for presentation at the Conference on Neural Information Processing Systems (NeurIPS) 2024

  4. arXiv:2407.15277

    cs.LG math.ST stat.ML

    Conformal Predictions under Markovian Data

    Authors: Frédéric Zheng, Alexandre Proutiere

    Abstract: We study the split Conformal Prediction method when applied to Markovian data. We quantify the gap in terms of coverage induced by the correlations in the data (compared to exchangeable data). This gap strongly depends on the mixing properties of the underlying Markov chain, and we prove that it typically scales as $\sqrt{t_\mathrm{mix}\ln(n)/n}$ (where $t_\mathrm{mix}$ is the mixing time of the c…

    Submitted 21 July, 2024; originally announced July 2024.

  5. arXiv:2407.00801

    cs.LG

    Model-Free Active Exploration in Reinforcement Learning

    Authors: Alessio Russo, Alexandre Proutiere

    Abstract: We study the problem of exploration in Reinforcement Learning and present a novel model-free solution. We adopt an information-theoretical viewpoint and start from the instance-specific lower bound of the number of samples that have to be collected to identify a nearly-optimal policy. Deriving this lower bound along with the optimal exploration strategy entails solving an intricate optimization pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Journal ref: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

  6. arXiv:2402.15739

    cs.LG stat.ML

    Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery

    Authors: Yassir Jedra, William Réveillard, Stefan Stojanovic, Alexandre Proutiere

    Abstract: We study contextual bandits with low-rank structure where, in each round, if the (context, arm) pair $(i,j)\in [m]\times [n]$ is selected, the learner observes a noisy sample of the $(i,j)$-th entry of an unknown low-rank reward matrix. Successive contexts are generated randomly in an i.i.d. manner and are revealed to the learner. For such bandits, we present efficient algorithms for policy evalua…

    Submitted 4 July, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  7. arXiv:2312.12137

    cs.LG stat.ML

    Best Arm Identification with Fixed Budget: A Large Deviation Perspective

    Authors: Po-An Wang, Ruo-Chun Tzeng, Alexandre Proutiere

    Abstract: We consider the problem of identifying the best arm in stochastic Multi-Armed Bandits (MABs) using a fixed sampling budget. Characterizing the minimal instance-specific error probability for this problem constitutes one of the important remaining open problems in MABs. When arms are selected using a static sampling strategy, the error probability decays exponentially with the number of samples at…

    Submitted 19 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: We made a small error in implementing the SH algorithm; it has now been corrected

  8. arXiv:2310.06793

    cs.LG stat.ML

    Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning

    Authors: Stefan Stojanovic, Yassir Jedra, Alexandre Proutiere

    Abstract: We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure. In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP. In both cases, each entry of the matrix carries important information, and we seek estimation metho…

    Submitted 27 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: To appear in NeurIPS 2023

  9. arXiv:2310.04842

    eess.SY cs.AI

    Sub-linear Regret in Adaptive Model Predictive Control

    Authors: Damianos Tranos, Alexandre Proutiere

    Abstract: We consider the problem of adaptive Model Predictive Control (MPC) for uncertain linear systems with additive disturbances and with state and input constraints. We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes. Specifically, at any given step, STT-MPC infers the system dynamics using the…

    Submitted 7 October, 2023; originally announced October 2023.

  10. arXiv:2308.12000

    stat.ML cs.LG

    On Universally Optimal Algorithms for A/B Testing

    Authors: Po-An Wang, Kaito Ariu, Alexandre Proutiere

    Abstract: We study the problem of best-arm identification with fixed budget in stochastic multi-armed bandits with Bernoulli rewards. For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (referred to as the uniform sampling algorithm) in all instances, and that (ii) strictly outper…

    Submitted 4 June, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted at ICML 2024

  11. arXiv:2306.12968

    cs.SI cs.LG stat.ML

    Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

    Authors: Kaito Ariu, Alexandre Proutiere, Se-Young Yun

    Abstract: In this paper, we investigate the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters whose sizes grow linearly with the total number of nodes. We derive the necessary and sufficient conditions under which the expected number of misclassified nodes is less than $s$, for any number $s = o(n)$. To achieve this, we propose IAC (In…

    Submitted 2 February, 2025; v1 submitted 18 June, 2023; originally announced June 2023.

  12. arXiv:2304.02574

    cs.LG cs.AI eess.SY

    Conformal Off-Policy Evaluation in Markov Decision Processes

    Authors: Daniele Foffano, Alessio Russo, Alexandre Proutiere

    Abstract: Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting is expensive, risky or unethical). For such applications, the reward of a given policy (the target policy) must be estimated using historical data gat…

    Submitted 19 September, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Journal ref: 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023

  13. arXiv:2211.15129

    stat.ML cs.AI cs.LG

    On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure

    Authors: Alessio Russo, Alexandre Proutiere

    Abstract: We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor). The objective is to learn the optimal (representation, predictor)-pair for each task, under the assumption that the optimal representation is common to a…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI23)

  14. arXiv:2208.08480

    cs.LG math.ST stat.ML

    Nearly Optimal Latent State Decoding in Block MDPs

    Authors: Yassir Jedra, Junghyun Lee, Alexandre Proutière, Se-Young Yun

    Abstract: We investigate the problems of model estimation and reward-free learning in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior poli…

    Submitted 24 February, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Y. Jedra and J. Lee contributed equally; 100 pages, 3 figures; Accepted to the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

  15. arXiv:2208.05633

    cs.LG stat.ML

    Best Policy Identification in Linear MDPs

    Authors: Jerome Taupin, Yassir Jedra, Alexandre Proutiere

    Abstract: We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model. We first derive an instance-specific lower bound on the expected number of samples required to identify an $\varepsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an…

    Submitted 11 August, 2022; originally announced August 2022.

  16. arXiv:2204.06910

    cs.NI cs.LG

    Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach

    Authors: Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson

    Abstract: In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a…

    Submitted 9 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

  17. arXiv:2201.02169

    cs.LG

    Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

    Authors: Filippo Vannella, Alexandre Proutiere, Yassir Jedra, Jaeseong Jeong

    Abstract: Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms learning optimal tilt control policies from existing data (in the so-called passive learning setting) or from data actively generated by the algorithms (the active learning setting). We formalize the design of such algorithms as a B…

    Submitted 6 January, 2022; originally announced January 2022.

  18. arXiv:2109.14429

    cs.LG eess.SY math.OC stat.ML

    Minimal Expected Regret in Linear Quadratic Control

    Authors: Yassir Jedra, Alexandre Proutiere

    Abstract: We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_xT})$ when $A$ and $B$ are unknown, (ii) by…

    Submitted 29 September, 2021; originally announced September 2021.

  19. arXiv:2109.07171

    eess.SY cs.CR cs.LG

    Balancing detectability and performance of attacks on the control channel of Markov Decision Processes

    Authors: Alessio Russo, Alexandre Proutiere

    Abstract: We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs). This research is motivated by the recent interest of the research community in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods. The policies resulting from these methods have been shown to be vulnerable to attacks perturb…

    Submitted 15 September, 2021; originally announced September 2021.

  20. Online Learning of Optimally Diverse Rankings

    Authors: Stefan Magureanu, Alexandre Proutiere, Marcus Isaksson, Boxun Zhang

    Abstract: Search engines answer users' queries by listing relevant items (e.g. documents, songs, products, web pages, ...). These engines rely on algorithms that learn to rank items so as to present an ordered list maximizing the probability that it contains a relevant item. The main challenge in the design of learning-to-rank algorithms stems from the fact that queries often have different meanings for diffe…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: 26 pages, 4 Figures, accepted in ACM SIGMETRICS 2018

    Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 1, Issue 2, December 2017, Article No 32

  21. arXiv:2106.14338

    cs.LG stat.ML

    Regret Analysis in Deterministic Reinforcement Learning

    Authors: Damianos Tranos, Alexandre Proutiere

    Abstract: We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter (in contrast to previous minimax approaches) and thus, truly quantify the fundamental limit of…

    Submitted 27 June, 2021; originally announced June 2021.

  22. arXiv:2106.02847

    stat.ML cs.LG

    Navigating to the Best Policy in Markov Decision Processes

    Authors: Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere

    Abstract: We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We propose a problem-dependent lower bound on the average number of steps required before a correct answer can be given with probability at least $1-\delta$. We further…

    Submitted 25 October, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

  23. arXiv:2010.12363

    stat.ML cs.IR cs.LG

    Regret in Online Recommendation Systems

    Authors: Kaito Ariu, Narae Ryu, Se-Young Yun, Alexandre Proutière

    Abstract: This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items. Importantly, an item cannot be recommended twice to the same u…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2020)

  24. arXiv:2010.11994

    stat.ML cs.LG

    Thresholded Lasso Bandit

    Authors: Kaito Ariu, Kenshi Abe, Alexandre Proutière

    Abstract: In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., signifi…

    Submitted 19 June, 2022; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: International Conference on Machine Learning (ICML 2022), Proceedings of Machine Learning Research

  25. arXiv:2009.13405

    stat.ML cs.LG

    Adaptive Sampling for Best Policy Identification in Markov Decision Processes

    Authors: Aymen Al Marjani, Alexandre Proutiere

    Abstract: We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as possible. We first derive a problem-specific lower bound of the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optim…

    Submitted 10 May, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: 43 pages

  26. arXiv:2006.16073

    stat.ML cs.LG

    Optimal Best-arm Identification in Linear Bandits

    Authors: Yassir Jedra, Alexandre Proutiere

    Abstract: We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds, asymptotically almost surely and in expectation. The algorithm relies on an arm samplin…

    Submitted 29 June, 2020; originally announced June 2020.

  27. arXiv:2005.10577

    cs.LG stat.ML

    Off-policy Learning for Remote Electrical Tilt Optimization

    Authors: Filippo Vannella, Jaeseong Jeong, Alexandre Proutiere

    Abstract: We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed-Bandit (CMAB) techniques. The goal in RET optimization is to control the orientation of the vertical tilt angle of the antenna to optimize Key Performance Indicators (KPIs) representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update…

    Submitted 21 May, 2020; originally announced May 2020.

  28. arXiv:2004.01141

    cs.LG stat.ML

    Predictive Bandits

    Authors: Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson

    Abstract: We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards in this round can be predicted). These measurements are costly, and may be corrupted by noise. The decision maker then selects an arm to be actually played in th…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: 10 pages, 4 figures, conference

  29. arXiv:2003.07937

    math.ST cs.LG eess.SY stat.ML

    Finite-time Identification of Stable Linear Systems: Optimality of the Least-Squares Estimator

    Authors: Yassir Jedra, Alexandre Proutiere

    Abstract: We present a new finite-time analysis of the estimation error of the Ordinary Least Squares (OLS) estimator for stable linear time-invariant systems. We characterize the number of observed samples (the length of the observed trajectory) sufficient for the OLS estimator to be $(\varepsilon,\delta)$-PAC, i.e., to yield an estimation error less than $\varepsilon$ with probability at least $1-\delta$. We show t…

    Submitted 26 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  30. arXiv:1912.09705

    cs.LG math.OC stat.ML

    Distributed Online Optimization with Long-Term Constraints

    Authors: Deming Yuan, Alexandre Proutiere, Guodong Shi

    Abstract: We consider distributed online convex optimization problems, where the distributed system consists of various computing units connected through a time-varying communication graph. In each time step, each computing unit selects a constrained vector, experiences a loss equal to an arbitrary convex function evaluated at this vector, and may communicate to its neighbors in the graph. The objective is…

    Submitted 20 December, 2019; originally announced December 2019.

  31. Optimal Clustering from Noisy Binary Feedback

    Authors: Kaito Ariu, Jungseul Ok, Alexandre Proutiere, Se-Young Yun

    Abstract: We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent reCAPTCHA systems, users' clicks (binary answers) can be used to efficiently label images. In our inference problem, items are grouped into initially unknown non-overl…

    Submitted 5 February, 2024; v1 submitted 14 October, 2019; originally announced October 2019.

  32. arXiv:1909.13079

    cs.LG stat.ML

    An Optimal Algorithm for Multiplayer Multi-Armed Bandits

    Authors: Alexandre Proutiere, Po-An Wang

    Abstract: The paper addresses the Multiplayer Multi-Armed Bandit (MMAB) problem, where $M$ decision makers or players collaborate to maximize their cumulative reward. When several players select the same arm, a collision occurs and no reward is collected on this arm. Players involved in a collision are informed about this collision. We present DPE (Decentralized Parsimonious Exploration), a decentralized al…

    Submitted 26 October, 2019; v1 submitted 28 September, 2019; originally announced September 2019.

    Comments: 14 pages

  33. arXiv:1907.13548

    cs.LG cs.CR stat.ML

    Optimal Attacks on Reinforcement Learning Policies

    Authors: Alessio Russo, Alexandre Proutiere

    Abstract: Control policies trained using Deep Reinforcement Learning have recently been shown to be vulnerable to adversarial attacks introducing even very small perturbations to the policy input. The attacks proposed so far have been designed using heuristics, and build on existing adversarial example crafting techniques used to dupe classifiers in supervised learning. In contrast, this paper investi…

    Submitted 31 July, 2019; originally announced July 2019.

  34. arXiv:1906.11392

    math.OC cs.LG stat.ML

    From self-tuning regulators to reinforcement learning and back again

    Authors: Nikolai Matni, Alexandre Proutiere, Anders Rantzer, Stephen Tu

    Abstract: Machine and reinforcement learning (RL) are increasingly being applied to plan and control the behavior of autonomous systems interacting with the physical world. Examples include self-driving vehicles, distributed sensor networks, and agile robots. However, when machine learning is to be applied in these new settings, the algorithms had better come with the same type of reliability, robustness, a…

    Submitted 22 September, 2019; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: Tutorial paper, 2019 IEEE Conference on Decision and Control, to appear

  35. arXiv:1903.10343

    eess.SY cs.LG

    Sample Complexity Lower Bounds for Linear System Identification

    Authors: Yassir Jedra, Alexandre Proutiere

    Abstract: This paper establishes problem-specific sample complexity lower bounds for linear system identification problems. The sample complexity is defined in the PAC framework: it corresponds to the time it takes to identify the system parameters with prescribed accuracy and confidence levels. By problem-specific, we mean that the lower bound explicitly depends on the system to be identified (which contra…

    Submitted 25 March, 2019; originally announced March 2019.

  36. arXiv:1902.04774

    cs.LG cs.DC math.OC stat.ML

    Distributed Online Linear Regression

    Authors: Deming Yuan, Alexandre Proutiere, Guodong Shi

    Abstract: We study online linear regression problems in a distributed setting, where the data is spread over a network. In each round, each network node proposes a linear predictor, with the objective of fitting the network-wide data. It then updates its predictor for the next round according to the received local feedback and information received from neighboring nodes. The predictions made at a giv…

    Submitted 13 February, 2019; originally announced February 2019.

  37. arXiv:1807.00664

    cs.CV

    Learning to Personalize in Appearance-Based Gaze Tracking

    Authors: Erik Lindén, Jonas Sjöstrand, Alexandre Proutiere

    Abstract: Personal variations severely limit the performance of appearance-based gaze tracking. Adapting to these variations using standard neural network model adaptation methods is difficult. The problems range from overfitting, due to small amounts of training data, to underfitting, due to restrictive model architectures. We tackle these problems by introducing the SPatial Adaptive GaZe Estimator (SPAZE)…

    Submitted 2 September, 2019; v1 submitted 2 July, 2018; originally announced July 2018.

  38. arXiv:1806.00775

    cs.LG stat.ML

    Exploration in Structured Reinforcement Learning

    Authors: Jungseul Ok, Alexandre Proutiere, Damianos Tranos

    Abstract: We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration rates of suboptimal (state, action) pairs. For any arbitrary structure, we derive problem-specific regret lower bounds satisfied by any learning algorithm. These lower bounds are made explicit for unstructur…

    Submitted 29 November, 2018; v1 submitted 3 June, 2018; originally announced June 2018.

  39. arXiv:1712.09232

    math.PR cs.IT math.ST

    Clustering in Block Markov Chains

    Authors: Jaron Sanders, Alexandre Proutière, Se-Young Yun

    Abstract: This paper considers cluster detection in Block Markov Chains (BMCs). These Markov chains are characterized by a block structure in their transition matrix. More precisely, the $n$ possible states are divided into a finite number of $K$ groups or clusters, such that states in the same cluster exhibit the same transition rates to other states. One observes a trajectory of the Markov chain, and the…

    Submitted 29 July, 2019; v1 submitted 26 December, 2017; originally announced December 2017.

    Comments: 73 pages, 18 plots, second revision

  40. arXiv:1711.00400

    stat.ML cs.AI cs.LG math.OC

    Minimal Exploration in Structured Stochastic Bandits

    Authors: Richard Combes, Stefan Magureanu, Alexandre Proutiere

    Abstract: This paper introduces and addresses a wide class of stochastic bandit problems where the function mapping the arm to the corresponding reward exhibits some known structural properties. Most existing structures (e.g. linear, Lipschitz, unimodal, combinatorial, dueling, ...) are covered by our framework. We derive an asymptotic instance-specific regret lower bound for these problems, and develop OSS…

    Submitted 1 November, 2017; originally announced November 2017.

    Comments: 13 pages, NIPS 2017

  41. arXiv:1704.05986

    cs.GT

    Strategic Arrivals to Queues Offering Priority Service

    Authors: Rajat Talak, D. Manjunath, Alexandre Proutiere

    Abstract: We consider strategic arrivals to a FCFS service system that starts service at a fixed time and has to serve a fixed number of customers, e.g., an airplane boarding system. Arriving early induces a higher waiting cost (waiting before service begins) while arriving late induces a cost because earlier arrivals take the better seats. We first consider arrivals of heterogeneous customers that choose a…

    Submitted 11 August, 2018; v1 submitted 19 April, 2017; originally announced April 2017.

  42. arXiv:1510.05956

    math.PR cs.LG cs.SI stat.ML

    Optimal Cluster Recovery in the Labeled Stochastic Block Model

    Authors: Se-Young Yun, Alexandre Proutiere

    Abstract: We consider the problem of community detection or clustering in the labeled Stochastic Block Model (LSBM) with a finite number $K$ of clusters of sizes linearly growing with the global population of items $n$. Every pair of items is labeled independently at random, and label $\ell$ appears with probability $p(i,j,\ell)$ between two items in clusters indexed by $i$ and $j$, respectively. The object…

    Submitted 21 May, 2016; v1 submitted 20 October, 2015; originally announced October 2015.

    Comments: arXiv admin note: text overlap with arXiv:1412.7335

  43. arXiv:1507.03323

    cs.SI

    Boolean Gossip Networks

    Authors: Bo Li, Junfeng Wu, Hongsheng Qi, Alexandre Proutiere, Guodong Shi

    Abstract: This paper proposes and investigates a Boolean gossip model as a simplified but non-trivial probabilistic Boolean network. With positive node interactions, in view of standard theories from Markov chains, we prove that the node states asymptotically converge to an agreement at a binary random variable, whose distribution is characterized for large-scale networks by mean-field approximation. Using…

    Submitted 21 May, 2017; v1 submitted 13 July, 2015; originally announced July 2015.

  44. arXiv:1507.03292

    cs.LG

    Cluster-Aided Mobility Predictions

    Authors: Jaeseong Jeong, Mathieu Leconte, Alexandre Proutiere

    Abstract: Predicting the future location of users in wireless networks has numerous applications, and can help service providers to improve the quality of service perceived by their clients. The location predictors proposed so far estimate the next location of a specific user by inspecting the past individual trajectories of this user. As a consequence, when the training data collected for a given user is…

    Submitted 21 January, 2016; v1 submitted 12 July, 2015; originally announced July 2015.

  45. arXiv:1502.03475

    cs.LG math.OC stat.ML

    Combinatorial Bandits Revisited

    Authors: Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge

    Abstract: This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ES…

    Submitted 5 November, 2015; v1 submitted 11 February, 2015; originally announced February 2015.

    Comments: 30 pages, Advances in Neural Information Processing Systems 28 (NIPS 2015)

  46. arXiv:1412.7335  [pdf, ps, other]

    cs.SI cs.DS

    Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms

    Authors: Se-Young Yun, Alexandre Proutiere

    Abstract: We consider the problem of community detection in the Stochastic Block Model with a finite number $K$ of communities of sizes linearly growing with the network size $n$. This model consists of a random graph such that each pair of vertices is connected independently with probability $p$ within communities and $q$ across communities. One observes a realization of this random graph, and the objectiv…

    Submitted 23 December, 2014; originally announced December 2014.

  47. arXiv:1412.1990  [pdf, other]

    cs.SI

    Emergent Behaviors over Signed Random Dynamical Networks: Relative-State-Flipping Model

    Authors: Guodong Shi, Alexandre Proutiere, Mikael Johansson, John S. Baras, Karl H. Johansson

    Abstract: We study asymptotic dynamical patterns that emerge among a set of nodes interacting in a dynamically evolving signed random network, where positive links carry out standard consensus and negative links induce relative-state flipping. A sequence of deterministic signed graphs defines potential node interactions that take place independently. Each node receives a positive recommendation consistent wi…

    Submitted 5 December, 2014; originally announced December 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1309.5488

  48. arXiv:1411.1279  [pdf, ps, other]

    cs.SI cs.DS

    Streaming, Memory Limited Algorithms for Community Detection

    Authors: Se-Young Yun, Marc Lelarge, Alexandre Proutiere

    Abstract: In this paper, we consider sparse networks consisting of a finite number of non-overlapping communities, i.e. disjoint clusters, so that there is higher density within clusters than across clusters. Both the intra- and inter-cluster edge densities vanish when the size of the graph grows large, making the cluster reconstruction problem noisier and hence difficult to solve. We are interested in scena…

    Submitted 3 November, 2014; originally announced November 2014.

    Comments: NIPS 2014

  49. Emergent Behaviors over Signed Random Dynamical Networks: State-Flipping Model

    Authors: Guodong Shi, Alexandre Proutiere, Mikael Johansson, John S. Baras, Karl H. Johansson

    Abstract: Recent studies from social, biological, and engineering network systems have drawn attention to the dynamics over signed networks, where each link is associated with a positive/negative sign indicating trustful/mistrustful, activator/inhibitor, or secure/malicious interactions. We study asymptotic dynamical patterns that emerge among a set of nodes that interact in a dynamically evolving signed ra…

    Submitted 1 November, 2014; originally announced November 2014.

    Comments: IEEE Transactions on Control of Network Systems, in press. arXiv admin note: substantial text overlap with arXiv:1309.5488

  50. arXiv:1406.7447  [pdf, other]

    cs.LG

    Unimodal Bandits without Smoothness

    Authors: Richard Combes, Alexandre Proutiere

    Abstract: We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. For these problems, we propose the Stochastic Pentachotomy (SP) algorithm, and derive finite-time upper bounds on its regret and optimization err…

    Submitted 6 March, 2015; v1 submitted 28 June, 2014; originally announced June 2014.

    Comments: 25 pages