
Showing 1–11 of 11 results for author: Ariu, K

Searching in archive math.
  1. arXiv:2509.22426  [pdf, ps, other]

    cs.LG cs.GT cs.MA math.OC

    Learning from Delayed Feedback in Games via Extra Prediction

    Authors: Yuma Fujimoto, Kenshi Abe, Kaito Ariu

    Abstract: This study raises and addresses the problem of time-delayed feedback in learning in games. Because learning in games assumes that multiple agents independently learn their strategies, a discrepancy in optimization often emerges among the agents. To overcome this discrepancy, the prediction of the future reward is incorporated into algorithms, typically known as Optimistic Follow-the-Regularized-Le…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 11 pages, 3 figures (main); 9 pages (appendix)

  2. arXiv:2506.05747  [pdf, ps, other]

    math.OC

    Asymmetric Perturbation in Solving Bilinear Saddle-Point Optimization

    Authors: Kenshi Abe, Mitsuki Sakamoto, Kaito Ariu, Atsushi Iwasaki

    Abstract: This paper proposes an asymmetric perturbation technique for solving saddle-point optimization problems, commonly arising in min-max problems, game theory, and constrained optimization. Perturbing payoffs or values are known to be effective in stabilizing learning dynamics and finding an exact solution or equilibrium. However, it requires careful adjustment of the perturbation magnitude; otherwise…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2505.15342  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Policy Testing in Markov Decision Processes

    Authors: Kaito Ariu, Po-An Wang, Alexandre Proutiere, Kenshi Abe

    Abstract: We study the policy testing problem in discounted Markov decision processes (MDPs) under the fixed-confidence setting. The goal is to determine whether the value of a given policy exceeds a specified threshold while minimizing the number of observations. We begin by deriving an instance-specific lower bound that any algorithm must satisfy. This lower bound is characterized as the solution to an op…

    Submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2410.12306  [pdf, other]

    cs.GT cs.MA econ.TH math.DS

    Time-Varyingness in Auction Breaks Revenue Equivalence

    Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

    Abstract: Auction is one of the most representative buying-selling systems. A celebrated study shows that the seller's expected revenue is equal in equilibrium, regardless of the type of auction, typically first-price and second-price auctions. Here, however, we hypothesize that when some auction environments vary with time, this revenue equivalence may not be maintained. In second-price auctions, the equil…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 11 pages, 3 figures (main); 7 pages, 1 figure (appendix)

  5. arXiv:2408.10595  [pdf, other]

    cs.GT cs.MA math.OC nlin.CD

    Synchronization in Learning in Periodic Zero-Sum Games Triggers Divergence from Nash Equilibrium

    Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

    Abstract: Learning in zero-sum games studies a situation where multiple agents competitively learn their strategy. In such multi-agent learning, we often see that the strategies cycle around their optimum, i.e., Nash equilibrium. When a game periodically varies (called a "periodic" game), however, the Nash equilibrium moves generically. How learning dynamics behave in such periodic games is of interest bu…

    Submitted 5 March, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures (main); 9 pages, 3 figures (appendix)

  6. arXiv:2405.14546  [pdf, other]

    cs.GT cs.MA math.OC nlin.CD

    Global Behavior of Learning Dynamics in Zero-Sum Games with Memory Asymmetry

    Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

    Abstract: This study examines the global behavior of dynamics in learning in games between two players, X and Y. We consider the simplest situation for memory asymmetry between two players: X memorizes the other Y's previous action and uses reactive strategies, while Y has no memory. Although this memory complicates their learning dynamics, we characterize the global behavior of such complex dynamics by dis…

    Submitted 4 March, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures (main); 5 pages (appendix)

  7. arXiv:2402.10825  [pdf, other]

    cs.GT cs.MA math.OC nlin.CD

    Nash Equilibrium and Learning Dynamics in Three-Player Matching $m$-Action Games

    Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

    Abstract: Learning in games discusses the processes where multiple players learn their optimal strategies through the repetition of game plays. The dynamics of learning between two players in zero-sum games, such as Matching Pennies, where their benefits are competitive, have already been well analyzed. However, it is still unexplored and challenging to analyze the dynamics of learning among three players.…

    Submitted 5 March, 2025; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 9 pages, 4 figures (main), 9 pages, 1 figure (appendix)

  8. arXiv:2305.13619  [pdf, other]

    cs.GT cs.MA math.OC nlin.CD

    Memory Asymmetry Creates Heteroclinic Orbits to Nash Equilibrium in Learning in Zero-Sum Games

    Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

    Abstract: Learning in games considers how multiple agents maximize their own rewards through repeated games. Memory, an ability that an agent changes his/her action depending on the history of actions in previous games, is often introduced into learning to explore more clever strategies and discuss the decision-making of real agents like humans. However, such games with memory are hard to analyze because th…

    Submitted 16 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 9 pages & 5 figures (main), 5 pages & 2 figures (appendix)

  9. arXiv:2302.01073  [pdf, other]

    cs.GT cs.MA math.OC nlin.CD

    Learning in Multi-Memory Games Triggers Complex Dynamics Diverging from Nash Equilibrium

    Authors: Yuma Fujimoto, Kaito Ariu, Kenshi Abe

    Abstract: Repeated games consider a situation where multiple agents are motivated by their independent rewards throughout learning. In general, the dynamics of their learning become complex. Especially when their rewards compete with each other like zero-sum games, the dynamics often do not converge to their optimum, i.e., the Nash equilibrium. To tackle such complexity, many studies have understood various…

    Submitted 22 May, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 8 pages & 4 figures (main), 6 pages & 1 figure (appendix)

  10. arXiv:2201.04469  [pdf, other]

    stat.ML cs.LG econ.EM math.ST

    Optimal Best Arm Identification in Two-Armed Bandits with a Fixed Budget under a Small Gap

    Authors: Masahiro Kato, Kaito Ariu, Masaaki Imaizumi, Masahiro Nomura, Chao Qin

    Abstract: We consider fixed-budget best-arm identification in two-armed Gaussian bandit problems. One of the longstanding open questions is the existence of an optimal strategy under which the probability of misidentification matches a lower bound. We show that a strategy following the Neyman allocation rule (Neyman, 1934) is asymptotically optimal when the gap between the expected rewards is small. First,…

    Submitted 28 December, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

  11. arXiv:2106.14077  [pdf, other]

    cs.LG econ.EM math.ST stat.ME stat.ML

    The Role of Contextual Information in Best Arm Identification

    Authors: Masahiro Kato, Kaito Ariu

    Abstract: We study the best-arm identification problem with fixed confidence when contextual (covariate) information is available in stochastic bandits. Although we can use contextual information in each round, we are interested in the marginalized mean reward over the contextual distribution. Our goal is to identify the best arm with a minimal number of samplings under a given value of the error rate. We s…

    Submitted 26 February, 2024; v1 submitted 26 June, 2021; originally announced June 2021.