
Showing 1–2 of 2 results for author: Kikkawa, N

  1. arXiv:2411.00339

    stat.ML cs.LG

    Unified theory of upper confidence bound policies for bandit problems targeting total reward, maximal reward, and more

    Authors: Nobuaki Kikkawa, Hiroshi Ohno

    Abstract: The upper confidence bound (UCB) policy is recognized as an order-optimal solution for the classical total-reward bandit problem. While similar UCB-based approaches have been applied to the max bandit problem, which aims to maximize the cumulative maximal reward, their order optimality remains unclear. In this study, we clarify the unified conditions under which the UCB policy achieves the order o…

    Submitted 31 October, 2024; originally announced November 2024.

  2. arXiv:2212.08225

    stat.ML cs.LG physics.chem-ph

    Materials Discovery using Max K-Armed Bandit

    Authors: Nobuaki Kikkawa, Hiroshi Ohno

    Abstract: Search algorithms for the bandit problems are applicable in materials discovery. However, the objectives of the conventional bandit problem are different from those of materials discovery. The conventional bandit problem aims to maximize the total rewards, whereas materials discovery aims to achieve breakthroughs in material properties. The max K-armed bandit (MKB) problem, which aims to acquire t…

    Submitted 15 December, 2022; originally announced December 2022.

    Report number: 22-0186

    Journal ref: J.Mach.Learn.Res. 25(100) (2024) 1-40