Learning mirror maps in policy mirror descent
Authors:
Carlo Alfano,
Sebastian Towers,
Silvia Sapora,
Chris Lu,
Patrick Rebeschini
Abstract:
Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a unifying perspective that encompasses numerous algorithms. These algorithms are derived through the selection of a mirror map and enjoy finite-time convergence guarantees. Despite its popularity, the exploration of PMD's full potential is limited, with the majority of research focusing on a particular mirror map -- namely, the negative entropy -- which gives rise to the renowned Natural Policy Gradient (NPG) method. It remains uncertain from existing theoretical studies whether the choice of mirror map significantly influences PMD's efficacy. In our work, we conduct empirical investigations to show that the conventional mirror map choice (NPG) often yields less-than-optimal outcomes across several standard benchmark environments. Using evolutionary strategies, we identify more efficient mirror maps that enhance the performance of PMD. We first focus on a tabular environment, i.e. Grid-World, where we relate existing theoretical bounds with the performance of PMD for a few standard mirror maps and the learned one. We then show that it is possible to learn a mirror map that outperforms the negative entropy in more complex environments, such as the MinAtar suite. Our results suggest that mirror maps generalize well across various environments, raising questions about how to best match a mirror map to an environment's structure and characteristics.
Submitted 7 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.
Advanced geometrical constructs in a Pueblo ceremonial site, c 1200 CE
Authors:
Sherry Towers
Abstract:
Summer 2015 marked the 100th anniversary of the excavation by J.W. Fewkes of the Sun Temple in Mesa Verde National Park, Colorado, an ancient complex prominently located atop a mesa, constructed by the ancestral Pueblo peoples approximately 800 years ago. While the D-shaped structure is generally recognized by modern Pueblo peoples as a ceremonial complex, the exact uses of the site are unknown, although the site has been shown to have key solar and lunar alignments. In this study, we examined the potential that the site was laid out using advanced knowledge of geometrical constructs. Using aerial imagery in conjunction with ground measurements, we performed a survey of key features of the site. We found apparent evidence that the ancestral Pueblo peoples laid out the site using the Golden rectangle, Pythagorean 3:4:5 triangles, equilateral triangles, and 45 degree right triangles. The survey also revealed that a single unit of measurement, L = 30.5+/-0.5 cm, or one third of that, appeared to be associated with many key features of the site. Further study is needed to determine if this unit of measurement is common to other ancestral Pueblo sites, and also if geometric constructs are apparent at other sites. These findings represent the first potential quantitative evidence of knowledge of advanced geometrical constructs in a prehistoric North American society, which is particularly remarkable given that the ancestral Pueblo peoples had no written language or number system.
Submitted 15 January, 2017; v1 submitted 31 May, 2016; originally announced May 2016.