
Showing 1–8 of 8 results for author: Guo, Z D

Searching in archive stat.
  1. arXiv:2504.16667

    cs.LG cs.AI cs.CV stat.ML

    Representation Learning via Non-Contrastive Mutual Information

    Authors: Zhaohan Daniel Guo, Bernardo Avila Pires, Khimya Khetarpal, Dale Schuurmans, Bo Dai

    Abstract: Labeling data is often very time consuming and expensive, leaving us with a majority of unlabeled data. Self-supervised representation learning methods such as SimCLR (Chen et al., 2020) or BYOL (Grill et al., 2020) have been very successful at learning meaningful latent representations from unlabeled image data, resulting in much more general and transferable representations for downstream tasks.…

    Submitted 23 April, 2025; originally announced April 2025.

    ACM Class: I.2.6; I.2.10

  2. arXiv:2312.00886

    stat.ML cs.AI cs.GT cs.LG cs.MA

    Nash Learning from Human Feedback

    Authors: Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to…

    Submitted 11 June, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  3. arXiv:2206.08332

    cs.LG cs.AI stat.ML

    BYOL-Explore: Exploration by Bootstrapped Prediction

    Authors: Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

    Abstract: We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a world representation, the world dynamics, and an exploration policy all-together by optimizing a single prediction loss in the latent space with no additional auxiliary objective. We show that BYOL-Explore is effective in DM-HARD-8, a challeng…

    Submitted 16 June, 2022; originally announced June 2022.

  4. arXiv:2006.07733

    cs.LG cs.CV stat.ML

    Bootstrap your own latent: A new approach to self-supervised Learning

    Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

    Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the…

    Submitted 10 September, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

  5. arXiv:1906.07805

    cs.LG cs.AI stat.ML

    Directed Exploration for Reinforcement Learning

    Authors: Zhaohan Daniel Guo, Emma Brunskill

    Abstract: Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general. From small, tabular settings such as gridworlds to large, continuous and sparse reward settings such as robotic object manipulation tasks, exploration through adding an uncertainty bonus to the reward function has been shown to be effective when the uncertainty is able to accurately drive ex…

    Submitted 18 June, 2019; originally announced June 2019.

  6. arXiv:1811.06407

    cs.LG stat.ML

    Neural Predictive Belief Representations

    Authors: Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

    Abstract: Unsupervised representation learning has succeeded with excellent results in many applications. It is an especially powerful tool to learn a good representation of environments with partial or noisy observations. In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far. In this paper, we investigate whet…

    Submitted 19 August, 2019; v1 submitted 15 November, 2018; originally announced November 2018.

  7. arXiv:1703.03454

    cs.LG stat.ML

    Sample Efficient Feature Selection for Factored MDPs

    Authors: Zhaohan Daniel Guo, Emma Brunskill

    Abstract: In reinforcement learning, the state of the real world is often represented by feature vectors. However, not all of the features may be pertinent for solving the current task. We propose Feature Selection Explore and Exploit (FS-EE), an algorithm that automatically selects the necessary features while learning a Factored Markov Decision Process, and prove that under mild assumptions, its sample co…

    Submitted 9 March, 2017; originally announced March 2017.

  8. arXiv:1605.08062

    cs.LG cs.AI stat.ML

    A PAC RL Algorithm for Episodic POMDPs

    Authors: Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill

    Abstract: Many interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on whi…

    Submitted 1 June, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

    Journal ref: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 510-518, 2016
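The online/target mechanism summarized in the BYOL abstract (entry 4) can be illustrated with a minimal, framework-free sketch. It assumes the standard formulation from the BYOL paper: the loss is the squared distance between the L2-normalized online prediction and the target projection (equivalently 2 - 2 * cosine similarity), symmetrized over the two augmented views, and the target weights follow the online weights through an exponential moving average with decay tau (0.996 is the paper's base value). All function names are hypothetical; real implementations work on tensors and stop gradients through the target branch, which plain Python lists cannot express.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def byol_loss(online_prediction: Sequence[float],
              target_projection: Sequence[float]) -> float:
    """Squared distance between unit-normalized vectors, i.e. 2 - 2 * cosine.

    In a real framework, target_projection would be wrapped in a
    stop-gradient so only the online network is trained by this loss.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def symmetric_byol_loss(pred_view1, target_view2, pred_view2, target_view1) -> float:
    """Each view's online prediction regresses the other view's target projection."""
    return byol_loss(pred_view1, target_view2) + byol_loss(pred_view2, target_view1)


def ema_update(target_params: Sequence[float],
               online_params: Sequence[float],
               tau: float = 0.996) -> List[float]:
    """Target weights move slowly toward online weights: xi <- tau*xi + (1-tau)*theta."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


# Usage: identical directions give zero loss, orthogonal directions give 2.
print(byol_loss([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(byol_loss([1.0, 0.0], [0.0, 3.0]))  # 2.0
print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9))
```

Normalizing both vectors makes the loss depend only on direction, so the online network cannot lower it by rescaling its outputs.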
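The Nash Learning from Human Feedback abstract (entry 2) describes the usual first step of RLHF: learning a reward model from preferences between pairs of generations. A common way to do this is the Bradley-Terry model, in which the probability that one generation is preferred is the sigmoid of the difference in their rewards. The sketch below assumes that formulation; the truncated abstract does not name it, and the NLHF paper goes on to propose an alternative to this reward-model pipeline. The reward values stand in for the outputs of a hypothetical reward model, and all function names are illustrative.

```python
import math
from typing import Iterable, Tuple


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    # Rewrite as -x + log(1 + exp(x)) to avoid overflow for large negative x.
    return -x + math.log1p(math.exp(x))


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen generation is preferred."""
    return math.exp(-neg_log_sigmoid(reward_chosen - reward_rejected))


def reward_model_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean negative log-likelihood of the observed preferences.

    Each pair is (reward of the preferred generation, reward of the rejected one),
    as scored by a hypothetical reward model.
    """
    losses = [neg_log_sigmoid(r_chosen - r_rejected) for r_chosen, r_rejected in pairs]
    return sum(losses) / len(losses)


# Usage: when the model scores both generations equally the loss is log(2);
# it shrinks as the preferred generation receives a higher reward.
print(reward_model_loss([(0.0, 0.0)]))  # about 0.693
print(reward_model_loss([(2.0, 0.0), (1.5, -0.5)]))
```

Minimizing this loss with respect to the reward model's parameters fits the Bradley-Terry likelihood; the resulting reward is what the policy is then fine-tuned against in the standard pipeline.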