Showing 1–22 of 22 results for author: Obando-Ceron, J

  1. arXiv:2509.26626

    cs.LG

    Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

    Authors: Siddarth Venkatraman, Vineet Jain, Sarthak Mittal, Vedant Shah, Johan Obando-Ceron, Yoshua Bengio, Brian R. Bartoldson, Bhavya Kailkhura, Guillaume Lajoie, Glen Berseth, Nikolay Malkin, Moksh Jain

    Abstract: Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary m…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: 24 pages, 9 figures

  2. arXiv:2506.15544

    cs.LG

    Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

    Authors: Roger Creus Castanyer, Johan Obando-Ceron, Lu Li, Pierre-Luc Bacon, Glen Berseth, Aaron Courville, Pablo Samuel Castro

    Abstract: Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the…

    Submitted 18 June, 2025; originally announced June 2025.

  3. arXiv:2506.13672

    cs.LG

    The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

    Authors: Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

    Abstract: Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help improve sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of "polluting" the replay buffer with data which can exacerbate optimization challenges in addition to w…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)

  4. arXiv:2506.03404

    cs.LG cs.AI

    The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks

    Authors: Walter Mayor, Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

    Abstract: The use of parallel actors for data collection has been an effective technique used in reinforcement learning (RL) algorithms. The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)

  5. arXiv:2506.00592

    cs.LG cs.AI

    Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn

    Authors: Hongyao Tang, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Glen Berseth

    Abstract: Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL from the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted to ICML 2025

  6. arXiv:2505.24061

    cs.LG

    Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

    Authors: Jiashun Liu, Zihao Wu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

    Abstract: Deep reinforcement learning (RL) agents frequently suffer from neuronal activity loss, which impairs their ability to adapt to new data and learn continually. A common method to quantify and address this issue is the tau-dormant neuron ratio, which uses activation statistics to measure the expressive ability of neurons. While effective for simple MLP-based agents, this approach loses statistical p…

    Submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2504.07072

    cs.CL cs.CV

    Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

    Authors: Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi, et al. (20 additional authors not shown)

    Abstract: The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded, both in size and languages, many rely on translations of English datasets, failing to capture cultural nuances. In this work, we propose Kaleidoscope, as the most comprehensive exam b…

    Submitted 29 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: v2: corrected the author list

  8. arXiv:2504.06949

    cs.LG cs.AI cs.CL

    Adaptive Computation Pruning for the Forgetting Transformer

    Authors: Zhixuan Lin, Johan Obando-Ceron, Xu Owen He, Aaron Courville

    Abstract: The recently proposed Forgetting Transformer (FoX) incorporates a forget gate into softmax attention and has shown consistently better or on-par performance compared to the standard RoPE-based Transformer. Notably, many attention heads in FoX tend to forget quickly, causing their output at each timestep to rely primarily on local context. Based on this observation, we propose Adaptive Computation…

    Submitted 11 August, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: Published as a conference paper at COLM 2025

  9. arXiv:2503.18929

    cs.LG

    Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

    Authors: Brian R. Bartoldson, Siddarth Venkatraman, James Diffenderfer, Moksh Jain, Tal Ben-Nun, Seanie Lee, Minsu Kim, Johan Obando-Ceron, Yoshua Bengio, Bhavya Kailkhura

    Abstract: Reinforcement learning (RL) is a critical component of large language model (LLM) post-training. However, existing on-policy algorithms used for post-training are inherently incompatible with the use of experience replay buffers, which can be populated scalably by distributed off-policy actors to enhance exploration as compute increases. We propose efficiently obtaining this benefit of replay buff…

    Submitted 24 March, 2025; originally announced March 2025.

  10. arXiv:2411.16508

    cs.CV cs.CL

    All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

    Authors: Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, et al. (44 additional authors not shown)

    Abstract: Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All La…

    Submitted 30 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: A Multilingual Multimodal cultural benchmark for 100 languages

  11. arXiv:2410.07994

    cs.LG

    Neuroplastic Expansion in Deep Reinforcement Learning

    Authors: Jiashun Liu, Johan Obando-Ceron, Aaron Courville, Ling Pan

    Abstract: The loss of plasticity in learning agents, analogous to the solidification of neural pathways in biological brains, significantly impedes learning and adaptation in reinforcement learning due to its non-stationary nature. To address this fundamental challenge, we propose a novel approach, Neuroplastic Expansion (NE), inspired by cortical expansion in cognitive science. NE maintains learnabil…

    Submitted 2 June, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  12. arXiv:2410.01930

    cs.LG cs.AI

    Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

    Authors: Ghada Sokar, Johan Obando-Ceron, Aaron Courville, Hugo Larochelle, Pablo Samuel Castro

    Abstract: The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performanc…

    Submitted 26 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

  13. arXiv:2406.18420

    cs.LG cs.AI

    Mixture of Experts in a Mixture of RL settings

    Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

    Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's lea…

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.17523

    cs.LG cs.AI

    On the consistency of hyper-parameter selection in value-based deep reinforcement learning

    Authors: Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

    Abstract: Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed tec…

    Submitted 29 November, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  15. arXiv:2402.12479

    cs.LG cs.AI

    In value-based deep reinforcement learning, a pruned network is a good network

    Authors: Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

    Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional…

    Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  16. arXiv:2402.08609

    cs.LG cs.AI

    Mixtures of Experts Unlock Parameter Scaling for Deep RL

    Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

    Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-…

    Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  17. arXiv:2310.03882

    cs.LG cs.AI

    Small batch deep reinforcement learning

    Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant pe…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  18. arXiv:2307.10519

    cs.RO

    Probabilistic Multimodal Depth Estimation Based on Camera-LiDAR Sensor Fusion

    Authors: Johan S. Obando-Ceron, Victor Romero-Cano, Sildomar Monteiro

    Abstract: Multi-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There have been outstanding advances in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution, or LiDAR sensors, due to the precise geometric data they provide. However, each of these suffers f…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 15 pages. arXiv admin note: text overlap with arXiv:1411.6387, arXiv:1702.02706 by other authors

    Journal ref: Machine Vision and Applications journal, 2023

  19. arXiv:2305.19452

    cs.LG cs.AI

    Bigger, Better, Faster: Human-level Atari with human-level efficiency

    Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

    Abstract: We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a dis…

    Submitted 13 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICML 2023, revised version

  20. arXiv:2304.14082

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

  21. arXiv:2011.14826

    cs.LG cs.AI

    Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research

    Authors: Johan S. Obando-Ceron, Pablo Samuel Castro

    Abstract: Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect…

    Submitted 21 May, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)

  22. arXiv:1912.09595

    cs.LG stat.ML

    Exploiting the potential of deep reinforcement learning for classification tasks in high-dimensional and unstructured data

    Authors: Johan S. Obando-Ceron, Victor Romero Cano, Walter Mayor Toro

    Abstract: This paper presents a framework for efficiently learning feature selection policies which use less features to reach a high classification precision on large unstructured data. It uses a Deep Convolutional Autoencoder (DCAE) for learning compact feature spaces, in combination with recently-proposed Reinforcement Learning (RL) algorithms as Double DQN and Retrace.

    Submitted 19 December, 2019; originally announced December 2019.