
Showing 1–11 of 11 results for author: Nekoei, H

Searching in archive cs.
  1. arXiv:2507.04103  [pdf, ps, other]

    cs.AI cs.LG stat.ML

    How to Train Your LLM Web Agent: A Statistical Diagnosis

    Authors: Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

    Abstract: LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents.…

    Submitted 5 July, 2025; originally announced July 2025.

  2. arXiv:2506.00140  [pdf, ps, other]

    cs.AI cs.LG econ.GN

    Balancing Profit and Fairness in Risk-Based Pricing Markets

    Authors: Jesse Thibodeau, Hadi Nekoei, Afaf Taïk, Janarthanan Rajendran, Golnoosh Farnadi

    Abstract: Dynamic, risk-based pricing can systematically exclude vulnerable consumer groups from essential resources such as health insurance and consumer credit. We show that a regulator can realign private incentives with social objectives through a learned, interpretable tax schedule. First, we provide a formal proposition that bounding each firm's \emph{local} demographic gap implicitly bounds the \emph…

    Submitted 4 June, 2025; v1 submitted 30 May, 2025; originally announced June 2025.

  3. arXiv:2503.14555  [pdf, other]

    cs.MA cs.AI

    A Generalist Hanabi Agent

    Authors: Arjun V Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar

    Abstract: Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, these systems are unable to perform well on any other setting than the one they have been trained on, and struggle to successfully cooperate with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative c…

    Submitted 17 March, 2025; originally announced March 2025.

  4. arXiv:2405.01616  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative Active Learning for the Search of Small-molecule Protein Binders

    Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

    Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecu…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.14620  [pdf, other]

    cs.LG cs.CY

    Fairness Incentives in Response to Unfair Dynamic Pricing

    Authors: Jesse Thibodeau, Hadi Nekoei, Afaf Taïk, Janarthanan Rajendran, Golnoosh Farnadi

    Abstract: The use of dynamic pricing by profit-maximizing firms gives rise to demand fairness concerns, measured by discrepancies in consumer groups' demand responses to a given pricing strategy. Notably, dynamic pricing may result in buyer distributions unreflective of those of the underlying population, which can be problematic in markets where fair representation is socially desirable. To address this, p…

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2308.10284  [pdf, other]

    cs.LG cs.AI cs.MA

    Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi

    Authors: Hadi Nekoei, Xutong Zhao, Janarthanan Rajendran, Miao Liu, Sarath Chandar

    Abstract: Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing envir…

    Submitted 20 August, 2023; originally announced August 2023.

  7. arXiv:2302.02792  [pdf, other]

    cs.LG

    Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

    Authors: Hadi Nekoei, Akilesh Badrinaaraayanan, Amit Sinha, Mohammad Amini, Janarthanan Rajendran, Aditya Mahajan, Sarath Chandar

    Abstract: Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme…

    Submitted 17 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  8. arXiv:2301.02593  [pdf, other]

    cs.MA cs.AI cs.LG eess.SY

    Multi-Agent Reinforcement Learning for Fast-Timescale Demand Response of Residential Loads

    Authors: Vincent Mai, Philippe Maisonneuve, Tianyu Zhang, Hadi Nekoei, Liam Paull, Antoine Lesage-Landry

    Abstract: To integrate high amounts of renewable energy resources, electrical power grids must be able to cope with high amplitude, fast timescale variations in power generation. Frequency regulation through demand response has the potential to coordinate temporally flexible loads, such as air conditioners, to counteract these variations. Existing approaches for discrete control with dynamic constraints str…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: Presented as an extended abstract at AAMAS 2023

  9. arXiv:2103.03216  [pdf, other]

    cs.LG cs.AI cs.MA

    Continuous Coordination As a Realistic Scenario for Lifelong Learning

    Authors: Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar

    Abstract: Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of L…

    Submitted 14 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: 19 pages with supplementary materials. Added results for Lifelong RL methods and some future work. Accepted to ICML 2021

  10. arXiv:2102.08501  [pdf, other]

    cs.LG stat.ML

    DEUP: Direct Epistemic Uncertainty Prediction

    Authors: Salem Lahlou, Moksh Jain, Hadi Nekoei, Victor Ion Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio

    Abstract: Epistemic Uncertainty is a measure of the lack of knowledge of a learner which diminishes with more evidence. While existing work focuses on using the variance of the Bayesian posterior due to parameter uncertainty as a measure of epistemic uncertainty, we argue that this does not capture the part of lack of knowledge induced by model misspecification. We discuss how the excess risk, which is the…

    Submitted 3 February, 2023; v1 submitted 16 February, 2021; originally announced February 2021.

  11. arXiv:2007.03158  [pdf, other]

    cs.LG cs.AI stat.ML

    The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

    Authors: Harm van Seijen, Hadi Nekoei, Evan Racah, Sarath Chandar

    Abstract: Deep model-based Reinforcement Learning (RL) has the potential to substantially improve the sample-efficiency of deep RL. While various challenges have long held it back, a number of papers have recently come out reporting success with deep model-based methods. This is a great development, but the lack of a consistent metric to evaluate such methods makes it difficult to compare various approaches…

    Submitted 3 December, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2020, code: https://github.com/chandar-lab/LoCA