Showing 1–50 of 83 results for author: Poupart, P

  1. arXiv:2510.05271

    cs.HC

    Chrysalis: A Unified System for Comparing Active Teaching and Passive Learning with AI Agents in Education

    Authors: Prashanth Arun, Vinita Vader, Erya Xu, Brent McCready-Branch, Sarah Seabrook, Kyle Scholz, Ana Crisan, Igor Grossmann, Pascal Poupart

    Abstract: AI-assisted learning has seen a remarkable uptick over the last few years, mainly due to the rise in popularity of Large Language Models (LLMs). Their ability to hold long-form, natural language interactions with users makes them excellent resources for exploring school- and university-level topics in a dynamic, active manner. We compare students' experiences when interacting with an LLM companion…

    Submitted 6 October, 2025; originally announced October 2025.

    Comments: 13 pages. Submitted to EAAI-26 symposium. Under review

  2. arXiv:2510.04394

    cs.CL cs.LG

    Time Is Effort: Estimating Human Post-Editing Time for Grammar Error Correction Tool Evaluation

    Authors: Ankit Vadehra, Bill Johnson, Gene Saunders, Pascal Poupart

    Abstract: Text editing can involve several iterations of revision. Incorporating an efficient Grammar Error Correction (GEC) tool in the initial correction round can significantly impact further human editing effort and final text quality. This raises an interesting question to quantify GEC Tool usability: How much effort can the GEC Tool save users? We present the first large-scale dataset of post-editing…

    Submitted 5 October, 2025; originally announced October 2025.

    Comments: Accepted for publication in the 4th HCI+NLP Workshop (Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing; part of EMNLP 2025)

  3. arXiv:2506.12036

    cs.LG cs.AI

    A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models

    Authors: Yanting Miao, William Loh, Pascal Poupart, Suraj Kothawade

    Abstract: Recent work uses reinforcement learning (RL) to fine-tune text-to-image diffusion models, improving text-image alignment and sample quality. However, existing approaches introduce unnecessary complexity: they cache the full sampling trajectory, depend on differentiable reward models or large preference datasets, or require specialized guidance techniques. Motivated by the "golden noise" hypothesis…

    Submitted 1 July, 2025; v1 submitted 22 May, 2025; originally announced June 2025.

    Comments: 17 pages, 6 figures

  4. arXiv:2506.06926

    cs.LG

    Basis Transformers for Multi-Task Tabular Regression

    Authors: Wei Min Loh, Jiaqi Shang, Pascal Poupart

    Abstract: Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously address key aspects of tabular data such as textual information, a variable number of columns, and unseen data without metadata besides column names. We propose a novel architecture, basis transformers, specifically designed to tackl…

    Submitted 7 June, 2025; originally announced June 2025.

  5. arXiv:2506.06261

    cs.AI cs.LG

    Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

    Authors: Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, Pascal Poupart

    Abstract: Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifie…

    Submitted 6 June, 2025; originally announced June 2025.

  6. arXiv:2506.05876

    cs.GT cs.AI

    Information Bargaining: Bilateral Commitment in Bayesian Persuasion

    Authors: Yue Lin, Shuhui Zhu, William A Cunningham, Wenhao Li, Pascal Poupart, Hongyuan Zha, Baoxiang Wang

    Abstract: Bayesian persuasion, an extension of cheap-talk communication, involves an informed sender committing to a signaling scheme to influence a receiver's actions. Compared to cheap talk, this sender's commitment enables the receiver to verify the incentive compatibility of signals beforehand, facilitating cooperation. While effective in one-shot scenarios, Bayesian persuasion faces computational compl…

    Submitted 9 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  7. arXiv:2505.23913

    cs.LG stat.ML

    Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling

    Authors: Gustavo Sutter Pessurno de Carvalho, Mohammed Abdulrahman, Hao Wang, Sriram Ganapathi Subramanian, Marc St-Aubin, Sharon O'Sullivan, Lawrence Wan, Luis Ricardez-Sandoval, Pascal Poupart, Agustinus Kristiadi

    Abstract: The optimization of expensive black-box functions is ubiquitous in science and engineering. A common solution to this problem is Bayesian optimization (BO), which is generally comprised of two components: (i) a surrogate model and (ii) an acquisition function, which generally require expensive re-training and optimization steps at each iteration, respectively. Although recent work enabled in-conte…

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2504.11412

    cs.LG cs.AI

    Measures of Variability for Risk-averse Policy Gradient

    Authors: Yudong Luo, Yangchen Pan, Jiaqi Tan, Pascal Poupart

    Abstract: Risk-averse reinforcement learning (RARL) is critical for decision-making under uncertainty, which is especially valuable in high-stake applications. However, most existing works focus on risk measures, e.g., conditional value-at-risk (CVaR), while measures of variability remain underexplored. In this paper, we comprehensively study nine common measures of variability, namely Variance, Gini Deviat…

    Submitted 15 April, 2025; originally announced April 2025.

  9. arXiv:2503.03866

    cs.AI cs.GT cs.LG cs.MA

    Learning to Negotiate via Voluntary Commitment

    Authors: Shuhui Zhu, Baoxiang Wang, Sriram Ganapathi Subramanian, Pascal Poupart

    Abstract: The partial alignment and conflict of autonomous agents lead to mixed-motive scenarios in many real-world applications. However, agents may fail to cooperate in practice even when cooperation yields a better outcome. One well known reason for this failure comes from non-credible commitments. To facilitate commitments among agents for better cooperation, we define Markov Commitment Games (MCGs), a…

    Submitted 19 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Accepted by AISTATS 2025

  10. arXiv:2502.04517

    cs.LG cs.CL

    Towards Cost-Effective Reward Guided Text Generation

    Authors: Ahmad Rashid, Ruotian Wu, Rongqi Fan, Hongliang Li, Agustinus Kristiadi, Pascal Poupart

    Abstract: Reward-guided text generation (RGTG) has emerged as a viable alternative to offline reinforcement learning from human feedback (RLHF). RGTG methods can align baseline language models to human preferences without further training like in standard RLHF methods. However, they rely on a reward model to score each candidate token generated by the language model at inference, incurring significant test-…

    Submitted 6 July, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: 18 pages. Work accepted at ICML 2025

  11. arXiv:2412.05717

    cs.RO cs.AI cs.LG

    Learning Soft Driving Constraints from Vectorized Scene Embeddings while Imitating Expert Trajectories

    Authors: Niloufar Saeidi Mobarakeh, Behzad Khamidehi, Chunlin Li, Hamidreza Mirkhani, Fazel Arasteh, Mohammed Elmahgiubi, Weize Zhang, Kasra Rezaee, Pascal Poupart

    Abstract: The primary goal of motion planning is to generate safe and efficient trajectories for vehicles. Traditionally, motion planning models are trained using imitation learning to mimic the behavior of human experts. However, these models often lack interpretability and fail to provide clear justifications for their decisions. We propose a method that integrates constraint learning into imitation learn…

    Submitted 7 December, 2024; originally announced December 2024.

  12. arXiv:2409.07569

    cs.LG cs.AI

    A Comprehensive Survey on Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

    Authors: Guiliang Liu, Sheng Xu, Shicheng Liu, Ashish Gaurav, Sriram Ganapathi Subramanian, Pascal Poupart

    Abstract: Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring the implicit constraints that expert agents adhere to, based on their demonstration data. As an emerging research topic, ICRL has received considerable attention in recent years. This article presents a categorical survey of the latest advances in ICRL. It serves as a comprehensive reference for machine learning researchers…

    Submitted 31 January, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 36 pages

    Journal ref: Transactions on Machine Learning Research, 2025

  13. arXiv:2407.12164

    cs.CV cs.AI cs.LG

    Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

    Authors: Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li

    Abstract: Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant p…

    Submitted 22 December, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024

  14. arXiv:2407.08337

    cs.LG cs.DC stat.ML

    FedLog: Personalized Federated Classification with Less Communication and More Flexibility

    Authors: Haolin Yu, Guojun Zhang, Pascal Poupart

    Abstract: Federated representation learning (FRL) aims to learn personalized federated models with effective feature extraction from local data. FRL algorithms that share the majority of the model parameters face significant challenges with huge communication overhead. This overhead stems from the millions of neural network parameters and slow aggregation progress of the averaging heuristic. To reduce the o…

    Submitted 11 October, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  15. arXiv:2407.03951

    cs.LG

    Uncertainty-Guided Likelihood Tree Search

    Authors: Julia Grosse, Ruotian Wu, Ahmad Rashid, Cheng Zhang, Philipp Hennig, Pascal Poupart, Agustinus Kristiadi

    Abstract: Tree search is a fundamental tool for planning, as many sequential decision-making problems can be framed as searching over tree-structured spaces. We propose an uncertainty-guided tree search algorithm for settings where the reward function is a log-likelihood function of the paths. Due to the combinatorial explosion of the tree size, the set of paths for which one can obtain rewards is sparse, p…

    Submitted 4 September, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: 10 pages

  16. arXiv:2406.16782

    cs.LG

    Confidence Aware Inverse Constrained Reinforcement Learning

    Authors: Sriram Ganapathi Subramanian, Guiliang Liu, Mohammed Elmahgiubi, Kasra Rezaee, Pascal Poupart

    Abstract: In coming up with solutions to real-world problems, humans implicitly adhere to constraints that are too numerous and complex to be specified completely. However, reinforcement learning (RL) agents need these constraints to learn the correct optimal policy in these settings. The field of Inverse Constraint Reinforcement Learning (ICRL) deals with this problem and provides algorithms that aim to es…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Paper to appear in ICML 2024

  17. arXiv:2406.07780

    cs.LG cs.CL

    A Critical Look At Tokenwise Reward-Guided Text Generation

    Authors: Ahmad Rashid, Ruotian Wu, Julia Grosse, Agustinus Kristiadi, Pascal Poupart

    Abstract: Large language models (LLMs) can be improved by aligning with human preferences through fine-tuning -- the so-called reinforcement learning from human feedback (RLHF). However, the cost of fine-tuning an LLM is prohibitive for many users. Due to their ability to bypass LLM fine-tuning, prediction-time tokenwise reward-guided text generation (RGTG) methods have recently been proposed. They use a re…

    Submitted 25 September, 2025; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Work accepted at COLM 2025

  18. arXiv:2406.06459

    cs.LG

    How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

    Authors: Agustinus Kristiadi, Felix Strieth-Kalthoff, Sriram Ganapathi Subramanian, Vincent Fortuin, Pascal Poupart, Geoff Pleiss

    Abstract: Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior works in enhancing BO with expert feedback, such as by incorporating it in an offline or online but blockin…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: AABI 2024. Code: https://github.com/wiseodd/bo-async-feedback

  19. arXiv:2403.11062

    cs.LG math.OC

    A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

    Authors: Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart

    Abstract: Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their practical applications. This inefficiency stems from two main facts: a focus on tail-end performance that overlooks many sampled trajectories, and the potential of gradient vanishing when the lower tail of the return di…

    Submitted 28 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: RLC 2024

  20. arXiv:2403.04221

    cs.LG cs.AI

    Why Online Reinforcement Learning is Causal

    Authors: Oliver Schulte, Pascal Poupart

    Abstract: Reinforcement learning (RL) and causal modelling naturally complement each other. The goal of causal modelling is to predict the effects of interventions in an environment, while the goal of reinforcement learning is to select interventions that maximize the rewards the agent receives from the environment. Reinforcement learning includes the two most powerful sources of information for estimating…

    Submitted 10 July, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 43 pages. Version 2 discusses policy evaluation for partially observable MDPs based on a causal model

    ACM Class: I.2.6

  21. arXiv:2402.05015

    cs.LG

    A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

    Authors: Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta, Pascal Poupart, Alán Aspuru-Guzik, Geoff Pleiss

    Abstract: Automation is one of the cornerstones of contemporary material discovery. Bayesian optimization (BO) is an essential part of such workflows, enabling scientists to leverage prior domain knowledge into efficient exploration of a large molecular space. While such prior knowledge can take many forms, there has been significant fanfare around the ancillary scientific knowledge encapsulated in large la…

    Submitted 28 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Code: https://github.com/wiseodd/lapeft-bayesopt

  22. arXiv:2312.09817

    cs.LG stat.ML

    Calibrated One Round Federated Learning with Bayesian Inference in the Predictive Space

    Authors: Mohsin Hasan, Guojun Zhang, Kaiyang Guo, Xi Chen, Pascal Poupart

    Abstract: Federated Learning (FL) involves training a model over a dataset distributed among clients, with the constraint that each client's dataset is localized and possibly heterogeneous. In FL, small and noisy datasets are common, highlighting the need for well-calibrated models that represent the uncertainty of predictions. The closest FL techniques to achieving such goals are the Bayesian FL methods wh…

    Submitted 9 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 7 pages, 2 figures. To appear at AAAI 2024

  23. arXiv:2311.03683

    cs.LG

    Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks

    Authors: Ahmad Rashid, Serena Hacker, Guojun Zhang, Agustinus Kristiadi, Pascal Poupart

    Abstract: Discriminatively trained, deterministic neural networks are the de facto choice for classification problems. However, even though they achieve state-of-the-art results on in-domain test sets, they tend to be overconfident on out-of-distribution (OOD) data. For instance, ReLU networks - a popular class of neural network architectures - have been shown to almost always yield high confidence predicti…

    Submitted 27 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at AISTATS 2024

  24. arXiv:2307.08873

    cs.LG cs.AI

    An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

    Authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan

    Abstract: Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to…

    Submitted 2 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  25. arXiv:2307.05228

    cs.CL cs.LG

    Attribute Controlled Dialogue Prompting

    Authors: Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

    Abstract: Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instanc…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted at ACL 2023 In Findings

  26. arXiv:2212.05998

    cs.LG cs.CL

    Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

    Authors: Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

    Abstract: Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the ca…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Published at EMNLP 2022 (Findings)

  27. arXiv:2211.14960

    cs.LG stat.ML

    Label Alignment Regularization for Distribution Shift

    Authors: Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H. S. Torr, Yangchen Pan

    Abstract: Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this observation, we propose a regularization method for unsupervised domain adaptation that encourages alignment between the predictions in the target domain and its t…

    Submitted 11 September, 2024; v1 submitted 27 November, 2022; originally announced November 2022.

  28. arXiv:2206.15444

    cs.LG

    Learning Functions on Multiple Sets using Multi-Set Transformers

    Authors: Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

    Abstract: We propose a general deep architecture for learning functions on multiple permutation-invariant sets. We also show how to generalize this architecture to sets of elements of any dimension by dimension equivariance. We demonstrate that our architecture is a universal approximator of these functions, and show superior results to existing methods on a variety of tasks including counting tasks, alignm…

    Submitted 30 June, 2022; originally announced June 2022.

  29. arXiv:2206.09670

    cs.LG

    Benchmarking Constraint Inference in Inverse Reinforcement Learning

    Authors: Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, Pascal Poupart

    Abstract: When deploying Reinforcement Learning (RL) agents into a physical system, we must ensure that these agents are well aware of the underlying constraints. In many real-world problems, however, the constraints are often hard to specify mathematically and unknown to the RL agents. To tackle these issues, Inverse Constrained Reinforcement Learning (ICRL) empirically estimates constraints from expert de…

    Submitted 2 March, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

  30. arXiv:2206.09526

    cs.LG stat.ML

    Robust One Round Federated Learning with Predictive Space Bayesian Inference

    Authors: Mohsin Hasan, Zehao Zhang, Kaiyang Guo, Mahdi Karami, Guojun Zhang, Xi Chen, Pascal Poupart

    Abstract: Making predictions robust is an important challenge. A separate challenge in federated learning (FL) is to reduce the number of communication rounds, particularly since doing so reduces performance in heterogeneous data settings. To tackle both issues, we take a Bayesian perspective on the problem of learning a global model. We show how the global predictive posterior can be approximated using cli…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: 7 pages, 1 figure. Code is publicly available at https://github.com/hasanmohsin/FedPredSpace_1Round

  31. arXiv:2206.06357

    cs.LG

    Federated Bayesian Neural Regression: A Scalable Global Federated Gaussian Process

    Authors: Haolin Yu, Kaiyang Guo, Mahdi Karami, Xi Chen, Guojun Zhang, Pascal Poupart

    Abstract: In typical scenarios where the Federated Learning (FL) framework applies, it is common for clients to have insufficient training data to produce an accurate model. Thus, models that provide not only point estimations, but also some notion of confidence are beneficial. Gaussian Process (GP) is a powerful Bayesian model that comes with naturally well-calibrated variance estimations. However, it is c…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 10 pages main text, 5 pages appendix, 5 figures

  32. arXiv:2206.01311

    cs.LG

    Learning Soft Constraints From Constrained Expert Demonstrations

    Authors: Ashish Gaurav, Kasra Rezaee, Guiliang Liu, Pascal Poupart

    Abstract: Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function. However, in many settings, the agent may optimize a reward function subject to some constraints, where the constraints induce behaviors that may be otherwise difficult to express with just a reward function. We consider the setting where the reward function is given, a…

    Submitted 27 April, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: ICLR 2023 camera ready version (incl. supplementary material)

  33. arXiv:2205.13697

    cs.LG cs.AI cs.MA

    FedFormer: Contextual Federation with Attention in Reinforcement Learning

    Authors: Liam Hebert, Lukasz Golab, Pascal Poupart, Robin Cohen

    Abstract: A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by taking the average of each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that utilizes Transformer Attention to contextually aggregate embeddings from models originating from…

    Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Our source code can be found at https://github.com/liamhebert/FedFormer. Accepted at AAMAS 2023

  34. arXiv:2205.12428

    cs.LG cs.CL

    Do we need Label Regularization to Fine-tune Pre-trained Language Models?

    Authors: Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi

    Abstract: Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model. Considering the ever-growing size of pre-trained language models (PLMs), KD is often adopted in many NLP tasks involving PLMs. However, it is evident that in KD, deploying the teacher network during training adds to the memory an…

    Submitted 12 April, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Published at EACL 2023

  35. arXiv:2204.07674

    cs.CL

    CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation

    Authors: Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart

    Abstract: Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models. Recent years have seen a surge of research aiming to improve KD by leveraging Contrastive Learning, Intermediate Layer Distillation, Data Augmentation, and Adversarial Training. In this work, we propose a learning based data augmentation technique tailored for knowledge distillation, call…

    Submitted 15 April, 2022; originally announced April 2022.

  36. arXiv:2112.12321

    cs.LG cs.NI

    Physics Constrained Flow Neural Network for Short-Timescale Predictions in Data Communications Networks

    Authors: Xiangle Cheng, James He, Shihan Xiao, Yingxue Zhang, Zhitang Chen, Pascal Poupart, Fenglin Li

    Abstract: Machine learning is gaining growing momentum in various recent models for the dynamic analysis of information flows in data communications networks. These preliminary models often rely on off-the-shelf learning models to predict from historical statistics while disregarding the physics governing the generating behaviors of these flows. This paper instead introduces Flow Neural Network (FlowNN) to…

    Submitted 2 April, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

  37. arXiv:2112.09099

    cs.MA

    Decentralized Mean Field Games

    Authors: Sriram Ganapathi Subramanian, Matthew E. Taylor, Mark Crowley, Pascal Poupart

    Abstract: Multiagent reinforcement learning algorithms have not been widely adopted in large scale environments with many agents as they often scale poorly with the number of agents. Using mean field theory to aggregate agents has been proposed as a solution to this problem. However, almost all previous methods in this area make a strong assumption of a centralized system where all the agents in the environ…

    Submitted 13 April, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: This work is to appear in AAAI-22. Recent version has minor formatting changes and some typos corrected

  38. arXiv:2109.10164

    cs.CL

    RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation

    Authors: Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart

    Abstract: Intermediate layer knowledge distillation (KD) can improve the standard KD technique (which only targets the output of teacher and student models) especially over large pre-trained language models. However, intermediate layer distillation suffers from excessive computational burdens and engineering efforts required for setting up a proper layer mapping. To address these problems, we propose a RAnd…

    Submitted 1 October, 2021; v1 submitted 21 September, 2021; originally announced September 2021.

  39. arXiv:2109.04286

    cs.LG cs.AI stat.ML

    NTS-NOTEARS: Learning Nonparametric DBNs With Prior Knowledge

    Authors: Xiangyu Sun, Oliver Schulte, Guiliang Liu, Pascal Poupart

    Abstract: We describe NTS-NOTEARS, a score-based structure learning method for time-series data to learn dynamic Bayesian networks (DBNs) that capture nonlinear, lagged (inter-slice) and instantaneous (intra-slice) relations among variables. NTS-NOTEARS utilizes 1D convolutional neural networks (CNNs) to model the dependence of child variables on their parents; 1D CNN is a neural function approximation mod…

    Submitted 1 March, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: AISTATS 2023

  40. arXiv:2106.03632

    cs.LG cs.AI stat.ML

    Quantifying and Improving Transferability in Domain Generalization

    Authors: Guojun Zhang, Han Zhao, Yaoliang Yu, Pascal Poupart

    Abstract: Out-of-distribution generalization is one of the key challenges when transferring a model from the lab to the real world. Existing efforts mostly focus on building invariant features among source and target domains. Based on invariant features, a high-performing classifier on source domains could hopefully behave equally well on a target domain. In other words, the invariant features are \emph{tra…

    Submitted 1 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  41. arXiv:2104.08420

    cs.CL cs.LG

    Robust Embeddings Via Distributions

    Authors: Kira A. Selby, Yinong Wang, Ruizhe Wang, Peyman Passban, Ahmad Rashid, Mehdi Rezagholizadeh, Pascal Poupart

    Abstract: Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains. We propose a novel probabilistic embedding-level method to improve the robustness of NLP models. Our method, Robust Embeddings via Distributions (RED), incorporates information from both noisy tokens and surrounding context to obtain distributions ov…

    Submitted 16 April, 2021; originally announced April 2021.

  42. arXiv:2103.01039

    cs.CV

    Self-Supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map

    Authors: Elmira Amirloo, Mohsen Rohani, Ershad Banijamali, Jun Luo, Pascal Poupart

    Abstract: While supervised learning is widely used for perception modules in conventional autonomous driving solutions, scalability is hindered by the huge amount of data labeling needed. In contrast, while end-to-end architectures do not require labeled data and are potentially more scalable, interpretability is sacrificed. We introduce a novel architecture that is trained in a fully self-supervised fashio…

    Submitted 29 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Journal ref: CVPR 2021

  43. arXiv:2012.15791  [pdf, other]

    cs.MA

    Partially Observable Mean Field Reinforcement Learning

    Authors: Sriram Ganapathi Subramanian, Matthew E. Taylor, Mark Crowley, Pascal Poupart

    Abstract: Traditional multi-agent reinforcement learning algorithms are not scalable to environments with more than a few agents, since these algorithms are exponential in the number of agents. Recent research has introduced successful methods to scale multi-agent reinforcement learning algorithms to many agent scenarios using mean field theory. Previous work in this field assumes that an agent has access t…

    Submitted 24 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: To be published at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2021. The new version corrects some typos

  44. arXiv:2012.13478  [pdf, other]

    cs.LG cs.CV

    Prediction by Anticipation: An Action-Conditional Prediction Method based on Interaction Learning

    Authors: Ershad Banijamali, Mohsen Rohani, Elmira Amirloo, Jun Luo, Pascal Poupart

    Abstract: In autonomous driving (AD), accurately predicting changes in the environment can effectively improve safety and comfort. Due to complex interactions among traffic participants, however, it is very hard to achieve accurate prediction for a long horizon. To address this challenge, we propose prediction by anticipation, which views interaction in terms of a latent probabilistic generative process whe…

    Submitted 24 December, 2020; originally announced December 2020.

  45. arXiv:2006.14592  [pdf, other]

    cs.LG math.OC stat.ML

    Newton-type Methods for Minimax Optimization

    Authors: Guojun Zhang, Kaiwen Wu, Pascal Poupart, Yaoliang Yu

    Abstract: Differential games, in particular two-player sequential zero-sum games (a.k.a. minimax optimization), have been an important modeling tool in applied science and received renewed interest in machine learning due to many recent applications, such as adversarial training, generative models and reinforcement learning. However, existing theory mostly focuses on convex-concave functions with few except…

    Submitted 18 February, 2023; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: code update

  46. arXiv:2003.03731  [pdf, ps, other]

    math.OC cs.LG math.AG

    A Positivstellensatz for Conditional SAGE Signomials

    Authors: Allen Houze Wang, Priyank Jaini, Yaoliang Yu, Pascal Poupart

    Abstract: Recently, the conditional SAGE certificate has been proposed as a sufficient condition for signomial positivity over a convex set. In this article, we show that the conditional SAGE certificate is $\textit{complete}$. That is, for any signomial $f(\mathbf{x}) = \sum_{j=1}^{\ell}c_j \exp(\mathbf{A}_j\mathbf{x})$ defined by rational exponents that is positive over a compact convex set $\mathcal{X}$,…

    Submitted 24 October, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

    Comments: 19 pages, preprint

  47. arXiv:2003.03645  [pdf, other]

    cs.CL cs.AI cs.HC

    Generating Emotionally Aligned Responses in Dialogues using Affect Control Theory

    Authors: Nabiha Asghar, Ivan Kobyzev, Jesse Hoey, Pascal Poupart, Muhammad Bilal Sheikh

    Abstract: State-of-the-art neural dialogue systems excel at syntactic and semantic modelling of language, but often have a hard time establishing emotional alignment with the human interactant during a conversation. In this work, we bring Affect Control Theory (ACT), a socio-mathematical model of emotions for human-human interactions, to the neural dialogue generation setting. ACT makes predictions about ho…

    Submitted 16 April, 2020; v1 submitted 7 March, 2020; originally announced March 2020.

  48. arXiv:2002.11875  [pdf, other]

    cs.LG math.OC stat.ML

    Optimality and Stability in Non-Convex Smooth Games

    Authors: Guojun Zhang, Pascal Poupart, Yaoliang Yu

    Abstract: Convergence to a saddle point for convex-concave functions has been studied for decades, while recent years have seen a surge of interest in non-convex (zero-sum) smooth games, motivated by their recent wide applications. It remains an intriguing research challenge how local optimal points are defined and which algorithm can converge to such points. An interesting concept is known as the local mini…

    Submitted 3 February, 2022; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: accepted by JMLR 2022

  49. arXiv:2002.10631  [pdf, other]

    cs.LG cs.CV stat.ML

    Batch norm with entropic regularization turns deterministic autoencoders into generative models

    Authors: Amur Ghose, Abdullah Rashwan, Pascal Poupart

    Abstract: The variational autoencoder is a well-defined deep generative model that utilizes an encoder-decoder framework where an encoding neural network outputs a non-deterministic code for reconstructing an input. The encoder achieves this by sampling from a distribution for every input, instead of outputting a deterministic code per input. The great advantage of this process is that it allows the use of…

    Submitted 21 September, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

    Journal ref: Published in the Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2020

  50. arXiv:2002.09127  [pdf, other]

    cs.CL cs.LG

    Learning Dynamic Belief Graphs to Generalize on Text-Based Games

    Authors: Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton

    Abstract: Playing text-based games requires skills in processing natural language and sequential decision making. Achieving human-level performance on text-based games remains an open challenge, and prior research has largely relied on hand-crafted structured representations and heuristics. In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured represent…

    Submitted 11 May, 2021; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: Bug fixed in Table 1