Showing 1–2 of 2 results for author: Ackermann, J

Search v0.5.6 released 2020-02-24

arXiv:2402.13934 [pdf, other]

cs.LG cs.AI cs.CL stat.ML

Do Efficient Transformers Really Save Computation?

Authors: Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang

Abstract: As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This makes it challenging to identify when to use a specific model and what directions to prioritize for further investigation. In this paper, we aim to understand the capabilities and limitations of efficient Transformers, specifically the Sparse Transformer and the Linear Transformer. We focus on their reasoning capability as exhibited by Chain-of-Thought (CoT) prompts and follow previous works to model them as Dynamic Programming (DP) problems. Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size. Nonetheless, we identify a class of DP problems for which these models can be more efficient than the standard Transformer. We confirm our theoretical results through experiments on representative DP tasks, adding to the understanding of efficient Transformers' practical strengths and weaknesses.

Submitted 8 November, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 20 pages, ICML 2024 Camera Ready Version

arXiv:1910.01465 [pdf, other]

cs.LG cs.AI cs.MA stat.ML

Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Authors: Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama

Abstract: Many real world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus investigate the presence of a common weakness in single-agent RL, namely value function overestimation bias, in the multi-agent setting. Based on our findings, we propose an approach that reduces this bias by using double centralized critics. We evaluate it on six mixed cooperative-competitive tasks, showing a significant advantage over current methods. Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.

Submitted 2 December, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

Comments: Accepted for the Deep RL Workshop at NeurIPS 2019; Changes for v2: Changed Figures 3,4, due to an error in the implementation of MATD3. Please refer to this version for fair evaluation
