Showing 1–50 of 104 results for author: Fried, D

  1. arXiv:2506.02355  [pdf, ps, other]

    cs.LG

    Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening

    Authors: Andre He, Daniel Fried, Sean Welleck

    Abstract: Reinforcement learning has emerged as an effective framework for training large language models on structured language-conditioned tasks. We identify a critical flaw of Group Relative Policy Optimization (GRPO), a widely used RL algorithm in this setting. For tasks that require multi-sample performance, such as formal theorem proving, GRPO biasedly reinforces already probable solutions and neglect…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2504.20294  [pdf, other]

    cs.AI cs.CL cs.HC

    mrCAD: Multimodal Refinement of Computer-aided Designs

    Authors: William P. McCarthy, Saujas Vaduguru, Karl D. D. Willis, Justin Matejka, Judith E. Fan, Daniel Fried, Yewen Pu

    Abstract: A key feature of human collaboration is the ability to iteratively refine the concepts we have communicated. In contrast, while generative AI excels at the \textit{generation} of content, it often struggles to make specific language-guided \textit{modifications} of its prior outputs. To bridge the gap between how humans and machines perform edits, we present mrCAD, a dataset of multimodal instruct…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: the first two authors contributed equally

  3. arXiv:2504.06821  [pdf, other]

    cs.CL

    Inducing Programmatic Skills for Agentic Tasks

    Authors: Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig, Daniel Fried

    Abstract: To succeed in common digital tasks such as web navigation, agents must carry out a variety of specialized tasks such as searching for products or planning a travel route. To tackle these tasks, agents can bootstrap themselves by learning task-specific skills online through interaction with the web environment. In this work, we demonstrate that programs are an effective representation for skills. W…

    Submitted 9 April, 2025; originally announced April 2025.

  4. arXiv:2503.07358  [pdf, other]

    cs.CL cs.SE

    RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

    Authors: Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

    Abstract: We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation for both training and evaluation. Unlike existing works that aim to build entire repositories for execution, which is challenging for both human and LLMs, we provide execution feedback with sandbox testing, which isolates a given target function and its dependencies t…

    Submitted 10 March, 2025; originally announced March 2025.

  5. arXiv:2502.18449  [pdf, other]

    cs.SE cs.AI cs.CL

    SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

    Authors: Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, Lingming Zhang, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, Sida I. Wang

    Abstract: The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While DeepSeek-R1 and other follow-up work primarily focus on applying RL to competitive coding and math problems, this paper introduces SWE-RL, the first approach to scale RL-based LLM reasoning for real-world softwar…

    Submitted 25 February, 2025; originally announced February 2025.

  6. arXiv:2502.16339  [pdf, other]

    cs.MA cs.CL cs.GT

    Dynamic Coalition Structure Detection in Natural Language-based Interactions

    Authors: Abhishek N. Kulkarni, Andy Liu, Jean-Raphael Gaglione, Daniel Fried, Ufuk Topcu

    Abstract: In strategic multi-agent sequential interactions, detecting dynamic coalition structures is crucial for understanding how self-interested agents coordinate to influence outcomes. However, natural-language-based interactions introduce unique challenges to coalition detection due to ambiguity over intents and difficulty in modeling players' subjective perspectives. We propose a new method that lever…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2025)

  7. arXiv:2501.00912  [pdf, other]

    cs.CV cs.CL

    AutoPresent: Designing Structured Visuals from Scratch

    Authors: Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell

    Abstract: Designing structured visuals such as presentation slides is essential for communicative needs, necessitating both content creation and visual planning skills. In this work, we tackle the challenge of automated slide generation, where models produce slide presentations from natural language (NL) instructions. We first introduce the SlidesBench benchmark, the first benchmark for slide generation wit…

    Submitted 1 January, 2025; originally announced January 2025.

  8. arXiv:2410.18359  [pdf, ps, other]

    cs.CL

    Improving Model Factuality with Fine-grained Critique-based Evaluator

    Authors: Yiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, Quintin Fettes, Arya Talebzadeh, Sinong Wang, Han Fang, Carolyn Rose, Daniel Fried, Hejia Zhang

    Abstract: Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. We conduct data augmentation on a combination of public judgment datasets to train FenCE to (1) generate textual critiques along with…

    Submitted 1 June, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  9. arXiv:2410.03893  [pdf, other]

    cs.LG cs.AI

    Human-aligned Chess with a Bit of Search

    Authors: Yiming Zhang, Athul Paul Jacob, Vivian Lai, Daniel Fried, Daphne Ippolito

    Abstract: Chess has long been a testbed for AI's quest to match human intelligence, and in recent years, chess AI systems have surpassed the strongest humans at the game. However, these systems are not human-aligned; they are unable to match the skill levels of all human partners or model human-like behaviors beyond piece movement. In this paper, we introduce Allie, a chess-playing AI designed to bridge the…

    Submitted 4 October, 2024; originally announced October 2024.

  10. arXiv:2409.19801  [pdf, other]

    cs.SE cs.AI cs.CL

    CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells

    Authors: Atharva Naik, Marcus Alenius, Daniel Fried, Carolyn Rose

    Abstract: The task of automated code review has recently gained a lot of attention from the machine learning community. However, current review comment evaluation metrics rely on comparisons with a human-written reference for a given code change (also called a diff). Furthermore, code review is a one-to-many problem, like generation and summarization, with many "valid reviews" for a diff. Thus, we develop C…

    Submitted 16 March, 2025; v1 submitted 29 September, 2024; originally announced September 2024.

  11. arXiv:2409.07429  [pdf, other]

    cs.CL

    Agent Workflow Memory

    Authors: Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig

    Abstract: Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this proc…

    Submitted 11 September, 2024; originally announced September 2024.

  12. arXiv:2407.14044  [pdf, other]

    cs.CL cs.AI

    ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

    Authors: Siddhant Waghjale, Vishruth Veerendranath, Zora Zhiruo Wang, Daniel Fried

    Abstract: Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in benchmarking code efficiency is a hurdle across varying hardware specifications for popular interpreted languages such as Python. In this paper, we present ECCO, a…

    Submitted 9 October, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: EMNLP 2024; Project Page: https://ecco-code-eff.github.io/

  13. arXiv:2407.05430  [pdf, other]

    cs.DS

    Hamming Distance Oracle

    Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus

    Abstract: In this paper, we present and study the \emph{Hamming distance oracle problem}. In this problem, the task is to preprocess two strings $S$ and $T$ of lengths $n$ and $m$, respectively, to obtain a data-structure that is able to answer queries regarding the Hamming distance between a substring of $S$ and a substring of $T$. For a constant size alphabet strings, we show that for every $x\le nm$ th…

    Submitted 7 July, 2024; originally announced July 2024.

  14. arXiv:2407.02499  [pdf, other]

    cs.PL cs.AI

    Amortizing Pragmatic Program Synthesis with Rankings

    Authors: Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena Glassman, Daniel Fried

    Abstract: The usage of Rational Speech Acts (RSA) framework has been successful in building \emph{pragmatic} program synthesizers that return programs which, in addition to being logically consistent with user-generated examples, account for the fact that a user chooses their examples informatively. We present a general method of amortizing the slow, exact RSA synthesizer. Our method first query the exact R…

    Submitted 1 June, 2024; originally announced July 2024.

    Comments: ICML 2024. This work supersedes and serves as a new version of arXiv:2309.03225

  15. arXiv:2407.01476  [pdf, other]

    cs.AI cs.CL cs.LG

    Tree Search for Language Model Agents

    Authors: Jing Yu Koh, Stephen McAleer, Daniel Fried, Ruslan Salakhutdinov

    Abstract: Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards…

    Submitted 12 October, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages. Models and code available at https://jykoh.com/search-agents

  16. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks o…

    Submitted 1 April, 2025; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted at ICLR 2025 (Oral), built with love by the BigCode community :)

  17. arXiv:2406.14497  [pdf, other]

    cs.SE cs.CL

    CodeRAG-Bench: Can Retrieval Augment Code Generation?

    Authors: Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried

    Abstract: While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such as library documentation can facilitate generating accurate and functional code. Despite the success of retrieval-augmented generation (RAG) in various text-oriented tasks, its potential for improving…

    Submitted 26 February, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.12814  [pdf, other]

    cs.LG cs.CL cs.CR cs.CV

    Dissecting Adversarial Robustness of Multimodal LM Agents

    Authors: Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan

    Abstract: As language models (LMs) are used to build autonomous agents in real environments, ensuring their adversarial robustness becomes a critical challenge. Unlike chatbots, agents are compound systems with multiple components taking actions, which existing LMs safety evaluations do not adequately address. To bridge this gap, we manually create 200 targeted adversarial tasks and evaluation scripts in a…

    Submitted 4 February, 2025; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: ICLR 2025. Also oral at NeurIPS 2024 Open-World Agents Workshop

  19. arXiv:2405.20253  [pdf, other]

    cs.CL

    Evaluating Large Language Model Biases in Persona-Steered Generation

    Authors: Andy Liu, Mona Diab, Daniel Fried

    Abstract: The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multi…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted to Findings of ACL 2024. Code and data available at https://github.com/andyjliu/persona-steered-generation-bias

  20. arXiv:2405.14173  [pdf, other]

    cs.AI cs.HC

    Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication

    Authors: Shenghui Chen, Daniel Fried, Ufuk Topcu

    Abstract: Developing autonomous agents that can strategize and cooperate with humans under information asymmetry is challenging without effective communication in natural language. We introduce a shared-control game, where two players collectively control a token in alternating turns to achieve a common objective under incomplete information. We formulate a policy synthesis problem for an autonomous agent i…

    Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: with appendix

  21. arXiv:2405.08760  [pdf, other]

    cs.CL cs.AI

    Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs

    Authors: Akhila Yerukola, Saujas Vaduguru, Daniel Fried, Maarten Sap

    Abstract: Humans often express their communicative intents indirectly or non-literally, which requires their interlocutors -- human or AI -- to understand beyond the literal meaning of words. While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal…

    Submitted 19 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  22. arXiv:2404.11673  [pdf, other]

    cs.DS

    Hairpin Completion Distance Lower Bound

    Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus

    Abstract: Hairpin completion, derived from the hairpin formation observed in DNA biochemistry, is an operation applied to strings, particularly useful in DNA computing. Conceptually, a right hairpin completion operation transforms a string $S$ into $S\cdot S'$ where $S'$ is the reverse complement of a prefix of $S$. Similarly, a left hairpin completion operation transforms a string $S$ into $S'\cdot S$ wher…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: To be published in CPM 2024

    MSC Class: 68W32 ACM Class: F.2.2

  23. arXiv:2404.01158  [pdf, other]

    cs.CL cs.RO

    Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community

    Authors: Casey Kennington, Malihe Alikhani, Heather Pon-Barry, Katherine Atwell, Yonatan Bisk, Daniel Fried, Felix Gervits, Zhao Han, Mert Inan, Michael Johnston, Raj Korpan, Diane Litman, Matthew Marge, Cynthia Matuszek, Ross Mead, Shiwali Mohan, Raymond Mooney, Natalie Parde, Jivko Sinapov, Angela Stewart, Matthew Stone, Stefanie Tellex, Tom Williams

    Abstract: The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: NSF Report on the "Dialogue with Robots" Workshop held in Pittsburgh, PA, April 2023

  24. arXiv:2404.00566  [pdf, other]

    cs.SE cs.CL

    CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

    Authors: Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

    Abstract: To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited benchmarks to settings where code is easily executable or has human-written tests. To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to…

    Submitted 2 October, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  25. arXiv:2403.15452  [pdf, other]

    cs.CL cs.AI

    What Are Tools Anyway? A Survey from the Language Model Perspective

    Authors: Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig

    Abstract: Language models (LMs) are powerful yet mostly for text generation tasks. Tools have substantially enhanced their performance for tasks that require complex skills. However, many works adopt the term "tool" in different ways, raising the question: What is a tool anyway? Subsequently, where and how do tools help LMs? In this survey, we provide a unified definition of tools as external programs used…

    Submitted 18 March, 2024; originally announced March 2024.

  26. arXiv:2402.15449  [pdf, other]

    cs.CL cs.LG

    Repetition Improves Language Model Embeddings

    Authors: Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan

    Abstract: Recent approaches to improving the extraction of text embeddings from autoregressive large language models (LLMs) have largely focused on improvements to data, backbone pretrained language models, or improving task-differentiation via instructions. In this work, we address an architectural limitation of autoregressive models: token embeddings cannot contain information from tokens that appear late…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 36 pages, 11 figures, 16 tables

  27. arXiv:2401.13649  [pdf, other]

    cs.LG cs.CL cs.CV

    VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

    Authors: Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

    Abstract: Autonomous agents capable of planning, reasoning, and executing actions on the web offer a promising avenue for automating computer tasks. However, the majority of existing benchmarks primarily focus on text-based agents, neglecting many natural tasks that require visual information to effectively solve. Given that most computer interfaces cater to human perception, visual information often augmen…

    Submitted 5 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024. 24 pages. Project page: https://jykoh.com/vwa

  28. arXiv:2401.12869  [pdf, other]

    cs.AI

    TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks

    Authors: Zhiruo Wang, Daniel Fried, Graham Neubig

    Abstract: Language models (LMs) can solve tasks such as answering questions about tables or images by writing programs. However, using primitive functions often leads to verbose and error-prone programs, and higher-level functions require expert design. To enable better solutions without human labor, we ask code LMs to curate reusable high-level functions, and use them to write solutions. We present TROVE,…

    Submitted 23 January, 2024; originally announced January 2024.

  29. arXiv:2401.11290  [pdf, other]

    cs.LO cs.FL

    On Dependent Variables in Reactive Synthesis

    Authors: S. Akshay, Eliyahu Basa, Supratik Chakraborty, Dror Fried

    Abstract: Given a Linear Temporal Logic (LTL) formula over input and output variables, reactive synthesis requires us to design a deterministic Mealy machine that gives the values of outputs at every time step for every sequence of inputs, such that the LTL formula is satisfied. In this paper, we investigate the notion of dependent variables in the context of reactive synthesis. Inspired by successful pre-p…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: Full version of conference paper published in TACAS'24

  30. arXiv:2311.08584  [pdf, other]

    cs.CL

    Asking More Informative Questions for Grounded Retrieval

    Authors: Sedrick Keh, Justin T. Chiu, Daniel Fried

    Abstract: When a model is trying to gather information in an interactive setting, it benefits from asking informative questions. However, in the case of a grounded multi-turn image identification task, previous studies have been constrained to polar yes/no questions, limiting how much information the model can gain in a single turn. We present an approach that formulates more informative, open-ended questio…

    Submitted 14 November, 2023; originally announced November 2023.

  31. arXiv:2311.05740  [pdf, other]

    cs.LG cs.AI cs.PL

    Generating Pragmatic Examples to Train Neural Program Synthesizers

    Authors: Saujas Vaduguru, Daniel Fried, Yewen Pu

    Abstract: Programming-by-example is the task of synthesizing a program that is consistent with a set of user-provided input-output examples. As examples are often an under-specification of one's intent, a good synthesizer must choose the intended program from the many that are consistent with the given set of examples. Prior work frames program synthesis as a cooperative game between a listener (that synthe…

    Submitted 16 April, 2025; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  32. arXiv:2311.02253  [pdf, other]

    cs.LG cs.AI

    Comparative Knowledge Distillation

    Authors: Alex Wilf, Alex Tianyi Xu, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency

    Abstract: In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance. Traditional KD paradigms, however, assume readily available access to teacher models for frequent inference -- a notion increasingly at odds with the realities of c…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.13011

  33. arXiv:2311.00317  [pdf, other]

    cs.CL cs.LG cs.SE

    Data Augmentation for Code Translation with Comparable Corpora and Multiple References

    Authors: Yiqing Xie, Atharva Naik, Daniel Fried, Carolyn Rose

    Abstract: One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques, one that builds comparable corpora (i.e., code pairs with similar functionality), and another that augments existing parallel data with multiple reference translations. Specifically, we build and analyze mult…

    Submitted 4 October, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Findings (with minor updates on the flowcharts)

  34. arXiv:2310.17140  [pdf, other]

    cs.CL cs.AI

    Symbolic Planning and Code Generation for Grounded Dialogue

    Authors: Justin T. Chiu, Wenting Zhao, Derek Chen, Saujas Vaduguru, Alexander M. Rush, Daniel Fried

    Abstract: Large language models (LLMs) excel at processing and generating both text and code. However, LLMs have had limited applicability in grounded task-oriented dialogue as they are difficult to steer toward task objectives and fail to handle novel grounding. We present a modular and interpretable grounded dialogue system that addresses these shortcomings by composing LLMs with a symbolic planner and gr…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  35. API-Assisted Code Generation for Question Answering on Varied Table Structures

    Authors: Yihan Cao, Shuyi Chen, Ryan Liu, Zhiruo Wang, Daniel Fried

    Abstract: A persistent challenge to table question answering (TableQA) by generating executable programs has been adapting to varied table structures, typically requiring domain-specific logical forms. In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying langu…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 camera ready, 13 pages, 11 figures

    Journal ref: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2023, pages 14536-14548, Singapore

  36. arXiv:2310.11667  [pdf, other]

    cs.AI cs.CL cs.LG

    SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

    Authors: Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap

    Abstract: Humans are social beings; we pursue social goals in our daily interactions, which is a crucial aspect of social intelligence. Yet, AI systems' abilities in this realm remain elusive. We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. In our environment, agents role-play and interact under a wide va…

    Submitted 22 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Preprint, 43 pages. The first two authors contribute equally

  37. arXiv:2310.02670  [pdf, other]

    cs.DS

    Searching 2D-Strings for Matching Frames

    Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, Adrian Miclaus, Arseny Shur

    Abstract: We introduce the natural notion of a matching frame in a $2$-dimensional string. A matching frame in a $2$-dimensional $n\times m$ string $M$, is a rectangle such that the strings written on the horizontal sides of the rectangle are identical, and so are the strings written on the vertical sides of the rectangle. Formally, a matching frame in $M$ is a tuple $(u,d,\ell,r)$ such that…

    Submitted 18 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  38. arXiv:2309.03225   

    cs.PL cs.AI

    Amortizing Pragmatic Program Synthesis with Rankings

    Authors: Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena Glassman, Daniel Fried

    Abstract: In program synthesis, an intelligent system takes in a set of user-generated examples and returns a program that is logically consistent with these examples. The usage of Rational Speech Acts (RSA) framework has been successful in building \emph{pragmatic} program synthesizers that return programs which -- in addition to being logically consistent -- account for the fact that a user chooses their…

    Submitted 11 July, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: I accidentally submitted a new version of this (arXiv:2407.02499) instead of replacing this one, so I'll take this one out as it is out-dated

    ACM Class: I.2.2; D.3.0

  39. arXiv:2307.13854  [pdf, other]

    cs.AI cs.CL cs.LG

    WebArena: A Realistic Web Environment for Building Autonomous Agents

    Authors: Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig

    Abstract: With advances in generative AI, there is now potential for autonomous agents to manage daily tasks via natural language commands. However, current agents are primarily created and tested in simplified synthetic environments, leading to a disconnect with real-world scenarios. In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Specifically, w…

    Submitted 16 April, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Our code, data, environment reproduction resources, and video demonstrations are publicly available at https://webarena.dev/

  40. arXiv:2306.08818  [pdf, other]

    cs.CL

    Pragmatic Inference with a CLIP Listener for Contrastive Captioning

    Authors: Jiefu Ou, Benno Krojer, Daniel Fried

    Abstract: We propose a simple yet effective and robust method for contrastive captioning: generating discriminative captions that distinguish target images from very similar alternative distractor images. Our approach is built on a pragmatic inference procedure that formulates captioning as a reference game between a speaker, which produces possible captions describing the target, and a listener, which sele…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Findings of ACL 2023, fixed some references

  41. arXiv:2305.18705  [pdf, other]

    cs.DS

    Algorithmic Foundations of Inexact Computing

    Authors: John Augustine, Dror Fried, Krishna V. Palem, Duc-Hung Pham, Anshumali Shrivastava

    Abstract: Inexact computing also referred to as approximate computing is a style of designing algorithms and computing systems wherein the accuracy of correctness of algorithms executing on them is deliberately traded for significant resource savings. Significant progress has been reported in this regard both in terms of hardware as well as software or custom algorithms that exploited this approach resultin…

    Submitted 29 May, 2023; originally announced May 2023.

  42. arXiv:2305.17216  [pdf, other]

    cs.CL cs.CV cs.LG

    Generating Images with Multimodal Language Models

    Authors: Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

    Abstract: We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to…

    Submitted 13 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023. Project page: http://jykoh.com/gill

  43. arXiv:2305.06161  [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  44. arXiv:2301.13823  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Grounding Language Models to Images for Multimodal Inputs and Outputs

    Authors: Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

    Abstract: We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method leverages the abilities of language models learnt from large scale text-only pretraining, such as in-context learning and free-form text generation. We keep the langu…

    Submitted 13 June, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Published in ICML 2023. Project page: https://jykoh.com/fromage

  45. arXiv:2301.03988  [pdf, other]

    cs.SE cs.AI cs.LG

    SantaCoder: don't reach for the stars!

    Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

    Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…

    Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

  46. arXiv:2212.10481  [pdf, other]

    cs.SE cs.AI cs.CL

    Execution-Based Evaluation for Open-Domain Code Generation

    Authors: Zhiruo Wang, Shuyan Zhou, Daniel Fried, Graham Neubig

    Abstract: To extend the scope of coding queries to more realistic settings, we propose ODEX, the first Open-Domain EXecution-based natural language (NL) to Python code generation dataset. ODEX has 945 NL-Code pairs spanning 79 diverse libraries, along with 1,707 human-written test cases for execution. Our NL-Code pairs are harvested from StackOverflow forums to encourage natural and practical coding queries…

    Submitted 19 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  47. arXiv:2211.16490  [pdf, other]

    cs.LG cs.CL cs.PL cs.SE

    Coder Reviewer Reranking for Code Generation

    Authors: Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang

    Abstract: Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the…

    Submitted 29 November, 2022; originally announced November 2022.

  48. arXiv:2211.15521  [pdf, other]

    cs.CV cs.CL

    G^3: Geolocation via Guidebook Grounding

    Authors: Grace Luo, Giscard Biamby, Trevor Darrell, Daniel Fried, Anna Rohrbach

    Abstract: We demonstrate how language can improve geolocation: the task of predicting the location where an image was taken. Here we study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locat…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Findings of EMNLP 2022

  49. arXiv:2211.12615  [pdf, other]

    cs.CL cs.AI

    AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

    Authors: Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis

    Abstract: Existing approaches built separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message…

    Submitted 22 November, 2022; originally announced November 2022.

  50. arXiv:2211.11501  [pdf, other]

    cs.SE cs.CL

    DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

    Authors: Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

    Abstract: We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas. Compared to prior works, DS-1000 incorporates three core features. First, our problems reflect diverse, realistic, and practical use cases since we collected them from StackOverflow. Second, our automatic evaluation is highly specific (reliable) -- acro…

    Submitted 18 November, 2022; originally announced November 2022.