
Showing 1–17 of 17 results for author: Saad-Falcon, J

Searching in archive cs.
  1. arXiv:2504.14047 [pdf, other]

    cs.AI

    Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods

    Authors: Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou

    Abstract: There is intense interest in investigating how inference time compute (ITC) (e.g. repeated sampling, refinements, etc.) can improve large language model (LLM) capabilities. At the same time, recent breakthroughs in reasoning models, such as Deepseek-R1, unlock the opportunity for reinforcement learning to improve LLM reasoning skills. An in-depth understanding of how ITC interacts with reasoning ac…

    Submitted 18 April, 2025; originally announced April 2025.

  2. arXiv:2412.13091 [pdf, other]

    cs.CL cs.AI

    LMUnit: Fine-grained Evaluation with Natural Language Unit Tests

    Authors: Jon Saad-Falcon, Rajan Vivek, William Berrios, Nandita Shankar Naik, Matija Franklin, Bertie Vidgen, Amanpreet Singh, Douwe Kiela, Shikib Mehri

    Abstract: As language models become integral to critical workflows, assessing their behavior remains a fundamental challenge -- human evaluation is costly and noisy, while automated metrics provide only coarse, difficult-to-interpret signals. We introduce natural language unit tests, a paradigm that decomposes response quality into explicit, testable criteria, along with a unified scoring model, LMUnit, whi…

    Submitted 17 December, 2024; originally announced December 2024.

  3. arXiv:2409.15254 [pdf, other]

    cs.LG cs.AI cs.CL

    Archon: An Architecture Search Framework for Inference-Time Techniques

    Authors: Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini

    Abstract: Inference-time techniques are emerging as highly effective tools to enhance large language model (LLM) capabilities. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of individual inference-time techniques and the interactions between them. Additionally, efficiently and automatically searching the spa…

    Submitted 3 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  4. arXiv:2402.07440 [pdf, other]

    cs.IR cs.LG

    Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

    Authors: Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré

    Abstract: Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval perform…

    Submitted 17 November, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: International Conference on Machine Learning (ICML) 2024

  5. arXiv:2311.09476 [pdf, other]

    cs.CL cs.AI cs.IR

    ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

    Authors: Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia

    Abstract: Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightwe…

    Submitted 31 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  6. arXiv:2309.08872 [pdf, other]

    cs.CL cs.AI cs.LG

    PDFTriage: Question Answering over Long, Structured Documents

    Authors: Jon Saad-Falcon, Joe Barrow, Alexa Siu, Ani Nenkova, David Seunghyun Yoon, Ryan A. Rossi, Franck Dernoncourt

    Abstract: Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing them as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with dif…

    Submitted 8 November, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

  7. arXiv:2303.00807 [pdf, other]

    cs.IR cs.CL

    UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

    Authors: Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts

    Abstract: Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generati…

    Submitted 13 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Long Paper at Empirical Methods in Natural Language Processing (EMNLP) 2023

  8. arXiv:2212.01340 [pdf, other]

    cs.IR cs.CL

    Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

    Authors: Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

    Abstract: Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality.…

    Submitted 2 December, 2022; originally announced December 2022.

  9. arXiv:2210.13510 [pdf, other]

    cs.HC

    Evaluation of Argo Scholar with Observational Study

    Authors: Kevin Li, Haoyang Yang, Evan Montoya, Anish Upadhayay, Zhiyan Zhou, Jon Saad-Falcon, Duen Horng Chau

    Abstract: Discovering and making sense of relevant literature is fundamental in any scientific field. Node-link diagram-based visualization tools can aid this process; however, existing tools have been evaluated only on small scales. This paper evaluates Argo Scholar, an open-source visualization tool designed for interactive exploration of literature and easy sharing of exploration results. A large-scale u…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: IEEE VIS 2022

  10. arXiv:2207.04993 [pdf, other]

    cs.CL

    Embedding Recycling for Language Models

    Authors: Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey

    Abstract: Real-world applications of neural language models often involve running many different models over the same corpus. The high computational cost of these runs has led to interest in techniques that can reuse the contextualized embeddings produced in previous runs to speed training and inference of future ones. We refer to this approach as embedding recycling (ER). While multiple ER techniques have…

    Submitted 30 January, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: EACL Findings 2023

  11. arXiv:2112.01488 [pdf, other]

    cs.IR cs.CL

    ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

    Authors: Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia

    Abstract: Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been sho…

    Submitted 10 July, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: NAACL 2022. Omar and Keshav contributed equally to this work

  12. arXiv:2110.14060 [pdf, other]

    cs.HC

    Argo Scholar: Interactive Visual Exploration of Literature in Browsers

    Authors: Kevin Li, Haoyang Yang, Anish Upadhayay, Zhiyan Zhou, Jon Saad-Falcon, Duen Horng Chau

    Abstract: Discovering and making sense of relevant research literature is fundamental to becoming knowledgeable in any scientific discipline. Visualization can aid this process; however, existing tools' adoption and impact have often been constrained, such as by their reliance on small curated paper datasets that quickly become outdated or a lack of support for personalized exploration. We introduce Argo Sc…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: IEEE VIS 2021

  13. arXiv:2106.11846 [pdf, other]

    econ.GN cs.IR

    Quantifying the Impact of Human Capital, Job History, and Language Factors on Job Seniority with a Large-scale Analysis of Resumes

    Authors: Austin P Wright, Caleb Ziems, Haekyu Park, Jon Saad-Falcon, Duen Horng Chau, Diyi Yang, Maria Tomprou

    Abstract: As job markets worldwide have become more competitive and applicant selection criteria have become more opaque, and different (and sometimes contradictory) information and advice is available for job seekers wishing to progress in their careers, it has never been more difficult to determine which factors in a résumé most effectively help career progression. In this work we present a novel, large s…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: 9 Pages, 5 Figures, 8 Tables

  14. arXiv:2103.16435 [pdf, other]

    cs.LG cs.AI cs.HC

    EnergyVis: Interactively Tracking and Exploring Energy Consumption for ML Models

    Authors: Omar Shaikh, Jon Saad-Falcon, Austin P Wright, Nilaksh Das, Scott Freitas, Omar Isaac Asensio, Duen Horng Chau

    Abstract: The advent of larger machine learning (ML) models has improved state-of-the-art (SOTA) performance in various modeling tasks, ranging from computer vision to natural language. As ML models continue increasing in size, so do their respective energy consumption and computational requirements. However, the methods for tracking, reporting, and comparing energy consumption remain limited. We present…

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: 7 pages, 5 figures; CHI 2021 Extended Abstracts

  15. arXiv:2010.04625 [pdf, other]

    cs.CL cs.AI cs.LG

    Examining the Ordering of Rhetorical Strategies in Persuasive Requests

    Authors: Omar Shaikh, Jiaao Chen, Jon Saad-Falcon, Duen Horng Chau, Diyi Yang

    Abstract: Interpreting how persuasive language influences audiences has implications across many domains like advertising, argumentation, and propaganda. Persuasion relies on more than a message's content. Arranging the order of the message itself (i.e., ordering specific rhetorical strategies) also plays an important role. To examine how strategy orderings contribute to persuasiveness, we first utilize a V…

    Submitted 11 October, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020

  16. arXiv:2009.00091 [pdf]

    cs.DL cs.CL cs.HC

    Mapping Researchers with PeopleMap

    Authors: Jon Saad-Falcon, Omar Shaikh, Zijie J. Wang, Austin P. Wright, Sasha Richardson, Duen Horng Chau

    Abstract: Discovering research expertise at universities can be a difficult task. Directories routinely become outdated, and few help in visually summarizing researchers' work or supporting the exploration of shared interests among researchers. This results in lost opportunities for both internal and external entities to discover new connections, nurture research collaboration, and explore the diversity of…

    Submitted 31 August, 2020; originally announced September 2020.

    Comments: 2020 IEEE Visualization

  17. arXiv:2006.06105 [pdf, other]

    cs.DL cs.CL cs.HC

    PeopleMap: Visualization Tool for Mapping Out Researchers using Natural Language Processing

    Authors: Jon Saad-Falcon, Omar Shaikh, Zijie J. Wang, Austin P. Wright, Sasha Richardson, Duen Horng Chau

    Abstract: Discovering research expertise at institutions can be a difficult task. Manually curated university directories easily become out of date and they often lack the information necessary for understanding a researcher's interests and past work, making it harder to explore the diversity of research at an institution and identify research talents. This results in lost opportunities for both internal an…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: 7 pages, 3 figures, submission to the 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19-23, 2020, Galway, Ireland