Showing 1–21 of 21 results for author: Sanders, K

  1. arXiv:2504.03640

    cs.CL cs.AI cs.CV

    Bonsai: Interpretable Tree-Adaptive Grounded Reasoning

    Authors: Kate Sanders, Benjamin Van Durme

    Abstract: To develop general-purpose collaborative agents, humans need reliable AI systems that can (1) adapt to new domains and (2) transparently reason with uncertainty to allow for verification and correction. Black-box models demonstrate powerful data processing abilities but do not satisfy these criteria due to their opaqueness, domain specificity, and lack of uncertainty awareness. We introduce Bonsai…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 9 pages, preprint

    MSC Class: 68T50; 68T37 ACM Class: I.2.7

  2. arXiv:2504.00939

    cs.CV cs.CL

    WikiVideo: Article Generation from Multiple Videos

    Authors: Alexander Martin, Reno Kriz, William Gantt Walden, Kate Sanders, Hannah Recknor, Eugene Yang, Francis Ferraro, Benjamin Van Durme

    Abstract: We present the challenging task of automatically creating a high-level Wikipedia-style article that aggregates information from multiple diverse videos about real-world events, such as natural disasters or political elections. Videos are intuitive sources for retrieval-augmented generation (RAG), but most contemporary RAG workflows focus heavily on text and existing methods for video-based summari…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Repo can be found here: https://github.com/alexmartin1722/wikivideo

  3. arXiv:2503.21717

    cs.CL

    CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?

    Authors: Jiefu Ou, William Gantt Walden, Kate Sanders, Zhengping Jiang, Kaiser Sun, Jeffrey Cheng, William Jurayj, Miriam Wanner, Shaobo Liang, Candice Morgan, Seunghoon Han, Weiqi Wang, Chandler May, Hannah Recknor, Daniel Khashabi, Benjamin Van Durme

    Abstract: A core part of scientific peer review involves providing expert critiques that directly assess the scientific claims a paper makes. While it is now possible to automatically generate plausible (if generic) reviews, ensuring that these reviews are sound and grounded in the papers' claims remains challenging. To facilitate LLM benchmarking on these challenges, we introduce CLAIMCHECK, an annotated d…

    Submitted 27 March, 2025; originally announced March 2025.

  4. arXiv:2503.20698

    cs.CV cs.IR

    MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion

    Authors: Saron Samuel, Dan DeGenaro, Jimena Guallar-Blasco, Kate Sanders, Oluwaseun Eisape, Tanner Spendlove, Arun Reddy, Alexander Martin, Andrew Yates, Eugene Yang, Cameron Carpenter, David Etter, Efsun Kayi, Matthew Wiesner, Kenton Murray, Reno Kriz

    Abstract: Videos inherently contain multiple modalities, including visual events, text overlays, sounds, and speech, all of which are important for retrieval. However, state-of-the-art multimodal language models like VAST and LanguageBind are built on vision-language models (VLMs), and thus overly prioritize visual signals. Retrieval benchmarks further reinforce this bias by focusing on visual queries and n…

    Submitted 9 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  5. arXiv:2503.19009

    cs.CV cs.IR

    Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

    Authors: Arun Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa

    Abstract: In this work, we tackle the problem of text-to-video retrieval (T2VR). Inspired by the success of late interaction techniques in text-document, text-image, and text-video retrieval, our approach, Video-ColBERT, introduces a simple and efficient mechanism for fine-grained similarity assessment between queries and videos. Video-ColBERT is built upon 3 main components: a fine-grained spatial and temp…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025. 13 pages, 4 figures. Approved for public release: distribution unlimited

  6. arXiv:2501.02825

    cs.LG

    Randomly Sampled Language Reasoning Problems Explain Limits of LLMs

    Authors: Kavi Gupta, Kate Sanders, Armando Solar-Lezama

    Abstract: While LLMs have revolutionized the field of machine learning due to their high performance across a range of tasks, they are known to perform poorly in planning, hallucinate false answers, have degraded performance on less canonical versions of the same task, and answer incorrectly on a variety of specific prompts. There are several emerging theories of LLM performance with some predictive power,…

    Submitted 26 May, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: 10 pages, 4 figures, 2 tables

  7. arXiv:2410.11619

    cs.CV cs.CL

    MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval

    Authors: Reno Kriz, Kate Sanders, David Etter, Kenton Murray, Cameron Carpenter, Kelly Van Ochten, Hannah Recknor, Jimena Guallar-Blasco, Alexander Martin, Ronald Colaianni, Nolan King, Eugene Yang, Benjamin Van Durme

    Abstract: Efficiently retrieving and synthesizing information from large-scale multimodal collections has become a critical challenge. However, existing video retrieval datasets suffer from scope limitations, primarily focusing on matching descriptive but vague queries with small collections of professionally edited, English-centric videos. To address this gap, we introduce MultiVENT 2.0, a large…

    Submitted 10 February, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

  8. arXiv:2410.05267

    cs.CL cs.CV

    Grounding Partially-Defined Events in Multimodal Data

    Authors: Kate Sanders, Reno Kriz, David Etter, Hannah Recknor, Alexander Martin, Cameron Carpenter, Jingyang Lin, Benjamin Van Durme

    Abstract: How are we able to learn about complex current events just from short snippets of video? While natural language enables straightforward ways to represent under-specified, partially observable events, visual data does not facilitate analogous methods and, consequently, introduces unique challenges in event understanding. With the growing prevalence of vision-capable AI agents, these systems must be…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Preprint; 9 pages; 2024 EMNLP Findings

  9. arXiv:2407.03572

    cs.CL

    Core: Robust Factual Precision with Informative Sub-Claim Identification

    Authors: Zhengping Jiang, Jingyu Zhang, Nathaniel Weir, Seth Ebner, Miriam Wanner, Kate Sanders, Daniel Khashabi, Anqi Liu, Benjamin Van Durme

    Abstract: Hallucinations pose a challenge to the application of large language models (LLMs), thereby motivating the development of metrics to evaluate factual precision. We observe that popular metrics using the Decompose-Then-Verify framework, such as FActScore, can be manipulated by adding obvious or repetitive subclaims to artificially inflate scores. This observation motivates our new customizable plug…

    Submitted 15 October, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2406.09646

    cs.CV cs.AI

    A Survey of Video Datasets for Grounded Event Understanding

    Authors: Kate Sanders, Benjamin Van Durme

    Abstract: While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, vi…

    Submitted 13 June, 2024; originally announced June 2024.

  11. On the Evaluation of Machine-Generated Reports

    Authors: James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler

    Abstract: Large Language Models (LLMs) have enabled new ways to satisfy information needs. Although great strides have been made in applying them to settings like document ranking and short-form text generation, they still struggle to compose complete, accurate, and verifiable long-form reports. Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of…

    Submitted 9 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, accepted at SIGIR 2024 as perspective paper

  12. arXiv:2403.11905

    cs.AI cs.CL cs.CV cs.HC

    Tur[k]ingBench: A Challenge Benchmark for Web Agents

    Authors: Kevin Xu, Yeganeh Kordi, Tanay Nayak, Adi Asija, Yizhong Wang, Kate Sanders, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi

    Abstract: Can advanced multi-modal models effectively tackle complex web-based tasks? Such tasks are often found on crowdsourcing platforms, where crowdworkers engage in challenging micro-tasks within web-based environments. Building on this idea, we present TurkingBench, a benchmark consisting of tasks presented as web pages with textual instructions and multi-modal contexts. Unlike previous approaches t…

    Submitted 21 February, 2025; v1 submitted 18 March, 2024; originally announced March 2024.

  13. arXiv:2402.19467

    cs.CL cs.AI cs.CV

    TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning

    Authors: Kate Sanders, Nathaniel Weir, Benjamin Van Durme

    Abstract: It is challenging for models to understand complex, multimodal content such as television clips, and this is in part because video-language models often rely on single-modality reasoning and lack interpretability. To combat these issues we propose TV-TREES, the first multimodal entailment tree generator. TV-TREES serves as an approach to video understanding that promotes interpretable joint-modali…

    Submitted 10 October, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 9 pages, EMNLP 2024

    ACM Class: I.2.7; I.2.10

  14. arXiv:2402.14798

    cs.CL cs.AI

    Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

    Authors: Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme

    Abstract: Recent language models enable new opportunities for structured reasoning with text, such as the construction of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy datasets and limited…

    Submitted 12 August, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  15. arXiv:2307.03153

    cs.IR cs.CV cs.MM

    MultiVENT: Multilingual Videos of Events with Aligned Natural Text

    Authors: Kate Sanders, David Etter, Reno Kriz, Benjamin Van Durme

    Abstract: Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-s…

    Submitted 6 July, 2023; originally announced July 2023.

  16. arXiv:2210.03102

    cs.CV cs.AI

    Ambiguous Images With Human Judgments for Robust Visual Event Classification

    Authors: Kate Sanders, Reno Kriz, Anqi Liu, Benjamin Van Durme

    Abstract: Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambigu…

    Submitted 22 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 10 pages, NeurIPS 2022 Datasets and Benchmarks Track

    ACM Class: I.2.10; I.4.8; I.2.0

  17. arXiv:2105.02345

    cs.RO

    A Multi-Chamber Smart Suction Cup for Adaptive Gripping and Haptic Exploration

    Authors: Tae Myung Huh, Kate Sanders, Michael Danielczuk, Monica Li, Yunliang Chen, Ken Goldberg, Hannah S. Stuart

    Abstract: We present a novel robot end-effector for gripping and haptic exploration. Tactile sensing through suction flow monitoring is applied to a new suction cup design that contains multiple chambers for air flow. Each chamber connects with its own remote pressure transducer, which enables both absolute and differential pressure measures between chambers. By changing the overall vacuum applied to this s…

    Submitted 18 October, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  18. RV-GAN: Segmenting Retinal Vascular Structure in Fundus Photographs using a Novel Multi-scale Generative Adversarial Network

    Authors: Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod, Kenton M. Sanders, Salah A. Baker

    Abstract: High fidelity segmentation of both macro and microvascular structure of the retina plays a pivotal role in determining degenerative retinal diseases, yet it is a difficult problem. Due to successive resolution loss in the encoding phase combined with the inability to recover this lost information in the decoding phase, autoencoding based segmentation approaches are limited in their ability to extr…

    Submitted 14 May, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to MICCAI2021

  19. arXiv:2011.11696

    cs.RO

    Mechanical Search on Shelves using Lateral Access X-RAY

    Authors: Huang Huang, Marcus Dominguez-Kuhne, Jeffrey Ichnowski, Vishal Satish, Michael Danielczuk, Kate Sanders, Andrew Lee, Anelia Angelova, Vincent Vanhoucke, Ken Goldberg

    Abstract: Efficiently finding an occluded object with lateral access arises in many contexts such as warehouses, retail, healthcare, shipping, and homes. We introduce LAX-RAY (Lateral Access maXimal Reduction of occupancY support Area), a system to automate the mechanical search for occluded objects on shelves. For such lateral access environments, LAX-RAY couples a perception pipeline predicting a target o…

    Submitted 23 November, 2020; originally announced November 2020.

    Comments: Huang Huang and Marcus Dominguez-Kuhne contributed equally

  20. arXiv:2007.10420

    cs.RO cs.AI

    Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking

    Authors: Kate Sanders, Michael Danielczuk, Jeffrey Mahler, Ajay Tanwani, Ken Goldberg

    Abstract: A new generation of automated bin picking systems using deep learning is evolving to support increasing demand for e-commerce. To accommodate a wide variety of products, many automated systems include multiple gripper types and/or tool changers. However, for some objects, sequential grasp failures are common: when a computed grasp fails to lift and remove the object, the bin is often left unchange…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: 2020 IEEE International Conference on Automation Science and Engineering (CASE)

    ACM Class: I.2.9

  21. Fundus2Angio: A Conditional GAN Architecture for Generating Fluorescein Angiography Images from Retinal Fundus Photography

    Authors: Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod, Salah A. Baker, Kenton M. Sanders

    Abstract: Carrying out clinical diagnosis of retinal vascular degeneration using Fluorescein Angiography (FA) is a time consuming process and can pose significant adverse effects on the patient. Angiography requires insertion of a dye that may cause severe adverse effects and can even be fatal. Currently, there are no non-invasive systems capable of generating Fluorescein Angiography images. However, retina…

    Submitted 29 September, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: 14 pages, Accepted to 15th International Symposium on Visual Computing 2020