Showing 1–10 of 10 results for author: Haas, L

Searching in archive cs.
  1. arXiv:2509.07968  [pdf, ps, other]

    cs.CL

    SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge

    Authors: Lukas Haas, Gal Yona, Giovanni D'Antonio, Sasha Goldshtein, Dipanjan Das

    Abstract: We introduce SimpleQA Verified, a 1,000-prompt benchmark for evaluating Large Language Model (LLM) short-form factuality based on OpenAI's SimpleQA. It addresses critical limitations in OpenAI's benchmark, including noisy and incorrect labels, topical biases, and question redundancy. SimpleQA Verified was created through a rigorous multi-stage filtering process involving de-duplication, topic bala…

    Submitted 9 September, 2025; originally announced September 2025.

  2. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3284 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 22 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  3. arXiv:2505.12220  [pdf, ps, other]

    cs.LG

    Machine Learning Applications Related to Suicide in Military and Veterans: A Scoping Literature Review

    Authors: Yuhan Zhang, Yishu Wei, Yanshan Wang, Yunyu Xiao, COL, Ronald K. Poropatich, Gretchen L. Haas, Yiye Zhang, Chunhua Weng, Jinze Liu, Lisa A. Brenner, James M. Bjork, Yifan Peng

    Abstract: Suicide remains one of the main preventable causes of death among active service members and veterans. Early detection and prediction are crucial in suicide prevention. Machine learning techniques have yielded promising results in this area recently. This study aims to assess and summarize current research and provides a comprehensive review regarding the application of machine learning techniques…

    Submitted 17 May, 2025; originally announced May 2025.

  4. arXiv:2501.03200  [pdf, other]

    cs.CL

    The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input

    Authors: Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, Kate Olszewska, Lukas Haas, Michelle Liu, Nate Keating, Adam Bloniarz, Carl Saroufim, Corey Fry, Dror Marcus, Doron Kukliansky, Gaurav Singh Tomar, James Swirhun, Jinwei Xing, Lily Wang, Madhu Gurumurthy, Michael Aaron, Moran Ambar, Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, et al. (1 additional author not shown)

    Abstract: We introduce FACTS Grounding, an online leaderboard and associated benchmark that evaluates language models' ability to generate text that is factually accurate with respect to given context in the user prompt. In our benchmark, each prompt includes a user request and a full document, with a maximum length of 32k tokens, requiring long-form responses. The long-form responses are required to be ful…

    Submitted 6 January, 2025; originally announced January 2025.

  5. arXiv:2307.05845  [pdf, other]

    cs.CV cs.LG

    PIGEON: Predicting Image Geolocations

    Authors: Lukas Haas, Michal Skreta, Silas Alberti, Chelsea Finn

    Abstract: Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world. Although approaches based on vision transformers have made significant progress in geolocalization accuracy, success in prior literature is constrained to narrow distributions of images of landmarks, and performance has not generalized to unseen places. We present…

    Submitted 28 May, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted at CVPR 2024

  6. arXiv:2302.00275  [pdf, other]

    cs.CV cs.LG

    Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

    Authors: Lukas Haas, Silas Alberti, Michal Skreta

    Abstract: Image geolocalization is the challenging task of predicting the geographic coordinates of origin for a given photo. It is an unsolved problem relying on the ability to combine visual clues with general knowledge about the world to make accurate predictions across geographies. We present StreetCLIP (https://huggingface.co/geolocal/StreetCLIP), a robust, publicly available foundation…

    Submitted 1 February, 2023; originally announced February 2023.

  7. arXiv:2206.02270  [pdf, other]

    cs.CV cs.AI

    Estimating building energy efficiency from street view imagery, aerial imagery, and land surface temperature data

    Authors: Kevin Mayer, Lukas Haas, Tianyuan Huang, Juan Bernabé-Moreno, Ram Rajagopal, Martin Fischer

    Abstract: Current methods to determine the energy efficiency of buildings require on-site visits of certified energy auditors which makes the process slow, costly, and geographically incomplete. To accelerate the identification of promising retrofit targets on a large scale, we propose to estimate building energy efficiency from widely available and remotely sensed data sources only, namely street view, aer…

    Submitted 24 August, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

  8. arXiv:2104.02363  [pdf, other]

    math.AG cs.CC

    Young Flattenings in the Schur module basis

    Authors: Lennart J. Haas, Christian Ikenmeyer

    Abstract: There are several isomorphic constructions for the irreducible polynomial representations of the general linear group in characteristic zero. The two most well-known versions are called Schur modules and Weyl modules. Steven Sam used a Weyl module implementation in 2009 for his Macaulay2 package PieriMaps. This implementation can be used to compute so-called Young flattenings of polynomials. Over…

    Submitted 6 April, 2021; originally announced April 2021.

    MSC Class: 05E10; 68Q17 ACM Class: I.1.2

  9. Canonical Representations of k-Safety Hyperproperties

    Authors: Bernd Finkbeiner, Lennart Haas, Hazem Torfah

    Abstract: Hyperproperties elevate the traditional view of trace properties from sets of traces to sets of sets of traces and provide a formalism for expressing information-flow policies. For trace properties, algorithms for verification, monitoring, and synthesis are typically based on a representation of the properties as omega-automata. For hyperproperties, a similar, canonical automata-theoretic represen…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: Published in: 2019 IEEE 32nd Computer Security Foundations Symposium (CSF)

  10. arXiv:cs/0310006  [pdf]

    cs.DB

    The Lowell Database Research Self Assessment

    Authors: Serge Abiteboul, Rakesh Agrawal, Phil Bernstein, Mike Carey, Stefano Ceri, Bruce Croft, David DeWitt, Mike Franklin, Hector Garcia Molina, Dieter Gawlick, Jim Gray, Laura Haas, Alon Halevy, Joe Hellerstein, Yannis Ioannidis, Martin Kersten, Michael Pazzani, Mike Lesk, David Maier, Jeff Naughton, Hans Schek, Timos Sellis, Avi Silberschatz, Mike Stonebraker, Rick Snodgrass, et al. (4 additional authors not shown)

    Abstract: A group of senior database researchers gathers every few years to assess the state of database research and to point out problem areas that deserve additional focus. This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of most complex software systems. It…

    Submitted 6 October, 2003; originally announced October 2003.

    Comments: Details of this workshop (presentations and notes) are at http://research.microsoft.com/~gray/lowell/

    ACM Class: H; H.2; H.3; H.4; H.5