Showing 1–50 of 2,233 results for author: Nathan

Searching in archive cs.
  1. arXiv:2507.02833  [pdf, ps, other]

    cs.CL

    Generalizing Verifiable Instruction Following

    Authors: Valentina Pyatkin, Saumya Malik, Victoria Graf, Hamish Ivison, Shengyi Huang, Pradeep Dasigi, Nathan Lambert, Hannaneh Hajishirzi

    Abstract: A crucial factor for successful human and AI interaction is the ability of language models or chatbots to follow human instructions precisely. A common feature of instructions are output constraints like "only answer with yes or no" or "mention the word 'abrakadabra' at least 3 times" that the user adds to craft a more useful answer. Even today's strongest models struggle with fulfilling such co…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 11 pages

  2. arXiv:2507.02207  [pdf]

    physics.soc-ph cs.HC physics.ed-ph physics.plasm-ph

    Public perspectives on the design of fusion energy facilities

    Authors: Nathan Kawamoto, Daniel Hoover, Jonathan Xie, Jacob Walters, Katie Snyder, Aditi Verma

    Abstract: As fusion energy technologies approach demonstration and commercial deployment, understanding public perspectives on future fusion facilities will be critical for achieving social license, especially because fusion energy facilities, unlike large fission reactors, may be sited in closer proximity to people and communities, due to distinct regulatory frameworks. In a departure from the 'decide-anno…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 33 pages

  3. arXiv:2507.00909  [pdf, ps, other]

    cs.DC cs.AI cs.PF eess.SY

    Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona

    Authors: Philip Colangelo, Ayse K. Coskun, Jack Megrue, Ciaran Roberts, Shayan Sengupta, Varun Sivaram, Ethan Tiao, Aroon Vijaykar, Chris Williams, Daniel C. Wilson, Zack MacFarland, Daniel Dreiling, Nathan Morey, Anuja Ratnayake, Baskar Vairamohan

    Abstract: Artificial intelligence (AI) is fueling exponential electricity demand growth, threatening grid reliability, raising prices for communities paying for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach--Emer…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 10 pages, 6 figures, 1 table

  4. arXiv:2507.00419  [pdf, ps, other]

    physics.geo-ph cs.AI

    Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding

    Authors: Yimin Dou, Xinming Wu, Nathan L Bangs, Harpreet Singh Sethi, Jintao Li, Hang Gao, Zhixiang Guo

    Abstract: Understanding Earth's subsurface is critical for energy transition, natural hazard mitigation, and planetary science. Yet subsurface analysis remains fragmented, with separate models required for structural interpretation, stratigraphic analysis, geobody segmentation, and property modeling -- each tightly coupled to specific data distributions and task formulations. We introduce the Geological Everyt…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2507.00011  [pdf, ps, other]

    cs.LG cs.AI

    Novel RL approach for efficient Elevator Group Control Systems

    Authors: Nathan Vaartjes, Vincent Francois-Lavet

    Abstract: Efficient elevator traffic management in large buildings is critical for minimizing passenger travel times and energy consumption. Because heuristic- or pattern-detection-based controllers struggle with the stochastic and combinatorial nature of dispatching, we model the six-elevator, fifteen-floor system at Vrije Universiteit Amsterdam as a Markov Decision Process and train an end-to-end Reinforc…

    Submitted 12 June, 2025; originally announced July 2025.

    Comments: 15 pages, 12 figures

  6. arXiv:2506.22653  [pdf, ps, other]

    cs.AI

    URSA: The Universal Research and Scientific Agent

    Authors: Michael Grosskopf, Russell Bent, Rahul Somasundaram, Isaac Michaud, Arthur Lui, Nathan Debardeleben, Earl Lawrence

    Abstract: Large language models (LLMs) have moved far beyond their initial form as simple chatbots, now carrying out complex reasoning, planning, writing, coding, and research tasks. These skills overlap significantly with those that human scientists use day-to-day to solve complex problems that drive the cutting edge of research. Using LLMs in "agentic" AI has the potential to revolutionize modern science…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 31 pages, 9 figures

  7. arXiv:2506.22645  [pdf, ps, other]

    cs.LG stat.ML

    Cost-effective Reduced-Order Modeling via Bayesian Active Learning

    Authors: Amir Hossein Rahmati, Nathan M. Urban, Byung-Jun Yoon, Xiaoning Qian

    Abstract: Machine Learning surrogates have been developed to accelerate solving systems dynamics of complex processes in different science and engineering applications. To faithfully capture governing systems dynamics, these methods rely on large training datasets, hence restricting their applicability in real-world problems. In this work, we propose BayPOD-AL, an active learning framework based on an uncer…

    Submitted 27 June, 2025; originally announced June 2025.

  8. arXiv:2506.22423  [pdf, ps, other]

    cs.LG cs.CR cs.RO

    ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks

    Authors: Pritam Dash, Ethan Chan, Nathan P. Lawrence, Karthik Pattabiraman

    Abstract: Unmanned Aerial Vehicles (UAVs) depend on onboard sensors for perception, navigation, and control. However, these sensors are susceptible to physical attacks, such as GPS spoofing, that can corrupt state estimates and lead to unsafe behavior. While reinforcement learning (RL) offers adaptive control capabilities, existing safe RL methods are ineffective against such attacks. We present ARMOR (Adap…

    Submitted 27 June, 2025; originally announced June 2025.

  9. arXiv:2506.21794  [pdf]

    cs.CY

    Shifting Narratives: A Longitudinal Analysis of Media Trends and Public Attitudes on Homelessness

    Authors: Akshay Irudayaraj, Nathan Ye, Yash Chainani

    Abstract: Within the field of media framing, homelessness has been a historically under-researched topic. Framing theory states that the media's method of presenting information plays a pivotal role in controlling public sentiment toward a topic. The sentiment held towards homeless individuals influences their ability to access jobs, housing, and resources as a result of discrimination. This study analyzes…

    Submitted 29 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: 21 pages, 7 figures, 12 tables

  10. arXiv:2506.21689  [pdf, ps, other]

    cs.RO

    Optimal Motion Scaling for Delayed Telesurgery

    Authors: Jason Lim, Florian Richter, Zih-Yun Chiu, Jaeyon Lee, Ethan Quist, Nathan Fisher, Jonathan Chambers, Steven Hong, Michael C. Yip

    Abstract: Robotic teleoperation over long communication distances poses challenges due to delays in commands and feedback from network latency. One simple yet effective strategy to reduce errors and increase performance under delay is to downscale the relative motion between the operating surgeon and the robot. The question remains as to what is the optimal scaling factor, and how this value changes dependi…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to IROS 2025

  11. arXiv:2506.21476  [pdf, ps, other]

    cs.CV

    Global and Local Entailment Learning for Natural World Imagery

    Authors: Srikumar Sastry, Aayush Dhakal, Eric Xing, Subash Khanal, Nathan Jacobs

    Abstract: Learning the hierarchical structure of data in vision-language models is a significant challenge. Previous works have attempted to address this challenge by employing entailment learning. However, these approaches fail to model the transitive nature of entailment explicitly, which establishes the relationship between order and semantics within a representation space. In this work, we introduce Rad…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted at ICCV 2025

  12. arXiv:2506.20990  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes

    Authors: Yifan Yang, Zhen Zhang, Rupak Vignesh Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

    Abstract: Fine-tuning vision language models (VLMs) has achieved remarkable performance across various downstream tasks; yet, it requires access to model gradients through backpropagation (BP), making them unsuitable for memory-constrained, inference-only edge devices. To address this limitation, previous work has explored various BP-free fine-tuning methods. However, these approaches often rely on high-var…

    Submitted 26 June, 2025; originally announced June 2025.

  13. arXiv:2506.20331  [pdf, ps, other]

    cs.CL cs.LG

    Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content

    Authors: Rian Touchent, Nathan Godey, Eric de la Clergerie

    Abstract: We introduce Biomed-Enriched, a biomedical text dataset constructed from PubMed via a two-stage annotation process. In the first stage, a large language model annotates 400K paragraphs from PubMed scientific articles, assigning scores for their type (review, study, clinical case, other), domain (clinical, biomedical, other), and educational quality. The educational quality score (rated 1 to 5) est…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Dataset link: https://hf.co/datasets/almanach/Biomed-Enriched

  14. arXiv:2506.20025  [pdf, ps, other]

    cs.LG stat.ML

    Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining

    Authors: Nathan Stromberg, Christos Thrampoulidis, Lalitha Sankar

    Abstract: While machine learning models become more capable in discriminative tasks at scale, their ability to overcome biases introduced by training data has come under increasing scrutiny. Previous results suggest that there are two extremes of parameterization with very different behaviors: the population (underparameterized) setting where loss weighting is optimal and the separable overparameterized set…

    Submitted 24 June, 2025; originally announced June 2025.

  15. arXiv:2506.19863  [pdf, ps, other]

    physics.comp-ph cs.AI

    Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research

    Authors: Ahmed Almeldein, Mohammed Alnaggar, Rick Archibald, Tom Beck, Arpan Biswas, Rike Bostelmann, Wes Brewer, Chris Bryan, Christopher Calle, Cihangir Celik, Rajni Chahal, Jong Youl Choi, Arindam Chowdhury, Mark Cianciosa, Franklin Curtis, Gregory Davidson, Sebastian De Pascuale, Lisa Fassino, Ana Gainaru, Yashika Ghai, Luke Gibson, Qian Gong, Christopher Greulich, Scott Greenwood, Cory Hauck, et al. (25 additional authors not shown)

    Abstract: The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to au…

    Submitted 26 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  16. arXiv:2506.19703  [pdf, ps, other]

    cs.LG

    Learning-aided Bigraph Matching Approach to Multi-Crew Restoration of Damaged Power Networks Coupled with Road Transportation Networks

    Authors: Nathan Maurer, Harshal Kaushik, Roshni Anna Jacob, Jie Zhang, Souma Chowdhury

    Abstract: The resilience of critical infrastructure networks (CINs) after disruptions, such as those caused by natural hazards, depends on both the speed of restoration and the extent to which operational functionality can be regained. Allocating resources for restoration is a combinatorial optimal planning problem that involves determining which crews will repair specific network nodes and in what order. T…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: IDETC 2025

  17. arXiv:2506.18040  [pdf, ps, other]

    cs.RO

    StereoTacTip: Vision-based Tactile Sensing with Biomimetic Skin-Marker Arrangements

    Authors: Chenghua Lu, Kailuan Tang, Xueming Hui, Haoran Li, Saekwang Nam, Nathan F. Lepora

    Abstract: Vision-Based Tactile Sensors (VBTSs) stand out for their superior performance due to their high-information content output. Recently, marker-based VBTSs have been shown to give accurate geometry reconstruction when using stereo cameras. However, many marker-based VBTSs use complex biomimetic skin-marker arrangements, which presents issues for the geometric reconstruction of the skin surface f…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 11 pages, 13 figures

  18. arXiv:2506.15933  [pdf, ps, other]

    cs.LG

    CORAL: Disentangling Latent Representations in Long-Tailed Diffusion

    Authors: Esther Rodriguez, Monica Welfert, Samuel McDowell, Nathan Stromberg, Julian Antolin Camarena, Lalitha Sankar

    Abstract: Diffusion models have achieved impressive performance in generating high-quality and diverse synthetic data. However, their success typically assumes a class-balanced training distribution. In real-world settings, multi-class data often follow a long-tailed distribution, where standard diffusion models struggle -- producing low-diversity and lower-quality samples for tail classes. While this degra…

    Submitted 18 June, 2025; originally announced June 2025.

  19. arXiv:2506.15881  [pdf, ps, other]

    cs.LG

    T-SHRED: Symbolic Regression for Regularization and Model Discovery with Transformer Shallow Recurrent Decoders

    Authors: Alexey Yermakov, David Zoro, Mars Liyao Gao, J. Nathan Kutz

    Abstract: SHallow REcurrent Decoders (SHRED) are effective for system identification and forecasting from sparse sensor measurements. Such models are light-weight and computationally efficient, allowing them to be trained on consumer laptops. SHRED-based models rely on Recurrent Neural Networks (RNNs) and a simple Multi-Layer Perceptron (MLP) for the temporal encoding and spatial decoding respectively. Desp…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 16 pages, 5 figures, submitted to Transactions of the Royal Society (Symbolic Regression in the Physical Sciences)

  20. arXiv:2506.15844  [pdf, ps, other]

    cs.DS

    HybHuff: Lossless Compression for Hypergraphs via Entropy-Guided Huffman-Bitwise Coordination

    Authors: Tianyu Zhao, Dongfang Zhao, Luanzheng Guo, Nathan Tallent

    Abstract: Hypergraphs provide a natural representation for many-to-many relationships in data-intensive applications, yet their scalability is often hindered by high memory consumption. While prior work has improved computational efficiency, reducing the space overhead of hypergraph representations remains a major challenge. This paper presents a hybrid compression framework for integer-based hypergraph adj…

    Submitted 18 June, 2025; originally announced June 2025.

  21. arXiv:2506.15400  [pdf, ps, other]

    cond-mat.dis-nn cs.IT math.PR

    The maximum-average subtensor problem: equilibrium and out-of-equilibrium properties

    Authors: Vittorio Erba, Nathan Malo Kupferschmid, Rodrigo Pérez Ortiz, Lenka Zdeborová

    Abstract: In this paper we introduce and study the Maximum-Average Subtensor ($p$-MAS) problem, in which one wants to find a subtensor of size $k$ of a given random tensor of size $N$, both of order $p$, with maximum sum of entries. We are motivated by recent work on the matrix case of the problem in which several equilibrium and non-equilibrium properties have been characterized analytically in the asympto…

    Submitted 20 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

  22. arXiv:2506.15020  [pdf, ps, other]

    math.AT cs.LG math.CO math.ST

    Data analysis using discrete cubical homology

    Authors: Chris Kapulkin, Nathan Kershaw

    Abstract: We present a new tool for data analysis: persistence discrete homology, which is well-suited to analyze filtrations of graphs. In particular, we provide a novel way of representing high-dimensional data as a filtration of graphs using pairwise correlations. We discuss several applications of these tools, e.g., in weather and financial data, comparing them to the standard methods used in the respec…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 17 pages; comments welcome

    MSC Class: 62R40; 68T09; 05C90; 55U05

  23. arXiv:2506.13307  [pdf, ps, other]

    cs.CV cs.AI

    Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Image Concepts

    Authors: Solène Debuysère, Nicolas Trouvé, Nathan Letheule, Olivier Lévêque, Elise Colin

    Abstract: This work investigates the adaptation of large pre-trained latent diffusion models to a radically new imaging domain: Synthetic Aperture Radar (SAR). While these generative models, originally trained on natural images, demonstrate impressive capabilities in text-to-image synthesis, they are not natively adapted to represent SAR data, which involves different physics, statistical distributions, and…

    Submitted 16 June, 2025; originally announced June 2025.

  24. arXiv:2506.12202  [pdf, ps, other]

    cs.PL cs.AI cs.CR cs.LG

    A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions

    Authors: Stephen Mell, Botong Zhang, David Mell, Shuo Li, Ramya Ramalingam, Nathan Yu, Steve Zdancewic, Osbert Bastani

    Abstract: Modern large language models (LLMs) are often deployed as agents, calling external tools adaptively to solve tasks. Rather than directly calling tools, it can be more effective for LLMs to write code to perform the tool calls, enabling them to automatically generate complex control flow such as conditionals and loops. Such code actions are typically provided as Python code, since LLMs are quite pr…

    Submitted 13 June, 2025; originally announced June 2025.

  25. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  26. arXiv:2506.10947  [pdf, ps, other]

    cs.AI cs.LG

    Spurious Rewards: Rethinking Training Signals in RLVR

    Authors: Rulin Shao, Shuyue Stella Li, Rui Xin, Scott Geng, Yiping Wang, Sewoong Oh, Simon Shaolei Du, Nathan Lambert, Sewon Min, Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer

    Abstract: We show that reinforcement learning with verifiable rewards (RLVR) can elicit strong mathematical reasoning in certain models even with spurious rewards that have little, no, or even negative correlation with the correct answer. For example, RLVR improves MATH-500 performance for Qwen2.5-Math-7B in absolute points by 21.4% (random reward), 13.8% (format reward), 24.1% (incorrect label), 26.0% (1-s…

    Submitted 12 June, 2025; originally announced June 2025.

  27. arXiv:2506.09887  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Learning single-index models via harmonic decomposition

    Authors: Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz, Nathan Srebro

    Abstract: We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projection $\langle \boldsymbol{w}_*,\boldsymbol{x}\rangle$. Prior work has shown that under Gaussian inputs, the statistical and computational complexity of recovering $\boldsymbol{w}_*$ is governed by the Hermite e…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 80 pages

  28. arXiv:2506.09275  [pdf, ps, other]

    cs.DC

    A Survey of End-to-End Modeling for Distributed DNN Training: Workloads, Simulators, and TCO

    Authors: Jonas Svedas, Hannah Watson, Nathan Laubeuf, Diksha Moolchandani, Abubakr Nada, Arjun Singh, Dwaipayan Biswas, James Myers, Debjyoti Bhattacharjee

    Abstract: Distributed deep neural networks (DNNs) have become a cornerstone for scaling machine learning to meet the demands of increasingly complex applications. However, the rapid growth in model complexity far outpaces CMOS technology scaling, making sustainable and efficient system design a critical challenge. Addressing this requires coordinated co-design across software, hardware, and technology layer…

    Submitted 10 June, 2025; originally announced June 2025.

  29. arXiv:2506.08138  [pdf, ps, other]

    cs.NE q-bio.NC

    A Practical Guide to Tuning Spiking Neuronal Dynamics

    Authors: William Gebhardt, Alexander G. Ororbia, Nathan McDonald, Clare Thiem, Jack Lombardi

    Abstract: In this work, we examine fundamental elements of spiking neural networks (SNNs) as well as how to tune them. Concretely, we focus on two different foundational neuronal units utilized in SNNs -- the leaky integrate-and-fire (LIF) and the resonate-and-fire (RAF) neuron. We explore key equations and how hyperparameter values affect behavior. Beyond hyperparameters, we discuss other important design…

    Submitted 9 June, 2025; originally announced June 2025.

  30. arXiv:2506.07454  [pdf, ps, other]

    cs.RO cs.AI

    Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs

    Authors: Jared Strader, Aaron Ray, Jacob Arkin, Mason B. Peterson, Yun Chang, Nathan Hughes, Christopher Bradley, Yi Xuan Jia, Carlos Nieto-Granda, Rajat Talak, Chuchu Fan, Luca Carlone, Jonathan P. How, Nicholas Roy

    Abstract: In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, vi…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 12 pages, 4 figures

  31. arXiv:2506.06518  [pdf, ps, other]

    cs.CR cs.LG

    A Systematic Review of Poisoning Attacks Against Large Language Models

    Authors: Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan Drenkow

    Abstract: With the widespread availability of pretrained Large Language Models (LLMs) and their training datasets, concerns about the security risks associated with their usage have increased significantly. One of these security risks is the threat of LLM poisoning attacks where an attacker modifies some part of the LLM training process to cause the LLM to behave in a malicious way. As an emerging area of re…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 28 Pages including number

  32. arXiv:2506.05391  [pdf, ps, other]

    eess.IV cs.CV cs.LG stat.AP

    Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction

    Authors: Ambrose Emmett-Iwaniw, Nathan Kirk

    Abstract: Autoregressive models are often employed to learn distributions of image data by decomposing the $D$-dimensional density function into a product of one-dimensional conditional distributions. Each conditional depends on preceding variables (pixels, in the case of image data), making the order in which variables are processed fundamental to the model performance. In this paper, we study the problem…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in conference proceedings, MCQMC 2024

  33. arXiv:2506.05256  [pdf, ps, other]

    cs.AI cs.LG

    Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

    Authors: Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber

    Abstract: Large reasoning models (LRMs) achieve higher performance on challenging reasoning tasks by generating more tokens at inference time, but this verbosity often wastes computation on easy problems. Existing solutions, including supervised finetuning on shorter traces, user-controlled budgets, or RL with uniform penalties, either require data curation, manual configuration, or treat all problems alike…

    Submitted 5 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  34. arXiv:2506.05213  [pdf, other]

    cs.AI cs.CL

    LLM-First Search: Self-Guided Exploration of the Solution Space

    Authors: Nathan Herr, Tim Rocktäschel, Roberta Raileanu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable improvements in reasoning and planning through increased test-time compute, often by framing problem-solving as a search process. While methods like Monte Carlo Tree Search (MCTS) have proven effective in some domains, their reliance on fixed exploration hyperparameters limits their adaptability across tasks of varying difficulty, rendering…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 9 main pages, 2 figures, 2 tables, 36 appendix pages

  35. arXiv:2506.04467  [pdf]

    physics.med-ph cs.AI

    Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy

    Authors: Yuzhen Ding, Jason Holmes, Hongying Feng, Martin Bues, Lisa A. McGee, Jean-Claude M. Rwigema, Nathan Y. Yu, Terence S. Sio, Sameer R. Keole, William W. Wong, Steven E. Schild, Jonathan B. Ashman, Sujay A. Vora, Daniel J. Ma, Samir H. Patel, Wei Liu

    Abstract: Purpose: Intensity-modulated proton therapy (IMPT) offers precise tumor coverage while sparing organs at risk (OARs) in head and neck (H&N) cancer. However, its sensitivity to anatomical changes requires frequent adaptation through online adaptive radiation therapy (oART), which depends on fast, accurate dose calculation via Monte Carlo (MC) simulations. Reducing particle count accelerates MC but…

    Submitted 4 June, 2025; originally announced June 2025.

  36. arXiv:2506.04408  [pdf, ps, other]

    cs.CL cs.AI

    Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning

    Authors: Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider

    Abstract: Humans have a remarkable ability to acquire and understand grammatical phenomena that are seen rarely, if ever, during childhood. Recent evidence suggests that language models with human-scale pretraining data may possess a similar ability by generalizing from frequent to rare constructions. However, it remains an open question how widespread this generalization ability is, and to what extent this…

    Submitted 4 June, 2025; originally announced June 2025.

  37. arXiv:2506.03324  [pdf, ps, other]

    cs.LG

    Optimization of Epsilon-Greedy Exploration

    Authors: Ethan Che, Hakan Ceylan, James McInerney, Nathan Kallus

    Abstract: Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommend…

    Submitted 3 June, 2025; originally announced June 2025.

  38. arXiv:2506.02986  [pdf, ps, other]

    cs.LG

    Implicit Regularization of the Deep Inverse Prior Trained with Inertia

    Authors: Nathan Buskulic, Jalal Fadili, Yvain Quéau

    Abstract: Solving inverse problems with neural networks benefits from very few theoretical guarantees when it comes to the recovery guarantees. We provide in this work convergence and recovery guarantees for self-supervised neural networks applied to inverse problems, such as Deep Image/Inverse Prior, and trained with inertia featuring both viscous and geometric Hessian-driven dampings. We study both the co…

    Submitted 3 June, 2025; originally announced June 2025.

  39. arXiv:2506.02881  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Simulation-Based Inference for Adaptive Experiments

    Authors: Brian M Cho, Aurélien Bibaut, Nathan Kallus

    Abstract: Multi-arm bandit experimental designs are increasingly being adopted over standard randomized trials due to their potential to improve outcomes for study participants, enable faster identification of the best-performing options, and/or enhance the precision of estimating key parameters. Current approaches for inference after adaptive sampling either rely on asymptotic normality under restricted ex…

    Submitted 3 June, 2025; originally announced June 2025.

  40. arXiv:2506.02865  [pdf, ps, other]

    cs.AI

    Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

    Authors: Mathieu Andreux, Breno Baldas Skuk, Hamza Benchekroun, Emilien Biré, Antoine Bonnet, Riaz Bordie, Nathan Bout, Matthias Brunel, Pierre-Louis Cedoz, Antoine Chassang, Mickaël Chen, Alexandra D. Constantinou, Antoine d'Andigné, Hubert de La Jonquière, Aurélien Delfosse, Ludovic Denoyer, Alexis Deprez, Augustin Derupti, Michael Eickenberg, Mathïs Federico, Charles Kantor, Xavier Koegler, Yann Labbé, Matthew C. H. Lee, Erwan Le Jumeau de Kergaradec, et al. (19 additional authors not shown)

    Abstract: We present Surfer-H, a cost-efficient web agent that integrates Vision-Language Models (VLM) to perform user-defined tasks on the web. We pair it with Holo1, a new open-weight collection of VLMs specialized in web navigation and information extraction. Holo1 was trained on carefully curated data sources, including open-access web content, synthetic examples, and self-produced agentic data. Holo1 t…

    Submitted 11 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: Alphabetical order

  41. arXiv:2506.02482  [pdf, ps, other]

    cs.SI

    Building a Recommendation System Using Amazon Product Co-Purchasing Network

    Authors: Minghao Liu, Catherine Zhao, Nathan Zhou

    Abstract: This project develops an online, inductive recommendation system for newly listed products on e-commerce platforms, focusing on suggesting relevant new items to customers as they purchase other products. Using the Amazon Product Co-Purchasing Network Metadata dataset, we construct a co-purchasing graph where nodes represent products and edges capture co-purchasing relationships. To address the cha…

    Submitted 3 June, 2025; originally announced June 2025.

  42. arXiv:2506.01937  [pdf, ps, other]

    cs.CL

    RewardBench 2: Advancing Reward Model Evaluation

    Authors: Saumya Malik, Valentina Pyatkin, Sander Land, Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Nathan Lambert

    Abstract: Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, reasoning, safety, and more domains. The community has begun establishing best practices for evaluating reward models, from the development of benchmarks that test capabilities in specific skill areas to…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Data, models, and leaderboard available at https://huggingface.co/collections/allenai/reward-bench-2-683d2612a4b3e38a3e53bb51

  43. arXiv:2506.01084  [pdf, other]

    cs.CL cs.LG

    zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression

    Authors: Saibo Geng, Nathan Ranchin, Yunzhen Yao, Maxime Peyrard, Chris Wendler, Michael Gastpar, Robert West

    Abstract: Tokenization efficiency plays a critical role in the performance and cost of large language models (LLMs), yet most models rely on static tokenizers optimized for general-purpose corpora. These tokenizers' fixed vocabularies often fail to adapt to domain- or language-specific inputs, leading to longer token sequences and higher computational costs. We introduce zip2zip, a framework that enables LL…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Code will be released at https://github.com/epfl-dlab/zip2zip

  44. arXiv:2505.23702  [pdf, ps, other]

    cs.LG math.NA

    (U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs

    Authors: Nathan Lichtlé, Alexi Canesse, Zhe Fu, Hossein Nick Zinat Matin, Maria Laura Delle Monache, Alexandre M. Bayen

    Abstract: We introduce (U)NFV, a modular neural network architecture that generalizes classical finite volume (FV) methods for solving hyperbolic conservation laws. Hyperbolic partial differential equations (PDEs) are challenging to solve, particularly conservation laws whose physically relevant solutions contain shocks and discontinuities. FV methods are widely used for their mathematical properties: conve…

    Submitted 29 May, 2025; originally announced May 2025.

    ACM Class: I.2.6; G.1.8

  45. arXiv:2505.23575  [pdf, ps, other]

    cs.AI cs.LG

    CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring

    Authors: Benjamin Arnav, Pablo Bernabeu-Pérez, Nathan Helm-Burger, Tim Kostolansky, Hannes Whittingham, Mary Phuong

    Abstract: As AI models are deployed with increasing autonomy, it is important to ensure they do not take harmful actions unnoticed. As a potential mitigation, we investigate Chain-of-Thought (CoT) monitoring, wherein a weaker trusted monitor model continuously oversees the intermediate reasoning steps of a more powerful but untrusted model. We compare CoT monitoring to action-only monitoring, where only fin…

    Submitted 29 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  46. arXiv:2505.21746  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Learning to See More: UAS-Guided Super-Resolution of Satellite Imagery for Precision Agriculture

    Authors: Arif Masrur, Peder A. Olsen, Paul R. Adler, Carlan Jackson, Matthew W. Myers, Nathan Sedghi, Ray R. Weil

    Abstract: Unmanned Aircraft Systems (UAS) and satellites are key data sources for precision agriculture, yet each presents trade-offs. Satellite data offer broad spatial, temporal, and spectral coverage but lack the resolution needed for many precision farming applications, while UAS provide high spatial detail but are limited by coverage and cost, especially for hyperspectral data. This study presents a no…

    Submitted 27 May, 2025; originally announced May 2025.

  47. arXiv:2505.21647  [pdf, ps, other]

    cs.CV cs.LG

    QuARI: Query Adaptive Retrieval Improvement

    Authors: Eric Xing, Abby Stylianou, Robert Pless, Nathan Jacobs

    Abstract: Massive-scale pretraining has made vision-language models increasingly popular for image-to-image and text-to-image retrieval across a broad collection of domains. However, these models do not perform well when used for challenging retrieval tasks, such as instance retrieval in very large-scale image collections. Recent work has shown that linear transformations of VLM features trained for instanc…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 13 pages, 4 figures, 4 tables

  48. arXiv:2505.21161  [pdf, ps, other]

    cs.RO math.OC

    Collision Probability Estimation for Optimization-based Vehicular Motion Planning

    Authors: Leon Tolksdorf, Arturo Tejada, Christian Birkner, Nathan van de Wouw

    Abstract: Many motion planning algorithms for automated driving require estimating the probability of collision (POC) to account for uncertainties in the measurement and estimation of the motion of road users. Common POC estimation techniques often utilize sampling-based methods that suffer from computational inefficiency and a non-deterministic estimation, i.e., each estimation result for the same inputs i…

    Submitted 30 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: 14 pages, 6 figures

  49. arXiv:2505.20781  [pdf, ps, other]

    cs.RO cs.LG

    STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

    Authors: Hossein Goli, Michael Gimelfarb, Nathan Samuel de Lara, Haruki Nishimura, Masha Itkina, Florian Shkurti

    Abstract: Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or…

    Submitted 27 May, 2025; originally announced May 2025.

  50. arXiv:2505.20764  [pdf, ps, other]

    cs.CV cs.LG

    ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval

    Authors: Eric Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs

    Abstract: Composed image retrieval (CIR) is the task of retrieving a target image specified by a query image and a relative text that describes a semantic modification to the query image. Existing methods in CIR struggle to accurately represent the image and the text modification, resulting in subpar performance. To address this limitation, we introduce a CIR framework, ConText-CIR, trained with a Text Conc…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 15 pages, 8 figures, 6 tables. CVPR 2025