Showing 1–5 of 5 results for author: Kon, J

Searching in archive cs.
  1. arXiv:2505.24785

    cs.AI

    EXP-Bench: Can AI Conduct AI Research Experiments?

    Authors: Patrick Tser Jern Kon, Jiachen Liu, Xinyi Zhu, Qiuyi Ding, Jingjia Peng, Jiarong Xing, Yibo Huang, Yiming Qiu, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Matei Zaharia, Ang Chen

    Abstract: Automating AI research holds immense potential for accelerating scientific progress, yet current AI agents struggle with the complexities of rigorous, end-to-end experimentation. We introduce EXP-Bench, a novel benchmark designed to systematically evaluate AI agents on complete research experiments sourced from influential AI publications. Given a research question and incomplete starter code, EXP…

    Submitted 1 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 45 pages, 13 figures

  2. arXiv:2502.16069

    cs.AI cs.LG

    Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

    Authors: Patrick Tser Jern Kon, Jiachen Liu, Qiuyi Ding, Yiming Qiu, Zhenning Yang, Yibo Huang, Jayanth Srinivasa, Myungjin Lee, Mosharaf Chowdhury, Ang Chen

    Abstract: Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI a…

    Submitted 25 February, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 21 pages

  3. arXiv:2501.14249

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  4. arXiv:2501.05842

    cs.LG eess.SY

    Orthogonal projection-based regularization for efficient model augmentation

    Authors: Bendegúz M. Györök, Jan H. Hoekstra, Johan Kon, Tamás Péni, Maarten Schoukens, Roland Tóth

    Abstract: Deep-learning-based nonlinear system identification has shown the ability to produce reliable and highly accurate models in practice. However, these black-box models lack physical interpretability, and a considerable part of the learning effort is often spent on capturing already expected/known behavior of the system, that can be accurately described by first-principles laws of physics. A potentia…

    Submitted 22 April, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

    Comments: Accepted for L4DC 2025

  5. Unifying Model-Based and Neural Network Feedforward: Physics-Guided Neural Networks with Linear Autoregressive Dynamics

    Authors: Johan Kon, Dennis Bruijnen, Jeroen van de Wijdeven, Marcel Heertjes, Tom Oomen

    Abstract: Unknown nonlinear dynamics often limit the tracking performance of feedforward control. The aim of this paper is to develop a feedforward control framework that can compensate these unknown nonlinear dynamics using universal function approximators. The feedforward controller is parametrized as a parallel combination of a physics-based model and a neural network, where both share the same linear au…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Accepted for presentation at the 2022 Conference on Decision and Control (CDC)