Showing 1–23 of 23 results for author: Balunović, M

Searching in archive cs.
  1. arXiv:2506.21621  [pdf, ps, other]

    cs.CL cs.AI

    The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs

    Authors: Jasper Dekoninck, Ivo Petrov, Kristian Minchev, Mislav Balunovic, Martin Vechev, Miroslav Marinov, Maria Drencheva, Lyuba Konova, Milen Shumanov, Kaloyan Tsvetkov, Nikolay Drenchev, Lazar Todorov, Kalina Nikolova, Nikolay Georgiev, Vanesa Kalinkova, Margulan Ismoldayev

    Abstract: In recent months, large language models (LLMs) have made significant progress in mathematical proof generation, but further advancement is hindered by the lack of a large-scale, high-quality dataset of human-evaluated proofs. While expensive to create, such a dataset is essential for driving improvements in training and enabling a rigorous analysis of proof generation capabilities. In this work, w…

    Submitted 23 June, 2025; originally announced June 2025.

  2. arXiv:2505.23281  [pdf, ps, other]

    cs.AI cs.CL

    MathArena: Evaluating LLMs on Uncontaminated Math Competitions

    Authors: Mislav Balunović, Jasper Dekoninck, Ivo Petrov, Nikola Jovanović, Martin Vechev

    Abstract: The rapid advancement of reasoning capabilities in large language models (LLMs) has led to notable improvements on mathematical benchmarks. However, many of the most commonly used evaluation datasets (e.g., AIME 2024) are widely available online, making it difficult to disentangle genuine reasoning from potential memorization. Furthermore, these benchmarks do not evaluate proof-writing capabilitie…

    Submitted 29 May, 2025; originally announced May 2025.

  3. arXiv:2503.21934  [pdf, other]

    cs.CL

    Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

    Authors: Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, Martin Vechev

    Abstract: Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, Gemini-2.5-Pro, achieving scores comparable to top human competitors. However, these benchmarks evaluate models solely based on final numerical answers, neglecting rigorous reasoning…

    Submitted 29 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  4. arXiv:2503.04479  [pdf, other]

    cs.AI cs.SE

    ToolFuzz -- Automated Agent Tool Testing

    Authors: Ivan Milev, Mislav Balunović, Maximilian Baader, Martin Vechev

    Abstract: Large Language Model (LLM) Agents leverage the advanced reasoning capabilities of LLMs in real-world applications. To interface with an environment, these agents often rely on tools, such as web search or database APIs. As the agent provides the LLM with tool documentation along with the user query, the completeness and correctness of this documentation are critical. However, tool documentation is often…

    Submitted 11 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  5. arXiv:2502.10197  [pdf, other]

    cs.AI

    MathConstruct: Challenging LLM Reasoning with Constructive Proofs

    Authors: Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, Martin Vechev

    Abstract: While Large Language Models (LLMs) demonstrate impressive performance in mathematics, existing math benchmarks come with significant limitations. Many focus on problems with fixed ground-truth answers, and are often saturated due to problem simplicity or the viability of guessing or memorization. Crucially, they capture only a narrow subset of relevant math problems. To address this research gap,…

    Submitted 14 February, 2025; originally announced February 2025.

  6. arXiv:2410.07959  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act

    Authors: Philipp Guldimann, Alexander Spiridonov, Robin Staab, Nikola Jovanović, Mark Vero, Velko Vechev, Anna-Maria Gueorguieva, Mislav Balunović, Nikola Konstantinov, Pavol Bielik, Petar Tsankov, Martin Vechev

    Abstract: The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development, but lacks clear technical interpretation, making it difficult to assess models' compliance. This work presents COMPL-AI, a comprehensive framework consisting of (i) the first technical interpretation of the EU AI Act, translating its broad regulatory requirements into measurable technical requir…

    Submitted 3 February, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  7. arXiv:2406.13352  [pdf, other]

    cs.CR cs.LG

    AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

    Authors: Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr

    Abstract: AI agents aim to solve complex tasks by combining text-based reasoning with external tool calls. Unfortunately, AI agents are vulnerable to prompt injection attacks where data returned by external tools hijacks the agent to execute malicious tasks. To measure the adversarial robustness of AI agents, we introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data.…

    Submitted 24 November, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Updated version after fixing a bug in the Llama implementation and updating the travel suite

  8. arXiv:2402.13846  [pdf, other]

    cs.AI cs.CL cs.CR

    Large Language Models are Advanced Anonymizers

    Authors: Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

    Abstract: Recent privacy research on large language models (LLMs) has shown that they achieve near-human-level performance at inferring personal data from online texts. With ever-increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. In this work, we take two steps to bridge this gap: First, we present a new setting fo…

    Submitted 3 February, 2025; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: International Conference on Learning Representations (ICLR 2024)

    ACM Class: I.2.7

  9. arXiv:2311.10500  [pdf, other]

    cs.LG cs.AI cs.CR

    From Principle to Practice: Vertical Data Minimization for Machine Learning

    Authors: Robin Staab, Nikola Jovanović, Mislav Balunović, Martin Vechev

    Abstract: Aiming to train and deploy predictive models, organizations collect large amounts of detailed client data, risking the exposure of private information in the event of a breach. To mitigate this, policymakers increasingly demand compliance with the data minimization (DM) principle, restricting data collection to only that data which is relevant and necessary for the task. Despite regulatory pressur…

    Submitted 22 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: Accepted at IEEE S&P 2024

  10. arXiv:2310.07298  [pdf, other]

    cs.AI cs.LG

    Beyond Memorization: Violating Privacy Via Inference with Large Language Models

    Authors: Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

    Abstract: Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first compr…

    Submitted 6 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    ACM Class: I.2.7

  11. arXiv:2307.03577  [pdf, other]

    cs.LG cs.DB cs.PL

    CuTS: Customizable Tabular Synthetic Data Generation

    Authors: Mark Vero, Mislav Balunović, Martin Vechev

    Abstract: Privacy, data quality, and data sharing concerns pose a key limitation for tabular data applications. While generating synthetic data resembling the original distribution addresses some of these issues, most applications would benefit from additional customization on the generated data. However, existing synthetic data approaches are limited to particular constraints, e.g., differential privacy (D…

    Submitted 2 June, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

  12. arXiv:2210.07213  [pdf, other]

    cs.LG cs.AI cs.CY

    FARE: Provably Fair Representation Learning with Practical Certificates

    Authors: Nikola Jovanović, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev

    Abstract: Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. Recent regulatory directives stress the need for FRL methods that provide practical certificates, i.e., provable upper bounds on the unfairness of any downstream classifier trained on preprocessed data, which directly provides assurance in a practical scenario. Creating such…

    Submitted 8 June, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: ICML 2023

  13. arXiv:2210.01785  [pdf, other]

    cs.LG cs.CR cs.DC

    TabLeak: Tabular Data Leakage in Federated Learning

    Authors: Mark Vero, Mislav Balunović, Dimitar I. Dimitrov, Martin Vechev

    Abstract: While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the…

    Submitted 7 July, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    ACM Class: I.2.11

  14. arXiv:2206.12395  [pdf, other]

    cs.LG cs.CR cs.DC

    Data Leakage in Federated Averaging

    Authors: Dimitar I. Dimitrov, Mislav Balunović, Nikola Konstantinov, Martin Vechev

    Abstract: Recent attacks have shown that user data can be recovered from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance as federated learning typically uses the FedAvg algorithm. Compared to FedSGD, recovering data from FedAvg updates is much harder as: (i) the updates are computed at unobserved intermediate network weights, (ii) a large number of batches ar…

    Submitted 1 November, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

    ACM Class: I.2.11

  15. arXiv:2202.08827  [pdf, other]

    cs.LG cs.DC

    LAMP: Extracting Text from Gradients with Language Model Priors

    Authors: Mislav Balunović, Dimitar I. Dimitrov, Nikola Jovanović, Martin Vechev

    Abstract: Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data, that successfully reconstructs original text from gradients…

    Submitted 19 October, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    ACM Class: I.2.7; I.2.11

  16. arXiv:2111.13650  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    Latent Space Smoothing for Individually Fair Representations

    Authors: Momchil Peychev, Anian Ruoss, Mislav Balunović, Maximilian Baader, Martin Vechev

    Abstract: Fair representation learning transforms user data into a representation that ensures fairness and utility regardless of the downstream application. However, learning individually fair representations, i.e., guaranteeing that similar individuals are treated similarly, remains challenging in high-dimensional settings such as computer vision. In this work, we introduce LASSI, the first representation…

    Submitted 26 July, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: ECCV 2022

  17. arXiv:2111.04706  [pdf, other]

    cs.LG cs.CR

    Bayesian Framework for Gradient Leakage

    Authors: Mislav Balunović, Dimitar I. Dimitrov, Robin Staab, Martin Vechev

    Abstract: Federated learning is an established method for training machine learning models without sharing training data. However, recent work has shown that it cannot guarantee data privacy as shared gradients can still leak sensitive information. To formalize the problem of gradient leakage, we propose a theoretical framework that enables, for the first time, analysis of the Bayes optimal adversary phrase…

    Submitted 17 March, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  18. arXiv:2106.05937  [pdf, other]

    cs.LG cs.AI

    Fair Normalizing Flows

    Authors: Mislav Balunović, Anian Ruoss, Martin Vechev

    Abstract: Fair representation learning is an attractive approach that promises fairness of downstream predictors by encoding sensitive data. Unfortunately, recent work has shown that strong adversarial predictors can still exhibit unfairness by recovering sensitive attributes from these representations. In this work, we present Fair Normalizing Flows (FNF), a new approach offering more rigorous fairness gua…

    Submitted 17 March, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

  19. arXiv:2103.16652  [pdf, other]

    cs.LG cs.AI cs.CV

    Robustness Certification for Point Cloud Models

    Authors: Tobias Lorenz, Anian Ruoss, Mislav Balunović, Gagandeep Singh, Martin Vechev

    Abstract: The use of deep 3D point cloud models in safety-critical applications, such as autonomous driving, dictates the need to certify the robustness of these models to real-world transformations. This is technically challenging, as it requires a scalable verifier tailored to point cloud models that handles a wide range of semantic 3D transformations. In this work, we address this challenge and introduce…

    Submitted 23 August, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: International Conference on Computer Vision (ICCV) 2021

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2021, pp. 7608-7618

  20. arXiv:2102.06700  [pdf, other]

    cs.LG cs.AI

    On the Paradox of Certified Training

    Authors: Nikola Jovanović, Mislav Balunović, Maximilian Baader, Martin Vechev

    Abstract: Certified defenses based on convex relaxations are an established technique for training provably robust models. The key component is the choice of relaxation, varying from simple intervals to tight polyhedra. Counterintuitively, loose interval-based training often leads to higher certified robustness than what can be achieved with tighter relaxations, which is a well-known but poorly understood p…

    Submitted 12 October, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Published in Transactions on Machine Learning Research (TMLR) 10/2022

  21. arXiv:2009.09318  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    Efficient Certification of Spatial Robustness

    Authors: Anian Ruoss, Maximilian Baader, Mislav Balunović, Martin Vechev

    Abstract: Recent work has exposed the vulnerability of computer vision models to vector field attacks. Due to the widespread usage of such models in safety-critical applications, it is crucial to quantify their robustness against such spatial transformations. However, existing work only provides empirical robustness quantification against vector field deformations via adversarial attacks, which lack provabl…

    Submitted 30 January, 2021; v1 submitted 19 September, 2020; originally announced September 2020.

    Comments: Conference Paper at AAAI 2021

  22. arXiv:2005.13300  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Scalable Polyhedral Verification of Recurrent Neural Networks

    Authors: Wonryong Ryou, Jiayu Chen, Mislav Balunovic, Gagandeep Singh, Andrei Dan, Martin Vechev

    Abstract: We present a scalable and precise verifier for recurrent neural networks, called Prover, based on two novel ideas: (i) a method to compute a set of polyhedral abstractions for the non-convex and nonlinear recurrent update functions by combining sampling, optimization, and Fermat's theorem, and (ii) a gradient descent based algorithm for abstraction refinement guided by the certification problem tha…

    Submitted 10 June, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

    Comments: Published in CAV 2021

  23. arXiv:2002.10312  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Learning Certified Individually Fair Representations

    Authors: Anian Ruoss, Mislav Balunović, Marc Fischer, Martin Vechev

    Abstract: Fair representation learning provides an effective way of enforcing fairness constraints without compromising utility for downstream users. A desirable family of such fairness constraints, each requiring similar treatment for similar individuals, is known as individual fairness. In this work, we introduce the first method that enables data consumers to obtain certificates of individual fairness fo…

    Submitted 28 November, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: Conference Paper at NeurIPS 2020