Showing 1–3 of 3 results for author: Sabbaghi, M

Searching in archive cs.
  1. arXiv:2506.06414  [pdf, other]

    cs.CR cs.AI

    Benchmarking Misuse Mitigation Against Covert Adversaries

    Authors: Davis Brown, Mahdi Sabbaghi, Luze Sun, Alexander Robey, George J. Pappas, Eric Wong, Hamed Hassani

    Abstract: Existing language model safety evaluations focus on overt attacks and low-stakes tasks. Realistic attackers can subvert current safeguards by requesting help on small, benign-seeming tasks across many independent queries. Because individual queries do not appear harmful, the attack is hard to detect. However, when combined, these fragments uplift misuse by helping the attacker complete hard and…

    Submitted 6 June, 2025; originally announced June 2025.

  2. arXiv:2502.01633  [pdf, ps, other]

    cs.LG cs.AI

    Adversarial Reasoning at Jailbreaking Time

    Authors: Mahdi Sabbaghi, Paul Kassianik, George Pappas, Yaron Singer, Amin Karbasi, Hamed Hassani

    Abstract: As large language models (LLMs) are becoming more capable and widespread, the study of their failure cases is becoming increasingly important. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful…

    Submitted 25 June, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: Accepted to the 42nd International Conference on Machine Learning (ICML 2025)

  3. arXiv:2406.01895  [pdf, other]

    cs.LG cs.CL stat.ML

    Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks

    Authors: Mahdi Sabbaghi, George Pappas, Hamed Hassani, Surbhi Goel

    Abstract: Despite the success of Transformers on language understanding, code generation, and logical reasoning, they still fail to generalize over length on basic arithmetic tasks such as addition and multiplication. A major reason behind this failure is the vast difference in structure between numbers and text. For example, the numbers are typically parsed from right to left, and there is a correspondence…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 32 pages, 16 figures