Showing 1–6 of 6 results for author: Nemcovsky, Y

  1. arXiv:2502.09755  [pdf, ps, other]

    cs.CR cs.LG

    Jailbreak Attack Initializations as Extractors of Compliance Directions

    Authors: Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

    Abstract: Safety-aligned LLMs respond to prompts with either compliance or refusal, each corresponding to distinct directions in the model's activation space. Recent works show that initializing attacks via self-transfer from other prompts significantly enhances their performance. However, the underlying mechanisms of these initializations remain unclear, and attacks utilize arbitrary or hand-picked initial…

    Submitted 5 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  2. arXiv:2411.16162  [pdf, other]

    cs.CV cs.LG

    Sparse patches adversarial attacks via extrapolating point-wise information

    Authors: Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

    Abstract: Sparse and patch adversarial attacks were previously shown to be applicable in realistic settings and are considered a security risk to autonomous systems. Sparse adversarial perturbations constitute a setting in which the adversarial perturbations are limited to affecting a relatively small number of points in the input. Patch adversarial attacks denote the setting where the sparse attacks are li…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: AdvML-Frontiers 24: The 3rd Workshop on New Frontiers in Adversarial Machine Learning, NeurIPS 24

  3. arXiv:2207.05729  [pdf, other]

    cs.CV cs.LG

    Physical Passive Patch Adversarial Attacks on Visual Odometry Systems

    Authors: Yaniv Nemcovsky, Matan Jacoby, Alex M. Bronstein, Chaim Baskin

    Abstract: Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic…

    Submitted 4 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted to ACCV 2022

  4. arXiv:2003.02188  [pdf, other]

    cs.LG cs.CV stat.ML

    Colored Noise Injection for Training Adversarially Robust Neural Networks

    Authors: Evgenii Zheltonozhskii, Chaim Baskin, Yaniv Nemcovsky, Brian Chmiel, Avi Mendelson, Alex M. Bronstein

    Abstract: Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for def…

    Submitted 20 March, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

  5. arXiv:2002.09866  [pdf, other]

    cs.LG stat.ML

    On the generalization of bayesian deep nets for multi-class classification

    Authors: Yossi Adi, Yaniv Nemcovsky, Alex Schwing, Tamir Hazan

    Abstract: Generalization bounds which assess the difference between the true risk and the empirical risk have been studied extensively. However, to obtain bounds, current techniques use strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper, we propose a new generalization bound for Bayesian deep nets by exploiting the contractivity of the Log-…

    Submitted 23 February, 2020; originally announced February 2020.

  6. arXiv:1911.07198  [pdf, other]

    cs.LG cs.CV stat.ML

    Smoothed Inference for Adversarially-Trained Models

    Authors: Yaniv Nemcovsky, Evgenii Zheltonozhskii, Chaim Baskin, Brian Chmiel, Maxim Fishman, Alex M. Bronstein, Avi Mendelson

    Abstract: Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded pertur…

    Submitted 16 March, 2020; v1 submitted 17 November, 2019; originally announced November 2019.