Showing 1–2 of 2 results for author: Shlomi, T

Searching in archive cs.
  1. arXiv:2305.18456  [pdf, other]

    cs.LG cs.AI cs.CR cs.CY

    Baselines for Identifying Watermarked Large Language Models

    Authors: Leonard Tang, Gavin Uberti, Tom Shlomi

    Abstract: We consider the emerging problem of identifying the presence and use of watermarking schemes in widely used, publicly hosted, closed source large language models (LLMs). We introduce a suite of baseline algorithms for identifying watermarks in LLMs that rely on analyzing distributions of output tokens and logits generated by watermarked and unmarked LLMs. Notably, watermarked LLMs tend to produce…

    Submitted 29 May, 2023; originally announced May 2023.

  2. arXiv:2303.05593  [pdf, other]

    cs.LG cs.AI cs.CR

    Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation

    Authors: Leonard Tang, Tom Shlomi, Alexander Cai

    Abstract: In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized. Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models. Given the widespread use of knowledge distilla…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: ICLR 2023 Workshop on Backdoor Attacks and Defenses in Machine Learning