Showing 1–7 of 7 results for author: Shaib, C

  1. arXiv:2505.17390  [pdf, ps, other]

    cs.CL

    Measuring diversity of synthetic prompts and data generated with fine-grained persona prompting

    Authors: Gauri Kambhatla, Chantal Shaib, Venkata Govindarajan

    Abstract: Fine-grained personas have recently been used for generating 'diverse' synthetic data for pre-training and supervised fine-tuning of Large Language Models (LLMs). In this work, we measure the diversity of persona-driven synthetically generated prompts and responses with a suite of lexical diversity and redundancy metrics. Firstly, we find that synthetic prompts/instructions are significantly less…

    Submitted 22 May, 2025; originally announced May 2025.

  2. arXiv:2502.06659  [pdf, other]

    cs.CL

    Who Taught You That? Tracing Teachers in Model Distillation

    Authors: Somin Wadhwa, Chantal Shaib, Silvio Amir, Byron C. Wallace

    Abstract: Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a student's teacher based on its outputs? Such "footprints" left by teacher LLMs would be interesting artifacts. Beyond this, reliable teacher inference may have practical implications as actors seek to dis…

    Submitted 20 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Findings of ACL 2025

  3. arXiv:2407.00211  [pdf, other]

    cs.CL

    Detection and Measurement of Syntactic Templates in Generated Text

    Authors: Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace

    Abstract: Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic templates and show that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts. W…

    Submitted 6 October, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: EMNLP 2024

  4. arXiv:2403.00553  [pdf, other]

    cs.CL

    Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

    Authors: Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: The diversity across outputs generated by LLMs shapes perception of their quality and utility. High lexical diversity is often desirable, but there is no standard method to measure this property. Templated answer structures and "canned" responses across different documents are readily noticeable, but difficult to visualize across large corpora. This work aims to standardize measurement of text d…

    Submitted 20 March, 2025; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Preprint

  5. arXiv:2402.18756  [pdf, other]

    cs.CL

    How Much Annotation is Needed to Compare Summarization Models?

    Authors: Chantal Shaib, Joe Barrow, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: Modern instruction-tuned models have become highly capable in text generation tasks such as summarization, and are expected to be released at a steady pace. In practice, one may now wish to choose confidently, but with minimal effort, the best-performing summarization model when applied to a new domain or purpose. In this work, we empirically investigate the test sample size necessary to select a p…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Preprint

  6. arXiv:2306.11270  [pdf, other]

    cs.CL cs.LG

    Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

    Authors: Jiuding Sun, Chantal Shaib, Byron C. Wallace

    Abstract: Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) How sensitive are instruction-…

    Submitted 8 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

  7. arXiv:2305.06299  [pdf, other]

    cs.CL

    Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)

    Authors: Chantal Shaib, Millicent L. Li, Sebastian Joseph, Iain J. Marshall, Junyi Jessy Li, Byron C. Wallace

    Abstract: Large language models, particularly GPT-3, are able to produce high-quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized, high-stakes domains such as biomedicine. In this paper, we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generat…

    Submitted 11 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted short paper to ACL 2023