Showing 1–13 of 13 results for author: Menghini, C

  1. arXiv:2503.03750

    cs.LG cs.AI cs.CL cs.CY

    The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

    Authors: Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, Brad Kenstler, Mick Yang, Isabelle Barrass, Alice Gatti, Xuwang Yin, Eduardo Trevino, Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks

    Abstract: As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. Howeve…

    Submitted 20 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Website: https://www.mask-benchmark.ai

  2. arXiv:2502.08859

    cs.AI cs.CL

    EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

    Authors: Clinton J. Wang, Dean Lee, Cristina Menghini, Johannes Mols, Jack Doughty, Adam Khoja, Jayson Lynch, Sean Hendryx, Summer Yue, Dan Hendrycks

    Abstract: As language models master existing reasoning benchmarks, we need new challenges to evaluate their cognitive frontiers. Puzzle-solving events are rich repositories of challenging multimodal problems that test a wide range of advanced reasoning and knowledge capabilities, making them a unique testbed for evaluating frontier language models. We introduce EnigmaEval, a dataset of problems and solution…

    Submitted 14 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  3. arXiv:2408.15221

    cs.LG cs.CL cs.CR cs.CY

    LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet

    Authors: Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue

    Abstract: Recent large language model (LLM) defenses have greatly improved models' ability to refuse harmful queries, even when adversarially attacked. However, LLM defenses are primarily evaluated against automated adversarial attacks in a single turn of conversation, an insufficient threat model for real-world malicious use. We demonstrate that multi-turn human jailbreaks uncover significant vulnerabiliti…

    Submitted 3 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  4. arXiv:2403.16442

    cs.CL cs.CV cs.LG

    If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions

    Authors: Reza Esfandiarpoor, Cristina Menghini, Stephen H. Bach

    Abstract: Recent works often assume that Vision-Language Model (VLM) representations are based on visual attributes like shape. However, it is unclear to what extent VLMs prioritize this information to represent concepts. We propose Extract and Explore (EX2), a novel approach to characterize textual features that are important for VLMs. EX2 uses reinforcement learning to align a large language model with VL…

    Submitted 4 December, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: EMNLP 2024

  5. arXiv:2402.14086

    cs.CL cs.AI cs.LG

    LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons

    Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach

    Abstract: Data scarcity in low-resource languages can be addressed with word-to-word translations from labeled task data in high-resource languages using bilingual lexicons. However, bilingual lexicons often have limited lexical overlap with task data, which results in poor translation coverage and lexicon utilization. We propose lexicon-conditioned data generation LexC-Gen, a method that generates low-reso…

    Submitted 27 October, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: EMNLP Findings 2024

  6. arXiv:2310.02446

    cs.CL cs.AI cs.CR cs.LG

    Low-Resource Languages Jailbreak GPT-4

    Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach

    Abstract: AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On…

    Submitted 27 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: NeurIPS Workshop on Socially Responsible Language Modelling Research (SoLaR) 2023. Best Paper Award

  7. arXiv:2306.01669

    cs.CV cs.LG

    Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning

    Authors: Cristina Menghini, Andrew Delworth, Stephen H. Bach

    Abstract: Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i.e., heuristic labels for unlabeled data, to enhance CLIP via prompt tuning. Conventional pseudolabeling trains a model on labeled data and then generates labels for unlabe…

    Submitted 7 March, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

  8. arXiv:2205.13068

    cs.LG

    Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes

    Authors: Alessio Mazzetto, Cristina Menghini, Andrew Yuan, Eli Upfal, Stephen H. Bach

    Abstract: We develop a rigorous mathematical analysis of zero-shot learning with attributes. In this setting, the goal is to label novel classes with no training data, only detectors for attributes and a description of how those attributes are correlated with the target classes, called the class-attribute matrix. We develop the first non-trivial lower bound on the worst-case error of the best map from attri…

    Submitted 28 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  9. The Drift of #MyBodyMyChoice Discourse on Twitter

    Authors: Cristina Menghini, Justin Uhr, Shahrzad Haddadan, Ashley Champagne, Bjorn Sandstede, Sohini Ramachandran

    Abstract: #MyBodyMyChoice is a well-known hashtag originally created to advocate for women's rights, often used in discourse about abortion and bodily autonomy. The Covid-19 outbreak prompted governments to take containment measures such as vaccination campaigns and mask mandates. Population groups opposed to such measures started to use the slogan "My Body My Choice" to claim their bodily autonomy. In this…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted at WebSci'22

  10. arXiv:2111.04798

    cs.LG cs.CV

    TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data

    Authors: Wasu Piriyakulkij, Cristina Menghini, Ross Briden, Nihal V. Nayak, Jeffrey Zhu, Elaheh Raisi, Stephen H. Bach

    Abstract: Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data, the many available labeled datasets for other tasks. We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of…

    Submitted 5 May, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Paper published at MLSys 2022. It passed the artifact evaluation, earning two ACM badges: (1) Artifacts Evaluated Functional v1.1 and (2) Artifacts Available v1.1

  11. RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions

    Authors: Shahrzad Haddadan, Cristina Menghini, Matteo Riondato, Eli Upfal

    Abstract: The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a polarized bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a polarized bubble if the expected length of a random walk from it to a page of different opinion is large. The struc…

    Submitted 12 January, 2021; originally announced January 2021.

  12. How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics?

    Authors: Cristina Menghini, Aris Anagnostopoulos, Eli Upfal

    Abstract: Wikipedia relies on an extensive review process to verify that the content of each individual page is unbiased and presents a neutral point of view. Less attention has been paid to possible biases in the hyperlink structure of Wikipedia, which has a significant influence on the user's exploration process when visiting more than one page. The evaluation of hyperlink bias is challenging because it d…

    Submitted 31 March, 2022; v1 submitted 16 July, 2020; originally announced July 2020.

  13. arXiv:1905.13651

    cs.DS cs.LG

    Principal Fairness: Removing Bias via Projections

    Authors: Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Cristina Menghini, Chris Schwiegelshohn

    Abstract: Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. We complement several recent papers in this line of research by introducing a general method to reduce bias in the data through random projections in a "fair" subspace. We apply this method to the densest subgraph problem. For densest subgraph, our approach based on fair p…

    Submitted 5 March, 2021; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets"