Showing 1–50 of 156 results for author: Lipton, Z

  1. arXiv:2506.12618  [pdf, ps, other]

    cs.CL

    OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics

    Authors: Vineeth Dorna, Anmol Mekala, Wenlong Zhao, Andrew McCallum, Zachary C. Lipton, J. Zico Kolter, Pratyush Maini

    Abstract: Robust unlearning is crucial for safely deploying large language models (LLMs) in environments where data privacy, model safety, and regulatory compliance must be ensured. Yet the task is inherently challenging, partly due to difficulties in reliably measuring whether unlearning has truly occurred. Moreover, fragmentation in current methodologies and inconsistent evaluation metrics hinder comparat…

    Submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2505.20178  [pdf, ps, other]

    stat.ML cs.LG

    No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

    Authors: Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst

    Abstract: Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI, showing that the *asymptotic* variance of PPI++ is always less than or equal to the variance obtained from using gold-standard labels alone. Notably, this result h…

    Submitted 26 May, 2025; originally announced May 2025.

  3. arXiv:2504.16980  [pdf, other]

    cs.LG

    Safety Pretraining: Toward the Next Generation of Safe AI

    Authors: Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zachary C. Lipton, J. Zico Kolter

    Abstract: As large language models (LLMs) are increasingly deployed in high-stakes settings, the risk of generating harmful or toxic content remains a central challenge. Post-hoc alignment methods are brittle: once unsafe patterns are learned during pretraining, they are hard to remove. We present a data-centric pretraining framework that builds safety into the model from the start. Our contributions includ…

    Submitted 23 April, 2025; originally announced April 2025.

  4. arXiv:2503.05782  [pdf, other]

    cs.CY cs.AI

    AI Mentors for Student Projects: Spotting Early Issues in Computer Science Proposals

    Authors: Gati Aher, Robin Schmucker, Tom Mitchell, Zachary C. Lipton

    Abstract: When executed well, project-based learning (PBL) engages students' intrinsic motivation, encourages students to learn far beyond a course's limited curriculum, and prepares students to think critically and maturely about the skills and tools at their disposal. However, educators experience mixed results when using PBL in their classrooms: some students thrive with minimal guidance and others floun…

    Submitted 26 February, 2025; originally announced March 2025.

    Comments: Accepted for oral presentation at Workshop on Innovation and Responsibility in AI-Supported Education (iRAISE), AAAI 2025

  5. arXiv:2412.17009  [pdf, other]

    cs.LG

    Generate to Discriminate: Expert Routing for Continual Learning

    Authors: Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary C. Lipton

    Abstract: In many real-world settings, regulations and economic incentives permit the sharing of models but not data across institutional boundaries. In such scenarios, practitioners might hope to adapt models to new domains, without losing performance on previous domains (so-called catastrophic forgetting). While any single model may struggle to achieve this goal, learning an ensemble of domain-specific ex…

    Submitted 27 December, 2024; v1 submitted 22 December, 2024; originally announced December 2024.

  6. arXiv:2411.08870  [pdf, other]

    cs.CL cs.AI cs.LG

    The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models

    Authors: Daniel P. Jeong, Pranav Mani, Saurabh Garg, Zachary C. Lipton, Michael Oberst

    Abstract: Several recent works seek to adapt general-purpose large language models (LLMs) and vision-language models (VLMs) for medical applications through continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining improves performance on various downstream medical tasks, such as answering medical exam questions. In this paper, we compa…

    Submitted 28 February, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: Extended version of EMNLP 2024 paper arXiv:2411.04118. Includes additional results on clinical note QA tasks and supervised fine-tuning evaluations

  7. arXiv:2411.04118  [pdf, other]

    cs.CL cs.AI cs.LG

    Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

    Authors: Daniel P. Jeong, Saurabh Garg, Zachary C. Lipton, Michael Oberst

    Abstract: Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical l…

    Submitted 19 November, 2024; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: This version was published at EMNLP 2024 Main Conference as a Long Paper (Oral). See the extended version (arXiv:2411.08870) for additional results on QA tasks based on clinical notes and evaluations in the supervised fine-tuning regime

  8. arXiv:2411.03195  [pdf, other]

    stat.ML cs.LG

    Online Data Collection for Efficient Semiparametric Inference

    Authors: Shantanu Gupta, Zachary C. Lipton, David Childers

    Abstract: While many works have studied statistical data fusion, they typically assume that the various datasets are given in advance. However, in practice, estimation requires difficult data collection decisions like determining the available data sources, their costs, and how many samples to collect from each source. Moreover, this process is often sequential because the data collected at a given time can…

    Submitted 5 November, 2024; originally announced November 2024.

  9. arXiv:2410.23884  [pdf, ps, other]

    cs.LG cs.CL

    Failure Modes of LLMs for Causal Reasoning on Narratives

    Authors: Khurram Yamin, Shantanu Gupta, Gaurav R. Ghosal, Zachary C. Lipton, Bryan Wilder

    Abstract: The ability to robustly identify causal relationships is essential for autonomous decision-making and adaptation to novel scenarios. However, accurately inferring causal structure requires integrating both world knowledge and abstract logical reasoning. In this work, we investigate the interaction between these two capabilities through the representative task of causal reasoning over narratives. T…

    Submitted 14 June, 2025; v1 submitted 31 October, 2024; originally announced October 2024.

    Comments: ICML 2025 Workshop on Scaling up Intervention Models

  10. arXiv:2410.09867  [pdf, other]

    cs.LG

    Towards characterizing the value of edge embeddings in Graph Neural Networks

    Authors: Dhruv Rohatgi, Tanya Marwah, Zachary Chase Lipton, Jianfeng Lu, Ankur Moitra, Andrej Risteski

    Abstract: Graph neural networks (GNNs) are the dominant approach to solving machine learning problems defined over graphs. Despite much theoretical and empirical work in recent years, our understanding of finer-grained aspects of architectural design for GNNs remains impoverished. In this paper, we consider the benefits of architectures that maintain and update edge embeddings. On the theoretical front, und…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: 25 pages, 2 figures

  11. arXiv:2410.09600  [pdf, other]

    cs.LG cs.CY

    The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning

    Authors: Jake Fawkes, Nic Fishman, Mel Andrews, Zachary C. Lipton

    Abstract: Fairness metrics are a core tool in the fair machine learning literature (FairML), used to determine that ML models are, in some sense, "fair". Real-world data, however, are typically plagued by various measurement biases and other violated assumptions, which can render fairness assessments meaningless. We adapt tools from causal sensitivity analysis to the FairML context, providing a general fram…

    Submitted 15 October, 2024; v1 submitted 12 October, 2024; originally announced October 2024.

    Comments: Published at NeurIPS 2024 in the Datasets and Benchmarks Track

  12. arXiv:2409.13210  [pdf, other]

    cs.LG cs.IR

    A Unified Causal Framework for Auditing Recommender Systems for Ethical Concerns

    Authors: Vibhhu Sharma, Shantanu Gupta, Nil-Jana Akpinar, Zachary C. Lipton, Liu Leqi

    Abstract: As recommender systems become widely deployed in different domains, they increasingly influence their users' beliefs and preferences. Auditing recommender systems is crucial as it not only ensures the continuous improvement of recommendation algorithms but also safeguards against potential issues like biases and ethical concerns. In this paper, we view recommender system auditing from a causal len…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 28 pages

  13. arXiv:2407.02694  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    LLM-Select: Feature Selection with Large Language Models

    Authors: Daniel P. Jeong, Zachary C. Lipton, Pradeep Ravikumar

    Abstract: In this paper, we demonstrate a surprising capability of large language models (LLMs): given only input feature names and a description of a prediction task, they are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Remarkably, these models exhibit this capacity across various query mechanisms. For example, we zero-shot prompt an LLM…

    Submitted 17 April, 2025; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Published in Transactions on Machine Learning Research (TMLR), April 2025

  14. arXiv:2406.09358  [pdf, other]

    cs.LG

    Understanding Hallucinations in Diffusion Models through Mode Interpolation

    Authors: Sumukh K Aithal, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

    Abstract: Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from? In this paper, we study a particular failure mode in diffusion models, which we term mode interpolation. Specifically, we find that diffusion models smoothly "interpolate" betw…

    Submitted 25 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Additional results on real datasets

  15. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 10 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: proposing "bidirectional human-AI alignment" framework after a systematic review of over 400 alignment papers

  16. arXiv:2406.03487  [pdf, other]

    cs.CL cs.AI

    Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends

    Authors: Sanjana Ramprasad, Elisa Ferracane, Zachary C. Lipton

    Abstract: Recent advancements in large language models (LLMs) have considerably advanced the capabilities of summarization systems. However, they continue to face concerns about hallucinations. While prior work has evaluated LLMs extensively in news domains, most evaluation of dialogue summarization has focused on BART-based models, leaving a gap in our understanding of their faithfulness. Our work benchmar…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  17. arXiv:2404.15146  [pdf, other]

    cs.LG cs.CL

    Rethinking LLM Memorization through the Lens of Adversarial Compression

    Authors: Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

    Abstract: Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we pro…

    Submitted 11 November, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: https://locuslab.github.io/acr-memorization

  18. arXiv:2404.07815  [pdf, other]

    cs.LG cs.AI stat.ML

    Post-Hoc Reversal: Are We Selecting Models Prematurely?

    Authors: Rishabh Ranjan, Saurabh Garg, Mrigank Raman, Carlos Guestrin, Zachary Lipton

    Abstract: Trained models are often composed with post-hoc transforms such as temperature scaling (TS), ensembling and stochastic weight averaging (SWA) to improve performance, robustness, uncertainty estimation, etc. However, such transforms are typically applied only after the base models have already been finalized by standard means. In this paper, we challenge this practice with an extensive empirical st…

    Submitted 3 October, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: accepted at NeurIPS 2024; v2 adds an intuitions section

  19. arXiv:2404.07177  [pdf, other]

    cs.LG

    Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic

    Authors: Sachin Goyal, Pratyush Maini, Zachary C. Lipton, Aditi Raghunathan, J. Zico Kolter

    Abstract: Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. In recent times, data curation has gained prominence with several works developing strategies to retain 'high-quality' subsets of 'raw' scraped data. For instance, the LAION public dataset retained only 10% of the total crawled data. However, these strategies are typically developed agnostic of…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Published at CVPR 2024

  20. arXiv:2403.14713  [pdf, other]

    cs.LG cs.CY stat.ME stat.ML

    Auditing Fairness under Unobserved Confounding

    Authors: Yewon Byun, Dylan Sam, Michael Oberst, Zachary C. Lipton, Bryan Wilder

    Abstract: Many definitions of fairness or inequity involve unobservable causal quantities that cannot be directly estimated without strong assumptions. For instance, it is particularly difficult to estimate notions of fairness that rely on hard-to-measure concepts such as risk (e.g., quantifying whether patients at the same risk level have equal probability of treatment, regardless of group membership). Suc…

    Submitted 9 December, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: AISTATS 2024

  21. arXiv:2402.12566  [pdf, other]

    cs.CL cs.LG

    GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

    Authors: Kundan Krishna, Sanjana Ramprasad, Prakhar Gupta, Byron C. Wallace, Zachary C. Lipton, Jeffrey P. Bigham

    Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that ar…

    Submitted 19 January, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and models available at https://genaudit.org

  22. arXiv:2402.08025  [pdf, other]

    cs.CV

    Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing

    Authors: Jacob Tyo, Motolani Olarinre, Youngseog Chung, Zachary C. Lipton

    Abstract: Despite significant progress in optical character recognition (OCR) and computer vision systems, robustly recognizing text and identifying people in images taken in unconstrained in-the-wild environments remain an ongoing challenge. However, such obstacles must be overcome in practical applications of vision systems, such as identifying racers in photos taken during off-road racing events.…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.09256

  23. arXiv:2402.07685  [pdf, other]

    cs.CV cs.LG

    Contrastive Multiple Instance Learning for Weakly Supervised Person ReID

    Authors: Jacob Tyo, Zachary C. Lipton

    Abstract: The acquisition of large-scale, precisely labeled datasets for person re-identification (ReID) poses a significant challenge. Weakly supervised ReID has begun to address this issue, although its performance lags behind fully supervised methods. In response, we introduce Contrastive Multiple Instance Learning (CMIL), a novel framework tailored for more effective weakly supervised ReID. CMIL disting…

    Submitted 12 February, 2024; originally announced February 2024.

  24. arXiv:2402.05133  [pdf, other]

    cs.CL cs.AI cs.LG

    Personalized Language Modeling from Personalized Human Feedback

    Authors: Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, Liu Leqi

    Abstract: Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a commonly used framework for aligning LLMs with human preferences, vanilla RLHF assumes that all human preferences share the same distribution, preventing fine-tuned LLMs from generating personalized content when user preferences are…

    Submitted 8 December, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  25. arXiv:2402.03509  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

    Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton, Byron C Wallace

    Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (pote…

    Submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2401.15897  [pdf, other]

    cs.CY cs.HC cs.LG

    Red-Teaming for Generative AI: Silver Bullet or Security Theater?

    Authors: Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary C. Lipton, Hoda Heidari

    Abstract: In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming's central role in policy discussions and corporate messaging, significant questions remain about what…

    Submitted 27 August, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: AIES 2024

  27. arXiv:2401.08788  [pdf, other]

    cs.LG cs.CY stat.ML

    The Impact of Differential Feature Under-reporting on Algorithmic Fairness

    Authors: Nil-Jana Akpinar, Zachary C. Lipton, Alexandra Chouldechova

    Abstract: Predictive risk models in the public sector are commonly developed using administrative data that is more complete for subpopulations that more greatly rely on public services. In the United States, for instance, information on health care utilization is routinely available to government agencies for individuals supported by Medicaid and Medicare, but not for the privately insured. Critiques of pu…

    Submitted 3 May, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ACM Conference on Fairness, Accountability, and Transparency (FAccT 2024)

  28. arXiv:2401.06121  [pdf, other]

    cs.LG cs.CL

    TOFU: A Task of Fictitious Unlearning for LLMs

    Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

    Abstract: Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they resu…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: https://locuslab.github.io/tofu/

  29. arXiv:2312.09323  [pdf, other]

    cs.AI cs.LG

    Perspectives on the State and Future of Deep Learning - 2023

    Authors: Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson

    Abstract: The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition. In this issue, we probed people's opinions on inter…

    Submitted 18 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

  30. arXiv:2312.03318  [pdf, other]

    cs.LG cs.CV stat.ML

    Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift

    Authors: Saurabh Garg, Amrith Setlur, Zachary Chase Lipton, Sivaraman Balakrishnan, Virginia Smith, Aditi Raghunathan

    Abstract: Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains unexplored. In this paper, we undertake a systematic empirical investi…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  31. arXiv:2312.00234  [pdf, other]

    cs.LG math.NA stat.ML

    Deep Equilibrium Based Neural Operators for Steady-State PDEs

    Authors: Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski

    Abstract: Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  32. arXiv:2311.09401  [pdf, other]

    cs.CV cs.LG

    MoCo-Transfer: Investigating out-of-distribution contrastive learning for limited-data domains

    Authors: Yuwen Chen, Helen Zhou, Zachary C. Lipton

    Abstract: Medical imaging data is often siloed within hospitals, limiting the amount of data available for specialized model development. With limited in-domain data, one might hope to leverage larger datasets from related domains. In this paper, we analyze the benefit of transferring self-supervised contrastive representations from moment contrast (MoCo) pretraining on out-of-distribution data to settings…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 4 pages

  33. arXiv:2311.09256  [pdf, other]

    cs.CV

    Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset

    Authors: Jacob Tyo, Youngseog Chung, Motolani Olarinre, Zachary C. Lipton

    Abstract: This paper introduces the off-road motorcycle Racer number Dataset (RnD), a new challenging dataset for optical character recognition (OCR) research. RnD contains 2,411 images from professional motorsports photographers that depict motorcycle racers in off-road competitions. The images exhibit a wide variety of factors that make OCR difficult, including mud occlusions, motion blur, non-standard fo…

    Submitted 14 November, 2023; originally announced November 2023.

  34. arXiv:2311.08488  [pdf, other]

    cs.CV

    MUDD: A New Re-Identification Dataset with Efficient Annotation for Off-Road Racers in Extreme Conditions

    Authors: Jacob Tyo, Motolani Olarinre, Youngseog Chung, Zachary C. Lipton

    Abstract: Re-identifying individuals in unconstrained environments remains an open challenge in computer vision. We introduce the Muddy Racer re-IDentification Dataset (MUDD), the first large-scale benchmark for matching identities of motorcycle racers during off-road competitions. MUDD exhibits heavy mud occlusion, motion blurring, complex poses, and extreme lighting conditions previously unseen in existin…

    Submitted 14 November, 2023; originally announced November 2023.

  35. arXiv:2308.14272  [pdf, other]

    cs.CL cs.LG

    Goodhart's Law Applies to NLP's Explanation Benchmarks

    Authors: Jennifer Hsia, Danish Pruthi, Aarti Singh, Zachary C. Lipton

    Abstract: Despite the rising popularity of saliency-based explanations, the research community remains at an impasse, facing doubts concerning their purpose, efficacy, and tendency to contradict each other. Seeking to unite the community's efforts around common goals, several recent works have proposed evaluation metrics. In this paper, we critically examine two sets of metrics: the ERASER metrics (comprehe…

    Submitted 27 August, 2023; originally announced August 2023.

  36. arXiv:2307.09542  [pdf, other]

    cs.LG cs.CV

    Can Neural Network Memorization Be Localized?

    Authors: Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang

    Abstract: Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks memorize "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on atypical examples of the training set. In this work, we show that rather than being confined to individual lay…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted at ICML 2023

  37. arXiv:2307.03132  [pdf, other]

    cs.CV cs.CL cs.LG

    T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

    Authors: Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan

    Abstract: Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only im…

    Submitted 18 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted to ICLR 2024. Oral at ICCV Datacomp 2023

  38. arXiv:2305.19570  [pdf, other]

    stat.ML cs.LG

    Online Label Shift: Optimal Dynamic Regret meets Practical Algorithms

    Authors: Dheeraj Baby, Saurabh Garg, Tzu-Ching Yen, Sivaraman Balakrishnan, Zachary Chase Lipton, Yu-Xiang Wang

    Abstract: This paper focuses on supervised and unsupervised online label shift, where the class marginals $Q(y)$ vary but the class-conditionals $Q(x|y)$ remain invariant. In the unsupervised setting, our goal is to adapt a learner, trained on some offline labeled data, to changing label distributions given unlabeled online data. In the supervised setting, we must both learn a classifier and adapt to the…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: First three authors contributed equally

  39. arXiv:2305.17319  [pdf, other]

    cs.CY cs.AI cs.GT

    Moral Machine or Tyranny of the Majority?

    Authors: Michael Feffer, Hoda Heidari, Zachary C. Lipton

    Abstract: With Artificial Intelligence systems increasingly applied in consequential domains, researchers have begun to ask how these systems ought to act in ethically charged situations where even humans lack consensus. In the Moral Machine project, researchers crowdsourced answers to "Trolley Problems" concerning autonomous vehicles. Subsequently, Noothigattu et al. (2018) proposed inferring linear functi…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: To appear in the proceedings of AAAI 2023

  40. arXiv:2305.15444  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptNER: Prompting For Named Entity Recognition

    Authors: Dhananjay Ashok, Zachary C. Lipton

    Abstract: In a surprising turn, Large Language Models (LLMs) together with a growing arsenal of prompt-based heuristics now offer powerful off-the-shelf approaches providing few-shot solutions to myriad classic NLP problems. However, despite promising early results, these LLM-based few-shot methods remain far from the state of the art in Named Entity Recognition (NER), where prevailing methods include learn…

    Submitted 20 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  41. arXiv:2305.14296  [pdf, other]

    cs.CL cs.LG

    USB: A Unified Summarization Benchmark Across Tasks and Domains

    Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace, Jeffrey P. Bigham, Zachary C. Lipton

    Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks: (i) extractive summarization; (ii) abstractive summarization…

    Submitted 4 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP Findings 2023 Camera Ready

  42. arXiv:2305.13426  [pdf, other]

    cs.LG cs.AI

    Evaluating Model Performance in Medical Datasets Over Time

    Authors: Helen Zhou, Yuwen Chen, Zachary C. Lipton

    Abstract: Machine learning (ML) models deployed in healthcare systems must face data drawn from continually evolving environments. However, researchers proposing such models typically evaluate them in a time-agnostic manner, splitting datasets according to patients sampled randomly throughout the entire study time period. This work proposes the Evaluation on Medical Datasets Over Time (EMDOT) framework, whi…

    Submitted 16 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: To appear at Conference on Health, Inference, and Learning (CHIL) 2023. arXiv admin note: substantial text overlap with arXiv:2211.07165

  43. arXiv:2305.06884  [pdf, ps, other]

    stat.ME cs.AI cs.LG math.ST stat.AP stat.ML

    Risk-limiting Financial Audits via Weighted Sampling without Replacement

    Authors: Shubhanshu Shekhar, Ziyu Xu, Zachary C. Lipton, Pierre J. Liang, Aaditya Ramdas

    Abstract: We introduce the notion of a risk-limiting financial auditing (RLFA): given $N$ transactions, the goal is to estimate the total misstated monetary fraction ($m^*$) to a given accuracy $ε$, with confidence $1-δ$. We do this by constructing new confidence sequences (CSs) for the weighted average of $N$ unknown values, based on samples drawn without replacement according to a (randomized) weighted sa…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: 23 pages, 8 figures, to appear in the Proceedings of Uncertainty in Artificial Intelligence (UAI) 2023

  44. arXiv:2304.09088  [pdf, other]

    cs.IR cs.HC cs.LG

    A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits

    Authors: Liu Leqi, Giulio Zhou, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery

    Abstract: Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. Algorithms powering such systems tend to consist of supervised learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the pro…

    Submitted 16 April, 2023; originally announced April 2023.

    Comments: Accepted to CHI. 16 pages, 6 figures

  45. arXiv:2303.07320  [pdf, other]

    cs.CL cs.LG

    Model-tuning Via Prompts Makes NLP Models Adversarially Robust

    Authors: Mrigank Raman, Pratyush Maini, J. Zico Kolter, Zachary C. Lipton, Danish Pruthi

    Abstract: In recent years, NLP practitioners have converged on the following practice: (i) import an off-the-shelf pretrained (masked) language model; (ii) append a multilayer perceptron atop the CLS token's hidden representation (with randomly initialized weights); and (iii) fine-tune the entire model on a downstream task (MLP-FT). This procedure has produced massive gains on standard NLP benchmarks, but t…

    Submitted 5 December, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to the EMNLP 2023 Conference

  46. arXiv:2303.05500  [pdf, ps, other]

    cs.CY cs.AI cs.HC

    Users are the North Star for AI Transparency

    Authors: Alex Mei, Michael Saxon, Shiyu Chang, Zachary C. Lipton, William Yang Wang

    Abstract: Despite widespread calls for transparent artificial intelligence systems, the term is too overburdened with disparate meanings to express precise policy aims or to orient concrete lines of research. Consequently, stakeholders often talk past each other, with policymakers expressing vague demands and practitioners devising solutions that may not address the underlying concerns. Part of why this hap…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: 9 pages, 3 tables

  47. arXiv:2302.08070  [pdf, other]

    cs.LG stat.ME

    Local Causal Discovery for Estimating Causal Effects

    Authors: Shantanu Gupta, David Childers, Zachary C. Lipton

    Abstract: Even when the causal graph underlying our data is unknown, we can use observational data to narrow down the possible values that an average treatment effect (ATE) can take by (1) identifying the graph up to a Markov equivalence class; and (2) estimating that ATE for each graph in the class. While the PC algorithm can identify this class under strong faithfulness assumptions, it can be computationa…

    Submitted 10 April, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted at CLeaR 2023

  48. arXiv:2302.06804  [pdf, other]

    cs.LG stat.ME

    Discovering Optimal Scoring Mechanisms in Causal Strategic Prediction

    Authors: Tom Yan, Shantanu Gupta, Zachary Lipton

    Abstract: Faced with data-driven policies, individuals will manipulate their features to obtain favorable decisions. While earlier works cast these manipulations as undesirable gaming, recent works have adopted a more nuanced causal framing in which manipulations can improve outcomes of interest, and setting coherent mechanisms requires accounting for both predictive accuracy and improvement of the outcome.…

    Submitted 20 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  49. arXiv:2302.03020  [pdf, other]

    cs.LG cs.CV stat.ML

    RLSbench: Domain Adaptation Under Relaxed Label Shift

    Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

    Abstract: Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class conditional distributions is precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportion shifts. While several papers modify these heuristics in attempts to handle label proportion shifts, inconsistencies i…

    Submitted 5 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. Paper website: https://sites.google.com/view/rlsbench/

  50. arXiv:2302.02551  [pdf, other]

    cs.CV cs.LG

    CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets

    Authors: Zachary Novack, Julian McAuley, Zachary C. Lipton, Saurabh Garg

    Abstract: Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot classification through their ability to generate embeddings for each class based on their (natural language) names. Prior work has focused on improving the accuracy of these models through prompt engineering or by incorporating a small amount of labeled downstream data (via finetuning). However, there has been little focus…

    Submitted 31 May, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023