Showing 1–13 of 13 results for author: Sheshadri, A

  1. arXiv:2412.09565  [pdf, other]

    cs.LG

    Obfuscated Activations Bypass LLM Latent-Space Defenses

    Authors: Luke Bailey, Alex Serrano, Abhay Sheshadri, Mikhail Seleznyov, Jordan Taylor, Erik Jenner, Jacob Hilton, Stephen Casper, Carlos Guestrin, Scott Emmons

    Abstract: Recent latent-space monitoring techniques have shown promise as defenses against LLM attacks. These defenses act as scanners that seek to detect harmful activations before they lead to undesirable actions. This prompts the question: Can models execute harmful behavior via inconspicuous latent states? Here, we study such obfuscated activations. We show that state-of-the-art latent-space defenses --…

    Submitted 8 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Project page: https://obfuscated-activations.github.io/ Code: https://github.com/LukeBailey181/obfuscated-activations

  2. arXiv:2412.02780  [pdf, other]

    cs.LG cs.AI

    WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks

    Authors: Rajat Shinde, Christopher E. Phillips, Kumar Ankur, Aman Gupta, Simon Pfreundschuh, Sujit Roy, Sheyenne Kirkland, Vishal Gaur, Amy Lin, Aditi Sheshadri, Udaysankar Nair, Manil Maskey, Rahul Ramachandran

    Abstract: High-quality machine learning (ML)-ready datasets play a foundational role in developing new artificial intelligence (AI) models or fine-tuning existing models for scientific applications such as weather and climate analysis. Unfortunately, despite the growing development of new deep learning models for weather and climate, there is a scarcity of curated, pre-processed machine learning (ML)-ready…

    Submitted 3 December, 2024; originally announced December 2024.

  3. arXiv:2410.12949  [pdf, other]

    cs.LG cs.CL

    Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

    Authors: Phillip Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite

    Abstract: Methods for knowledge editing and unlearning in large language models seek to edit or remove undesirable knowledge or capabilities without compromising general language modeling performance. This work investigates how mechanistic interpretability -- which, in part, aims to identify model components (circuits) associated to specific interpretable mechanisms that make up a model capability -- can im…

    Submitted 4 December, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: 31 pages, 45 figures, 7 tables

  4. arXiv:2409.13598  [pdf, other]

    cs.LG physics.ao-ph

    Prithvi WxC: Foundation Model for Weather and Climate

    Authors: Johannes Schmude, Sujit Roy, Will Trojak, Johannes Jakubik, Daniel Salles Civitarese, Shraddha Singh, Julian Kuehnert, Kumar Ankur, Aman Gupta, Christopher E Phillips, Romeo Kienzler, Daniela Szwarcman, Vishal Gaur, Rajat Shinde, Rohit Lal, Arlindo Da Silva, Jorge Luis Guevara Diaz, Anne Jones, Simon Pfreundschuh, Amy Lin, Aditi Sheshadri, Udaysankar Nair, Valentine Anantharaj, Hendrik Hamann, Campbell Watson, et al. (4 additional authors not shown)

    Abstract: Triggered by the realization that AI emulators can rival the performance of traditional numerical weather prediction models running on HPC systems, there is now an increasing number of large AI models that address use cases such as forecasting, downscaling, or nowcasting. While the parallel developments in the AI literature focus on foundation models -- models that can be effectively tuned to addr…

    Submitted 20 September, 2024; originally announced September 2024.

  5. arXiv:2407.15549  [pdf, other]

    cs.LG cs.AI cs.CL

    Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

    Authors: Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper

    Abstract: Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful text from models that were fine-tuned to be harmless. Recent work on red-teaming, model editing, and interpretability suggests that this challenge stems from ho…

    Submitted 21 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  6. arXiv:2406.14775  [pdf, other]

    physics.ao-ph cs.LG physics.flu-dyn physics.geo-ph

    Machine Learning Global Simulation of Nonlocal Gravity Wave Propagation

    Authors: Aman Gupta, Aditi Sheshadri, Sujit Roy, Vishal Gaur, Manil Maskey, Rahul Ramachandran

    Abstract: Global climate models typically operate at a grid resolution of hundreds of kilometers and fail to resolve atmospheric mesoscale processes, e.g., clouds, precipitation, and gravity waves (GWs). Model representation of these processes and their sources is essential to the global circulation and planetary energy budget, but subgrid scale contributions from these processes are often only approximatel…

    Submitted 13 November, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning 2024

  7. arXiv:2402.11917  [pdf, other]

    cs.LG

    A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

    Authors: Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt

    Abstract: Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for behavioral studies. However, these studies do not provide insights into the internal mechanisms driving the observed capabilities. To improve our understanding of th…

    Submitted 29 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  8. arXiv:2310.18417  [pdf, other]

    cs.CL

    Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning

    Authors: Aditi Chaudhary, Arun Sampath, Ashwin Sheshadri, Antonios Anastasopoulos, Graham Neubig

    Abstract: One of the challenges in language teaching is how best to organize rules regarding syntax, semantics, or phonology in a meaningful manner. This not only requires content creators to have pedagogical skills, but also have that language's deep understanding. While comprehensive materials to develop such curricula are available in English and some broadly spoken languages, for many other languages, t…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP Findings 2023. arXiv admin note: substantial text overlap with arXiv:2206.05154

  9. arXiv:2305.14956  [pdf, other]

    cs.CL

    Editing Common Sense in Transformers

    Authors: Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon

    Abstract: Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training (Meng et al., 2023). However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer. Commonsense knowledge with multiple correct answers, e.g., an apple can be green or red but not transparent, has not be…

    Submitted 26 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 Main Conference. Anshita, Debanjan, Akshay are co-first authors. Code and datasets for all experiments are available at https://github.com/anshitag/memit_csk

  10. arXiv:2206.05154  [pdf, other]

    cs.CL

    Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning

    Authors: Aditi Chaudhary, Arun Sampath, Ashwin Sheshadri, Antonios Anastasopoulos, Graham Neubig

    Abstract: One of the challenges of language teaching is how to organize the rules regarding syntax, semantics, or phonology of the language in a meaningful manner. This not only requires pedagogical skills, but also requires a deep understanding of that language. While comprehensive materials to develop such curricula are available in English and some broadly spoken languages, for many other languages, teac…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: 18 pages

  11. arXiv:2101.05478  [pdf, other]

    cs.CL cs.SD eess.AS

    WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm

    Authors: Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda

    Abstract: Automatic Speech Recognition (ASR) systems are evaluated using Word Error Rate (WER), which is calculated by comparing the number of errors between the ground truth and the transcription of the ASR system. This calculation, however, requires manual transcription of the speech signal to obtain the ground truth. Since transcribing audio signals is a costly process, Automatic WER Evaluation (e-WER) m…

    Submitted 13 February, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

    Comments: Accepted Long Paper at EACL 2021

  12. arXiv:1904.07331  [pdf, other]

    cs.CY

    Predicting Student Performance Based on Online Study Habits: A Study of Blended Courses

    Authors: Adithya Sheshadri, Niki Gitinabard, Collin F. Lynch, Tiffany Barnes, Sarah Heckman

    Abstract: Online tools provide unique access to research students' study habits and problem-solving behavior. In MOOCs, this online data can be used to inform instructors and to provide automatic guidance to students. However, these techniques may not apply in blended courses with face to face and online components. We report on a study of integrated user-system interaction logs from 3 computer science cour…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Published in the International Conference on Educational Data Mining (EDM 2018)

  13. arXiv:1806.00755  [pdf, other]

    cs.IR

    Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately and Affordably

    Authors: Mucahid Kutlu, Tyler McDonnell, Aashish Sheshadri, Tamer Elsayed, Matthew Lease

    Abstract: Crowdsourcing offers an affordable and scalable means to collect relevance judgments for IR test collections. However, crowd assessors may show higher variance in judgment quality than trusted assessors. In this paper, we investigate how to effectively utilize both groups of assessors in partnership. We specifically investigate how agreement in judging is correlated with three factors: relevance c…

    Submitted 9 June, 2018; v1 submitted 3 June, 2018; originally announced June 2018.