
Showing 1–3 of 3 results for author: Lad, V

Searching in archive cs.
  1. arXiv:2406.19384  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    The Remarkable Robustness of LLMs: Stages of Inference?

    Authors: Vedang Lad, Jin Hwa Lee, Wes Gurnee, Max Tegmark

    Abstract: We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while th…

    Submitted 16 June, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: For GitHub code, see https://github.com/vdlad/Remarkable-Robustness-of-LLMs. Send all correspondence to the first author.

  2. arXiv:2402.05110  [pdf, other]

    cs.LG

    Opening the AI black box: program synthesis via mechanistic interpretability

    Authors: Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

    Abstract: We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by G…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 24 pages

  3. arXiv:2307.05080  [pdf, other]

    cs.LG cs.CV

    Estimating label quality and errors in semantic segmentation data via any model

    Authors: Vedang Lad, Jonas Mueller

    Abstract: The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in or…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: ICML Workshop on Data-centric Machine Learning Research 2023
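Result 1 names two inference-time interventions: deleting a single layer, and swapping two adjacent layers. The following is a rough illustration of what those operations mean, not the authors' code (their repository is linked in that entry's Comments). It assumes a model can be treated as an ordered list of residual layers applied in sequence; the toy `make_layer` layers and the helper names `run`, `delete_layer` and `swap_adjacent` are hypothetical and made up for this example.

```python
from typing import Callable, List

# Hypothetical representation: a "layer" maps a vector to a vector.
Layer = Callable[[List[float]], List[float]]

def make_layer(scale: float, shift: float) -> Layer:
    """Hypothetical toy residual layer: returns x + (scale * x + shift), elementwise."""
    def layer(x: List[float]) -> List[float]:
        return [xi + (scale * xi + shift) for xi in x]
    return layer

def run(layers: List[Layer], x: List[float]) -> List[float]:
    """Apply the layers in order, like a forward pass through a stack of blocks."""
    for layer in layers:
        x = layer(x)
    return x

def delete_layer(layers: List[Layer], i: int) -> List[Layer]:
    """Return a new stack with layer i removed; the original list is left untouched."""
    if not 0 <= i < len(layers):
        raise IndexError("layer index out of range")
    return layers[:i] + layers[i + 1:]

def swap_adjacent(layers: List[Layer], i: int) -> List[Layer]:
    """Return a new stack with layers i and i+1 exchanged."""
    if not 0 <= i < len(layers) - 1:
        raise IndexError("i must index a layer that has a successor")
    swapped = list(layers)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped

# Usage: compare the intact stack against each intervention on the same input.
layers = [make_layer(0.1, 0.0), make_layer(0.0, 1.0), make_layer(-0.5, 0.0)]
x = [1.0, 2.0]
baseline = run(layers, x)
ablated = run(delete_layer(layers, 1), x)
swapped = run(swap_adjacent(layers, 0), x)
```

Both helpers return new lists instead of mutating the stack, so every intervention is measured against the same unmodified baseline. In a real network, the comparison would be between the model's top-1 predictions with and without the intervention, which is the quantity the abstract reports.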
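Result 3 scores the label quality of each segmentation image so that the lowest-scoring images can be reviewed first. Because the abstract is truncated, it does not say which scores the paper uses. The sketch below assumes one simple, widely used choice: per-pixel self-confidence (the model's predicted probability for the annotated class), averaged over an image's pixels. The function names, the flattened two-pixel toy images, and the mean aggregation are illustrative assumptions, not the paper's method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: an image is flattened to a list of pixels; each pixel has
# a list of predicted class probabilities and one annotated class index.
PixelProbs = Sequence[Sequence[float]]
PixelLabels = Sequence[int]

def image_label_quality(probs: PixelProbs, labels: PixelLabels) -> float:
    """Mean self-confidence: average predicted probability of each pixel's given label."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("need exactly one label per pixel and at least one pixel")
    return sum(p[y] for p, y in zip(probs, labels)) / len(labels)

def review_order(dataset: Dict[str, Tuple[PixelProbs, PixelLabels]]) -> List[str]:
    """Image ids sorted from lowest (most suspicious) to highest label-quality score."""
    scores = {img_id: image_label_quality(p, y) for img_id, (p, y) in dataset.items()}
    return sorted(scores, key=scores.get)

# Usage with two toy two-pixel images.
dataset = {
    "img_a": ([[0.9, 0.1], [0.8, 0.2]], [0, 0]),  # labels agree with the model
    "img_b": ([[0.9, 0.1], [0.7, 0.3]], [0, 1]),  # second pixel's label looks doubtful
}
print(review_order(dataset))  # ['img_b', 'img_a']
```

Averaging dilutes a small mislabeled region in a large image. An aggregation that emphasizes the worst pixels would be an equally plausible alternative, and the abstract does not say which choice the paper makes.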