Skip to main content

Showing 1–9 of 9 results for author: Jha, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2505.14751  [pdf, ps, other

    cs.LG cs.AI cs.ET

    Self Distillation via Iterative Constructive Perturbations

    Authors: Maheak Dave, Aniket Kumar Singh, Aryan Pareek, Harshita Jha, Debasis Chaudhuri, Manish Pratap Singh

    Abstract: Deep Neural Networks have achieved remarkable achievements across various domains, however balancing performance and generalization still remains a challenge while training these networks. In this paper, we propose a novel framework that uses a cyclic optimization strategy to concurrently optimize the model and its input data for better training, rethinking the traditional training paradigm. Centr… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2412.04403  [pdf, other

    cs.CL cs.AI

    Establishing Task Scaling Laws via Compute-Efficient Model Ladders

    Authors: Akshita Bhagia, Jiacheng Liu, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi

    Abstract: We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. Therefore, we leverage a two-step prediction approach: first use model and data size to predict a task-specific loss, and then use this task loss to predict… ▽ More

    Submitted 5 December, 2024; originally announced December 2024.

  3. Scalable Data Ablation Approximations for Language Models through Modular Training and Merging

    Authors: Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi

    Abstract: Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data mixtures is typically prohibitively expensive since the full effect is seen only after training the models; this can lead practitioners to settle for sub-optimal data mixtures. We propose an efficient metho… ▽ More

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024. 17 pages

  4. arXiv:2402.00838  [pdf, other

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam , et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models… ▽ More

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2402.00159  [pdf, other

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen , et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat… ▽ More

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  6. arXiv:2312.10523  [pdf, other

    cs.CL cs.AI cs.LG

    Paloma: A Benchmark for Evaluating Language Model Fit

    Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

    Abstract: Evaluations of language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains--varying distributions of language. We introduce Perplexity Analysis for Language Model Assessment (Paloma), a benchmark to measure LM fit to 546 English and code domains, instead of assuming perplexity on one distribution extrapolate… ▽ More

    Submitted 7 December, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: Conference: NeurIPS 2024, Project Page: https://paloma.allen.ai/

  7. arXiv:2305.14864  [pdf, other

    cs.CL

    Just CHOP: Embarrassingly Simple LLM Compression

    Authors: Ananya Harsh Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy

    Abstract: Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities but at a high computational footprint. A growing assortment of methods for compression promises to reduce the computational burden of LLMs in deployment, but so far, only quantization approaches have been demonstrated to be effective for LLM compression while maintaining zero-shot performance. A critical ste… ▽ More

    Submitted 9 July, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 13 pages, 6 figures, 6 tables

  8. arXiv:2107.12329  [pdf, other

    cs.LG

    AASAE: Augmentation-Augmented Stochastic Autoencoders

    Authors: William Falcon, Ananya Harsh Jha, Teddy Koker, Kyunghyun Cho

    Abstract: Recent methods for self-supervised learning can be grouped into two paradigms: contrastive and non-contrastive approaches. Their success can largely be attributed to data augmentation pipelines which generate multiple views of a single input that preserve the underlying semantics. In this work, we introduce augmentation-augmented stochastic autoencoders (AASAE), yet another alternative to self-sup… ▽ More

    Submitted 6 February, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: 17 pages, 5 figures, 3 table

  9. arXiv:1804.10469  [pdf, other

    cs.CV

    Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders

    Authors: Ananya Harsh Jha, Saket Anand, Maneesh Singh, V. S. R. Veeravasarapu

    Abstract: Generative models that learn disentangled representations for different factors of variation in an image can be very useful for targeted data augmentation. By sampling from the disentangled latent subspace of interest, we can efficiently generate new data necessary for a particular task. Learning disentangled representations is a challenging problem, especially when certain factors of variation ar… ▽ More

    Submitted 27 April, 2018; originally announced April 2018.