Showing 1–2 of 2 results for author: Bhate, S

Searching in archive cs.
  1. arXiv:2504.10335

    cs.CL

    MorphTok: Morphologically Grounded Tokenization for Indian Languages

    Authors: Maharaj Brahma, N J Karthika, Atul Singh, Devaraj Adiga, Smruti Bhate, Ganesh Ramakrishnan, Rohit Saluja, Maunendra Sankar Desarkar

    Abstract: Tokenization is a crucial step in NLP, especially with the rise of large language models (LLMs), impacting downstream performance, computational cost, and efficiency. Existing LLMs rely on the classical Byte-pair Encoding (BPE) algorithm for subword tokenization that greedily merges frequent character bigrams. This often leads to segmentation that does not align with linguistically meaningful unit…

    Submitted 14 April, 2025; originally announced April 2025.

  2. arXiv:2211.16467

    stat.ML cs.LG

    Linear Causal Disentanglement via Interventions

    Authors: Chandler Squires, Anna Seigal, Salil Bhate, Caroline Uhler

    Abstract: Causal disentanglement seeks a representation of data involving latent variables that relate to one another via a causal model. A representation is identifiable if both the latent model and the transformation from latent to observed variables are unique. In this paper, we study observed variables that are a linear transformation of a linear latent causal model. Data from interventions are necessar…

    Submitted 11 June, 2023; v1 submitted 29 November, 2022; originally announced November 2022.