Showing 1–5 of 5 results for author: Thawani, A

  1. HIV Client Perspectives on Digital Health in Malawi

    Authors: Lisa Orii, Caryl Feldacker, Jacqueline Madalitso Huwa, Agness Thawani, Evelyn Viola, Christine Kiruthu-Kamamia, Odala Sande, Hannock Tweya, Richard Anderson

    Abstract: eHealth has strong potential to advance HIV care in low- and middle-income countries. Given the sensitivity of HIV-related information and the risks associated with unintended HIV status disclosure, clients' privacy perceptions towards eHealth applications should be examined to develop client-centered technologies. Through focus group discussions with antiretroviral therapy (ART) clients from Ligh…

    Submitted 5 April, 2024; originally announced April 2024.

  2. arXiv:2310.11628

    cs.CL cs.AI

    Learn Your Tokens: Word-Pooled Tokenization for Language Modeling

    Authors: Avijit Thawani, Saurabh Ghanekar, Xiaoyuan Zhu, Jay Pujara

    Abstract: Language models typically tokenize text into subwords, using a deterministic, hand-engineered heuristic of combining characters into longer surface-level strings such as 'ing' or whole words. Recent literature has repeatedly shown the limitations of such a tokenization strategy, particularly for documents not written in English and for representing numbers. On the other extreme, byte/character-lev…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings

  3. arXiv:2310.06204

    cs.CL cs.AI

    Estimating Numbers without Regression

    Authors: Avijit Thawani, Jay Pujara, Ashwin Kalyan

    Abstract: Despite recent successes in language models, their ability to represent numbers is insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them on a number line; whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbe…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Workshop on Insights from Negative Results in NLP at EACL 2023

  4. arXiv:2103.13136

    cs.CL cs.AI cs.LG

    Representing Numbers in NLP: a Survey and a Vision

    Authors: Avijit Thawani, Jay Pujara, Pedro A. Szekely, Filip Ilievski

    Abstract: NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (ex…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted at NAACL 2021

    ACM Class: I.2.7

  5. arXiv:2008.03226

    physics.chem-ph cs.LG stat.ML

    Data-Driven Discovery of Molecular Photoswitches with Multioutput Gaussian Processes

    Authors: Ryan-Rhys Griffiths, Jake L. Greenfield, Aditya R. Thawani, Arian R. Jamasb, Henry B. Moss, Anthony Bourached, Penelope Jones, William McCorkindale, Alexander A. Aldrick, Matthew J. Fuchter, Alpha A. Lee

    Abstract: Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states whilst overall red-shifting the absorption bands serves to limit material damage due to UV-exposure and increases penetration depth in photopharmacologi…

    Submitted 7 August, 2022; v1 submitted 28 June, 2020; originally announced August 2020.

    Comments: Authors still in discussion about authorship ordering