Showing 1–7 of 7 results for author: Ritchie, S

  1. arXiv:2503.04761  [pdf, other]

    cs.CY cs.AI cs.CL cs.HC cs.LG

    Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations

    Authors: Kunal Handa, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, Sarah Heck, Jared Mueller, Jerry Hong, Stuart Ritchie, Tim Belonax, Kevin K. Troy, Dario Amodei, Jared Kaplan, Jack Clark, Deep Ganguli

    Abstract: Despite widespread speculation about artificial intelligence's impact on the future of work, we lack systematic empirical evidence about how these systems are actually being used for different tasks. Here, we present a novel framework for measuring AI usage patterns across the economy. We leverage a recent privacy-preserving system to analyze over four million Claude.ai conversations through the l…

    Submitted 10 February, 2025; originally announced March 2025.

  2. arXiv:2502.09715  [pdf, other]

    cs.LG cs.AI cs.CL

    Evaluating GPT's Capability in Identifying Stages of Cognitive Impairment from Electronic Health Data

    Authors: Yu Leng, Yingnan He, Colin Magdamo, Ana-Maria Vranceanu, Christine S. Ritchie, Shibani S. Mukerji, Lidia M. V. R. Moura, John R. Dickson, Deborah Blacker, Sudeshna Das

    Abstract: Identifying cognitive impairment within electronic health records (EHRs) is crucial not only for timely diagnoses but also for facilitating research. Information about cognitive impairment often exists within unstructured clinician notes in EHRs, but manual chart reviews are both time-consuming and error-prone. To address this issue, our study evaluates an automated approach using zero-shot GPT-4o…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 7 pages

  3. arXiv:2412.13678  [pdf, other]

    cs.CY cs.AI cs.CL cs.CR cs.LG

    Clio: Privacy-Preserving Insights into Real-World AI Use

    Authors: Alex Tamkin, Miles McCain, Kunal Handa, Esin Durmus, Liane Lovitt, Ankur Rathi, Saffron Huang, Alfred Mountfield, Jerry Hong, Stuart Ritchie, Michael Stern, Brian Clarke, Landon Goldberg, Theodore R. Sumers, Jared Mueller, William McEachen, Wes Mitchell, Shan Carter, Jack Clark, Jared Kaplan, Deep Ganguli

    Abstract: How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users' data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregate…

    Submitted 18 December, 2024; originally announced December 2024.

  4. arXiv:2309.10567  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Multimodal Modeling For Spoken Language Identification

    Authors: Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

    Abstract: Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however in the case of video data there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI,…

    Submitted 19 September, 2023; originally announced September 2023.

  5. arXiv:2208.03067  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

    Authors: Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

    Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data…

    Submitted 4 October, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  6. arXiv:2103.15845  [pdf, other]

    cs.CL

    Text Normalization for Low-Resource Languages of Africa

    Authors: Andrew Zupon, Evan Crew, Sandy Ritchie

    Abstract: Training data for machine learning models can come from many different sources, which can be of dubious quality. For resource-rich languages like English, there is a lot of data available, so we can afford to throw out the dubious data. For low-resource languages where there is much less data available, we can't necessarily afford to throw out the dubious data, in case we end up with a training se…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: To be presented at AfricaNLP 2021

  7. arXiv:2101.11575  [pdf, other]

    cs.CL

    Mining Large-Scale Low-Resource Pronunciation Data From Wikipedia

    Authors: Tania Chakraborty, Manasa Prasad, Theresa Breiner, Sandy Ritchie, Daan van Esch

    Abstract: Pronunciation modeling is a key task for building speech technology in new languages, and while solid grapheme-to-phoneme (G2P) mapping systems exist, language coverage can stand to be improved. The information needed to build G2P models for many more languages can easily be found on Wikipedia, but unfortunately, it is stored in disparate formats. We report on a system we built to mine a pronuncia…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: 7 pages, 9 figures