Showing 1–8 of 8 results for author: Baby, A

Searching in archive cs.
  1. arXiv:2507.05201  [pdf, ps, other]

    cs.AI cs.CL cs.CV

    MedGemma Technical Report

    Authors: Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, et al. (54 additional authors not shown)

    Abstract: Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce Me…

    Submitted 8 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2502.14827  [pdf]

    cs.CV cs.AI cs.ET cs.LG

    Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison

    Authors: Aiswarya Baby, Tintu Thankom Koshy

    Abstract: Visual Question Answering (VQA) has emerged as a pivotal task in the intersection of computer vision and natural language processing, requiring models to understand and reason about visual content in response to natural language questions. Analyzing VQA datasets is essential for developing robust models that can handle the complexities of multimodal reasoning. Several approaches have been develope…

    Submitted 4 March, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 8 pages, No figures

  3. arXiv:2411.04135  [pdf]

    cs.NI

    A New Variant of Benes Network: Its Topological Characterisation and Comparative Analysis

    Authors: Parvez Ali, Annmaria Baby, D. Antony Xavier, Eddith Sarah Varghese, Theertha Nair A., Haidar Ali

    Abstract: The modern era always looks into advancements in technology. Design and topology of interconnection networks play a mutual role in development of technology. Analysing the topological properties and characteristics of an interconnection network is not an easy task. Graph theory helps in solving this task analytically and efficiently through the use of numerical parameters known as distance based t…

    Submitted 25 October, 2024; originally announced November 2024.

  4. arXiv:2410.09122  [pdf, other]

    cs.DM

    Study on (r,s)- Generalised Transformation Graphs, A Novel Perspective Based on Transformation Graphs

    Authors: Parvez Ali, Annmaria Baby, D. Antony Xavier, Theertha Nair A, Haidar Ali, Syed Ajaz K. Kirmani

    Abstract: For a graph $\mathbb{Q}=(\mathbb{V},\mathbb{E})$, the transformation graphs are defined as graphs with vertex set being $\mathbb{V(Q)} \cup \mathbb{E(Q)}$ and edge set is described following certain conditions. In comparison to the structure descriptor of the original graph $\mathbb{Q}$, the topological descriptor of its transformation graphs displays distinct characteristics related to structure.…

    Submitted 11 October, 2024; originally announced October 2024.

  5. arXiv:2206.04305  [pdf, other]

    eess.AS cs.CL cs.SD

    Context-based out-of-vocabulary word recovery for ASR systems in Indian languages

    Authors: Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga

    Abstract: Detecting and recovering out-of-vocabulary (OOV) words is always challenging for Automatic Speech Recognition (ASR) systems. Many existing methods focus on modeling OOV words by modifying acoustic and language models and integrating context words cleverly into models. To train such complex models, we need a large amount of data with context words, additional training time, and increased model size…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 12 pages

  6. arXiv:2106.10870  [pdf, other]

    eess.AS cs.CL cs.SD

    Non-native English lexicon creation for bilingual speech synthesis

    Authors: Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam, Nagaraj Adiga, Sharath Adavanne

    Abstract: Bilingual English speakers speak English as one of their languages. Their English is of a non-native kind, and their conversations are of a code-mixed fashion. The intelligibility of a bilingual text-to-speech (TTS) system for such non-native English speakers depends on a lexicon that captures the phoneme sequence used by non-native speakers. However, due to the lack of non-native English lexicon,…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted for Presentation at Speech Synthesis Workshop (SSW), 2021 (August 2021)

  7. arXiv:2006.01463  [pdf, other]

    cs.SD eess.AS

    An ASR Guided Speech Intelligibility Measure for TTS Model Selection

    Authors: Arun Baby, Saranya Vinnaitherthan, Nagaraj Adiga, Pranav Jawale, Sumukh Badam, Sharath Adavanne, Srikanth Konjeti

    Abstract: The perceptual quality of neural text-to-speech (TTS) is highly dependent on the choice of the model during training. Selecting the model using a training-objective metric such as the least mean squared error does not always correlate with human perception. In this paper, we propose an objective metric based on the phone error rate (PER) to select the TTS model with the best speech intelligibility…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Submitted to INTERSPEECH 2020

  8. Dynamic Vision Sensors for Human Activity Recognition

    Authors: Stefanie Anna Baby, Bimal Vinod, Chaitanya Chinni, Kaushik Mitra

    Abstract: Unlike conventional cameras which capture video at a fixed frame rate, Dynamic Vision Sensors (DVS) record only changes in pixel intensity values. The output of DVS is simply a stream of discrete ON/OFF events based on the polarity of change in its pixel values. DVS has many attractive features such as low power consumption, high temporal resolution, high dynamic range and fewer storage requiremen…

    Submitted 13 March, 2018; originally announced March 2018.

    Comments: 6 pages, 9 figures, accepted at the 4th Asian Conference on Pattern Recognition (ACPR) 2017