-
ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning
Authors:
Sahil Sethi,
David Chen,
Thomas Statchen,
Michael C. Burkhart,
Nipun Bhandari,
Bashar Ramadan,
Brett Beaulieu-Jones
Abstract:
Deep learning-based electrocardiogram (ECG) classification has shown impressive performance but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model's true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations. We introduce ProtoECGNet, a prototype-based deep learning model for interpretable, multi-label ECG classification. ProtoECGNet employs a structured, multi-branch architecture that reflects clinical interpretation workflows: it integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities. Each branch is trained with a prototype loss designed for multi-label learning, combining clustering, separation, diversity, and a novel contrastive loss that encourages appropriate separation between prototypes of unrelated classes while allowing clustering for frequently co-occurring diagnoses. We evaluate ProtoECGNet on all 71 diagnostic labels from the PTB-XL dataset, demonstrating competitive performance relative to state-of-the-art black-box models while providing structured, case-based explanations. To assess prototype quality, we conduct a structured clinician review of the final model's projected prototypes, finding that they are rated as representative and clear. ProtoECGNet shows that prototype learning can be effectively scaled to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.
Submitted 17 May, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
Out of Style: RAG's Fragility to Linguistic Variation
Authors:
Tianyu Cao,
Neel Bhandari,
Akhila Yerukola,
Akari Asai,
Maarten Sap
Abstract:
Despite the impressive performance of Retrieval-augmented Generation (RAG) systems across various NLP benchmarks, their robustness in handling real-world user-LLM interaction queries remains largely underexplored. This presents a critical gap for practical deployment, where user queries exhibit greater linguistic variations and can trigger cascading errors across interdependent RAG components. In this work, we systematically analyze how varying four linguistic dimensions (formality, readability, politeness, and grammatical correctness) impacts RAG performance. We evaluate two retrieval models and nine LLMs, ranging from 3 to 72 billion parameters, across four information-seeking Question Answering (QA) datasets. Our results reveal that linguistic reformulations significantly impact both retrieval and generation stages, leading to a relative performance drop of up to 40.41% in Recall@5 scores for less formal queries and 38.86% in answer match scores for queries containing grammatical errors. Notably, RAG systems exhibit greater sensitivity to such variations compared to LLM-only generations, highlighting their vulnerability to error propagation due to linguistic shifts. These findings underscore the need for improved robustness techniques to enhance reliability in diverse user interactions.
Submitted 10 April, 2025;
originally announced April 2025.
-
Style-agnostic evaluation of ASR using multiple reference transcripts
Authors:
Quinten McNamara,
Miguel Ángel del Río Fernández,
Nishchal Bhandari,
Martin Ratajczak,
Danny Chen,
Corey Miller,
Migüel Jetté
Abstract:
Word error rate (WER) as a metric has a variety of limitations that have plagued the field of speech recognition. Evaluation datasets suffer from varying style, formality, and inherent ambiguity of the transcription task. In this work, we attempt to mitigate some of these differences by performing style-agnostic evaluation of ASR systems using multiple references transcribed under opposing style parameters. As a result, we find that existing WER reports are likely significantly over-estimating the number of contentful errors made by state-of-the-art ASR systems. In addition, we have found our multireference method to be a useful mechanism for comparing the quality of ASR models that differ in the stylistic makeup of their training data and target task.
Submitted 10 December, 2024;
originally announced December 2024.
-
Reverb: Open-Source ASR and Diarization from Rev
Authors:
Nishchal Bhandari,
Danny Chen,
Miguel Ángel del Río Fernández,
Natalie Delworth,
Jennifer Drexler Fox,
Migüel Jetté,
Quinten McNamara,
Corey Miller,
Ondřej Novotný,
Ján Profant,
Nan Qin,
Martin Ratajczak,
Jean-Philippe Robichaud
Abstract:
Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all existing open source speech recognition models across a variety of long-form speech recognition domains.
Submitted 24 February, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Quantification of stylistic differences in human- and ASR-produced transcripts of African American English
Authors:
Annika Heuser,
Tyler Kendall,
Miguel del Rio,
Quinten McNamara,
Nishchal Bhandari,
Corey Miller,
Migüel Jetté
Abstract:
Common measures of accuracy used to assess the performance of automatic speech recognition (ASR) systems, as well as human transcribers, conflate multiple sources of error. Stylistic differences, such as verbatim vs non-verbatim, can play a significant role in ASR performance evaluation when differences exist between training and test datasets. The problem is compounded for speech from underrepresented varieties, where the speech to orthography mapping is not as standardized. We categorize the kinds of stylistic differences between 6 transcription versions, 4 human- and 2 ASR-produced, of 10 hours of African American English (AAE) speech. Focusing on verbatim features and AAE morphosyntactic features, we investigate the interactions of these categories with how well transcripts can be compared via word error rate (WER). The results, and overall analysis, help clarify how ASR outputs are a function of the decisions made by the training data's human transcribers.
Submitted 4 September, 2024;
originally announced September 2024.
-
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Authors:
Ahmet Üstün,
Viraat Aryabumi,
Zheng-Xin Yong,
Wei-Yin Ko,
Daniel D'souza,
Gbemileke Onilude,
Neel Bhandari,
Shivalika Singh,
Hui-Lee Ooi,
Amr Kayid,
Freddie Vargus,
Phil Blunsom,
Shayne Longpre,
Niklas Muennighoff,
Marzieh Fadaee,
Julia Kreutzer,
Sara Hooker
Abstract:
Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages -- including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations on the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models. We open-source our instruction datasets and our model at https://hf.co/CohereForAI/aya-101
Submitted 12 February, 2024;
originally announced February 2024.
-
Robust Calibration For Improved Weather Prediction Under Distributional Shift
Authors:
Sankalp Gilda,
Neel Bhandari,
Wendy Mak,
Andrea Panizza
Abstract:
In this paper, we present results on improving out-of-domain weather prediction and uncertainty estimation as part of the Shifts Challenge on Robustness and Uncertainty under Real-World Distributional Shift. We find that by combining a mixture of experts with an advanced data augmentation technique borrowed from the computer vision domain and robust post-hoc calibration of predictive uncertainties, we can potentially achieve more accurate and better-calibrated results with deep neural networks than with boosted tree models for tabular data. We quantify our predictions using several metrics and propose several future lines of inquiry and experimentation to boost performance.
Submitted 7 January, 2024;
originally announced January 2024.
-
Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation
Authors:
Neel Bhandari,
Pin-Yu Chen
Abstract:
Language models today provide high accuracy across a large number of downstream tasks. However, they remain susceptible to adversarial attacks, particularly those in which the adversarial examples maintain considerable similarity to the original text. Given the multilingual nature of text, the effectiveness of adversarial examples across translations and how machine translations can improve the robustness of adversarial examples remain largely unexplored. In this paper, we present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation. We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation. Furthermore, we introduce an intervention-based solution to this problem, by integrating Machine Translation into the process of adversarial example generation and demonstrating increased robustness to round-trip translation. Our results indicate that finding adversarial examples robust to translation can help identify the insufficiency of language models that is common across languages, and motivate further research into multilingual adversarial attacks.
Submitted 24 July, 2023;
originally announced July 2023.
-
A comprehensive survey on computational learning methods for analysis of gene expression data
Authors:
Nikita Bhandari,
Rahee Walambe,
Ketan Kotecha,
Satyajeet Khare
Abstract:
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
Submitted 27 September, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Comparison of machine learning and deep learning techniques in promoter prediction across diverse species
Authors:
Nikita Bhandari,
Satyajeet Khare,
Rahee Walambe,
Ketan Kotecha
Abstract:
Gene promoters are the key DNA regulatory elements positioned around the transcription start sites and are responsible for regulating the gene transcription process. Various alignment-based, signal-based and content-based approaches have been reported for the prediction of promoters. However, since not all promoter sequences show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes, viz. yeast (Saccharomyces cerevisiae), A. thaliana (plant) and human (Homo sapiens). We compared the one-hot vector encoding method with frequency-based tokenization (FBT) for data pre-processing on a 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension, reducing the training time without affecting the sensitivity and specificity of classification. We employed deep learning techniques, mainly a CNN and a recurrent neural network with Long Short-Term Memory (LSTM), as well as a random forest (RF) classifier, for promoter classification at k-mer sizes of 2, 4 and 8. We found the CNN to be superior in distinguishing promoters from non-promoter sequences (binary classification) as well as in species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of a synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
Submitted 17 May, 2021;
originally announced May 2021.
-
Earnings-21: A Practical Benchmark for ASR in the Wild
Authors:
Miguel Del Rio,
Natalie Delworth,
Ryan Westerman,
Michelle Huang,
Nishchal Bhandari,
Joseph Palakapilly,
Quinten McNamara,
Joshua Dong,
Piotr Zelasko,
Miguel Jette
Abstract:
Commonly used speech corpora inadequately challenge academic and commercial ASR systems. In particular, speech corpora lack metadata needed for detailed analysis and WER measurement. In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR systems in the wild with special attention towards named entity recognition. We benchmark four commercial ASR models, two internal models built with open-source tools, and an open-source LibriSpeech model and discuss their differences in performance on Earnings-21. Using our recently released fstalign tool, we provide a candid analysis of each model's recognition capabilities under different partitions. Our analysis finds that ASR accuracy for certain NER categories is poor, presenting a significant impediment to transcript comprehension and usage. Earnings-21 bridges academic and commercial ASR system evaluation and enables further research on entity modeling and WER on real world audio.
Submitted 15 June, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Accented Speech Recognition: A Survey
Authors:
Arthur Hinsvark,
Natalie Delworth,
Miguel Del Rio,
Quinten McNamara,
Joshua Dong,
Ryan Westerman,
Michelle Huang,
Joseph Palakapilly,
Jennifer Drexler,
Ilya Pirkin,
Nishchal Bhandari,
Miguel Jette
Abstract:
Automatic Speech Recognition (ASR) systems generalize poorly on accented speech. The phonetic and linguistic variability of accents presents hard challenges for ASR systems today in both data collection and modeling strategies. The resulting bias in ASR performance across accents comes at a cost to both users and providers of ASR.
We present a survey of current promising approaches to accented speech recognition and highlight the key challenges in the space. Approaches mostly focus on single model generalization and accent feature engineering. Among the challenges, lack of a standard benchmark makes research and comparison especially difficult.
Submitted 2 June, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
When and how CNNs generalize to out-of-distribution category-viewpoint combinations
Authors:
Spandan Madan,
Timothy Henry,
Jamell Dozier,
Helen Ho,
Nishchal Bhandari,
Tomotake Sasaki,
Frédo Durand,
Hanspeter Pfister,
Xavier Boix
Abstract:
Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to out-of-distribution (OOD) category-viewpoint combinations, i.e., combinations not seen during training. In this paper, we investigate when and how such OOD generalization may be possible by evaluating CNNs trained to classify both object category and 3D viewpoint on OOD combinations, and identifying the neural mechanisms that facilitate such OOD generalization. We show that increasing the number of in-distribution combinations (i.e., data diversity) substantially improves generalization to OOD combinations, even with the same amount of training data. We compare learning category and viewpoint in separate and shared network architectures, and observe starkly different trends on in-distribution and OOD combinations, i.e., while shared networks are helpful in-distribution, separate networks significantly outperform shared ones at OOD combinations. Finally, we demonstrate that such OOD generalization is facilitated by the neural mechanism of specialization, i.e., the emergence of two types of neurons -- neurons selective to category and invariant to viewpoint, and vice versa.
Submitted 17 November, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Overview of MC CDMA PAPR Reduction Techniques
Authors:
B. Sarala,
D. S. Venkateswarulu,
B. N. Bhandari
Abstract:
High Peak to Average Power Ratio (PAPR) of the transmitted signal is a critical problem in multicarrier modulation (MCM) systems such as Orthogonal Frequency Division Multiplexing (OFDM) and Multi-Carrier Code Division Multiple Access (MC CDMA) systems, due to the large number of subcarriers. High PAPR leads to reduced resolution and battery life. It also deteriorates system performance. This paper reviews different PAPR reduction techniques, their attendant technical issues, and the criteria for selecting a PAPR reduction technique. To reduce PAPR, the constraints are low power consumption and low Bit Error Rate (BER). Spectral bandwidth is improved by better spectral characteristics and low complexity/cost.
Submitted 10 April, 2012;
originally announced April 2012.