
Showing 1–12 of 12 results for author: Azemi, E

Searching in archive cs.
  1. arXiv:2505.20745

    cs.SD cs.LG eess.AS

    Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

    Authors: Jingping Nie, Dung T. Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendano, Erdrin Azemi, Vikramjit Mitra

    Abstract: Auscultation, particularly of heart sounds, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this…

    Submitted 29 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: 5 pages, Interspeech 2025 conference

  2. arXiv:2503.22711

    cs.SD cs.AI cs.LG eess.AS

    Modeling speech emotion with label variance and analyzing performance across speakers and unseen acoustic conditions

    Authors: Vikramjit Mitra, Amrit Romana, Dung T. Tran, Erdrin Azemi

    Abstract: Spontaneous speech emotion data usually contain perceptual grades, where graders assign emotion scores after listening to the speech files. Such perceptual grades introduce label uncertainty due to variation in grader opinion. Grader variation is addressed by using consensus grades as ground truth, where the emotion with the highest vote is selected. Consensus grades fail to consider ambiguous insta…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: 11 pages, 5 figures

  3. arXiv:2410.16424

    cs.LG

    Promoting cross-modal representations to improve multimodal foundation models for physiological signals

    Authors: Ching Fang, Christopher Sandino, Behrooz Mahasseni, Juri Minxha, Hadi Pouransari, Erdrin Azemi, Ali Moin, Ellen Zippi

    Abstract: Many healthcare applications are inherently multimodal, involving several physiological signals. As sensors for these signals become more common, improving machine learning methods for multimodal healthcare data is crucial. Pretraining foundation models is a promising avenue for success. However, methods for developing foundation models in healthcare are still in early exploration and it is unclea…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 AIM-FM Workshop

  4. arXiv:2410.08421

    cs.LG

    Generalizable autoregressive modeling of time series through functional narratives

    Authors: Ran Liu, Wenrui Ma, Ellen Zippi, Hadi Pouransari, Jingyun Xiao, Chris Sandino, Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer, Ali Moin

    Abstract: Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time periods, overlooking their functional properties. In this work, we propose a novel objective for transformers that learn time series by re-interpreting them as temporal functions. We build an alternative sequence of time series by constructing degradat…

    Submitted 10 October, 2024; originally announced October 2024.

  5. arXiv:2410.02147

    cs.LG cs.AI eess.SP

    Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement

    Authors: Gaurav Patel, Christopher Sandino, Behrooz Mahasseni, Ellen L Zippi, Erdrin Azemi, Ali Moin, Juri Minxha

    Abstract: In this paper, we propose a framework for efficient Source-Free Domain Adaptation (SFDA) in the context of time-series, focusing on enhancing both parameter efficiency and data-sample utilization. Our approach introduces an improved paradigm for source-model preparation and target-side adaptation, aiming to enhance training efficiency during target adaptation. Specifically, we reparameterize the s…

    Submitted 1 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025

  6. arXiv:2407.18424

    cs.SD cs.LG eess.AS

    Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

    Authors: Jingping Nie, Ran Liu, Behrooz Mahasseni, Erdrin Azemi, Vikramjit Mitra

    Abstract: Acoustic signals are crucial for health monitoring, particularly heart sounds, which provide essential data such as heart rate and help detect cardiac anomalies such as murmurs. This study utilizes a publicly available phonocardiogram (PCG) dataset to estimate heart rate using model-driven methods and extends the best-performing model to a multi-task learning (MTL) framework for simultaneous heart rate est…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 6 pages, 10 figures

  7. arXiv:2407.13035

    cs.SD cs.CL cs.LG eess.AS

    Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

    Authors: Vikramjit Mitra, Anirban Chatterjee, Ke Zhai, Helen Weng, Ayuko Hill, Nicole Hay, Christopher Webb, Jamie Cheng, Erdrin Azemi

    Abstract: The process of human speech production involves coordinated respiratory action to elicit acoustic speech signals. Typically, speech is produced when air is forced from the lungs and modulated by the vocal tract, and these actions are interspersed with moments of breathing in air (inhalation) to refill the lungs. Respiratory rate (RR) is a vital metric that is used to assess the overall hea…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, BioKDD workshop paper

  8. arXiv:2312.16180

    cs.SD cs.AI cs.CL cs.LG

    Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

    Authors: Vikramjit Mitra, Jingping Nie, Erdrin Azemi

    Abstract: Representations derived from models such as BERT (Bidirectional Encoder Representations from Transformers) and HuBERT (Hidden units BERT) have helped achieve state-of-the-art performance in dimensional speech emotion recognition. Despite their large dimensionality, and even though these representations are not tailored for emotion recognition tasks, they are frequently used to train large spee…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 5 pages

    Journal ref: ICASSP 2024

  9. arXiv:2309.05927

    cs.LG cs.AI eess.SP

    Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

    Authors: Ran Liu, Ellen L. Zippi, Hadi Pouransari, Chris Sandino, Jingping Nie, Hanlin Goh, Erdrin Azemi, Ali Moin

    Abstract: Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence o…

    Submitted 18 April, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Extended version of ICLR 2024 Learning from Time Series for Health workshop

  10. arXiv:2303.03177

    eess.AS cs.CL cs.LG cs.SD

    Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis

    Authors: Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano

    Abstract: Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Speech models, such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT), have enabled generating lexical and acoustic representations to benefit speech recognition applications. We investigated the…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 5 pages, conference

  11. arXiv:2207.03334

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation

    Authors: Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha, Joseph Yitan Cheng, Erdrin Azemi

    Abstract: Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seems to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: 5 pages, 3 figures, Interspeech 2022

  12. arXiv:2007.04871

    cs.LG eess.SP stat.ML

    Subject-Aware Contrastive Learning for Biosignals

    Authors: Joseph Y. Cheng, Hanlin Goh, Kaan Dogrusoz, Oncel Tuzel, Erdrin Azemi

    Abstract: Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100). To handle these challenges, we propose a self-supervised approach based on contrastive learning to model biosignals with a reduced reliance on labeled data and with fewer subjects. In this regime of limited labels and subjects, intersubject va…

    Submitted 30 June, 2020; originally announced July 2020.
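    The last entry mentions a self-supervised approach based on contrastive learning for biosignals. As a rough illustration of what a contrastive objective computes, here is a minimal sketch of a generic InfoNCE-style loss. It assumes each biosignal window has already been encoded into an embedding for two "views" (for example, two augmentations of the same window); the function names, the temperature value, and the use of cosine similarity are illustrative assumptions. It does not reproduce the paper's subject-aware terms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, generic sketch).

    Row i of `anchors` should match row i of `positives`; every other row
    of `positives` in the batch acts as a negative for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate, scaled by temperature.
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching (diagonal) candidate as the target.
        total += log_sum - logits[i]
    return total / n


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(views_a, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs: near 0
    print(info_nce(views_a, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large
```

    Minimizing this loss pulls embeddings of matching views together and pushes apart embeddings of different windows in the batch, which is how a contrastive setup can learn representations without labels.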