Showing 1–13 of 13 results for author: McDuff, D

Searching in archive eess.
  1. arXiv:2505.13577  [pdf, other]

    cs.SD cs.AI eess.AS

    VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation

    Authors: Yubin Kim, Taehan Kim, Wonjune Kang, Eugene Park, Joonsik Yoon, Dongjae Lee, Xin Liu, Daniel McDuff, Hyeonhoon Lee, Cynthia Breazeal, Hae Won Park

    Abstract: Vocal health plays a crucial role in people's lives, significantly impacting their communicative abilities and interactions. However, despite the global prevalence of voice disorders, many lack access to convenient diagnosis and treatment. This paper introduces VocalAgent, an audio large language model (LLM) to address these challenges through vocal health diagnosis. We leverage Qwen-Audio-Chat fi…

    Submitted 26 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  2. arXiv:2403.10582  [pdf, other]

    eess.IV cs.LG

    How Suboptimal is Training rPPG Models with Videos and Targets from Different Body Sites?

    Authors: Björn Braun, Daniel McDuff, Christian Holz

    Abstract: Remote camera measurement of the blood volume pulse via photoplethysmography (rPPG) is a compelling technology for scalable, low-cost, and accessible assessment of cardiovascular information. Neural networks currently provide the state-of-the-art for this task and supervised training or fine-tuning is an important step in creating these models. However, most current models are trained on facial vi…

    Submitted 15 March, 2024; originally announced March 2024.

  3. arXiv:2304.14916  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    "Can't Take the Pressure?": Examining the Challenges of Blood Pressure Estimation via Pulse Wave Analysis

    Authors: Suril Mehta, Nipun Kwatra, Mohit Jain, Daniel McDuff

    Abstract: The use of observed wearable sensor data (e.g., photoplethysmograms [PPG]) to infer health measures (e.g., glucose level or blood pressure) is a very active area of research. Such technology can have a significant impact on health screening, chronic disease management and remote monitoring. A common approach is to collect sensor data and corresponding labels from a clinical grade device (e.g., blo…

    Submitted 23 April, 2023; originally announced April 2023.

  4. arXiv:2203.05759  [pdf, other]

    cs.CV cs.LG eess.IV

    Federated Remote Physiological Measurement with Imperfect Data

    Authors: Xin Liu, Mingchuan Zhang, Ziheng Jiang, Shwetak Patel, Daniel McDuff

    Abstract: The growing need for technology that supports remote healthcare is being acutely highlighted by an aging population and the COVID-19 pandemic. In health-related machine learning applications the ability to learn predictive models without data leaving a private device is attractive, especially when these data might contain features (e.g., photographs or videos of the body) that make identifying a s…

    Submitted 11 March, 2022; originally announced March 2022.

  5. arXiv:2111.11547  [pdf, other]

    cs.CV cs.LG eess.IV eess.SP

    Camera Measurement of Physiological Vital Signs

    Authors: Daniel McDuff

    Abstract: The need for remote tools for healthcare monitoring has never been more apparent. Camera measurement of vital signs leverages imaging devices to compute physiological changes by analyzing images of the human body. Building on advances in optics, machine learning, computer vision and medicine, these techniques have progressed significantly since the invention of digital cameras. This paper presents…

    Submitted 22 November, 2021; originally announced November 2021.

  6. arXiv:2110.03690  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning Higher-Order Dynamics in Video-Based Cardiac Measurement

    Authors: Brian L. Hill, Xin Liu, Daniel McDuff

    Abstract: Computer vision methods typically optimize for first-order dynamics (e.g., optical flow). However, in many cases the properties of interest are subtle variations in higher-order changes, such as acceleration. This is true in the cardiac pulse, where the second derivative can be used as an indicator of blood pressure and arterial disease. Recent developments in camera-based vital sign measurement h…

    Submitted 27 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  7. arXiv:2104.05418  [pdf, other]

    cs.LG cs.CV cs.SD eess.AS eess.IV

    Contrastive Learning of Global-Local Video Representations

    Authors: Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

    Abstract: Contrastive learning has delivered impressive results for various tasks in the self-supervised regime. However, existing approaches optimize for learning representations specific to downstream scenarios, i.e., global representations suitable for tasks such as classification or local representations for tasks such as detection and localization. While they produce satisfactory resu…

    Submitted 27 October, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

  8. arXiv:2103.02484  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization

    Authors: Javier Hernandez, Daniel McDuff, Ognjen Rudovic, Alberto Fung, Mary Czerwinski

    Abstract: Facial action unit recognition has many applications from market research to psychotherapy and from image captioning to entertainment. Despite its recent progress, deployment of these models has been impeded due to their limited generalization to unseen people and demographics. This work conducts an in-depth analysis of performance across several dimensions: individuals (40 subjects), genders (male…

    Submitted 3 March, 2021; originally announced March 2021.

    Journal ref: 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII)

  9. arXiv:2010.07770  [pdf, other]

    eess.IV cs.LG

    The Benefit of Distraction: Denoising Remote Vitals Measurements using Inverse Attention

    Authors: Ewa Nowara, Daniel McDuff, Ashok Veeraraghavan

    Abstract: Attention is a powerful concept in computer vision. End-to-end networks that learn to focus selectively on regions of an image or video often perform strongly. However, other image regions, while not necessarily containing the signal of interest, may contain useful context. We present an approach that exploits the idea that statistics of noise may be shared between the regions that contain the sig…

    Submitted 14 October, 2020; originally announced October 2020.

  10. arXiv:2010.06045  [pdf, other]

    cs.CV cs.LG eess.IV

    Spectral Synthesis for Satellite-to-Satellite Translation

    Authors: Thomas Vandal, Daniel McDuff, Weile Wang, Andrew Michaelis, Ramakrishna Nemani

    Abstract: Earth observing satellites carrying multi-spectral sensors are widely used to monitor the physical and biological states of the atmosphere, land, and oceans. These satellites have different vantage points above the earth and different spectral imaging bands resulting in inconsistent imagery from one to another. This presents challenges in building downstream applications. What if we could generate…

    Submitted 12 October, 2020; originally announced October 2020.

  11. arXiv:2006.03790  [pdf, other]

    eess.SP cs.CV eess.IV

    Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement

    Authors: Xin Liu, Josh Fromm, Shwetak Patel, Daniel McDuff

    Abstract: Telehealth and remote health monitoring have become increasingly important during the SARS-CoV-2 pandemic and it is widely expected that this will have a lasting impact on healthcare practices. These tools can help reduce the risk of exposing patients and medical staff to infection, make healthcare services more accessible, and allow providers to see more patients. However, objective measurement o…

    Submitted 28 February, 2021; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: preprint

  12. arXiv:1910.11958  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

    Authors: Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song

    Abstract: Current multi-reference style transfer models for Text-to-Speech (TTS) perform sub-optimally on disjoint datasets, where one dataset contains only a single style class for one of the style dimensions. These models generally fail to produce style transfer for the dimension that is underrepresented in the dataset. In this paper, we propose an adversarial cycle consistency training scheme with paire…

    Submitted 25 October, 2019; originally announced October 2019.

  13. arXiv:1907.04378  [pdf, other]

    cs.CV cs.CL cs.LG eess.AS eess.IV

    M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

    Authors: Shuang Ma, Daniel McDuff, Yale Song

    Abstract: Generative adversarial networks have led to significant advances in cross-modal/domain translation. However, typically these networks are designed for a specific task (e.g., dialogue generation or image synthesis, but not both). We present a unified model, M3D-GAN, that can translate across a wide range of modalities (e.g., text, image, and speech) and domains (e.g., attributes in images or emotio…

    Submitted 9 July, 2019; originally announced July 2019.