Search | arXiv e-print repository

doi 10.1016/j.procs.2024.11.082

Multimodal Sentiment Analysis based on Video and Audio Inputs

Authors: Antonio Fernandez, Suzan Awinat

Abstract: Despite the abundance of current research on sentiment analysis from video and audio, finding the model that gives the highest accuracy is still considered a challenge for researchers in this field. The main objective of this paper is to prove the usability of emotion recognition models that take video and audio inputs. The datasets used to train the models are the CREMA-D dataset for audio and the RAVDESS dataset for video. The fine-tuned models that have been used are Facebook/wav2vec2-large for audio and Google/vivit-b-16x2-kinetics400 for video. The average of the probabilities for each emotion generated by the two previous models is utilized in the decision making framework. To handle disparity in the results, where one of the models achieves much higher accuracy, another test framework is created. The methods used are the Weighted Average method, the Confidence Level Threshold method, the Dynamic Weighting Based on Confidence method, and the Rule-Based Logic method. This limited approach gives encouraging results that make future research into these methods viable.
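The probability averaging and the Weighted Average method named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the emotion labels, the example probabilities, the `w_audio` weight, and the function names are all hypothetical, and the other three methods listed in the abstract are not covered.

```python
import numpy as np

# Hypothetical label set; the abstract does not list the emotion classes used.
EMOTIONS = ["angry", "happy", "sad", "neutral"]


def fuse_average(p_audio, p_video, w_audio=0.5):
    """Combine two per-emotion probability vectors by a weighted average.

    w_audio=0.5 is the plain average; any other value in [0, 1] gives a
    weighted average that trusts one modality more than the other.
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    if p_audio.shape != p_video.shape:
        raise ValueError("both models must score the same set of emotions")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    fused = w_audio * p_audio + (1.0 - w_audio) * p_video
    # Renormalise so the fused scores still sum to one.
    return fused / fused.sum()


def predict(p_audio, p_video, w_audio=0.5):
    """Return the emotion label with the highest fused probability."""
    fused = fuse_average(p_audio, p_video, w_audio)
    return EMOTIONS[int(np.argmax(fused))]


# Made-up model outputs, for illustration only.
p_audio = [0.10, 0.60, 0.20, 0.10]
p_video = [0.50, 0.30, 0.10, 0.10]

print(predict(p_audio, p_video))               # plain average
print(predict(p_audio, p_video, w_audio=0.2))  # video weighted more heavily
```

With these made-up inputs, the plain average favours the audio model's top class, while shifting weight toward the video model changes the decision, which is the kind of disparity the additional methods are meant to handle.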

Submitted 12 December, 2024; originally announced December 2024.

Comments: Presented as a full paper in the 15th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2024) October 28-30, 2024, Leuven, Belgium

Journal ref: Procedia Computer Science, Volume 251, 2024, Pages 41-48, ISSN 1877-0509

arXiv:2410.03930 [pdf, ps, other]

Reverb: Open-Source ASR and Diarization from Rev

Authors: Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud

Abstract: Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all existing open source speech recognition models across a variety of long-form speech recognition domains.

Submitted 24 February, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

arXiv:2402.06050 [pdf, other]

doi 10.1109/TCE.2020.3034619

Energy- and Quality-Aware Video Request Policy for Wireless Adaptive Streaming Clients

Authors: César Díaz, Antonio Fernández, Fernando Sacristán, Narciso García

Abstract: We present a straightforward, non-intrusive adaptive bit rate streaming segment quality selection policy which aims at extending battery lifetime during playback while limiting the impact on the user's quality of experience, thus benefiting consumers of video streaming services. This policy relies on the relationship between the available channel bandwidth and the bit rate of the representations in the quality ladder. It results from the characterization of the energy consumed by smartphones when running adaptive streaming client applications for different network connections (Wi-Fi, 4G, and 5G) and the modeling of the energy consumed as a function of said relationship. Results show that a significant amount of energy can be saved (10 to 30%) by slightly modifying the default policy at the expense of a controlled reduction of video quality.

Submitted 8 February, 2024; originally announced February 2024.

Journal ref: IEEE Transactions on Consumer Electronics, vol. 66, no. 4, pp. 366-375, Nov. 2020

arXiv:2306.09365 [pdf, other]

Fault Detection in Induction Motors using Functional Dimensionality Reduction Methods

Authors: María Barroso, José M. Bossio, Carlos M. Alaíz, Ángela Fernández

Abstract: The implementation of strategies for fault detection and diagnosis on rotating electrical machines is crucial for the reliability and safety of modern industrial systems. The contribution of this work is a methodology that combines the conventional strategy of Motor Current Signature Analysis with functional dimensionality reduction methods, namely Functional Principal Components Analysis and Functional Diffusion Maps, for detecting and classifying fault conditions in induction motors. The results obtained from the proposed scheme are very encouraging, revealing a potential use in the future not only for real-time detection of the presence of a fault in an induction motor, but also in the identification of a greater number of types of faults present through an offline analysis.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2303.04485 [pdf, other]

Onsets and Velocities: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks

Authors: Andres Fernandez

Abstract: Polyphonic Piano Transcription has recently experienced substantial progress, driven by the use of sophisticated Deep Learning approaches and the introduction of new subtasks such as note onset, offset, velocity and pedal detection. This progress was coupled with an increased complexity and size of the proposed models, typically relying on non-realtime components and high-resolution data. In this work we focus on onset and velocity detection, showing that a substantially smaller and simpler convolutional approach, using lower temporal resolution (24ms), is still competitive: our proposed ONSETS&VELOCITIES model achieves state-of-the-art performance on the MAESTRO dataset for onset detection (F1=96.78%) and sets a good novel baseline for onset+velocity (F1=94.50%), while having ~3.1M parameters and maintaining real-time capabilities on modest commodity hardware. We provide open-source code to reproduce our results and a real-time demo with a pretrained model.

Submitted 31 May, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted at EUSIPCO 2023

arXiv:2303.02475 [pdf, other]

Synthetic ECG Signal Generation using Probabilistic Diffusion Models

Authors: Edmond Adib, Amanda Fernandez, Fatemeh Afghah, John Jeff Prevost

Abstract: Deep learning image processing models have had remarkable success in recent years in generating high quality images. Particularly, the Improved Denoising Diffusion Probabilistic Models (DDPM) have shown superiority in image quality to the state-of-the-art generative models, which motivated us to investigate their capability in the generation of synthetic electrocardiogram (ECG) signals. In this work, synthetic ECG signals are generated by the Improved DDPM and by the Wasserstein GAN with Gradient Penalty (WGAN-GP) models and then compared. To this end, we devise a pipeline to utilize DDPM in its original $2D$ form. First, the $1D$ ECG time series data are embedded into the $2D$ space, for which we employed the Gramian Angular Summation/Difference Fields (GASF/GADF) as well as Markov Transition Fields (MTF) to generate three $2D$ matrices from each ECG time series, which when put together, form a $3$-channel $2D$ datum. Then $2D$ DDPM is used to generate $2D$ $3$-channel synthetic ECG images. The $1D$ ECG signals are created by de-embedding the $2D$ generated image files back into the $1D$ space. This work focuses on unconditional models and the generation of Normal Sinus Beat ECG signals exclusively, where the Normal Sinus Beat class from the MIT-BIH Arrhythmia dataset is used in the training phase. The quality, distribution, and authenticity of the ECG signals generated by each model are quantitatively evaluated and compared. Our results show that in the proposed pipeline and in the particular setting of this paper, the WGAN-GP model is consistently superior to DDPM in all the considered metrics.
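The GASF/GADF embedding and the de-embedding step mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's pipeline: the abstract does not give the rescaling range or the de-embedding procedure, so the sketch assumes rescaling to [0, 1] and recovers the series from the GASF diagonal; the Markov Transition Field channel is omitted, and all function names are hypothetical.

```python
import numpy as np


def rescale01(x):
    """Min-max rescale a 1D series to [0, 1] (assumed range, not from the abstract)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return (x - lo) / (hi - lo)


def gasf_gadf(x01):
    """Gramian Angular Summation and Difference Fields of a series in [0, 1].

    Each sample is mapped to an angle phi = arccos(x); then
    GASF[i, j] = cos(phi_i + phi_j) and GADF[i, j] = sin(phi_i - phi_j).
    """
    phi = np.arccos(np.clip(x01, 0.0, 1.0))
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf


def deembed_from_gasf(gasf):
    """Recover the rescaled series from the GASF diagonal.

    The diagonal holds cos(2 * phi_i) = 2 * x_i**2 - 1, so for x in [0, 1]
    the series is x_i = sqrt((GASF[i, i] + 1) / 2).
    """
    diag = np.clip(np.diag(gasf), -1.0, 1.0)
    return np.sqrt((diag + 1.0) / 2.0)


# Toy signal standing in for one ECG beat (not real data).
t = np.linspace(0.0, 1.0, 64)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * t

x01 = rescale01(signal)
gasf, gadf = gasf_gadf(x01)
recovered = deembed_from_gasf(gasf)
print(gasf.shape, gadf.shape, np.max(np.abs(recovered - x01)))
```

Rescaling to [0, 1] rather than [-1, 1] is what makes the diagonal inversion unambiguous here, since the sign of each sample is then known. The recovered series is in rescaled units; the original minimum and maximum would have to be stored separately to undo the rescaling.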

Submitted 22 May, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2201.04069 [pdf, other]

doi 10.1016/j.measurement.2021.110646

A novel method for error analysis in radiation thermometry with application to industrial furnaces

Authors: Iñigo Martinez, Urtzi Otamendi, Igor G. Olaizola, Roger Solsona, Mikel Maiza, Elisabeth Viles, Arturo Fernandez, Ignacio Arzua

Abstract: Accurate temperature measurements are essential for the proper monitoring and control of industrial furnaces. However, measurement uncertainty is a risk for such a critical parameter. Certain instrumental and environmental errors must be considered when using spectral-band radiation thermometry techniques, such as the uncertainty in the emissivity of the target surface, reflected radiation from surrounding objects, or atmospheric absorption and emission, to name a few. Undesired contributions to measured radiation can be isolated using measurement models, also known as error-correction models. This paper presents a methodology for budgeting significant sources of error and uncertainty during temperature measurements in a petrochemical furnace scenario. A continuous monitoring system is also presented, aided by a deep-learning-based measurement correction model, to allow domain experts to analyze the furnace's operation in real-time. To validate the proposed system's functionality, a real-world application case in a petrochemical plant is presented. The proposed solution demonstrates the viability of precise industrial furnace monitoring, thereby increasing operational security and improving the efficiency of such energy-intensive systems.

Submitted 10 January, 2022; originally announced January 2022.

Comments: 14 pages, 14 figures, 4 tables. Accepted for publication in the journal Measurement

arXiv:2107.10880 [pdf, other]

Using UMAP to Inspect Audio Data for Unsupervised Anomaly Detection under Domain-Shift Conditions

Authors: Andres Fernandez, Mark D. Plumbley

Abstract: The goal of Unsupervised Anomaly Detection (UAD) is to detect anomalous signals under the condition that only non-anomalous (normal) data is available beforehand. In UAD under Domain-Shift Conditions (UAD-S), data is further exposed to contextual changes that are usually unknown beforehand. Motivated by the difficulties encountered in the UAD-S task presented at the 2021 edition of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, we visually inspect Uniform Manifold Approximations and Projections (UMAPs) for log-STFT, log-mel and pretrained Look, Listen and Learn (L3) representations of the DCASE UAD-S dataset. In our exploratory investigation, we look for two qualities, Separability (SEP) and Discriminative Support (DSUP), and formulate several hypotheses that could facilitate diagnosis and development of further representation and detection approaches. Particularly, we hypothesize that input length and pretraining may regulate a relevant tradeoff between SEP and DSUP. Our code as well as the resulting UMAPs and plots are publicly available.

Submitted 15 October, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: Accepted at the DCASE2021 Workshop

arXiv:2004.13905 [pdf, other]

End-to-end NILM System Using High Frequency Data and Neural Networks

Authors: Franco Marchesoni-Acland, Camilo Mariño, Elías Masquil, Pablo Masaferro, Alicia Fernández

Abstract: Improving energy efficiency is a necessity in the fight against climate change. Non Intrusive Load Monitoring (NILM) systems give important information about the household consumption that can be used by the electric utility or the end users. In this work the implementation of an end-to-end NILM system is presented, which comprises a custom high frequency meter and neural-network based algorithms. The present article presents a novel way to include high frequency information as input of neural network models by means of multivariate time series that include carefully selected features. Furthermore, it provides a detailed assessment of the generalization error and shows that this class of models generalizes well to new instances of seen-in-training appliances. An evaluation database formed of measurements in two Uruguayan homes is collected and discussion on general unsupervised approaches is provided.

Submitted 28 April, 2020; originally announced April 2020.

arXiv:1411.0024 [pdf, other]

Robust sketching for multiple square-root LASSO problems

Authors: Vu Pham, Laurent El Ghaoui, Arturo Fernandez

Abstract: Many learning tasks, such as cross-validation, parameter search, or leave-one-out analysis, involve multiple instances of similar problems, each instance sharing a large part of learning data with the others. We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. Our approach allows a dramatic reduction in computational effort, in effect reducing the number of observations from $m$ (the number of observations to start with) to $k$ (the number of singular values retained in the low-rank model), while not sacrificing, and sometimes even improving, the statistical performance. Theoretical analysis, as well as numerical experiments on both synthetic and real data, illustrate the efficiency of the method in large scale applications.

Submitted 30 October, 2014; originally announced November 2014.

arXiv:1102.3867 [pdf, ps, other]

Controllability properties for the one-dimensional Heat equation under multiplicative or nonnegative additive controls with local mobile support

Authors: Luis A. Fernandez, Alexander Y. Khapalov

Abstract: We discuss several new results on nonnegative approximate controllability for the one-dimensional Heat equation governed by either multiplicative or nonnegative additive control, acting within a proper subset of the space domain at every moment of time. Our methods allow us to link these two types of controls to some extent. The main results include approximate controllability properties both for the static and mobile control supports.

Submitted 18 February, 2011; originally announced February 2011.

MSC Class: 35K05; 35K20; 93B05

Showing 1–11 of 11 results for author: Fernandez, A