Search | arXiv e-print repository

The Cambrian Explosion of Mixed-Precision Matrix Multiplication for Quantized Deep Learning Inference

Authors: Héctor Martínez, Adrián Castelló, Francisco D. Igual, Enrique S. Quintana-Ortí

Abstract: Recent advances in deep learning (DL) have led to a shift from traditional 64-bit floating point (FP64) computations toward reduced-precision formats, such as FP16, BF16, and 8- or 16-bit integers, combined with mixed-precision arithmetic. This transition enhances computational throughput, reduces memory and bandwidth usage, and improves energy efficiency, offering significant advantages for resource-constrained edge devices. To support this shift, hardware architectures have evolved accordingly, now including adapted ISAs (Instruction Set Architectures) that expose mixed-precision vector units and matrix engines tailored for DL workloads. At the heart of many DL and scientific computing tasks is the general matrix-matrix multiplication gemm, a fundamental kernel historically optimized using axpy vector instructions on SIMD (single instruction, multiple data) units. However, as hardware moves toward mixed-precision dot-product-centric operations optimized for quantized inference, these legacy approaches are being phased out. In response to this, our paper revisits traditional high-performance gemm and describes strategies for adapting it to mixed-precision integer (MIP) arithmetic across modern ISAs, including x86_64, ARM, and RISC-V. Concretely, we illustrate novel micro-kernel designs and data layouts that better exploit today's specialized hardware and demonstrate significant performance gains from MIP arithmetic over floating-point implementations across three representative CPU architectures. These contributions highlight a new era of gemm optimization, driven by the demands of DL inference on heterogeneous architectures, marking what we term the "Cambrian period" for matrix multiplication.

Submitted 13 June, 2025; originally announced June 2025.

Comments: 16 pages, 7 tables, 7 figures
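
The dot-product-centric, mixed-precision integer arithmetic mentioned in the abstract above can be illustrated with a minimal pure-Python sketch. It is not the authors' micro-kernel: the function names `dot_group_accumulate` and `gemm_int8`, the group width of 4, and the row-wise repacking of B are all assumptions made for illustration. Inputs are treated as signed 8-bit values and accumulated in a wider integer; Python integers are unbounded, so 32-bit overflow is not modelled.

```python
def dot_group_accumulate(acc, a_group, b_group):
    """Add the dot product of two short int8 groups to a wide accumulator."""
    for x, y in zip(a_group, b_group):
        # Hypothetical range check standing in for 8-bit storage.
        if not (-128 <= x <= 127 and -128 <= y <= 127):
            raise ValueError("operands must fit in signed 8 bits")
    return acc + sum(x * y for x, y in zip(a_group, b_group))


def gemm_int8(A, B, group=4):
    """Compute C = A * B with int8 inputs and wide integer accumulation.

    A is m x k, B is k x n (lists of lists); k must be a multiple of `group`.
    """
    m, k, n = len(A), len(B), len(B[0])
    if k % group != 0:
        raise ValueError("k must be a multiple of the group width")
    # Repack B so that the k entries of each column are contiguous,
    # which lets every dot-product step read `group` adjacent values.
    B_packed = [[B[p][j] for p in range(k)] for j in range(n)]
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(0, k, group):
                acc = dot_group_accumulate(
                    acc, A[i][p:p + group], B_packed[j][p:p + group]
                )
            C[i][j] = acc
    return C
```

The repacking step hints at why data layout matters for this style of kernel: a dot-product step consumes several adjacent elements along the k dimension from both operands, whereas an axpy-style update broadcasts a single element of one operand.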

arXiv:2411.05794

Beyond Correlation: Evaluating Multimedia Quality Models with the Constrained Concordance Index

Authors: Alessandro Ragano, Helard Becerra Martinez, Andrew Hines

Abstract: This study investigates the evaluation of multimedia quality models, focusing on the inherent uncertainties in subjective Mean Opinion Score (MOS) ratings due to factors like rater inconsistency and bias. Traditional statistical measures such as Pearson's Correlation Coefficient (PCC), Spearman's Rank Correlation Coefficient (SRCC), and Kendall's Tau (KTAU) often fail to account for these uncertainties, leading to inaccuracies in model performance assessment. We introduce the Constrained Concordance Index (CCI), a novel metric designed to overcome the limitations of existing metrics by considering the statistical significance of MOS differences and excluding comparisons where MOS confidence intervals overlap. Through comprehensive experiments across various domains including speech and image quality assessment, we demonstrate that CCI provides a more robust and accurate evaluation of instrumental quality models, especially in scenarios of low sample sizes, rater group variability, and restriction of range. Our findings suggest that incorporating rater subjectivity and focusing on statistically significant pairs can significantly enhance the evaluation framework for multimedia quality prediction models. This work not only sheds light on the overlooked aspects of subjective rating uncertainties but also proposes a methodological advancement for more reliable and accurate quality model evaluation.

Submitted 24 October, 2024; originally announced November 2024.
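
The idea behind the Constrained Concordance Index in the abstract above, namely scoring only those pairs whose MOS confidence intervals do not overlap, can be sketched as follows. This is a hedged illustration rather than the authors' reference implementation: the function name `constrained_concordance`, the use of symmetric confidence-interval half-widths, and the half credit given to tied predictions are assumptions.

```python
from itertools import combinations


def constrained_concordance(mos, ci_half_width, predicted):
    """Fraction of statistically distinguishable pairs ranked correctly.

    mos            -- subjective mean opinion scores, one per stimulus
    ci_half_width  -- half-width of each MOS confidence interval (assumed symmetric)
    predicted      -- quality scores produced by the model under evaluation
    Returns None when no pair has non-overlapping confidence intervals.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(mos)), 2):
        # Skip pairs whose confidence intervals overlap: their MOS
        # difference is not treated as significant.
        if abs(mos[i] - mos[j]) <= ci_half_width[i] + ci_half_width[j]:
            continue
        comparable += 1
        pred_diff = predicted[i] - predicted[j]
        if pred_diff == 0:
            concordant += 0.5  # assumption: tied predictions earn half credit
        elif (pred_diff > 0) == (mos[i] > mos[j]):
            concordant += 1.0
    return concordant / comparable if comparable else None
```

For example, with `mos = [1.0, 2.0, 4.0]`, `ci_half_width = [0.6, 0.6, 0.3]` and `predicted = [2.5, 2.0, 4.5]`, the first two stimuli have overlapping intervals, so the model's reversed ordering of that pair is not penalised and the index is computed from the two remaining pairs only.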

arXiv:2407.08380

Digital twins to alleviate the need for real field data in vision-based vehicle speed detection systems

Authors: Antonio Hernández Martínez, Iván García Daza, Carlos Fernández López, David Fernández Llorca

Abstract: Accurate vision-based speed estimation is much more cost-effective than traditional methods based on radar or LiDAR. However, it is also challenging due to the limitations of perspective projection on a discrete sensor, as well as the high sensitivity to calibration, lighting and weather conditions. Interestingly, deep learning approaches (which dominate the field of computer vision) are very limited in this context due to the lack of available data. Indeed, obtaining video sequences of real road traffic with accurate speed values associated with each vehicle is very complex and costly, and the number of available datasets is very limited. Recently, some approaches have focused on the use of synthetic data. However, it is still unclear how models trained on synthetic data can be effectively applied to real-world conditions. In this work, we propose the use of digital twins built with the CARLA simulator to generate a large dataset representative of a specific real-world camera. The synthetic dataset contains a large variability of vehicle types, colours, speeds, lighting and weather conditions. A 3D CNN model is trained on the digital twin and tested on the real sequences. Unlike previous approaches that generate multi-camera sequences, we found that the gap between the real and the virtual conditions is a key factor in obtaining low speed estimation errors. Even with a preliminary approach, the mean absolute error obtained remains below 3 km/h.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Paper accepted at the 27th IEEE International Conference on Intelligent Transportation Systems (ITSC 2024)

arXiv:2403.15336

Dialogue Understandability: Why are we streaming movies with subtitles?

Authors: Helard Becerra Martinez, Alessandro Ragano, Diptasree Debnath, Asad Ullah, Crisron Rudolf Lucas, Martin Walsh, Andrew Hines

Abstract: Watching movies and TV shows with subtitles enabled is not simply down to audibility or speech intelligibility. A variety of evolving factors related to technological advances, cinema production and social behaviour challenge our perception and understanding. This study seeks to formalise and give context to these influential factors under a wider and novel term referred to as Dialogue Understandability. We propose a working definition for Dialogue Understandability as a listener's capacity to follow the story without undue cognitive effort or concentration being required that impacts their Quality of Experience (QoE). The paper identifies, describes and categorises the factors that influence Dialogue Understandability, mapping them over the QoE framework, a media streaming lifecycle, and the stakeholders involved. We then explore available measurement tools in the literature and link them to the factors they could potentially be used for. The maturity and suitability of these tools are evaluated over a set of pilot experiments. Finally, we reflect on the gaps that still need to be filled, what we can measure and what not, future subjective experiments, and new research trends that could help us to fully characterise Dialogue Understandability.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.07731

doi 10.1007/978-3-031-23220-6_5

Performance Analysis of Matrix Multiplication for Deep Learning on the Edge

Authors: Cristian Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí

Abstract: The devices designed for the Internet-of-Things encompass a large variety of distinct processor architectures, forming a highly heterogeneous zoo. In order to tackle this, we employ a simulator to estimate the performance of the matrix-matrix multiplication (GEMM) kernel on processors designed to operate at the edge. Our simulator adheres to the modern implementations of GEMM, advocated by GotoBLAS2, BLIS, OpenBLAS, etc., to carefully account for the amount of data transfers across the memory hierarchy of different algorithmic variants of the kernel. Armed with this tool, a small collection of experiments provides the necessary data to calibrate the simulator and deliver highly accurate estimations of the execution time for a given processor architecture.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 Tables, 6 Figures

Journal ref: High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham

arXiv:2310.20347

Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

Authors: Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí

Abstract: We explore the utilization of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (GEMM). In addition, we fully automate the generation process, by also leveraging the Apache TVM framework to derive a complete variety of the processor-specific micro-kernels for GEMM. This is in contrast with the convention in high performance libraries, which hand-encode a single micro-kernel per architecture using Assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for GEMM 1) improves portability, maintainability and, globally, streamlines the software life cycle; 2) provides high flexibility to easily tailor and optimize the solution to different data types, processor architectures, and matrix operand shapes, yielding performance on a par (or even superior for specific matrix shapes) with that of hand-tuned libraries; and 3) features a small memory footprint.

Submitted 31 October, 2023; originally announced October 2023.

Comments: 35 pages, 22 figures. Submitted to ACM TOMS

arXiv:2310.20093

Evaluating Neural Language Models as Cognitive Models of Language Acquisition

Authors: Héctor Javier Vázquez Martínez, Annika Lea Heuser, Charles Yang, Jordan Kodner

Abstract: The success of neural language models (LMs) on many technological tasks has brought about their potential relevance as scientific theories of language despite some clear differences between LM training and child language acquisition. In this paper we argue that some of the most prominent benchmarks for evaluating the syntactic capacities of LMs may not be sufficiently rigorous. In particular, we show that the template-based benchmarks lack the structural diversity commonly found in the theoretical and psychological studies of language. When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models. We advocate for the use of the readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar specifically. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.

Submitted 30 October, 2023; originally announced October 2023.

Comments: To appear in the GenBench 2023 workshop proceedings, the first workshop on (benchmarking) generalisation in NLP. GenBench 2023 will be held at EMNLP 2023 on December 6, 2023

arXiv:2310.17408

Tackling the Matrix Multiplication Micro-kernel Generation with Exo

Authors: Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez

Abstract: Optimizing matrix multiplication (GEMM) has been a persistent need over the last decades. This operation is considered the flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its widespread use in a large variety of scientific applications. The GEMM is usually implemented following the GotoBLAS philosophy, which tiles the GEMM operands and uses a series of nested loops for performance improvement. These approaches extract the maximum computational power of the architectures through small pieces of hardware-oriented, high-performance code called micro-kernels. However, this approach forces developers to generate, with a non-negligible effort, a dedicated micro-kernel for each new hardware target. In this work, we present a step-by-step procedure for generating micro-kernels with the Exo compiler that perform close to (or even better than) manually developed micro-kernels written with intrinsic functions or assembly language. Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.

Submitted 27 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 11 pages, 18 figures. Presented at CGO 2024. It includes a software artifact step-by-step execution

arXiv:2304.14480

Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

Authors: Héctor Martínez, Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

Abstract: This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate a flexible development of architecture-specific micro-kernels that allow us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multi-core processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism versus cache usage.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2206.00343

Towards view-invariant vehicle speed detection from driving simulator images

Authors: Antonio Hernández Martínez, David Fernandez Llorca, Iván García Daza

Abstract: The use of cameras for vehicle speed measurement is much more cost effective compared to other technologies such as inductive loops, radar or laser. However, accurate speed measurement remains a challenge due to the inherent limitations of cameras to provide accurate range estimates. In addition, classical vision-based methods are very sensitive to extrinsic calibration between the camera and the road. In this context, the use of data-driven approaches appears as an interesting alternative. However, data collection requires a complex and costly setup to record videos under real traffic conditions from the camera synchronized with a high-precision speed sensor to generate the ground truth speed values. It has recently been demonstrated that the use of driving simulators (e.g., CARLA) can serve as a robust alternative for generating large synthetic datasets to enable the application of deep learning techniques for vehicle speed estimation for a single camera. In this paper, we study the same problem using multiple cameras in different virtual locations and with different extrinsic parameters. We address the question of whether complex 3D-CNN architectures are capable of implicitly learning view-invariant speeds using a single model, or whether view-specific models are more appropriate. The results are very promising as they show that a single model with data from multiple views reports even better accuracy than camera-specific models, paving the way towards a view-invariant vehicle speed measurement system.

Submitted 28 September, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022)

arXiv:2104.09903

Data-driven vehicle speed detection from synthetic driving simulator images

Authors: Antonio Hernández Martínez, Javier Lorenzo Díaz, Iván García Daza, David Fernández Llorca

Abstract: Despite all the challenges and limitations, vision-based vehicle speed detection is gaining research interest due to its great potential benefits such as cost reduction, and enhanced additional functions. As stated in a recent survey [1], the use of learning-based approaches to address this problem is still in its infancy. One of the main difficulties is the need for a large amount of data, which must contain the input sequences and, more importantly, the output values corresponding to the actual speed of the vehicles. Data collection in this context requires a complex and costly setup to capture the images from the camera synchronized with a high precision speed sensor to generate the ground truth speed values. In this paper we explore, for the first time, the use of synthetic images generated from a driving simulator (e.g., CARLA) to address vehicle speed detection using a learning-based approach. We simulate a virtual camera placed over a stretch of road, and generate thousands of images with variability corresponding to multiple speeds, different vehicle types and colors, and lighting and weather conditions. Two different approaches to map the sequence of images to an output speed (regression) are studied, including CNN-GRU and 3D-CNN. We present preliminary results that support the high potential of this approach to address vehicle speed detection.

Submitted 20 April, 2021; originally announced April 2021.

Comments: Submitted to the IEEE Intelligent Transportation Systems Conference 2021 (ITSC2021)

arXiv:2104.05711

doi 10.1038/s41467-022-28810-x

The world-wide waste web

Authors: Johann H. Martínez, Sergi Romero, José J. Ramasco, Ernesto Estrada

Abstract: Countries globally trade with tons of waste materials every year, some of which are highly hazardous. This trade admits a network representation of the world-wide waste web, with countries as vertices and flows as directed weighted edges. Here we investigate the main properties of this network by tracking 108 categories of wastes interchanged in the period 2001-2019. Although most of the hazardous waste was traded between developed nations, a disproportionate asymmetry existed in the flow from developed to developing countries. Using a dynamical model, we simulate how waste stress propagates through the network and affects the countries. We identify 28 countries with low Environmental Performance Index that are at high risk of waste congestion. Therefore, they are under threat of improper handling and disposal of hazardous waste. We find evidence of pollution by heavy metals, by volatile organic compounds and/or by persistent organic pollutants, which are used as chemical fingerprints, due to the improper handling of waste in several of these countries.

Submitted 14 March, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Nat Commun (2022). Main manuscript, and supplementary information. Total of 15 figures and 58 pages

arXiv:2103.11715

Transforming Exploratory Creativity with DeLeNoX

Authors: Antonios Liapis, Hector P. Martinez, Julian Togelius, Georgios N. Yannakakis

Abstract: We introduce DeLeNoX (Deep Learning Novelty Explorer), a system that autonomously creates artifacts in constrained spaces according to its own evolving interestingness criterion. DeLeNoX proceeds in alternating phases of exploration and transformation. In the exploration phases, a version of novelty search augmented with constraint handling searches for maximally diverse artifacts using a given distance function. In the transformation phases, a deep learning autoencoder learns to compress the variation between the found artifacts into a lower-dimensional space. The newly trained encoder is then used as the basis for a new distance function, transforming the criteria for the next exploration phase. In the current paper, we apply DeLeNoX to the creation of spaceships suitable for use in two-dimensional arcade-style computer games, a representative problem in procedural content generation in games. We also situate DeLeNoX in relation to the distinction between exploratory and transformational creativity, and in relation to Schmidhuber's theory of creativity through the drive for compression progress.

Submitted 22 March, 2021; originally announced March 2021.

Comments: 8 pages

Journal ref: Proceedings of the Fourth International Conference on Computational Creativity, 2013, pages 56-63

arXiv:2101.06159

doi 10.1049/itr2.12079

Vision-based Vehicle Speed Estimation: A Survey

Authors: David Fernández Llorca, Antonio Hernández Martínez, Iván García Daza

Abstract: The need to accurately estimate the speed of road vehicles is becoming increasingly important for at least two main reasons. First, the number of speed cameras installed worldwide has been growing in recent years, as the introduction and enforcement of appropriate speed limits is considered one of the most effective means to increase road safety. Second, traffic monitoring and forecasting in road networks plays a fundamental role in improving traffic, emissions and energy consumption in smart cities, with vehicle speed being one of the most relevant parameters of the traffic state. Among the technologies available for the accurate detection of vehicle speed, the use of vision-based systems brings great challenges to be solved, but also great potential advantages, such as the drastic reduction of costs due to the absence of expensive range sensors, and the possibility of identifying vehicles accurately. This paper provides a review of vision-based vehicle speed estimation. We describe the terminology and the application domains, and propose a complete taxonomy of a large selection of works that categorizes all stages involved. An overview of performance evaluation metrics and available datasets is provided. Finally, we discuss current limitations and future directions.

Submitted 26 May, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: Manuscript published in the IET Intelligent Transport Systems journal

Journal ref: IET Intelligent Transport Systems 2021

arXiv:2003.11100

How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric

Authors: Helard Martinez, Andrew Hines, Mylene C. Q. Farias

Abstract: The development of audio-visual quality assessment models poses a number of challenges in order to obtain accurate predictions. One of these challenges is the modelling of the complex interaction that audio and visual stimuli have and how this interaction is interpreted by human users. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) deals with this problem from a machine learning perspective. The metric receives two sets of audio and video features descriptors and produces a low-dimensional set of features used to predict the audio-visual quality. A basic implementation of NAViDAd was able to produce accurate predictions tested with a range of different audio-visual databases. The current work performs an ablation study on the base architecture of the metric. Several modules are removed or re-trained using different configurations to have a better understanding of the metric functionality. The results presented in this study provided important feedback that allows us to understand the real capacity of the metric's architecture and eventually develop a much better audio-visual quality metric.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2001.11406

doi 10.23919/EUSIPCO.2019.8902975

NAViDAd: A No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder

Authors: Helard Martinez, M. C. Farias, A. Hines

Abstract: The development of models for quality prediction of both audio and video signals is a fairly mature field. But, although several multimodal models have been proposed, the area of audio-visual quality prediction is still an emerging area. In fact, despite the reasonable performance obtained by combination and parametric metrics, currently there is no reliable pixel-based audio-visual quality metric. The approach presented in this work is based on the assumption that autoencoders, fed with descriptive audio and video features, might produce a set of features that is able to describe the complex audio and video interactions. Based on this hypothesis, we propose a No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd). The model visual features are natural scene statistics (NSS) and spatial-temporal measures of the video component. Meanwhile, the audio features are obtained by computing the spectrogram representation of the audio component. The model is formed by a 2-layer framework that includes a deep autoencoder layer and a classification layer. These two layers are stacked and trained to build the deep neural network model. The model is trained and tested using a large set of stimuli, containing representative audio and video artifacts. The model performed well when tested against the UnB-AV and the LiveNetflix-II databases. Results show that this type of approach produces quality scores that are highly correlated with subjective quality scores.

Submitted 4 February, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

Comments: 5 pages

Journal ref: 2019 27th European Signal Processing Conference (EUSIPCO), IEEE, 2019, pp 1-5

arXiv:1804.07533 [pdf, other]

In defence of the simple: Euclidean distance for comparing complex networks

Authors: Johann H. Martínez, Mario Chavez

Abstract: To improve our understanding of connected systems, different tools derived from statistics, signal processing, information theory and statistical physics have been developed in the last decade. Here, we will focus on the graph comparison problem. Although different estimates exist to quantify how different two networks are, an appropriate metric has not been proposed. Within this framework we compare the performance of different network distances (a topological descriptor and a kernel-based approach) with the simple Euclidean metric. We define the performance of a metric as its efficiency in distinguishing two groups of networks and its computing time. We evaluate these frameworks on synthetic and real-world networks (functional connectomes from Alzheimer patients and healthy subjects), and we show that the Euclidean distance is the one that efficiently captures network differences in comparison to other proposals. We conclude that the operational use of complicated methods can be justified only by showing that they outperform well-understood traditional statistics, such as Euclidean metrics.

Submitted 20 April, 2018; originally announced April 2018.

Comments: 4 pages, 3 figures

MSC Class: 68R10; 90C35; 94C15

arXiv:1506.01709 [pdf, ps, other]

The Preference Learning Toolbox

Authors: Vincent E. Farrugia, Héctor P. Martínez, Georgios N. Yannakakis

Abstract: Preference learning (PL) is a core area of machine learning that handles datasets with ordinal relations. As the amount of generated data of an ordinal nature increases, the importance and role of the PL field become central within machine learning research and practice. This paper introduces an open source, scalable, efficient and accessible preference learning toolbox that supports the key phases of the data training process, incorporating various popular data preprocessing, feature selection and preference learning methods.

Submitted 4 June, 2015; originally announced June 2015.

arXiv:1412.6395 [pdf, other]

SClib, a hack for straightforward embedded C functions in Python

Authors: Esteban Fuentes, Hector E. Martinez

Abstract: We present SClib, a simple hack that allows easy and straightforward evaluation of C functions within Python code, boosting flexibility for a better trade-off between computation power and feature availability, such as visualization and existing computation routines in SciPy. We also present two cases where SClib has been used. In the first set of applications we use SClib to write a port to Python of a Schrödinger equation solver that has been extensively used in the literature; the resulting script presents a speed-up of about 150x with respect to the original one. A review of the situations where the sped-up script has been used is presented. We also describe the solution to the related problem of solving a set of coupled Schrödinger-like equations, where SClib is used to implement the speed-critical parts of the code. We argue that when using SClib within IPython we can use NumPy and Matplotlib for the manipulation and visualization of the solutions in an interactive environment with no performance compromise. The second case is an engineering application. We use SClib to evaluate the control and system derivatives in a feedback control loop for electrical motors. With this and the integration routines available in SciPy, we can run simulations of the control loop à la Simulink. The use of C code not only boosts the speed of the simulations, but also enables us to test the exact same code that we use in the test rig to get experimental results. Again, integration with IPython gives us the flexibility to analyze and visualize the data.

Submitted 19 December, 2014; originally announced December 2014.

Comments: Part of the Proceedings of the 7th European Conference on Python in Science (EuroSciPy 2014), Pierre de Buyl and Nelle Varoquaux editors, (2014)

Report number: euroscipy-proceedings2014-08

arXiv:1304.0681 [pdf, other]

Concurrent and Accurate RNA Sequencing on Multicore Platforms

Authors: Héctor Martínez, Joaquín Tárraga, Ignacio Medina, Sergio Barrachina, Maribel Castillo, Joaquín Dopazo, Enrique S. Quintana-Ortí

Abstract: In this paper we introduce a novel parallel pipeline for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, named HPG-Aligner, leverages the speed of the Burrows-Wheeler Transform to map a large number of RNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is employed to deal with conflictive reads. The aligner is complemented with a careful strategy to detect splice junctions based on the division of RNA reads into short segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing useful information for the successful alignment of the complete reads. Experimental results on platforms with AMD and Intel multicore processors report the remarkable parallel performance of HPG-Aligner, on short and long RNA reads, which excels in both execution time and sensitivity compared with a state-of-the-art aligner such as TopHat 2 built on top of Bowtie and Bowtie 2.

Submitted 2 April, 2013; originally announced April 2013.

Report number: UJI ICC 2013-03-01

ACM Class: D.1.3; J.3

Showing 1–20 of 20 results for author: Martínez, H