-
Electrifying Heavy-Duty Trucks: Battery-Swapping vs Fast Charging
Authors:
Ruiting Wang,
Antoine Martinez,
Zaid Allybokus,
Wente Zeng,
Nicolas Obrecht,
Scott Moura
Abstract:
The advantages and disadvantages of Battery Swapping Stations (BSS) for heavy-duty trucks are poorly understood, relative to Fast Charging Stations (FCS) systems. This study evaluates these two charging mechanisms for electric heavy-duty trucks, aiming to compare the systems' efficiency and identify the optimal design for each option. A model was developed to address the planning and operation of BSS in a charging network, considering in-station batteries as assets for various services. We assess performance metrics including transportation efficiency and battery utilization efficiency. Our evaluation reveals that BSS significantly increased transportation efficiency by reducing vehicle downtime compared to fast charging, but may require more batteries. BSS with medium-sized batteries offers improved transportation efficiency in terms of time and labor. FCS-reliant trucks require larger batteries to compensate for extended charging times. To understand the trade-off between these two metrics, a cost-benefit analysis was performed under different scenarios involving potential shifts in battery prices and labor costs. Additionally, BSS shows potential for significant $\text{CO}_2$ emission reductions and increased profitability through energy arbitrage and grid ancillary services. These findings emphasize the importance of integrating BSS into future electric truck charging networks and adopting carbon-aware operational frameworks.
Submitted 11 March, 2025;
originally announced March 2025.
-
Computer-aided shape features extraction and regression models for predicting the ascending aortic aneurysm growth rate
Authors:
Leonardo Geronzi,
Antonio Martinez,
Michel Rochette,
Kexin Yan,
Aline Bel-Brunon,
Pascal Haigron,
Pierre Escrig,
Jacques Tomasi,
Morgan Daniel,
Alain Lalande,
Siyu Lin,
Diana Marcela Marin-Castrillon,
Olivier Bouchot,
Jean Porterie,
Pier Paolo Valentini,
Marco Evangelos Biancolini
Abstract:
Objective: ascending aortic aneurysm growth prediction is still challenging in clinics. In this study, we evaluate and compare the ability of local and global shape features to predict ascending aortic aneurysm growth.
Material and methods: 70 patients with aneurysm, for which two 3D acquisitions were available, are included. Following segmentation, three local shape features are computed: (1) the ratio between maximum diameter and length of the ascending aorta centerline, (2) the ratio between the length of external and internal lines on the ascending aorta and (3) the tortuosity of the ascending tract. By exploiting longitudinal data, the aneurysm growth rate is derived. Using radial basis function mesh morphing, iso-topological surface meshes are created. Statistical shape analysis is performed through unsupervised principal component analysis (PCA) and supervised partial least squares (PLS). Two types of global shape features are identified: three PCA-derived and three PLS-based shape modes. Three regression models are set for growth prediction: two based on Gaussian support vector machines using local and PCA-derived global shape features; the third is a PLS linear regression model based on the related global shape features. The prediction results are assessed and the aortic shapes most prone to growth are identified.
Results: the prediction root mean square error from leave-one-out cross-validation is: 0.112 mm/month, 0.083 mm/month and 0.066 mm/month for local, PCA-based and PLS-derived shape features, respectively. Aneurysms close to the root with a large initial diameter report faster growth.
Conclusion: global shape features might provide an important contribution for predicting the aneurysm growth.
Submitted 4 March, 2025;
originally announced March 2025.
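The leave-one-out root mean square error reported in the abstract above can be illustrated with a minimal sketch. It assumes a single feature and an ordinary least-squares line as a stand-in for the paper's support vector machine and PLS regressors; the function names and the toy data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def loo_rmse(xs, ys):
    """Leave-one-out RMSE: refit without sample i, then predict sample i."""
    sq_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        sq_errors.append((a * xs[i] + b - ys[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical toy data: one shape feature vs. growth rate (mm/month).
feature = [0.10, 0.20, 0.30, 0.40, 0.50]
growth = [0.02, 0.04, 0.06, 0.08, 0.10]
print(loo_rmse(feature, growth))
```

Each sample is predicted by a model that never saw it, which is why this estimate suits small cohorts such as the 70 patients mentioned above.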
-
A Digital Beamforming Receiver Architecture Implemented on a FPGA for Space Applications
Authors:
Eduardo Ortega,
Agustín Martínez,
Antonio Oliva,
Fernando Sanz,
Oscar Rodríguez,
Manuel Prieto,
Pablo Parra,
Antonio Da Silva,
Sebastián Sánchez
Abstract:
The burgeoning interest within the space community in digital beamforming is largely attributable to the superior flexibility that satellites with active antenna systems offer for a wide range of applications, notably in communication services. This paper delves into the analysis and practical implementation of a Digital Beamforming and Digital Down Conversion (DDC) chain, leveraging a high-speed Analog-to-Digital Converter (ADC) certified for space applications alongside a high-performance Field-Programmable Gate Array (FPGA). The proposed design strategy focuses on optimizing resource efficiency and minimizing power consumption by strategically sequencing the beamformer processor ahead of the complex down-conversion operation. This innovative approach entails the application of demodulation and low-pass filtering exclusively to the aggregated beam channel, culminating in a marked reduction in the requisite digital signal processing resources relative to traditional, more resource-intensive digital beamforming and DDC architectures. In the experimental validation, an evaluation board integrating a high-speed ADC and a FPGA was utilized. This setup facilitated the empirical validation of the design's efficacy by applying various RF input signals to the digital beamforming receiver system. The ADC employed is capable of high-resolution signal processing, while the FPGA provides the necessary computational flexibility and speed for real-time digital signal processing tasks. The findings underscore the potential of this design to significantly enhance the efficiency and performance of digital beamforming systems in space applications.
Submitted 29 May, 2024;
originally announced May 2024.
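The resource saving described in the abstract above rests on linearity: mixing and low-pass filtering commute with a fixed weighted sum across channels, so one down-conversion chain on the beam can replace one chain per antenna. The sketch below checks that equivalence numerically. It is not the paper's FPGA design; the boxcar filter, the complex weights, the 4-element array and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def mix_down(samples, freq):
    """Complex mixer: multiply by exp(-j*2*pi*freq*n), freq in cycles per sample."""
    return [s * cmath.exp(-2j * math.pi * freq * n) for n, s in enumerate(samples)]

def moving_average(samples, taps):
    """Crude boxcar low-pass FIR filter, standing in for a real filter design."""
    out = []
    for n in range(len(samples)):
        window = samples[max(0, n - taps + 1): n + 1]
        out.append(sum(window) / taps)
    return out

def beamform(channels, weights):
    """Weighted sum across antenna channels, sample by sample."""
    return [sum(w * ch[n] for w, ch in zip(weights, channels))
            for n in range(len(channels[0]))]

def ddc(samples, freq, taps):
    """Digital down-conversion: mixer followed by the low-pass filter."""
    return moving_average(mix_down(samples, freq), taps)

# Hypothetical 4-element array receiving one tone with a per-element phase offset.
freq, taps, n_samples = 0.1, 4, 64
channels = [[math.cos(2 * math.pi * freq * n + 0.3 * k) for n in range(n_samples)]
            for k in range(4)]
weights = [cmath.exp(-0.3j * k) for k in range(4)]

# Conventional ordering: one DDC per channel, then beamform.
ddc_first = beamform([ddc(ch, freq, taps) for ch in channels], weights)
# Ordering described above: beamform first, then a single DDC on the beam.
beam_first = ddc(beamform(channels, weights), freq, taps)

max_diff = max(abs(a - b) for a, b in zip(ddc_first, beam_first))
print(max_diff)
```

Both orderings give the same beam output up to rounding, while the second needs one mixer and one filter instead of four.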
-
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Authors:
Gwanghyun Kim,
Alonso Martinez,
Yu-Chuan Su,
Brendan Jou,
José Lezama,
Agrim Gupta,
Lijun Yu,
Lu Jiang,
Aren Jansen,
Jacob Walker,
Krishna Somandepalli
Abstract:
Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input. Project page: avdit2024.github.io
Submitted 22 May, 2024;
originally announced May 2024.
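The idea of variable diffusion timesteps across time and modalities, as described in the abstract above, can be sketched as a forward-noising step that takes one timestep per element rather than one shared timestep. This is a toy illustration only: the cosine-style schedule, the scalar "latents", the choice to keep audio at timestep 0 as a clean condition, and every name below are assumptions, not the paper's model or training code.

```python
import math
import random

def alpha_bar(t, num_steps):
    """Cumulative signal level in [0, 1]; a cosine-style schedule assumed here."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2

def add_noise(x0, timesteps, num_steps, rng):
    """Forward diffusion with one timestep per element instead of a shared one."""
    noisy = []
    for value, t in zip(x0, timesteps):
        a = alpha_bar(t, num_steps)
        noise = rng.gauss(0.0, 1.0)
        noisy.append(math.sqrt(a) * value + math.sqrt(1.0 - a) * noise)
    return noisy

def sample_timesteps(n_frames, num_steps, rng, per_frame):
    """Either an independent timestep per frame or one shared by the whole modality."""
    if per_frame:
        return [rng.randint(0, num_steps) for _ in range(n_frames)]
    return [rng.randint(0, num_steps)] * n_frames

rng = random.Random(0)
num_steps = 1000
video_latents = [0.5, -0.2, 0.1, 0.9]   # hypothetical: one scalar latent per frame
audio_latents = [0.3, 0.3, -0.7, 0.0]

# One possible mix: audio stays clean (t = 0), video gets a noise level per frame.
audio_t = [0] * len(audio_latents)
video_t = sample_timesteps(len(video_latents), num_steps, rng, per_frame=True)
noisy_audio = add_noise(audio_latents, audio_t, num_steps, rng)
noisy_video = add_noise(video_latents, video_t, num_steps, rng)
```

With timestep 0 the signal level is 1 and the noise term vanishes, so the audio passes through unchanged and can act as the conditioning input while the video is denoised.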
-
Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs
Authors:
Abrar Majeedi,
Babak Naderi,
Yasaman Hosseinkashi,
Juhee Cho,
Ruben Alvarez Martinez,
Ross Cutler
Abstract:
Machine learning-based video codecs have made significant progress in the past few years. A critical area in the development of ML-based video codecs is an accurate evaluation metric that does not require an expensive and slow subjective test. We show that existing evaluation metrics that were designed and trained on DSP-based video codecs are not highly correlated to subjective opinion when used with ML video codecs, because the video artifacts differ considerably between ML-based and DSP-based video codecs. We provide a new dataset of ML video codec videos that have been accurately labeled for quality. We also propose a new full reference video quality assessment (FRVQA) model that achieves a Pearson Correlation Coefficient (PCC) of 0.99 and a Spearman's Rank Correlation Coefficient (SRCC) of 0.99 at the model level. We make the dataset and FRVQA model open source to help accelerate research in ML video codecs, and so that others can further improve the FRVQA model.
Submitted 1 September, 2023;
originally announced September 2023.
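The two agreement measures quoted in the abstract above, PCC and SRCC, can be computed as in the following minimal sketch. The score lists are hypothetical placeholders, not data from the paper, and the helper names are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (PCC): linear agreement."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical per-codec scores: metric prediction vs. subjective opinion score.
predicted = [1.2, 2.1, 2.9, 4.2, 4.6]
subjective = [1.0, 2.5, 2.8, 3.9, 4.8]
print(pearson(predicted, subjective), spearman(predicted, subjective))
```

PCC rewards a linear relationship between predicted and subjective scores, while SRCC only checks that the two orderings agree, so reporting both separates calibration from ranking quality.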
-
Structural Similarity: When to Use Deep Generative Models on Imbalanced Image Dataset Augmentation
Authors:
Chenqi Guo,
Fabian Benitez-Quiroz,
Qianli Feng,
Aleix Martinez
Abstract:
Improving performance on an imbalanced training set is one of the main challenges in machine learning today. One way to augment and thus re-balance an image dataset is to use existing deep generative models, such as class-conditional Generative Adversarial Networks (cGAN) or Diffusion Models, to synthesize images for each tail class. Our experiments on imbalanced image dataset classification show that the validation accuracy improvement obtained with such a re-balancing method is related to the image similarity between different classes. Thus, to quantify this image dataset class similarity, we propose a measurement called Super-Sub Class Structural Similarity (SSIM-supSubCls) based on Structural Similarity (SSIM). A deep generative model data augmentation classification (GM-augCls) pipeline is also provided to verify that this metric correlates with the accuracy enhancement. We further quantify the relationship between them, discovering that the accuracy improvement decays exponentially with respect to SSIM-supSubCls values.
Submitted 8 March, 2023;
originally announced March 2023.
-
Low-Complexity Loeffler DCT Approximations for Image and Video Coding
Authors:
D. F. G. Coelho,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake,
P. A. C. Martinez,
T. L. T. Silveira,
R. S. Oliveira,
V. S. Dimitrov
Abstract:
This paper introduced a matrix parametrization method based on the Loeffler discrete cosine transform (DCT) algorithm. As a result, a new class of eight-point DCT approximations was proposed, capable of unifying the mathematical formalism of several eight-point DCT approximations archived in the literature. Pareto-efficient DCT approximations are obtained through multicriteria optimization, where computational complexity, proximity, and coding performance are considered. Efficient approximations and their scaled 16- and 32-point versions are embedded into image and video encoders, including a JPEG-like codec and H.264/AVC and H.265/HEVC standards. Results are compared to the unmodified standard codecs. Efficient approximations are mapped and implemented on a Xilinx VLX240T FPGA and evaluated for area, speed, and power consumption.
Submitted 28 July, 2022;
originally announced July 2022.
-
A Novel Approach for Cancellation of Non-Aligned Inter Spreading Factor Interference in LoRa Systems
Authors:
Qiaohan Zhang,
Ivo Bizon,
Atul Kumar,
Ana Belen Martinez,
Marwa Chafii,
Gerhard Fettweis
Abstract:
Long Range (LoRa) has become a key enabler technology for low power wide area networks. However, due to its ALOHA-based medium access scheme, LoRa has to cope with collisions that limit the capacity and network scalability. Collisions between randomly overlapped signals modulated with different spreading factors (SFs) result in inter-SF interference, which increases the packet loss likelihood when signal-to-interference ratio (SIR) is low. This issue cannot be resolved by channel coding since the probability of error distance is not concentrated around the adjacent symbol. In this paper, we analytically model this interference, and propose an interference cancellation method based on the idea of segmentation of the received signal. This scheme has three steps. First, the SF of the interference signal is identified, then the equivalent data symbol and complex amplitude of the interference are estimated. Finally, the estimated interference signal is subtracted from the received signal before demodulation. Unlike conventional serial interference cancellation (SIC), this scheme can directly estimate and reconstruct the non-aligned inter-SF interference without synchronization. Simulation results show that the proposed method can significantly reduce the symbol error rate (SER) under low SIR compared with the conventional demodulation. Moreover, it also shows high robustness to fractional sample timing offset (STO) and carrier frequency offset (CFO) of interference. The presented results clearly show the effectiveness of the proposed method in terms of the SER performance.
Submitted 14 April, 2022;
originally announced April 2022.
-
Prediction of speech intelligibility with DNN-based performance measures
Authors:
Angel Mario Castro Martinez,
Constantin Spille,
Jana Roßbach,
Birger Kollmeier,
Bernd T. Meyer
Abstract:
This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model does not require the clean speech reference nor the word labels during testing as the ASR decoding step, which finds the most likely sequence of words given phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds from eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is compared to five established models and an ASR-model using word labels. Two combinations of features and networks were tested. Both include temporal information either at the feature level (amplitude modulation filterbanks and a feed-forward network) or captured by the architecture (mel-spectrograms and a time-delay deep neural network, TDNN). The TDNN model is on par with the DNN while reducing the number of parameters by a factor of 37; this optimization allows parallel streams on dedicated hearing aid hardware as a forward-pass can be computed within the 10ms of each frame. The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
Submitted 17 March, 2022;
originally announced March 2022.
-
Differentiable Signal Processing With Black-Box Audio Effects
Authors:
Marco A. Martínez Ramírez,
Oliver Wang,
Paris Smaragdis,
Nicholas J. Bryan
Abstract:
We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect parameters to perform the desired signal manipulation, requiring only input-target paired audio data as supervision. To train our network with non-differentiable black-box effects layers, we use a fast, parallel stochastic gradient approximation scheme within a standard automatic differentiation graph, yielding efficient end-to-end backpropagation. We demonstrate the power of our approach with three separate automatic audio production applications: tube amplifier emulation, automatic removal of breaths and pops from voice recordings, and automatic music mastering. We validate our results with a subjective listening test, showing that our approach not only can enable new automatic audio effects tasks, but can also yield results comparable to a specialized, state-of-the-art commercial solution for music mastering.
Submitted 10 May, 2021;
originally announced May 2021.
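The abstract above does not name its stochastic gradient approximation scheme. One well-known scheme of that general kind is simultaneous perturbation stochastic approximation (SPSA), which estimates a gradient from two evaluations of a black box regardless of how many parameters it has. The sketch below shows SPSA on a toy loss; the loss function, step sizes and names are hypothetical stand-ins, not the authors' implementation.

```python
import random

def spsa_gradient(black_box, params, step, rng):
    """Estimate the gradient of black_box at params from two evaluations."""
    # Perturb every parameter at once by +/- step in a random direction.
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = black_box([p + step * d for p, d in zip(params, delta)])
    minus = black_box([p - step * d for p, d in zip(params, delta)])
    return [(plus - minus) / (2.0 * step * d) for d in delta]

def toy_effect_loss(params):
    """Hypothetical stand-in for 'run the effect, compare with the target audio'."""
    target = [0.5, -1.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))

rng = random.Random(0)
params = [0.0, 0.0]
for _ in range(500):
    grad = spsa_gradient(toy_effect_loss, params, step=0.01, rng=rng)
    # Plain gradient descent using the estimated gradient.
    params = [p - 0.05 * g for p, g in zip(params, grad)]
print(params)
```

The two evaluations are independent of each other, so they can run in parallel, which matters when each call means rendering audio through an external effect.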
-
Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning
Authors:
Alejandro R. Martinez
Abstract:
Since December of 2019, novel coronavirus disease COVID-19 has spread around the world, infecting millions of people and upending the global economy. One of the driving reasons behind its high rate of infection is the unreliability and lack of RT-PCR testing. At times the turnaround for results spans as long as a couple of days, only to yield a roughly 70% sensitivity rate. As an alternative, recent research has investigated the use of Computer Vision with Convolutional Neural Networks (CNNs) for the classification of COVID-19 from CT scans. Due to an inherent lack of available COVID-19 CT data, these research efforts have been forced to leverage Transfer Learning. This commonly employed Deep Learning technique has been shown to improve model performance on tasks with relatively small amounts of data, as long as the Source feature space somewhat resembles the Target feature space. Unfortunately, a lack of similarity is often encountered in the classification of medical images, as publicly available Source datasets usually lack the visual features found in medical images. In this study, we propose the use of Multi-Source Transfer Learning (MSTL) to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans. With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet. We additionally propose an unsupervised label creation process, which enhances the performance of our Deep Residual Networks. Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
Submitted 22 September, 2020;
originally announced September 2020.
-
Modeling plate and spring reverberation using a DSP-informed deep neural network
Authors:
Marco A. Martínez Ramírez,
Emmanouil Benetos,
Joshua D. Reiss
Abstract:
Plate and spring reverberators are electromechanical systems first used and researched as means to substitute real room reverberation. Nowadays they are often used in music production for aesthetic reasons due to their particular sonic characteristics. The modeling of these audio processors and their perceptual qualities is difficult since they use mechanical elements together with analog electronics resulting in an extremely complex response. Based on digital reverberators that use sparse FIR filters, we propose a signal processing-informed deep learning architecture for the modeling of artificial reverberators. We explore the capabilities of deep neural networks to learn such highly nonlinear electromechanical responses and we perform modeling of plate and spring reverberators. In order to measure the performance of the model, we conduct a perceptual evaluation experiment and we also analyze how the given task is accomplished and what the model is actually learning.
Submitted 17 April, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Hopfield Learning-based and Nonlinear Programming methods for Resource Allocation in OCDMA Networks
Authors:
Cristiane A. Pendeza Martinez,
Taufik Abrão,
Fábio Renan Durand,
Alessandro Goedtel
Abstract:
This paper proposes the deployment of Hopfield's artificial neural network (H-NN) approach to optimally assign power in optical code division multiple access (OCDMA) systems. Figures of merit such as feasibility of solutions and complexity are compared with the classical power allocation methods found in the literature, such as Sequential Quadratic Programming (SQP) and Augmented Lagrangian Method (ALM). The analyzed methods are used to solve constrained nonlinear optimization problems in the context of resource allocation for optical networks, especially to deal with the energy efficiency (EE) in OCDMA networks. The promising performance-complexity tradeoff of the modified H-NN is demonstrated through numerical results obtained in comparison with classic methods for general problems in nonlinear programming. The evaluation is carried out on challenging OCDMA networks in which different levels of QoS are considered for large numbers of optical users.
Submitted 4 September, 2019; v1 submitted 27 August, 2019;
originally announced August 2019.
-
Distance Map Loss Penalty Term for Semantic Segmentation
Authors:
Francesco Caliva,
Claudia Iriondo,
Alejandro Morales Martinez,
Sharmila Majumdar,
Valentina Pedoia
Abstract:
Convolutional neural networks for semantic segmentation suffer from low performance at object boundaries. In medical imaging, accurate representation of tissue surfaces and volumes is important for tracking of disease biomarkers such as tissue morphology and shape features. In this work, we propose a novel distance map derived loss penalty term for semantic segmentation. We propose to use distance maps, derived from ground truth masks, to create a penalty term, guiding the network's focus towards hard-to-segment boundary regions. We investigate the effects of this penalizing factor against cross-entropy, Dice, and focal loss, among others, evaluating performance on a 3D MRI bone segmentation task from the publicly available Osteoarthritis Initiative dataset. We observe a significant improvement in the quality of segmentation, with better shape preservation at bone boundaries and areas affected by partial volume. We ultimately aim to use our loss penalty term to improve the extraction of shape biomarkers and derive metrics to quantitatively evaluate the preservation of shape.
Submitted 9 August, 2019;
originally announced August 2019.
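The idea in the abstract above, using a distance map derived from the ground-truth mask to weight the loss toward boundary regions, can be sketched in one dimension. The abstract does not give the exact penalty formula, so the form used here (1 + strength / distance) is an assumption, as are the function names and the toy mask; the paper works on 3D MRI volumes.

```python
import math

def distance_to_boundary(mask):
    """Distance (in samples) from each position to the nearest label change."""
    edges = [i for i in range(1, len(mask)) if mask[i] != mask[i - 1]]
    if not edges:
        return [float(len(mask))] * len(mask)
    # An edge between positions i-1 and i is placed at i - 0.5.
    return [min(abs(pos - (e - 0.5)) for e in edges) for pos in range(len(mask))]

def penalty_weights(mask, strength=1.0):
    """Assumed penalty form: 1 + strength / distance, largest next to a boundary."""
    return [1.0 + strength / d for d in distance_to_boundary(mask)]

def weighted_cross_entropy(probs, mask, weights):
    """Binary cross-entropy with each position scaled by its penalty weight."""
    eps = 1e-12
    total = 0.0
    for p, y, w in zip(probs, mask, weights):
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(mask)

mask = [0, 0, 0, 1, 1, 1, 0, 0]   # hypothetical 1D ground-truth labels
weights = penalty_weights(mask)

# The same mistake costs more next to a boundary than far away from it.
perfect = [float(y) for y in mask]
near = perfect[:]
near[2] = 0.5                     # uncertain prediction right at a boundary
far = perfect[:]
far[0] = 0.5                      # same uncertainty far from any boundary
loss_near = weighted_cross_entropy(near, mask, weights)
loss_far = weighted_cross_entropy(far, mask, weights)
print(loss_near, loss_far)
```

Because the weights depend only on the ground-truth mask, they can be precomputed once per training sample and multiplied into any per-voxel loss.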
-
A general-purpose deep learning approach to model time-varying audio effects
Authors:
Marco A. Martínez Ramírez,
Emmanouil Benetos,
Joshua D. Reiss
Abstract:
Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling these types of effect units are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of deep neural networks to learn such long temporal dependencies and we show the network modeling various linear and nonlinear, time-varying and time-invariant audio effects. In order to measure the performance of the model, we propose an objective metric based on the psychoacoustics of modulation frequency perception. We also analyze what the model is actually learning and how the given task is accomplished.
Submitted 21 June, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Modeling of nonlinear audio effects with end-to-end deep neural networks
Authors:
Marco A. Martínez Ramirez,
Joshua D. Reiss
Abstract:
In the context of music production, distortion effects are mainly used for aesthetic reasons and are usually applied to electric musical instruments. Most existing methods for nonlinear modeling are often either simplified or optimized to a very specific circuit. In this work, we investigate deep learning architectures for audio processing and we aim to find a general purpose end-to-end deep neural network to perform modeling of nonlinear audio effects. We show the network modeling various nonlinearities and we discuss the generalization capabilities among different instruments.
Submitted 6 March, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.