-
Control of Renewable Energy Communities using AI and Real-World Data
Authors:
Tiago Fonseca,
Clarisse Sousa,
Ricardo Venâncio,
Pedro Pires,
Ricardo Severino,
Paulo Rodrigues,
Pedro Paiva,
Luis Lino Ferreira
Abstract:
The electrification of transportation and the increased adoption of decentralized renewable energy generation have added complexity to managing Renewable Energy Communities (RECs). Integrating Electric Vehicle (EV) charging with building energy systems like heating, ventilation, and air conditioning (HVAC), photovoltaic (PV) generation, and battery storage presents significant opportunities but also practical challenges. Reinforcement learning (RL), particularly with Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithms, has shown promising results in simulation, outperforming heuristic control strategies. However, translating these successes into real-world deployments faces substantial challenges, including incomplete and noisy data, integration of heterogeneous subsystems, synchronization issues, unpredictable occupant behavior, and missing critical EV state-of-charge (SoC) information. This paper introduces a framework designed explicitly to handle these complexities and bridge the simulation-to-reality gap. The framework incorporates EnergAIze, a MADDPG-based multi-agent control strategy, and specifically addresses challenges related to real-world data collection, system integration, and user behavior modeling. Preliminary results collected from a real-world operational REC with four residential buildings demonstrate the practical feasibility of our approach, achieving an average 9% reduction in daily peak demand and a 5% decrease in energy costs through optimized load scheduling and EV charging behaviors. These outcomes underscore the framework's effectiveness, advancing the practical deployment of intelligent energy management solutions in RECs.
Submitted 22 May, 2025;
originally announced May 2025.
-
SAS: Segment Anything Small for Ultrasound -- A Non-Generative Data Augmentation Technique for Robust Deep Learning in Ultrasound Imaging
Authors:
Danielle L. Ferreira,
Ahana Gangopadhyay,
Hsi-Ming Chang,
Ravi Soni,
Gopal Avinash
Abstract:
Accurate segmentation of anatomical structures in ultrasound (US) images, particularly small ones, is challenging due to noise and variability in imaging conditions (e.g., probe position, patient anatomy, tissue characteristics and pathology). To address this, we introduce Segment Anything Small (SAS), a simple yet effective scale- and texture-aware data augmentation technique designed to enhance the performance of deep learning models for segmenting small anatomical structures in ultrasound images. SAS employs a dual transformation strategy: (1) simulating diverse organ scales by resizing and embedding organ thumbnails into a black background, and (2) injecting noise into regions of interest to simulate varying tissue textures. These transformations generate realistic and diverse training data without introducing hallucinations or artifacts, improving the model's robustness to noise and variability. We fine-tuned a promptable foundation model on a controlled organ-specific medical imaging dataset and evaluated its performance on one internal and five external datasets. Experimental results demonstrate significant improvements in segmentation performance, with Dice score gains of up to 0.35 and an average improvement of 0.16 [95% CI 0.132,0.188]. Additionally, our iterative point prompts provide precise control and adaptive refinement, achieving performance comparable to bounding box prompts with just two points. SAS enhances model robustness and generalizability across diverse anatomical structures and imaging conditions, particularly for small structures, without compromising the accuracy of larger ones. By offering a computationally efficient solution that eliminates the need for extensive human labeling efforts, SAS emerges as a powerful tool for advancing medical image analysis, particularly in resource-constrained settings.
Submitted 7 March, 2025;
originally announced March 2025.
-
SAMRI-2: A Memory-based Model for Cartilage and Meniscus Segmentation in 3D MRIs of the Knee Joint
Authors:
Danielle L. Ferreira,
Bruno A. A. Nunes,
Xuzhe Zhang,
Laura Carretero Gomez,
Maggie Fung,
Ravi Soni
Abstract:
Accurate morphometric assessment of cartilage (such as thickness and volume) via MRI is essential for monitoring knee osteoarthritis. Segmenting cartilage remains challenging and dependent on extensive expert-annotated datasets, which are heavily subject to inter-reader variability. Recent advancements in Visual Foundational Models (VFM), especially memory-based approaches, offer opportunities for improving generalizability and robustness. This study introduces a deep learning (DL) method for cartilage and meniscus segmentation from 3D MRIs using interactive, memory-based VFMs. To improve spatial awareness and convergence, we incorporated a Hybrid Shuffling Strategy (HSS) during training and applied a segmentation mask propagation technique to enhance annotation efficiency. We trained four AI models (a CNN-based 3D-VNet, two automatic transformer-based models, SaMRI2D and SaMRI3D, and a transformer-based promptable memory-based VFM, SAMRI-2) on 3D knee MRIs from 270 patients using public and internal datasets, and evaluated them on 57 external cases, including multi-radiologist annotations and different data acquisitions. Model performance was assessed against reference standards using Dice Score (DSC) and Intersection over Union (IoU), with additional morphometric evaluations to further quantify segmentation accuracy. The SAMRI-2 model, trained with HSS, outperformed all other models, achieving an average DSC improvement of 5 points, with a peak improvement of 12 points for tibial cartilage. It also demonstrated the lowest cartilage thickness errors, reducing discrepancies by up to threefold. Notably, SAMRI-2 maintained high performance with as few as three user clicks per volume, reducing annotation effort while ensuring anatomical precision. This memory-based VFM with spatial awareness offers a novel approach for reliable AI-assisted knee MRI segmentation, advancing DL in musculoskeletal imaging.
Submitted 14 February, 2025;
originally announced February 2025.
-
FlexiGen: Stochastic Dataset Generator for Electric Vehicle Charging Energy Flexibility
Authors:
Bernardo Cabral,
Tiago Fonseca,
Clarisse Sousa,
Luis Lino Ferreira
Abstract:
Electric vehicles (EVs) and renewable energy sources (RES) are vital components of sustainable energy systems, yet their uncoordinated integration can pose substantial challenges to grid stability, such as unmanaged peak loads and energy balance issues. Vehicle-to-Grid (V2G) offers a promising solution to address these challenges by enabling bidirectional energy flow between EVs and the grid. As such, EVs can be used in advanced Demand Response (DR) strategies to optimize energy use and mitigate the intermittency of renewable generation. To realize these advantages, optimization algorithms need data on EV energy flexibility, such as charging patterns and usage preferences. However, data collection remains constrained by challenges such as high costs, user engagement, data privacy concerns, and limited access to open-source datasets on EV energy flexibility. This paper presents FlexiGen, an open-source stochastic dataset generator tool designed to overcome the data limitations in EV flexibility for V2G and V1G DR applications. FlexiGen generates synthetic datasets encompassing realistic EV usage patterns, behaviours and flexibility scenarios for household and office routines. To generate these datasets, FlexiGen uses a series of configurable probabilistic variables, such as stochastic user routines, traffic conditions, charger types, car average electricity consumption and State of Charge (SoC). The generated datasets include an hourly routine with the EV State of Connection, Destination Charger, Estimated Departure Time, Required SoC at Departure, Estimated Arrival Time, and Estimated SoC at Arrival. Accompanying this publication, an example dataset is generated for 3 households with 1 EV each, and 1 office building with 3 EVs. The generated dataset is analyzed and discussed in the paper and published alongside the open-source code for the FlexiGen tool.
Submitted 11 November, 2024;
originally announced November 2024.
-
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks
Authors:
Felipe Marra,
Lucas N. Ferreira
Abstract:
This paper investigates the capabilities of text-to-audio music generation models in producing long-form music with prompts that change over time, focusing on soundtrack generation for Tabletop Role-Playing Games (TRPGs). We introduce Babel Bardo, a system that uses Large Language Models (LLMs) to transform speech transcriptions into music descriptions for controlling a text-to-music model. Four versions of Babel Bardo were compared in two TRPG campaigns: a baseline using direct speech transcriptions, and three LLM-based versions with varying approaches to music description generation. Evaluations considered audio quality, story alignment, and transition smoothness. Results indicate that detailed music descriptions improve audio quality while maintaining consistency across consecutive descriptions enhances story alignment and transition smoothness.
Submitted 21 May, 2025; v1 submitted 6 November, 2024;
originally announced November 2024.
-
CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities
Authors:
Kingsley Nweye,
Kathryn Kaspar,
Giacomo Buscemi,
Tiago Fonseca,
Giuseppe Pinto,
Dipanjan Ghose,
Satvik Duddukuru,
Pavani Pratapa,
Han Li,
Javad Mohammadi,
Luis Lino Ferreira,
Tianzhen Hong,
Mohamed Ouf,
Alfonso Capozzoli,
Zoltan Nagy
Abstract:
As more distributed energy resources become part of the demand-side infrastructure, it is important to quantify the energy flexibility they provide on a community scale, particularly to understand the impact of geographic, climatic, and occupant behavioral differences on their effectiveness, as well as identify the best control strategies to accelerate their real-world adoption. CityLearn provides an environment for benchmarking simple and advanced distributed energy resource control algorithms including rule-based, model-predictive, and reinforcement learning control. CityLearn v2 presented here extends CityLearn v1 by providing a simulation environment that leverages the End-Use Load Profiles for the U.S. Building Stock dataset to create virtual grid-interactive communities for resilient, multi-agent distributed energy resources and objective control with dynamic occupant feedback. This work details the v2 environment design and provides application examples that utilize reinforcement learning to manage battery energy storage system charging/discharging cycles, vehicle-to-grid control, and thermal comfort during heat pump power modulation.
Submitted 2 May, 2024;
originally announced May 2024.
-
EVLearn: Extending the CityLearn Framework with Electric Vehicle Simulation
Authors:
Tiago Fonseca,
Luis Ferreira,
Bernardo Cabral,
Ricardo Severino,
Kingsley Nweye,
Dipanjan Ghose,
Zoltan Nagy
Abstract:
Intelligent energy management strategies, such as Vehicle-to-Grid (V2G) and Grid-to-Vehicle (G2V), emerge as a potential solution to the integration of Electric Vehicles (EVs) into the energy grid. These strategies promise enhanced grid resilience and economic benefits for both vehicle owners and grid operators. Despite this announced potential, the adoption of these strategies is still hindered by an array of operational problems. Key among these is the lack of a simulation platform for validating and refining V2G and G2V strategies, including their development, training, and testing in the context of Energy Communities (ECs) incorporating multiple flexible energy assets. Addressing this gap, we first introduce EVLearn, a simulation module for research on both V2G and G2V energy management strategies that models EVs, their charging infrastructure, and associated energy flexibility dynamics; second, this paper integrates EVLearn with the existing CityLearn framework, bringing V2G and G2V simulation capabilities into the study of broader energy management strategies. Results validate EVLearn and its integration into CityLearn, and the impact of these strategies is highlighted through a comparative simulation scenario.
Submitted 8 April, 2024;
originally announced April 2024.
-
The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos
Authors:
Igor Cardoso,
Rubens O. Moraes,
Lucas N. Ferreira
Abstract:
Neural models are one of the most popular approaches for music generation, yet there aren't standard large datasets tailored for learning music directly from game data. To address this research gap, we introduce a novel dataset named NES-VMDB, containing 98,940 gameplay videos from 389 NES games, each paired with its original soundtrack in symbolic format (MIDI). NES-VMDB is built upon the Nintendo Entertainment System Music Database (NES-MDB), encompassing 5,278 music pieces from 397 NES games. Our approach involves collecting long-play videos for 389 games of the original dataset, slicing them into 15-second-long clips, and extracting the audio from each clip. Subsequently, we apply an audio fingerprinting algorithm (similar to Shazam) to automatically identify the corresponding piece in the NES-MDB dataset. Additionally, we introduce a baseline method based on the Controllable Music Transformer to generate NES music conditioned on gameplay clips. We evaluated this approach with objective metrics, and the results showed that the conditional CMT improves musical structural quality when compared to its unconditional counterpart. Moreover, we used a neural classifier to predict the game genre of the generated pieces. Results showed that the CMT generator can learn correlations between gameplay videos and game genres, but further research has to be conducted to achieve human-level performance.
Submitted 5 April, 2024;
originally announced April 2024.
-
Are foundation models efficient for medical image segmentation?
Authors:
Danielle Ferreira,
Rima Arnaout
Abstract:
Foundation models are experiencing a surge in popularity. The Segment Anything model (SAM) asserts an ability to segment a wide spectrum of objects but required supervised training at unprecedented scale. We compared SAM's performance (against clinical ground truth) and resources (labeling time, compute) to a modality-specific, label-free self-supervised learning (SSL) method on 25 measurements for 100 cardiac ultrasounds. SAM performed poorly and required significantly more labeling and computing resources, demonstrating worse efficiency than SSL.
Submitted 8 November, 2023;
originally announced November 2023.
-
An IoT Cloud and Big Data Architecture for the Maintenance of Home Appliances
Authors:
Pedro Chaves,
Tiago Fonseca,
Luis Lino Ferreira,
Bernardo Cabral,
Orlando Sousa,
Andre Oliveira,
Jorge Landeck
Abstract:
Billions of interconnected Internet of Things (IoT) sensors and devices collect tremendous amounts of data from real-world scenarios. Big data is generating increasing interest in a wide range of industries. Once data is analyzed through compute-intensive Machine Learning (ML) methods, it can derive critical business value for organizations. Powerful platforms are essential to handle and process such massive collections of information cost-effectively and conveniently. This work introduces a distributed and scalable platform architecture that can be deployed for efficient real-world big data collection and analytics. The proposed system was tested with a case study for Predictive Maintenance of Home Appliances, where current and vibration sensors with high acquisition frequency were connected to washing machines and refrigerators. The introduced platform was used to collect, store, and analyze the data. The experimental results demonstrated that the presented system could be advantageous for tackling real-world IoT scenarios in a cost-effective and local approach.
Submitted 25 October, 2022;
originally announced November 2022.
-
Label-free segmentation from cardiac ultrasound using self-supervised learning
Authors:
Danielle L. Ferreira,
Connor Lau,
Zaynaf Salaymang,
Rima Arnaout
Abstract:
Segmentation and measurement of cardiac chambers is critical in cardiac ultrasound but is laborious and poorly reproducible. Neural networks can assist, but supervised approaches require the same laborious manual annotations. We built a pipeline for self-supervised (no manual labels) segmentation combining computer vision, clinical domain knowledge, and deep learning. We trained on 450 echocardiograms (93,000 images) and tested on 8,393 echocardiograms (4,476,266 images; mean 61 years, 51% female), using the resulting segmentations to calculate biometrics. We also tested against external images from an additional 10,030 patients with available manual tracings of the left ventricle. r2 between clinically measured and pipeline-predicted measurements were similar to reported inter-clinician variation and comparable to supervised learning across several different measurements (r2 0.56-0.84). Average accuracy for detecting abnormal chamber size and function was 0.85 (range 0.71-0.97) compared to clinical measurements. A subset of test echocardiograms (n=553) had corresponding cardiac MRIs, where MRI is the gold standard. Correlation between pipeline and MRI measurements was similar to that between clinical echocardiogram and MRI. Finally, the pipeline accurately segments the left ventricle with an average Dice score of 0.89 (95% CI [0.89]) in the external, manually labeled dataset. Our results demonstrate a manual-label free, clinically valid, and highly scalable method for segmentation from ultrasound, a noisy but globally important imaging modality.
Submitted 11 April, 2025; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Value of Bidirectional V2G Smart Charging Responsive Services: Insights from a Simple CA Model
Authors:
Pedro M. S. Carvalho,
Luis A. F. M. Ferreira
Abstract:
In this paper, particle-hopping cellular automaton (CA) models of elastic demand are used to investigate the value added to plug-in electric vehicle (PEV) aggregators by adopting vehicle-to-grid (V2G) responsive services. CA models used earlier to study load-shifting responses are modified to capture the discharge/recharge capabilities of V2G. Results on ramping responses from CA are then analysed to discuss the small contribution to system controllability added by V2G responsive services.
Submitted 14 September, 2022;
originally announced September 2022.
-
A Low-Cost Multi-Agent System for Physical Security in Smart Buildings
Authors:
Tiago Fonseca,
Tiago Dias,
João Vitorino,
Luís Lino Ferreira,
Isabel Praça
Abstract:
Modern organizations face numerous physical security threats, from fire hazards to more intricate concerns regarding surveillance and unauthorized personnel. Conventional standalone fire and intrusion detection solutions must be installed and maintained independently, which leads to high capital and operational costs. Nonetheless, due to recent developments in smart sensors, computer vision techniques, and wireless communication technologies, these solutions can be integrated in a modular and low-cost manner. This work introduces Integrated Physical Security System (IP2S), a multi-agent system capable of coordinating diverse Internet of Things (IoT) sensors and actuators for an efficient mitigation of multiple physical security events. The proposed system was tested in a live case study that combined fire and intrusion detection in an industrial shop floor environment with four different sectors, two surveillance cameras, and a firefighting robot. The experimental results demonstrate that the integration of several events in a single automated system can be advantageous for the security of smart buildings, reducing false alarms and delays.
Submitted 1 September, 2022;
originally announced September 2022.
-
Controlling Perceived Emotion in Symbolic Music Generation with Monte Carlo Tree Search
Authors:
Lucas N. Ferreira,
Lili Mou,
Jim Whitehead,
Levi H. S. Lelis
Abstract:
This paper presents a new approach for controlling emotion in symbolic music generation with Monte Carlo Tree Search. We use Monte Carlo Tree Search as a decoding mechanism to steer the probability distribution learned by a language model towards a given emotion. At every step of the decoding process, we use Predictor Upper Confidence for Trees (PUCT) to search for sequences that maximize the average values of emotion and quality as given by an emotion classifier and a discriminator, respectively. We use a language model as PUCT's policy and a combination of the emotion classifier and the discriminator as its value function. To decode the next token in a piece of music, we sample from the distribution of node visits created during the search. We evaluate the quality of the generated samples with respect to human-composed pieces using a set of objective metrics computed directly from the generated samples. We also perform a user study to evaluate how human subjects perceive the generated samples' quality and emotion. We compare PUCT against Stochastic Bi-Objective Beam Search (SBBS) and Conditional Sampling (CS). Results suggest that PUCT outperforms SBBS and CS in almost all metrics of music quality and emotion.
Submitted 1 September, 2022; v1 submitted 10 August, 2022;
originally announced August 2022.
-
Re-ordering of Hadamard matrix using Fourier transform and gray-level co-occurrence matrix for compressive single-pixel imaging
Authors:
Pedro G. Vaz,
Andreia Gaudêncio,
L. F. Requicha Ferreira,
Anne Humeau-Heurtier,
Miguel Morgado,
João Cardoso
Abstract:
One of the most active research fields in single-pixel imaging is the influence of the sampling basis and its order on the quality of the reconstructed images. This paper presents two new orders of the Hadamard basis, ascending scale (AS) and ascending inertia (AI), and tests their performance, using simulation and experimental methods, for low sampling ratios (0.5 to 0.01). These orders were compared with two state-of-the-art orders, cake-cutting (CC) and total gradient (TG), using TVAL3 as the reconstruction algorithm and three noise levels. The newly proposed orders yield better reconstructed image quality on the simulation data set (110 images) and achieve structural similarity index values higher than the CC order. The experimental data set (2 images) showed that the AS and AI orders performed better with a sampling ratio of 0.5, while for lower sampling ratios the performance of AS, AI and CC was similar. The TG order performed worst in the majority of the cases. Finally, the simulation results present clear evidence that peak signal-to-noise ratio (PSNR) is not a reliable image quality assessment (IQA) metric to assess image reconstruction quality in the context of single-pixel imaging.
Submitted 9 March, 2022;
originally announced March 2022.
-
Learning to Generate Music With Sentiment
Authors:
Lucas N. Ferreira,
Jim Whitehead
Abstract:
Deep Learning models have shown very promising results in automatically composing polyphonic music pieces. However, it is very hard to control such models in order to guide the compositions towards a desired goal. We are interested in controlling a model to automatically generate music with a given sentiment. This paper presents a generative Deep Learning model that can be directed to compose music with a given sentiment. Besides music generation, the same model can be used for sentiment analysis of symbolic music. We evaluate the accuracy of the model in classifying sentiment of symbolic music using a new dataset of video game soundtracks. Results show that our model is able to obtain good prediction accuracy. A user study shows that human subjects agreed that the generated music has the intended sentiment; however, negative pieces can be ambiguous.
Submitted 8 March, 2021;
originally announced March 2021.
-
Computer-Generated Music for Tabletop Role-Playing Games
Authors:
Lucas N. Ferreira,
Levi H. S. Lelis,
Jim Whitehead
Abstract:
In this paper we present Bardo Composer, a system to generate background music for tabletop role-playing games. Bardo Composer uses a speech recognition system to translate player speech into text, which is classified according to a model of emotion. Bardo Composer then uses Stochastic Bi-Objective Beam Search, a variant of Stochastic Beam Search that we introduce in this paper, with a neural model to generate musical pieces conveying the desired emotion. We performed a user study with 116 participants to evaluate whether people are able to correctly identify the emotion conveyed in the pieces generated by the system. In our study we used pieces generated for Call of the Wild, a Dungeons and Dragons campaign available on YouTube. Our results show that human subjects could correctly identify the emotion of the generated music pieces as accurately as they were able to identify the emotion of pieces written by humans.
Submitted 16 August, 2020;
originally announced August 2020.
-
Deep Dense and Convolutional Autoencoders for Unsupervised Anomaly Detection in Machine Condition Sounds
Authors:
Alexandrine Ribeiro,
Luis Miguel Matos,
Pedro Jose Pereira,
Eduardo C. Nunes,
Andre L. Ferreira,
Paulo Cortez,
Andre Pilastri
Abstract:
This technical report describes two methods that were developed for Task 2 of the DCASE 2020 challenge. The challenge involves unsupervised learning to detect anomalous sounds; thus, only normal machine working condition samples are available during the training process. The two methods involve deep autoencoders, based on dense and convolutional architectures, that use mel-spectrogram processed sound features. Experiments were conducted using the six machine type datasets of the challenge. Overall, competitive results were achieved by the proposed dense and convolutional AE, outperforming the baseline challenge method.
Submitted 19 June, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Optimising maintenance: What are the expectations for Cyber Physical Systems
Authors:
Erkki Jantunen,
Urko Zurutuza,
Luis Lino Ferreira,
Pal Varga
Abstract:
The need for maintenance is based on the wear of components of machinery. If this need can be defined reliably beforehand so that no unpredicted failures take place, then the maintenance actions can be carried out economically with minimum disturbances to production. There are two basic challenges in solving the above: first, understanding the development of wear and failures, and second, managing the measurement and diagnosis of such parameters that can reveal the development of wear. In principle, the development of wear and failures can be predicted through monitoring time, load or wear as such. Monitoring time is not very efficient, as there are only limited numbers of components that suffer from aging, which as such is the result of chemical wear, i.e., changes in the material. In most cases the loading of components influences their wear. In principle the loading can be stable or varying in nature. Of these two cases, the varying load case is much more challenging than the stable one. The monitoring of wear can be done either directly (e.g., optical methods) or indirectly (e.g., vibration). Monitoring actual wear is naturally the most reliable approach, but it often means that additional investments are needed. The paper discusses how the monitoring of wear and need for maintenance can be done based on the use of Cyber Physical Systems.
Submitted 20 March, 2019;
originally announced March 2019.