-
Aligning Trustworthy AI with Democracy: A Dual Taxonomy of Opportunities and Risks
Authors:
Oier Mentxaka,
Natalia Díaz-Rodríguez,
Mark Coeckelbergh,
Marcos López de Prado,
Emilia Gómez,
David Fernández Llorca,
Enrique Herrera-Viedma,
Francisco Herrera
Abstract:
Artificial Intelligence (AI) poses both significant risks and valuable opportunities for democratic governance. This paper introduces a dual taxonomy to evaluate AI's complex relationship with democracy: the AI Risks to Democracy (AIRD) taxonomy, which identifies how AI can undermine core democratic principles such as autonomy, fairness, and trust; and the AI's Positive Contributions to Democracy (AIPD) taxonomy, which highlights AI's potential to enhance transparency, participation, efficiency, and evidence-based policymaking.
Grounded in the European Union's approach to ethical AI governance, and particularly the seven Trustworthy AI requirements proposed by the European Commission's High-Level Expert Group on AI, each identified risk is aligned with mitigation strategies based on EU regulatory and normative frameworks. Our analysis underscores the transversal importance of transparency and societal well-being across all risk categories and offers a structured lens for aligning AI systems with democratic values.
By integrating democratic theory with practical governance tools, this paper offers a normative and actionable framework to guide research, regulation, and institutional design to support trustworthy, democratic AI. It provides scholars with a conceptual foundation to evaluate the democratic implications of AI, equips policymakers with structured criteria for ethical oversight, and helps technologists align system design with democratic principles. In doing so, it bridges the gap between ethical aspirations and operational realities, laying the groundwork for more inclusive, accountable, and resilient democratic systems in the algorithmic age.
Submitted 19 May, 2025;
originally announced May 2025.
-
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Authors:
Maria Eriksson,
Erasmo Purificato,
Arman Noroozian,
Joao Vinagre,
Guillaume Chaslot,
Emilia Gomez,
David Fernandez-Llorca
Abstract:
Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an increasingly prominent role in regulatory frameworks. As their influence grows, however, so too do concerns about how and with what effects they evaluate highly sensitive topics such as capabilities, including high-impact capabilities, safety and systemic risks. This paper presents an interdisciplinary meta-review of about 100 studies that discuss shortcomings in quantitative benchmarking practices, published in the last 10 years. It brings together many fine-grained issues in the design and application of benchmarks (such as biases in dataset creation, inadequate documentation, data contamination, and failures to distinguish signal from noise) with broader sociotechnical issues (such as an over-focus on evaluating text-based AI models according to one-time testing logic that fails to account for how AI models are increasingly multimodal and interact with humans and other technical systems). Our review also highlights a series of systemic flaws in current benchmarking practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results. Furthermore, it underscores how benchmark practices are fundamentally shaped by cultural, commercial and competitive dynamics that often prioritise state-of-the-art performance at the expense of broader societal concerns. By providing an overview of risks associated with existing benchmarking procedures, we problematise disproportionate trust placed in benchmarks and contribute to ongoing efforts to improve the accountability and relevance of quantitative AI benchmarks within the complexities of real-world scenarios.
Submitted 25 May, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Supervision policies can shape long-term risk management in general-purpose AI models
Authors:
Manuel Cebrian,
Emilia Gomez,
David Fernandez Llorca
Abstract:
The rapid proliferation and deployment of General-Purpose AI (GPAI) models, including large language models (LLMs), present unprecedented challenges for AI supervisory entities. We hypothesize that these entities will need to navigate an emergent ecosystem of risk and incident reporting, likely to exceed their supervision capacity. To investigate this, we develop a simulation framework parameterized by features extracted from the diverse landscape of risk, incident, or hazard reporting ecosystems, including community-driven platforms, crowdsourcing initiatives, and expert assessments. We evaluate four supervision policies: non-prioritized (first-come, first-served), random selection, priority-based (addressing the highest-priority risks first), and diversity-prioritized (balancing high-priority risks with comprehensive coverage across risk types). Our results indicate that while priority-based and diversity-prioritized policies are more effective at mitigating high-impact risks, particularly those identified by experts, they may inadvertently neglect systemic issues reported by the broader community. This oversight can create feedback loops that amplify certain types of reporting while discouraging others, leading to a skewed perception of the overall risk landscape. We validate our simulation results with several real-world datasets, including one with over a million ChatGPT interactions, of which more than 150,000 conversations were identified as risky. This validation underscores the complex trade-offs inherent in AI risk supervision and highlights how the choice of risk management policies can shape the future landscape of AI risks across diverse GPAI models used in society.
Submitted 10 June, 2025; v1 submitted 10 January, 2025;
originally announced January 2025.
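The four supervision policies named in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Report` fields, the `select_reports` helper, the policy names, and the round-robin reading of "diversity-prioritized" are hypothetical and are not taken from the authors' simulation framework.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: int    # arrival order, lower means earlier (hypothetical field)
    risk_type: str    # hypothetical risk category label
    priority: float   # hypothetical urgency score, higher means more urgent


def select_reports(reports, policy, capacity, seed=0):
    """Pick at most `capacity` reports for review under one of four policies."""
    if policy == "fcfs":
        # Non-prioritized: first come, first served.
        return sorted(reports, key=lambda r: r.report_id)[:capacity]
    if policy == "random":
        rng = random.Random(seed)
        return rng.sample(list(reports), min(capacity, len(reports)))
    if policy == "priority":
        # Highest-priority reports first.
        return sorted(reports, key=lambda r: -r.priority)[:capacity]
    if policy == "diversity":
        # Assumed interpretation: round-robin over risk types, taking the
        # highest-priority remaining report of each type in turn.
        by_type = defaultdict(list)
        for r in sorted(reports, key=lambda r: -r.priority):
            by_type[r.risk_type].append(r)
        queues = list(by_type.values())
        chosen = []
        while len(chosen) < capacity and any(queues):
            for queue in queues:
                if queue and len(chosen) < capacity:
                    chosen.append(queue.pop(0))
        return chosen
    raise ValueError(f"unknown policy: {policy}")


reports = [
    Report(0, "bias", 0.2),
    Report(1, "privacy", 0.9),
    Report(2, "privacy", 0.8),
    Report(3, "misinformation", 0.5),
]
for name in ("fcfs", "random", "priority", "diversity"):
    picked = select_reports(reports, name, capacity=3)
    print(name, [r.report_id for r in picked])
```

In this toy data the priority policy reviews both privacy reports before anything else, while the diversity policy reaches the low-priority bias report, which mirrors the coverage trade-off the abstract describes.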
-
GRATEV2.0: Computational Tools for Real-time Analysis of High-throughput High-resolution TEM (HRTEM) Images of Conjugated Polymers
Authors:
Dhruv Gamdha,
Ryan Fair,
Adarsh Krishnamurthy,
Enrique Gomez,
Baskar Ganapathysubramanian
Abstract:
Automated analysis of high-resolution transmission electron microscopy (HRTEM) images is increasingly essential for advancing research in organic electronics, where precise characterization of nanoscale crystal structures is crucial for optimizing material properties. This paper introduces an open-source computational framework called GRATEV2.0 (GRaph-based Analysis of TEM), designed for real-time analysis of HRTEM data, with a focus on characterizing complex microstructures in conjugated polymers, illustrated using Poly[N-9'-heptadecanyl-2,7-carbazole-alt-5,5-(4',7'-di-2-thienyl-2',1',3'-benzothiadiazole)] (PCDTBT), a key material in organic photovoltaics. GRATEV2.0 employs fast, automated image processing algorithms, enabling rapid extraction of structural features like d-spacing, orientation, and crystal shape metrics. Gaussian process optimization rapidly identifies the user-defined parameters in the approach, reducing the need for manual parameter tuning and thus enhancing reproducibility and usability. Additionally, GRATEV2.0 is compatible with high-performance computing (HPC) environments, allowing for efficient, large-scale data processing at near real-time speeds. A unique feature of GRATEV2.0 is a Wasserstein distance-based stopping criterion, which optimizes data collection by determining when further sampling no longer adds statistically significant information. This capability optimizes the amount of time the TEM facility is used while ensuring data adequacy for in-depth analysis. Open-source and tested on a substantial PCDTBT dataset, this tool offers a powerful, robust, and accessible solution for high-throughput material characterization in organic electronics.
Submitted 24 December, 2024; v1 submitted 5 November, 2024;
originally announced November 2024.
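The idea of a Wasserstein distance-based stopping criterion mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the GRATEV2.0 implementation: the function names, the batch-wise comparison of pooled samples, and the fixed tolerance `tol` (used here in place of any statistical significance test) are all hypothetical.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Exact 1-Wasserstein distance between two 1D empirical samples."""
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    # Integrate |CDF_u - CDF_v| over each interval between adjacent values.
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def first_stable_batch(batches, tol):
    """Return the index of the first batch whose addition shifts the pooled
    distribution by less than `tol`, or None if no batch does."""
    pooled = []
    for index, batch in enumerate(batches):
        if not pooled:
            pooled = list(batch)
            continue
        updated = pooled + list(batch)
        if wasserstein_1d(pooled, updated) < tol:
            return index
        pooled = updated
    return None


# Hypothetical measurements (e.g. one structural feature per detected crystal).
batches = [[0.0, 1.0], [10.0, 11.0], [0.0, 1.0, 10.0, 11.0]]
print(first_stable_batch(batches, tol=0.5))
```

The third batch repeats the values already collected, so the pooled distribution stops changing and sampling could end there; the threshold that counts as "no longer adds information" is a design choice the sketch leaves to the user.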
-
Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio
Authors:
Roser Batlle-Roca,
Wei-Hisang Liao,
Xavier Serra,
Yuki Mitsufuji,
Emilia Gómez
Abstract:
Recent advancements in music generation are raising multiple concerns about the implications of AI in creative music processes, current business models and impacts related to intellectual property management. A relevant discussion and related technical challenge is the potential replication and plagiarism of the training set in AI-generated music, which could lead to misuse of data and intellectual property rights violations. To tackle this issue, we present the Music Replication Assessment (MiRA) tool: a model-independent open evaluation method based on diverse audio music similarity metrics to assess data replication. We evaluate the ability of five metrics to identify exact replication by conducting a controlled replication experiment in different music genres using synthetic samples. Our results show that the proposed methodology can estimate exact data replication with a proportion higher than 10%. By introducing the MiRA tool, we intend to encourage the open evaluation of music-generative models by researchers, developers, and users concerning data replication, highlighting the importance of the ethical, social, legal, and economic consequences. Code and examples are available for reproducibility purposes.
Submitted 1 August, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness
Authors:
David Fernández Llorca,
Ronan Hamon,
Henrik Junklewitz,
Kathrin Grosse,
Lars Kunze,
Patrick Seiniger,
Robert Swaim,
Nick Reed,
Alexandre Alahi,
Emilia Gómez,
Ignacio Sánchez,
Akos Kriston
Abstract:
This study explores the complexities of integrating Artificial Intelligence (AI) into Autonomous Vehicles (AVs), examining the challenges introduced by AI components and the impact on testing procedures, focusing on some of the essential requirements for trustworthy AI. Topics addressed include the role of AI at various operational layers of AVs, the implications of the EU's AI Act on AVs, and the need for new testing methodologies for Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS). The study also provides a detailed analysis on the importance of cybersecurity audits, the need for explainability in AI decision-making processes and protocols for assessing the robustness and ethical behaviour of predictive systems in AVs. The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology, highlighting the need for multidisciplinary expertise.
Submitted 21 February, 2024;
originally announced March 2024.
-
Face Recognition: to Deploy or not to Deploy? A Framework for Assessing the Proportional Use of Face Recognition Systems in Real-World Scenarios
Authors:
Pablo Negri,
Isabelle Hupont,
Emilia Gomez
Abstract:
Face recognition (FR) has reached a high technical maturity. However, its use needs to be carefully assessed from an ethical perspective, especially in sensitive scenarios. This is precisely the focus of this paper: the use of FR for the identification of specific subjects in moderately to densely crowded spaces (e.g. public spaces, sports stadiums, train stations) and law enforcement scenarios. In particular, there is a need to consider the trade-off between the need to protect privacy and fundamental rights of citizens as well as their safety. Recent Artificial Intelligence (AI) policies, notably the European AI Act, propose that such FR interventions should be proportionate and deployed only when strictly necessary. Nevertheless, concrete guidelines on how to address the concept of proportional FR intervention are lacking to date. This paper proposes a framework to contribute to assessing whether an FR intervention is proportionate or not for a given context of use in the above-mentioned scenarios. It also identifies the main quantitative and qualitative variables relevant to the FR intervention decision (e.g. number of people in the scene, level of harm that the person(s) in search could perpetrate, consequences to individual rights and freedoms) and proposes a 2D graphical model making it possible to balance these variables in terms of ethical cost vs security gain. Finally, different FR scenarios inspired by real-world deployments validate the proposed model. The framework is conceived as a simple support tool for decision makers when confronted with the deployment of an FR system.
Submitted 3 September, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
MOReGIn: Multi-Objective Recommendation at the Global and Individual Levels
Authors:
Elizabeth Gómez,
David Contreras,
Ludovico Boratto,
Maria Salamó
Abstract:
Multi-Objective Recommender Systems (MORSs) emerged as a paradigm to guarantee multiple (often conflicting) goals. Besides accuracy, a MORS can operate at the global level, where additional beyond-accuracy goals are met for the system as a whole, or at the individual level, meaning that the recommendations are tailored to the needs of each user. The state-of-the-art MORSs either operate at the global or individual level, without assuming the co-existence of the two perspectives. In this study, we show that when global and individual objectives co-exist, MORSs are not able to meet both types of goals. To overcome this issue, we present an approach that regulates the recommendation lists so as to guarantee both global and individual perspectives, while preserving its effectiveness. Specifically, as individual perspective, we tackle genre calibration and, as global perspective, provider fairness. We validate our approach on two real-world datasets, publicly released with this paper.
Submitted 23 January, 2024;
originally announced January 2024.
-
Attribute Annotation and Bias Evaluation in Visual Datasets for Autonomous Driving
Authors:
David Fernández Llorca,
Pedro Frau,
Ignacio Parra,
Rubén Izquierdo,
Emilia Gómez
Abstract:
This paper addresses the often overlooked issue of fairness in the autonomous driving domain, particularly in vision-based perception and prediction systems, which play a pivotal role in the overall functioning of Autonomous Vehicles (AVs). We focus our analysis on biases present in some of the most commonly used visual datasets for training person and vehicle detection systems. We introduce an annotation methodology and a specialised annotation tool, both designed to annotate protected attributes of agents in visual datasets. We validate our methodology through an inter-rater agreement analysis and provide the distribution of attributes across all datasets. These include annotations for the attributes age, sex, skin tone, group, and means of transport for more than 90K people, as well as vehicle type, colour, and car type for over 50K vehicles. Generally, diversity is very low for most attributes, with some groups, such as children, wheelchair users, or personal mobility vehicle users, being extremely underrepresented in the analysed datasets. The study contributes significantly to efforts to consider fairness in the evaluation of perception and prediction systems for AVs. This paper follows reproducibility principles. The annotation tool, scripts and the annotated attributes can be accessed publicly at https://github.com/ec-jrc/humaint_annotator.
Submitted 11 December, 2023;
originally announced December 2023.
-
SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors
Authors:
Marta Navarro,
Josué Feliu,
Salvador Petit,
María E. Gómez,
Julio Sahuquillo
Abstract:
Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference within the core, which has a negative impact on individual application performance and can significantly increase the turnaround time of multi-program workloads. The severity of the interference effects depends on the competing co-runners sharing the core. Thus, it can be mitigated by applying a thread-to-core allocation policy that smartly selects applications to be run in the same core to minimize their interference.
This paper presents SYNPA, a simple approach that dynamically allocates threads to cores in an SMT processor based on their run-time dynamic behavior. The approach uses a regression model to select synergistic pairs to mitigate intra-core interference. The main novelty of SYNPA is that it uses just three variables collected from the performance counters available in current ARM processors at the dispatch stage. Experimental results show that SYNPA outperforms the default Linux scheduler by around 36%, on average, in terms of turnaround time in 8-application workloads combining frontend bound and backend bound benchmarks.
Submitted 19 October, 2023;
originally announced October 2023.
-
Towards Automated Accessibility Report Generation for Mobile Apps
Authors:
Amanda Swearngin,
Jason Wu,
Xiaoyi Zhang,
Esteban Gomez,
Jen Coughenour,
Rachel Stukenborg,
Bhavya Garg,
Greg Hughes,
Adriana Hilliard,
Jeffrey P. Bigham,
Jeffrey Nichols
Abstract:
Many apps have basic accessibility issues, like missing labels or low contrast. Automated tools can help app developers catch basic issues, but can be laborious or require writing dedicated tests. We propose a system, motivated by a collaborative process with accessibility stakeholders at a large technology company, to generate whole app accessibility reports by combining varied data collection methods (e.g., app crawling, manual recording) with an existing accessibility scanner. Many such scanners are based on single-screen scanning, and a key problem in whole app accessibility reporting is to effectively de-duplicate and summarize issues collected across an app. To this end, we developed a screen grouping model with 96.9% accuracy (88.8% F1-score) and UI element matching heuristics with 97% accuracy (98.2% F1-score). We combine these technologies in a system to report and summarize unique issues across an app, and enable a unique pixel-based ignore feature to help engineers and testers better manage reported issues across their app's lifetime. We conducted a qualitative evaluation with 18 accessibility-focused engineers and testers which showed this system can enhance their existing accessibility testing toolkit and address key limitations in current accessibility scanning tools.
Submitted 16 October, 2023; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Behind Recommender Systems: the Geography of the ACM RecSys Community
Authors:
Lorenzo Porcaro,
João Vinagre,
Pedro Frau,
Isabelle Hupont,
Emilia Gómez
Abstract:
The amount and dissemination rate of media content accessible online is nowadays overwhelming. Recommender Systems filter this information into manageable streams or feeds, adapted to our personal needs or preferences. It is of utmost importance that algorithms employed to filter information do not distort or cut out important elements from our perspectives of the world. Under this principle, it is essential to involve diverse views and teams from the earliest stages of their design and development. This has been highlighted, for instance, in recent European Union regulations such as the Digital Services Act, via the requirement of risk monitoring, including the risk of discrimination, and the AI Act, through the requirement to involve people with diverse backgrounds in the development of AI systems. We look into the geographic diversity of the recommender systems research community, specifically by analyzing the affiliation countries of the authors who contributed to the ACM Conference on Recommender Systems (RecSys) during the last 15 years. This study has been carried out in the framework of the Diversity in AI - DivinAI project, whose main objective is the long-term monitoring of diversity in AI forums through a set of indexes.
Submitted 7 September, 2023;
originally announced September 2023.
-
Use case cards: a use case reporting framework inspired by the European AI Act
Authors:
Isabelle Hupont,
David Fernández-Llorca,
Sandra Baldassarri,
Emilia Gómez
Abstract:
Despite recent efforts by the Artificial Intelligence (AI) community to move towards standardised procedures for documenting models, methods, systems or datasets, there is currently no methodology focused on use cases aligned with the risk-based approach of the European AI Act (AI Act). In this paper, we propose a new framework for the documentation of use cases, which we call "use case cards", based on the use case modelling included in the Unified Modeling Language (UML) standard. Unlike other documentation methodologies, we focus on the intended purpose and operational use of an AI system. The framework consists of two main parts. Firstly, a UML-based template, tailored to allow implicitly assessing the risk level of the AI system and defining relevant requirements. Secondly, a supporting UML diagram designed to provide information about the system-user interactions and relationships. The proposed framework is the result of a co-design process involving a relevant team of EU policy experts and scientists. We have validated our proposal with 11 experts with different backgrounds and a reasonable knowledge of the AI Act as a prerequisite. We provide the 5 "use case cards" used in the co-design and validation process. "Use case cards" allow framing and contextualising use cases in an effective way, and we hope this methodology can be a useful tool for policy makers and providers for documenting use cases, assessing the risk level, adapting the different requirements and building a catalogue of existing usages of AI.
Submitted 23 June, 2023;
originally announced June 2023.
-
GenQ: Automated Question Generation to Support Caregivers While Reading Stories with Children
Authors:
Arun Balajiee Lekshmi Narayanan,
Ligia E. Gomez,
Martha Michelle Soto Fernandez,
Tri Nguyen,
Chris Blais,
M. Adelaida Restrepo,
Art Glenberg
Abstract:
When caregivers ask open-ended questions to motivate dialogue with children, it facilitates the child's reading comprehension skills. Although there is scope for use of technological tools, referred to here as "intelligent tutoring systems", to scaffold this process, it is currently unclear whether existing intelligent systems that generate human-language-like questions are beneficial. Additionally, training data used in the development of these automated question generation systems is typically sourced without attention to demographics, but people with different cultural backgrounds may ask different questions. As a part of a broader project to design an intelligent reading support app for Latinx children, we crowdsourced questions from Latinx caregivers and noncaregivers as well as caregivers and noncaregivers from other demographics. We examine variations in question-asking within this dataset mediated by individual, cultural, and contextual factors. We then design a system that automatically extracts templates from this data to generate open-ended questions that are representative of those asked by Latinx caregivers.
Submitted 25 September, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
CACTUS: A Computational Framework for Generating Realistic White Matter Microstructure Substrates
Authors:
Juan Luis Villarreal-Haro,
Remy Gardier,
Erick J Canales-Rodriguez,
Elda Fischi Gomez,
Gabriel Girard,
Jean-Philippe Thiran,
Jonathan Rafael-Patino
Abstract:
Monte-Carlo diffusion simulations are a powerful tool for validating tissue microstructure models by generating synthetic diffusion-weighted magnetic resonance images (DW-MRI) in controlled environments. This is fundamental for understanding the link between micrometre-scale tissue properties and DW-MRI signals measured at the millimetre-scale, optimising acquisition protocols to target microstructure properties of interest, and exploring the robustness and accuracy of estimation methods. However, accurate simulations require substrates that reflect the main microstructural features of the studied tissue. To address this challenge, we introduce a novel computational workflow, CACTUS (Computational Axonal Configurator for Tailored and Ultradense Substrates), for generating synthetic white matter substrates. Our approach allows constructing substrates with higher packing density than existing methods, up to 95% intra-axonal volume fraction, and larger voxel sizes of up to (500 µm)³ with rich fibre complexity. CACTUS generates bundles with angular dispersion, bundle crossings, and variations along the fibres of their inner and outer radii and g-ratio. We achieve this by introducing a novel global cost function and a fibre radial growth approach that allows substrates to match predefined targeted characteristics and mirror those reported in histological studies. CACTUS improves the development of complex synthetic substrates, paving the way for future applications in microstructure imaging.
Submitted 25 May, 2023;
originally announced May 2023.
-
Fairness and Diversity in Information Access Systems
Authors:
Lorenzo Porcaro,
Carlos Castillo,
Emilia Gómez,
João Vinagre
Abstract:
Among the seven key requirements to achieve trustworthy AI proposed by the High-Level Expert Group on Artificial Intelligence (AI-HLEG) established by the European Commission (EC), the fifth requirement ("Diversity, non-discrimination and fairness") declares: "In order to achieve Trustworthy AI, we must enable inclusion and diversity throughout the entire AI system's life cycle. [...] This requirement is closely linked with the principle of fairness". In this paper, we try to shed light on how closely these two distinct concepts, diversity and fairness, may be treated by focusing on information access systems and ranking literature. These concepts should not be used interchangeably because they do represent two different values, but what we argue is that they also cannot be considered totally unrelated or divergent. Having diversity does not imply fairness, but fostering diversity can effectively lead to fair outcomes, an intuition behind several methods proposed to mitigate the disparate impact of information access systems, i.e. recommender systems and search engines.
Submitted 16 May, 2023;
originally announced May 2023.
-
Assessing the Impact of Music Recommendation Diversity on Listeners: A Longitudinal Study
Authors:
Lorenzo Porcaro,
Emilia Gómez,
Carlos Castillo
Abstract:
We present the results of a 12-week longitudinal user study wherein the participants, 110 subjects from Southern Europe, received on a daily basis Electronic Music (EM) diversified recommendations. By analyzing their explicit and implicit feedback, we show that exposure to specific levels of music recommendation diversity may be responsible for long-term impacts on listeners' attitudes. In particular, we highlight the function of diversity in increasing the openness in listening to EM, a music genre not particularly known or liked by the participants prior to their participation in the study. Moreover, we demonstrate that recommendations may help listeners in removing positive and negative attachments towards EM, deconstructing pre-existing implicit associations but also stereotypes associated with this music. In addition, our results show the significant influence that recommendation diversity has in generating curiosity in listeners.
Submitted 1 December, 2022;
originally announced December 2022.
-
Liability regimes in the age of AI: a use-case driven analysis of the burden of proof
Authors:
David Fernández Llorca,
Vicky Charisi,
Ronan Hamon,
Ignacio Sánchez,
Emilia Gómez
Abstract:
New emerging technologies powered by Artificial Intelligence (AI) have the potential to disruptively transform our societies for the better. In particular, data-driven learning approaches (i.e., Machine Learning (ML)) have been a true revolution in the advancement of multiple technologies in various application domains. But at the same time there is growing concern about certain intrinsic characteristics of these methodologies that carry potential risks to both safety and fundamental rights. Although there are mechanisms in the adoption process to minimize these risks (e.g., safety regulations), these do not exclude the possibility of harm occurring, and if this happens, victims should be able to seek compensation. Liability regimes will therefore play a key role in ensuring basic protection for victims using or interacting with these systems. However, the same characteristics that make AI systems inherently risky, such as lack of causality, opacity, unpredictability or their self and continuous learning capabilities, may lead to considerable difficulties when it comes to proving causation. This paper presents three case studies, as well as the methodology to reach them, that illustrate these difficulties. Specifically, we address the cases of cleaning robots, delivery drones and robots in education. The outcome of the proposed analysis suggests the need to revise liability regimes to alleviate the burden of proof on victims in cases involving AI technologies.
Submitted 17 March, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Algebra of N-event synchronization
Authors:
Ernesto Gomez,
Keith E. Schubert,
Khalil Dajani
Abstract:
We have previously defined synchronization (Gomez, E. and K. Schubert 2011) as a relation between the times at which a pair of events can happen, and introduced an algebra that covers all possible relations for such pairs. In this work we introduce the synchronization matrix, to make it easier to calculate the properties and results of $N$-event synchronizations, such as are commonly encountered in parallel execution of multiple processes. The synchronization matrix leads to the definition of $N$-event synchronization algebras as specific extensions to the original algebra. We derive general properties of such synchronization, and we are able to analyze effects of synchronization on the phase space of parallel execution introduced in (Gomez E Kai R, Schubert KE 2017).
Submitted 1 November, 2022;
originally announced November 2022.
-
Documenting use cases in the affective computing domain using Unified Modeling Language
Authors:
Isabelle Hupont,
Emilia Gomez
Abstract:
The study of the ethical impact of AI and the design of trustworthy systems requires analysing the scenarios where AI systems are used, which is related to the software engineering concept of "use case" and the "intended purpose" legal term. However, there is no standard methodology for use case documentation covering the context of use, scope, functional requirements and risks of an AI system. In this work, we propose a novel documentation methodology for AI use cases, with a special focus on the affective computing domain. Our approach builds upon an assessment of use case information needs documented in the research literature and the recently proposed European regulatory framework for AI. From this assessment, we adopt and adapt the Unified Modeling Language (UML), which has been used in the last two decades mostly by software engineers. Each use case is then represented by a UML diagram and a structured table, and we provide a set of examples illustrating its application to several affective computing scenarios.
Submitted 19 September, 2022;
originally announced September 2022.
-
Guidelines to Develop Trustworthy Conversational Agents for Children
Authors:
Marina Escobar-Planas,
Emilia Gómez,
Carlos-D Martínez-Hinarejos
Abstract:
Conversational agents (CAs) embodied in speakers or chatbots are becoming very popular in some countries, and despite their adult-centred design, they have become part of children's lives, generating a need for children-centric trustworthy systems. This paper presents a literature review to identify the main opportunities, challenges and risks brought by CAs when used by children. We then consider relevant ethical guidelines for AI and adapt them to this particular system and population, using a Delphi methodology with a set of experts from different disciplines. From this analysis, we propose specific guidelines to help CA developers improve their design towards trustworthiness and children.
Submitted 1 September, 2022;
originally announced September 2022.
-
Federated Data Analytics: A Study on Linear Models
Authors:
Xubo Yue,
Raed Al Kontar,
Ana María Estrada Gómez
Abstract:
As edge devices become increasingly powerful, data analytics are gradually moving from a centralized to a decentralized regime where edge compute resources are exploited to process more of the data locally. This regime of analytics is coined as federated data analytics (FDA). In spite of the recent success stories of FDA, most literature focuses exclusively on deep neural networks. In this work, we take a step back to develop an FDA treatment for one of the most fundamental statistical models: linear regression. Our treatment is built upon hierarchical modeling that allows borrowing strength across multiple groups. To this end, we propose two federated hierarchical model structures that provide a shared representation across devices to facilitate information sharing. Notably, our proposed frameworks are capable of providing uncertainty quantification, variable selection, hypothesis testing and fast adaptation to new unseen data. We validate our methods on a range of real-life applications including condition monitoring for aircraft engines. The results show that our FDA treatment for linear models can serve as a competing benchmark model for future development of federated algorithms.
Submitted 15 June, 2022;
originally announced June 2022.
-
Monitoring Diversity of AI Conferences: Lessons Learnt and Future Challenges in the DivinAI Project
Authors:
Isabelle Hupont,
Emilia Gomez,
Songul Tolan,
Lorenzo Porcaro,
Ana Freire
Abstract:
DivinAI is an open and collaborative initiative promoted by the European Commission's Joint Research Centre to measure and monitor diversity indicators related to AI conferences, with special focus on gender balance, geographical representation, and presence of academia vs companies. This paper summarizes the main achievements and lessons learnt during the first year of life of the DivinAI project, and proposes a set of recommendations for its further development and maintenance by the AI community.
Submitted 3 March, 2022;
originally announced March 2022.
-
Diversity in the Music Listening Experience: Insights from Focus Group Interviews
Authors:
Lorenzo Porcaro,
Emilia Gómez,
Carlos Castillo
Abstract:
Music listening in today's digital spaces is highly characterized by the availability of huge music catalogues, accessible by people all over the world. In this scenario, recommender systems are designed to guide listeners in finding tracks and artists that best fit their requests, having therefore the power to influence the diversity of the music they listen to. Albeit several works have proposed new techniques for developing diversity-aware recommendations, little is known about how people perceive diversity while interacting with music recommendations. In this study, we interview several listeners about the role that diversity plays in their listening experience, trying to get a better understanding of how they interact with music recommendations. We recruit the listeners among the participants of a previous quantitative study, where they were confronted with the notion of diversity when asked to identify, from a series of electronic music lists, the most diverse ones according to their beliefs. As a follow-up, in this qualitative study we carry out semi-structured interviews to understand how listeners may assess the diversity of a music list and to investigate their experiences with music recommendation diversity. We report here our main findings on 1) what can influence the diversity assessment of tracks and artists' music lists, and 2) which factors can characterize listeners' interaction with music recommendation diversity.
Submitted 25 January, 2022;
originally announced January 2022.
-
Personalized musically induced emotions of not-so-popular Colombian music
Authors:
Juan Sebastián Gómez-Cañón,
Perfecto Herrera,
Estefanía Cano,
Emilia Gómez
Abstract:
This work presents an initial proof of concept of how Music Emotion Recognition (MER) systems could be intentionally biased with respect to annotations of musically induced emotions in a political context. Specifically, we analyze traditional Colombian music containing politically charged lyrics of two types: (1) vallenatos and social songs from the "left-wing" guerrilla Fuerzas Armadas Revolucionarias de Colombia (FARC) and (2) corridos from the "right-wing" paramilitaries Autodefensas Unidas de Colombia (AUC). We train personalized machine learning models to predict induced emotions for three users with diverse political views - we aim at identifying the songs that may induce negative emotions for a particular user, such as anger and fear. To this extent, a user's emotion judgements could be interpreted as problematizing data - subjective emotional judgments could in turn be used to influence the user in a human-centered machine learning environment. In short, highly desired "emotion regulation" applications could potentially deviate to "emotion manipulation" - the recent discredit of emotion recognition technologies might transcend ethical issues of diversity and inclusion.
Submitted 9 December, 2021;
originally announced December 2021.
-
EIHW-MTG: Second DiCOVA Challenge System Report
Authors:
Adria Mallol-Ragolta,
Helena Cuesta,
Emilia Gómez,
Björn W. Schuller
Abstract:
This work presents an outer product-based approach to fuse the embedded representations generated from the spectrograms of cough, breath, and speech samples for the automatic detection of COVID-19. To extract deep learnt representations from the spectrograms, we compare the performance of a CNN trained from scratch and a ResNet18 architecture fine-tuned for the task at hand. Furthermore, we investigate whether the patients' sex and the use of contextual attention mechanisms are beneficial. Our experiments use the dataset released as part of the Second Diagnosing COVID-19 using Acoustics (DiCOVA) Challenge. The results suggest the suitability of fusing breath and speech information to detect COVID-19. An Area Under the Curve (AUC) of 84.06% is obtained on the test partition when using a CNN trained from scratch with contextual attention mechanisms. When using the ResNet18 architecture for feature extraction, the baseline model scores the highest performance with an AUC of 84.26%.
Submitted 18 October, 2021;
originally announced October 2021.
-
EIHW-MTG DiCOVA 2021 Challenge System Report
Authors:
Adria Mallol-Ragolta,
Helena Cuesta,
Emilia Gómez,
Björn W. Schuller
Abstract:
This paper aims to automatically detect COVID-19 patients by analysing the acoustic information embedded in coughs. COVID-19 affects the respiratory system, and, consequently, respiratory-related signals have the potential to contain salient information for the task at hand. We focus on analysing the spectrogram representations of coughing samples with the aim to investigate whether COVID-19 alters the frequency content of these signals. Furthermore, this work also assesses the impact of gender in the automatic detection of COVID-19. To extract deep learnt representations of the spectrograms, we compare the performance of a cough-specific, and a Resnet18 pre-trained Convolutional Neural Network (CNN). Additionally, our approach explores the use of contextual attention, so the model can learn to highlight the most relevant deep learnt features extracted by the CNN. We conduct our experiments on the dataset released for the Cough Sound Track of the DiCOVA 2021 Challenge. The best performance on the test set is obtained using the Resnet18 pre-trained CNN with contextual attention, which scored an Area Under the Curve (AUC) of 70.91 at 80% sensitivity.
Submitted 13 October, 2021;
originally announced October 2021.
-
Assessing Algorithmic Biases for Musical Version Identification
Authors:
Furkan Yesiler,
Marius Miron,
Joan Serrà,
Emilia Gómez
Abstract:
Version identification (VI) systems now offer accurate and scalable solutions for detecting different renditions of a musical composition, allowing the use of these systems in industrial applications and throughout the wider music ecosystem. Such use can have an important impact on various stakeholders regarding recognition and financial benefits, including how royalties are circulated for digital rights management. In this work, we take a step toward acknowledging this impact and consider VI systems as socio-technical systems rather than isolated technologies. We propose a framework for quantifying performance disparities across 5 systems and 6 relevant side attributes: gender, popularity, country, language, year, and prevalence. We also consider 3 main stakeholders for this particular information retrieval use case: the performing artists of query tracks, those of reference (original) tracks, and the composers. By categorizing the recordings in our dataset using such attributes and stakeholders, we analyze whether the considered VI systems show any implicit biases. We find signs of disparities in identification performance for most of the groups we include in our analyses. Moreover, we also find that learning- and rule-based systems behave differently for some attributes, which suggests an additional dimension to consider along with accuracy and scalability when evaluating VI systems. Lastly, we share our dataset with attribute annotations to encourage VI researchers to take these aspects into account while building new systems.
Submitted 30 September, 2021;
originally announced September 2021.
-
How diverse is the ACII community? Analysing gender, geographical and business diversity of Affective Computing research
Authors:
Isabelle Hupont,
Songül Tolan,
Ana Freire,
Lorenzo Porcaro,
Sara Estevez,
Emilia Gómez
Abstract:
ACII is the premier international forum for presenting the latest research on affective computing. In this work, we monitor, quantify and reflect on the diversity in the ACII conference across time by computing a set of indexes. We measure diversity in terms of gender, geographic location and academia vs research centres vs industry, and consider three different actors: authors, keynote speakers and organizers. Results raise awareness on the limited diversity in the field, in all studied facets, and compared to other AI conferences. While gender diversity is relatively high, equality is far from being reached. The community is dominated by European, Asian and North American researchers, leaving the other continents under-represented. There is also a strong absence of companies and research centres focusing on applied research and products. This study fosters discussion in the community on the need for diversity and related challenges in terms of minimizing potential biases of the developed systems to the represented groups. We intend our paper to contribute with a first analysis to consider as a monitoring tool when implementing diversity initiatives. The data collected for this study are publicly released through the European divinAI initiative.
Submitted 12 September, 2021;
originally announced September 2021.
-
LoopNet: Musical Loop Synthesis Conditioned On Intuitive Musical Parameters
Authors:
Pritish Chandna,
António Ramires,
Xavier Serra,
Emilia Gómez
Abstract:
Loops, seamlessly repeatable musical segments, are a cornerstone of modern music production. Contemporary artists often mix and match various sampled or pre-recorded loops based on musical criteria such as rhythm, harmony and timbral texture to create compositions. Taking such criteria into account, we present LoopNet, a feed-forward generative model for creating loops conditioned on intuitive parameters. We leverage Music Information Retrieval (MIR) models as well as a large collection of public loop samples in our study and use the Wave-U-Net architecture to map control parameters to audio. We also evaluate the quality of the generated audio and propose intuitive controls for composers to map the ideas in their minds to an audio loop.
Submitted 21 May, 2021;
originally announced May 2021.
-
Perceptions of Diversity in Electronic Music: the Impact of Listener, Artist, and Track Characteristics
Authors:
Lorenzo Porcaro,
Emilia Gómez,
Carlos Castillo
Abstract:
Shared practices to assess the diversity of retrieval system results are still debated in the Information Retrieval community, partly because of the challenges of determining what diversity means in specific scenarios, and of understanding how diversity is perceived by end-users. The field of Music Information Retrieval is not exempt from this issue. Even if fields such as Musicology or Sociology of Music have a long tradition in questioning the representation and the impact of diversity in cultural environments, such knowledge has not been yet embedded into the design and development of music technologies. In this paper, focusing on electronic music, we investigate the characteristics of listeners, artists, and tracks that are influential in the perception of diversity. Specifically, we center our attention on 1) understanding the relationship between perceived diversity and computational methods to measure diversity, and 2) analyzing how listeners' domain knowledge and familiarity influence such perceived diversity. To accomplish this, we design a user-study in which listeners are asked to compare pairs of lists of tracks and artists, and to select the most diverse list from each pair. We compare participants' ratings with results obtained through computational models built using audio tracks' features and artist attributes. We find that such models are generally aligned with participants' choices when most of them agree that one list is more diverse than the other, while they present a mixed behaviour in cases where participants have little agreement. Moreover, we observe how differences in domain knowledge, familiarity, and demographics can influence the level of agreement among listeners, and between listeners and diversity metrics computed automatically.
Submitted 26 November, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Investigating the efficacy of music version retrieval systems for setlist identification
Authors:
Furkan Yesiler,
Emilio Molina,
Joan Serrà,
Emilia Gómez
Abstract:
The setlist identification (SLI) task addresses a music recognition use case where the goal is to retrieve the metadata and timestamps for all the tracks played in live music events. Due to various musical and non-musical changes in live performances, developing automatic SLI systems is still a challenging task that, despite its industrial relevance, has been under-explored in the academic literature. In this paper, we propose an end-to-end workflow that identifies relevant metadata and timestamps of live music performances using a version identification system. We compare 3 of such systems to investigate their suitability for this particular task. For developing and evaluating SLI systems, we also contribute a new dataset that contains 99.5h of concerts with annotated metadata and timestamps, along with the corresponding reference set. The dataset is categorized by audio qualities and genres to analyze the performance of SLI systems in different use cases. Our approach can identify 68% of the annotated segments, with values ranging from 35% to 77% based on the genre. Finally, we evaluate our approach against a database of 56.8k songs to illustrate the effect of expanding the reference set, where we can still identify 56% of the annotated segments.
Submitted 6 January, 2021;
originally announced January 2021.
-
Understanding Cloud Workloads Performance in a Production like Environment
Authors:
Lucia Pons,
Josué Feliu,
José Puche,
Chaoyi Huang,
Salvador Petit,
Julio Pons,
María E. Gómez,
Julio Sahuquillo
Abstract:
Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the current public cloud. With this aim, this paper devises a workload taxonomy that classifies applications according to how the major system resources affect their performance (e.g., tail latency) as a function of the level of load (e.g., QPS). After that, we present three main studies addressing three major concerns to improve the cloud performance: impact of the level of load on performance, impact of hyper-threading on performance, and impact of limiting the major system resources (e.g., last level cache) on performance. In all these studies we identified important findings that we hope help cloud providers improve their system utilization.
Submitted 10 October, 2020;
originally announced October 2020.
-
Less is more: Faster and better music version identification with embedding distillation
Authors:
Furkan Yesiler,
Joan Serrà,
Emilia Gómez
Abstract:
Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs). By learning to encode entire recordings into plain vector embeddings, recent systems have made significant progress in bridging the gap between accuracy and scalability, which has been a key challenge for nearly two decades. In this work, we propose to further narrow this gap by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model. We compare a wide range of techniques and propose new ones, from classical dimensionality reduction to more sophisticated distillation schemes. With those, we obtain 99% smaller embeddings that, moreover, yield up to a 3% accuracy increase. Such small embeddings can have an important impact in retrieval time, up to the point of making a real-world system practical on a standalone laptop.
Submitted 7 October, 2020;
originally announced October 2020.
-
A Deep Learning Based Analysis-Synthesis Framework For Unison Singing
Authors:
Pritish Chandna,
Helena Cuesta,
Emilia Gómez
Abstract:
Unison singing is the name given to an ensemble of singers simultaneously singing the same melody and lyrics. While each individual singer in a unison sings the same principal melody, there are slight timing and pitch deviations between the singers, which, along with the ensemble of timbres, give the listener a perceived sense of "unison". In this paper, we present a study of unison singing in the context of choirs; utilising some recently proposed deep-learning based methodologies, we analyse the fundamental frequency (F0) distribution of the individual singers in recordings of unison mixtures. Based on the analysis, we propose a system for synthesising a unison signal from an a cappella input and a single voice prototype representative of a unison mixture. We use subjective listening tests to evaluate perceptual factors of our proposed system for synthesis, including quality, adherence to the melody, as well as the degree of perceived unison.
Submitted 21 September, 2020;
originally announced September 2020.
-
Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks
Authors:
Helena Cuesta,
Brian McFee,
Emilia Gómez
Abstract:
This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs). We address the major challenges of ensemble singing, i.e., all melodic sources are vocals and singers sing in harmony. We build upon an existing architecture to produce a pitch salience function of the input signal, where the harmonic constant-Q transform (HCQT) and its associated phase differentials are used as an input representation. The pitch salience function is subsequently thresholded to obtain a multiple F0 estimation output. For training, we build a dataset that comprises several multi-track datasets of vocal quartets with F0 annotations. This work proposes and evaluates a set of CNNs for this task in diverse scenarios and data configurations, including recordings with additional reverb. Our models outperform a state-of-the-art method intended for the same music genre when evaluated with an increased F0 resolution, as well as a general-purpose method for multi-F0 estimation. We conclude with a discussion on future research directions.
Submitted 9 September, 2020;
originally announced September 2020.
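The thresholding step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the salience function is stored as a list of time frames, each holding one value in [0, 1] per frequency bin. The function name `threshold_salience`, the bin frequencies, and the threshold of 0.5 are hypothetical choices for illustration, not the authors' implementation.

```python
def threshold_salience(salience, bin_freqs_hz, threshold=0.5):
    """Return, for each time frame, the frequencies whose salience reaches the threshold."""
    estimates = []
    for frame in salience:
        # Keep every bin whose salience is at or above the threshold;
        # a frame may yield zero, one, or several F0 values.
        estimates.append([f for f, s in zip(bin_freqs_hz, frame) if s >= threshold])
    return estimates


# Hypothetical toy input: 3 time frames x 3 frequency bins.
bin_freqs_hz = [220.0, 330.0, 440.0]
salience = [
    [0.9, 0.1, 0.7],  # two simultaneous voices
    [0.2, 0.6, 0.3],  # one voice
    [0.1, 0.2, 0.1],  # silence
]
f0s = threshold_salience(salience, bin_freqs_hz)
```

Practical systems often add peak picking before or after thresholding; that step is omitted here because the abstract does not describe it.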
-
Exploring Artist Gender Bias in Music Recommendation
Authors:
Dougal Shakespeare,
Lorenzo Porcaro,
Emilia Gómez,
Carlos Castillo
Abstract:
Music Recommender Systems (mRS) are designed to give personalised and meaningful recommendations of items (i.e. songs, playlists or artists) to a user base, thereby reflecting and further complementing individual users' specific music preferences. Whilst accuracy metrics have been widely applied to evaluate recommendations in mRS literature, evaluating a user's item utility from other impact-oriented perspectives, including their potential for discrimination, is still a novel evaluation practice in the music domain. In this work, we center our attention on a specific phenomenon for which we want to estimate if mRS may exacerbate its impact: gender bias. Our work presents an exploratory study, analyzing the extent to which commonly deployed state-of-the-art Collaborative Filtering (CF) algorithms may act to further increase or decrease artist gender bias. To assess group biases introduced by CF, we deploy a recently proposed metric of bias disparity on two listening event datasets: the LFM-1b dataset, and the earlier constructed Celma's dataset. Our work traces the causes of disparity to variations in input gender distributions and user-item preferences, highlighting the effect such configurations can have on user's gender bias after recommendation generation.
Submitted 6 October, 2020; v1 submitted 3 September, 2020;
originally announced September 2020.
-
Deep Learning Based Source Separation Applied To Choir Ensembles
Authors:
Darius Petermann,
Pritish Chandna,
Helena Cuesta,
Jordi Bonada,
Emilia Gomez
Abstract:
Choral singing is a widely practiced form of ensemble singing wherein a group of people sing simultaneously in polyphonic harmony. The most commonly practiced setting for choir ensembles consists of four parts: Soprano, Alto, Tenor and Bass (SATB), each with its own range of fundamental frequencies (F0s). The task of source separation for this choral setting entails separating the SATB mixture into the constituent parts. Source separation for musical mixtures is well studied and many deep learning based methodologies have been proposed for the same. However, most of the research has been focused on a typical case which consists in separating vocal, percussion and bass sources from a mixture, each of which has a distinct spectral structure. In contrast, the simultaneous and harmonic nature of ensemble singing leads to high structural similarity and overlap between the spectral components of the sources in a choral mixture, making source separation for choirs a harder task than the typical case. This, along with the lack of an appropriate consolidated dataset, has led to a dearth of research in the field so far. In this paper we first assess how well some of the recently developed methodologies for musical source separation perform for the case of SATB choirs. We then propose a novel domain-specific adaptation for conditioning the recently proposed U-Net architecture for musical source separation using the fundamental frequency contour of each of the singing groups and demonstrate that our proposed approach surpasses results from domain-agnostic architectures.
Submitted 17 August, 2020;
originally announced August 2020.
-
Conditioned Source Separation for Music Instrument Performances
Authors:
Olga Slizovskaia,
Gloria Haro,
Emilia Gómez
Abstract:
In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem. This paper proposes a source separation method for multiple musical instruments sounding simultaneously and explores how much additional information apart from the audio stream can lift the quality of source separation. We explore conditioning techniques at different levels of a primary source separation network and utilize two extra modalities of data, namely presence or absence of instruments in the mixture, and the corresponding video stream data.
Submitted 7 July, 2021; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Vocoder-Based Speech Synthesis from Silent Videos
Authors:
Daniel Michelsanti,
Olga Slizovskaia,
Gloria Haro,
Emilia Gómez,
Zheng-Hua Tan,
Jesper Jensen
Abstract:
Both acoustic and visual information influence human perception of speech. For this reason, the lack of audio in a video sequence determines an extremely low speech intelligibility for untrained lip readers. In this paper, we present a way to synthesise speech from the silent video of a talker using deep learning. The system learns a mapping function from raw video frames to acoustic features and reconstructs the speech with a vocoder synthesis algorithm. To improve speech reconstruction performance, our model is also trained to predict text information in a multi-task learning fashion and it is able to simultaneously reconstruct and recognise speech in real time. The results in terms of estimated speech quality and intelligibility show the effectiveness of our method, which exhibits an improvement over existing video-to-speech approaches.
Submitted 15 August, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Multi-channel U-Net for Music Source Separation
Authors:
Venkatesh S. Kadandale,
Juan F. Montesinos,
Gloria Haro,
Emilia Gómez
Abstract:
A fairly straightforward approach for music source separation is to train independent models, wherein each model is dedicated to estimating only a specific source. Training a single model to estimate multiple sources generally does not perform as well as the independent dedicated models. However, Conditioned U-Net (C-U-Net) uses a control mechanism to train a single model for multi-source separation and attempts to achieve a performance comparable to that of the dedicated models. We propose a multi-channel U-Net (M-U-Net) trained using a weighted multi-task loss as an alternative to the C-U-Net. We investigate two weighting strategies for our multi-task loss: 1) Dynamic Weighted Average (DWA), and 2) Energy Based Weighting (EBW). DWA determines the weights by tracking the rate of change of loss of each task during training. EBW aims to neutralize the effect of the training bias arising from the difference in energy levels of each of the sources in a mixture. Our methods provide three-fold advantages compared to C-U-Net: 1) Fewer effective training iterations per epoch, 2) Fewer trainable network parameters (no control parameters), and 3) Faster processing at inference. Our methods achieve performance comparable to that of C-U-Net and the dedicated U-Nets at a much lower training cost.
Submitted 4 September, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.
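The DWA idea described in the abstract above (weights driven by each task's rate of change of loss) can be sketched as follows. This follows the formulation commonly used in the multi-task learning literature, where each weight is a softmax over the ratio of the two most recent losses; whether the paper uses exactly this form is an assumption. The function name `dwa_weights`, the temperature of 2.0, and the loss values are hypothetical.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Return one weight per task from the average losses of the last two epochs."""
    # Relative descending rate per task: a ratio near 1 means the loss has stalled,
    # a small ratio means the loss is dropping quickly.
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    # Softmax over the ratios, scaled so the weights sum to the number of tasks.
    return [num_tasks * e / total for e in exps]


# Hypothetical per-source losses (e.g. four sources) at epochs t-1 and t-2.
weights = dwa_weights([0.8, 0.5, 0.9, 0.7], [1.0, 1.0, 1.0, 1.0])
```

Under this formulation, a task whose loss is decreasing slowly receives a larger weight than one whose loss is decreasing quickly, and a larger temperature makes the weights more uniform.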
-
Addressing multiple metrics of group fairness in data-driven decision making
Authors:
Marius Miron,
Songül Tolan,
Emilia Gómez,
Carlos Castillo
Abstract:
The Fairness, Accountability, and Transparency in Machine Learning (FAT-ML) literature proposes a varied set of group fairness metrics to measure discrimination against socio-demographic groups that are characterized by a protected feature, such as gender or race. Such a system can be deemed as either fair or unfair depending on the choice of the metric. Several metrics have been proposed, some of them incompatible with each other. We do so empirically, by observing that several of these metrics cluster together in two or three main clusters for the same groups and machine learning methods. In addition, we propose a robust way to visualize multidimensional fairness in two dimensions through a Principal Component Analysis (PCA) of the group fairness metrics. Experimental results on multiple datasets show that the PCA decomposition explains the variance between the metrics with one to three components.
Submitted 10 March, 2020;
originally announced March 2020.
-
Content Based Singing Voice Extraction From a Musical Mixture
Authors:
Pritish Chandna,
Merlijn Blaauw,
Jordi Bonada,
Emilia Gomez
Abstract:
We present a deep learning based methodology for extracting the singing voice signal from a musical mixture based on the underlying linguistic content. Our model follows an encoder decoder architecture and takes as input the magnitude component of the spectrogram of a musical mixture with vocals. The encoder part of the model is trained via knowledge distillation using a teacher network to learn a content embedding, which is decoded to generate the corresponding vocoder features. Using this methodology, we are able to extract the unprocessed raw vocal signal from the mixture even for a processed mixture dataset with singers not seen during training. While the nature of our system makes it incongruous with traditional objective evaluation metrics, we use subjective evaluation via listening tests to compare the methodology to state-of-the-art deep learning based source separation algorithms. We also provide sound examples and source code for reproducibility.
Submitted 17 February, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Artificial intelligence in medicine and healthcare: a review and classification of current and near-future applications and their ethical and social Impact
Authors:
Emilio Gómez-González,
Emilia Gomez,
Javier Márquez-Rivas,
Manuel Guerrero-Claro,
Isabel Fernández-Lizaranzu,
María Isabel Relimpio-López,
Manuel E. Dorado,
María José Mayorga-Buiza,
Guillermo Izquierdo-Ayuso,
Luis Capitán-Morales
Abstract:
This paper provides an overview of the current and near-future applications of Artificial Intelligence (AI) in Medicine and Health Care and presents a classification according to their ethical and societal aspects, potential benefits and pitfalls, and issues that can be considered controversial and are not deeply discussed in the literature.
This work is based on an analysis of the state of the art of research and technology, including existing software, personal monitoring devices, genetic tests and editing tools, personalized digital models, online platforms, augmented reality devices, and surgical and companion robotics. Motivated by our review, we present and describe the notion of 'extended personalized medicine'; we then review existing applications of AI in medicine and healthcare and explore the public perception of medical AI systems, and how they show, simultaneously, extraordinary opportunities and drawbacks that even question fundamental medical concepts. Many of these topics coincide with urgent priorities recently defined by the World Health Organization for the coming decade. In addition, we study the transformations of the roles of doctors and patients in an age of ubiquitous information, identify the risk of a division of Medicine into 'fake-based', 'patient-generated', and 'scientifically tailored', and draw attention to some aspects that need further thorough analysis and public debate.
Submitted 6 February, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
-
Measuring Diversity of Artificial Intelligence Conferences
Authors:
Ana Freire,
Lorenzo Porcaro,
Emilia Gómez
Abstract:
The lack of diversity of the Artificial Intelligence (AI) field is nowadays a concern, and several initiatives such as funding schemes and mentoring programs have been designed to overcome it. However, there is no indication on how these initiatives actually impact AI diversity in the short and long term. This work studies the concept of diversity in this particular context and proposes a small set of diversity indicators (i.e. indexes) of AI scientific events. These indicators are designed to quantify the diversity of the AI field and monitor its evolution. We consider diversity in terms of gender, geographical location and business (understood as the presence of academia versus industry). We compute these indicators for the different communities of a conference: authors, keynote speakers and organizing committee. From these components we compute a summarized diversity indicator for each AI event. We evaluate the proposed indexes for a set of recent major AI conferences and we discuss their values and limitations.
Submitted 22 March, 2021; v1 submitted 20 January, 2020;
originally announced January 2020.
-
Neural Percussive Synthesis Parameterised by High-Level Timbral Features
Authors:
António Ramires,
Pritish Chandna,
Xavier Favory,
Emilia Gómez,
Xavier Serra
Abstract:
We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds. This approach allows for intuitive control of a synthesizer, enabling the user to shape sounds without extensive knowledge of signal processing. We use a feedforward convolutional neural network-based architecture, which is able to map input parameters to the corresponding waveform. We propose two datasets to evaluate our approach both in a restrictive context and in one covering a broader spectrum of sounds. The timbral features used as parameters are taken from recent literature in signal processing. We also use these features for evaluation and validation of the presented model, to ensure that changing the input parameters produces a congruent waveform with the desired characteristics. Finally, we evaluate the quality of the output sound using a subjective listening test. We provide sound examples and the system's source code for reproducibility.
Submitted 3 April, 2020; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Accurate and Scalable Version Identification Using Musically-Motivated Embeddings
Authors:
Furkan Yesiler,
Joan Serrà,
Emilia Gómez
Abstract:
The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance.
Submitted 13 April, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
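The triplet loss over Euclidean distances mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the margin or the exact mining strategy, so the margin of 1.0 and the "closest negative" selection below are assumptions, and the names `euclidean`, `triplet_loss`, and `hardest_negative` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the negative at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """One simple mining rule: choose the negative closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


# Hypothetical 2-D embeddings: a version of the same piece and two unrelated tracks.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
negatives = [[3.0, 4.0], [1.0, 1.0]]
hard = hardest_negative(anchor, negatives)
loss = triplet_loss(anchor, positive, hard)
```

Mining the hardest negative concentrates training on triplets that still violate the margin, since easy triplets contribute a loss of zero.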
-
The emotions that we perceive in music: the influence of language and lyrics comprehension on agreement
Authors:
Juan Sebastián Gómez Cañón,
Perfecto Herrera,
Emilia Gómez,
Estefanía Cano
Abstract:
In the present study, we address the relationship between the emotions perceived in pop and rock music (mainly in Euro-American styles with English lyrics) and the language spoken by the listener. Our goal is to understand the influence of lyrics comprehension on the perception of emotions and use this information to improve Music Emotion Recognition (MER) models. Two main research questions are addressed: 1. Are there differences and similarities between the emotions perceived in pop/rock music by listeners raised with different mother tongues? 2. Do personal characteristics have an influence on the perceived emotions for listeners of a given language? Personal characteristics include the listeners' general demographics, familiarity and preference for the fragments, and music sophistication. Our hypothesis is that inter-rater agreement (as defined by Krippendorff's alpha coefficient) from subjects is directly influenced by the comprehension of lyrics.
Submitted 25 October, 2019; v1 submitted 12 September, 2019;
originally announced September 2019.
-
A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features
Authors:
Olga Slizovskaia,
Emilia Gómez,
Gloria Haro
Abstract:
The explainability of Convolutional Neural Networks (CNNs) is a particularly challenging task in all areas of application, and it is notably under-researched in the music and audio domain. In this paper, we approach explainability by exploiting the knowledge we have on hand-crafted audio features. Our study focuses on a well-defined MIR task, the recognition of musical instruments from user-generated music recordings. We compute the similarity between a set of traditional audio features and representations learned by CNNs. We also propose a technique for measuring the similarity between activation maps and audio features which are typically presented in the form of a matrix, such as chromagrams or spectrograms. We observe that some neurons' activations correspond to well-known classical audio features. In particular, for shallow layers, we found similarities between activations and harmonic and percussive components of the spectrum. For deeper layers, we compare chromagrams with high-level activation maps as well as loudness and onset rate with deep-learned embeddings.
Submitted 3 July, 2019;
originally announced July 2019.
-
A Framework for Multi-f0 Modeling in SATB Choir Recordings
Authors:
Helena Cuesta,
Emilia Gómez,
Pritish Chandna
Abstract:
Fundamental frequency (f0) modeling is an important but relatively unexplored aspect of choir singing. Performance evaluation as well as auditory analysis of singing, whether individually or in a choir, often depend on extracting f0 contours for the singing voice. However, due to the large number of singers singing in a similar frequency range, extracting the exact individual pitch contours from choir recordings is a challenging task. In this paper, we address this task and develop a methodology for modeling pitch contours of SATB choir recordings. A typical SATB choir consists of four parts, each covering a distinct range of pitches and often with multiple singers each. We first evaluate some state-of-the-art multi-f0 estimation systems for the particular case of choirs with a single singer per part, and observe that the pitch of individual singers can be estimated to a relatively high degree of accuracy. We observe, however, that the scenario of multiple singers for each choir part (i.e. unison singing) is far more challenging. In this work we propose a methodology that combines a deep-learning-based multi-f0 estimation method with a set of traditional DSP techniques to model f0 and its dispersion instead of a single f0 trajectory for each choir part. We present and discuss our observations and test our framework with different singer configurations.
Submitted 10 April, 2019;
originally announced April 2019.