-
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Authors:
Bogdan Kulynych,
Juan Felipe Gomez,
Georgios Kaissis,
Jamie Hayes,
Borja Balle,
Flavio du Pin Calmon,
Jean Louis Raisaro
Abstract:
Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary (including worst-case) levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., more than 15pp accuracy increase in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.
Submitted 9 July, 2025;
originally announced July 2025.
-
Performance of Confidential Computing GPUs
Authors:
Antonio Martínez Ibarra,
Julian James Stephen,
Aurora González Vidal,
K. R. Jayaram,
Antonio Fernando Skarmeta Gómez
Abstract:
This work examines latency, throughput, and other metrics when performing inference on confidential GPUs. We explore different traffic patterns and scheduling strategies using a single Virtual Machine with one NVIDIA H100 GPU to perform relaxed batch inferences on multiple Large Language Models (LLMs), operating under the constraint of swapping models in and out of memory, which necessitates efficient control. The experiments simulate diverse real-world scenarios by varying parameters such as traffic load, traffic distribution patterns, scheduling strategies, and Service Level Agreement (SLA) requirements. The findings provide insights into the differences between confidential and non-confidential settings when performing inference in scenarios requiring active model swapping. Results indicate that in No-CC (non-confidential computing) mode, the latency of relaxed batch inference with model swapping is 20-30% lower than in confidential mode. Additionally, SLA attainment is 15-20% higher in No-CC settings. Throughput in No-CC scenarios surpasses that of confidential mode by 45-70%, and GPU utilization is approximately 50% higher in No-CC environments. Overall, performance in the confidential setting is inferior to that in the No-CC scenario, primarily due to the additional encryption and decryption overhead required for loading models onto the GPU in confidential environments.
Submitted 22 May, 2025;
originally announced May 2025.
-
Experimental algorithms for the dualization problem
Authors:
Mauro Mezzini,
Fernando Cuartero Gomez,
Jose Javier Paulet Gonzalez,
Hernan Indibil de la Cruz Calvo,
Vicente Pascual,
Fernando L. Pelayo
Abstract:
In this paper, we present experimental algorithms for solving the dualization problem. We present the results of extensive experimentation comparing the execution time of various algorithms.
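The abstract does not define the problem. Assuming the usual formulation, dualization asks for all minimal transversals (minimal hitting sets) of a hypergraph, which is equivalent to dualizing a monotone Boolean function. A minimal sketch of Berge's classic edge-by-edge algorithm, included only to illustrate the problem and not as one of the paper's algorithms:

```python
def minimize(family):
    """Keep only the inclusion-minimal sets of a family of frozensets."""
    family = set(family)
    return {s for s in family if not any(t < s for t in family)}


def berge_transversals(edges):
    """Enumerate the minimal transversals of a hypergraph (Berge's algorithm).

    edges: iterable of vertex sets. Returns a set of frozensets, each a
    minimal set of vertices that intersects every edge.
    """
    transversals = {frozenset()}  # with no edges, the empty set is the only minimal transversal
    for edge in edges:
        # Extend every current transversal by one vertex of the new edge;
        # a transversal that already hits the edge is reproduced unchanged.
        extended = {t | {v} for t in transversals for v in edge}
        transversals = minimize(extended)
    return transversals


# Path 1-2-3-4: its minimal transversals are its minimal vertex covers.
H = [{1, 2}, {2, 3}, {3, 4}]
print(sorted(sorted(t) for t in berge_transversals(H)))  # [[1, 3], [2, 3], [2, 4]]
```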
Submitted 9 May, 2025;
originally announced May 2025.
-
Comparison of Visual Trackers for Biomechanical Analysis of Running
Authors:
Luis F. Gomez,
Gonzalo Garrido-Lopez,
Julian Fierrez,
Aythami Morales,
Ruben Tolosana,
Javier Rueda,
Enrique Navarro
Abstract:
Human pose estimation has witnessed significant advancements in recent years, mainly due to the integration of deep learning models, the availability of a vast amount of data, and large computational resources. These developments have led to highly accurate body tracking systems, which have direct applications in sports analysis and performance evaluation.
This work analyzes the performance of six trackers (two point trackers and four joint trackers) for biomechanical analysis in sprints. The proposed framework compares the results obtained from these pose trackers with the manual annotations of biomechanical experts for more than 5870 frames. The experimental framework employs forty sprints from five professional runners, focusing on three key angles in sprint biomechanics: trunk inclination, hip flexion-extension, and knee flexion-extension. We propose a post-processing module for outlier detection and fusion prediction in the joint angles.
The experimental results demonstrate that joint-based models yield root mean squared errors ranging from 4.37° to 11.41°. When integrated with the post-processing modules, the upper and lower ends of this range are reduced to 6.99° and 3.88°, respectively. The experimental findings suggest that human pose tracking approaches can be valuable resources for the biomechanical analysis of running. However, there is still room for improvement in applications where high accuracy is required.
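The reported errors are RMSEs in degrees between tracker-derived and expert-annotated joint angles. A minimal sketch of that comparison, assuming each angle is the included angle at a joint computed from three 2D keypoints in the image plane (e.g. hip, knee, ankle for the knee); the paper's exact angle conventions (e.g. trunk inclination relative to the vertical) may differ, and the per-frame values below are hypothetical:

```python
import math


def joint_angle(a, b, c):
    """Included angle in degrees at keypoint b between segments b->a and b->c.

    a, b, c are (x, y) image coordinates, e.g. hip, knee, ankle for the knee.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2(|cross|, dot) is numerically stable and lies in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))


def rmse(predicted, reference):
    """Root mean squared error between two equally long angle series (degrees)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


# Hypothetical per-frame knee angles: tracker output vs. expert annotation.
tracker = [joint_angle((0, 1), (0, 0), (1, 0)), 150.0, 162.0]
manual = [92.0, 146.0, 160.0]
print(round(rmse(tracker, manual), 2))  # 2.83
```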
Submitted 7 May, 2025;
originally announced May 2025.
-
Optimizing Noise Distributions for Differential Privacy
Authors:
Atefeh Gilani,
Juan Felipe Gomez,
Shahab Asoodeh,
Flavio P. Calmon,
Oliver Kosut,
Lalitha Sankar
Abstract:
We propose a unified optimization framework for designing continuous and discrete noise distributions that ensure differential privacy (DP) by minimizing Rényi DP, a variant of DP, under a cost constraint. Rényi DP has the advantage that by considering different values of the Rényi parameter $α$, we can tailor our optimization for any number of compositions. To solve the optimization problem, we reduce it to a finite-dimensional convex formulation and perform preconditioned gradient descent. The resulting noise distributions are then compared to their Gaussian and Laplace counterparts. Numerical results demonstrate that our optimized distributions are consistently better, with significant improvements in $(\varepsilon, δ)$-DP guarantees in the moderate composition regimes, compared to Gaussian and Laplace distributions with the same variance.
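To illustrate why the Rényi order $\alpha$ ties the optimization to the number of compositions, the sketch below uses the standard Gaussian baseline, not the paper's optimized distributions. A Gaussian mechanism with $\ell_2$ sensitivity $\Delta$ and noise scale $\sigma$ satisfies $(\alpha, \alpha\Delta^2/(2\sigma^2))$-RDP, $k$-fold composition multiplies this by $k$, and the standard conversion $\varepsilon = \varepsilon_{\mathrm{RDP}}(\alpha) + \log(1/\delta)/(\alpha - 1)$ yields $(\varepsilon, \delta)$-DP. The grid of orders and the function names are illustrative assumptions.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP of the Gaussian mechanism at order alpha > 1: alpha * Delta^2 / (2 sigma^2)."""
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)


def composed_epsilon(sigma, k, delta, sensitivity=1.0, alphas=None):
    """(epsilon, best alpha) for k-fold composition via the RDP -> (eps, delta) conversion.

    RDP composes additively at a fixed order, so eps = k * RDP(alpha) + log(1/delta) / (alpha - 1),
    minimised over a grid of orders.
    """
    if alphas is None:
        alphas = [1 + i / 10 for i in range(1, 1000)]  # orders 1.1, 1.2, ..., 100.9
    return min(
        (k * gaussian_rdp(a, sigma, sensitivity) + math.log(1 / delta) / (a - 1), a)
        for a in alphas
    )


for k in (1, 10, 1000):
    eps, best_alpha = composed_epsilon(sigma=1.0, k=k, delta=1e-5)
    print(f"k={k:5d}  eps={eps:8.2f}  best alpha={best_alpha:.1f}")
```

The optimal order moves toward 1 as $k$ grows, which is why a noise distribution designed for a single $\alpha$ effectively targets a particular composition regime.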
Submitted 9 June, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
$(\varepsilon, δ)$ Considered Harmful: Best Practices for Reporting Differential Privacy Guarantees
Authors:
Juan Felipe Gomez,
Bogdan Kulynych,
Georgios Kaissis,
Jamie Hayes,
Borja Balle,
Antti Honkela
Abstract:
Current practices for reporting the level of differential privacy (DP) guarantees for machine learning (ML) algorithms provide an incomplete and potentially misleading picture of the guarantees and make it difficult to compare privacy levels across different settings. We argue for using Gaussian differential privacy (GDP) as the primary means of communicating DP guarantees in ML, with the full privacy profile as a secondary option in case GDP is too inaccurate. Unlike other widely used alternatives, GDP has only one parameter, which ensures easy comparability of guarantees, and it can accurately capture the full privacy profile of many important ML applications. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification, and the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits the profiles remarkably well in all three cases. Although GDP is ideal for reporting the final guarantees, other formalisms (e.g., privacy loss random variables) are needed for accurate privacy accounting. We show that such intermediate representations can be efficiently converted to GDP with minimal loss in tightness.
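To make the single-parameter argument concrete, here is a minimal sketch of two standard consequences of the $\mu$-GDP definition (Dong, Roth, and Su); it is not the paper's code. A $\mu$-GDP mechanism satisfies $(\varepsilon, \delta(\varepsilon))$-DP with $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\Phi(-\varepsilon/\mu - \mu/2)$, and any attack at false-positive rate $\alpha$ has true-positive rate at most $\Phi(\Phi^{-1}(\alpha) + \mu)$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides Phi (cdf) and Phi^{-1} (inv_cdf)


def gdp_delta(mu, eps):
    """delta(eps) of a mu-GDP mechanism, i.e. its (eps, delta) privacy profile; requires mu > 0."""
    return (STD_NORMAL.cdf(-eps / mu + mu / 2)
            - math.exp(eps) * STD_NORMAL.cdf(-eps / mu - mu / 2))


def gdp_max_tpr(mu, fpr):
    """Upper bound on the true-positive rate of any attack at false-positive rate fpr (0 < fpr < 1)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(fpr) + mu)


mu = 1.0
for eps in (0.5, 1.0, 2.0, 4.0):
    print(f"eps={eps:.1f}  delta={gdp_delta(mu, eps):.2e}")
print(f"TPR bound at FPR=0.01: {gdp_max_tpr(mu, 0.01):.3f}")
```

Both the whole privacy profile and the attack bound are functions of $\mu$ alone, which is what makes a single reported number comparable across settings.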
Submitted 13 March, 2025;
originally announced March 2025.
-
Gemini Embedding: Generalizable Embeddings from Gemini
Authors:
Jinhyuk Lee,
Feiyang Chen,
Sahil Dua,
Daniel Cer,
Madhuri Shanbhogue,
Iftekhar Naim,
Gustavo Hernández Ábrego,
Zhe Li,
Kaifeng Chen,
Henrique Schechter Vera,
Xiaoqi Ren,
Shanfeng Zhang,
Daniel Salz,
Michael Boratko,
Jay Han,
Blair Chen,
Shuo Huang,
Vikram Rao,
Paul Suganthan,
Feng Han,
Andreas Doumanoglou,
Nithi Gupta,
Fedor Moiseev,
Cathy Yip,
Aashi Jain
, et al. (22 additional authors not shown)
Abstract:
In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding capabilities, Gemini Embedding produces highly generalizable embeddings for text spanning numerous languages and textual modalities. The representations generated by Gemini Embedding can be precomputed and applied to a variety of downstream tasks including classification, similarity, clustering, ranking, and retrieval. Evaluated on the Massive Multilingual Text Embedding Benchmark (MMTEB), which includes over one hundred tasks across 250+ languages, Gemini Embedding substantially outperforms prior state-of-the-art models, demonstrating considerable improvements in embedding quality. Achieving state-of-the-art performance across MMTEB's multilingual, English, and code benchmarks, our unified model demonstrates strong capabilities across a broad selection of tasks and surpasses specialized domain-specific models.
Submitted 10 March, 2025;
originally announced March 2025.
-
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models
Authors:
Simeng Han,
Frank Palma Gomez,
Tu Vu,
Zefei Li,
Daniel Cer,
Hansi Zeng,
Chris Tar,
Arman Cohan,
Gustavo Hernandez Abrego
Abstract:
Traditional text embedding benchmarks primarily evaluate embedding models' capabilities to capture semantic similarity. However, more advanced NLP tasks, such as safety and factuality, require a deeper understanding of text. These tasks demand an ability to comprehend and process complex information, often involving the handling of sensitive content or the verification of factual statements against reliable sources. We introduce a new benchmark designed to assess and highlight the limitations of embedding models trained on existing information retrieval data mixtures with respect to advanced capabilities, including factuality, safety, instruction following, reasoning, and document-level understanding. This benchmark includes a diverse set of tasks that simulate real-world scenarios where these capabilities are critical, and it reveals gaps in current advanced embedding models. Furthermore, we propose a novel method that reformulates these various tasks as retrieval tasks. By framing tasks like safety or factuality classification as retrieval problems, we leverage the strengths of retrieval models in capturing semantic relationships while also pushing them to develop a deeper understanding of context and content. Using this approach with single-task fine-tuning, we achieved performance gains of 8% on factuality classification and 13% on safety classification. Our code and data will be publicly available.
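A minimal sketch of the general idea of framing classification as retrieval, assuming each class is represented by an embedded "document" (for instance a textual description of the label) and the input is assigned the label whose document it retrieves first. The toy vectors stand in for embedding-model outputs and the label phrasings are hypothetical; the paper's exact formulation and fine-tuning setup are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify_by_retrieval(query_embedding, label_embeddings):
    """Treat each label description as a document and return the best-matching label."""
    return max(label_embeddings, key=lambda label: cosine(query_embedding, label_embeddings[label]))


# Toy 3-d vectors standing in for embeddings of label descriptions
# such as "this text is harmful" / "this text is harmless" (hypothetical).
labels = {"unsafe": [0.9, 0.1, 0.0], "safe": [0.1, 0.9, 0.1]}
print(classify_by_retrieval([0.8, 0.3, 0.1], labels))  # unsafe
```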
Submitted 3 March, 2025; v1 submitted 23 February, 2025;
originally announced February 2025.
-
Dynamic safety cases for frontier AI
Authors:
Carmen Cârlan,
Francesca Gomez,
Yohan Mathew,
Ketana Krishna,
René King,
Peter Gebauer,
Ben R. Smith
Abstract:
Frontier artificial intelligence (AI) systems present both benefits and risks to society. Safety cases (structured arguments supported by evidence) are one way to help ensure the safe development and deployment of these systems. Yet the evolving nature of AI capabilities, as well as changes in the operational environment and understanding of risk, necessitates mechanisms for continuously updating these safety cases. Typically, in other sectors, safety cases are produced pre-deployment and do not require frequent post-deployment updates; when updates are needed, they can be a manual, costly process. This paper proposes a Dynamic Safety Case Management System (DSCMS) to support both the initial creation of a safety case and its systematic, semi-automated revision over time. Drawing on methods developed in the autonomous vehicle (AV) sector, namely state-of-the-art Checkable Safety Arguments (CSA) combined with Safety Performance Indicators (SPIs) recommended by UL 4600, a DSCMS helps developers maintain alignment between system safety claims and the latest system state. We demonstrate this approach on a safety case template for offensive cyber capabilities and suggest ways it can be integrated into governance structures for safety-critical decision-making. While the correctness of the initial safety argument remains paramount, particularly for high-severity risks, a DSCMS provides a framework for adapting to new insights and strengthening incident response. We outline challenges and further work towards development and implementation of this approach as part of continuous safety assurance of frontier AI systems.
Submitted 23 December, 2024;
originally announced December 2024.
-
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data
Authors:
Ivan DeAndres-Tame,
Ruben Tolosana,
Pietro Melzi,
Ruben Vera-Rodriguez,
Minchul Kim,
Christian Rathgeb,
Xiaoming Liu,
Luis F. Gomez,
Aythami Morales,
Julian Fierrez,
Javier Ortega-Garcia,
Zhizhou Zhong,
Yuge Huang,
Yuxi Mi,
Shouhong Ding,
Shuigeng Zhou,
Shuai He,
Lingzhi Fu,
Heng Cong,
Rongyu Zhang,
Zhihong Xiao,
Evgeny Smirnov,
Anton Pimenov,
Aleksei Grigorev,
Denis Timoshenko
, et al. (34 additional authors not shown)
Abstract:
Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to adapt to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and investigate the application of synthetic data to better train face recognition systems, we introduce the 2nd FRCSyn-onGoing challenge, based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark i) the proposal of novel Generative AI methods and synthetic data, and ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in the pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.
Submitted 10 March, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Comprehensive Methodology for Sample Augmentation in EEG Biomarker Studies for Alzheimer's Risk Classification
Authors:
Veronica Henao Isaza,
David Aguillon,
Carlos Andres Tobon Quintero,
Francisco Lopera,
John Fredy Ochoa Gomez
Abstract:
Background: Dementia, marked by cognitive decline, is a global health challenge. Alzheimer's disease (AD), the leading type, accounts for ~70% of cases. Electroencephalography (EEG) measures show promise in identifying AD risk, but obtaining large samples for reliable comparisons is challenging. Objective: This study integrates signal processing, harmonization, and statistical techniques to enhance sample size and improve AD risk classification reliability. Methods: We used advanced EEG preprocessing, feature extraction, harmonization, and propensity score matching (PSM) to balance healthy non-carriers (HC) and asymptomatic E280A mutation carriers (ACr). Data from four databases were harmonized to adjust site effects while preserving covariates like age and sex. PSM ratios (2:1, 5:1, 10:1) were applied to assess sample size impact on model performance. The final dataset underwent machine learning analysis with decision trees and cross-validation for robust results. Results: Balancing sample sizes via PSM significantly improved classification accuracy, ranging from 0.92 to 0.96 across ratios. This approach enabled precise risk identification even with limited samples. Conclusion: Integrating data processing, harmonization, and balancing techniques improves AD risk classification accuracy, offering potential for other neurodegenerative diseases.
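The abstract applies PSM at 2:1, 5:1, and 10:1 ratios. A minimal sketch of one common variant, greedy k:1 nearest-neighbour matching on already-estimated propensity scores without replacement and with an optional caliper; it assumes the scores were fitted beforehand (e.g. from age and sex), and the paper's exact matching algorithm, caliper, and direction of matching are not stated, so the IDs and scores below are hypothetical.

```python
def match_k_to_1(treated, controls, k, caliper=None):
    """Greedy k:1 nearest-neighbour matching on propensity scores, without replacement.

    treated, controls: dicts mapping subject id -> estimated propensity score.
    Returns {treated id: [matched control ids]}; a control is used at most once.
    """
    available = dict(controls)
    matches = {}
    for t_id, t_score in treated.items():
        # Rank the still-unused controls by distance in propensity score.
        ranked = sorted(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None:
            ranked = [c for c in ranked if abs(available[c] - t_score) <= caliper]
        chosen = ranked[:k]
        for c in chosen:
            del available[c]  # without replacement
        matches[t_id] = chosen
    return matches


# Hypothetical scores: one carrier matched 2:1 against four non-carriers.
print(match_k_to_1({"ACr1": 0.50}, {"HC1": 0.10, "HC2": 0.45, "HC3": 0.60, "HC4": 0.90}, k=2))
# {'ACr1': ['HC2', 'HC3']}
```

Greedy matching depends on the order in which treated subjects are processed; optimal matching avoids this at higher computational cost.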
Submitted 20 November, 2024;
originally announced November 2024.
-
VideoRun2D: Cost-Effective Markerless Motion Capture for Sprint Biomechanics
Authors:
Gonzalo Garrido-Lopez,
Luis F. Gomez,
Julian Fierrez,
Aythami Morales,
Ruben Tolosana,
Javier Rueda,
Enrique Navarro
Abstract:
Sprinting is a decisive ability, especially in team sports. Sprint kinematics have been studied in the past using different methods specifically developed with human biomechanics in mind, and among those methods, markerless systems stand out as very cost-effective. On the other hand, we now have multiple general methods for pixel and body tracking, based on recent machine learning breakthroughs, with excellent body-tracking performance, but these trackers do not generally account for realistic human biomechanics. This investigation first adapts two of these general trackers (MoveNet and CoTracker) for realistic biomechanical analysis and then evaluates them against manual tracking (with key points manually marked using the software Kinovea).
Our best resulting markerless body tracker, specifically adapted for sprint biomechanics, is termed VideoRun2D. The experimental development and assessment of VideoRun2D are reported on forty sprints recorded with a video camera from 5 different subjects, focusing our analysis on 3 key angles in sprint biomechanics: trunk inclination and hip and knee flexion-extension. The CoTracker method showed large differences compared to the manual labeling approach. However, the MoveNet method estimated the angle curves correctly, with errors between 3.2° and 5.5°.
In conclusion, our proposed VideoRun2D, based on a MoveNet core, seems to be a helpful tool for evaluating sprint kinematics in some scenarios. On the other hand, the observed precision of this first version of VideoRun2D as a markerless sprint analysis system may not yet be sufficient for highly demanding applications. Future research directions toward that goal are also discussed at the end: better tracking post-processing and user- and time-dependent adaptation.
Submitted 16 September, 2024;
originally announced September 2024.
-
DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation with Application to e-Learning
Authors:
Roberto Daza,
Luis F. Gomez,
Julian Fierrez,
Aythami Morales,
Ruben Tolosana,
Javier Ortega-Garcia
Abstract:
This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos. Our method is particularly useful in e-learning applications, among others, so we trained, evaluated, and compared our approach on the mEBAL2 database, a public multi-modal database acquired in an e-learning environment. mEBAL2 comprises data from 60 users who performed 8 different tasks. These tasks varied in difficulty, leading to changes in the users' cognitive load. Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load in the form of high or low attention. Several behavioral signals and physiological processes related to cognitive load are used, such as eyeblink, heart rate, facial action units, and head pose, among others. Furthermore, we conduct a study to understand which individual features obtain better results, which combinations are most efficient, how local and global features compare, and how the length of the time window affects attention level estimation, among other aspects. We find that global facial features are more appropriate for multimodal systems using score-level fusion, particularly as the temporal window increases. On the other hand, local features are more suitable for fusion through neural network training with score-level fusion approaches. Our method outperforms existing state-of-the-art accuracies on the public mEBAL2 benchmark.
Submitted 14 August, 2024; v1 submitted 10 August, 2024;
originally announced August 2024.
-
Attack-Aware Noise Calibration for Differential Privacy
Authors:
Bogdan Kulynych,
Juan Felipe Gomez,
Georgios Kaissis,
Flavio du Pin Calmon,
Carmela Troncoso
Abstract:
Differential privacy (DP) is a widely used approach for mitigating privacy risks when training machine learning models on sensitive data. DP mechanisms add noise during training to limit the risk of information leakage. The scale of the added noise is critical, as it determines the trade-off between privacy and utility. The standard practice is to select the noise scale to satisfy a given privacy budget $\varepsilon$. This privacy budget is in turn interpreted in terms of operational attack risks, such as accuracy, sensitivity, and specificity of inference attacks aimed at recovering information about the training data records. We show that first calibrating the noise scale to a privacy budget $\varepsilon$, and then translating $\varepsilon$ to attack risk leads to overly conservative risk assessments and unnecessarily low utility. Instead, we propose methods to directly calibrate the noise scale to a desired attack risk level, bypassing the step of choosing $\varepsilon$. For a given notion of attack risk, our approach significantly decreases noise scale, leading to increased utility at the same level of privacy. We empirically demonstrate that calibrating noise to attack sensitivity/specificity, rather than $\varepsilon$, when training privacy-preserving ML models substantially improves model accuracy for the same risk level. Our work provides a principled and practical way to improve the utility of privacy-preserving ML without compromising on privacy. The code is available at https://github.com/Felipe-Gomez/riskcal
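A minimal sketch of direct calibration for the simplest case only: a single release of a Gaussian mechanism with $\ell_2$ sensitivity $\Delta$. This is an assumption for illustration; DP-SGD training additionally requires a subsampling and composition accountant, which is not implemented here. Such a mechanism is exactly $\mu$-GDP with $\mu = \Delta/\sigma$, so the best attack at false-positive rate $\alpha$ reaches true-positive rate $\Phi(\Phi^{-1}(\alpha) + \Delta/\sigma)$, and solving for $\sigma$ gives the smallest noise meeting a target TPR with no intermediate $\varepsilon$. The function name is hypothetical and is not the API of the linked `riskcal` package.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def calibrate_gaussian_sigma(target_tpr, fpr, sensitivity=1.0):
    """Smallest Gaussian noise scale such that every attack at false-positive
    rate `fpr` has true-positive rate at most `target_tpr` (single release)."""
    if not 0 < fpr < target_tpr < 1:
        raise ValueError("require 0 < fpr < target_tpr < 1")
    # The Gaussian mechanism is exactly mu-GDP with mu = sensitivity / sigma, and the
    # best attack at FPR alpha reaches TPR = Phi(Phi^{-1}(alpha) + mu); solve for mu.
    mu = STD_NORMAL.inv_cdf(target_tpr) - STD_NORMAL.inv_cdf(fpr)
    return sensitivity / mu


sigma = calibrate_gaussian_sigma(target_tpr=0.10, fpr=0.01)
print(f"sigma = {sigma:.3f}")
```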
Submitted 7 November, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Authors:
Frank Palma Gomez,
Ramon Sanabria,
Yun-hsuan Sung,
Daniel Cer,
Siddharth Dalmia,
Gustavo Hernandez Abrego
Abstract:
Large language models (LLMs) are trained on text-only data covering far more languages than those with paired speech and text data. At the same time, Dual Encoder (DE) based retrieval systems project queries and documents into the same embedding space and have demonstrated their success in retrieval and bi-text mining. To match speech and text in many languages, we propose using LLMs to initialize multi-modal DE retrieval systems. Unlike traditional methods, our system doesn't require speech data during LLM pre-training and can exploit the LLM's multilingual text understanding capabilities to match speech and text in languages unseen during retrieval training. Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages. Our system outperforms previous systems trained explicitly on all 102 languages. We achieve a 10% absolute improvement in Recall@1 averaged across these languages. Additionally, our model demonstrates cross-lingual speech and text matching, which is further enhanced by readily available machine translation data.
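In a dual-encoder system, queries and candidates are embedded independently and scored by a similarity such as the inner product; Recall@1 is the fraction of queries whose top-scoring candidate is the correct pair. A minimal sketch of that metric, assuming query $i$ is paired with candidate $i$; the toy vectors are hypothetical stand-ins for speech and text embeddings, not outputs of the paper's model.

```python
def dot(u, v):
    """Inner-product similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def recall_at_1(query_embeddings, candidate_embeddings):
    """Fraction of queries whose top-scoring candidate is their paired one.

    Dual-encoder setting: queries (e.g. speech) and candidates (e.g. text) are
    embedded independently; query i is paired with candidate i.
    """
    hits = 0
    for i, q in enumerate(query_embeddings):
        scores = [dot(q, c) for c in candidate_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += best == i
    return hits / len(query_embeddings)


# Toy embeddings (hypothetical): the third query retrieves the wrong text.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[1.0, 0.0], [0.0, 2.0], [0.6, 0.6]]
print(recall_at_1(speech, text))
```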
Submitted 10 July, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Algorithmic Arbitrariness in Content Moderation
Authors:
Juan Felipe Gomez,
Caio Vieira Machado,
Lucas Monteiro Paes,
Flavio P. Calmon
Abstract:
Machine learning (ML) is widely used to moderate online content. Despite its scalability relative to human moderation, the use of ML introduces unique challenges to content moderation. One such challenge is predictive multiplicity: multiple competing models for content classification may perform equally well on average, yet assign conflicting predictions to the same content. This multiplicity can result from seemingly innocuous choices during model development, such as random seed selection for parameter initialization. We experimentally demonstrate how content moderation tools can arbitrarily classify samples as toxic, leading to arbitrary restrictions on speech. We discuss these findings in terms of human rights set out by the International Covenant on Civil and Political Rights (ICCPR), namely freedom of expression, non-discrimination, and procedural justice. We analyze (i) the extent of predictive multiplicity among state-of-the-art LLMs used for detecting toxic content; (ii) the disparate impact of this arbitrariness across social groups; and (iii) how model multiplicity compares to unambiguous human classifications. Our findings indicate that scaling up algorithmic moderation risks legitimizing an algorithmic leviathan, where an algorithm disproportionately manages human rights. To mitigate such risks, our study underscores the need to identify and increase the transparency of arbitrariness in content moderation applications. Since algorithmic content moderation is being fueled by pressing social concerns, such as disinformation and hate speech, our discussion on harms raises concerns relevant to policy debates. Our findings also contribute to debates on content moderation and intermediary liability laws being discussed and passed in many countries, such as the Digital Services Act in the European Union, the Online Safety Act in the United Kingdom, and the Fake News Bill in Brazil.
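Predictive multiplicity can be quantified directly from the labels that competing models assign to the same samples. A minimal sketch of two commonly used measures: ambiguity (share of samples on which the models do not all agree) and discrepancy (largest share of samples on which one model disagrees with a reference model). The paper's exact metrics may differ, and the seed-varied predictions below are hypothetical.

```python
def ambiguity(predictions):
    """Share of samples on which the competing models do not all agree.

    predictions[m][i] is model m's label (e.g. 1 = toxic) for sample i.
    """
    n = len(predictions[0])
    return sum(len({p[i] for p in predictions}) > 1 for i in range(n)) / n


def discrepancy(predictions, reference=0):
    """Largest share of samples on which any single model disagrees with the reference model."""
    ref = predictions[reference]
    return max(sum(a != b for a, b in zip(p, ref)) / len(ref) for p in predictions)


# Hypothetical toxicity labels from three fine-tunes that differ only in random seed.
seeds = [
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
print(ambiguity(seeds), discrepancy(seeds))  # 0.5 0.25
```

Here every pair of models has 75% agreement, yet half of the samples would be flagged as toxic or not depending solely on the seed.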
Submitted 26 February, 2024;
originally announced February 2024.
-
Predicting Tweet Posting Behavior on Citizen Security: A Hawkes Point Process Analysis
Authors:
Cristian Pulido,
Francisco Gómez
Abstract:
The Perception of Security (PoS) refers to people's opinions about security or insecurity in a place or situation. While surveys have traditionally been the primary means to capture such perceptions, they are limited in their ability to offer real-time monitoring or predictive insights into future security perceptions. Recent evidence suggests that social network content can provide complementary insights for quantifying these perceptions. However, accurately predicting these perceptions, with the capacity to anticipate them, remains underexplored. This article introduces an approach for predicting PoS within short time frames using social network data. Our model incorporates external factors that influence the publication and reposting of content related to security perceptions. Our results demonstrate that the proposed model achieves competitive predictive performance while maintaining a high degree of interpretability regarding the factors influencing security perceptions. This research contributes to understanding how temporal patterns and external factors affect the anticipation of security perceptions, providing insights for proactive security planning.
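The title frames the model as a Hawkes point process, in which each post temporarily raises the rate of further posts. A minimal sketch of such a conditional intensity follows, assuming an exponential excitation kernel and a hypothetical linear term for a single external covariate; the paper's actual kernel and covariates are not specified here.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, decay, exog=0.0, beta=0.0):
    """Conditional intensity of a self-exciting (Hawkes) process at time t.

    mu:     baseline posting rate
    alpha:  jump in intensity caused by each past post (excitation)
    decay:  how quickly that excitation fades
    exog:   value of an external covariate at time t (hypothetical, e.g. a news indicator)
    beta:   weight of that covariate (hypothetical linear effect)
    """
    excitation = sum(
        alpha * math.exp(-decay * (t - ti)) for ti in event_times if ti < t
    )
    # Clamp at zero so a negative covariate effect cannot yield a negative rate.
    return max(mu + beta * exog + excitation, 0.0)


posts = [0.0, 0.5, 0.6]  # hypothetical posting times (hours) of security-related tweets
print(hawkes_intensity(1.0, posts, mu=0.2, alpha=0.8, decay=2.0))
print(hawkes_intensity(1.0, posts, mu=0.2, alpha=0.8, decay=2.0, exog=1.0, beta=0.3))
```

Recent posts dominate the excitation sum, which is what lets such a model capture bursts of reposting after an event.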
Submitted 4 July, 2025; v1 submitted 3 February, 2024;
originally announced February 2024.
-
PAD-Phys: Exploiting Physiology for Presentation Attack Detection in Face Biometrics
Authors:
Luis F. Gomez,
Julian Fierrez,
Aythami Morales,
Mahdi Ghafourian,
Ruben Tolosana,
Imanol Solano,
Alejandro Garcia,
Francisco Zamora-Martinez
Abstract:
Presentation attack detection (PAD) is a crucial stage in facial recognition systems to prevent leakage of personal information and identity spoofing. Recently, pulse detection based on remote photoplethysmography (rPPG) has been shown to be effective in face presentation attack detection.
This work presents three different approaches to presentation attack detection based on rPPG: (i) the physiological domain, which uses rPPG-based models; (ii) the Deepfakes domain, where models from the physiological domain were retrained for specific Deepfakes detection tasks; and (iii) a new presentation attack domain, trained by applying transfer learning from the two previous domains to improve the ability to differentiate between bona fide presentations and attacks.
The results show the effectiveness of rPPG-based models for presentation attack detection, with a 21.70 percentage-point decrease in average classification error rate (ACER) (from 41.03% to 19.32%) when the presentation attack domain is compared with the physiological and Deepfakes domains. Our experiments highlight the effectiveness of transfer learning in rPPG-based models, which perform well at detecting presentation attacks made with instruments that cannot reproduce this physiological feature.
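ACER is commonly computed as the mean of APCER (share of attacks wrongly accepted as bona fide) and BPCER (share of bona fide presentations wrongly rejected). The sketch below shows that computation on made-up labels. It also contrasts the absolute (percentage-point) change between the two reported ACERs with the relative reduction, which is roughly 53%.

```python
def acer(labels, predictions):
    """Average Classification Error Rate, computed as (APCER + BPCER) / 2.

    labels, predictions: sequences with 1 = presentation attack, 0 = bona fide.
    """
    attacks = [p for y, p in zip(labels, predictions) if y == 1]
    bona_fide = [p for y, p in zip(labels, predictions) if y == 0]
    apcer = sum(1 for p in attacks if p == 0) / len(attacks)      # attacks accepted
    bpcer = sum(1 for p in bona_fide if p == 1) / len(bona_fide)  # genuine rejected
    return (apcer + bpcer) / 2


labels = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical ground truth
preds = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical detector decisions
print(acer(labels, preds))  # 0.25

# Absolute vs relative change between the ACERs reported in the abstract (in %).
before, after = 41.03, 19.32
print(round(before - after, 1))                   # 21.7 percentage points
print(round(100 * (before - after) / before, 1))  # 52.9 (% relative reduction)
```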
Submitted 3 October, 2023;
originally announced October 2023.
-
A polynomial quantum computing algorithm for solving the dualization problem
Authors:
Mauro Mezzini,
Fernando Cuartero Gomez,
Fernando Pelayo,
Jose Javier Paulet Gonzales,
Hernan Indibil de la Cruz Calvo,
Vicente Pascual
Abstract:
Given two prime monotone Boolean functions $f:\{0,1\}^n \to \{0,1\}$ and $g:\{0,1\}^n \to \{0,1\}$, the dualization problem consists of determining whether $g$ is the dual of $f$, that is, whether $f(x_1, \dots, x_n) = \overline{g}(\overline{x_1}, \dots, \overline{x_n})$ for all $(x_1, \dots, x_n) \in \{0,1\}^n$. Associated with the dualization problem is the corresponding decision problem: given two monotone prime Boolean functions $f$ and $g$, is $g$ the dual of $f$? In this paper, we present a quantum computing algorithm that solves the decision version of the dualization problem in polynomial time.
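The duality condition can be checked classically by brute force over all $2^n$ inputs. The sketch below does exactly that, which makes the definition concrete but takes exponential time. It is a reference check for small $n$, not the paper's quantum algorithm.

```python
from itertools import product


def is_dual(f, g, n):
    """True iff f(x) == NOT g(NOT x) for every x in {0,1}^n (exhaustive, O(2^n))."""
    for x in product((0, 1), repeat=n):
        x_bar = tuple(1 - xi for xi in x)
        if f(x) != 1 - g(x_bar):
            return False
    return True


def or_fn(x):
    return int(any(x))


def and_fn(x):
    return int(all(x))


def majority(x):
    return int(sum(x) >= 2)  # majority of three variables


print(is_dual(or_fn, and_fn, 3))      # True: OR and AND are mutually dual
print(is_dual(majority, majority, 3)) # True: majority is self-dual
print(is_dual(or_fn, or_fn, 3))       # False
```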
Submitted 28 August, 2023;
originally announced August 2023.
-
Bayesian Flow Networks
Authors:
Alex Graves,
Rupesh Kumar Srivastava,
Timothy Atkinson,
Faustino Gomez
Abstract:
This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however, it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments, BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
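For continuous data, the Bayesian step described above can be illustrated with the standard conjugate Gaussian update. A belief $\mathcal{N}(\mu, 1/\rho)$ about a value $x$ absorbs a noisy sample $y \sim \mathcal{N}(x, 1/\alpha)$ as $\rho' = \rho + \alpha$ and $\mu' = (\rho\mu + \alpha y)/\rho'$. This is a sketch assuming that Gaussian case; the accuracy schedule below is hypothetical, and the neural network that BFNs apply to these parameters is omitted.

```python
import random


def gaussian_bayesian_update(mu, rho, y, alpha):
    """Posterior of a Gaussian belief N(mu, 1/rho) after observing
    y ~ N(x, 1/alpha), a noisy sample of the unknown value x."""
    rho_post = rho + alpha                        # precisions add
    mu_post = (rho * mu + alpha * y) / rho_post   # precision-weighted mean
    return mu_post, rho_post


random.seed(0)
x_true = 0.7
mu, rho = 0.0, 1.0                         # simple standard-normal prior
for alpha in [0.5, 1.0, 2.0, 4.0]:         # hypothetical accuracy schedule
    y = random.gauss(x_true, alpha ** -0.5)  # noisy sample of x_true
    mu, rho = gaussian_bayesian_update(mu, rho, y, alpha)
print(mu, rho)  # rho is 8.5; mu has moved from the prior toward the data
```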
Submitted 11 March, 2025; v1 submitted 14 August, 2023;
originally announced August 2023.
-
FAIR EVA: Bringing institutional multidisciplinary repositories into the FAIR picture
Authors:
Fernando Aguilar Gómez,
Isabel Bernal
Abstract:
The FAIR Principles are a set of good practices to improve the reproducibility and quality of data in an Open Science context. Different sets of indicators have been proposed to evaluate the FAIRness of digital objects, including datasets, which are usually stored in repositories or data portals. However, indicators such as those proposed by the Research Data Alliance are defined from a high-level perspective that leaves room for interpretation, and they are not always realistic for particular environments such as multidisciplinary repositories. This paper describes FAIR EVA, a new tool developed within the European Open Science Cloud context that is oriented to particular data management systems such as open repositories and can be customized to a specific case in a scalable and automatic environment. It aims to be adaptive enough to work across different environments, repository software, and disciplines, taking into account the flexibility of the FAIR Principles. As an example, we present the DIGITAL.CSIC repository as the first target of the tool, covering the particular needs of a multidisciplinary institution and its institutional repository.
Submitted 27 June, 2023;
originally announced June 2023.
-
Counterfactual Explanations and Predictive Models to Enhance Clinical Decision-Making in Schizophrenia using Digital Phenotyping
Authors:
Juan Sebastian Canas,
Francisco Gomez,
Omar Costilla-Reyes
Abstract:
Clinical practice in psychiatry is burdened by the increased demand for healthcare services and the scarcity of available resources. New paradigms of health data powered by machine learning techniques could improve clinical workflows in critical stages of clinical assessment and treatment in psychiatry. In this work, we propose a machine learning system capable of predicting, detecting, and explaining individual changes in the symptoms of patients with schizophrenia using behavioral digital phenotyping data. We forecast patients' symptoms with an error rate below 10%. The system detects decreases in symptoms using change-point algorithms and uses counterfactual explanations as recourse in a simulated continuous-monitoring scenario in healthcare. Overall, this study offers insights into the performance and potential of counterfactual explanations, predictive models, and change-point detection within a simulated clinical workflow. These findings lay the foundation for further research to explore additional facets of the workflow, aiming to enhance its effectiveness and applicability in real-world healthcare settings. By leveraging these components, we aim to develop an actionable, interpretable, and trustworthy integrative decision support system that combines real-time clinical assessments with sensor-based inputs.
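Detecting a decrease in symptoms is a change-point problem. A one-sided CUSUM detector is one simple, well-known option. It accumulates how far each observation falls below a baseline, ignoring deviations smaller than a slack `k`, and raises an alarm once the running sum exceeds `h`. The sketch below uses it as a stand-in; it is not necessarily the algorithm used in the paper, and the scores are hypothetical.

```python
def cusum_decrease(series, target, k, h):
    """One-sided CUSUM for a downward shift in the mean.

    target: expected score before any change (e.g., a baseline mean)
    k:      slack; deviations smaller than k do not accumulate
    h:      threshold; returns the first index where the statistic exceeds h
    """
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (target - x) - k)  # accumulate only evidence of a decrease
        if s > h:
            return t
    return None  # no change detected


# Hypothetical daily symptom scores that drop from index 5 onward.
scores = [10, 11, 9, 10, 10, 7, 6, 7, 6, 6]
print(cusum_decrease(scores, target=10, k=0.5, h=4))  # 6
```

Raising `h` delays the alarm but reduces false detections on noisy days; `k` is usually set to about half the size of the shift one wants to detect.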
Submitted 6 June, 2023;
originally announced June 2023.
-
CACTUS: A Computational Framework for Generating Realistic White Matter Microstructure Substrates
Authors:
Juan Luis Villarreal-Haro,
Remy Gardier,
Erick J Canales-Rodriguez,
Elda Fischi Gomez,
Gabriel Girard,
Jean-Philippe Thiran,
Jonathan Rafael-Patino
Abstract:
Monte-Carlo diffusion simulations are a powerful tool for validating tissue microstructure models by generating synthetic diffusion-weighted magnetic resonance images (DW-MRI) in controlled environments. This is fundamental for understanding the link between micrometre-scale tissue properties and DW-MRI signals measured at the millimetre scale, optimising acquisition protocols to target microstructure properties of interest, and exploring the robustness and accuracy of estimation methods. However, accurate simulations require substrates that reflect the main microstructural features of the studied tissue. To address this challenge, we introduce a novel computational workflow, CACTUS (Computational Axonal Configurator for Tailored and Ultradense Substrates), for generating synthetic white matter substrates. Our approach allows constructing substrates with higher packing density than existing methods, up to 95% intra-axonal volume fraction, and larger voxel sizes of up to (500 μm)³ with rich fibre complexity. CACTUS generates bundles with angular dispersion, bundle crossings, and variations along the fibres of their inner and outer radii and g-ratio. We achieve this by introducing a novel global cost function and a fibre radial growth approach that allows substrates to match predefined target characteristics and mirror those reported in histological studies. CACTUS improves the development of complex synthetic substrates, paving the way for future applications in microstructure imaging.
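Two quantities named above have simple definitions in an idealized setting. The intra-axonal volume fraction is the share of voxel volume occupied by axon interiors. The g-ratio is the inner (axon) radius divided by the outer (myelinated fibre) radius. The sketch below assumes straight, parallel, non-overlapping cylinders spanning a cubic voxel, so it ignores the dispersion, crossings, and radius variation that CACTUS actually models.

```python
import math


def intra_axonal_volume_fraction(inner_radii_um, voxel_side_um):
    """Volume fraction of straight, parallel, non-overlapping axons spanning a cubic voxel.

    Each axon is a cylinder of length voxel_side_um, so the fraction reduces to
    sum(pi * r^2) / side^2.
    """
    axon_cross_section = sum(math.pi * r ** 2 for r in inner_radii_um)
    return axon_cross_section / voxel_side_um ** 2


def g_ratio(inner_radius_um, outer_radius_um):
    """Axon (inner) radius divided by fibre (outer, myelinated) radius."""
    return inner_radius_um / outer_radius_um


# Four 1-um-radius axons in a 4-um voxel (a 2 x 2 square packing).
print(round(intra_axonal_volume_fraction([1.0] * 4, 4.0), 3))  # 0.785
print(g_ratio(0.7, 1.0))                                       # 0.7
```

Square packing of equal circles caps the fraction at π/4, which is one reason reaching 95% requires irregular, mixed-radius packings.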
Submitted 25 May, 2023;
originally announced May 2023.
-
Toward Face Biometric De-identification using Adversarial Examples
Authors:
Mahdi Ghafourian,
Julian Fierrez,
Luis Felipe Gomez,
Ruben Vera-Rodriguez,
Aythami Morales,
Zohra Rezgui,
Raymond Veldhuis
Abstract:
The remarkable success of face recognition (FR) has endangered the privacy of internet users, particularly on social media. Recently, researchers have turned to adversarial examples as a countermeasure. In this paper, we assess the effectiveness of two widely known adversarial methods (BIM and ILLC) for de-identifying personal images. Contrary to previous claims in the literature, we find that it is not easy to achieve a high protection success rate (i.e., to suppress the identification rate) with adversarial perturbations that are imperceptible to the human visual system. Finally, we find that the transferability of adversarial examples is strongly affected by the training parameters of the network with which they are generated.
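BIM (Basic Iterative Method) repeatedly steps an image in the sign of the loss gradient for its true label, clipping after each step to an L∞ ball of radius `eps` and to the valid pixel range. ILLC is the same loop, but it steps *against* the gradient of the loss for the least-likely class. The sketch below runs BIM against a toy linear softmax classifier standing in for a face recognition network. The model, sizes, and budgets are all hypothetical.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def cross_entropy(x, label, W, b):
    return -np.log(softmax(W @ x + b)[label])


def bim(x, label, W, b, eps, step, n_iter):
    """Basic Iterative Method against a linear softmax classifier."""
    x_adv = x.copy()
    onehot = np.eye(W.shape[0])[label]
    for _ in range(n_iter):
        p = softmax(W @ x_adv + b)
        grad = W.T @ (p - onehot)                 # d(cross-entropy)/dx for a linear model
        x_adv = x_adv + step * np.sign(grad)      # ascend the loss of the true identity
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the L-inf budget
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 64))   # 5 hypothetical identities, 64-pixel "images"
b = np.zeros(5)
x = rng.uniform(size=64)
label = int(np.argmax(W @ x + b))  # identity the model currently predicts
x_adv = bim(x, label, W, b, eps=0.03, step=0.005, n_iter=10)
print(cross_entropy(x, label, W, b), cross_entropy(x_adv, label, W, b))
print(int(np.argmax(W @ x_adv + b)) == label)  # may stay True if eps is too small
```

The last line reflects the trade-off reported in the abstract: with a small, imperceptible `eps`, the identity prediction may not change at all.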
Submitted 7 February, 2023;
originally announced February 2023.
-
MATT: Multimodal Attention Level Estimation for e-learning Platforms
Authors:
Roberto Daza,
Luis F. Gomez,
Aythami Morales,
Julian Fierrez,
Ruben Tolosana,
Ruth Cobos,
Javier Ortega-Garcia
Abstract:
This work presents a new multimodal system for remote attention level estimation based on multimodal face analysis. Our multimodal approach uses different parameters and signals obtained from behavioral and physiological processes that have been related to modeling cognitive load, such as facial gestures (e.g., blink rate, facial action units) and user actions (e.g., head pose, distance to the camera). The multimodal system uses the following modules based on Convolutional Neural Networks (CNNs): eye blink detection, head pose estimation, facial landmark detection, and facial expression features. First, we individually evaluate the proposed modules in the task of estimating the student's attention level captured during online e-learning sessions. To do so, we trained binary classifiers (high or low attention) based on Support Vector Machines (SVMs) for each module. Second, we assess the extent to which multimodal score-level fusion improves attention level estimation. The experimental framework uses the mEBAL database, a public multimodal database for attention level estimation obtained in an e-learning environment, which contains data from 38 users conducting several e-learning tasks of variable difficulty (creating changes in student cognitive load).
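Score-level fusion combines the per-module classifier outputs into a single decision. A minimal sketch follows, assuming min-max normalization per module followed by a (weighted) mean and a 0.5 threshold for "high attention". The paper's fusion rule, weights, and threshold may differ, and the scores are hypothetical.

```python
def min_max_normalize(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)  # uninformative module: neutral score
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(module_scores, weights=None):
    """Mean-rule score-level fusion.

    module_scores: dict mapping module name -> list of per-session SVM scores
    weights:       optional dict of per-module weights (equal by default)
    Returns one fused score per session in [0, 1].
    """
    names = list(module_scores)
    weights = weights or {name: 1.0 for name in names}
    normalized = {name: min_max_normalize(module_scores[name]) for name in names}
    total_weight = sum(weights[name] for name in names)
    n_sessions = len(normalized[names[0]])
    return [
        sum(weights[name] * normalized[name][i] for name in names) / total_weight
        for i in range(n_sessions)
    ]


# Hypothetical per-session SVM scores from two modules.
scores = {
    "eye_blink": [-1.0, 0.5, 2.0],
    "head_pose": [0.2, 0.1, 0.9],
}
fused = fuse_scores(scores)
labels = ["high" if s > 0.5 else "low" for s in fused]
print(fused, labels)
```

Normalizing first matters because raw SVM margins from different modules live on different scales.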
Submitted 22 January, 2023;
originally announced January 2023.
-
edBB-Demo: Biometrics and Behavior Analysis for Online Educational Platforms
Authors:
Roberto Daza,
Aythami Morales,
Ruben Tolosana,
Luis F. Gomez,
Julian Fierrez,
Javier Ortega-Garcia
Abstract:
We present edBB-Demo, a demonstrator of an AI-powered research platform for student monitoring in remote education. The edBB platform aims to study the challenges associated with user recognition and behavior understanding in digital platforms. This platform has been developed for data collection, acquiring signals from a variety of sensors including keyboard, mouse, webcam, microphone, smartwatch, and an electroencephalography (EEG) band. The information captured from the sensors during the student sessions is modelled in a multimodal learning framework. The demonstrator includes: i) biometric user authentication in an unsupervised environment; ii) human action recognition based on remote video analysis; iii) heart rate estimation from webcam video; and iv) attention level estimation from facial expression analysis.
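Heart rate estimation from webcam video typically ends with a spectral step. A per-frame color signal of the face region (the rPPG trace) is transformed to the frequency domain, and the strongest peak within a plausible pulse band is taken as the heart rate. The sketch below shows only that step, on a synthetic trace. The 0.7 to 4 Hz band (42 to 240 bpm) is an assumption, and face detection and signal extraction are omitted.

```python
import numpy as np


def estimate_heart_rate(signal, fps, low_hz=0.7, high_hz=4.0):
    """Heart rate (bpm) from a 1-D rPPG trace, e.g. the mean green value of a face region."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                  # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)     # plausible pulse frequencies
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz


# Synthetic 20 s trace at 30 fps: a faint 1.2 Hz pulse (72 bpm) plus sensor noise.
fps, seconds = 30, 20
t = np.arange(fps * seconds) / fps
rng = np.random.default_rng(0)
trace = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t) + 0.003 * rng.normal(size=t.size)
print(estimate_heart_rate(trace, fps))  # close to 72 bpm
```

The frequency resolution is fps / n_samples (0.05 Hz, or 3 bpm, here), so longer windows give finer estimates.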
Submitted 5 December, 2022; v1 submitted 16 November, 2022;
originally announced November 2022.
-
The Saddle-Point Accountant for Differential Privacy
Authors:
Wael Alghamdi,
Shahab Asoodeh,
Flavio P. Calmon,
Juan Felipe Gomez,
Oliver Kosut,
Lalitha Sankar,
Fei Wei
Abstract:
We introduce a new differential privacy (DP) accountant called the saddle-point accountant (SPA). SPA approximates privacy guarantees for the composition of DP mechanisms accurately and quickly. Our approach is inspired by the saddle-point method, a ubiquitous numerical technique in statistics. We prove rigorous performance guarantees by deriving upper and lower bounds for the approximation error offered by SPA. The crux of SPA is a combination of large-deviation methods with central limit theorems, which we derive by exponentially tilting the privacy loss random variables corresponding to the DP mechanisms. One key advantage of SPA is that it runs in constant time for the $n$-fold composition of a privacy mechanism. Numerical experiments demonstrate that SPA achieves accuracy comparable to state-of-the-art accounting methods with a faster runtime.
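An accountant such as SPA approximates the privacy curve δ(ε) of a composed mechanism when no closed form is available. For the Gaussian mechanism, a closed form does exist via Gaussian differential privacy. The n-fold composition with sensitivity Δ and noise σ is μ-GDP with μ = √n·Δ/σ, and δ(ε) = Φ(−ε/μ + μ/2) − e^ε·Φ(−ε/μ − μ/2). The sketch below computes this exact reference, which is useful for sanity-checking any accountant. It is not the saddle-point method itself, although, like SPA, its cost does not grow with n.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_delta(eps, sigma, n, sensitivity=1.0):
    """Exact delta(eps) for the n-fold composition of a Gaussian mechanism (via mu-GDP)."""
    mu = math.sqrt(n) * sensitivity / sigma
    return (std_normal_cdf(-eps / mu + mu / 2)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2))


print(gaussian_delta(1.0, sigma=1.0, n=1))
print(gaussian_delta(1.0, sigma=2.0, n=4))  # same mu = 1, hence the same delta
print(gaussian_delta(1.0, sigma=1.0, n=100))  # more compositions, larger delta
```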
Submitted 19 August, 2022;
originally announced August 2022.
-
A note on averaging prediction accuracy, Green's functions and other kernels
Authors:
J. Galvis,
Freddy Hernández-Romero,
Francisco Gómez
Abstract:
We present the mathematical context of the predictive accuracy index and then introduce the definition of the integral average transform. We establish the relation between our definition and two-variable kernels $K({\bf y},{\bf x})$. As an example application, we show that integrating against the fundamental solution of the Laplace operator, that is, solving the Poisson equation, can be reinterpreted as an integral of averages of the forcing term over balls. As a result, we obtain a novel integral representation of the solution of the Poisson equation. Our motivation comes from the need for a better mathematical understanding of the predictive accuracy index, which is used to identify hot spots in predictive security and other applications.
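In the hot-spot literature, the predictive accuracy index is commonly defined as PAI = (n/N)/(a/A). Here n of N events fall inside predicted hot spots covering area a of a total study area A. The sketch below computes that common definition with hypothetical numbers; the note's integral average transform generalizes this kind of ratio, and its exact formulation may differ.

```python
def predictive_accuracy_index(hits, total_events, hotspot_area, total_area):
    """PAI = (n / N) / (a / A): share of events captured by the predicted hot spots,
    divided by the share of the study area those hot spots cover."""
    hit_rate = hits / total_events
    area_share = hotspot_area / total_area
    return hit_rate / area_share


# 30 of 100 incidents fall in hot spots covering 5% of the city (hypothetical).
print(predictive_accuracy_index(30, 100, 5.0, 100.0))  # roughly 6
```

A PAI of 1 means the hot spots capture events no better than their share of the area would by chance.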
Submitted 6 December, 2021; v1 submitted 15 November, 2021;
originally announced November 2021.
-
FaceQvec: Vector Quality Assessment for Face Biometrics based on ISO Compliance
Authors:
Javier Hernandez-Ortega,
Julian Fierrez,
Luis F. Gomez,
Aythami Morales,
Jose Luis Gonzalez-de-Suso,
Francisco Zamora-Martinez
Abstract:
In this paper, we develop FaceQvec, a software component for estimating the conformity of facial images with each of the points covered by ISO/IEC 19794-5, a quality standard that defines general quality guidelines for face images that would make them acceptable or unacceptable for use in official documents such as passports or ID cards. This type of quality assessment tool can help to improve the accuracy of face recognition, as well as to identify which factors are affecting the quality of a given face image and to take actions to eliminate or reduce those factors, e.g., with postprocessing techniques or re-acquisition of the image. FaceQvec consists of the automation of 25 individual tests related to different points covered by the aforementioned standard, as well as other image characteristics that have been considered to be related to facial quality. We first include the results of the quality tests evaluated on a development dataset captured under realistic conditions. We used those results to adjust the decision threshold of each test. We then checked their accuracy again on an evaluation database that contains new face images not seen during development. The evaluation results demonstrate the accuracy of the individual tests for checking compliance with ISO/IEC 19794-5. FaceQvec is available online (https://github.com/uam-biometrics/FaceQvec).
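The calibration protocol described above (fit each test's decision threshold on development data, then measure accuracy on unseen evaluation images) can be sketched as follows. The sketch assumes an accuracy-maximizing threshold chosen among the observed development scores. The paper's actual selection criterion and score scales are not specified here, and all numbers are hypothetical.

```python
def accuracy(scores, labels, threshold):
    """Share of images whose pass/fail decision (score >= threshold) matches the label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(scores)


def tune_threshold(scores, labels):
    """Observed score that, used as the threshold, maximizes development accuracy."""
    return max(sorted(set(scores)), key=lambda th: accuracy(scores, labels, th))


# Hypothetical outputs of one compliance test (1 = compliant with the standard).
dev_scores = [0.1, 0.3, 0.35, 0.8, 0.7, 0.2]
dev_labels = [0, 0, 1, 1, 1, 0]
threshold = tune_threshold(dev_scores, dev_labels)

eval_scores = [0.5, 0.2, 0.9, 0.33]
eval_labels = [1, 0, 1, 1]
print(threshold, accuracy(eval_scores, eval_labels, threshold))  # 0.35 0.75
```

Keeping the evaluation images out of threshold selection is what makes the second accuracy figure an honest estimate.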
Submitted 3 November, 2021;
originally announced November 2021.
-
Automatic design of novel potential 3CL$^{\text{pro}}$ and PL$^{\text{pro}}$ inhibitors
Authors:
Timothy Atkinson,
Saeed Saremi,
Faustino Gomez,
Jonathan Masci
Abstract:
With the goal of designing novel inhibitors for SARS-CoV-1 and SARS-CoV-2, we propose a general molecule optimization framework, Molecular Neural Assay Search (MONAS), consisting of three components: a property predictor, which identifies molecules with specific desirable properties; an energy model, which approximates the statistical similarity of a given molecule to known training molecules; and a molecule search method. In this work, these components are instantiated with graph neural networks (GNNs), Deep Energy Estimator Networks (DEEN), and Monte Carlo tree search (MCTS), respectively. This implementation is used to identify 120K molecules (out of 40 million explored) that the GNN determined to be likely SARS-CoV-1 inhibitors and that, at the same time, are statistically close to the dataset used to train the GNN.
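Two pieces of such a search are easy to isolate. One is the UCT rule that MCTS commonly uses to choose which child to expand (average reward plus an exploration bonus). The other is a reward that combines the property predictor with the energy model. The reward below, which keeps the predicted activity only when the energy is below a threshold (lower energy meaning more typical of the training data), is a hypothetical choice for illustration, not necessarily how MONAS combines them.

```python
import math


def uct_score(value_sum, visits, parent_visits, c=1.4):
    """Upper Confidence bound for Trees: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try unvisited children first
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=1.4):
    """children: list of (value_sum, visits) pairs; returns the index to descend into."""
    return max(range(len(children)),
               key=lambda i: uct_score(*children[i], parent_visits, c))


def molecule_reward(p_inhibitor, energy, energy_threshold):
    """Hypothetical reward: predicted activity, kept only for in-distribution molecules."""
    return p_inhibitor if energy <= energy_threshold else 0.0


print(select_child([(3.0, 5), (4.0, 4), (0.0, 0)], parent_visits=9))  # 2 (unvisited)
print(select_child([(3.0, 5), (4.0, 4)], parent_visits=9))            # 1
print(molecule_reward(0.9, energy=3.0, energy_threshold=2.0))         # 0.0
```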
Submitted 29 January, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Look here! A parametric learning based approach to redirect visual attention
Authors:
Youssef Alami Mejjati,
Celso F. Gomez,
Kwang In Kim,
Eli Shechtman,
Zoya Bylinskii
Abstract:
Across photography, marketing, and website design, being able to direct the viewer's attention is a powerful tool. Motivated by professional workflows, we introduce an automatic method to make an image region more attention-capturing via subtle image edits that maintain realism and fidelity to the original. From an input image and a user-provided mask, our GazeShiftNet model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions separately. We present the results of quantitative and qualitative experiments that demonstrate improvements over the prior state of the art. In contrast to existing attention shifting algorithms, our global parametric approach better preserves image semantics and avoids typical generative artifacts. Our edits enable inference at interactive rates on any image size, and easily generalize to videos. Extensions of our model allow for multi-style edits and the ability to both increase and attenuate attention in an image region. Furthermore, users can customize the edited images by dialing the edits up or down via interpolations in parameter space. This paper presents a practical tool that can simplify future image editing pipelines.
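The editing scheme described above applies separate global parameter sets to the foreground and the background and blends them with the mask. Dialing edits up or down then amounts to interpolating each parameter between identity and its predicted value. The sketch below assumes just two hypothetical global operations, exposure gain and saturation, with hand-picked values standing in for GazeShiftNet's predictions; the model's actual parameter set may differ.

```python
import numpy as np

IDENTITY = {"exposure": 1.0, "saturation": 1.0}  # parameters that leave the image unchanged


def apply_params(img, exposure, saturation):
    """Global edits on an RGB image in [0, 1]: exposure gain, then saturation
    scaling around each pixel's gray value."""
    out = img * exposure
    gray = out.mean(axis=-1, keepdims=True)
    out = gray + saturation * (out - gray)
    return np.clip(out, 0.0, 1.0)


def edit(img, mask, fg_params, bg_params, strength=1.0):
    """Foreground (mask = 1) and background (mask = 0) get separate parameters;
    `strength` interpolates each parameter between identity (0) and the given value (1)."""
    def interpolate(params):
        return {k: IDENTITY[k] + strength * (params[k] - IDENTITY[k]) for k in IDENTITY}

    fg = apply_params(img, **interpolate(fg_params))
    bg = apply_params(img, **interpolate(bg_params))
    m = mask[..., None]
    return m * fg + (1.0 - m) * bg


rng = np.random.default_rng(0)
img = rng.uniform(0.2, 0.6, size=(32, 32, 3))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0
fg = {"exposure": 1.2, "saturation": 1.3}  # hypothetical "draw attention" edit
bg = {"exposure": 0.9, "saturation": 0.6}  # hypothetical "recede" edit
out = edit(img, mask, fg, bg, strength=1.0)
half = edit(img, mask, fg, bg, strength=0.5)  # the same edit, dialed down
```

Because the parameters are global, they are independent of image resolution, which is consistent with the abstract's claim of interactive rates on any image size.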
Submitted 12 August, 2020;
originally announced August 2020.
-
Wiener Filter for Short-Reach Fiber-Optic Links
Authors:
Daniel Plabst,
Francisco Javier García Gómez,
Thomas Wiegart,
Norbert Hanik
Abstract:
Analytic expressions are derived for the Wiener filter (WF), also known as the linear minimum mean square error (LMMSE) estimator, for an intensity-modulation/direct-detection (IM/DD) short-haul fiber-optic communication system. The link is purely dispersive and the nonlinear square-law detector (SLD) operates at the thermal noise limit. The achievable rates of geometrically shaped PAM constellations are substantially increased by taking the SLD into account as compared to a WF that ignores the SLD.
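The Wiener filter (LMMSE estimator) is $\hat{x} = \mu_x + (\mathbf{y} - \boldsymbol{\mu}_y)^\top C_{yy}^{-1} \mathbf{c}_{yx}$. The paper derives the required moments analytically for a link with a square-law detector. The sketch below instead estimates them from simulated samples, assuming a toy three-tap complex channel standing in for dispersion, hypothetical 4-PAM amplitudes, and additive Gaussian thermal noise after detection.

```python
import numpy as np


def lmmse_fit(x, Y):
    """Empirical Wiener/LMMSE weights from samples.

    x: (N,) transmitted symbols; Y: (N, L) windows of received samples.
    """
    x_mean, y_mean = x.mean(), Y.mean(axis=0)
    Xc, Yc = x - x_mean, Y - y_mean
    C_yy = Yc.T @ Yc / len(x)          # covariance of the observations
    c_yx = Yc.T @ Xc / len(x)          # cross-covariance with the symbols
    w = np.linalg.solve(C_yy, c_yx)
    return w, x_mean, y_mean


def lmmse_estimate(Y, w, x_mean, y_mean):
    return x_mean + (Y - y_mean) @ w


rng = np.random.default_rng(0)
levels = np.sqrt(np.arange(4.0))              # hypothetical 4-PAM amplitudes
x = rng.choice(levels, size=20_000)
h = np.array([0.2 + 0.1j, 1.0, 0.2 - 0.1j])   # toy complex channel (stand-in for dispersion)
field = np.convolve(x, h, mode="same")
y = np.abs(field) ** 2 + 0.05 * rng.normal(size=x.size)  # square-law detection + thermal noise

L = 5                                         # filter length (received samples per estimate)
Y = np.lib.stride_tricks.sliding_window_view(np.pad(y, L // 2), L)

w, x_mean, y_mean = lmmse_fit(x, Y)
x_hat = lmmse_estimate(Y, w, x_mean, y_mean)
print(np.mean((x_hat - x) ** 2), np.var(x))   # filter MSE vs. the best constant guess
```

Because the filter is linear in the detected intensities, it cannot undo the square law exactly. This is why accounting for the SLD when deriving the statistics matters.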
Submitted 5 August, 2020; v1 submitted 25 April, 2020;
originally announced April 2020.
-
Safe Interactive Model-Based Learning
Authors:
Marco Gallieri,
Seyed Sina Mirrazavi Salehian,
Nihat Engin Toklu,
Alessio Quaglino,
Jonathan Masci,
Jan Koutník,
Faustino Gomez
Abstract:
Control applications present hard operational constraints, and violating them can result in unsafe behavior. This paper introduces Safe Interactive Model-Based Learning (SiMBL), a framework to refine an existing controller and a system model while operating in the real environment. SiMBL is composed of the following trainable components: a Lyapunov function, which determines a safe set; a safe control policy; and a Bayesian RNN forward model. A min-max control framework, based on alternate minimisation and backpropagation through the forward model, is used for the offline computation of the controller and the safe set. Safety is formally verified a posteriori with a probabilistic method that uses the Noise Contrastive Priors (NCP) idea to build a Bayesian RNN forward model with an additive state uncertainty estimate that is large outside the training data distribution. Iterative refinement of the model and the safe set is achieved through a novel loss that conditions the uncertainty estimates of the new model to be close to those of the current one. The learned safe set and model can also be used for safe exploration, i.e., to collect data within the safe invariant set, for which a simple one-step MPC is proposed. The individual components are tested on the simulation of an inverted pendulum with limited torque and stability region, showing that iteratively adding more data can improve the model, the controller, and the size of the safe region.
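The safe-set idea can be illustrated on the same test system. A Lyapunov function V defines a candidate safe set {x : V(x) ≤ c}, which is credible only if V decreases along the dynamics inside it. SiMBL learns V, the policy, and a Bayesian RNN model. The sketch below swaps in a fixed quadratic V from the linearized closed loop, a hypothetical linear gain, and a known torque-limited pendulum model, and it checks the decrease condition on sampled states.

```python
import numpy as np

DT, G_OVER_L, U_MAX = 0.05, 9.81, 2.0  # hypothetical step, gravity/length, torque limit
K = np.array([20.0, 6.0])              # hypothetical stabilizing feedback gain


def step(x):
    """One Euler step of a torque-limited inverted pendulum (unit mass and length)."""
    theta, omega = x
    u = np.clip(-K @ x, -U_MAX, U_MAX)
    return np.array([theta + DT * omega,
                     omega + DT * (G_OVER_L * np.sin(theta) + u)])


def lyapunov_matrix(iterations=500):
    """P with A^T P A - P = -I for the linearized closed loop A (fixed-point iteration)."""
    A = np.array([[1.0, DT],
                  [DT * (G_OVER_L - K[0]), 1.0 - DT * K[1]]])
    P = np.eye(2)
    for _ in range(iterations):
        P = np.eye(2) + A.T @ P @ A
    return P


def decrease_fraction(P, level, n_samples=2000, seed=0):
    """Share of sampled states in {x : x^T P x <= level} whose V decreases after one step."""
    rng = np.random.default_rng(seed)

    def V(x):
        return x @ P @ x

    decreased = 0
    for _ in range(n_samples):
        d = rng.normal(size=2)
        d /= np.linalg.norm(d)
        x = np.sqrt(level * rng.uniform(0.01, 1.0) / V(d)) * d  # V(x) in [0.01, 1] * level
        decreased += int(V(step(x)) < V(x))
    return decreased / n_samples


P = lyapunov_matrix()
print(decrease_fraction(P, level=1e-4))  # small set: the linear analysis holds
print(decrease_fraction(P, level=50.0))  # large set: saturation and sin() may break it
```

A sampled check like this gives evidence, not proof; the abstract's a posteriori verification is probabilistic for a similar reason.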
Submitted 18 November, 2019; v1 submitted 15 November, 2019;
originally announced November 2019.
-
Fast and Provable ADMM for Learning with Generative Priors
Authors:
Fabian Latorre Gómez,
Armin Eftekhari,
Volkan Cevher
Abstract:
In this work, we propose a (linearized) Alternating Direction Method of Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such a constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising, and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures under mild assumptions. Unlike gradient descent (GD), our algorithm can efficiently handle non-smooth objectives and exploit efficient partial minimization procedures, making it faster in many practical scenarios.
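The structure of such a linearized ADMM can be sketched on the problem $\min_x \tfrac12\|x - y\|^2$ subject to $x = G(z)$. Each iteration takes one gradient step in $z$ on the augmented Lagrangian (the linearized part), minimizes exactly over $x$, and performs a dual ascent step. The sketch assumes a quadratic data term and a hypothetical one-layer tanh generator, and it makes no claim about matching the paper's step sizes or convergence conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(8, 2))  # hypothetical one-layer generator weights


def G(z):
    """Toy generator G(z) = tanh(W z); its range is the nonconvex constraint set."""
    return np.tanh(W @ z)


def jacobian(z):
    """dG/dz = diag(1 - tanh(W z)^2) W."""
    return (1.0 - np.tanh(W @ z) ** 2)[:, None] * W


def linearized_admm(y, rho=1.0, eta=0.1, n_iter=2000):
    """Sketch of linearized ADMM for min_x 0.5*||x - y||^2 s.t. x = G(z).

    Augmented Lagrangian: f(x) + lam^T (x - G(z)) + rho/2 * ||x - G(z)||^2.
    """
    z = np.zeros(W.shape[1])
    x = y.copy()
    lam = np.zeros_like(y)
    for _ in range(n_iter):
        # z-step: a single gradient step on the augmented Lagrangian
        z = z + eta * jacobian(z).T @ (lam + rho * (x - G(z)))
        # x-step: exact minimization, closed form because f is quadratic
        x = (y - lam + rho * G(z)) / (1.0 + rho)
        # dual ascent on the constraint residual
        lam = lam + rho * (x - G(z))
    return x, z


z_true = np.array([1.0, -0.5])
y = G(z_true)  # an observation that lies in the range of G
x_hat, z_hat = linearized_admm(y)
print(np.linalg.norm(G(z_hat) - y), np.linalg.norm(G(np.zeros(2)) - y))
```

Swapping the quadratic f for a non-smooth one only changes the x-step to a proximal operator. This is the flexibility the abstract contrasts with plain gradient descent.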
Submitted 7 July, 2019;
originally announced July 2019.
-
Model-Based Active Exploration
Authors:
Pranav Shyam,
Wojciech Jaśkowski,
Faustino Gomez
Abstract:
Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration, which is estimated using the disagreement between the futures predicted by the ensemble members. We show empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines. MAX scales to high-dimensional continuous environments where it builds task-agnostic models that can be used for any downstream task.
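The core signal described above is disagreement among an ensemble of forward models about what will happen next. The sketch below uses the variance of the members' predicted next states as a simple novelty proxy and picks the action the ensemble disagrees on most, one step ahead. MAX derives its actual utility from a Bayesian perspective and plans over multiple steps, so this is a simplified stand-in, and the toy models are hypothetical.

```python
import numpy as np


def disagreement(ensemble_predictions):
    """Novelty proxy: mean variance, across ensemble members, of predicted next states.

    ensemble_predictions: array-like of shape (n_models, state_dim).
    """
    preds = np.asarray(ensemble_predictions, dtype=float)
    return preds.var(axis=0).mean()


def pick_action(state, actions, models):
    """Greedy one-step choice of the action whose outcome the models disagree on most.

    models: list of callables f(state, action) -> predicted next state.
    """
    scores = [disagreement([m(state, a) for m in models]) for a in actions]
    return actions[int(np.argmax(scores))]


# Hypothetical ensemble: models agree on the dynamics' form but not on its gain.
models = [lambda s, a, k=k: s + a * (1.0 + 0.1 * k) for k in range(5)]
print(pick_action(np.zeros(2), [-1.0, 0.0, 0.5, 2.0], models))  # 2.0
```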
Submitted 13 June, 2019; v1 submitted 29 October, 2018;
originally announced October 2018.
-
Natural Language Processing for Music Knowledge Discovery
Authors:
Sergio Oramas,
Luis Espinosa-Anke,
Francisco Gómez,
Xavier Serra
Abstract:
Today, a massive amount of musical knowledge is stored in written form, with testimonies dating back several centuries. In this work, we present different Natural Language Processing (NLP) approaches to harness the potential of these text collections for automatic music knowledge discovery, covering different phases of a prototypical NLP pipeline, namely corpus compilation, text mining, information extraction, knowledge graph generation, and sentiment analysis. Each of these approaches is presented alongside different use cases (i.e., flamenco, Renaissance, and popular music) where large collections of documents are processed, and conclusions stemming from data-driven analyses are presented and discussed.
Submitted 5 July, 2018;
originally announced July 2018.
-
NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations
Authors:
Marco Ciccone,
Marco Gallieri,
Jonathan Masci,
Christian Osendorfer,
Faustino Gomez
Abstract:
This paper introduces the Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable, so that for every initial condition there is exactly one input-dependent equilibrium, assuming $\tanh$ units, and that it is incrementally stable for ReLU units. An efficient implementation that enforces stability under the derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.
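The role of the input skip connection can be seen in a scalar toy version of a residual-style block update, x_{k+1} = x_k + h tanh(a x_k + b u + c), where the input u re-enters at every unrolled stage. This is a sketch under our own assumptions: scalar state, hand-picked a < 0 and h·|a| ≤ 1 as the stability condition for this toy, and a hypothetical function name; the paper's conditions for full weight matrices are not reproduced.

```python
import math


def nais_block(u, a=-1.0, b=2.0, c=0.5, h=0.5, steps=200, x0=0.0):
    """Unroll a scalar toy block x_{k+1} = x_k + h * tanh(a * x_k + b * u + c).

    The input u re-enters at every stage (the skip connection that makes the
    system non-autonomous). With a < 0 and h * |a| <= 1, chosen by hand here,
    this scalar map moves monotonically toward its unique equilibrium
    x* = -(b * u + c) / a, whatever the initial state x0.
    """
    x = x0
    for _ in range(steps):
        x = x + h * math.tanh(a * x + b * u + c)
    return x


for u in (0.0, 1.0):
    print(u, round(nais_block(u), 6))  # equilibria 0.5 and 2.5
```

The equilibrium depends on the input but not on the initial state, which is the scalar analogue of the "exactly one input-dependent equilibrium" property stated above.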
Submitted 21 May, 2021; v1 submitted 19 April, 2018;
originally announced April 2018.
-
Melodic Contour and Mid-Level Global Features Applied to the Analysis of Flamenco Cantes
Authors:
Francisco Gómez,
Joaquín Mora,
Emilia Gómez,
José Miguel Díaz-Báñez
Abstract:
This work focuses on melodic characterization and similarity in a specific musical repertoire: a cappella flamenco singing, more specifically the debla and martinete styles. We propose combining manual and automatic description. First, we use a state-of-the-art automatic transcription method to account for general melodic similarity from music recordings. Second, we define a specific set of representative mid-level melodic features, which are manually labeled by flamenco experts. Both approaches are then contrasted and combined into a global similarity measure. This similarity measure is assessed by inspecting the clusters obtained through phylogenetic algorithms and by relating similarity to categorization in terms of style. Finally, we discuss the advantage of combining automatic and expert annotations as well as the need to include repertoire-specific descriptions for meaningful melodic characterization in traditional music collections.
Submitted 16 September, 2015;
originally announced September 2015.
-
Understanding Locally Competitive Networks
Authors:
Rupesh Kumar Srivastava,
Jonathan Masci,
Faustino Gomez,
Jürgen Schmidhuber
Abstract:
Recently proposed neural network activation functions such as rectified linear, maxout, and local winner-take-all have allowed for faster and more effective training of deep neural architectures on large and complex datasets. The common trait among these functions is that they implement local competition between small groups of computational units within a layer, so that only part of the network is activated for any given input pattern. In this paper, we attempt to visualize and understand this self-modularization, and suggest a unified explanation for the beneficial properties of such networks. We also show how our insights can be directly useful for efficiently performing retrieval over large datasets using neural networks.
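The shared mechanism, local competition within small groups of units, is easy to state in code. The sketch below implements local winner-take-all (keep the largest unit in each block, zero the rest) and maxout (emit only each block's maximum) over consecutive, non-overlapping blocks; the block layout and the function names are our assumptions.

```python
def lwta(activations, block_size):
    """Local winner-take-all: in each consecutive block, keep the largest
    activation and set the others to zero (ties go to the first unit)."""
    if len(activations) % block_size != 0:
        raise ValueError("number of units must be divisible by block_size")
    out = []
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        winner = max(range(block_size), key=lambda i: block[i])
        out.extend(v if i == winner else 0.0 for i, v in enumerate(block))
    return out


def maxout(activations, block_size):
    """Maxout: each block is replaced by its maximum (one output per block)."""
    return [max(activations[i:i + block_size])
            for i in range(0, len(activations), block_size)]


print(lwta([1.0, 3.0, 2.0, -1.0], block_size=2))    # [0.0, 3.0, 2.0, 0.0]
print(maxout([1.0, 3.0, 2.0, -1.0], block_size=2))  # [3.0, 2.0]
```

In both cases only part of the layer passes a signal for a given input, which is the self-modularization the abstract refers to.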
Submitted 8 April, 2015; v1 submitted 5 October, 2014;
originally announced October 2014.
-
Deep Networks with Internal Selective Attention through Feedback Connections
Authors:
Marijn Stollenga,
Jonathan Masci,
Faustino Gomez,
Juergen Schmidhuber
Abstract:
Traditional convolutional neural networks (CNNs) are stationary and feedforward: they neither change their parameters during evaluation nor use feedback from higher to lower layers. Real brains, however, do. So does our Deep Attention Selective Network (dasNet) architecture. dasNet's feedback structure can dynamically alter its convolutional filter sensitivities during classification. It harnesses the power of sequential processing to improve classification performance by allowing the network to iteratively focus its internal attention on some of its convolutional filters. Feedback is trained through direct policy search in a huge million-dimensional parameter space, using scalable natural evolution strategies (SNES). On the CIFAR-10 and CIFAR-100 datasets, dasNet outperforms the previous state-of-the-art model.
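To show what this kind of direct policy search looks like, below is a compact sketch of a textbook SNES loop (in the NES literature, the separable variant with a per-dimension Gaussian search distribution), with rank-based fitness shaping and common default learning rates, minimizing a toy sphere function. It is not the dasNet training setup; the population size, learning rates, and function names are standard defaults or hypothetical choices we assume.

```python
import math
import random


def snes(f, mu, sigma, generations=300, seed=0):
    """Minimize f with separable NES: per-dimension means and std devs."""
    rng = random.Random(seed)
    d = len(mu)
    lam = 4 + int(3 * math.log(d))                  # population size
    eta_mu = 1.0
    eta_sigma = (3 + math.log(d)) / (5 * math.sqrt(d))
    # Rank-based utilities (best rank first), shifted to sum to zero.
    raw = [max(0.0, math.log(lam / 2 + 1) - math.log(k)) for k in range(1, lam + 1)]
    total = sum(raw)
    utils = [r / total - 1.0 / lam for r in raw]
    mu, sigma = list(mu), list(sigma)
    for _ in range(generations):
        zs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(lam)]
        xs = [[m + s * z for m, s, z in zip(mu, sigma, zi)] for zi in zs]
        order = sorted(range(lam), key=lambda i: f(xs[i]))  # lowest f first
        grad_mu = [0.0] * d
        grad_sigma = [0.0] * d
        for rank, i in enumerate(order):
            for j in range(d):
                grad_mu[j] += utils[rank] * zs[i][j]
                grad_sigma[j] += utils[rank] * (zs[i][j] ** 2 - 1.0)
        # Natural-gradient steps: additive for the mean, multiplicative for sigma.
        mu = [m + eta_mu * s * g for m, s, g in zip(mu, sigma, grad_mu)]
        sigma = [s * math.exp(0.5 * eta_sigma * g) for s, g in zip(sigma, grad_sigma)]
    return mu


def sphere(x):
    return sum(v * v for v in x)


best = snes(sphere, mu=[3.0, -3.0], sigma=[1.0, 1.0])
print(sphere(best))
```

The diagonal search distribution keeps each update linear in the number of parameters, which is what makes this family usable in very high-dimensional spaces.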
Submitted 28 July, 2014; v1 submitted 11 July, 2014;
originally announced July 2014.
-
A Clockwork RNN
Authors:
Jan Koutník,
Klaus Greff,
Faustino Gomez,
Jürgen Schmidhuber
Abstract:
Sequence prediction and classification are ubiquitous and challenging problems in machine learning that can require identifying complex dependencies between temporally distant inputs. Recurrent Neural Networks (RNNs) have the ability, in theory, to cope with these temporal dependencies by virtue of the short-term memory implemented by their recurrent (feedback) connections. However, in practice they are difficult to train successfully when long-term memory is required. This paper introduces a simple yet powerful modification to the standard RNN architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity and making computations only at its prescribed clock rate. Rather than making the standard RNN models more complex, CW-RNN reduces the number of RNN parameters, significantly improves performance on the tasks tested, and speeds up network evaluation. The network is demonstrated in preliminary experiments involving two tasks, audio signal generation and TIMIT spoken word classification, where it outperforms both RNN and LSTM networks.
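The clock-based scheduling can be sketched as follows. Assumptions (ours, for illustration): scalar modules instead of vector-valued ones, exponential periods T_i = 2^i, module i updated only when t mod T_i = 0, and each module reading only from itself and slower modules, so the recurrent matrix is upper-triangular when modules are sorted by increasing period. Weights are placeholders and the function names are hypothetical.

```python
import math


def active_modules(t, periods):
    """Indices of modules whose clock fires at step t (t % T_i == 0)."""
    return [i for i, T in enumerate(periods) if t % T == 0]


def cw_rnn_step(t, x, h, periods, w_in, W):
    """One toy CW-RNN step with scalar modules sorted by increasing period.

    Module i reads only modules j >= i (itself and slower ones), so only the
    upper triangle of W is used. Inactive modules keep their previous state.
    """
    new_h = list(h)
    for i in active_modules(t, periods):
        pre = w_in[i] * x + sum(W[i][j] * h[j] for j in range(i, len(h)))
        new_h[i] = math.tanh(pre)
    return new_h


periods = [1, 2, 4, 8]               # T_i = 2^i
print(active_modules(4, periods))    # [0, 1, 2]
```

Skipping inactive modules is where the speed-up comes from: at most steps only the fast modules do any work, while slow modules hold information over long spans.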
Submitted 14 February, 2014;
originally announced February 2014.
-
A Frequency-Domain Encoding for Neuroevolution
Authors:
Jan Koutník,
Juergen Schmidhuber,
Faustino Gomez
Abstract:
Neuroevolution has yet to scale up to complex reinforcement learning tasks that require large networks. Networks with many inputs (e.g. raw video) imply a very high-dimensional search space if encoded directly. Indirect methods use a more compact genotype representation that is transformed into networks of potentially arbitrary size. In this paper, we present an indirect method where networks are encoded by a set of Fourier coefficients which are transformed into network weight matrices via an inverse Fourier-type transform. Because there often exist network solutions whose weight matrices contain regularity (i.e. adjacent weights are correlated), the number of coefficients required to represent these networks in the frequency domain is much smaller than the number of weights (in the same way that natural images can be compressed by ignoring high-frequency components). This "compressed" encoding is compared, on the high-dimensional octopus arm task, with the direct approach in which search is conducted in weight space. The results show that representing networks in the frequency domain can reduce the search-space dimensionality by as much as two orders of magnitude, both accelerating convergence and yielding more general solutions.
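The decoding step can be sketched as follows. We assume a 2-D inverse DCT (one Fourier-type transform) with normalization omitted, and treat every coefficient outside the evolved low-frequency block as zero; the paper's exact transform, coefficient ordering, and scaling are not reproduced, and `decode_weights` is a hypothetical name.

```python
import math


def decode_weights(coeffs, rows, cols):
    """Expand a small block of low-frequency coefficients into a rows x cols
    weight matrix via an unnormalized 2-D inverse DCT. Coefficients absent
    from `coeffs` are treated as zero, i.e. high frequencies are discarded."""
    W = [[0.0] * cols for _ in range(rows)]
    for u, row in enumerate(coeffs):
        for v, c in enumerate(row):
            for m in range(rows):
                cu = math.cos(math.pi * (2 * m + 1) * u / (2 * rows))
                for n in range(cols):
                    W[m][n] += c * cu * math.cos(math.pi * (2 * n + 1) * v / (2 * cols))
    return W


genome = [[1.0, 0.5], [0.25]]                 # 3 evolved numbers
W = decode_weights(genome, rows=10, cols=10)  # 100 network weights
print(len(W), len(W[0]))  # 10 10
```

Evolution then searches over the three coefficients rather than the hundred weights, and the decoded matrix is smooth, so adjacent weights are correlated by construction.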
Submitted 28 December, 2012;
originally announced December 2012.
-
On the Size of the Online Kernel Sparsification Dictionary
Authors:
Yi Sun,
Faustino Gomez,
Juergen Schmidhuber
Abstract:
We analyze the size of the dictionary constructed from online kernel sparsification, using a novel formula that expresses the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. Using this formula, we are able to connect the cardinality of the dictionary with the eigen-decay of the covariance operator. In particular, we show that under certain technical conditions, the size of the dictionary will always grow sub-linearly in the number of data points, and, as a consequence, the kernel linear regressor constructed from the resulting dictionary is consistent.
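For concreteness, here is a sketch of one common online kernel sparsification rule, the approximate linear dependence (ALD) test, which we assume is representative of the dictionary construction analyzed: a new point joins the dictionary only if its feature-space residual after projection onto the current dictionary, k(x, x) - k^T K^{-1} k, exceeds a threshold nu. The kernel, threshold, and function names are our choices, and for brevity the sketch re-solves the linear system at each step instead of updating K^{-1} recursively.

```python
import math


def rbf(x, y, width=1.0):
    # Gaussian (RBF) kernel on scalars.
    return math.exp(-((x - y) ** 2) / (2 * width ** 2))


def solve(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][c] * a[c] for c in range(r + 1, n))) / M[r][r]
    return a


def sparsify(stream, nu=0.01, kernel=rbf):
    """Online dictionary construction with the ALD test."""
    dictionary = []
    for x in stream:
        if dictionary:
            K = [[kernel(p, q) for q in dictionary] for p in dictionary]
            k = [kernel(p, x) for p in dictionary]
            a = solve(K, k)  # best coefficients for approximating phi(x)
            delta = kernel(x, x) - sum(ai * ki for ai, ki in zip(a, k))
            if delta <= nu:
                continue  # approximately linearly dependent: do not add x
        dictionary.append(x)
    return dictionary


print(sparsify([0.0, 0.0, 0.001, 5.0]))  # [0.0, 5.0]
```

On a dense stream from a bounded interval the dictionary stays far smaller than the number of points, which is the sub-linear growth behaviour the abstract connects to the eigen-decay of the covariance operator.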
Submitted 18 June, 2012;
originally announced June 2012.
-
T-Learning
Authors:
Vincent Graziano,
Faustino Gomez,
Mark Ring,
Juergen Schmidhuber
Abstract:
Traditional Reinforcement Learning (RL) has focused on problems involving many states and few actions, such as simple grid worlds. Most real-world problems, however, are of the opposite type, involving few relevant states and many actions. For example, to return home from a conference, humans identify only a few subgoal states such as lobby, taxi, and airport. Each valid behavior connecting two such states can be viewed as an action, and there are trillions of them. Assuming the subgoal identification problem is already solved, the quality of any RL method in real-world settings depends less on how well it scales with the number of states than on how well it scales with the number of actions. This is where our new method, T-Learning, excels: it evaluates the relatively few possible transits from one state to another in a policy-independent way, rather than a huge number of state-action pairs, or states in traditional policy-dependent ways. Illustrative experiments demonstrate that the performance improvements of T-Learning over Q-learning can be arbitrarily large.
Submitted 31 December, 2011;
originally announced January 2012.
-
A Linear Time Natural Evolution Strategy for Non-Separable Functions
Authors:
Yi Sun,
Faustino Gomez,
Tom Schaul,
Juergen Schmidhuber
Abstract:
We present a novel Natural Evolution Strategy (NES) variant, the Rank-One NES (R1-NES), which uses a low-rank approximation of the search distribution covariance matrix. The algorithm allows computation of the natural gradient with cost linear in the dimensionality of the parameter space, and excels in solving high-dimensional non-separable problems, including the best result to date on the Rosenbrock function (512 dimensions).
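Why a rank-one structure gives linear cost can be sketched directly. Assuming a covariance of the form sigma^2 (I + v v^T), which we take as representative rather than quoting the paper's exact parameterization, sampling and the pieces of a Gaussian log-density can all be computed in O(d) without forming a d x d matrix, using the Sherman-Morrison formula and the matrix determinant lemma. The natural-gradient update itself is not reproduced, and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_rank_one(mu, sigma, v, z, r):
    """Map standard-normal draws z (a d-vector) and r (a scalar) to
    x ~ N(mu, sigma^2 (I + v v^T)) in O(d): x = mu + sigma * (z + r * v)."""
    return [m + sigma * (zi + r * vi) for m, zi, vi in zip(mu, z, v)]


def mahalanobis_sq(y, v):
    """y^T (I + v v^T)^{-1} y in O(d), via the Sherman-Morrison formula."""
    vy = dot(v, y)
    return dot(y, y) - vy * vy / (1.0 + dot(v, v))


def log_det(v):
    """log det(I + v v^T) = log(1 + |v|^2), by the matrix determinant lemma."""
    return math.log(1.0 + dot(v, v))


print(mahalanobis_sq([2.0, 3.0], [1.0, 0.0]))  # 11.0
```

Since z and r are independent standard normals, z + r v has covariance I + v v^T, so one extra scalar draw captures the single correlated search direction.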
Submitted 13 June, 2011; v1 submitted 10 June, 2011;
originally announced June 2011.
-
Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments
Authors:
Yi Sun,
Faustino Gomez,
Juergen Schmidhuber
Abstract:
To maximize its success, an AGI typically needs to explore its initially unknown world. Is there an optimal way of doing so? Here we derive an affirmative answer for a broad class of environments.
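The abstract does not spell out the exploration criterion. As a hedged illustration of one common Bayesian formalization of surprise, information gain measured as the expected KL divergence from prior to posterior, the sketch below computes that quantity for a single observation over a finite set of hypotheses. The hypothesis set, the function name, and the one-step restriction are our assumptions; planning over future gains is not shown.

```python
import math


def expected_information_gain(prior, likelihoods):
    """Expected KL(posterior || prior) after one observation.

    prior[h]: probability of hypothesis h.
    likelihoods[h][o]: probability that hypothesis h produces observation o.
    """
    n_obs = len(likelihoods[0])
    gain = 0.0
    for o in range(n_obs):
        p_o = sum(prior[h] * likelihoods[h][o] for h in range(len(prior)))
        if p_o == 0.0:
            continue  # observation o can never occur
        for h in range(len(prior)):
            post = prior[h] * likelihoods[h][o] / p_o
            if post > 0.0:
                gain += p_o * post * math.log(post / prior[h])
    return gain


# Two equally likely hypotheses, perfectly distinguished by one observation:
print(expected_information_gain([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # ln 2
```

An observation that every hypothesis predicts equally well yields zero gain, so an agent maximizing this quantity seeks out experiments that discriminate between its hypotheses.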
Submitted 29 March, 2011;
originally announced March 2011.
-
Optimization of Planck/LFI on-board data handling
Authors:
M. Maris,
M. Tomasi,
S. Galeotta,
M. Miccolis,
S. Hildebrandt,
M. Frailis,
R. Rohlfs,
N. Morisset,
A. Zacchei,
M. Bersanelli,
P. Binko,
C. Burigana,
R. C. Butler,
F. Cuttaia,
H. Chulani,
O. D'Arcangelo,
S. Fogliani,
E. Franceschi,
F. Gasparo,
F. Gomez,
A. Gregorio,
J. M. Herreros,
R. Leonardi,
P. Leutenegger,
G. Maggio,
et al. (12 additional authors not shown)
Abstract:
To assess stability against 1/f noise, the Low Frequency Instrument (LFI) onboard the Planck mission will acquire data at a rate much higher than the data rate allowed by its telemetry bandwidth of 35.5 kbps. The data are processed by an onboard pipeline, followed on the ground by a reversing step. This paper illustrates the LFI scientific onboard processing used to fit the allowed data rate. This is a lossy process tuned by a set of 5 parameters, Naver, r1, r2, q, and O, for each of the 44 LFI detectors. The paper quantifies the level of distortion introduced by the onboard processing, EpsilonQ, as a function of these parameters, and describes the method of optimizing the onboard processing chain. The tuning procedure is based on an optimization algorithm applied to unprocessed and uncompressed raw data provided by simulations, prelaunch tests, or data taken from LFI operating in diagnostic mode. All the needed optimization steps are performed by an automated tool, OCA2, which outputs optimized parameters and a set of statistical indicators, among them the compression rate Cr and EpsilonQ. For Planck/LFI the requirements are Cr = 2.4 and EpsilonQ <= 10% of the rms of the instrumental white noise. To speed up the process, an analytical model is developed that is able to extract most of the relevant information on EpsilonQ and Cr as a function of the signal statistics and the processing parameters. This model will be of interest for the instrument data analysis. The method was applied during ground tests when the instrument was operating in conditions representative of flight. Optimized parameters were obtained and the performance was verified: the required data rate of 35.5 kbps was achieved while keeping EpsilonQ at 3.8% of the white-noise rms, well within the requirements.
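The trade-off between quantization and distortion can be illustrated with the textbook uniform-quantization approximation: the rms error of rounding to a grid of step Delta is about Delta / sqrt(12) when Delta is small compared with the signal rms. This is a generic sketch, not the LFI pipeline: the step Delta, the Gaussian test signal, and the function name are our assumptions, and the averaging (Naver), mixing (r1, r2), and offset (O) stages are not modeled. Under this approximation alone, keeping the error at or below 10% of the white-noise rms corresponds to Delta of at most about 0.35 times that rms.

```python
import math
import random


def quantization_rms(step, sigma=1.0, n=100_000, seed=0):
    """Empirical rms error of rounding Gaussian white noise (rms sigma)
    to a uniform grid with the given step."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s = rng.gauss(0.0, sigma)
        q = step * round(s / step)
        total += (s - q) ** 2
    return math.sqrt(total / n)


# Largest step keeping the rms error <= 10% of sigma, if error = step / sqrt(12).
max_step = 0.10 * math.sqrt(12)
print(round(max_step, 3))  # 0.346
# Ratio of empirical to predicted error for step = 0.5 sigma (expected near 1).
print(quantization_rms(0.5) / (0.5 / math.sqrt(12)))
```

A coarser step lowers the entropy of the quantized samples and so helps compression, which is why the compression rate and the distortion have to be optimized jointly.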
Submitted 26 January, 2010;
originally announced January 2010.
-
Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot
Authors:
Viktor Zhumatiy,
Faustino Gomez,
Marcus Hutter,
Juergen Schmidhuber
Abstract:
We address the problem of autonomously learning controllers for vision-capable mobile robots. We extend McCallum's (1995) Nearest-Sequence Memory algorithm to allow for general metrics over state-action trajectories. We demonstrate the feasibility of our approach by successfully running our algorithm on a real mobile robot. The algorithm is novel and unique in that it (a) explores the environment and learns directly on a mobile robot without using a hand-made computer model as an intermediate step, (b) does not require manual discretization of the sensor input space, (c) works in piecewise continuous perceptual spaces, and (d) copes with partial observability. Together, these properties allow learning from much less experience than previous methods require.
Submitted 7 March, 2006;
originally announced March 2006.
-
Evolino for recurrent support vector machines
Authors:
Juergen Schmidhuber,
Matteo Gagliolo,
Daan Wierstra,
Faustino Gomez
Abstract:
Traditional Support Vector Machines (SVMs) need pre-wired finite time windows to predict and classify time series. They lack the internal state necessary to deal with sequences involving arbitrarily long-term dependencies. Here we introduce a new class of recurrent, truly sequential SVM-like devices with internal adaptive states, trained by a novel method called EVOlution of systems with KErnel-based outputs (Evoke), an instance of the recent Evolino class of methods. Evoke evolves recurrent neural networks to detect and represent temporal dependencies while using quadratic programming/support vector regression to produce precise outputs. Evoke is the first SVM-based mechanism to learn to classify a context-sensitive language. It also outperforms recent state-of-the-art gradient-based recurrent neural networks (RNNs) on various time series prediction tasks.
Submitted 15 December, 2005;
originally announced December 2005.
-
Acquiring Knowledge from Encyclopedic Texts
Authors:
Fernando Gomez,
Richard Hull,
Carlos Segami
Abstract:
A computational model for the acquisition of knowledge from encyclopedic texts is described. The model has been implemented in a program, called SNOWY, that reads unedited texts from The World Book Encyclopedia, and acquires new concepts and conceptual relations about topics dealing with the dietary habits of animals, their classifications and habitats. The program is also able to answer an ample set of questions about the knowledge that it has acquired. This paper describes the essential components of this model, namely semantic interpretation, inferences and representation, and ends with an evaluation of the performance of the program, a sample of the questions that it is able to answer, and its relation to other programs of similar nature.
Submitted 4 November, 1994;
originally announced November 1994.