-
Graph Neural Network-Based Predictive Modeling for Robotic Plaster Printing
Authors:
Diego Machain Rivera,
Selen Ercan Jenny,
Ping Hsun Tsai,
Ena Lloret-Fritschi,
Luis Salamanca,
Fernando Perez-Cruz,
Konstantinos E. Tatsis
Abstract:
This work proposes a Graph Neural Network (GNN) modeling approach to predict the resulting surface from a particle based fabrication process. The latter consists of spray-based printing of cementitious plaster on a wall and is facilitated with the use of a robotic arm. The predictions are computed using the robotic arm trajectory features, such as position, velocity and direction, as well as the p…
▽ More
This work proposes a Graph Neural Network (GNN) modeling approach to predict the resulting surface from a particle based fabrication process. The latter consists of spray-based printing of cementitious plaster on a wall and is facilitated with the use of a robotic arm. The predictions are computed using the robotic arm trajectory features, such as position, velocity and direction, as well as the printing process parameters. The proposed approach, based on a particle representation of the wall domain and the end effector, allows for the adoption of a graph-based solution. The GNN model consists of an encoder-processor-decoder architecture and is trained using data from laboratory tests, while the hyperparameters are optimized by means of a Bayesian scheme. The aim of this model is to act as a simulator of the printing process, and ultimately used for the generation of the robotic arm trajectory and the optimization of the printing parameters, towards the materialization of an autonomous plastering process. The performance of the proposed model is assessed in terms of the prediction error against unseen ground truth data, which shows its generality in varied scenarios, as well as in comparison with the performance of an existing benchmark model. The results demonstrate a significant improvement over the benchmark model, with notably better performance and enhanced error scaling across prediction steps.
△ Less
Submitted 31 March, 2025;
originally announced March 2025.
-
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Authors:
Markus Marks,
Manuel Knott,
Neehar Kondapaneni,
Elijah Cole,
Thijs Defraeye,
Fernando Perez-Cruz,
Pietro Perona
Abstract:
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels are expensive or inaccessible…
▽ More
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels are expensive or inaccessible. In Computer Vision, SSL is widely used as pre-training followed by a downstream task, such as supervised transfer, few-shot learning on smaller labeled data sets, and/or unsupervised clustering. Unfortunately, it is infeasible to evaluate SSL methods on all possible downstream tasks and objectively measure the quality of the learned representation. Instead, SSL methods are evaluated using in-domain evaluation protocols, such as fine-tuning, linear probing, and k-nearest neighbors (kNN). However, it is not well understood how well these evaluation protocols estimate the representation quality of a pre-trained model for different downstream tasks under different conditions, such as dataset, metric, and model architecture. We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types. Our study includes eleven common image datasets and 26 models that were pre-trained with different SSL methods or have different model backbones. We find that in-domain linear/kNN probing protocols are, on average, the best general predictors for out-of-domain performance. We further investigate the importance of batch normalization and evaluate how robust correlations are for different kinds of dataset domain shifts. We challenge assumptions about the relationship between discriminative and generative self-supervised methods, finding that most of their performance differences can be explained by changes to model backbones.
△ Less
Submitted 17 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Counterfactual Explanations for Deep Learning-Based Traffic Forecasting
Authors:
Rushan Wang,
Yanan Xin,
Yatao Zhang,
Fernando Perez-Cruz,
Martin Raubal
Abstract:
Deep learning models are widely used in traffic forecasting and have achieved state-of-the-art prediction accuracy. However, the black-box nature of those models makes the results difficult to interpret by users. This study aims to leverage an Explainable AI approach, counterfactual explanations, to enhance the explainability and usability of deep learning-based traffic forecasting models. Specifi…
▽ More
Deep learning models are widely used in traffic forecasting and have achieved state-of-the-art prediction accuracy. However, the black-box nature of those models makes the results difficult to interpret by users. This study aims to leverage an Explainable AI approach, counterfactual explanations, to enhance the explainability and usability of deep learning-based traffic forecasting models. Specifically, the goal is to elucidate relationships between various input contextual features and their corresponding predictions. We present a comprehensive framework that generates counterfactual explanations for traffic forecasting and provides usable insights through the proposed scenario-driven counterfactual explanations. The study first implements a deep learning model to predict traffic speed based on historical traffic data and contextual variables. Counterfactual explanations are then used to illuminate how alterations in these input variables affect predicted outcomes, thereby enhancing the transparency of the deep learning model. We investigated the impact of contextual features on traffic speed prediction under varying spatial and temporal conditions. The scenario-driven counterfactual explanations integrate two types of user-defined constraints, directional and weighting constraints, to tailor the search for counterfactual explanations to specific use cases. These tailored explanations benefit machine learning practitioners who aim to understand the model's learning mechanisms and domain experts who seek insights for real-world applications. The results showcase the effectiveness of counterfactual explanations in revealing traffic patterns learned by deep learning models, showing its potential for interpreting black-box deep learning models used for spatiotemporal predictions in general.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Do You Trust Your Model? Emerging Malware Threats in the Deep Learning Ecosystem
Authors:
Dorjan Hitaj,
Giulio Pagnotta,
Fabio De Gaspari,
Sediola Ruko,
Briland Hitaj,
Luigi V. Mancini,
Fernando Perez-Cruz
Abstract:
Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively ju…
▽ More
Training high-quality deep learning models is a challenging task due to computational and technical requirements. A growing number of individuals, institutions, and companies increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated in product pipelines with no particular precautions, since they are effectively just data in tensor form and considered safe. In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, self-executing malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads in the parameters of deep neural networks. MaleficNet 2.0 injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and distributed learning settings such as Federated Learning, and demonstrate that it is effective even when a reduced number of bits is used for the model parameters. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework. Our aim with this work is to raise awareness against these new, dangerous attacks both in the research community and industry, and we hope to encourage further research in mitigation techniques against such threats.
△ Less
Submitted 13 May, 2025; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Synthetic location trajectory generation using categorical diffusion models
Authors:
Simon Dirmeier,
Ye Hong,
Fernando Perez-Cruz
Abstract:
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data, for instance, for computer vision, audio, natural language processing, or biomolecule generation. Here, we propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical loc…
▽ More
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data, for instance, for computer vision, audio, natural language processing, or biomolecule generation. Here, we propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals. ILTs are of major importance in mobility research to understand the mobility behavior of populations and to ultimately inform political decision-making. We represent ILTs as multi-dimensional categorical random variables and propose to model their joint distribution using a continuous DPM by first applying the diffusion process in a continuous unconstrained space and then mapping the continuous variables into a discrete space. We demonstrate that our model can synthesize realistic ILPs by comparing conditionally and unconditionally generated sequences to real-world ILPs from a GNSS tracking data set which suggests the potential use of our model for synthetic data generation, for example, for benchmarking models used in mobility research.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
A causal intervention framework for synthesizing mobility data and evaluating predictive neural networks
Authors:
Ye Hong,
Yanan Xin,
Simon Dirmeier,
Fernando Perez-Cruz,
Martin Raubal
Abstract:
Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. This study introduces a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location predictio…
▽ More
Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. This study introduces a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location prediction -- a task focusing on predicting the immediate next location of an individual. To achieve this, we employ individual mobility models to synthesize location visit sequences and control behavior dynamics by intervening in their data generation process. We evaluate the interventional location sequences using mobility metrics and input them into well-trained networks to analyze performance variations. The results demonstrate the effectiveness in producing location sequences with distinct mobility behaviors, thereby facilitating the simulation of diverse yet realistic spatial and temporal changes. These changes result in performance fluctuations in next location prediction networks, revealing impacts of critical mobility behavior factors, including sequential patterns in location transitions, proclivity for exploring new locations, and preferences in location choices at population and individual levels. The gained insights hold value for the real-world application of mobility prediction networks, and the framework is expected to promote the use of causal inference to enhance the interpretability and robustness of neural networks in mobility applications.
△ Less
Submitted 1 August, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Anchor Data Augmentation
Authors:
Nora Schneider,
Shirin Goshtasbpour,
Fernando Perez-Cruz
Abstract:
We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality and extends the recently proposed Anchor regression (AR) method for data augmentation, which is in contrast to the current state-of-the-art domain-agnostic solutions that rely on the Mixup literature. Our Anchor Data Augmentation (AD…
▽ More
We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality and extends the recently proposed Anchor regression (AR) method for data augmentation, which is in contrast to the current state-of-the-art domain-agnostic solutions that rely on the Mixup literature. Our Anchor Data Augmentation (ADA) uses several replicas of the modified samples in AR to provide more training examples, leading to more robust regression predictions. We apply ADA to linear and nonlinear regression problems using neural networks. ADA is competitive with state-of-the-art C-Mixup solutions.
△ Less
Submitted 27 November, 2023; v1 submitted 12 November, 2023;
originally announced November 2023.
-
KG-FRUS: a Novel Graph-based Dataset of 127 Years of US Diplomatic Relations
Authors:
Gökberk Özsoy,
Luis Salamanca,
Matthew Connelly,
Raymond Hicks,
Fernando Pérez-Cruz
Abstract:
In the current paper, we present the KG-FRUS dataset, comprised of more than 300,000 US government diplomatic documents encoded in a Knowledge Graph (KG). We leverage the data of the Foreign Relations of the United States (FRUS) (available as XML files) to extract information about the documents and the individuals and countries mentioned within them. We use the extracted entities, and associated…
▽ More
In the current paper, we present the KG-FRUS dataset, comprised of more than 300,000 US government diplomatic documents encoded in a Knowledge Graph (KG). We leverage the data of the Foreign Relations of the United States (FRUS) (available as XML files) to extract information about the documents and the individuals and countries mentioned within them. We use the extracted entities, and associated metadata, to create a graph-based dataset. Further, we supplement the created KG with additional entities and relations from Wikidata. The relations in the KG capture the synergies and dynamics required to study and understand the complex fields of diplomacy, foreign relations, and politics. This goes well beyond a simple collection of documents which neglects the relations between entities in the text. We showcase a range of possibilities of the current dataset by illustrating different approaches to probe the KG. In the paper, we exemplify how to use a query language to answer simple research questions and how to use graph algorithms such as Node2Vec and PageRank, that benefit from the complete graph structure. More importantly, the chosen structure provides total flexibility for continuously expanding and enriching the graph. Our solution is general, so the proposed pipeline for building the KG can encode other original corpora of time-dependent and complex phenomena. Overall, we present a mechanism to create KG databases providing a more versatile representation of time-dependent related text data and a particular application to the all-important FRUS database.
△ Less
Submitted 30 October, 2023;
originally announced November 2023.
-
Diffusion models for probabilistic programming
Authors:
Simon Dirmeier,
Fernando Perez-Cruz
Abstract:
We propose Diffusion Model Variational Inference (DMVI), a novel method for automated approximate inference in probabilistic programming languages (PPLs). DMVI utilizes diffusion models as variational approximations to the true posterior distribution by deriving a novel bound to the marginal likelihood objective used in Bayesian modelling. DMVI is easy to implement, allows hassle-free inference in…
▽ More
We propose Diffusion Model Variational Inference (DMVI), a novel method for automated approximate inference in probabilistic programming languages (PPLs). DMVI utilizes diffusion models as variational approximations to the true posterior distribution by deriving a novel bound to the marginal likelihood objective used in Bayesian modelling. DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model. We evaluate DMVI on a set of common Bayesian models and show that its posterior inferences are in general more accurate than those of contemporary methods used in PPLs while having a similar computational cost and requiring less manual tuning.
△ Less
Submitted 21 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Uncertainty quantification and out-of-distribution detection using surjective normalizing flows
Authors:
Simon Dirmeier,
Ye Hong,
Yanan Xin,
Fernando Perez-Cruz
Abstract:
Reliable quantification of epistemic and aleatoric uncertainty is of crucial importance in applications where models are trained in one environment but applied to multiple different environments, often seen in real-world applications for example, in climate science or mobility analysis. We propose a simple approach using surjective normalizing flows to identify out-of-distribution data sets in dee…
▽ More
Reliable quantification of epistemic and aleatoric uncertainty is of crucial importance in applications where models are trained in one environment but applied to multiple different environments, often seen in real-world applications for example, in climate science or mobility analysis. We propose a simple approach using surjective normalizing flows to identify out-of-distribution data sets in deep neural network models that can be computed in a single forward pass. The method builds on recent developments in deep uncertainty quantification and generative modeling with normalizing flows. We apply our method to a synthetic data set that has been simulated using a mechanistic model from the mobility literature and several data sets simulated from interventional distributions induced by soft and atomic interventions on that model, and demonstrate that our method can reliably discern out-of-distribution data from in-distribution data. We compare the surjective flow model to a Dirichlet process mixture model and a bijective flow and find that the surjections are a crucial component to reliably distinguish in-distribution from out-of-distribution data.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Simulation-based inference using surjective sequential neural likelihood estimation
Authors:
Simon Dirmeier,
Carlo Albert,
Fernando Perez-Cruz
Abstract:
We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel method for simulation-based inference in models where the evaluation of the likelihood function is not tractable and only a simulator that can generate synthetic data is available. SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function which allows for convent…
▽ More
We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel method for simulation-based inference in models where the evaluation of the likelihood function is not tractable and only a simulator that can generate synthetic data is available. SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function which allows for conventional Bayesian inference using either Markov chain Monte Carlo methods or variational inference. By embedding the data in a low-dimensional space, SSNL solves several issues previous likelihood-based methods had when applied to high-dimensional data sets that, for instance, contain non-informative data dimensions or lie along a lower-dimensional manifold. We evaluate SSNL on a wide variety of experiments and show that it generally outperforms contemporary methods used in simulation-based inference, for instance, on a challenging real-world example from astrophysics which models the magnetic field strength of the sun using a solar dynamo model.
△ Less
Submitted 23 February, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Adaptive Annealed Importance Sampling with Constant Rate Progress
Authors:
Shirin Goshtasbpour,
Victor Cohen,
Fernando Perez-Cruz
Abstract:
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution given its unnormalized density function. This algorithm relies on a sequence of interpolating distributions bridging the target to an initial tractable distribution such as the well-known geometric mean path of unnormalized distributions which is assumed to be suboptimal in general. In this paper, we p…
▽ More
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution given its unnormalized density function. This algorithm relies on a sequence of interpolating distributions bridging the target to an initial tractable distribution such as the well-known geometric mean path of unnormalized distributions which is assumed to be suboptimal in general. In this paper, we prove that the geometric annealing corresponds to the distribution path that minimizes the KL divergence between the current particle distribution and the desired target when the feasible change in the particle distribution is constrained. Following this observation, we derive the constant rate discretization schedule for this annealing sequence, which adjusts the schedule to the difficulty of moving samples between the initial and the target distributions. We further extend our results to $f$-divergences and present the respective dynamics of annealing sequences based on which we propose the Constant Rate AIS (CR-AIS) algorithm and its efficient implementation for $α$-divergences. We empirically show that CR-AIS performs well on multiple benchmark distributions while avoiding the computationally expensive tuning loop in existing Adaptive AIS.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
PassGPT: Password Modeling and (Guided) Generation with Large Language Models
Authors:
Javier Rando,
Fernando Perez-Cruz,
Briland Hitaj
Abstract:
Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, a LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previ…
▽ More
Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, a LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.
△ Less
Submitted 14 June, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Forecasting Particle Accelerator Interruptions Using Logistic LASSO Regression
Authors:
Sichen Li,
Jochem Snuverink,
Fernando Perez-Cruz,
Andreas Adelmann
Abstract:
Unforeseen particle accelerator interruptions, also known as interlocks, lead to abrupt operational changes despite being necessary safety measures. These may result in substantial loss of beam time and perhaps even equipment damage. We propose a simple yet powerful binary classification model aiming to forecast such interruptions, in the case of the High Intensity Proton Accelerator complex at th…
▽ More
Unforeseen particle accelerator interruptions, also known as interlocks, lead to abrupt operational changes despite being necessary safety measures. These may result in substantial loss of beam time and perhaps even equipment damage. We propose a simple yet powerful binary classification model aiming to forecast such interruptions, in the case of the High Intensity Proton Accelerator complex at the Paul Scherrer Institut. The model is formulated as logistic regression penalized by least absolute shrinkage and selection operator, based on a statistical two sample test to distinguish between unstable and stable states of the accelerator.
The primary objective for receiving alarms prior to interlocks is to allow for countermeasures and reduce beam time loss. Hence, a continuous evaluation metric is developed to measure the saved beam time in any period, given the assumption that interlocks could be circumvented by reducing the beam current. The best-performing interlock-to-stable classifier can potentially increase the beam time by around 5 min in a day. Possible instrumentation for fast adjustment of the beam current is also listed and discussed.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
An evaluation of deep learning models for predicting water depth evolution in urban floods
Authors:
Stefania Russo,
Nathanaël Perraudin,
Steven Stalder,
Fernando Perez-Cruz,
Joao Paulo Leitao,
Guillaume Obozinski,
Jan Dirk Wegner
Abstract:
In this technical report we compare different deep learning models for prediction of water depth rasters at high spatial resolution. Efficient, accurate, and fast methods for water depth prediction are nowadays important as urban floods are increasing due to higher rainfall intensity caused by climate change, expansion of cities and changes in land use. While hydrodynamic models models can provide…
▽ More
In this technical report we compare different deep learning models for prediction of water depth rasters at high spatial resolution. Efficient, accurate, and fast methods for water depth prediction are nowadays important as urban floods are increasing due to higher rainfall intensity caused by climate change, expansion of cities and changes in land use. While hydrodynamic models models can provide reliable forecasts by simulating water depth at every location of a catchment, they also have a high computational burden which jeopardizes their application to real-time prediction in large urban areas at high spatial resolution. Here, we propose to address this issue by using data-driven techniques. Specifically, we evaluate deep learning models which are trained to reproduce the data simulated by the CADDIES cellular-automata flood model, providing flood forecasts that can occur at different future time horizons. The advantage of using such models is that they can learn the underlying physical phenomena a priori, preventing manual parameter setting and computational burden. We perform experiments on a dataset consisting of two catchments areas within Switzerland with 18 simpler, short rainfall patterns and 4 long, more complex ones. Our results show that the deep learning models present in general lower errors compared to the other methods, especially for water depths $>0.5m$. However, when testing on more complex rainfall events or unseen catchment areas, the deep models do not show benefits over the simpler ones.
△ Less
Submitted 20 February, 2023;
originally announced February 2023.
-
Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-model-based Conceptual Design of Pedestrian Bridges
Authors:
Vera M. Balmer,
Sophia V. Kuhn,
Rafael Bischof,
Luis Salamanca,
Walter Kaufmann,
Fernando Perez-Cruz,
Michael A. Kraus
Abstract:
For conceptual design, engineers rely on conventional iterative (often manual) techniques. Emerging parametric models facilitate design space exploration based on quantifiable performance metrics, yet remain time-consuming and computationally expensive. Pure optimisation methods, however, ignore qualitative aspects (e.g. aesthetics or construction methods). This paper provides a performance-driven…
▽ More
For conceptual design, engineers rely on conventional iterative (often manual) techniques. Emerging parametric models facilitate design space exploration based on quantifiable performance metrics, yet remain time-consuming and computationally expensive. Pure optimisation methods, however, ignore qualitative aspects (e.g. aesthetics or construction methods). This paper provides a performance-driven design exploration framework to augment the human designer through a Conditional Variational Autoencoder (CVAE), which serves as forward performance predictor for given design features as well as an inverse design feature predictor conditioned on a set of performance requests. The CVAE is trained on 18'000 synthetically generated instances of a pedestrian bridge in Switzerland. Sensitivity analysis is employed for explainability and informing designers about (i) relations of the model between features and/or performances and (ii) structural improvements under user-defined objectives. A case study proved our framework's potential to serve as a future co-pilot for conceptual design studies of pedestrian bridges and beyond.
△ Less
Submitted 29 November, 2022;
originally announced November 2022.
-
Vision Paper: Causal Inference for Interpretable and Robust Machine Learning in Mobility Analysis
Authors:
Yanan Xin,
Natasa Tagasovska,
Fernando Perez-Cruz,
Martin Raubal
Abstract:
Artificial intelligence (AI) is revolutionizing many areas of our lives, leading a new era of technological advancement. Particularly, the transportation sector would benefit from the progress in AI and advance the development of intelligent transportation systems. Building intelligent transportation systems requires an intricate combination of artificial intelligence and mobility analysis. The pa…
▽ More
Artificial intelligence (AI) is revolutionizing many areas of our lives, leading a new era of technological advancement. Particularly, the transportation sector would benefit from the progress in AI and advance the development of intelligent transportation systems. Building intelligent transportation systems requires an intricate combination of artificial intelligence and mobility analysis. The past few years have seen rapid development in transportation applications using advanced deep neural networks. However, such deep neural networks are difficult to interpret and lack robustness, which slows the deployment of these AI-powered algorithms in practice. To improve their usability, increasing research efforts have been devoted to developing interpretable and robust machine learning methods, among which the causal inference approach recently gained traction as it provides interpretable and actionable information. Moreover, most of these methods are developed for image or sequential data which do not satisfy specific requirements of mobility data analysis. This vision paper emphasizes research challenges in deep learning-based mobility analysis that require interpretability and robustness, summarizes recent developments in using causal inference for improving the interpretability and robustness of machine learning methods, and highlights opportunities in developing causally-enabled machine learning models tailored for mobility analysis. This research direction will make AI in the transportation sector more interpretable and reliable, thus contributing to safer, more efficient, and more sustainable future transportation systems.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
Optimization of Annealed Importance Sampling Hyperparameters
Authors:
Shirin Goshtasbpour,
Fernando Perez-Cruz
Abstract:
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models. Although AIS is guaranteed to provide unbiased estimate for any set of hyperparameters, the common implementations rely on simple heuristics such as the geometric average bridging distributions between initial and the target distribution which affect the estima…
▽ More
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models. Although AIS is guaranteed to provide unbiased estimate for any set of hyperparameters, the common implementations rely on simple heuristics such as the geometric average bridging distributions between initial and the target distribution which affect the estimation performance when the computation budget is limited. In order to reduce the number of sampling iterations, we present a parameteric AIS process with flexible intermediary distributions defined by a residual density with respect to the geometric mean path. Our method allows parameter sharing between annealing distributions, the use of fix linear schedule for discretization and amortization of hyperparameter selection in latent variable models. We assess the performance of Optimized-Path AIS for marginal likelihood estimation of deep generative models and compare it to compare it to more computationally intensive AIS.
△ Less
Submitted 8 October, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
-
Facilitated machine learning for image-based fruit quality assessment
Authors:
Manuel Knott,
Fernando Perez-Cruz,
Thijs Defraeye
Abstract:
Image-based machine learning models can be used to make the sorting and grading of agricultural products more efficient. In many regions, implementing such systems can be difficult due to the lack of centralization and automation of postharvest supply chains. Stakeholders are often too small to specialize in machine learning, and large training data sets are unavailable. We propose a machine learn…
▽ More
Image-based machine learning models can be used to make the sorting and grading of agricultural products more efficient. In many regions, implementing such systems can be difficult due to the lack of centralization and automation of postharvest supply chains. Stakeholders are often too small to specialize in machine learning, and large training data sets are unavailable. We propose a machine learning procedure for images based on pre-trained Vision Transformers. It is easier to implement than the current standard approach of training Convolutional Neural Networks (CNNs) as we do not (re-)train deep neural networks. We evaluate our approach based on two data sets for apple defect detection and banana ripeness estimation. Our model achieves a competitive classification accuracy equal to or less than one percent below the best-performing CNN. At the same time, it requires three times fewer training samples to achieve a 90% accuracy.
△ Less
Submitted 9 January, 2023; v1 submitted 10 July, 2022;
originally announced July 2022.
-
OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing
Authors:
Firat Ozdemir,
Berkan Lafci,
Xosé Luís Deán-Ben,
Daniel Razansky,
Fernando Perez-Cruz
Abstract:
Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses followed by subsequent detection of ultrasound waves generated via light-absorption-mediated thermoelastic expansion. OA imaging features a powerful combination between rich optical contrast and high resolution in deep tissues. This enabled the exploration of a number of attractive new appl…
▽ More
Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses followed by subsequent detection of ultrasound waves generated via light-absorption-mediated thermoelastic expansion. OA imaging features a powerful combination between rich optical contrast and high resolution in deep tissues. This enabled the exploration of a number of attractive new applications both in clinical and laboratory settings. However, no standardized datasets generated with different types of experimental set-up and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings. This complicates an objective comparison between new and established data processing methods, often leading to qualitative results and arbitrary interpretations of the data. In this paper, we provide both experimental and synthetic OA raw signals and reconstructed image domain datasets rendered with different experimental parameters and tomographic acquisition geometries. We further provide trained neural networks to tackle three important challenges related to OA image processing, namely accurate reconstruction under limited view tomographic conditions, removal of spatial undersampling artifacts and anatomical segmentation for improved image reconstruction. Specifically, we define 44 experiments corresponding to the aforementioned challenges as benchmarks to be used as a reference for the development of more advanced processing methods.
△ Less
Submitted 3 May, 2023; v1 submitted 17 June, 2022;
originally announced June 2022.
-
What You See is What You Classify: Black Box Attributions
Authors:
Steven Stalder,
Nathanaël Perraudin,
Radhakrishna Achanta,
Fernando Perez-Cruz,
Michele Volpi
Abstract:
An important step towards explaining deep image classifiers lies in the identification of image regions that contribute to individual class scores in the model's output. However, doing this accurately is a difficult task due to the black-box nature of such networks. Most existing approaches find such attributions either using activations and gradients or by repeatedly perturbing the input. We inst…
▽ More
An important step towards explaining deep image classifiers lies in the identification of image regions that contribute to individual class scores in the model's output. However, doing this accurately is a difficult task due to the black-box nature of such networks. Most existing approaches find such attributions either using activations and gradients or by repeatedly perturbing the input. We instead address this challenge by training a second deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum. These attributions are provided in the form of masks that only show the classifier-relevant parts of an image, masking out the rest. Our approach produces sharper and more boundary-precise masks when compared to the saliency maps generated by other methods. Moreover, unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks in a single forward pass. This makes the proposed method very efficient during inference. We show that our attributions are superior to established methods both visually and quantitatively with respect to the PASCAL VOC-2007 and Microsoft COCO-2014 datasets.
△ Less
Submitted 7 October, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
TATTOOED: A Robust Deep Neural Network Watermarking Scheme based on Spread-Spectrum Channel Coding
Authors:
Giulio Pagnotta,
Dorjan Hitaj,
Briland Hitaj,
Fernando Perez-Cruz,
Luigi V. Mancini
Abstract:
Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to…
▽ More
Watermarking of deep neural networks (DNNs) has gained significant traction in recent years, with numerous (watermarking) strategies being proposed as mechanisms that can help verify the ownership of a DNN in scenarios where these models are obtained without the permission of the owner. However, a growing body of work has demonstrated that existing watermarking mechanisms are highly susceptible to removal techniques, such as fine-tuning, parameter pruning, or shuffling. In this paper, we build upon extensive prior work on covert (military) communication and propose TATTOOED, a novel DNN watermarking technique that is robust to existing threats. We demonstrate that using TATTOOED as their watermarking mechanisms, the DNN owner can successfully obtain the watermark and verify model ownership even in scenarios where 99% of model parameters are altered. Furthermore, we show that TATTOOED is easy to employ in training pipelines, and has negligible impact on model performance.
△ Less
Submitted 3 June, 2024; v1 submitted 12 February, 2022;
originally announced February 2022.
-
Learning Summary Statistics for Bayesian Inference with Autoencoders
Authors:
Carlo Albert,
Simone Ulzega,
Firat Ozdemir,
Fernando Perez-Cruz,
Antonietta Mira
Abstract:
For stochastic models with intractable likelihood functions, approximate Bayesian computation offers a way of approximating the true posterior through repeated comparisons of observations with simulated model outputs in terms of a small set of summary statistics. These statistics need to retain the information that is relevant for constraining the parameters but cancel out the noise. They can thus…
▽ More
For stochastic models with intractable likelihood functions, approximate Bayesian computation offers a way of approximating the true posterior through repeated comparisons of observations with simulated model outputs in terms of a small set of summary statistics. These statistics need to retain the information that is relevant for constraining the parameters but cancel out the noise. They can thus be seen as thermodynamic state variables, for general stochastic models. For many scientific applications, we need strictly more summary statistics than model parameters to reach a satisfactory approximation of the posterior. Therefore, we propose to use the inner dimension of deep neural network based Autoencoders as summary statistics. To create an incentive for the encoder to encode all the parameter-related information but not the noise, we give the decoder access to explicit or implicit information on the noise that has been used to generate the training data. We validate the approach empirically on two types of stochastic models.
△ Less
Submitted 23 May, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Deep Reinforcement Learning for Random Access in Machine-Type Communication
Authors:
Muhammad Awais Jadoon,
Adriano Pastore,
Monica Navarro,
Fernando Perez-Cruz
Abstract:
Random access (RA) schemes are a topic of high interest in machine-type communication (MTC). In RA protocols, backoff techniques such as exponential backoff (EB) are used to stabilize the system to avoid low throughput and excessive delays. However, these backoff techniques show varying performance for different underlying assumptions and analytical models. Therefore, finding a better transmission…
▽ More
Random access (RA) schemes are a topic of high interest in machine-type communication (MTC). In RA protocols, backoff techniques such as exponential backoff (EB) are used to stabilize the system to avoid low throughput and excessive delays. However, these backoff techniques show varying performance for different underlying assumptions and analytical models. Therefore, finding a better transmission policy for slotted ALOHA RA is still a challenge. In this paper, we show the potential of deep reinforcement learning (DRL) for RA. We learn a transmission policy that balances between throughput and fairness. The proposed algorithm learns transmission probabilities using previous action and binary feedback signal, and it is adaptive to different traffic arrival rates. Moreover, we propose average age of packet (AoP) as a metric to measure fairness among users. Our results show that the proposed policy outperforms the baseline EB transmission schemes in terms of throughput and fairness.
△ Less
Submitted 24 January, 2022;
originally announced January 2022.
-
FedComm: Federated Learning as a Medium for Covert Communication
Authors:
Dorjan Hitaj,
Giulio Pagnotta,
Briland Hitaj,
Fernando Perez-Cruz,
Luigi V. Mancini
Abstract:
Proposed as a solution to mitigate the privacy implications related to the adoption of deep learning, Federated Learning (FL) enables large numbers of participants to successfully train deep neural networks without having to reveal the actual private training data. To date, a substantial amount of research has investigated the security and privacy properties of FL, resulting in a plethora of innov…
▽ More
Proposed as a solution to mitigate the privacy implications related to the adoption of deep learning, Federated Learning (FL) enables large numbers of participants to successfully train deep neural networks without having to reveal the actual private training data. To date, a substantial amount of research has investigated the security and privacy properties of FL, resulting in a plethora of innovative attack and defense strategies. This paper thoroughly investigates the communication capabilities of an FL scheme. In particular, we show that a party involved in the FL learning process can use FL as a covert communication medium to send an arbitrary message. We introduce FedComm, a novel multi-system covert-communication technique that enables robust sharing and transfer of targeted payloads within the FL framework. Our extensive theoretical and empirical evaluations show that FedComm provides a stealthy communication channel, with minimal disruptions to the training process. Our experiments show that FedComm successfully delivers 100% of a payload in the order of kilobits before the FL procedure converges. Our evaluation also shows that FedComm is independent of the application domain and the neural network architecture used by the underlying FL scheme.
△ Less
Submitted 17 May, 2023; v1 submitted 21 January, 2022;
originally announced January 2022.
-
Probabilistic modeling of lake surface water temperature using a Bayesian spatio-temporal graph convolutional neural network
Authors:
Michael Stalder,
Firat Ozdemir,
Artur Safin,
Jonas Sukys,
Damien Bouffard,
Fernando Perez-Cruz
Abstract:
Accurate lake temperature estimation is essential for numerous problems tackled in both hydrological and ecological domains. Nowadays physical models are developed to estimate lake dynamics; however, computations needed for accurate estimation of lake surface temperature can get prohibitively expensive. We propose to aggregate simulations of lake temperature at a certain depth together with a rang…
▽ More
Accurate lake temperature estimation is essential for numerous problems tackled in both hydrological and ecological domains. Nowadays physical models are developed to estimate lake dynamics; however, computations needed for accurate estimation of lake surface temperature can get prohibitively expensive. We propose to aggregate simulations of lake temperature at a certain depth together with a range of meteorological features to probabilistically estimate lake surface temperature. Accordingly, we introduce a spatio-temporal neural network that combines Bayesian recurrent neural networks and Bayesian graph convolutional neural networks. This work demonstrates that the proposed graphical model can deliver homogeneously good performance covering the whole lake surface despite having sparse training data available. Quantitative results are compared with a state-of-the-art Bayesian deep learning method. Code for the developed architectural layers, as well as demo scripts, are available on https://renkulab.io/projects/das/bstnn.
△ Less
Submitted 27 September, 2021;
originally announced September 2021.
-
A data acquisition setup for data driven acoustic design
Authors:
Romana Rust,
Achilleas Xydis,
Kurt Heutschi,
Nathanaël Perraudin,
Gonzalo Casas,
Chaoyu Du,
Jürgen Strauss,
Kurt Eggenschwiler,
Fernando Perez-Cruz,
Fabio Gramazio,
Matthias Kohler
Abstract:
In this paper, we present a novel interdisciplinary approach to study the relationship between diffusive surface structures and their acoustic performance. Using computational design, surface structures are iteratively generated and 3D printed at 1:10 model scale. They originate from different fabrication typologies and are designed to have acoustic diffusion and absorption effects. An automated r…
▽ More
In this paper, we present a novel interdisciplinary approach to study the relationship between diffusive surface structures and their acoustic performance. Using computational design, surface structures are iteratively generated and 3D printed at 1:10 model scale. They originate from different fabrication typologies and are designed to have acoustic diffusion and absorption effects. An automated robotic process measures the impulse responses of these surfaces by positioning a microphone and a speaker at multiple locations. The collected data serves two purposes: first, as an exploratory catalogue of different spatio-temporal-acoustic scenarios and second, as data set for predicting the acoustic response of digitally designed surface geometries using machine learning. In this paper, we present the automated data acquisition setup, the data processing and the computational generation of diffusive surface structures. We describe first results of comparative studies of measured surface panels and conclude with steps of future research.
△ Less
Submitted 24 September, 2021;
originally announced September 2021.
-
Regularizing Transformers With Deep Probabilistic Layers
Authors:
Aurora Cobo Aguilera,
Pablo Martínez Olmos,
Antonio Artés-Rodríguez,
Fernando Pérez-Cruz
Abstract:
Language models (LM) have grown with non-stop in the last decade, from sequence-to-sequence architectures to the state-of-the-art and utter attention-based Transformers. In this work, we demonstrate how the inclusion of deep generative models within BERT can bring more versatile models, able to impute missing/noisy words with richer text or even improve BLEU score. More precisely, we use a Gaussia…
▽ More
Language models (LM) have grown with non-stop in the last decade, from sequence-to-sequence architectures to the state-of-the-art and utter attention-based Transformers. In this work, we demonstrate how the inclusion of deep generative models within BERT can bring more versatile models, able to impute missing/noisy words with richer text or even improve BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and prove its effectiveness not only in Transformers but also in the most relevant encoder-decoder based LM, seq2seq with and without attention.
△ Less
Submitted 23 August, 2021;
originally announced August 2021.
-
A Novel Approach for Classification and Forecasting of Time Series in Particle Accelerators
Authors:
Sichen Li,
Mélissa Zacharias,
Jochem Snuverink,
Jaime Coello de Portugal,
Fernando Perez-Cruz,
Davide Reggiani,
Andreas Adelmann
Abstract:
The beam interruptions (interlocks) of particle accelerators, despite being necessary safety measures, lead to abrupt operational changes and a substantial loss of beam time. A novel time series classification approach is applied to decrease beam time loss in the High Intensity Proton Accelerator complex by forecasting interlock events. The forecasting is performed through binary classification of…
▽ More
The beam interruptions (interlocks) of particle accelerators, despite being necessary safety measures, lead to abrupt operational changes and a substantial loss of beam time. A novel time series classification approach is applied to decrease beam time loss in the High Intensity Proton Accelerator complex by forecasting interlock events. The forecasting is performed through binary classification of windows of multivariate time series. The time series are transformed into Recurrence Plots which are then classified by a Convolutional Neural Network, which not only captures the inner structure of the time series but also utilizes the advances of image classification techniques. Our best performing interlock-to-stable classifier reaches an Area under the ROC Curve value of $0.71 \pm 0.01$ compared to $0.65 \pm 0.01$ of a Random Forest model, and it can potentially reduce the beam time loss by $0.5 \pm 0.2$ seconds per interlock.
△ Less
Submitted 1 February, 2021;
originally announced February 2021.
-
Robust Sampling in Deep Learning
Authors:
Aurora Cobo Aguilera,
Antonio Artés-Rodríguez,
Fernando Pérez-Cruz,
Pablo Martínez Olmos
Abstract:
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem by a new regularization method based on distributional robust optimization. The key idea is to modify the contribution from each sample for tightening the empirical risk bound. During the stochastic training, the selection of samples is done according to their accuracy in such…
▽ More
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem by a new regularization method based on distributional robust optimization. The key idea is to modify the contribution from each sample for tightening the empirical risk bound. During the stochastic training, the selection of samples is done according to their accuracy in such a way that the worst performed samples are the ones that contribute the most in the optimization. We study different scenarios and show the ones where it can make the convergence faster or increase the accuracy.
△ Less
Submitted 5 June, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Improved BiGAN training with marginal likelihood equalization
Authors:
Pablo Sánchez-Martín,
Pablo M. Olmos,
Fernando Perez-Cruz
Abstract:
We propose a novel training procedure for improving the performance of generative adversarial networks (GANs), especially to bidirectional GANs. First, we enforce that the empirical distribution of the inverse inference network matches the prior distribution, which favors the generator network reproducibility on the seen samples. Second, we have found that the marginal log-likelihood of the sample…
▽ More
We propose a novel training procedure for improving the performance of generative adversarial networks (GANs), especially to bidirectional GANs. First, we enforce that the empirical distribution of the inverse inference network matches the prior distribution, which favors the generator network reproducibility on the seen samples. Second, we have found that the marginal log-likelihood of the samples shows a severe overrepresentation of a certain type of samples. To address this issue, we propose to train the bidirectional GAN using a non-uniform sampling for the mini-batch selection, resulting in improved quality and variety in generated samples measured quantitatively and by visual inspection. We illustrate our new procedure with the well-known CIFAR10, Fashion MNIST and CelebA datasets.
△ Less
Submitted 23 May, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Probabilistic Time of Arrival Localization
Authors:
Fernando Perez-Cruz,
Pablo M. Olmos,
Michael Minyi Zhang,
Howard Huang
Abstract:
In this paper, we take a new approach for time of arrival geo-localization. We show that the main sources of error in metropolitan areas are due to environmental imperfections that bias our solutions, and that we can rely on a probabilistic model to learn and compensate for them. The resulting localization error is validated using measurements from a live LTE cellular network to be less than 10 me…
▽ More
In this paper, we take a new approach for time of arrival geo-localization. We show that the main sources of error in metropolitan areas are due to environmental imperfections that bias our solutions, and that we can rely on a probabilistic model to learn and compensate for them. The resulting localization error is validated using measurements from a live LTE cellular network to be less than 10 meters, representing an order-of-magnitude improvement.
△ Less
Submitted 15 October, 2019;
originally announced October 2019.
-
Probabilistic MIMO Symbol Detection with Expectation Consistency Approximate Inference
Authors:
Javier Cépedes,
Pablo M. Olmos,
Matilde Sánchez-Fernández,
Fernando Pérez-Cruz
Abstract:
In this paper we explore low-complexity probabilistic algorithms for soft symbol detection in high-dimensional multiple-input multiple-output (MIMO) systems. We present a novel algorithm based on the Expectation Consistency (EC) framework, which describes the approximate inference problem as an optimization over a non-convex function. EC generalizes algorithms such as Belief Propagation and Expect…
▽ More
In this paper we explore low-complexity probabilistic algorithms for soft symbol detection in high-dimensional multiple-input multiple-output (MIMO) systems. We present a novel algorithm based on the Expectation Consistency (EC) framework, which describes the approximate inference problem as an optimization over a non-convex function. EC generalizes algorithms such as Belief Propagation and Expectation Propagation. For the MIMO symbol detection problem, we discuss feasible methods to find stationary points of the EC function and explore their tradeoffs between accuracy and speed of convergence. The accuracy is studied, first in terms of input-output mutual information and show that the proposed EC MIMO detector greatly improves state-of-the-art methods, with a complexity order cubic in the number of transmitting antennas. Second, these gains are corroborated by combining the probabilistic output of the EC detector with a low-density parity-check (LDPC) channel code.
△ Less
Submitted 2 October, 2019;
originally announced October 2019.
-
Out-of-Sample Testing for GANs
Authors:
Pablo Sánchez-Martín,
Pablo M. Olmos,
Fernando Pérez-Cruz
Abstract:
We propose a new method to evaluate GANs, namely EvalGAN. EvalGAN relies on a test set to directly measure the reconstruction quality in the original sample space (no auxiliary networks are necessary), and it also computes the (log)likelihood for the reconstructed samples in the test set. Further, EvalGAN is agnostic to the GAN algorithm and the dataset. We decided to test it on three state-of-the…
▽ More
We propose a new method to evaluate GANs, namely EvalGAN. EvalGAN relies on a test set to directly measure the reconstruction quality in the original sample space (no auxiliary networks are necessary), and it also computes the (log)likelihood for the reconstructed samples in the test set. Further, EvalGAN is agnostic to the GAN algorithm and the dataset. We decided to test it on three state-of-the-art GANs over the well-known CIFAR-10 and CelebA datasets.
△ Less
Submitted 28 January, 2019;
originally announced January 2019.
-
Infinite Factorial Finite State Machine for Blind Multiuser Channel Estimation
Authors:
Francisco J. R. Ruiz,
Isabel Valera,
Lennart Svensson,
Fernando Perez-Cruz
Abstract:
New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint c…
▽ More
New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint channel parameter and data estimation in a multiuser communication channel in which the number of transmitters is not known. For that purpose, we develop the infinite factorial finite state machine model, a Bayesian nonparametric model based on the Markov Indian buffet that allows for an unbounded number of transmitters with arbitrary channel length. We propose an inference algorithm that makes use of slice sampling and particle Gibbs with ancestor sampling. Our approach is fully blind as it does not require a prior channel estimation step, prior knowledge of the number of transmitters, or any signaling information. Our experimental results, loosely based on the LTE random access channel, show that the proposed approach can effectively recover the data-generating process for a wide range of scenarios, with varying number of transmitters, number of receivers, constellation order, channel length, and signal-to-noise ratio.
△ Less
Submitted 18 October, 2018;
originally announced October 2018.
-
Sparse Three-parameter Restricted Indian Buffet Process for Understanding International Trade
Authors:
Melanie F. Pradier,
Viktor Stojkoski,
Zoran Utkovski,
Ljupco Kocarev,
Fernando Perez-Cruz
Abstract:
This paper presents a Bayesian nonparametric latent feature model specially suitable for exploratory analysis of high-dimensional count data. We perform a non-negative doubly sparse matrix factorization that has two main advantages: not only we are able to better approximate the row input distributions, but the inferred topics are also easier to interpret. By combining the three-parameter and rest…
▽ More
This paper presents a Bayesian nonparametric latent feature model specially suitable for exploratory analysis of high-dimensional count data. We perform a non-negative doubly sparse matrix factorization that has two main advantages: not only we are able to better approximate the row input distributions, but the inferred topics are also easier to interpret. By combining the three-parameter and restricted Indian buffet processes into a single prior, we increase the model flexibility, allowing for a full spectrum of sparse solutions in the latent space. We demonstrate the usefulness of our approach in the analysis of countries' economic structure. Compared to other approaches, empirical results show our model's ability to give easy-to-interpret information and better capture the underlying sparsity structure of data.
△ Less
Submitted 29 June, 2018;
originally announced June 2018.
-
PassGAN: A Deep Learning Approach for Password Guessing
Authors:
Briland Hitaj,
Paolo Gasti,
Giuseppe Ateniese,
Fernando Perez-Cruz
Abstract:
State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., "password123456") and leet speak (e.g., "password" becomes "p4s…
▽ More
State-of-the-art password guessing tools, such as HashCat and John the Ripper, enable users to check billions of passwords per second against password hashes. In addition to performing straightforward dictionary attacks, these tools can expand password dictionaries using password generation rules, such as concatenation of words (e.g., "password123456") and leet speak (e.g., "password" becomes "p4s5w0rd"). Although these rules work well in practice, expanding them to model further passwords is a laborious task that requires specialized expertise. To address this issue, in this paper we introduce PassGAN, a novel approach that replaces human-generated password rules with theory-grounded machine learning algorithms. Instead of relying on manual password analysis, PassGAN uses a Generative Adversarial Network (GAN) to autonomously learn the distribution of real passwords from actual password leaks, and to generate high-quality password guesses. Our experiments show that this approach is very promising. When we evaluated PassGAN on two large password datasets, we were able to surpass rule-based and state-of-the-art machine learning password guessing tools. However, in contrast with the other tools, PassGAN achieved this result without any a-priori knowledge on passwords or common password structures. Additionally, when we combined the output of PassGAN with the output of HashCat, we were able to match 51%-73% more passwords than with HashCat alone. This is remarkable, because it shows that PassGAN can autonomously extract a considerable number of password properties that current state-of-the art rules do not encode.
△ Less
Submitted 14 February, 2019; v1 submitted 1 September, 2017;
originally announced September 2017.
-
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Authors:
Briland Hitaj,
Giuseppe Ateniese,
Fernando Perez-Cruz
Abstract:
Deep Learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a…
▽ More
Deep Learning has recently become hugely popular in machine learning, providing significant improvements in classification accuracy in the presence of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a collection of users' private data, including habits, personal pictures, geographical positions, interests, and more, the centralized server will have access to sensitive information that could potentially be mishandled. To tackle this problem, collaborative deep learning models have recently been proposed where parties locally train their deep learning structures and only share a subset of the parameters in the attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy (DP) to make information extraction even more challenging, as proposed by Shokri and Shmatikov at CCS'15.
Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process that allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data). Interestingly, we show that record-level DP applied to the shared parameters of the model, as suggested in previous work, is ineffective (i.e., record-level DP is not designed to address our attack).
△ Less
Submitted 14 September, 2017; v1 submitted 24 February, 2017;
originally announced February 2017.
-
Complex-Valued Kernel Methods for Regression
Authors:
Rafael Boloix-Tortosa,
Juan José Murillo-Fuentes,
Irene Santos Velázquez,
Fernando Pérez-Cruz
Abstract:
Usually, complex-valued RKHS are presented as an straightforward application of the real-valued case. In this paper we prove that this procedure yields a limited solution for regression. We show that another kernel, here denoted as pseudo kernel, is needed to learn any function in complex-valued fields. Accordingly, we derive a novel RKHS to include it, the widely RKHS (WRKHS). When the pseudo-ker…
▽ More
Usually, complex-valued RKHS are presented as an straightforward application of the real-valued case. In this paper we prove that this procedure yields a limited solution for regression. We show that another kernel, here denoted as pseudo kernel, is needed to learn any function in complex-valued fields. Accordingly, we derive a novel RKHS to include it, the widely RKHS (WRKHS). When the pseudo-kernel cancels, WRKHS reduces to complex-valued RKHS of previous approaches. We address the kernel and pseudo-kernel design, paying attention to the kernel and the pseudo-kernel being complex-valued. In the experiments included we report remarkable improvements in simple scenarios where real a imaginary parts have different similitude relations for given inputs or cases where real and imaginary parts are correlated. In the context of these novel results we revisit the problem of non-linear channel equalization, to show that the WRKHS helps to design more efficient solutions.
△ Less
Submitted 31 October, 2016;
originally announced October 2016.
-
Deep Learning for Multi-label Classification
Authors:
Jesse Read,
Fernando Perez-Cruz
Abstract:
In multi-label classification, the main focus has been to develop ways of learning the underlying dependencies between labels, and to take advantage of this at classification time. Developing better feature-space representations has been predominantly employed to reduce complexity, e.g., by eliminating non-helpful feature attributes from the input space prior to (or during) training. This is an im…
▽ More
In multi-label classification, the main focus has been to develop ways of learning the underlying dependencies between labels, and to take advantage of this at classification time. Developing better feature-space representations has been predominantly employed to reduce complexity, e.g., by eliminating non-helpful feature attributes from the input space prior to (or during) training. This is an important task, since many multi-label methods typically create many different copies or views of the same input data as they transform it, and considerable memory can be saved by taking advantage of redundancy. In this paper, we show that a proper development of the feature space can make labels less interdependent and easier to model and predict at inference time. For this task we use a deep learning approach with restricted Boltzmann machines. We present a deep network that, in an empirical evaluation, outperforms a number of competitive methods from the literature
△ Less
Submitted 17 December, 2014;
originally announced February 2015.
-
Bayesian nonparametric comorbidity analysis of psychiatric disorders
Authors:
Francisco J. R. Ruiz,
Isabel Valera,
Carlos Blanco,
Fernando Perez-Cruz
Abstract:
The analysis of comorbidity is an open and complex research field in the branch of psychiatry, where clinical experience and several studies suggest that the relation among the psychiatric disorders may have etiological and treatment implications. In this paper, we are interested in applying latent feature modeling to find the latent structure behind the psychiatric disorders that can help to exam…
▽ More
The analysis of comorbidity is an open and complex research field in the branch of psychiatry, where clinical experience and several studies suggest that the relation among the psychiatric disorders may have etiological and treatment implications. In this paper, we are interested in applying latent feature modeling to find the latent structure behind the psychiatric disorders that can help to examine and explain the relationships among them. To this end, we use the large amount of information collected in the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database and propose to model these data using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the discrete nature of the data, we first need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. We also provide a variational inference algorithm for this model, which provides a complementary (and less expensive in terms of computational complexity) alternative to the Gibbs sampler allowing us to deal with a larger number of data. Finally, we use the model to analyze comorbidity among the psychiatric disorders diagnosed by experts from the NESARC database.
△ Less
Submitted 29 January, 2014;
originally announced January 2014.
-
Gaussian Processes for Nonlinear Signal Processing
Authors:
Fernando Pérez-Cruz,
Steven Van Vaerenbergh,
Juan José Murillo-Fuentes,
Miguel Lázaro-Gredilla,
Ignacio Santamaria
Abstract:
Gaussian processes (GPs) are versatile tools that have been successfully employed to solve nonlinear estimation problems in machine learning, but that are rarely used in signal processing. In this tutorial, we present GPs for regression as a natural nonlinear extension to optimal Wiener filtering. After establishing their basic formulation, we discuss several important aspects and extensions, incl…
▽ More
Gaussian processes (GPs) are versatile tools that have been successfully employed to solve nonlinear estimation problems in machine learning, but that are rarely used in signal processing. In this tutorial, we present GPs for regression as a natural nonlinear extension to optimal Wiener filtering. After establishing their basic formulation, we discuss several important aspects and extensions, including recursive and adaptive algorithms for dealing with non-stationarity, low-complexity solutions, non-Gaussian noise models and classification scenarios. Furthermore, we provide a selection of relevant applications to wireless digital communications.
△ Less
Submitted 27 September, 2013; v1 submitted 12 March, 2013;
originally announced March 2013.
-
Tree-Structure Expectation Propagation for LDPC Decoding over the BEC
Authors:
Pablo M. Olmos,
Juan José Murillo-Fuentes,
Fernando Pérez-Cruz
Abstract:
We present the tree-structure expectation propagation (Tree-EP) algorithm to decode low-density parity-check (LDPC) codes over discrete memoryless channels (DMCs). EP generalizes belief propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this se…
▽ More
We present the tree-structure expectation propagation (Tree-EP) algorithm to decode low-density parity-check (LDPC) codes over discrete memoryless channels (DMCs). EP generalizes belief propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints over pairs of variables connected to a check node of the LDPC code's Tanner graph. Thanks to these additional constraints, the Tree-EP marginal estimates for each variable in the graph are more accurate than those provided by BP. We also reformulate the Tree-EP algorithm for the binary erasure channel (BEC) as a peeling-type algorithm (TEP) and we show that the algorithm has the same computational complexity as BP and it decodes a higher fraction of errors. We describe the TEP decoding process by a set of differential equations that represents the expected residual graph evolution as a function of the code parameters. The solution of these equations is used to predict the TEP decoder performance in both the asymptotic regime and the finite-length regime over the BEC. While the asymptotic threshold of the TEP decoder is the same as the BP decoder for regular and optimized codes, we propose a scaling law (SL) for finite-length LDPC codes, which accurately approximates the TEP improved performance and facilitates its optimization.
△ Less
Submitted 13 August, 2012; v1 submitted 3 January, 2012;
originally announced January 2012.
-
Tree-Structure Expectation Propagation for LDPC Decoding in Erasure Channels
Authors:
Pablo M. Olmos,
Juan José Murillo-Fuentes,
Fernando Pérez-Cruz
Abstract:
In this paper we present a new algorithm, denoted as TEP, to decode low-density parity-check (LDPC) codes over the Binary Erasure Channel (BEC). The TEP decoder is derived applying the expectation propagation (EP) algorithm with a tree- structured approximation. Expectation Propagation (EP) is a generalization to Belief Propagation (BP) in two ways. First, it can be used with any exponential famil…
▽ More
In this paper we present a new algorithm, denoted as TEP, to decode low-density parity-check (LDPC) codes over the Binary Erasure Channel (BEC). The TEP decoder is derived applying the expectation propagation (EP) algorithm with a tree- structured approximation. Expectation Propagation (EP) is a generalization to Belief Propagation (BP) in two ways. First, it can be used with any exponential family distribution over the cliques in the graph. Second, it can impose additional constraints on the marginal distributions. We use this second property to impose pair-wise marginal constraints in some check nodes of the LDPC code's Tanner graph. The algorithm has the same computational complexity than BP, but it can decode a higher fraction of errors when applied over the BEC. In this paper, we focus on the asymptotic performance of the TEP decoder, as the block size tends to infinity. We describe the TEP decoder by a set of differential equations that represents the residual graph evolution during the decoding process. The solution of these equations yields the capacity of this decoder for a given LDPC ensemble over the BEC. We show that the achieved capacity with the TEP is higher than the BP capacity, at the same computational complexity.
△ Less
Submitted 4 January, 2012; v1 submitted 22 September, 2010;
originally announced September 2010.
-
Channel Decoding with a Bayesian Equalizer
Authors:
Luis Salamanca,
Juan José Murillo-Fuentes,
Fernando Pérez-Cruz
Abstract:
Low-density parity-check (LPDC) decoders assume the channel estate information (CSI) is known and they have the true a posteriori probability (APP) for each transmitted bit. But in most cases of interest, the CSI needs to be estimated with the help of a short training sequence and the LDPC decoder has to decode the received word using faulty APP estimates. In this paper, we study the uncertainty i…
▽ More
Low-density parity-check (LPDC) decoders assume the channel estate information (CSI) is known and they have the true a posteriori probability (APP) for each transmitted bit. But in most cases of interest, the CSI needs to be estimated with the help of a short training sequence and the LDPC decoder has to decode the received word using faulty APP estimates. In this paper, we study the uncertainty in the CSI estimate and how it affects the bit error rate (BER) output by the LDPC decoder. To improve these APP estimates, we propose a Bayesian equalizer that takes into consideration not only the uncertainty due to the noise in the channel, but also the uncertainty in the CSI estimate, reducing the BER after the LDPC decoder.
△ Less
Submitted 4 June, 2010;
originally announced June 2010.