-
PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards
Authors:
Elias Arbash,
Margret Fuchs,
Behnood Rasti,
Sandra Lorenz,
Pedram Ghamisi,
Richard Gloaguen
Abstract:
Addressing the critical theme of recycling electronic waste (E-waste), this contribution is dedicated to developing advanced automated data processing pipelines as a basis for decision-making and process control. Aligning with the broader goals of the circular economy and the United Nations (UN) Sustainable Development Goals (SDG), our work leverages non-invasive analysis methods utilizing RGB and hyperspectral imaging data to provide both quantitative and qualitative insights into the E-waste stream composition for optimizing recycling efficiency. In this paper, we introduce 'PCB-Vision', a pioneering RGB-hyperspectral printed circuit board (PCB) benchmark dataset comprising 53 RGB images of high spatial resolution paired with their corresponding high spectral resolution hyperspectral data cubes in the visible and near-infrared (VNIR) range. Grounded in open science principles, our dataset provides a comprehensive resource for researchers through high-quality ground truths, focusing on three primary PCB components: integrated circuits (IC), capacitors, and connectors. We provide extensive statistical investigations on the proposed dataset together with the performance of several state-of-the-art (SOTA) models, including U-Net, Attention U-Net, Residual U-Net, LinkNet, and DeepLabv3+. By openly sharing this multi-scene benchmark dataset along with the baseline codes, we hope to foster transparent, traceable, and comparable developments of advanced data processing across various scientific communities, including, but not limited to, computer vision and remote sensing. Emphasizing our commitment to supporting a collaborative and inclusive scientific community, all materials, including code, data, ground truth, and masks, will be accessible at https://github.com/hifexplo/PCBVision.
Submitted 12 January, 2024;
originally announced January 2024.
-
A Multisensor Hyperspectral Benchmark Dataset For Unmixing of Intimate Mixtures
Authors:
Bikram Koirala,
Behnood Rasti,
Zakaria Bnoulkacem,
Andrea de Lima Ribeiro,
Yuleika Madriz,
Erik Herrmann,
Arthur Gestels,
Thomas De Kerf,
Sandra Lorenz,
Margret Fuchs,
Koen Janssens,
Gunther Steenackers,
Richard Gloaguen,
Paul Scheunders
Abstract:
Optical hyperspectral cameras capture the spectral reflectance of materials. Since many materials behave as heterogeneous intimate mixtures with which each photon interacts differently, the relationship between spectral reflectance and material composition is very complex. Quantitative validation of spectral unmixing algorithms requires high-quality ground truth fractional abundance data, which are very difficult to obtain. In this work, we generated a comprehensive laboratory ground truth dataset of intimately mixed mineral powders. For this, five clay powders (Kaolin, Roof clay, Red clay, mixed clay, and Calcium hydroxide) were mixed homogeneously to prepare 325 samples of 60 binary, 150 ternary, 100 quaternary, and 15 quinary mixtures. Thirteen different hyperspectral sensors have been used to acquire the reflectance spectra of these mixtures in the visible, near, short, mid, and long-wavelength infrared regions (350-15385 nm). Overlaps in wavelength regions due to the operational ranges of each sensor and variations in acquisition conditions resulted in a large amount of spectral variability. Ground truth composition is given by construction, but to verify that the generated samples are sufficiently homogeneous, XRD and XRF elemental analyses were performed. We believe these data will be beneficial for validating advanced methods for nonlinear unmixing and material composition estimation, including studying spectral variability and training supervised unmixing approaches. The datasets can be downloaded from the following link: https://github.com/VisionlabUA/Multisensor_datasets.
Submitted 30 August, 2023;
originally announced September 2023.
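As a point of reference for the unmixing entry above, the following is a minimal sketch of the linear mixing model, the usual baseline that nonlinear unmixing methods are compared against. It is not the authors' method: the endmember spectra are random synthetic values rather than spectra from the dataset, the names `E`, `true_abundances`, and `delta` are illustrative assumptions, and only the sum-to-one constraint is imposed (softly, through a weighted extra row), with no non-negativity constraint. On real intimate mixtures this linear baseline is expected to be biased, which is the motivation for ground truth data of this kind.

```python
import numpy as np

# Synthetic stand-in for endmember spectra (bands x endmembers).
# These values are random and hypothetical, not taken from the dataset.
rng = np.random.default_rng(0)
n_bands, n_endmembers = 6, 3
E = rng.uniform(0.1, 0.9, size=(n_bands, n_endmembers))

# Known fractional abundances that sum to one (the "ground truth").
true_abundances = np.array([0.2, 0.3, 0.5])

# Linear mixing model: the observed spectrum is a weighted sum of endmembers.
mixed = E @ true_abundances

# Soft sum-to-one constraint: append a row of ones scaled by delta,
# with the matching target value delta on the right-hand side.
delta = 10.0
E_aug = np.vstack([E, delta * np.ones((1, n_endmembers))])
x_aug = np.append(mixed, delta)

# Ordinary least squares on the augmented system.
estimated, *_ = np.linalg.lstsq(E_aug, x_aug, rcond=None)
print(estimated)
```

Because the synthetic spectrum is noiseless and exactly linear, the estimate matches the known abundances; with measured spectra of intimate mixtures the residual between the two is what a validation dataset lets one quantify.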
-
Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences
Authors:
Ahmed J. Afifi,
Samuel T. Thiele,
Aldino Rizaldy,
Sandra Lorenz,
Pedram Ghamisi,
Raimon Tolosana-Delgado,
Moritz Kirsch,
Richard Gloaguen,
Michael Heizmann
Abstract:
The increasing use of deep learning techniques has reduced interpretation time and, ideally, reduced interpreter bias by automatically deriving geological maps from digital outcrop models. However, accurate validation of these automated mapping approaches is a significant challenge due to the subjective nature of geological mapping and the difficulty in collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2D image data, which is insufficient for 3D digital outcrops, such as hyperclouds. To address these challenges, we present Tinto, a multi-sensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for non-structured 3D data like point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data, and 2) a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground-truth. The point cloud is dense and contains 3,242,964 labeled points. We used these datasets to explore the abilities of different deep learning approaches for automated geological mapping. By making Tinto publicly available, we hope to foster the development and adaptation of new deep learning tools for 3D applications in Earth sciences. The dataset can be accessed through this link: https://doi.org/10.14278/rodare.2256.
Submitted 20 October, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Explainable and High-Performance Hate and Offensive Speech Detection
Authors:
Marzieh Babaeianjelodar,
Gurram Poorna Prudhvi,
Stephen Lorenz,
Keyu Chen,
Sumona Mondal,
Soumyabrata Dey,
Navin Kumar
Abstract:
The spread of information through social media platforms can create environments possibly hostile to vulnerable communities and silence certain groups in society. To mitigate such instances, several models have been developed to detect hate and offensive speech. Since detecting hate and offensive speech in social media platforms could incorrectly exclude individuals from social media platforms, which can reduce trust, there is a need to create explainable and interpretable models. Thus, we build an explainable and interpretable high-performance model based on the XGBoost algorithm, trained on Twitter data. For unbalanced Twitter data, XGBoost outperformed the LSTM, AutoGluon, and ULMFiT models on hate speech detection with an F1 score of 0.75 compared to 0.38, 0.37, and 0.38, respectively. When we down-sampled the data to three separate classes of approximately 5000 tweets, XGBoost performed better than LSTM, AutoGluon, and ULMFiT, with F1 scores for hate speech detection of 0.79 vs 0.69, 0.77, and 0.66, respectively. XGBoost also performed better than LSTM, AutoGluon, and ULMFiT in the down-sampled version for offensive speech detection with an F1 score of 0.83 vs 0.88, 0.82, and 0.79, respectively. We use Shapley Additive Explanations (SHAP) on our XGBoost models' outputs to make them explainable and interpretable compared to LSTM, AutoGluon, and ULMFiT, which are black-box models.
Submitted 24 September, 2023; v1 submitted 26 June, 2022;
originally announced June 2022.
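The entry above compares models by F1 score on unbalanced data. As a minimal sketch of why F1 is more informative than accuracy in that setting, the example below computes the F1 score for one class from scratch. The labels are made-up toy values, not data or results from the paper, and the helper name `f1_score` is an illustrative assumption.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for a single class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, hypothetical labels: 2 "hate" items among 10.
y_true = ["hate"] * 2 + ["neither"] * 8
# A classifier that always predicts the majority class.
y_pred_majority = ["neither"] * 10
# A classifier that finds one "hate" item and raises one false alarm.
y_pred_model = ["hate", "neither", "hate"] + ["neither"] * 7

for name, pred in [("majority", y_pred_majority), ("model", y_pred_model)]:
    print(name, accuracy(y_true, pred), f1_score(y_true, pred, "hate"))
```

Both toy classifiers reach the same accuracy, yet only one of them ever detects the minority class, and the F1 score is what separates them.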
-
Uncertainty-aware Evaluation of Time-Series Classification for Online Handwriting Recognition with Domain Shift
Authors:
Andreas Klaß,
Sven M. Lorenz,
Martin W. Lauer-Schmaltz,
David Rügamer,
Bernd Bischl,
Christopher Mutschler,
Felix Ott
Abstract:
For many applications, analyzing the uncertainty of a machine learning model is indispensable. While research on uncertainty quantification (UQ) techniques is very advanced for computer vision applications, UQ methods for spatio-temporal data are less studied. In this paper, we focus on models for online handwriting recognition, one particular type of spatio-temporal data. The data is observed from a sensor-enhanced pen with the goal of classifying written characters. We conduct a broad evaluation of aleatoric (data) and epistemic (model) UQ based on two prominent techniques for Bayesian inference, Stochastic Weight Averaging-Gaussian (SWAG) and Deep Ensembles. Beyond providing a better understanding of the model, UQ techniques can detect out-of-distribution data and domain shifts when combining right-handed and left-handed writers (an underrepresented group).
Submitted 17 June, 2022;
originally announced June 2022.
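The entry above distinguishes aleatoric (data) from epistemic (model) uncertainty for Deep Ensembles. One common way to separate the two, shown here as a sketch under the assumption that each ensemble member outputs class probabilities, is the entropy-based decomposition: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the members, and epistemic uncertainty is their difference. Whether the paper uses exactly this decomposition is not stated in the abstract; the function names and the toy probabilities are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty into (total, aleatoric, epistemic).

    member_probs: one class-probability list per ensemble member.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)                                   # entropy of the mean
    aleatoric = sum(entropy(m) for m in member_probs) / n_members  # mean of entropies
    epistemic = total - aleatoric                                  # member disagreement
    return total, aleatoric, epistemic


# Members agree that the input is ambiguous: uncertainty is purely aleatoric.
agree = [[0.5, 0.5], [0.5, 0.5]]
# Members are individually confident but contradict each other: mostly epistemic.
disagree = [[0.99, 0.01], [0.01, 0.99]]

print(decompose_uncertainty(agree))
print(decompose_uncertainty(disagree))
```

A high epistemic term on inputs from a new writer group is the kind of signal that can flag a domain shift, since it reflects disagreement among models rather than noise in the data.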
-
sympy2c: from symbolic expressions to fast C/C++ functions and ODE solvers in Python
Authors:
Uwe Schmitt,
Beatrice Moser,
Christiane S. Lorenz,
Alexandre Refregier
Abstract:
Computer algebra systems play an important role in science as they facilitate the development of new theoretical models. The resulting symbolic equations are often implemented in a compiled programming language in order to provide fast and portable codes for practical applications. We describe sympy2c, a new Python package designed to bridge the gap between the symbolic development and the numerical implementation of a theoretical model. sympy2c translates symbolic equations implemented in the SymPy Python package to C/C++ code that is optimized using symbolic transformations. The resulting functions can be conveniently used as an extension module in Python. sympy2c is used within the PyCosmo Python package to solve the Einstein-Boltzmann equations, a large system of ODEs describing the evolution of linear perturbations in the Universe. After reviewing the functionalities and usage of sympy2c, we describe its implementation and optimization strategies. This includes, in particular, a novel approach to generate optimized ODE solvers making use of the sparsity of the symbolic Jacobian matrix. We demonstrate its performance using the Einstein-Boltzmann equations as a test case. sympy2c is widely applicable and may prove useful for various areas of computational physics. sympy2c is publicly available at https://cosmology.ethz.ch/research/software-lab/sympy2c.html
Submitted 22 March, 2022;
originally announced March 2022.
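To illustrate the idea in the entry above of going from symbolic expressions to C code while exploiting a sparse symbolic Jacobian, here is a minimal sketch that uses only SymPy's built-in tools (`Matrix.jacobian`, `cse`, and `ccode`). It does not use or reproduce the sympy2c API, and the three-variable ODE system is a hypothetical toy, unrelated to the Einstein-Boltzmann equations.

```python
import sympy as sp

# Hypothetical toy ODE system dy/dt = f(y); not taken from PyCosmo or sympy2c.
y0, y1, y2, k = sp.symbols("y0 y1 y2 k")
state = sp.Matrix([y0, y1, y2])
rhs = sp.Matrix([
    -k * y0,
    k * y0 - y1 ** 2,
    y1 ** 2,
])

# Symbolic Jacobian d(rhs_i)/d(y_j); several entries are structurally zero.
jac = rhs.jacobian(state)
nonzero = [(i, j) for i in range(3) for j in range(3) if jac[i, j] != 0]

# Common-subexpression elimination over the nonzero entries only,
# then print every piece as a C statement.
replacements, reduced = sp.cse([jac[i, j] for i, j in nonzero])
lines = [
    f"double {sp.ccode(sym)} = {sp.ccode(expr)};"
    for sym, expr in replacements
]
lines += [
    f"J[{i}][{j}] = {sp.ccode(expr)};"
    for (i, j), expr in zip(nonzero, reduced)
]
c_source = "\n".join(lines)

print(f"{len(nonzero)} of 9 Jacobian entries are nonzero")
print(c_source)
```

Only the structurally nonzero entries are emitted, so the generated code grows with the number of nonzeros rather than with the full matrix size; a sparsity-aware solver generator could build on the same pattern information.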
-
X-pire! - A digital expiration date for images in social networks
Authors:
Julian Backes,
Michael Backes,
Markus Dürmuth,
Sebastian Gerling,
Stefan Lorenz
Abstract:
The Internet and its current information culture of preserving all kinds of data cause severe problems with privacy. Most of today's Internet users, especially teenagers, publish various kinds of sensitive information without recognizing that revealing this information might be detrimental to their future life and career. Unflattering images that can be openly accessed now and in the future, e.g., by potential employers, constitute a particularly important privacy concern. We have developed a novel, fast, and scalable system called X-pire! that allows users to set an expiration date for images in social networks (e.g., Facebook and Flickr) and on static websites, without requiring any form of additional interaction with these web pages. Once the expiration date is reached, the images become unavailable. Moreover, the publishing user can dynamically prolong or shorten the expiration dates of their images later, and even enforce instantaneous expiration. Making the approach feasible for social networks required us to develop a novel technique for embedding encrypted information within JPEG files in a way that survives JPEG compression, even for highly optimized implementations of JPEG post-processing with their various idiosyncrasies as commonly used in such networks. We have implemented our system and conducted performance measurements to demonstrate its robustness and efficiency.
Submitted 12 December, 2011;
originally announced December 2011.