Search | arXiv e-print repository

arXiv:2304.02011 [pdf, other]

doi 10.1016/j.str.2025.01.020

FakET: Simulating Cryo-Electron Tomograms with Neural Style Transfer

Authors: Pavol Harar, Lukas Herrmann, Philipp Grohs, David Haselbach

Abstract: In cryo-electron microscopy, accurate particle localization and classification are imperative. Recent deep learning solutions, though successful, require extensive training data sets. The protracted generation time of physics-based models, often employed to produce these data sets, limits their broad applicability. We introduce FakET, a method based on Neural Style Transfer, capable of simulating… ▽ More In cryo-electron microscopy, accurate particle localization and classification are imperative. Recent deep learning solutions, though successful, require extensive training data sets. The protracted generation time of physics-based models, often employed to produce these data sets, limits their broad applicability. We introduce FakET, a method based on Neural Style Transfer, capable of simulating the forward operator of any cryo transmission electron microscope. It can be used to adapt a synthetic training data set according to reference data producing high-quality simulated micrographs or tilt-series. To assess the quality of our generated data, we used it to train a state-of-the-art localization and classification architecture and compared its performance with a counterpart trained on benchmark data. Remarkably, our technique matches the performance, boosts data generation speed 750 times, uses 33 times less memory, and scales well to typical transmission electron microscope detector sizes. It leverages GPU acceleration and parallel processing. The source code is available at https://github.com/paloha/faket. △ Less

Submitted 19 February, 2025; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 25 pages, 3 tables, 19 figures including supplement. Updated LaTeX project structure, updated figure captions, added in-text references to figures, fixed page numbering, fixed typos and typesetting

Journal ref: Structure, 2025

arXiv:2210.14219 [pdf, other]

Redistributor: Transforming Empirical Data Distributions

Authors: Pavol Harar, Dennis Elbrächter, Monika Dörfler, Kory D. Johnson

Abstract: We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable $S$ and the continuous cumulative distribution function of some desired target $T$, it provably produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distr… ▽ More We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable $S$ and the continuous cumulative distribution function of some desired target $T$, it provably produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distribution. As the distribution of $S$ or $T$ may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This allows for various interesting use cases in image processing, where Redistributor serves as a remarkably simple and easy-to-use tool that is capable of producing visually appealing results. For color correction it outperforms other model-based methods and excels in achieving photorealistic style transfer, surpassing deep learning methods in content preservation. The package is implemented in Python and is optimized to efficiently handle large datasets, making it also suitable as a preprocessing step in machine learning. The source code is available at https://github.com/paloha/redistributor. △ Less

Submitted 5 July, 2024; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: 16 pages, 13 figures - Added more use cases and comparisons with other methods

arXiv:1907.06129 [pdf, other]

doi 10.1007/s00521-018-3464-7

Towards Robust Voice Pathology Detection

Authors: Pavol Harar, Zoltan Galaz, Jesus B. Alonso-Hernandez, Jiri Mekyska, Radim Burget, Zdenek Smekal

Abstract: Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms.… ▽ More Automatic objective non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking and even effective treatment of pathological voices. In search towards such a robust voice pathology detection system we investigated 3 distinct classifiers within supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC) and conventional acoustic (dysphonic) features (AF). In comparison with previously published works, this article is the first to utilize combination of 4 different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/ unrestricted to a subset of vocal pathologies. Furthermore, to our best knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The following best classification performances measured by F1 score on dedicated test set were achieved: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Even though these results are of exploratory character, conducted experiments do show promising potential of gradient boosting and deep learning methods to robustly detect voice pathologies. △ Less

Submitted 13 July, 2019; originally announced July 2019.

Comments: 11 pages, 1 figure, 10 tables. Keywords: Voice pathology detection, deep learning, gradient boosting, anomaly detection

Journal ref: Neural Computing and Applications (2018): 1-11

arXiv:1907.05905 [pdf, other]

doi 10.1109/IWOBI.2017.7985525

Voice Pathology Detection Using Deep Learning: a Preliminary Study

Authors: Pavol Harar, Jesus B. Alonso-Hernandez, Jiri Mekyska, Zoltan Galaz, Radim Burget, Zdenek Smekal

Abstract: This paper describes a preliminary investigation of Voice Pathology Detection using Deep Neural Networks (DNN). We used voice recordings of sustained vowel /a/ produced at normal pitch from German corpus Saarbruecken Voice Database (SVD). This corpus contains voice recordings and electroglottograph signals of more than 2 000 speakers. The idea behind this experiment is the use of convolutional lay… ▽ More This paper describes a preliminary investigation of Voice Pathology Detection using Deep Neural Networks (DNN). We used voice recordings of sustained vowel /a/ produced at normal pitch from German corpus Saarbruecken Voice Database (SVD). This corpus contains voice recordings and electroglottograph signals of more than 2 000 speakers. The idea behind this experiment is the use of convolutional layers in combination with recurrent Long-Short-Term-Memory (LSTM) layers on raw audio signal. Each recording was split into 64 ms Hamming windowed segments with 30 ms overlap. Our trained model achieved 71.36% accuracy with 65.04% sensitivity and 77.67% specificity on 206 validation files and 68.08% accuracy with 66.75% sensitivity and 77.89% specificity on 874 testing files. This is a promising result in favor of this approach because it is comparable to similar previously published experiment that used different methodology. Further investigation is needed to achieve the state-of-the-art results. △ Less

Submitted 12 July, 2019; originally announced July 2019.

Comments: 4 pages, 1 figure, 5 tables

Journal ref: In 2017 international conference and workshop on bioinspired intelligence (IWOBI), pp. 1-4. IEEE, 2017

arXiv:1903.08950 [pdf, other]

doi 10.1109/ICUMT48472.2019.8970740

Improving Machine Hearing on Limited Data Sets

Authors: Pavol Harar, Roswitha Bammer, Anna Breger, Monika Dörfler, Zdenek Smekal

Abstract: Convolutional neural network (CNN) architectures have originated and revolutionized machine learning for images. In order to take advantage of CNNs in predictive modeling with audio data, standard FFT-based signal processing methods are often applied to convert the raw audio waveforms into an image-like representations (e.g. spectrograms). Even though conventional images and spectrograms differ in… ▽ More Convolutional neural network (CNN) architectures have originated and revolutionized machine learning for images. In order to take advantage of CNNs in predictive modeling with audio data, standard FFT-based signal processing methods are often applied to convert the raw audio waveforms into an image-like representations (e.g. spectrograms). Even though conventional images and spectrograms differ in their feature properties, this kind of pre-processing reduces the amount of training data necessary for successful training. In this contribution we investigate how input and target representations interplay with the amount of available training data in a music information retrieval setting. We compare the standard mel-spectrogram inputs with a newly proposed representation, called Mel scattering. Furthermore, we investigate the impact of additional target data representations by using an augmented target loss function which incorporates unused available information. We observe that all proposed methods outperform the standard mel-transform representation when using a limited data set and discuss their strengths and limitations. The source code for reproducibility of our experiments as well as intermediate results and model checkpoints are available in an online repository. △ Less

Submitted 12 July, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: 13 pages, 3 figures, 2 tables. Repository for reproducibility: https://gitlab.com/hararticles/gs-ms-mt/. Keywords: audio, CNN, limited data, Mel scattering, mel-spectrogram, augmented target loss function. Rewritten and restructured after peer revision. Recomputed and added new experiments and visualizations. Changed the presentation of the results

Journal ref: 2019 11th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 1-6. IEEE, 2019

arXiv:1901.07598 [pdf, other]

doi 10.1007/s10851-019-00902-2

On orthogonal projections for dimension reduction and applications in augmented target loss functions for learning problems

Authors: Anna Breger, Jose Ignacio Orlando, Pavol Harar, Monika Dörfler, Sophie Klimscha, Christoph Grechenig, Bianca S. Gerendas, Ursula Schmidt-Erfurth, Martin Ehler

Abstract: The use of orthogonal projections on high-dimensional input and target data in learning frameworks is studied. First, we investigate the relations between two standard objectives in dimension reduction, preservation of variance and of pairwise relative distances. Investigations of their asymptotic correlation as well as numerical experiments show that a projection does usually not satisfy both obj… ▽ More The use of orthogonal projections on high-dimensional input and target data in learning frameworks is studied. First, we investigate the relations between two standard objectives in dimension reduction, preservation of variance and of pairwise relative distances. Investigations of their asymptotic correlation as well as numerical experiments show that a projection does usually not satisfy both objectives at once. In a standard classification problem we determine projections on the input data that balance the objectives and compare subsequent results. Next, we extend our application of orthogonal projections to deep learning tasks and introduce a general framework of augmented target loss functions. These loss functions integrate additional information via transformations and projections of the target data. In two supervised learning problems, clinical image segmentation and music information classification, the application of our proposed augmented target loss functions increase the accuracy. △ Less

Submitted 9 September, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Journal ref: Journal of Mathematical Imaging and Vision, 2019

arXiv:1706.08818 [pdf, other]

doi 10.3390/axioms8040106

Gabor frames and deep scattering networks in audio processing

Authors: Roswitha Bammer, Monika Dörfler, Pavol Harar

Abstract: This paper introduces Gabor scattering, a feature extractor based on Gabor frames and Mallat's scattering transform. By using a simple signal model for audio signals specific properties of Gabor scattering are studied. It is shown that for each layer, specific invariances to certain signal characteristics occur. Furthermore, deformation stability of the coefficient vector generated by the feature… ▽ More This paper introduces Gabor scattering, a feature extractor based on Gabor frames and Mallat's scattering transform. By using a simple signal model for audio signals specific properties of Gabor scattering are studied. It is shown that for each layer, specific invariances to certain signal characteristics occur. Furthermore, deformation stability of the coefficient vector generated by the feature extractor is derived by using a decoupling technique which exploits the contractivity of general scattering networks. Deformations are introduced as changes in spectral shape and frequency modulation. The theoretical results are illustrated by numerical examples and experiments. Numerical evidence is given by evaluation on a synthetic and a "real" data set, that the invariances encoded by the Gabor scattering transform lead to higher performance in comparison with just using Gabor transform, especially when few training samples are available. △ Less

Submitted 1 October, 2019; v1 submitted 27 June, 2017; originally announced June 2017.

Comments: 26 pages, 8 figures, 4 tables. Repository for reproducibility: https://gitlab.com/hararticles/gs-gt . Keywords: machine learning; scattering transform; Gabor transform; deep learning; time-frequency analysis; CNN. Accepted and published after peer revision

Journal ref: Axioms 2019, 8(4), 106

Showing 1–7 of 7 results for author: Harar, P