Search | arXiv e-print repository

jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval

Authors: Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Bo Wang, Sedigheh Eslami, Scott Martens, Maximilian Werk, Nan Wang, Han Xiao

Abstract: We introduce jina-embeddings-v4, a 3.8 billion parameter multimodal embedding model that unifies text and image representations through a novel architecture supporting both single-vector and multi-vector embeddings in the late interaction style. The model incorporates task-specific Low-Rank Adaptation (LoRA) adapters to optimize performance across diverse retrieval scenarios, including query-document retrieval, semantic text similarity, and code search. Comprehensive evaluations demonstrate that jina-embeddings-v4 achieves state-of-the-art performance on both single-modal and cross-modal retrieval tasks, with particular strength in processing visually rich content such as tables, charts, diagrams, and mixed-media formats. To facilitate evaluation of this capability, we also introduce Jina-VDR, a novel benchmark specifically designed for visually rich image retrieval.
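The abstract mentions task-specific LoRA adapters on a shared backbone. As a rough illustration only, the sketch below shows the generic LoRA idea: a frozen base weight W plus a per-task low-rank update scaled by alpha/r, selected by task name. The class, the task names and the initialization scheme are hypothetical and are not taken from the jina-embeddings-v4 implementation.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


class LoRALinear:
    """Hypothetical sketch: frozen linear layer W with one low-rank update B @ A per task."""

    def __init__(self, weight, rank, alpha, tasks, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen d_out x d_in base weight, shared by all tasks
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        self.adapters = {}
        for task in tasks:
            # A (rank x d_in) starts small and random, B (d_out x rank) starts at zero,
            # so every adapter leaves the base layer unchanged until it is trained.
            a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
            b = [[0.0] * rank for _ in range(d_out)]
            self.adapters[task] = (a, b)

    def forward(self, x, task):
        """Compute W x + (alpha / r) * B (A x) using the adapter for `task`."""
        base = matvec(self.weight, x)
        a, b = self.adapters[task]
        delta = matvec(b, matvec(a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1, alpha=2.0,
                   tasks=["retrieval", "text-matching", "code"])
print(layer.forward([3.0, 4.0], "retrieval"))
```

Only the small A and B matrices differ between tasks, so switching tasks means swapping a few low-rank matrices rather than loading a separate model.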

Submitted 24 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

Comments: 22 pages, 1-10 main, 14-22 experimental results, benchmark tables

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2504.07323

Prekey Pogo: Investigating Security and Privacy Issues in WhatsApp's Handshake Mechanism

Authors: Gabriel K. Gegenhuber, Philipp É. Frenzel, Maximilian Günther, Aljosha Judmayer

Abstract: WhatsApp, the world's largest messaging application, uses a version of the Signal protocol to provide end-to-end encryption (E2EE) with strong security guarantees, including Perfect Forward Secrecy (PFS). To ensure PFS right from the start of a new conversation -- even when the recipient is offline -- a stash of ephemeral (one-time) prekeys must be stored on a server. While the critical role of these one-time prekeys in achieving PFS has been outlined in the Signal specification, we are the first to demonstrate a targeted depletion attack against them on individual WhatsApp user devices. Our findings not only reveal an attack that can degrade PFS for certain messages, but also expose inherent privacy risks and serious availability implications arising from the refilling and distribution procedure essential for this security mechanism.
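The attack described above relies on one-time prekeys being a finite, server-side stash that each new session consumes. A toy model, assuming a simplified bundle of one signed prekey plus an optional one-time prekey (Signal's X3DH allows a session to start without a one-time prekey), illustrates why depletion matters: once the stash is empty, new sessions start without a one-time prekey until the owner refills it. The class and method names are hypothetical, and this is not WhatsApp's actual protocol or wire format.

```python
from collections import deque


class PrekeyServer:
    """Toy model of a server-side prekey stash (simplified; not WhatsApp's protocol)."""

    def __init__(self, signed_prekey, one_time_prekeys):
        self.signed_prekey = signed_prekey
        self.one_time = deque(one_time_prekeys)

    def fetch_bundle(self):
        """Each session initiation consumes one one-time prekey, if any remain."""
        one_time = self.one_time.popleft() if self.one_time else None
        return {"signed_prekey": self.signed_prekey, "one_time_prekey": one_time}

    def refill(self, new_keys):
        """The device owner uploads fresh one-time prekeys."""
        self.one_time.extend(new_keys)


server = PrekeyServer("spk-1", ["opk-1", "opk-2"])
for _ in range(3):
    print(server.fetch_bundle())
# The third bundle carries no one-time prekey: the stash has been depleted.
```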

Submitted 16 June, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

Comments: USENIX WOOT Conference 2025

arXiv:2503.18479

Differentiable Simulator for Electrically Reconfigurable Electromagnetic Structures

Authors: Johannes Müller, Dennis Philipp, Matthias Günther

Abstract: This paper introduces a novel CUDA-enabled PyTorch-based framework designed for the gradient-based optimization of reconfigurable electromagnetic structures with electrically tunable parameters. Traditional optimization techniques for these structures often rely on non-gradient-based methods, limiting efficiency and flexibility. Our framework leverages automatic differentiation, facilitating the application of gradient-based optimization methods. This approach is particularly advantageous for embedding within deep learning frameworks, enabling sophisticated optimization strategies. We demonstrate the framework's effectiveness through comprehensive simulations involving resonant structures with tunable parameters. Key contributions include the efficient solution of the inverse problem. The framework's performance is validated using three different resonant structures: a single-loop copper wire (Unit-Cell) as well as an 8x1 and an 8x8 array of resonant unit cells with multiple inductively coupled unit cells (1d and 2d Metasurfaces). Results show precise in-silico control over the magnetic field's component normal to the surface of each resonant structure, achieving desired field strengths with minimal error. The proposed framework is compatible with existing simulation software. This PyTorch-based framework sets the stage for advanced electromagnetic control strategies for resonant structures with applications in, e.g., MRI, providing a robust platform for further exploration and innovation in the design and optimization of resonant electromagnetic structures.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2502.13595

MMTEB: Massive Multilingual Text Embedding Benchmark

Authors: Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa , et al. (61 additional authors not shown)

Abstract: Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ languages. MMTEB includes a diverse set of challenging, novel tasks such as instruction following, long-document retrieval, and code retrieval, representing the largest multilingual collection of evaluation tasks for embedding models to date. Using this collection, we develop several highly multilingual benchmarks, which we use to evaluate a representative set of models. We find that while large language models (LLMs) with billions of parameters can achieve state-of-the-art performance on certain language subsets and task categories, the best-performing publicly available model is multilingual-e5-large-instruct with only 560 million parameters. To facilitate accessibility and reduce computational cost, we introduce a novel downsampling method based on inter-task correlation, ensuring a diverse selection while preserving relative model rankings. Furthermore, we optimize tasks such as retrieval by sampling hard negatives, creating smaller but effective splits. These optimizations allow us to introduce benchmarks that drastically reduce computational demands. For instance, our newly introduced zero-shot English benchmark maintains a ranking order similar to the full-scale version but at a fraction of the computational cost.

Submitted 8 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

Comments: Accepted for ICLR: https://openreview.net/forum?id=zl3pfz4VCV

arXiv:2502.03359

GHOST: Gaussian Hypothesis Open-Set Technique

Authors: Ryan Rabinowitz, Steve Cruz, Manuel Günther, Terrance E. Boult

Abstract: Evaluations of large-scale recognition methods typically focus on overall performance. While this approach is common, it often fails to provide insights into performance across individual classes, which can lead to fairness issues and misrepresentation. Addressing these gaps is crucial for accurately assessing how well methods handle novel or unseen classes and ensuring a fair evaluation. To address fairness in Open-Set Recognition (OSR), we demonstrate that per-class performance can vary dramatically. We introduce Gaussian Hypothesis Open Set Technique (GHOST), a novel hyperparameter-free algorithm that models deep features using class-wise multivariate Gaussian distributions with diagonal covariance matrices. We apply Z-score normalization to logits to mitigate the impact of feature magnitudes that deviate from the model's expectations, thereby reducing the likelihood of the network assigning a high score to an unknown sample. We evaluate GHOST across multiple ImageNet-1K pre-trained deep networks and test it with four different unknown datasets. Using standard metrics such as AUOSCR, AUROC and FPR95, we achieve statistically significant improvements, advancing the state-of-the-art in large-scale OSR. Source code is provided online.

Submitted 10 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

Comments: Accepted at AAAI Conference on Artificial Intelligence 2025

arXiv:2412.08802

jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images

Authors: Andreas Koukounas, Georgios Mastrapas, Sedigheh Eslami, Bo Wang, Mohammad Kalim Akram, Michael Günther, Isabelle Mohr, Saba Sturua, Nan Wang, Han Xiao

Abstract: Contrastive Language-Image Pretraining (CLIP) has been widely used for crossmodal information retrieval and multimodal understanding tasks. However, CLIP models are mainly optimized for crossmodal vision-language tasks and underperform in single-mode text tasks. Moreover, these models are often trained on English datasets and therefore lack multilingual understanding. Additionally, from a visual understanding perspective, previous CLIP-based models exhibit insufficient understanding of visually rich documents. In this work, we propose jina-clip-v2, a contrastive vision-language model trained on text pairs, triplets and image-text pairs via a multi-task and multi-stage contrastive learning paradigm in order to support both text-only and crossmodal tasks. We employ a multilingual text encoder and expand the training dataset to include multilingual texts from 29 non-English languages, including Hindi, Chinese, German, French, and others, as well as images of visually rich documents. We evaluate the model's performance and show that jina-clip-v2 achieves notable improvements over state-of-the-art CLIP-based models in zero-shot text-only retrieval, semantic textual similarity, and crossmodal retrieval tasks in both English and multilingual settings. jina-clip-v2 also provides flexibility in embedding dimensionality, enabling users to select the granularity of the representations. jina-clip-v2 is publicly available at https://huggingface.co/jinaai/jina-clip-v2.
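Contrastive vision-language training of the kind described above is typically built on a symmetric InfoNCE objective, as in the original CLIP. The sketch below computes that generic loss for a batch of matched text/image embeddings, assuming the embeddings are already L2-normalized; the temperature value is an assumption, and the multi-task, multi-stage schedule of jina-clip-v2 is not modeled.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_loss(text_embs, image_embs, temperature=0.05):
    """Symmetric InfoNCE over matched pairs (text_i, image_i) in a batch."""
    n = len(text_embs)
    if n != len(image_embs):
        raise ValueError("need exactly one image per text")
    # logits[i][j] = similarity(text_i, image_j) / temperature
    logits = [[sum(a * b for a, b in zip(t, im)) / temperature for im in image_embs]
              for t in text_embs]
    # Text-to-image direction: row i should assign the highest probability to column i.
    text_to_image = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image-to-text direction: column j should assign the highest probability to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    image_to_text = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (text_to_image + image_to_text) / 2


texts = [[1.0, 0.0], [0.0, 1.0]]
print(clip_loss(texts, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: loss near zero
print(clip_loss(texts, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large loss
```

Every other item in the batch serves as a negative, which is why batch size and the temperature strongly influence this objective.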

Submitted 24 April, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

Comments: 30 pages, 1-10 main paper, 10-12 refs, 12-30 benchmarks

MSC Class: 68T50 ACM Class: I.2.7; I.2.10

arXiv:2411.18147

Online Knowledge Integration for 3D Semantic Mapping: A Survey

Authors: Felix Igelbrink, Marian Renz, Martin Günther, Piper Powell, Lennart Niecksch, Oscar Lima, Martin Atzmueller, Joachim Hertzberg

Abstract: Semantic mapping is a key component of robots operating in and interacting with objects in structured environments. Traditionally, geometric and knowledge representations within a semantic map have only been loosely integrated. However, recent advances in deep learning now allow full integration of prior knowledge, represented as knowledge graphs or language concepts, into sensor data processing and semantic mapping pipelines. Semantic scene graphs and language models enable modern semantic mapping approaches to incorporate graph-based prior knowledge or to leverage the rich information in human language both during and after the mapping process. This has sparked substantial advances in semantic mapping, leading to previously impossible novel applications. This survey reviews these recent developments comprehensively, with a focus on online integration of knowledge into semantic mapping. We specifically focus on methods using semantic scene graphs for integrating symbolic prior knowledge and language models for capturing implicit common-sense knowledge and natural language concepts.

Submitted 27 November, 2024; originally announced November 2024.

Comments: Submitted to Robotics and Autonomous Systems

arXiv:2411.11194

Careless Whisper: Exploiting Stealthy End-to-End Leakage in Mobile Instant Messengers

Authors: Gabriel K. Gegenhuber, Maximilian Günther, Markus Maier, Aljosha Judmayer, Florian Holzbauer, Philipp É. Frenzel, Johanna Ullrich

Abstract: With over 3 billion users globally, mobile instant messaging apps have become indispensable for both personal and professional communication. Besides plain messaging, many services implement additional features such as delivery and read receipts informing a user when a message has successfully reached its target. This paper highlights that delivery receipts can pose significant privacy risks to users. We use specifically crafted messages that trigger delivery receipts allowing any user to be pinged without their knowledge or consent. By using this technique at high frequency, we demonstrate how an attacker could extract private information such as the online and activity status of a victim, e.g., screen on/off. Moreover, we can infer the number of currently active user devices and their operating system, as well as launch resource exhaustion attacks, such as draining a user's battery or data allowance, all without generating any notification on the target side. Due to the widespread adoption of vulnerable messengers (WhatsApp and Signal) and the fact that any user can be targeted simply by knowing their phone number, we argue for a design change to address this issue.

Submitted 19 November, 2024; v1 submitted 17 November, 2024; originally announced November 2024.

arXiv:2410.01498

Quo Vadis RankList-based System in Face Recognition?

Authors: Xinyi Zhang, Manuel Günther

Abstract: Face recognition in the wild has gained a lot of attention in the last few years, and many face recognition models are designed to verify faces in medium-quality images. Especially due to the availability of large training datasets with similar conditions, deep face recognition models perform exceptionally well in such tasks. However, in other tasks where substantially less training data is available, such methods struggle, especially when required to compare high-quality enrollment images with low-quality probes. On the other hand, traditional RankList-based methods have been developed that compare faces indirectly by comparing to cohort faces with similar conditions. In this paper, we revisit these RankList methods and extend them to use the logits of the state-of-the-art DaliFace network, instead of an external cohort. We show that through a reasonable Logit-Cohort Selection (LoCoS) the performance of RankList-based functions can be improved drastically. Experiments on two challenging face recognition datasets not only demonstrate the enhanced performance of our proposed method but also set the stage for future advancements in handling diverse image qualities.

Submitted 2 October, 2024; originally announced October 2024.

Comments: Accepted for presentation at IJCB 2024

arXiv:2409.10173

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Authors: Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao

Abstract: We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can flexibly reduce the embedding dimensions to as low as 32 without compromising performance, enabled by Matryoshka Representation Learning.
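With Matryoshka Representation Learning, the leading dimensions of an embedding are trained to carry most of the information, so a shorter vector can be obtained by slicing. A minimal sketch, assuming embeddings are compared with cosine similarity: keep the first `dim` components and re-normalize. The function names are illustrative and not part of any Jina API.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka-style embedding and L2-normalize."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


short = truncate_embedding([3.0, 4.0, 12.0], 2)
print(short)  # roughly [0.6, 0.8]
```

In practice, a 1024-dimensional vector would be cut to, say, its first 32 or 256 entries, and both query and document embeddings must be truncated to the same length before comparison.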

Submitted 19 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

Comments: 20 pages, pp11-13 references, pp14-20 appendix and experiment tables

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2409.07220

Watchlist Challenge: 3rd Open-set Face Detection and Identification

Authors: Furkan Kasım, Terrance E. Boult, Rensso Mora, Bernardo Biesseck, Rafael Ribeiro, Jan Schlueter, Tomáš Repák, Rafael Henrique Vareto, David Menotti, William Robson Schwartz, Manuel Günther

Abstract: In the current landscape of biometrics and surveillance, the ability to accurately recognize faces in uncontrolled settings is paramount. The Watchlist Challenge addresses this critical need by focusing on face detection and open-set identification in real-world surveillance scenarios. This paper presents a comprehensive evaluation of participating algorithms, using the enhanced UnConstrained College Students (UCCS) dataset with new evaluation protocols. In total, four participants submitted four face detection and nine open-set face recognition systems. The evaluation demonstrates that while detection capabilities are generally robust, closed-set identification performance varies significantly, with models pre-trained on large-scale datasets showing superior performance. However, open-set scenarios require further improvement, especially at higher true positive identification rates, i.e., lower thresholds.

Submitted 11 September, 2024; originally announced September 2024.

Comments: Accepted for presentation at IJCB 2024

arXiv:2409.04701

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

Authors: Michael Günther, Isabelle Mohr, Daniel James Williams, Bo Wang, Han Xiao

Abstract: Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be over-compressed in the embeddings. Consequently, practitioners often split text documents into smaller chunks and encode them separately. However, chunk embeddings created in this way can lose contextual information from surrounding chunks, resulting in sub-optimal representations. In this paper, we introduce a novel method called late chunking, which leverages long context embedding models to first embed all tokens of the long text, with chunking applied after the transformer model and just before mean pooling, hence the term "late" in its name. The resulting chunk embeddings capture the full contextual information, leading to superior results across various retrieval tasks. The method is generic enough to be applied to a wide range of long-context embedding models and works without additional training. To further increase the effectiveness of late chunking, we propose a dedicated fine-tuning approach for embedding models.
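The key point of late chunking is the order of operations: the whole text is encoded once, and only the resulting token embeddings are split and mean-pooled per chunk. A minimal sketch, assuming `token_embeddings` is the per-token output of some long-context embedding model for the full document and `chunk_spans` are token index ranges produced by any chunker (both are hypothetical inputs here; the model call itself is not shown).

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def late_chunking(token_embeddings, chunk_spans):
    """Pool contextualized token embeddings over each chunk span [start, end).

    Because every token was encoded with the whole document in view, each
    pooled chunk vector still reflects context from outside its own span.
    """
    chunks = []
    for start, end in chunk_spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        chunks.append(mean_pool(token_embeddings[start:end]))
    return chunks


tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]  # stand-in model output
print(late_chunking(tokens, [(0, 2), (2, 4)]))  # [[2.0, 0.0], [0.0, 3.0]]
```

Naive chunking would instead run the model separately on each chunk's text, so its token embeddings would never see the neighboring chunks.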

Submitted 2 October, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

Comments: 11 pages, 3rd draft

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2409.02629

AdvSecureNet: A Python Toolkit for Adversarial Machine Learning

Authors: Melih Catal, Manuel Günther

Abstract: Machine learning models are vulnerable to adversarial attacks. Several tools have been developed to research these vulnerabilities, but they often lack comprehensive features and flexibility. We introduce AdvSecureNet, a PyTorch-based toolkit for adversarial machine learning that is the first to natively support multi-GPU setups for attacks, defenses, and evaluation. It is the first toolkit that supports both CLI and API interfaces and external YAML configuration files to enhance versatility and reproducibility. The toolkit includes multiple attacks, defenses and evaluation metrics. Rigorous software engineering practices are followed to ensure high code quality and maintainability. The project is available as an open-source project on GitHub at https://github.com/melihcatal/advsecurenet and installable via PyPI.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2408.16672

Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever

Authors: Rohan Jha, Bo Wang, Michael Günther, Georgios Mastrapas, Saba Sturua, Isabelle Mohr, Andreas Koukounas, Mohammad Kalim Akram, Nan Wang, Han Xiao

Abstract: Multi-vector dense models, such as ColBERT, have proven highly effective in information retrieval. ColBERT's late interaction scoring approximates the joint query-document attention seen in cross-encoders while maintaining inference efficiency closer to traditional dense retrieval models, thanks to its bi-encoder architecture and recent optimizations in indexing and search. In this work we propose a number of incremental improvements to the ColBERT model architecture and training pipeline, using methods shown to work in the more mature single-vector embedding model training paradigm, particularly those that apply to heterogeneous multilingual data or boost efficiency with little tradeoff. Our new model, Jina-ColBERT-v2, demonstrates strong performance across a range of English and multilingual retrieval tasks.
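ColBERT's late interaction scoring, mentioned above, is commonly described as MaxSim: each query token embedding is matched to its most similar document token embedding, and these per-token maxima are summed. A sketch, assuming dot-product similarity on already-normalized token vectors; indexing, compression and the Jina-ColBERT-v2 training changes are not modeled.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction score: sum over query tokens of the best document-token match."""
    if not doc_vecs:
        raise ValueError("document has no token embeddings")
    total = 0.0
    for q in query_vecs:
        total += max(dot(q, d) for d in doc_vecs)
    return total


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]
print(maxsim_score(query, doc))  # 1.0 + 0.5 = 1.5
```

Documents are encoded independently of the query (the bi-encoder property), so their token vectors can be precomputed and indexed; only this cheap max-and-sum runs at query time.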

Submitted 14 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: 8 pages, references at pp7,8; EMNLP workshop submission

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2407.14087

Score Normalization for Demographic Fairness in Face Recognition

Authors: Yu Linghu, Tiago de Freitas Pereira, Christophe Ecabert, Sébastien Marcel, Manuel Günther

Abstract: Fair biometric algorithms have similar verification performance across different demographic groups given a single decision threshold. Unfortunately, for state-of-the-art face recognition networks, score distributions differ between demographics. Contrary to work that tries to align those distributions by extra training or fine-tuning, we solely focus on score post-processing methods. As proved, well-known sample-centered score normalization techniques, Z-norm and T-norm, do not improve fairness for high-security operating points. Thus, we extend the standard Z/T-norm to integrate demographic information in normalization. Additionally, we investigate several possibilities to incorporate cohort similarities for both genuine and impostor pairs per demographic to improve fairness across different operating points. We run experiments on two datasets with different demographics (gender and ethnicity) and show that our techniques generally improve the overall fairness of five state-of-the-art pre-trained face recognition networks, without downgrading verification performance. We also indicate that an equal contribution of False Match Rate (FMR) and False Non-Match Rate (FNMR) in fairness evaluation is required for the highest gains. Code and protocols are available.
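Z-norm and T-norm both standardize a raw comparison score with the mean and standard deviation of impostor scores against a cohort; they differ in which side is compared with the cohort (the enrolled template for Z-norm, the probe for T-norm). The sketch shows that shared formula plus a hypothetical helper that restricts the cohort to a single demographic group, as one simple way to bring demographic information into normalization. It is an illustration under these assumptions, not necessarily the exact variant proposed in the paper.

```python
import statistics


def cohort_normalize(raw_score, cohort_scores):
    """Standardize a score with impostor scores against a cohort.

    Z-norm: cohort_scores are the enrolled template compared with cohort samples.
    T-norm: cohort_scores are the probe compared with cohort models.
    """
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    if sigma == 0.0:
        raise ValueError("cohort scores have zero spread")
    return (raw_score - mu) / sigma


def demographic_cohort(cohort_scores, cohort_groups, group):
    """Hypothetical helper: keep only cohort scores whose demographic label equals `group`."""
    selected = [s for s, g in zip(cohort_scores, cohort_groups) if g == group]
    if len(selected) < 2:
        raise ValueError("need at least two cohort scores for this group")
    return selected


scores = [1.0, 9.0, 3.0]
groups = ["a", "b", "a"]
print(cohort_normalize(5.0, demographic_cohort(scores, groups, "a")))  # (5 - 2) / 1 = 3.0
```

Because the normalized score depends on which cohort is used, restricting the cohort per group shifts each demographic's score distribution separately, which is the lever a single shared threshold cannot provide.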

Submitted 22 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: Accepted for presentation at IJCB 2024

arXiv:2407.14064

Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability

Authors: Özgür Acar Güler, Manuel Günther, André Anjos

Abstract: Automatic classification of active tuberculosis from chest X-ray images has the potential to save lives, especially in low- and mid-income countries where skilled human experts can be scarce. Given the lack of available labeled data to train such systems and the unbalanced nature of publicly available datasets, we argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data. One way of evaluating the reliability of such systems is to ensure that models use the same regions of input images for predictions as medical experts would. In this paper, we show that pre-training a deep neural network on a large-scale proxy task, as well as using mixed objective optimization network (MOON), a technique to balance different classes during pre-training and fine-tuning, can improve the alignment of decision foundations between models and experts, as compared to a model directly trained on the target dataset. At the same time, these approaches keep perfect classification accuracy according to the area under the receiver operating characteristic curve (AUROC) on the test set, and improve generalization on an independent, unseen dataset. For the purpose of reproducibility, our source code is made available online.

Submitted 8 October, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: Preprint of paper presented at EUVIP 2024

arXiv:2406.18726

Data-driven identification of port-Hamiltonian DAE systems by Gaussian processes

Authors: Peter Zaspel, Michael Günther

Abstract: Port-Hamiltonian systems (pHS) allow for a structure-preserving modeling of dynamical systems. Coupling pHS via linear relations between input and output defines an overall pHS, which is structure preserving. However, in multiphysics applications, some subsystems do not allow for a physical pHS description, as (a) this is not available or (b) too expensive. Here, data-driven approaches can be used to deliver a pHS for such subsystems, which can then be coupled to the other subsystems in a structure-preserving way. In this work, we derive a data-driven identification approach for port-Hamiltonian differential algebraic equation (DAE) systems. The approach uses input and state space data to estimate nonlinear effort functions of pH-DAEs. As the underlying technique, we use (multi-task) Gaussian processes. This work thereby extends over the current state of the art, in which only port-Hamiltonian ordinary differential equation systems could be identified via Gaussian processes. We apply this approach successfully to two applications from network design and constrained multibody system dynamics, based on pH-DAE systems of index one and three, respectively.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.09112 [pdf, other]

Large-Scale Evaluation of Open-Set Image Classification Techniques

Authors: Halil Bisgin, Andres Palechor, Mike Suter, Manuel Günther

Abstract: The goal for classification is to correctly assign labels to unseen samples. However, most methods misclassify samples with unseen labels and assign them to one of the known classes. Open-Set Classification (OSC) algorithms aim to maximize both closed and open-set recognition capabilities. Recent studies showed the utility of such algorithms on small-scale data sets, but limited experimentation makes it difficult to assess their performances in real-world problems. Here, we provide a comprehensive comparison of various OSC algorithms, including training-based (SoftMax, Garbage, EOS) and post-processing methods (Maximum SoftMax Scores, Maximum Logit Scores, OpenMax, EVM, PROSER), where the latter are applied to features from the former. We perform our evaluation on three large-scale protocols that mimic real-world challenges, where we train on known and negative open-set samples, and test on known and unknown instances. Our results show that EOS helps to improve performance of almost all post-processing algorithms. Particularly, OpenMax and PROSER are able to exploit better-trained networks, demonstrating the utility of hybrid models. However, while most algorithms work well on negative test samples -- samples of open-set classes seen during training -- they tend to perform poorly when tested on samples of previously unseen unknown classes, especially in challenging conditions.
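The two simplest post-processing methods named above can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code: a score is computed from a network's logits, and a sample whose score falls below a threshold is rejected as unknown. The threshold value and the "-1 means unknown" convention are assumptions; in practice the threshold is tuned on validation data.

```python
import math


def max_softmax_score(logits):
    """Maximum SoftMax Score: the largest class probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def max_logit_score(logits):
    """Maximum Logit Score: the largest raw logit, without normalization."""
    return max(logits)


def open_set_predict(logits, threshold, score_fn=max_softmax_score):
    """Return the argmax class, or -1 ("unknown") if the score is too low."""
    if score_fn(logits) < threshold:
        return -1
    return max(range(len(logits)), key=lambda i: logits[i])


print(open_set_predict([4.0, 0.0, 0.0], threshold=0.5))  # 0: confident, accepted
print(open_set_predict([1.0, 1.0, 1.0], threshold=0.5))  # -1: flat, rejected
```

Maximum Logit Scores keep magnitude information that softmax normalization discards, which is one reason the two scores can rank unknown samples differently.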

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.20204 [pdf, other]

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Authors: Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han Xiao

Abstract: Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval systems that keep separate embeddings and models for text-only and multimodal tasks. We propose a novel, multi-task contrastive training method to address this issue, which we use to train the jina-clip-v1 model, achieving state-of-the-art performance on both text-image and text-text retrieval tasks.

Submitted 26 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 4 pages, MFM-EAI@ICML2024

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2404.18767 [pdf, ps, other]

A Port-Hamiltonian System Perspective on Electromagneto-Quasistatic Field Formulations of Darwin-Type

Authors: Markus Clemens, Marvin-Lucas Henkel, Fotios Kasolis, Michael Günther

Abstract: Electromagneto-quasistatic (EMQS) field formulations are often dubbed Darwin-type field formulations, which approximate the Maxwell equations by neglecting radiation effects while modelling resistive, capacitive, and inductive effects. A common feature of EMQS field models is the Darwin-Ampère equation formulated with the magnetic vector potential and the electric scalar potential. EMQS field formulations yield different approximations to the Maxwell equations by choice of additional gauge equations. These EMQS formulations are analyzed within the port-Hamiltonian system (PHS) framework. It is shown via the PHS compatibility equation that formulations based on the combination of the Darwin-Ampère equation and the full Maxwell continuity equation yield port-Hamiltonian systems, implying numerical stability and specific EMQS energy conservation.

Submitted 11 September, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: 8 pages, 0 figures, pre-submission version (preprint), presented at and submitted to the proceedings of "The 15th International Conference on Scientific Computing in Electrical Engineering" (SCEE 2024), March 4-8, 2024, Darmstadt, Germany

arXiv:2404.09932 [pdf, other]

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (17 additional authors not shown)

Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.

Submitted 5 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.14435 [pdf, other]

Biased Binary Attribute Classifiers Ignore the Majority Classes

Authors: Xinyi Zhang, Johanna Sophie Bieri, Manuel Günther

Abstract: To visualize the regions of interest that classifiers base their decisions on, different Class Activation Mapping (CAM) methods have been developed. However, all of these techniques target categorical classifiers only, though most real-world tasks are binary classification tasks. In this paper, we extend gradient-based CAM techniques to work with binary classifiers and visualize the active regions for binary facial attribute classifiers. When training an unbalanced binary classifier on an imbalanced dataset, it is well known that the majority class, i.e., the class with many training samples, is mostly predicted much better than the minority class with few training instances. In our experiments on the CelebA dataset, we verify these results when training an unbalanced classifier to extract 40 facial attributes simultaneously. One would expect that the biased classifier has learned to extract features mainly for the majority classes and that the proportional energy of the activations mainly resides in certain specific regions of the image where the attribute is located. However, we find very little regular activation for samples of majority classes, while the active regions for minority classes seem mostly reasonable and overlap with our expectations. These results suggest that biased classifiers mainly rely on bias activation for majority classes.
When training a balanced classifier on the imbalanced data by employing attribute-specific class weights, majority and minority classes are classified similarly well and show expected activations for almost all attributes.
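The abstract does not spell out how the attribute-specific class weights are computed. A minimal sketch, assuming the common inverse-frequency scheme (an assumption, not necessarily the paper's), computes one pair of weights per attribute and plugs them into a weighted binary cross-entropy, so that both classes of that attribute contribute equal total weight to the loss.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights (w_neg, w_pos) for ONE binary attribute.

    Assumed scheme: w_c = N / (2 * N_c), so each class contributes N / 2
    to the total weight regardless of how imbalanced the attribute is.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("attribute needs both positive and negative samples")
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_bce(p, y, w_neg, w_pos, eps=1e-12):
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))


# Hypothetical attribute with 90 negatives and 10 positives.
w_neg, w_pos = class_weights([0] * 90 + [1] * 10)
print(round(w_neg, 4), w_pos)  # 0.5556 5.0
```

With 40 attributes, one such pair is computed per attribute, which is what makes the weighting attribute-specific.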

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2402.17016 [pdf, other]

Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings

Authors: Isabelle Mohr, Markus Krimmel, Saba Sturua, Mohammad Kalim Akram, Andreas Koukounas, Michael Günther, Georgios Mastrapas, Vinit Ravishankar, Joan Fontanals Martínez, Feng Wang, Qi Liu, Ziniu Yu, Jie Fu, Saahil Ognawala, Susana Guzman, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

Abstract: We introduce a novel suite of state-of-the-art bilingual text embedding models that are designed to support English and another target language. These models are capable of processing lengthy text inputs with up to 8192 tokens, making them highly versatile for a range of natural language processing tasks such as text retrieval, clustering, and semantic textual similarity (STS) calculations. By focusing on bilingual models and introducing a unique multi-task learning objective, we have significantly improved model performance on STS tasks, outperforming existing multilingual models in both target language understanding and cross-lingual evaluation tasks. Moreover, our bilingual models are more efficient, requiring fewer parameters and less memory due to their smaller vocabulary needs. Furthermore, we have expanded the Massive Text Embedding Benchmark (MTEB) to include benchmarks for German and Spanish embedding models. This integration aims to stimulate further research and advancement in text embedding technologies for these languages.

Submitted 26 February, 2024; originally announced February 2024.

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2312.14250 [pdf, other]

HElium: A Language and Compiler for Fully Homomorphic Encryption with Support for Proxy Re-Encryption

Authors: Mirko Günther, Lars Schütze, Kilian Becher, Thorsten Strufe, Jeronimo Castrillon

Abstract: Privacy-preserving analysis of confidential data can increase the value of such data and even improve people's lives. Fully homomorphic encryption (FHE) can enable privacy-preserving analysis. However, FHE adds a large amount of computational overhead and its efficient use requires a high level of expertise. Compilers can automate certain aspects such as parameterization and circuit optimizations. This in turn makes FHE accessible to non-cryptographers. Yet, multi-party scenarios remain complicated and exclude many promising use cases such as analyses of large amounts of health records for medical research. Proxy re-encryption (PRE), a technique that allows the conversion of data from multiple sources to a joint encryption key, can enable FHE for multi-party scenarios. Today, there are no optimizing compilers for FHE with PRE capabilities. We propose HElium, the first optimizing FHE compiler with native support for proxy re-encryption. HElium features HEDSL, a domain-specific language (DSL) specifically designed for multi-party scenarios. By tracking encryption keys and transforming the computation circuit during compilation, HElium minimizes the number of expensive PRE operations. We evaluate the effectiveness of HElium's optimizations based on the real-world use case of the tumor recurrence rate, a well-known subject of medical research. Our empirical evaluation shows that HElium substantially reduces the overhead introduced through complex PRE operations, an effect that increases for larger amounts of input data.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 11 pages, 8 figures, 1 algorithm

arXiv:2311.00400 [pdf, other]

Open-Set Face Recognition with Maximal Entropy and Objectosphere Loss

Authors: Rafael Henrique Vareto, Yu Linghu, Terrance E. Boult, William Robson Schwartz, Manuel Günther

Abstract: Open-set face recognition characterizes a scenario where unknown individuals, unseen during the training and enrollment stages, appear at operation time. This work concentrates on watchlists, an open-set task that is expected to operate at a low False Positive Identification Rate and generally includes only a few enrollment samples per identity. We introduce a compact adapter network that benefits from additional negative face images when combined with distinct cost functions, such as Objectosphere Loss (OS) and the proposed Maximal Entropy Loss (MEL). MEL modifies the traditional Cross-Entropy loss to increase the entropy for negative samples and attaches a penalty to known target classes in pursuit of gallery specialization. The proposed approach adopts pre-trained deep neural networks (DNNs) for face recognition as feature extractors. Then, the adapter network takes deep feature representations and acts as a substitute for the output layer of the pre-trained DNN, enabling agile domain adaptation. Promising results have been achieved following open-set protocols for three different datasets, LFW, IJB-C, and UCCS, as well as state-of-the-art performance when supplementary negative data is properly selected to fine-tune the adapter network.

Submitted 1 November, 2023; originally announced November 2023.

Comments: Accepted for publication in Image and Vision Computing 2023

arXiv:2310.19923 [pdf, other]

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

Authors: Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

Abstract: Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often resort to truncation. One common approach to mitigate this challenge involves splitting documents into smaller paragraphs for embedding. However, this strategy results in a much larger set of vectors, consequently leading to increased memory consumption and computationally intensive vector searches with elevated latency. To address these challenges, we introduce Jina Embeddings 2, an open-source text embedding model capable of accommodating up to 8192 tokens. This model is designed to transcend the conventional 512-token limit and adeptly process long documents. Jina Embeddings 2 not only achieves state-of-the-art performance on a range of embedding-related tasks in the MTEB benchmark but also matches the performance of OpenAI's proprietary ada-002 model. Additionally, our experiments indicate that an extended context can enhance performance in tasks such as NarrativeQA.

Submitted 4 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 14 pages

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2309.13853 [pdf, other]

doi 10.1038/s41467-024-46640-x

A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems

Authors: Xunzhao Yin, Yu Qian, Alptekin Vardar, Marcel Gunther, Franz Muller, Nellie Laleni, Zijian Zhao, Zhouhang Jiang, Zhiguo Shi, Yiyu Shi, Xiao Gong, Cheng Zhuo, Thomas Kampfe, Kai Ni

Abstract: Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware to efficiently handle large-scale COPs, there is a growing interest in developing computing hardware tailored specifically for COPs, including digital annealers, dynamical Ising machines, and quantum/photonic systems. However, significant hurdles still remain, such as the memory access issue, the system scalability and restricted applicability to certain types of COPs, and VLSI-incompatibility, respectively. Here, a ferroelectric field effect transistor (FeFET) based compute-in-memory (CiM) annealer is proposed. After converting COPs into quadratic unconstrained binary optimization (QUBO) formulations, a hardware-algorithm co-design is conducted, yielding energy-efficient, versatile, and scalable hardware for COPs. To accelerate the core vector-matrix-vector (VMV) multiplication of QUBO formulations, a FeFET-based CiM array is exploited, which can accelerate the intended operation in-situ due to its unique three-terminal structure. In particular, a lossless compression technique is proposed to prune the typically sparse QUBO matrix to reduce hardware cost. Furthermore, a multi-epoch simulated annealing (MESA) algorithm is proposed to replace conventional simulated annealing for its faster convergence and better solution quality.
The effectiveness of the proposed techniques is validated using the developed chip prototypes, which successfully solve the graph coloring problem, indicating the great promise of FeFET CiM annealers for solving general COPs.
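To make the QUBO terminology concrete, here is a software-only sketch of the objective that the FeFET array accelerates, the vector-matrix-vector product x^T Q x over a binary vector x, minimized with conventional simulated annealing. This is NOT the paper's MESA algorithm; the cooling schedule, step count, and the max-cut example (chosen instead of graph coloring for brevity) are illustrative assumptions.

```python
import math
import random


def qubo_energy(Q, x):
    """Vector-matrix-vector product x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def simulated_annealing(Q, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Conventional single-bit-flip simulated annealing on a QUBO matrix."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = qubo_energy(Q, x)
    best_x, best_e = x[:], e
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(1, steps - 1))
        i = rng.randrange(n)
        x[i] ^= 1  # propose flipping one bit
        e_new = qubo_energy(Q, x)
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            x[i] ^= 1  # reject: undo the flip
    return best_x, best_e


# Max-cut on a triangle as a QUBO: diagonal = -degree, edges = +1 (both halves),
# so x^T Q x = sum over edges of (2*x_i*x_j - x_i - x_j). Best cut of 2 edges -> -2.
Q = [[-2, 1, 1],
     [1, -2, 1],
     [1, 1, -2]]
print(simulated_annealing(Q)[1])  # -2
```

Each step recomputes the full O(n^2) product, which is exactly the cost that in-memory VMV hardware targets; a software implementation would normally update the energy incrementally for a single bit flip instead.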

Submitted 24 September, 2023; originally announced September 2023.

Comments: 39 pages, 12 figures

arXiv:2308.12371 [pdf, other]

Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation

Authors: Rafael Henrique Vareto, Manuel Günther, William Robson Schwartz

Abstract: Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level during training with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on the well-known LFW and IJB-C datasets, where results show that the approach is able to boost closed and open-set identification rates.
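The abstract mentions building synthetic negatives at the representation level with mix-up. A hypothetical sketch of the general idea, assuming standard mix-up (a convex combination of two feature vectors) applied to features of two different subjects, is shown below; the function name, the mixing-coefficient range, and the identity check are assumptions, not the paper's exact procedure.

```python
import random


def mixup_negatives(features, identities, n_samples, lam_range=(0.3, 0.7), seed=0):
    """Synthesize negative feature vectors by mixing two DIFFERENT subjects.

    Hypothetical sketch: the result lam * f_i + (1 - lam) * f_j lies between
    two known identities and is meant to belong to neither of them.
    """
    if len(set(identities)) < 2:
        raise ValueError("need features from at least two identities")
    rng = random.Random(seed)
    idx = list(range(len(features)))
    out = []
    while len(out) < n_samples:
        i, j = rng.sample(idx, 2)
        if identities[i] == identities[j]:
            continue  # only mix features of distinct subjects
        lam = rng.uniform(*lam_range)
        out.append([lam * a + (1 - lam) * b for a, b in zip(features[i], features[j])])
    return out


# Toy 2-D "deep features" for subjects a and b.
negs = mixup_negatives([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], ["a", "b", "a"], 3)
print(len(negs))  # 3
```

Keeping the mixing coefficient away from 0 and 1 avoids producing vectors that nearly coincide with a real enrolled subject.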

Submitted 23 August, 2023; originally announced August 2023.

Journal ref: 36th Conference on Graphics, Patterns and Images (SIBGRAPI 2023)

arXiv:2308.03666 [pdf, other]

Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

Authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo

Abstract: As researchers strive to narrow the gap between machine and human intelligence through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in the open world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Finally, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.

Submitted 18 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.11224 [pdf, other]

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

Authors: Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao

Abstract: Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating textual inputs into numerical representations, capturing the semantics of the text. These models excel in applications like dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, offers in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Text Embedding Benchmark (MTEB). Furthermore, to increase the model's awareness of grammatical negation, we construct a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.
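Dense retrieval, one of the applications named above, ranks documents by the similarity of their embedding vectors to a query embedding. The sketch below uses cosine similarity over tiny hand-written vectors that stand in for model outputs; it does not call any Jina model or API, and the vectors and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query."""
    scored = sorted(((cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)), reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical 2-D embeddings; real sentence embeddings have hundreds of dimensions.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(retrieve([1.0, 0.1], docs, k=2))  # [0, 2]
```

A brute-force scan like this is linear in the number of documents; production systems typically replace it with an approximate nearest-neighbor index.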

Submitted 20 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: 9 pages, 2 page appendix

MSC Class: 68T50 ACM Class: H.3.1; H.3.3; I.2.7; I.5.4

arXiv:2210.07356 [pdf, other]

Consistency and Accuracy of CelebA Attribute Values

Authors: Haiyu Wu, Grace Bezold, Manuel Günther, Terrance Boult, Michael C. King, Kevin W. Bowyer

Abstract: We report the first systematic analysis of the experimental foundations of facial attribute classification. Having two annotators independently assign attribute values shows that only 12 of 40 common attributes are assigned values with >= 95% consistency, and three (high cheekbones, pointed nose, oval face) have essentially random consistency. Of 5,068 duplicate face appearances in CelebA, attributes have contradicting values on between 10 and 860 of the 5,068 duplicates. A manual audit of a subset of CelebA estimates error rates as high as 40% for (no beard=false), even though the labeling consistency experiment indicates that no beard could be assigned with >= 95% consistency. Selecting the mouth slightly open (MSO) attribute for deeper analysis, we estimate the error rate for (MSO=true) at about 20% and (MSO=false) at about 2%. A corrected version of the MSO attribute values enables learning a model that achieves higher accuracy than previously reported for MSO. Corrected values for CelebA MSO are available at https://github.com/HaiyuWu/CelebAMSO.
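The consistency figures above compare two annotators' labels per attribute. A minimal sketch of such a check computes raw percent agreement and, as an added chance-corrected view (Cohen's kappa, which the abstract does not mention), shows what "essentially random" means for a binary attribute. The toy labels are hypothetical.

```python
def percent_agreement(a, b):
    """Fraction of items on which two annotators assign the same value."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for chance: 0 means chance level, 1 means perfect."""
    n = len(a)
    p_o = percent_agreement(a, b)
    # Expected agreement if both annotators labeled independently at their own rates.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_e == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


a = [1, 1, 0, 0]
b = [1, 0, 1, 0]
print(percent_agreement(a, b), cohens_kappa(a, b))  # 0.5 0.0
print(percent_agreement(a, b) >= 0.95)  # False: below the 95% bar used above
```

For a balanced binary attribute, 50% raw agreement is exactly what two independent coin-flipping annotators would reach, which is why kappa is 0 here.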

Submitted 16 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2210.06789 [pdf, other]

Large-Scale Open-Set Classification Protocols for ImageNet

Authors: Andres Palechor, Annesha Bhoumik, Manuel Günther

Abstract: Open-Set Classification (OSC) intends to adapt closed-set classification models to real-world scenarios, where the classifier must correctly label samples of known classes while rejecting previously unseen unknown samples. Only recently has research started to investigate algorithms that are able to handle these unknown samples correctly. Some of these approaches address OSC by including in the training set negative samples that a classifier learns to reject, expecting that these data increase the robustness of the classifier on unknown classes. Most of these approaches are evaluated on small-scale and low-resolution image datasets like MNIST, SVHN or CIFAR, which makes it difficult to assess their applicability to the real world, and to compare them with each other. We propose three open-set protocols that provide rich datasets of natural images with different levels of similarity between known and unknown classes. The protocols consist of subsets of ImageNet classes selected to provide training and testing data closer to real-world scenarios. Additionally, we propose a new validation metric that can be employed to assess whether the training of deep learning models addresses both the classification of known samples and the rejection of unknown samples.
We use the protocols to compare the performance of two baseline open-set algorithms to the standard SoftMax baseline and find that the algorithms work well on negative samples that have been seen during training, and partially on out-of-distribution detection tasks, but drop performance in the presence of samples from previously unseen unknown classes.

Submitted 18 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: This is a pre-print of the original paper accepted at the Winter Conference on Applications of Computer Vision (WACV) 2023

arXiv:2209.01473 [pdf, other]

Model-based Analysis and Specification of Functional Requirements and Tests for Complex Automotive Systems

Authors: Carsten Wiecher, Constantin Mandel, Matthias Günther, Jannik Fischbach, Joel Greenyer, Matthias Greinert, Carsten Wolff, Roman Dumitrescu, Daniel Mendez, Albert Albers

Abstract: Requirements specification and test specification are crucial activities in automotive development projects. However, due to the increasing complexity of automotive systems, practitioners fail to specify requirements and tests for distributed and evolving systems with complex interactions when following traditional development processes. To address this research gap, we propose a technique that starts with the early identification of validation concerns from a stakeholder perspective, which we use to systematically design tests that drive a scenario-based modeling and analysis of system requirements. To ensure complete and consistent requirements and test specifications in a form that is required in automotive development projects, we develop a Model-Based Systems Engineering (MBSE) methodology. This methodology supports system architects and test designers in the collaborative application of our technique and in maintaining a central system model, in order to automatically derive the required specifications. We evaluate our methodology by applying it at KOSTAL (Tier1 supplier) and within student projects as part of the master's program Embedded Systems Engineering. Our study corroborates that our methodology is applicable and improves existing requirements and test specification processes by supporting the integrated and stakeholder-focused modeling of product and validation systems, where the early definition of stakeholder and validation concerns fosters problem-oriented, iterative and test-driven requirements modeling.

Submitted 15 November, 2023; v1 submitted 3 September, 2022; originally announced September 2022.

arXiv:2208.04040 [pdf, other]

Eight Years of Face Recognition Research: Reproducibility, Achievements and Open Issues

Authors: Tiago de Freitas Pereira, Dominic Schmidli, Yu Linghu, Xinyi Zhang, Sébastien Marcel, Manuel Günther

Abstract: Automatic face recognition is a highly popular research area. Many different face recognition algorithms have been proposed over the last thirty years of intensive research in the field. With the popularity of deep learning and its capability to solve a huge variety of different problems, face recognition researchers have concentrated their efforts on creating better models under this paradigm. Since 2015, state-of-the-art face recognition has been rooted in deep learning models. Despite the availability of large-scale and diverse datasets for evaluating the performance of face recognition algorithms, many of the modern datasets simply combine different factors that influence face recognition, such as face pose, occlusion, illumination, facial expression and image quality. When algorithms produce errors on these datasets, it is not clear which of the factors caused the error and, hence, there is no guidance as to which direction requires more research. This work is a follow-up to our previous work, developed in 2014 and eventually published in 2016, showing the impact of various facial aspects on face recognition algorithms. By comparing the current state of the art with the best systems from the past, we demonstrate that faces under strong occlusions, some types of illumination, and strong expressions are problems mastered by deep learning algorithms, whereas recognition with low-resolution images, extreme pose variations, and open-set recognition remain open problems.
To show this, we run a sequence of experiments using six different datasets and five different face recognition algorithms in an open-source and reproducible manner. We provide the source code to run all of our experiments, which is easily extensible so that utilizing your own deep network in our evaluation is just a few minutes away.

Submitted 9 August, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2205.07985 [pdf]

Expert Systems with Logic#. A Novel Modeling Framework for Logic Programming in an Object-Oriented Context of C#

Authors: F. Lorenz, M. Günther

Abstract: We present a novel approach showing how logic programming for expert systems can be declared directly in an object-oriented language.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 23 pages, 4 figures, 4 tables, 7 appendices

ACM Class: I.2.1; I.2.5; D.1.6

arXiv:2204.06286 [pdf, other]

Electromagnetic Quasistatic Field Formulations of Darwin Type

Authors: Markus Clemens, Marvin-Lucas Henkel, Fotios Kasolis, Michael Günther, Herbert De Gersem, Sebastian Schöps

Abstract: Electromagnetic quasistatic (EMQS) fields, in which radiation effects are neglected while Ohmic losses and electric and magnetic field energies are considered, can be modeled using Darwin-type field models as an approximation to the full Maxwell equations. Commonly formulated in terms of magnetic vector and electric scalar potentials, these EMQS formulations are not gauge invariant. Several EMQS formulations resulting from different gauge equations are considered and analyzed in terms of their structural properties and their modeling capabilities and limitations. Associated discrete field formulations in the context of the Maxwell-grid equations of the Finite Integration Technique are considered in the frequency and time domains and are studied with respect to their algebraic properties. Numerical simulation results are compared with reference solutions obtained with established formulations for the full Maxwell equations.

Submitted 13 April, 2022; originally announced April 2022.

MSC Class: 78A30; 78M12 ACM Class: I.6; J.6

Journal ref: ICS Newsletter, vol. 29, no. 1, pages 3-9, ISSN 1026-0854, 2022

arXiv:2201.09946 [pdf, other]

Microphone Utility Estimation in Acoustic Sensor Networks using Single-Channel Signal Features

Authors: Michael Günther, Andreas Brendel, Walter Kellermann

Abstract: In multichannel signal processing with distributed sensors, choosing the optimal subset of observed sensor signals to be exploited is crucial in order to maximize algorithmic performance and reduce computational load, ideally both at the same time. In the acoustic domain, signal cross-correlation is a natural choice to quantify the usefulness of microphone signals, i.e., microphone utility, for array processing, but its estimation requires that the uncoded signals are synchronized and transmitted between nodes. In resource-constrained environments like acoustic sensor networks, low data transmission rates often make transmission of all observed signals to a centralized location infeasible, thus discouraging direct estimation of signal cross-correlation. Instead, we employ characteristic features of the recorded signals to estimate the usefulness of individual microphone signals. In this contribution, we provide a comprehensive analysis of model-based microphone utility estimation approaches that use signal features and, as an alternative, also propose machine learning-based estimation methods that identify optimal sensor signal utility features. The performance of both approaches is validated experimentally using both simulated and recorded acoustic data, comprising a variety of realistic and practically relevant acoustic scenarios including moving and static sources.

Submitted 14 January, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

Comments: submitted to EURASIP Journal on Audio, Speech, and Music Processing

arXiv:2109.07011 [pdf, other]

doi 10.3847/2041-8213/ac4b5e

Testing Self-Organized Criticality Across the Main Sequence using Stellar Flares from TESS

Authors: Adina D. Feinstein, Darryl Z. Seligman, Maximilian N. Günther, Fred C. Adams

Abstract: Self-organized criticality describes a class of dynamical systems that maintain themselves in an attractor state with no intrinsic length or time scale. Fundamentally, this theoretical construct requires a mechanism for instability that may trigger additional instabilities locally via dissipative processes. This concept has been invoked to explain nonlinear dynamical phenomena such as featureless energy spectra that have been observed empirically for earthquakes, avalanches, and solar flares. If this interpretation proves correct, it implies that the solar coronal magnetic field maintains itself in a critical state via a delicate balance between the dynamo-driven injection of magnetic energy and the release of that energy via flaring events. All-sky high-cadence surveys like the Transiting Exoplanet Survey Satellite (TESS) provide the necessary data to compare the energy distribution of flaring events in stars of different spectral types to that observed in the Sun. We identified $\sim 10^6$ flaring events on $\sim 10^5$ stars observed by TESS at 2-minute cadence. By fitting the flare frequency distribution for different mass bins, we find that all main sequence stars exhibit distributions of flaring events similar to that observed in the Sun, independent of their mass or age. This may suggest that stars universally maintain a critical state in their coronal topologies via magnetic reconnection events.
If this interpretation proves correct, we may be able to infer properties of magnetic fields, interior structure, and dynamo mechanisms for stars that are otherwise unresolved point sources.

Submitted 12 January, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: 6 pages, 3 figures, Accepted to ApJL
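Flare frequency distributions are commonly modeled as a power law, $dN/dE \propto E^{-\alpha}$. The abstract does not say which estimator the authors use, so the following is only a hedged illustration: it applies the standard continuous maximum-likelihood estimator $\hat\alpha = 1 + n / \sum_i \ln(E_i/E_{\min})$ to synthetic flare energies. The function names, the value of $E_{\min}$, and the synthetic data are assumptions, not the paper's pipeline.

```python
import math
import random


def sample_power_law(n, alpha, e_min, rng):
    # Inverse-transform sampling from p(E) ∝ E^-alpha for E >= e_min (requires alpha > 1).
    return [e_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_index(energies, e_min):
    # Continuous maximum-likelihood estimate of alpha, using only energies E >= e_min.
    tail = [e for e in energies if e >= e_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(e / e_min) for e in tail)
    # Asymptotic standard error of the estimate.
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma


# Synthetic energies in erg; the true index 2.0 and E_min = 1e31 are illustrative choices.
rng = random.Random(42)
energies = sample_power_law(100_000, alpha=2.0, e_min=1e31, rng=rng)
alpha_hat, sigma = fit_power_law_index(energies, e_min=1e31)
print(f"alpha = {alpha_hat:.3f} +/- {sigma:.3f}")
```

Fitting each stellar mass bin separately and comparing the resulting indices would mirror the comparison the abstract describes; real analyses must also choose $E_{\min}$ carefully, since the fit is sensitive to it.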

arXiv:2004.11657 [pdf, other]

doi 10.1109/ICRA40945.2020.9197426

YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation

Authors: Till Grenzdörffer, Martin Günther, Joachim Hertzberg

Abstract: While a great variety of 3D cameras have been introduced in recent years, most publicly available datasets for object recognition and pose estimation focus on a single camera. In this work, we present a dataset of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames. This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the camera used and developing more robust algorithms that are less dependent on the camera model. Vice versa, our dataset enables researchers to perform a quantitative comparison of the data from several different cameras and depth sensing technologies and evaluate their algorithms before selecting a camera for their specific task. The scenes in our dataset contain 20 different objects from the common benchmark YCB object and model set [1], [2]. We provide full ground truth 6DoF poses for each object, per-pixel segmentation, 2D and 3D bounding boxes and a measure of the amount of occlusion of each object. We have also performed an initial evaluation of the cameras using our dataset on a state-of-the-art object recognition and pose estimation system [3].

Submitted 29 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Published at ICRA-2020

arXiv:2004.00517 [pdf, ps, other]

Tracing Contacts to Control the COVID-19 Pandemic

Authors: Christoph Günther, Michael Günther, Daniel Günther

Abstract: The control of the COVID-19 pandemic requires a considerable reduction of contacts, mostly achieved by imposing movement control up to the level of enforced quarantine. This has led to a collapse of substantial parts of the economy. Carriers of the disease are infectious roughly 3 days after exposure to the virus. First symptoms occur later or not at all. As a consequence, tracing the contacts of people identified as carriers is essential for controlling the pandemic. This tracing must work everywhere, in particular indoors, where people are closest to each other. Furthermore, it should respect people's privacy. This paper presents a method that enables thorough traceability with very little risk to privacy. In our opinion, these capabilities are necessary to control the pandemic during a future relaunch of our economy.

Submitted 1 April, 2020; originally announced April 2020.

Comments: 5 pages, no figures

arXiv:2002.08672 [pdf, other]

GivEn -- Shape Optimization for Gas Turbines in Volatile Energy Networks

Authors: Jan Backhaus, Matthias Bolten, Onur Tanil Doganay, Matthias Ehrhardt, Benedikt Engel, Christian Frey, Hanno Gottschalk, Michael Günther, Camilla Hahn, Jens Jäschke, Peter Jaksch, Kathrin Klamroth, Alexander Liefke, Daniel Luft, Lucas Mäde, Vincent Marciniak, Marco Reese, Johanna Schultes, Volker Schulz, Sebastian Schmitz, Johannes Steiner, Michael Stiglmayr

Abstract: This paper describes the project GivEn, which develops a novel multicriteria optimization process for gas turbine blades and vanes using modern "adjoint" shape optimization algorithms. Given the many start-up and shut-down processes of gas power plants in volatile energy grids, besides optimizing gas turbine geometries for efficiency, durability, understood as minimizing the probability of failure, is a design objective of increasing importance. We also describe the underlying coupling structure of the multiphysical simulations and use modern, gradient-based multicriteria optimization procedures to enhance the exploration of Pareto-optimal solutions.

Submitted 20 February, 2020; originally announced February 2020.

ACM Class: G.1.6; G.3; G.1.8

arXiv:1911.12674 [pdf, other]

RETRO: Relation Retrofitting For In-Database Machine Learning on Textual Data

Authors: Michael Günther, Maik Thiele, Wolfgang Lehner

Abstract: There are massive amounts of textual data residing in databases, valuable for many machine learning (ML) tasks. Since ML techniques depend on numerical input representations, word embeddings are increasingly utilized to convert symbolic representations such as text into meaningful numbers. However, a naive one-to-one mapping of each word in a database to a word embedding vector is not sufficient and would lead to poor accuracy in ML tasks. Thus, we argue for additionally incorporating the information given by the database schema into the embedding, e.g., which words appear in the same column or are related to each other. In this paper, we propose RETRO (RElational reTROfitting), a novel approach to learn numerical representations of text values in databases, capturing the best of both worlds: the rich information encoded by word embeddings and the relational information encoded by database tables. We formulate relation retrofitting as a learning problem and present an efficient algorithm solving it. We investigate the impact of various hyperparameters on the learning problem and derive good settings for all of them. Our evaluation shows that the proposed embeddings are ready to use for many ML tasks such as classification and regression and even outperform state-of-the-art techniques in integration tasks such as null value imputation and link prediction.

Submitted 22 January, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: 14 pages

MSC Class: H.2.8; H.3.3; I.2.7 ACM Class: H.2.8; H.3.3; I.2.7

arXiv:1909.02871 [pdf, other]

Galois Field Arithmetics for Linear Network Coding using AVX512 Instruction Set Extensions

Authors: Stephan M. Günther, Nicolas Appel, Georg Carle

Abstract: Linear network coding requires arithmetic operations over Galois fields, more specifically over finite extension fields. While coding over GF(2) reduces to simple XOR operations, this field is less preferred for practical applications of random linear network coding due to high chances of linear dependencies and therefore redundant coded packets. Coding over larger fields such as GF(16) and GF(256) does not have that issue, but is significantly slower. SIMD vector extensions of processors such as AVX2 on x86-based systems or NEON on ARM-based devices offer the potential to increase performance by orders of magnitude. In this paper we present an implementation of different algorithms and Galois fields based on the AVX512 instruction set extension and integrate it into the finite field library libmoepgf. We compare the performance of the new implementation to the reference implementation based on AVX2, showing a significant increase in throughput. In addition, we provide a survey of the best possible coding performance offered by a variety of different platforms.

Submitted 4 September, 2019; originally announced September 2019.

Comments: 6 pages, 2 figures, the updated finite field library is available under the LGPL at https://moep80211.net/plink/libmoepgf-avx512
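The abstract contrasts GF(2), where coding is plain XOR, with larger fields such as GF(256) that need genuine field multiplication. As a minimal scalar sketch of the per-byte arithmetic that SIMD extensions accelerate, GF(2^8) addition and multiplication, plus one coded packet of random linear network coding, might look like this. The reduction polynomial 0x11B (x^8 + x^4 + x^3 + x + 1, the one used by AES) and all function names are assumptions for illustration; this is not libmoepgf's API, and libmoepgf may use a different polynomial.

```python
from typing import Sequence


def gf256_add(a: int, b: int) -> int:
    # Addition and subtraction in GF(2^8) are both bitwise XOR, exactly as in GF(2).
    return a ^ b


def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    # Carry-less shift-and-add multiplication, reducing modulo the degree-8
    # polynomial `poly` whenever the shifted operand overflows 8 bits.
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def encode(packets: Sequence[bytes], coeffs: Sequence[int]) -> bytes:
    # One coded packet: out[j] = sum_i coeffs[i] * packets[i][j], computed in GF(2^8).
    out = bytearray(len(packets[0]))
    for c, packet in zip(coeffs, packets):
        for j, byte in enumerate(packet):
            out[j] = gf256_add(out[j], gf256_mul(c, byte))
    return bytes(out)


print(hex(gf256_mul(0x57, 0x83)))  # 0xc1, the worked example from FIPS-197 (AES)
```

With all coefficients equal to 1, `encode` degenerates to XOR-ing the packets, which is why GF(2) coding is so cheap; any other coefficient requires the multiplication loop, which is the per-byte cost that vectorized implementations target.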

arXiv:1908.10950 [pdf, other]

doi 10.1016/j.cpc.2020.107192

Constrained Hybrid Monte Carlo algorithms for gauge-Higgs models

Authors: Michael Günther, Roman Höllwieser, Francesco Knechtli

Abstract: We develop Hybrid Monte Carlo (HMC) algorithms for constrained Hamiltonian systems of gauge-Higgs models and introduce a new observable for the constraint effective Higgs potential. We use an extension of the so-called Rattle algorithm to general Hamiltonians for constrained systems, which we adapt to the 4D Abelian-Higgs model and the 5D SU(2) gauge theory on the torus and on the orbifold. The derivative of the potential is measured via the expectation value of the Lagrange multiplier for the constraint condition and allows a much more precise determination of the effective potential than conventional histogram methods. With the new method, we can access the potential over the full domain of the Higgs variable, while the histogram method is restricted to a narrow region around the expectation value of the Higgs field in unconstrained simulations, and the statistical precision does not deteriorate when the volume is increased. We further verify our results by comparing to the one-loop Higgs potential of the 4D Abelian-Higgs model in unitary gauge and find good agreement. To our knowledge, this is the first time this problem has been addressed for theories with gauge fields. The algorithm can also be used in four dimensions to study finite temperature and density transitions via effective Polyakov loop actions.

Submitted 28 January, 2020; v1 submitted 7 August, 2019; originally announced August 2019.

Comments: added comparison to one-loop potential in section 3.3, improved text; version accepted for publication in Computer Physics Communications

arXiv:1811.04110 [pdf, other]

Reducing Network Agnostophobia

Authors: Akshay Raj Dhamija, Manuel Günther, Terrance E. Boult

Abstract: Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. Unfortunately, network behavior is not well defined for inputs far from a network's training set. In an uncontrolled environment, networks face many instances that are not of interest to them and have to be rejected in order to avoid a false positive. This problem has previously been tackled by researchers by either a) thresholding softmax, which by construction cannot return "none of the known classes", or b) using an additional background or garbage class. In this paper, we show that both of these approaches help, but are generally insufficient when previously unseen classes are encountered. We also introduce a new evaluation metric that focuses on comparing the performance of multiple approaches in scenarios where such unseen classes or unknowns are encountered. Our major contributions are simple yet effective Entropic Open-Set and Objectosphere losses that train networks using negative samples from some classes. These novel losses are designed to maximize entropy for unknown inputs while increasing separation in deep feature space by modifying magnitudes of known and unknown samples. Experiments on networks trained to classify classes from MNIST and CIFAR-10 show that our novel loss functions are significantly better at dealing with unknown inputs from datasets such as Devanagari, NotMNIST, CIFAR-100, and SVHN.

Submitted 22 December, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

Comments: Neural Information Processing Systems (NeurIPS) 2018
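The entropy-maximizing idea behind the Entropic Open-Set loss can be sketched for a single sample in plain Python. This assumes the form we understand the paper to use: ordinary cross-entropy for samples of known classes, and the average negative log-softmax over all known classes for negative samples (passed here as `label=None`), which is smallest when the softmax output is uniform. The Objectosphere feature-magnitude term is deliberately omitted, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def entropic_open_set_loss(logits: Sequence[float], label: Optional[int]) -> float:
    log_probs = log_softmax(logits)
    if label is None:
        # Negative (unknown) sample: mean negative log-probability over all known
        # classes, minimized by a uniform softmax, i.e. maximum entropy.
        return -sum(log_probs) / len(log_probs)
    # Known sample: standard cross-entropy on its class.
    return -log_probs[label]


print(entropic_open_set_loss([0.0, 0.0, 0.0], None))  # log(3) ≈ 1.0986: uniform output
print(entropic_open_set_loss([5.0, 0.0, 0.0], None))  # larger: a confident output is penalized
```

Unlike an extra background class, this formulation adds no output unit for unknowns; it pushes negatives toward a flat softmax, so that thresholding the maximum probability can reject them at test time.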

arXiv:1801.02480 [pdf, other]

doi 10.1016/j.patrec.2017.10.024

Facial Attributes: Accuracy and Adversarial Robustness

Authors: Andras Rozsa, Manuel Günther, Ethan M. Rudd, Terrance E. Boult

Abstract: Facial attributes, emerging soft biometrics, must be automatically and reliably extracted from images in order to be usable in stand-alone systems. While recent methods extract facial attributes using deep neural networks (DNNs) trained on labeled facial attribute data, the robustness of deep attribute representations has not been evaluated. In this paper, we examine the representational stability of several approaches that recently advanced the state of the art on the CelebA benchmark by generating adversarial examples formed by adding small, non-random perturbations to inputs yielding altered classifications. We show that our fast flipping attribute (FFA) technique generates more adversarial examples than traditional algorithms, and that the adversarial robustness of DNNs varies highly between facial attributes. We also test the correlation of facial attributes and find that only for related attributes do the formed adversarial perturbations change the classification of others. Finally, we introduce the concept of natural adversarial samples, i.e., misclassified images where predictions can be corrected via small perturbations. We demonstrate that natural adversarial samples commonly occur and show that many of these images remain misclassified even with additional training epochs, even though their correct classification may require only a small adjustment to network parameters.

Submitted 20 April, 2018; v1 submitted 3 January, 2018; originally announced January 2018.

Comments: arXiv admin note: text overlap with arXiv:1605.05411

Journal ref: Pattern Recognition Letters, 2017, ISSN 0167-8655

arXiv:1708.02337 [pdf, other]

doi 10.1109/BTAS.2017.8272759

Unconstrained Face Detection and Open-Set Face Recognition Challenge

Authors: Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Guodong Guo, Cezary Stankiewicz, Terrance E. Boult

Abstract: Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success in images collected from the web, surveillance cameras include more diverse occlusions, poses, weather conditions and image blur. Although face verification or closed-set face identification have surpassed human capabilities on some datasets, open-set identification is much more complex as it needs to reject both unknown identities and false accepts from the face detector. We show that unconstrained face detection can approach high detection rates albeit with moderate false accept rates. By contrast, open-set face recognition is currently weak and requires much more attention.

Submitted 25 September, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

Comments: This is an ERRATA version of the paper originally presented at the International Joint Conference on Biometrics. Due to a bug in our evaluation code, the results of the participants changed. The final conclusion, however, remains the same.

arXiv:1708.01697 [pdf, other]

Adversarial Robustness: Softmax versus Openmax

Authors: Andras Rozsa, Manuel Günther, Terrance E. Boult

Abstract: Deep neural networks (DNNs) provide state-of-the-art results on various tasks and are widely used in real-world applications. However, it was discovered that machine learning models, including the best performing DNNs, suffer from a fundamental problem: they can unexpectedly and confidently misclassify examples formed by slightly perturbing otherwise correctly recognized inputs. Various approaches have been developed for efficiently generating these so-called adversarial examples, but those mostly rely on ascending the gradient of the loss. In this paper, we introduce the novel logits optimized targeting system (LOTS) to directly manipulate deep features captured at the penultimate layer. Using LOTS, we analyze and compare the adversarial robustness of DNNs using the traditional Softmax layer with Openmax, which was designed to provide open set recognition by defining classes derived from deep representations, and is claimed to be more robust to adversarial perturbations. We demonstrate that Openmax provides less vulnerable systems than Softmax to traditional attacks; however, we show that it can be equally susceptible to more sophisticated adversarial generation techniques that directly work on deep representations.

Submitted 4 August, 2017; originally announced August 2017.

Comments: Accepted to British Machine Vision Conference (BMVC) 2017

arXiv:1705.01567 [pdf, other]

Toward Open-Set Face Recognition

Authors: Manuel Günther, Steve Cruz, Ethan M. Rudd, Terrance E. Boult

Abstract: Much research has been conducted on both face identification and face verification, with greater focus on the latter. Research on face identification has mostly focused on using closed-set protocols, which assume that all probe images used in evaluation contain identities of subjects that are enrolled in the gallery. Real systems, however, where only a fraction of probe sample identities are enrolled in the gallery, cannot make this closed-set assumption. Instead, they must assume an open set of probe samples and be able to reject/ignore those that correspond to unknown identities. In this paper, we address the widespread misconception that thresholding verification-like scores is a good way to solve the open-set face identification problem, by formulating an open-set face identification protocol and evaluating different strategies for assessing similarity. Our open-set identification protocol is based on the canonical labeled faces in the wild (LFW) dataset. In addition to the known identities, we introduce the concepts of known unknowns (known, but uninteresting persons) and unknown unknowns (people never seen before) to the biometric community. We compare three algorithms for assessing similarity in a deep feature space under an open-set protocol: thresholded verification-like scores, linear discriminant analysis (LDA) scores, and extreme value machine (EVM) probabilities. Our findings suggest that thresholding EVM probabilities, which are open-set by design, outperforms thresholding verification-like scores.

Submitted 18 May, 2017; v1 submitted 3 May, 2017; originally announced May 2017.

Comments: Accepted for Publication in CVPR 2017 Biometrics Workshop
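The baseline this abstract argues against, thresholding verification-like scores, can be sketched as follows: score a probe against every enrolled gallery identity, keep the best match, and reject the probe as unknown if that score falls below a threshold. Cosine similarity on deep features, the dictionary-based gallery, the threshold value, and the function names are assumptions for illustration; the LDA and EVM alternatives compared in the paper are not shown.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def open_set_identify(
    probe: Sequence[float], gallery: Dict[str, Sequence[float]], threshold: float
) -> Optional[str]:
    # Score the probe against every enrolled identity and keep the best match.
    best_id, best_score = max(
        ((identity, cosine_similarity(probe, feature)) for identity, feature in gallery.items()),
        key=lambda pair: pair[1],
    )
    # Open-set decision: reject as unknown unless the best score clears the threshold.
    return best_id if best_score >= threshold else None


gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(open_set_identify([0.9, 0.1], gallery, threshold=0.8))  # alice
print(open_set_identify([1.0, 1.0], gallery, threshold=0.8))  # None: both scores are about 0.707
```

A single global threshold on similarity ignores how densely each identity's features are distributed, which is one intuition for why per-class, open-set-by-design scores such as EVM probabilities can reject unknowns more reliably.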

arXiv:1612.00138 [pdf, other]

Towards Robust Deep Neural Networks with BANG

Authors: Andras Rozsa, Manuel Gunther, Terrance E. Boult

Abstract: Machine learning models, including state-of-the-art deep neural networks, are vulnerable to small perturbations that cause unexpected classification errors. This unexpected lack of robustness raises fundamental questions about their generalization properties and poses a serious concern for practical deployments. As such perturbations can remain imperceptible (the formed adversarial examples demonstrate an inherent inconsistency between vulnerable machine learning models and human perception), some prior work casts this problem as a security issue. Despite the significance of the discovered instabilities and ensuing research, their cause is not well understood and no effective method has been developed to address the problem. In this paper, we present a novel theory to explain why this unpleasant phenomenon exists in deep neural networks. Based on that theory, we introduce a simple, efficient, and effective training approach, Batch Adjusted Network Gradients (BANG), which significantly improves the robustness of machine learning models. While the BANG technique does not rely on any form of data augmentation or the utilization of adversarial images for training, the resultant classifiers are more resistant to adversarial perturbations while maintaining or even enhancing the overall classification performance.

Submitted 30 January, 2018; v1 submitted 30 November, 2016; originally announced December 2016.

Comments: Accepted to the IEEE Winter Conference on Applications of Computer Vision (WACV), 2018

Showing 1–50 of 57 results for author: Guenther, M