Showing 1–50 of 57 results for author: Guenther, M

Searching in archive cs.
  1. arXiv:2506.18902  [pdf, ps, other]

    cs.AI cs.CL cs.IR

    jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval

    Authors: Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Bo Wang, Sedigheh Eslami, Scott Martens, Maximilian Werk, Nan Wang, Han Xiao

    Abstract: We introduce jina-embeddings-v4, a 3.8 billion parameter multimodal embedding model that unifies text and image representations through a novel architecture supporting both single-vector and multi-vector embeddings in the late interaction style. The model incorporates task-specific Low-Rank Adaptation (LoRA) adapters to optimize performance across diverse retrieval scenarios, including query-docum…

    Submitted 24 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: 22 pages, 1-10 main, 14-22 experimental results, benchmark tables

    MSC Class: 68T50 ACM Class: I.2.7

  2. arXiv:2504.07323  [pdf, ps, other]

    cs.CR cs.NI

    Prekey Pogo: Investigating Security and Privacy Issues in WhatsApp's Handshake Mechanism

    Authors: Gabriel K. Gegenhuber, Philipp É. Frenzel, Maximilian Günther, Aljosha Judmayer

    Abstract: WhatsApp, the world's largest messaging application, uses a version of the Signal protocol to provide end-to-end encryption (E2EE) with strong security guarantees, including Perfect Forward Secrecy (PFS). To ensure PFS right from the start of a new conversation -- even when the recipient is offline -- a stash of ephemeral (one-time) prekeys must be stored on a server. While the critical role of th…

    Submitted 16 June, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: USENIX WOOT Conference 2025

  3. arXiv:2503.18479  [pdf, other]

    physics.comp-ph cs.CE eess.SY

    Differentiable Simulator for Electrically Reconfigurable Electromagnetic Structures

    Authors: Johannes Müller, Dennis Philipp, Matthias Günther

    Abstract: This paper introduces a novel CUDA-enabled PyTorch-based framework designed for the gradient-based optimization of such reconfigurable electromagnetic structures with electrically tunable parameters. Traditional optimization techniques for these structures often rely on non-gradient-based methods, limiting efficiency and flexibility. Our framework leverages automatic differentiation, facilitating…

    Submitted 24 March, 2025; originally announced March 2025.

  4. arXiv:2502.13595  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    MMTEB: Massive Multilingual Text Embedding Benchmark

    Authors: Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa, et al. (61 additional authors not shown)

    Abstract: Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ langua…

    Submitted 8 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted for ICLR: https://openreview.net/forum?id=zl3pfz4VCV

  5. arXiv:2502.03359  [pdf, other]

    cs.CV cs.AI cs.LG

    GHOST: Gaussian Hypothesis Open-Set Technique

    Authors: Ryan Rabinowitz, Steve Cruz, Manuel Günther, Terrance E. Boult

    Abstract: Evaluations of large-scale recognition methods typically focus on overall performance. While this approach is common, it often fails to provide insights into performance across individual classes, which can lead to fairness issues and misrepresentation. Addressing these gaps is crucial for accurately assessing how well methods handle novel or unseen classes and ensuring a fair evaluation. To addre…

    Submitted 10 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: Accepted at AAAI Conference on Artificial Intelligence 2025

  6. arXiv:2412.08802  [pdf, other]

    cs.CL cs.CV cs.IR

    jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images

    Authors: Andreas Koukounas, Georgios Mastrapas, Sedigheh Eslami, Bo Wang, Mohammad Kalim Akram, Michael Günther, Isabelle Mohr, Saba Sturua, Nan Wang, Han Xiao

    Abstract: Contrastive Language-Image Pretraining (CLIP) has been widely used for crossmodal information retrieval and multimodal understanding tasks. However, CLIP models are mainly optimized for crossmodal vision-language tasks and underperform in single-mode text tasks. Moreover, these models are often trained on English datasets and therefore lack multilingual understanding. Additionally, from a visual u…

    Submitted 24 April, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

    Comments: 30 pages, 1-10 main paper, 10-12 refs, 12-30 benchmarks

    MSC Class: 68T50 ACM Class: I.2.7; I.2.10

  7. arXiv:2411.18147  [pdf, other]

    cs.RO cs.CV cs.LG

    Online Knowledge Integration for 3D Semantic Mapping: A Survey

    Authors: Felix Igelbrink, Marian Renz, Martin Günther, Piper Powell, Lennart Niecksch, Oscar Lima, Martin Atzmueller, Joachim Hertzberg

    Abstract: Semantic mapping is a key component of robots operating in and interacting with objects in structured environments. Traditionally, geometric and knowledge representations within a semantic map have only been loosely integrated. However, recent advances in deep learning now allow full integration of prior knowledge, represented as knowledge graphs or language concepts, into sensor data processing a…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Submitted to Robotics and Autonomous Systems

  8. arXiv:2411.11194  [pdf]

    cs.CR cs.NI

    Careless Whisper: Exploiting Stealthy End-to-End Leakage in Mobile Instant Messengers

    Authors: Gabriel K. Gegenhuber, Maximilian Günther, Markus Maier, Aljosha Judmayer, Florian Holzbauer, Philipp É. Frenzel, Johanna Ullrich

    Abstract: With over 3 billion users globally, mobile instant messaging apps have become indispensable for both personal and professional communication. Besides plain messaging, many services implement additional features such as delivery and read receipts informing a user when a message has successfully reached its target. This paper highlights that delivery receipts can pose significant privacy risks to us…

    Submitted 19 November, 2024; v1 submitted 17 November, 2024; originally announced November 2024.

  9. arXiv:2410.01498  [pdf, other]

    cs.CV

    Quo Vadis RankList-based System in Face Recognition?

    Authors: Xinyi Zhang, Manuel Günther

    Abstract: Face recognition in the wild has gained a lot of focus in the last few years, and many face recognition models are designed to verify faces in medium-quality images. Especially due to the availability of large training datasets with similar conditions, deep face recognition models perform exceptionally well in such tasks. However, in other tasks where substantially less training data is available,…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted for presentation at IJCB 2024

  10. arXiv:2409.10173  [pdf, other]

    cs.CL cs.AI cs.IR

    jina-embeddings-v3: Multilingual Embeddings With Task LoRA

    Authors: Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao

    Abstract: We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classificat…

    Submitted 19 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 20 pages, pp11-13 references, pp14-20 appendix and experiment tables

    MSC Class: 68T50 ACM Class: I.2.7

  11. arXiv:2409.07220  [pdf, other]

    cs.CV

    Watchlist Challenge: 3rd Open-set Face Detection and Identification

    Authors: Furkan Kasım, Terrance E. Boult, Rensso Mora, Bernardo Biesseck, Rafael Ribeiro, Jan Schlueter, Tomáš Repák, Rafael Henrique Vareto, David Menotti, William Robson Schwartz, Manuel Günther

    Abstract: In the current landscape of biometrics and surveillance, the ability to accurately recognize faces in uncontrolled settings is paramount. The Watchlist Challenge addresses this critical need by focusing on face detection and open-set identification in real-world surveillance scenarios. This paper presents a comprehensive evaluation of participating algorithms, using the enhanced UnConstrained Coll…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted for presentation at IJCB 2024

  12. arXiv:2409.04701  [pdf, other]

    cs.CL cs.IR

    Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

    Authors: Michael Günther, Isabelle Mohr, Daniel James Williams, Bo Wang, Han Xiao

    Abstract: Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be over-compressed in the embeddings. Consequently, practitioners often split text documents into smaller chunks and encode them separately. However, chunk embeddings created in this way can lose contextual informa…

    Submitted 2 October, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: 11 pages, 3rd draft

    MSC Class: 68T50 ACM Class: I.2.7

  13. arXiv:2409.02629  [pdf, ps, other]

    cs.CV cs.AI cs.CR cs.LG

    AdvSecureNet: A Python Toolkit for Adversarial Machine Learning

    Authors: Melih Catal, Manuel Günther

    Abstract: Machine learning models are vulnerable to adversarial attacks. Several tools have been developed to research these vulnerabilities, but they often lack comprehensive features and flexibility. We introduce AdvSecureNet, a PyTorch based toolkit for adversarial machine learning that is the first to natively support multi-GPU setups for attacks, defenses, and evaluation. It is the first toolkit that s…

    Submitted 4 September, 2024; originally announced September 2024.

  14. arXiv:2408.16672  [pdf, other]

    cs.IR cs.AI cs.CL

    Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever

    Authors: Rohan Jha, Bo Wang, Michael Günther, Georgios Mastrapas, Saba Sturua, Isabelle Mohr, Andreas Koukounas, Mohammad Kalim Akram, Nan Wang, Han Xiao

    Abstract: Multi-vector dense models, such as ColBERT, have proven highly effective in information retrieval. ColBERT's late interaction scoring approximates the joint query-document attention seen in cross-encoders while maintaining inference efficiency closer to traditional dense retrieval models, thanks to its bi-encoder architecture and recent optimizations in indexing and search. In this work we propose…

    Submitted 14 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: 8 pages, references at pp7,8; EMNLP workshop submission

    MSC Class: 68T50 ACM Class: I.2.7

  15. arXiv:2407.14087  [pdf, other]

    cs.CV

    Score Normalization for Demographic Fairness in Face Recognition

    Authors: Yu Linghu, Tiago de Freitas Pereira, Christophe Ecabert, Sébastien Marcel, Manuel Günther

    Abstract: Fair biometric algorithms have similar verification performance across different demographic groups given a single decision threshold. Unfortunately, for state-of-the-art face recognition networks, score distributions differ between demographics. Contrary to work that tries to align those distributions by extra training or fine-tuning, we solely focus on score post-processing methods. As proved, w…

    Submitted 22 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted for presentation at IJCB 2024

  16. arXiv:2407.14064  [pdf, other]

    cs.CV

    Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability

    Authors: Özgür Acar Güler, Manuel Günther, André Anjos

    Abstract: Automatic classification of active tuberculosis from chest X-ray images has the potential to save lives, especially in low- and mid-income countries where skilled human experts can be scarce. Given the lack of available labeled data to train such systems and the unbalanced nature of publicly available datasets, we argue that the reliability of deep learning models is limited, even if they can be s…

    Submitted 8 October, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Preprint of paper presented at EUVIP 2024

  17. arXiv:2406.18726  [pdf, other]

    eess.SY cs.LG math.NA

    Data-driven identification of port-Hamiltonian DAE systems by Gaussian processes

    Authors: Peter Zaspel, Michael Günther

    Abstract: Port-Hamiltonian systems (pHS) allow for a structure-preserving modeling of dynamical systems. Coupling pHS via linear relations between input and output defines an overall pHS, which is structure preserving. However, in multiphysics applications, some subsystems do not allow for a physical pHS description, as (a) this is not available or (b) too expensive. Here, data-driven approaches can be used…

    Submitted 26 June, 2024; originally announced June 2024.

  18. arXiv:2406.09112  [pdf, other]

    cs.CV cs.AI cs.LG

    Large-Scale Evaluation of Open-Set Image Classification Techniques

    Authors: Halil Bisgin, Andres Palechor, Mike Suter, Manuel Günther

    Abstract: The goal for classification is to correctly assign labels to unseen samples. However, most methods misclassify samples with unseen labels and assign them to one of the known classes. Open-Set Classification (OSC) algorithms aim to maximize both closed and open-set recognition capabilities. Recent studies showed the utility of such algorithms on small-scale data sets, but limited experimentation ma…

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2405.20204  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR

    Jina CLIP: Your CLIP Model Is Also Your Text Retriever

    Authors: Andreas Koukounas, Georgios Mastrapas, Michael Günther, Bo Wang, Scott Martens, Isabelle Mohr, Saba Sturua, Mohammad Kalim Akram, Joan Fontanals Martínez, Saahil Ognawala, Susana Guzman, Maximilian Werk, Nan Wang, Han Xiao

    Abstract: Contrastive Language-Image Pretraining (CLIP) is widely used to train models to align images and texts in a common embedding space by mapping them to fixed-sized vectors. These models are key to multimodal information retrieval and related tasks. However, CLIP models generally underperform in text-only tasks compared to specialized text models. This creates inefficiencies for information retrieval…

    Submitted 26 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 4 pages, MFM-EAI@ICML2024

    MSC Class: 68T50 ACM Class: I.2.7

  20. arXiv:2404.18767  [pdf, ps, other]

    cs.CE

    A Port-Hamiltonian System Perspective on Electromagneto-Quasistatic Field Formulations of Darwin-Type

    Authors: Markus Clemens, Marvin-Lucas Henkel, Fotios Kasolis, Michael Günther

    Abstract: Electromagneto-quasistatic (EMQS) field formulations are often dubbed Darwin-type field formulations, which approximate the Maxwell equations by neglecting radiation effects while modelling resistive, capacitive, and inductive effects. A common feature of EMQS field models is the Darwin-Ampère equation formulated with the magnetic vector potential and the electric scalar potential. EMQS field fo…

    Submitted 11 September, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages, 0 figures, pre-submission version (preprint), presented at and submitted to the proceedings of "The 15th International Conference on Scientific Computing in Electrical Engineering" (SCEE 2024), March 4-8, 2024, Darmstadt, Germany

  21. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (17 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

    Submitted 5 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  22. arXiv:2403.14435  [pdf, other]

    cs.CV cs.AI cs.LG

    Biased Binary Attribute Classifiers Ignore the Majority Classes

    Authors: Xinyi Zhang, Johanna Sophie Bieri, Manuel Günther

    Abstract: To visualize the regions of interest that classifiers base their decisions on, different Class Activation Mapping (CAM) methods have been developed. However, all of these techniques target categorical classifiers only, though most real-world tasks are binary classification. In this paper, we extend gradient-based CAM techniques to work with binary classifiers and visualize the active regions for b…

    Submitted 21 March, 2024; originally announced March 2024.

  23. arXiv:2402.17016  [pdf, other]

    cs.CL cs.AI cs.IR

    Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings

    Authors: Isabelle Mohr, Markus Krimmel, Saba Sturua, Mohammad Kalim Akram, Andreas Koukounas, Michael Günther, Georgios Mastrapas, Vinit Ravishankar, Joan Fontanals Martínez, Feng Wang, Qi Liu, Ziniu Yu, Jie Fu, Saahil Ognawala, Susana Guzman, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

    Abstract: We introduce a novel suite of state-of-the-art bilingual text embedding models that are designed to support English and another target language. These models are capable of processing lengthy text inputs with up to 8192 tokens, making them highly versatile for a range of natural language processing tasks such as text retrieval, clustering, and semantic textual similarity (STS) calculations. By f…

    Submitted 26 February, 2024; originally announced February 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  24. arXiv:2312.14250  [pdf, other]

    cs.CR cs.PL

    HElium: A Language and Compiler for Fully Homomorphic Encryption with Support for Proxy Re-Encryption

    Authors: Mirko Günther, Lars Schütze, Kilian Becher, Thorsten Strufe, Jeronimo Castrillon

    Abstract: Privacy-preserving analysis of confidential data can increase the value of such data and even improve people's lives. Fully homomorphic encryption (FHE) can enable privacy-preserving analysis. However, FHE adds a large amount of computational overhead and its efficient use requires a high level of expertise. Compilers can automate certain aspects such as parameterization and circuit optimizations.…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 11 pages, 8 figures, 1 algorithm

  25. arXiv:2311.00400  [pdf, other]

    cs.CV

    Open-Set Face Recognition with Maximal Entropy and Objectosphere Loss

    Authors: Rafael Henrique Vareto, Yu Linghu, Terrance E. Boult, William Robson Schwartz, Manuel Günther

    Abstract: Open-set face recognition characterizes a scenario where unknown individuals, unseen during the training and enrollment stages, appear at operation time. This work concentrates on watchlists, an open-set task that is expected to operate at a low False Positive Identification Rate and generally includes only a few enrollment samples per identity. We introduce a compact adapter network that benefits…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in Image and Vision Computing 2023

  26. arXiv:2310.19923  [pdf, other]

    cs.CL cs.AI cs.LG

    Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

    Authors: Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

    Abstract: Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often…

    Submitted 4 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 14 pages

    MSC Class: 68T50 ACM Class: I.2.7

  27. A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems

    Authors: Xunzhao Yin, Yu Qian, Alptekin Vardar, Marcel Gunther, Franz Muller, Nellie Laleni, Zijian Zhao, Zhouhang Jiang, Zhiguo Shi, Yiyu Shi, Xiao Gong, Cheng Zhuo, Thomas Kampfe, Kai Ni

    Abstract: Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in developing computing hardware tailored specifically for COP…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 39 pages, 12 figures

  28. arXiv:2308.12371  [pdf, other]

    cs.CV cs.AI cs.LG

    Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation

    Authors: Rafael Henrique Vareto, Manuel Günther, William Robson Schwartz

    Abstract: Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of inter…

    Submitted 23 August, 2023; originally announced August 2023.

    Journal ref: 36th Conference on Graphics, Patterns and Images (SIBGRAPI 2023)

  29. arXiv:2308.03666  [pdf, other]

    stat.ML cs.LG

    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

    Authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo

    Abstract: As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence…

    Submitted 18 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  30. arXiv:2307.11224  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

    Authors: Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao

    Abstract: Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating textual inputs into numerical representations, capturing the semantics of the text. These models excel in applications like dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet dataset…

    Submitted 20 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 9 pages, 2 page appendix

    MSC Class: 68T50 ACM Class: H.3.1; H.3.3; I.2.7; I.5.4

  31. arXiv:2210.07356  [pdf, other]

    cs.CV

    Consistency and Accuracy of CelebA Attribute Values

    Authors: Haiyu Wu, Grace Bezold, Manuel Günther, Terrance Boult, Michael C. King, Kevin W. Bowyer

    Abstract: We report the first systematic analysis of the experimental foundations of facial attribute classification. Two annotators independently assigning attribute values shows that only 12 of 40 common attributes are assigned values with >= 95% consistency, and three (high cheekbones, pointed nose, oval face) have essentially random consistency. Of 5,068 duplicate face appearances in CelebA, attributes…

    Submitted 16 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

  32. arXiv:2210.06789  [pdf, other]

    cs.CV cs.LG

    Large-Scale Open-Set Classification Protocols for ImageNet

    Authors: Andres Palechor, Annesha Bhoumik, Manuel Günther

    Abstract: Open-Set Classification (OSC) intends to adapt closed-set classification models to real-world scenarios, where the classifier must correctly label samples of known classes while rejecting previously unseen unknown samples. Only recently, research started to investigate algorithms that are able to handle these unknown samples correctly. Some of these approaches address OSC by including into the…

    Submitted 18 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: This is a pre-print of the original paper accepted at the Winter Conference on Applications of Computer Vision (WACV) 2023

  33. arXiv:2209.01473  [pdf, other]

    cs.SE

    Model-based Analysis and Specification of Functional Requirements and Tests for Complex Automotive Systems

    Authors: Carsten Wiecher, Constantin Mandel, Matthias Günther, Jannik Fischbach, Joel Greenyer, Matthias Greinert, Carsten Wolff, Roman Dumitrescu, Daniel Mendez, Albert Albers

    Abstract: The specification of requirements and tests are crucial activities in automotive development projects. However, due to the increasing complexity of automotive systems, practitioners fail to specify requirements and tests for distributed and evolving systems with complex interactions when following traditional development processes. To address this research gap, we propose a technique that starts w…

    Submitted 15 November, 2023; v1 submitted 3 September, 2022; originally announced September 2022.

  34. arXiv:2208.04040  [pdf, other]

    cs.CV

    Eight Years of Face Recognition Research: Reproducibility, Achievements and Open Issues

    Authors: Tiago de Freitas Pereira, Dominic Schmidli, Yu Linghu, Xinyi Zhang, Sébastien Marcel, Manuel Günther

    Abstract: Automatic face recognition is a research area with high popularity. Many different face recognition algorithms have been proposed in the last thirty years of intensive research in the field. With the popularity of deep learning and its capability to solve a huge variety of different problems, face recognition researchers have concentrated effort on creating better models under this paradigm. From…

    Submitted 9 August, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

  35. arXiv:2205.07985  [pdf]

    cs.AI cs.PL

    Expert Systems with Logic#. A Novel Modeling Framework for Logic Programming in an Object-Oriented Context of C#

    Authors: F. Lorenz, M. Günther

    Abstract: We present a novel approach for declaring logic programming for expert systems directly in an object-oriented language.

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: 23 pages, 4 figures, 4 tables, 7 appendices

    ACM Class: I.2.1; I.2.5; D.1.6

  36. arXiv:2204.06286  [pdf, other]

    cs.CE

    Electromagnetic Quasistatic Field Formulations of Darwin Type

    Authors: Markus Clemens, Marvin-Lucas Henkel, Fotios Kasolis, Michael Günther, Herbert De Gersem, Sebastian Schöps

    Abstract: Electromagnetic quasistatic (EMQS) fields, where radiation effects are neglected, while Ohmic losses and electric and magnetic field energies are considered, can be modeled using Darwin-type field models as an approximation to the full Maxwell equations. Commonly formulated in terms of magnetic vector and electric scalar potentials, these EMQS formulations are not gauge invariant. Several EMQS for…

    Submitted 13 April, 2022; originally announced April 2022.

    MSC Class: 78A30; 78M12 ACM Class: I.6; J.6

    Journal ref: ICS Newsletter, vol. 29, no. 1, pages 3-9, ISSN 1026-0854, 2022

  37. arXiv:2201.09946  [pdf, other]

    eess.AS cs.SD

    Microphone Utility Estimation in Acoustic Sensor Networks using Single-Channel Signal Features

    Authors: Michael Günther, Andreas Brendel, Walter Kellermann

    Abstract: In multichannel signal processing with distributed sensors, choosing the optimal subset of observed sensor signals to be exploited is crucial in order to maximize algorithmic performance and reduce computational load, ideally both at the same time. In the acoustic domain, signal cross-correlation is a natural choice to quantify the usefulness of microphone signals, i.e., microphone utility, for ar…

    Submitted 14 January, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: submitted to EURASIP Journal on Audio, Speech, and Music Processing

  38. arXiv:2109.07011  [pdf, other]

    astro-ph.SR astro-ph.EP cs.LG nlin.AO

    Testing Self-Organized Criticality Across the Main Sequence using Stellar Flares from TESS

    Authors: Adina D. Feinstein, Darryl Z. Seligman, Maximilian N. Günther, Fred C. Adams

    Abstract: Self-organized criticality describes a class of dynamical systems that maintain themselves in an attractor state with no intrinsic length or time scale. Fundamentally, this theoretical construct requires a mechanism for instability that may trigger additional instabilities locally via dissipative processes. This concept has been invoked to explain nonlinear dynamical phenomena such as featureless…

    Submitted 12 January, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: 6 pages, 3 figures, Accepted to ApJL

  39. YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation

    Authors: Till Grenzdörffer, Martin Günther, Joachim Hertzberg

    Abstract: While a great variety of 3D cameras have been introduced in recent years, most publicly available datasets for object recognition and pose estimation focus on one single camera. In this work, we present a dataset of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames. This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the used c…

    Submitted 29 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: Published at ICRA-2020

  40. arXiv:2004.00517  [pdf, ps, other]

    cs.SI

    Tracing Contacts to Control the COVID-19 Pandemic

    Authors: Christoph Günther, Michael Günther, Daniel Günther

    Abstract: The control of the COVID-19 pandemic requires a considerable reduction of contacts, mostly achieved by imposing movement control up to the level of enforced quarantine. This has led to a collapse of substantial parts of the economy. Carriers of the disease are infectious roughly 3 days after exposure to the virus. First symptoms occur later or not at all. As a consequence tracing the contacts of p…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: 5 pages, no figures

  41. arXiv:2002.08672  [pdf, other]

    math.OC cs.CE

    GivEn -- Shape Optimization for Gas Turbines in Volatile Energy Networks

    Authors: Jan Backhaus, Matthias Bolten, Onur Tanil Doganay, Matthias Ehrhardt, Benedikt Engel, Christian Frey, Hanno Gottschalk, Michael Günther, Camilla Hahn, Jens Jäschke, Peter Jaksch, Kathrin Klamroth, Alexander Liefke, Daniel Luft, Lucas Mäde, Vincent Marciniak, Marco Reese, Johanna Schultes, Volker Schulz, Sebastian Schmitz, Johannes Steiner, Michael Stiglmayr

    Abstract: This paper describes the project GivEn that develops a novel multicriteria optimization process for gas turbine blades and vanes using modern "adjoint" shape optimization algorithms. Given the many start and shut-down processes of gas power plants in volatile energy grids, besides optimizing gas turbine geometries for efficiency, the durability understood as minimization of the probability of fail…

    Submitted 20 February, 2020; originally announced February 2020.

    ACM Class: G.1.6; G.3; G.1.8

  42. arXiv:1911.12674  [pdf, other]

    cs.DB cs.CL cs.LG

    RETRO: Relation Retrofitting For In-Database Machine Learning on Textual Data

    Authors: Michael Günther, Maik Thiele, Wolfgang Lehner

    Abstract: There are massive amounts of textual data residing in databases, valuable for many machine learning (ML) tasks. Since ML techniques depend on numerical input representations, word embeddings are increasingly utilized to convert symbolic representations such as text into meaningful numbers. However, a naive one-to-one mapping of each word in a database to a word embedding vector is not sufficient a…

    Submitted 22 January, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: 14 pages

    MSC Class: H.2.8; H.3.3; I.2.7 ACM Class: H.2.8; H.3.3; I.2.7

  43. arXiv:1909.02871  [pdf, other]

    cs.DC cs.NI

    Galois Field Arithmetics for Linear Network Coding using AVX512 Instruction Set Extensions

    Authors: Stephan M. Günther, Nicolas Appel, Georg Carle

    Abstract: Linear network coding requires arithmetic operations over Galois fields, more specifically over finite extension fields. While coding over GF(2) reduces to simple XOR operations, this field is less preferred for practical applications of random linear network coding due to high chances of linear dependencies and therefore redundant coded packets. Coding over larger fields such as GF(16) and GF(256…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: 6 pages, 2 figures, the updated finite field library is available under the LGPL at https://moep80211.net/plink/libmoepgf-avx512

  44. Constrained Hybrid Monte Carlo algorithms for gauge-Higgs models

    Authors: Michael Günther, Roman Höllwieser, Francesco Knechtli

    Abstract: We develop Hybrid Monte Carlo (HMC) algorithms for constrained Hamiltonian systems of gauge-Higgs models and introduce a new observable for the constraint effective Higgs potential. We use an extension of the so-called Rattle algorithm to general Hamiltonians for constrained systems, which we adapt to the 4D Abelian-Higgs model and the 5D SU(2) gauge theory on the torus and on the orbifold. The d…

    Submitted 28 January, 2020; v1 submitted 7 August, 2019; originally announced August 2019.

    Comments: added comparison to one-loop potential in section 3.3, improved text; version accepted for publication in Computer Physics Communications

  45. arXiv:1811.04110  [pdf, other]

    cs.CV

    Reducing Network Agnostophobia

    Authors: Akshay Raj Dhamija, Manuel Günther, Terrance E. Boult

    Abstract: Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. Unfortunately, network behavior is not well defined for inputs far from a network's training set. In an uncontrolled environment, networks face many instances that are not of interest to them and have to be rejected in order to avoid a false positive. This…

    Submitted 22 December, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: Neural Information Processing Systems (NeurIPS) 2018

  46. Facial Attributes: Accuracy and Adversarial Robustness

    Authors: Andras Rozsa, Manuel Günther, Ethan M. Rudd, Terrance E. Boult

    Abstract: Facial attributes, emerging soft biometrics, must be automatically and reliably extracted from images in order to be usable in stand-alone systems. While recent methods extract facial attributes using deep neural networks (DNNs) trained on labeled facial attribute data, the robustness of deep attribute representations has not been evaluated. In this paper, we examine the representational stability…

    Submitted 20 April, 2018; v1 submitted 3 January, 2018; originally announced January 2018.

    Comments: arXiv admin note: text overlap with arXiv:1605.05411

    Journal ref: Pattern Recognition Letters, 2017, ISSN 0167-8655

  47. Unconstrained Face Detection and Open-Set Face Recognition Challenge

    Authors: Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Guodong Guo, Cezary Stankiewicz, Terrance E. Boult

    Abstract: Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success in images collected from the web, surveillance cameras include more diverse occlusions, poses…

    Submitted 25 September, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

    Comments: This is an ERRATA version of the paper originally presented at the International Joint Conference on Biometrics. Due to a bug in our evaluation code, the results of the participants changed. The final conclusion, however, is still the same

  48. arXiv:1708.01697  [pdf, other]

    cs.CV

    Adversarial Robustness: Softmax versus Openmax

    Authors: Andras Rozsa, Manuel Günther, Terrance E. Boult

    Abstract: Deep neural networks (DNNs) provide state-of-the-art results on various tasks and are widely used in real world applications. However, it was discovered that machine learning models, including the best performing DNNs, suffer from a fundamental problem: they can unexpectedly and confidently misclassify examples formed by slightly perturbing otherwise correctly recognized inputs. Various approaches…

    Submitted 4 August, 2017; originally announced August 2017.

    Comments: Accepted to British Machine Vision Conference (BMVC) 2017

  49. arXiv:1705.01567  [pdf, other]

    cs.CV

    Toward Open-Set Face Recognition

    Authors: Manuel Günther, Steve Cruz, Ethan M. Rudd, Terrance E. Boult

    Abstract: Much research has been conducted on both face identification and face verification, with greater focus on the latter. Research on face identification has mostly focused on using closed-set protocols, which assume that all probe images used in evaluation contain identities of subjects that are enrolled in the gallery. Real systems, however, where only a fraction of probe sample identities are enrol…

    Submitted 18 May, 2017; v1 submitted 3 May, 2017; originally announced May 2017.

    Comments: Accepted for Publication in CVPR 2017 Biometrics Workshop

  50. arXiv:1612.00138  [pdf, other]

    cs.CV

    Towards Robust Deep Neural Networks with BANG

    Authors: Andras Rozsa, Manuel Gunther, Terrance E. Boult

    Abstract: Machine learning models, including state-of-the-art deep neural networks, are vulnerable to small perturbations that cause unexpected classification errors. This unexpected lack of robustness raises fundamental questions about their generalization properties and poses a serious concern for practical deployments. As such perturbations can remain imperceptible - the formed adversarial examples demon…

    Submitted 30 January, 2018; v1 submitted 30 November, 2016; originally announced December 2016.

    Comments: Accepted to the IEEE Winter Conference on Applications of Computer Vision (WACV), 2018