Search | arXiv e-print repository

LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts

Authors: Henrique Da Silva Gameiro, Andrei Kucharavy, Ljiljana Dolamic

Abstract: With the emergence of widely available powerful LLMs, disinformation generated by large Language Models (LLMs) has become a major concern. Historically, LLM detectors have been touted as a solution, but their effectiveness in the real world is still to be proven. In this paper, we focus on an important setting in information operations -- short news-like posts generated by moderately sophisticated… ▽ More With the emergence of widely available powerful LLMs, disinformation generated by large Language Models (LLMs) has become a major concern. Historically, LLM detectors have been touted as a solution, but their effectiveness in the real world is still to be proven. In this paper, we focus on an important setting in information operations -- short news-like posts generated by moderately sophisticated attackers. We demonstrate that existing LLM detectors, whether zero-shot or purpose-trained, are not ready for real-world use in that setting. All tested zero-shot detectors perform inconsistently with prior benchmarks and are highly vulnerable to sampling temperature increase, a trivial attack absent from recent benchmarks. A purpose-trained detector generalizing across LLMs and unseen attacks can be developed, but it fails to generalize to new human-written texts. We argue that the former indicates domain-specific benchmarking is needed, while the latter suggests a trade-off between the adversarial evasion resilience and overfitting to the reference human text, with both needing evaluation in benchmarks and currently absent. We believe this suggests a re-consideration of current LLM detector benchmarking approaches and provides a dynamically extensible benchmark to allow it (https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection). △ Less

Submitted 27 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: 20 pages, 7 tables, 13 figures, under consideration for EMNLP

ACM Class: I.2.7; K.6.5

arXiv:2312.07110 [pdf, other]

LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature

Authors: Maxime Würsch, Andrei Kucharavy, Dimitri Percia David, Alain Mermoud

Abstract: The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. It has been demonstrated that standard bibliometrics approaches show their limits in such a fast-evolving domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-re… ▽ More The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. It has been demonstrated that standard bibliometrics approaches show their limits in such a fast-evolving domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but our results show some potential for noun extractors. For this reason, we developed a noun extractor boosted with some statistical analysis to extract specific and relevant compound nouns from the domain. Later, we tested our model to identify trends in the LLM domain. We observe some limitations, but it offers promising results to monitor the evolution of emergent trends. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: 24 pages, 9 figures

arXiv:2306.09991 [pdf, other]

Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning

Authors: Andrei Kucharavy, Rachid Guerraoui, Ljiljana Dolamic

Abstract: Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Arti… ▽ More Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Artificial Neural Networks (ANNs), leading to better machine learning algorithms. However, SGD is not applicable in a non-differentiable setting, leaving all that prior research off the table. In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model for natural evolution is formally equivalent to SGD in certain settings and, in practice, is well adapted to large ANNs. We refer to such EAs as Gillespie-Orr EA class (GO-EAs) and empirically show how an insight transfer from SGD can work for them. We then show that for ANNs trained to near-optimality or in the transfer learning setting, the equivalence also allows transferring the insights from the Mutational Landscapes model to SGD. We then leverage this equivalence to experimentally show how SGD and GO-EAs can provide mutual insight through examples of minima flatness, transfer learning, and mixing of individuals in EAs applied to large models. △ Less

Submitted 20 May, 2023; originally announced June 2023.

Comments: To be published in ALIFE 2023; 16 pages, 10 figures, 1 listing

ACM Class: I.2.8; G.1.6

arXiv:2304.13540 [pdf, ps, other]

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Authors: Andrei Kucharavy, Matteo Monti, Rachid Guerraoui, Ljiljana Dolamic

Abstract: Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them. Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly… ▽ More Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them. Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly pressing concern. While distributed byzantine resilient algorithms have been proposed in a differentiable setting, none exist in a gradient-free setting. The goal of this work is to address this shortcoming. For that, we introduce a more general definition of byzantine-resilience in ML - the \textit{model-consensus}, that extends the definition of the classical distributed consensus. We then leverage this definition to show that a general class of gradient-free ML algorithms - ($1,λ$)-Evolutionary Search - can be combined with classical distributed consensus algorithms to generate gradient-free byzantine-resilient distributed learning algorithms. We provide proofs and pseudo-code for two specific cases - the Total Order Broadcast and proof-of-work leader election. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: 10 pages, 4 listings, 2 theorems

ACM Class: I.2.11; D.1.3; F.1.2

arXiv:2304.08968 [pdf, other]

Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs

Authors: Da Silva Gameiro Henrique, Andrei Kucharavy, Rachid Guerraoui

Abstract: The self-attention revolution allowed generative language models to scale and achieve increasingly impressive abilities. Such models - commonly referred to as Large Language Models (LLMs) - have recently gained prominence with the general public, thanks to conversational fine-tuning, putting their behavior in line with public expectations regarding AI. This prominence amplified prior concerns rega… ▽ More The self-attention revolution allowed generative language models to scale and achieve increasingly impressive abilities. Such models - commonly referred to as Large Language Models (LLMs) - have recently gained prominence with the general public, thanks to conversational fine-tuning, putting their behavior in line with public expectations regarding AI. This prominence amplified prior concerns regarding the misuse of LLMs and led to the emergence of numerous tools to detect LLMs in the wild. Unfortunately, most such tools are critically flawed. While major publications in the LLM detectability field suggested that LLMs were easy to detect with fine-tuned autoencoders, the limitations of their results are easy to overlook. Specifically, they assumed publicly available generative models without fine-tunes or non-trivial prompts. While the importance of these assumptions has been demonstrated, until now, it remained unclear how well such detection could be countered. Here, we show that an attacker with access to such detectors' reference human texts and output not only evades detection but can fully frustrate the detector training - with a reasonable budget and all its outputs labeled as such. Achieving it required combining common "reinforcement from critic" loss function modification and AdamW optimizer, which led to surprisingly good fine-tuning generalization. Finally, we warn against the temptation to transpose the conclusions obtained in RNN-driven text GANs to LLMs due to their better representative ability. These results have critical implications for the detection and prevention of malicious use of generative language models, and we hope they will aid the designers of generative models and detectors. △ Less

Submitted 18 April, 2023; originally announced April 2023.

Comments: 15 pages, 6 figures; 10 pages, 7 figures Supplementary Materials; under review at ECML 2023

ACM Class: I.2.7; K.6.5

arXiv:2303.12132 [pdf, other]

Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense

Authors: Andrei Kucharavy, Zachary Schillaci, Loïc Maréchal, Maxime Würsch, Ljiljana Dolamic, Remi Sabonnadiere, Dimitri Percia David, Alain Mermoud, Vincent Lenders

Abstract: Generative Language Models gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational models). Arguably the focal point of public attention has been such a refinement of the GPT3 model -- the ChatGPT and its subsequent integration with auxiliary capabilities, including… ▽ More Generative Language Models gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational models). Arguably the focal point of public attention has been such a refinement of the GPT3 model -- the ChatGPT and its subsequent integration with auxiliary capabilities, including search as part of Microsoft Bing. Despite extensive prior research invested in their development, their performance and applicability to a range of daily tasks remained unclear and niche. However, their wider utilization without a requirement for technical expertise, made in large part possible through conversational fine-tuning, revealed the extent of their true capabilities in a real-world environment. This has garnered both public excitement for their potential applications and concerns about their capabilities and potential malicious uses. This review aims to provide a brief overview of the history, state of the art, and implications of Generative Language Models in terms of their principles, abilities, limitations, and future prospects -- especially in the context of cyber-defense, with a focus on the Swiss operational environment. △ Less

Submitted 21 March, 2023; originally announced March 2023.

Comments: 41 pages (without references), 13 figures; public report of Cyber-Defence Campus

ACM Class: I.2.7; I.2.1; K.6.5; K.4.2; J.7

arXiv:2206.00282 [pdf, other]

Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale

Authors: Cyril Vallez, Andrei Kucharavy, Ljiljana Dolamic

Abstract: The advent of the internet, followed shortly by the social media made it ubiquitous in consuming and sharing information between anyone with access to it. The evolution in the consumption of media driven by this change, led to the emergence of images as means to express oneself, convey information and convince others efficiently. With computer vision algorithms progressing radically over the last… ▽ More The advent of the internet, followed shortly by the social media made it ubiquitous in consuming and sharing information between anyone with access to it. The evolution in the consumption of media driven by this change, led to the emergence of images as means to express oneself, convey information and convince others efficiently. With computer vision algorithms progressing radically over the last decade, it is become easier and easier to study at scale the role of images in the flow of information online. While the research questions and overall pipelines differ radically, almost all start with a crucial first step - evaluation of global perceptual similarity between different images. That initial step is crucial for overall pipeline performance and processes most images. A number of algorithms are available and currently used to perform it, but so far no comprehensive review was available to guide the choice of researchers as to the choice of an algorithm best suited to their question, assumptions and computational resources. With this paper we aim to fill this gap, showing that classical computer vision methods are not necessarily the best approach, whereas a pair of relatively little used methods - Dhash perceptual hash and SimCLR v2 ResNets achieve excellent performance, scale well and are computationally efficient. △ Less

Submitted 1 June, 2022; originally announced June 2022.

Comments: 26 pages, 10 figures

ACM Class: H.3.1; I.4.10; I.4.7; I.5.5; I.5.4; K.4

arXiv:2108.12275 [pdf, other]

Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?

Authors: Kevin Blin, Andrei Kucharavy

Abstract: In this paper we address the problem of fine-tuned text generation with a limited computational budget. For that, we use a well-performing text generative adversarial network (GAN) architecture - Diversity-Promoting GAN (DPGAN), and attempted a drop-in replacement of the LSTM layer with a self-attention-based Transformer layer in order to leverage their efficiency. The resulting Self-Attention DPG… ▽ More In this paper we address the problem of fine-tuned text generation with a limited computational budget. For that, we use a well-performing text generative adversarial network (GAN) architecture - Diversity-Promoting GAN (DPGAN), and attempted a drop-in replacement of the LSTM layer with a self-attention-based Transformer layer in order to leverage their efficiency. The resulting Self-Attention DPGAN (SADPGAN) was evaluated for performance, quality and diversity of generated text and stability. Computational experiments suggested that a transformer architecture is unable to drop-in replace the LSTM layer, under-performing during the pre-training phase and undergoing a complete mode collapse during the GAN tuning phase. Our results suggest that the transformer architecture need to be adapted before it can be used as a replacement for RNNs in text-generating GANs. △ Less

Submitted 26 August, 2021; originally announced August 2021.

Comments: accepted to RANLP 2021

MSC Class: 68T50; 68T05 ACM Class: I.2.7

arXiv:2007.14585 [pdf]

On the Transcriptomic Signature and General Stress State Associated with Aneuploidy

Authors: Hung-Ji Tsai, Anjali R. Nelliat, Andrei Kucharavy, Mohammad Ikbal Choudhury, Sean X. Sun, Michael C. Schatz, Rong Li

Abstract: Whether aneuploid cells with diverse karyotypes have any properties in common has a been a subject of intense interest. A recent study by Terhorst et al. (1) reinvestigated the common aneuploidy gene expression (CAGE), disputing the conclusion of our recent work (2). In this short article, which has been submitted to PNAS as a Letter to the Editor, we explain our major concerns about Terhorst et a… ▽ More Whether aneuploid cells with diverse karyotypes have any properties in common has a been a subject of intense interest. A recent study by Terhorst et al. (1) reinvestigated the common aneuploidy gene expression (CAGE), disputing the conclusion of our recent work (2). In this short article, which has been submitted to PNAS as a Letter to the Editor, we explain our major concerns about Terhorst et al. and why we believe that our previous conclusion stands valid. △ Less

Submitted 28 July, 2020; originally announced July 2020.

Comments: 1 page, no figure, with new analyses (a letter to PNAS Editor)

arXiv:2006.04720 [pdf, other]

Host-Pathongen Co-evolution Inspired Algorithm Enables Robust GAN Training

Authors: Andrei Kucharavy, El Mahdi El Mhamdi, Rachid Guerraoui

Abstract: Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the ge… ▽ More Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the generation of impressive imitations of real-life films, images and texts, whose fakeness is barely noticeable to humans. Despite their impressive performance, training GANs remains to this day more of an art than a reliable procedure, in a large part due to training process stability. Generators are susceptible to mode dropping and convergence to random patterns, which have to be mitigated by computationally expensive multiple restarts. Curiously, GANs bear an uncanny similarity to a co-evolution of a pathogen and its host's immune system in biology. In a biological context, the majority of potential pathogens indeed never make it and are kept at bay by the hots' immune system. Yet some are efficient enough to present a risk of a serious condition and recurrent infections. Here, we explore that similarity to propose a more robust algorithm for GANs training. We empirically show the increased stability and a better ability to generate high-quality images while using less computational power. △ Less

Submitted 9 June, 2020; v1 submitted 22 May, 2020; originally announced June 2020.

Comments: 8 pages, 10 figures

MSC Class: 92B20; 68T05; ACM Class: I.5.2

arXiv:1902.01686 [pdf, other]

The Probabilistic Fault Tolerance of Neural Networks in the Continuous Limit

Authors: El-Mahdi El-Mhamdi, Rachid Guerraoui, Andrei Kucharavy, Sergei Volodin

Abstract: The loss of a few neurons in a brain rarely results in any visible loss of function. However, the insight into what "few" means in this context is unclear. How many random neuron failures will it take to lead to a visible loss of function? In this paper, we address the fundamental question of the impact of the crash of a random subset of neurons on the overall computation of a neural network and t… ▽ More The loss of a few neurons in a brain rarely results in any visible loss of function. However, the insight into what "few" means in this context is unclear. How many random neuron failures will it take to lead to a visible loss of function? In this paper, we address the fundamental question of the impact of the crash of a random subset of neurons on the overall computation of a neural network and the error in the output it produces. We study fault tolerance of neural networks subject to small random neuron/weight crash failures in a probabilistic setting. We give provable guarantees on the robustness of the network to these crashes. Our main contribution is a bound on the error in the output of a network under small random Bernoulli crashes proved by using a Taylor expansion in the continuous limit, where close-by neurons at a layer are similar. The failure mode we adopt in our model is characteristic of neuromorphic hardware, a promising technology to speed up artificial neural networks, as well as of biological networks. We show that our theoretical bounds can be used to compare the fault tolerance of different architectures and to design a regularizer improving the fault tolerance of a given architecture. We design an algorithm achieving fault tolerance using a reasonable number of neurons. In addition to the theoretical proof, we also provide experimental validation of our results and suggest a connection to the generalization capacity problem. △ Less

Submitted 25 September, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: 10 pages (without references), 2 figures, 2 tables, 1 algorithm, 26 pages of supplementary material

Showing 1–11 of 11 results for author: Kucharavy, A