-
Extreme Multi-label Completion for Semantic Document Labelling with Taxonomy-Aware Parallel Learning
Authors:
Julien Audiffren,
Christophe Broillet,
Ljiljana Dolamic,
Philippe Cudré-Mauroux
Abstract:
In Extreme Multi Label Completion (XMLCo), the objective is to predict the missing labels of a collection of documents. Together with XML Classification, XMLCo is arguably one of the most challenging document classification tasks, as the very high number of labels (at least ten of thousands) is generally very large compared to the number of available labelled documents in the training dataset. Suc…
▽ More
In Extreme Multi Label Completion (XMLCo), the objective is to predict the missing labels of a collection of documents. Together with XML Classification, XMLCo is arguably one of the most challenging document classification tasks, as the very high number of labels (at least ten of thousands) is generally very large compared to the number of available labelled documents in the training dataset. Such a task is often accompanied by a taxonomy that encodes the labels organic relationships, and many methods have been proposed to leverage this hierarchy to improve the results of XMLCo algorithms. In this paper, we propose a new approach to this problem, TAMLEC (Taxonomy-Aware Multi-task Learning for Extreme multi-label Completion). TAMLEC divides the problem into several Taxonomy-Aware Tasks, i.e. subsets of labels adapted to the hierarchical paths of the taxonomy, and trains on these tasks using a dynamic Parallel Feature sharing approach, where some parts of the model are shared between tasks while others are task-specific. Then, at inference time, TAMLEC uses the labels available in a document to infer the appropriate tasks and to predict missing labels. To achieve this result, TAMLEC uses a modified transformer architecture that predicts ordered sequences of labels on a Weak-Semilattice structure that is naturally induced by the tasks. This approach yields multiple advantages. First, our experiments on real-world datasets show that TAMLEC outperforms state-of-the-art methods for various XMLCo problems. Second, TAMLEC is by construction particularly suited for few-shots XML tasks, where new tasks or labels are introduced with only few examples, and extensive evaluations highlight its strong performance compared to existing methods.
△ Less
Submitted 18 December, 2024;
originally announced December 2024.
-
NMT-Obfuscator Attack: Ignore a sentence in translation with only one word
Authors:
Sahar Sadrizadeh,
César Descalzo,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Neural Machine Translation systems are used in diverse applications due to their impressive performance. However, recent studies have shown that these systems are vulnerable to carefully crafted small perturbations to their inputs, known as adversarial attacks. In this paper, we propose a new type of adversarial attack against NMT models. In this attack, we find a word to be added between two sent…
▽ More
Neural Machine Translation systems are used in diverse applications due to their impressive performance. However, recent studies have shown that these systems are vulnerable to carefully crafted small perturbations to their inputs, known as adversarial attacks. In this paper, we propose a new type of adversarial attack against NMT models. In this attack, we find a word to be added between two sentences such that the second sentence is ignored and not translated by the NMT model. The word added between the two sentences is such that the whole adversarial text is natural in the source language. This type of attack can be harmful in practical scenarios since the attacker can hide malicious information in the automatic translation made by the target NMT model. Our experiments show that different NMT models and translation tasks are vulnerable to this type of attack. Our attack can successfully force the NMT models to ignore the second part of the input in the translation for more than 50% of all cases while being able to maintain low perplexity for the whole input.
△ Less
Submitted 19 November, 2024;
originally announced November 2024.
-
Can the Variation of Model Weights be used as a Criterion for Self-Paced Multilingual NMT?
Authors:
Àlex R. Atrio,
Alexis Allemann,
Ljiljana Dolamic,
Andrei Popescu-Belis
Abstract:
Many-to-one neural machine translation systems improve over one-to-one systems when training data is scarce. In this paper, we design and test a novel algorithm for selecting the language of minibatches when training such systems. The algorithm changes the language of the minibatch when the weights of the model do not evolve significantly, as measured by the smoothed KL divergence between all laye…
▽ More
Many-to-one neural machine translation systems improve over one-to-one systems when training data is scarce. In this paper, we design and test a novel algorithm for selecting the language of minibatches when training such systems. The algorithm changes the language of the minibatch when the weights of the model do not evolve significantly, as measured by the smoothed KL divergence between all layers of the Transformer network. This algorithm outperforms the use of alternating monolingual batches, but not the use of shuffled batches, in terms of translation quality (measured with BLEU and COMET) and convergence speed.
△ Less
Submitted 5 October, 2024;
originally announced October 2024.
-
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts
Authors:
Henrique Da Silva Gameiro,
Andrei Kucharavy,
Ljiljana Dolamic
Abstract:
With the emergence of widely available powerful LLMs, disinformation generated by large Language Models (LLMs) has become a major concern. Historically, LLM detectors have been touted as a solution, but their effectiveness in the real world is still to be proven. In this paper, we focus on an important setting in information operations -- short news-like posts generated by moderately sophisticated…
▽ More
With the emergence of widely available powerful LLMs, disinformation generated by large Language Models (LLMs) has become a major concern. Historically, LLM detectors have been touted as a solution, but their effectiveness in the real world is still to be proven. In this paper, we focus on an important setting in information operations -- short news-like posts generated by moderately sophisticated attackers.
We demonstrate that existing LLM detectors, whether zero-shot or purpose-trained, are not ready for real-world use in that setting. All tested zero-shot detectors perform inconsistently with prior benchmarks and are highly vulnerable to sampling temperature increase, a trivial attack absent from recent benchmarks. A purpose-trained detector generalizing across LLMs and unseen attacks can be developed, but it fails to generalize to new human-written texts.
We argue that the former indicates domain-specific benchmarking is needed, while the latter suggests a trade-off between the adversarial evasion resilience and overfitting to the reference human text, with both needing evaluation in benchmarks and currently absent. We believe this suggests a re-consideration of current LLM detector benchmarking approaches and provides a dynamically extensible benchmark to allow it (https://github.com/Reliable-Information-Lab-HEVS/benchmark_llm_texts_detection).
△ Less
Submitted 27 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
-
Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis
Authors:
Cristian-Alexandru Botocan,
Raphael Meier,
Ljiljana Dolamic
Abstract:
Assessing the robustness of multimodal models against adversarial examples is an important aspect for the safety of its users. We craft L0-norm perturbation attacks on the preprocessed input images. We launch them in a black-box setup against four multimodal models and two unimodal DNNs, considering both targeted and untargeted misclassification. Our attacks target less than 0.04% of perturbed ima…
▽ More
Assessing the robustness of multimodal models against adversarial examples is an important aspect for the safety of its users. We craft L0-norm perturbation attacks on the preprocessed input images. We launch them in a black-box setup against four multimodal models and two unimodal DNNs, considering both targeted and untargeted misclassification. Our attacks target less than 0.04% of perturbed image area and integrate different spatial positioning of perturbed pixels: sparse positioning and pixels arranged in different contiguous shapes (row, column, diagonal, and patch). To the best of our knowledge, we are the first to assess the robustness of three state-of-the-art multimodal models (ALIGN, AltCLIP, GroupViT) against different sparse and contiguous pixel distribution perturbations. The obtained results indicate that unimodal DNNs are more robust than multimodal models. Furthermore, models using CNN-based Image Encoder are more vulnerable than models with ViT - for untargeted attacks, we obtain a 99% success rate by perturbing less than 0.02% of the image area.
△ Less
Submitted 25 July, 2024;
originally announced July 2024.
-
Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers
Authors:
Manuel Mondal,
Ljiljana Dolamic,
Gérôme Bovet,
Philippe Cudré-Mauroux,
Julien Audiffren
Abstract:
Multiple Choice Questions (MCQ) have become a commonly used approach to assess the capabilities of Large Language Models (LLMs), due to their ease of manipulation and evaluation. The experimental appraisals of the LLMs' Stated Answer (their answer to MCQ) have pointed to their apparent ability to perform probabilistic reasoning or to grasp uncertainty. In this work, we investigate whether these ap…
▽ More
Multiple Choice Questions (MCQ) have become a commonly used approach to assess the capabilities of Large Language Models (LLMs), due to their ease of manipulation and evaluation. The experimental appraisals of the LLMs' Stated Answer (their answer to MCQ) have pointed to their apparent ability to perform probabilistic reasoning or to grasp uncertainty. In this work, we investigate whether these aptitudes are measurable outside tailored prompting and MCQ by reformulating these issues as direct text-completion - the fundamental computational unit of LLMs. We introduce Revealed Belief, an evaluation framework that evaluates LLMs on tasks requiring reasoning under uncertainty, which complements MCQ scoring by analyzing text-completion probability distributions. Our findings suggest that while LLMs frequently state the correct answer, their Revealed Belief shows that they often allocate probability mass inconsistently, exhibit systematic biases, and often fail to update their beliefs appropriately when presented with new evidence, leading to strong potential impacts on downstream tasks. These results suggest that common evaluation methods may only provide a partial picture and that more research is needed to assess the extent and nature of their capabilities.
△ Less
Submitted 17 June, 2025; v1 submitted 21 June, 2024;
originally announced June 2024.
-
A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation
Authors:
Sahar Sadrizadeh,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Neural Machine Translation (NMT) models have been shown to be vulnerable to adversarial attacks, wherein carefully crafted perturbations of the input can mislead the target model. In this paper, we introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier. In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations in t…
▽ More
Neural Machine Translation (NMT) models have been shown to be vulnerable to adversarial attacks, wherein carefully crafted perturbations of the input can mislead the target model. In this paper, we introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier. In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations in the target language by the NMT model belong to a different class than the original translations. Unlike previous attacks, our new approach has a more substantial effect on the translation by altering the overall meaning, which then leads to a different class determined by an oracle classifier. To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks by incorporating output translations of the target NMT model and the output logits of a classifier within the attack process. Extensive experiments, including a comparison with existing untargeted attacks, show that our attack is considerably more successful in altering the class of the output translation and has more effect on the translation. This new paradigm can reveal the vulnerabilities of NMT systems by focusing on the class of translation rather than the mere translation quality as studied traditionally.
△ Less
Submitted 22 February, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Authors:
Andrei Kucharavy,
Rachid Guerraoui,
Ljiljana Dolamic
Abstract:
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Arti…
▽ More
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Artificial Neural Networks (ANNs), leading to better machine learning algorithms. However, SGD is not applicable in a non-differentiable setting, leaving all that prior research off the table.
In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model for natural evolution is formally equivalent to SGD in certain settings and, in practice, is well adapted to large ANNs. We refer to such EAs as Gillespie-Orr EA class (GO-EAs) and empirically show how an insight transfer from SGD can work for them. We then show that for ANNs trained to near-optimality or in the transfer learning setting, the equivalence also allows transferring the insights from the Mutational Landscapes model to SGD.
We then leverage this equivalence to experimentally show how SGD and GO-EAs can provide mutual insight through examples of minima flatness, transfer learning, and mixing of individuals in EAs applied to large models.
△ Less
Submitted 20 May, 2023;
originally announced June 2023.
-
A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models
Authors:
Sahar Sadrizadeh,
Clément Barbier,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original sentences but destroy the translation generated by the target NMT model. This optimization problem is discrete, and we propose a continuous relaxation to solve it.…
▽ More
In this paper, we propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models. First, we propose an optimization problem to generate adversarial examples that are semantically similar to the original sentences but destroy the translation generated by the target NMT model. This optimization problem is discrete, and we propose a continuous relaxation to solve it. With this relaxation, we find a probability distribution for each token in the adversarial example, and then we can generate multiple adversarial examples by sampling from these distributions. Experimental results show that our attack significantly degrades the translation quality of multiple NMT models while maintaining the semantic similarity between the original and adversarial sentences. Furthermore, our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate. Finally, we propose a black-box extension of our attack by sampling from an optimized probability distribution for a reference model whose gradients are accessible.
△ Less
Submitted 14 June, 2023;
originally announced June 2023.
-
Assessing the Importance of Frequency versus Compositionality for Subword-based Tokenization in NMT
Authors:
Benoist Wolleb,
Romain Silvestri,
Giorgos Vernikos,
Ljiljana Dolamic,
Andrei Popescu-Belis
Abstract:
Subword tokenization is the de facto standard for tokenization in neural language models and machine translation systems. Three advantages are frequently cited in favor of subwords: shorter encoding of frequent tokens, compositionality of subwords, and ability to deal with unknown words. As their relative importance is not entirely clear yet, we propose a tokenization approach that enables us to s…
▽ More
Subword tokenization is the de facto standard for tokenization in neural language models and machine translation systems. Three advantages are frequently cited in favor of subwords: shorter encoding of frequent tokens, compositionality of subwords, and ability to deal with unknown words. As their relative importance is not entirely clear yet, we propose a tokenization approach that enables us to separate frequency (the first advantage) from compositionality. The approach uses Huffman coding to tokenize words, by order of frequency, using a fixed amount of symbols. Experiments with CS-DE, EN-FR and EN-DE NMT show that frequency alone accounts for 90%-95% of the scores reached by BPE, hence compositionality has less importance than previously thought.
△ Less
Submitted 12 January, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search
Authors:
Andrei Kucharavy,
Matteo Monti,
Rachid Guerraoui,
Ljiljana Dolamic
Abstract:
Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them.
Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly…
▽ More
Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them.
Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly pressing concern. While distributed byzantine resilient algorithms have been proposed in a differentiable setting, none exist in a gradient-free setting.
The goal of this work is to address this shortcoming. For that, we introduce a more general definition of byzantine-resilience in ML - the \textit{model-consensus}, that extends the definition of the classical distributed consensus. We then leverage this definition to show that a general class of gradient-free ML algorithms - ($1,λ$)-Evolutionary Search - can be combined with classical distributed consensus algorithms to generate gradient-free byzantine-resilient distributed learning algorithms. We provide proofs and pseudo-code for two specific cases - the Total Order Broadcast and proof-of-work leader election.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense
Authors:
Andrei Kucharavy,
Zachary Schillaci,
Loïc Maréchal,
Maxime Würsch,
Ljiljana Dolamic,
Remi Sabonnadiere,
Dimitri Percia David,
Alain Mermoud,
Vincent Lenders
Abstract:
Generative Language Models gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational models). Arguably the focal point of public attention has been such a refinement of the GPT3 model -- the ChatGPT and its subsequent integration with auxiliary capabilities, including…
▽ More
Generative Language Models gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational models). Arguably the focal point of public attention has been such a refinement of the GPT3 model -- the ChatGPT and its subsequent integration with auxiliary capabilities, including search as part of Microsoft Bing. Despite extensive prior research invested in their development, their performance and applicability to a range of daily tasks remained unclear and niche. However, their wider utilization without a requirement for technical expertise, made in large part possible through conversational fine-tuning, revealed the extent of their true capabilities in a real-world environment. This has garnered both public excitement for their potential applications and concerns about their capabilities and potential malicious uses. This review aims to provide a brief overview of the history, state of the art, and implications of Generative Language Models in terms of their principles, abilities, limitations, and future prospects -- especially in the context of cyber-defense, with a focus on the Swiss operational environment.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.
-
Targeted Adversarial Attacks against Neural Machine Translation
Authors:
Sahar Sadrizadeh,
AmirHossein Dabiri Aghdam,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Neural Machine Translation (NMT) systems are used in various applications. However, it has been shown that they are vulnerable to very small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new targeted adversarial attack against NMT models. In particular, our goal is to insert a predefined target keyword into the translation of the adversarial sentence whil…
▽ More
Neural Machine Translation (NMT) systems are used in various applications. However, it has been shown that they are vulnerable to very small perturbations of their inputs, known as adversarial attacks. In this paper, we propose a new targeted adversarial attack against NMT models. In particular, our goal is to insert a predefined target keyword into the translation of the adversarial sentence while maintaining similarity between the original sentence and the perturbed one in the source domain. To this aim, we propose an optimization problem, including an adversarial loss term and a similarity term. We use gradient projection in the embedding space to craft an adversarial sentence. Experimental results show that our attack outperforms Seq2Sick, the other targeted adversarial attack against NMT models, in terms of success rate and decrease in translation quality. Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences while similarity with the original sentence stays preserved.
△ Less
Submitted 2 March, 2023;
originally announced March 2023.
-
TransFool: An Adversarial Attack against Neural Machine Translation Models
Authors:
Sahar Sadrizadeh,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By…
▽ More
Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By integrating the embedding representation of a language model, we generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples. Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences stays high. Moreover, we show that TransFool is transferable to unknown target models. Finally, based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks both in white-box and black-box settings. Thus, TransFool permits us to better characterize the vulnerability of NMT models and outlines the necessity to design strong defense mechanisms and more robust NMT systems for real-life applications.
△ Less
Submitted 16 June, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Identifying Emerging Technologies and Leading Companies using Network Dynamics of Patent Clusters: a Cybersecurity Case Study
Authors:
Michael Tsesmelis,
Ljiljana Dolamic,
Marcus Matthias Keupp,
Dimitri Percia David,
Alain Mermoud
Abstract:
Strategic decisions rely heavily on non-scientific instrumentation to forecast emerging technologies and leading companies. Instead, we build a fast quantitative system with a small computational footprint to discover the most important technologies and companies in a given field, using generalisable methods applicable to any industry. With the help of patent data from the US Patent and Trademark…
▽ More
Strategic decisions rely heavily on non-scientific instrumentation to forecast emerging technologies and leading companies. Instead, we build a fast quantitative system with a small computational footprint to discover the most important technologies and companies in a given field, using generalisable methods applicable to any industry. With the help of patent data from the US Patent and Trademark Office, we first assign a value to each patent thanks to automated machine learning tools. We then apply network science to track the interaction and evolution of companies and clusters of patents (i.e. technologies) to create rankings for both sets that highlight important or emerging network nodes thanks to five network centrality indices. Finally, we illustrate our system with a case study based on the cybersecurity industry. Our results produce useful insights, for instance by highlighting (i) emerging technologies with a growing mean patent value and cluster size, (ii) the most influential companies in the field and (iii) attractive startups with few but impactful patents. Complementary analysis also provides evidence of decreasing marginal returns of research and development in larger companies in the cybersecurity industry.
△ Less
Submitted 21 September, 2022;
originally announced September 2022.
-
Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale
Authors:
Cyril Vallez,
Andrei Kucharavy,
Ljiljana Dolamic
Abstract:
The advent of the internet, followed shortly by the social media made it ubiquitous in consuming and sharing information between anyone with access to it. The evolution in the consumption of media driven by this change, led to the emergence of images as means to express oneself, convey information and convince others efficiently. With computer vision algorithms progressing radically over the last…
▽ More
The advent of the internet, followed shortly by the social media made it ubiquitous in consuming and sharing information between anyone with access to it. The evolution in the consumption of media driven by this change, led to the emergence of images as means to express oneself, convey information and convince others efficiently. With computer vision algorithms progressing radically over the last decade, it is become easier and easier to study at scale the role of images in the flow of information online. While the research questions and overall pipelines differ radically, almost all start with a crucial first step - evaluation of global perceptual similarity between different images. That initial step is crucial for overall pipeline performance and processes most images. A number of algorithms are available and currently used to perform it, but so far no comprehensive review was available to guide the choice of researchers as to the choice of an algorithm best suited to their question, assumptions and computational resources. With this paper we aim to fill this gap, showing that classical computer vision methods are not necessarily the best approach, whereas a pair of relatively little used methods - Dhash perceptual hash and SimCLR v2 ResNets achieve excellent performance, scale well and are computationally efficient.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers
Authors:
Sahar Sadrizadeh,
Ljiljana Dolamic,
Pascal Frossard
Abstract:
Recently, it has been shown that, in spite of the significant performance of deep neural networks in different fields, those are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is imposed to be block-sparse so that the resultant adversarial example differs from t…
▽ More
Recently, it has been shown that, in spite of the significant performance of deep neural networks in different fields, those are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is imposed to be block-sparse so that the resultant adversarial example differs from the original sentence in only a few words. Due to the discrete nature of textual data, we perform gradient projection to find the minimizer of our proposed optimization problem. Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5% on different datasets (AG News, MNLI, and Yelp Reviews). Furthermore, the block-sparsity constraint of the proposed optimization problem results in small perturbations in the adversarial example.
△ Less
Submitted 11 March, 2022;
originally announced March 2022.
-
From Scattered Sources to Comprehensive Technology Landscape: A Recommendation-based Retrieval Approach
Authors:
Chi Thang Duong,
Dimitri Percia David,
Ljiljana Dolamic,
Alain Mermoud,
Vincent Lenders,
Karl Aberer
Abstract:
Mapping the technology landscape is crucial for market actors to take informed investment decisions. However, given the large amount of data on the Web and its subsequent information overload, manually retrieving information is a seemingly ineffective and incomplete approach. In this work, we propose an end-to-end recommendation based retrieval approach to support automatic retrieval of technologi…
▽ More
Mapping the technology landscape is crucial for market actors to take informed investment decisions. However, given the large amount of data on the Web and its subsequent information overload, manually retrieving information is a seemingly ineffective and incomplete approach. In this work, we propose an end-to-end recommendation based retrieval approach to support automatic retrieval of technologies and their associated companies from raw Web data. This is a two-task setup involving (i) technology classification of entities extracted from company corpus, and (ii) technology and company retrieval based on classified technologies. Our proposed framework approaches the first task by leveraging DistilBERT which is a state-of-the-art language model. For the retrieval task, we introduce a recommendation-based retrieval technique to simultaneously support retrieving related companies, technologies related to a specific company and companies relevant to a technology. To evaluate these tasks, we also construct a data set that includes company documents and entities extracted from these documents together with company categories and technology labels. Experiments show that our approach is able to return 4 times more relevant companies while outperforming traditional retrieval baseline in retrieving technologies.
△ Less
Submitted 9 December, 2021;
originally announced December 2021.