-
Automatic Bill of Materials
Authors:
Nicholas Boucher,
Ross Anderson
Abstract:
Ensuring the security of software supply chains requires reliable identification of upstream dependencies. We present the Automatic Bill of Materials, or ABOM, a technique for embedding dependency metadata in binaries at compile time. Rather than relying on developers to explicitly enumerate dependency names and versions, ABOM embeds a hash of each distinct input source code file into the binary e…
▽ More
Ensuring the security of software supply chains requires reliable identification of upstream dependencies. We present the Automatic Bill of Materials, or ABOM, a technique for embedding dependency metadata in binaries at compile time. Rather than relying on developers to explicitly enumerate dependency names and versions, ABOM embeds a hash of each distinct input source code file into the binary emitted by a compiler. Hashes are stored in Compressed Bloom Filters, highly space-efficient probabilistic data structures, which enable querying for the presence of dependencies without the possibility of false negatives. If leveraged across the ecosystem, ABOMs provide a zero-touch, backwards-compatible, drop-in solution for fast supply chain attack detection in real-world, language-independent software.
△ Less
Submitted 15 October, 2023;
originally announced October 2023.
-
When Vision Fails: Text Attacks Against ViT and OCR
Authors:
Nicholas Boucher,
Jenny Blessing,
Ilia Shumailov,
Ross Anderson,
Nicolas Papernot
Abstract:
While text-based machine learning models that operate on visual inputs of rendered text have become robust against a wide range of existing attacks, we show that they are still vulnerable to visual adversarial examples encoded as text. We use the Unicode functionality of combining diacritical marks to manipulate encoded text so that small visual perturbations appear when the text is rendered. We s…
▽ More
While text-based machine learning models that operate on visual inputs of rendered text have become robust against a wide range of existing attacks, we show that they are still vulnerable to visual adversarial examples encoded as text. We use the Unicode functionality of combining diacritical marks to manipulate encoded text so that small visual perturbations appear when the text is rendered. We show how a genetic algorithm can be used to generate visual adversarial examples in a black-box setting, and conduct a user study to establish that the model-fooling adversarial examples do not affect human comprehension. We demonstrate the effectiveness of these attacks in the real world by creating adversarial examples against production models published by Facebook, Microsoft, IBM, and Google.
△ Less
Submitted 12 June, 2023;
originally announced June 2023.
-
If it's Provably Secure, It Probably Isn't: Why Learning from Proof Failure is Hard
Authors:
Ross Anderson,
Nicholas Boucher
Abstract:
In this paper we're going to explore the ways in which security proofs can fail, and their broader lessons for security engineering. To mention just one example, Larry Paulson proved the security of SSL/TLS using his theorem prover Isabelle in 1999, yet it's sprung multiple leaks since then, from timing attacks to Heartbleed. We will go through a number of other examples in the hope of elucidating…
▽ More
In this paper we're going to explore the ways in which security proofs can fail, and their broader lessons for security engineering. To mention just one example, Larry Paulson proved the security of SSL/TLS using his theorem prover Isabelle in 1999, yet it's sprung multiple leaks since then, from timing attacks to Heartbleed. We will go through a number of other examples in the hope of elucidating general principles. Proofs can be irrelevant, they can be opaque, they can be misleading and they can even be wrong. So we can look to the philosophy of mathematics for illumination. But the problem is more general. What happens, for example, when we have a choice between relying on mathematics and on physics? The security proofs claimed for quantum cryptosystems based on entanglement raise some pointed questions and may engage the philosophy of physics. And then there's the other varieties of assurance; we will recall the reliance placed on FIPS-140 evaluations, which API attacks suggested may have been overblown. Where the defenders focus their assurance effort on a subsystem or a model that cannot capture the whole attack surface they may just tell the attacker where to focus their effort. However, we think it's deeper and broader than that. The models of proof and assurance on which we try to rely have a social aspect, which we can try to understand from other perspectives ranging from the philosophy or sociology of science to the psychology of shared attention. These perspectives suggest, in various ways, how the management of errors and exceptions may be particularly poor. They do not merely relate to failure modes that the designers failed to consider properly or at all; they also relate to failure modes that the designers (or perhaps the verifiers) did not want to consider for institutional and cultural reasons.
△ Less
Submitted 8 May, 2023;
originally announced May 2023.
-
Boosting Big Brother: Attacking Search Engines with Encodings
Authors:
Nicholas Boucher,
Luca Pajola,
Ilia Shumailov,
Ross Anderson,
Mauro Conti
Abstract:
Search engines are vulnerable to attacks against indexing and searching via text encoding manipulation. By imperceptibly perturbing text using uncommon encoded representations, adversaries can control results across search engines for specific search queries. We demonstrate that this attack is successful against two major commercial search engines - Google and Bing - and one open source search eng…
▽ More
Search engines are vulnerable to attacks against indexing and searching via text encoding manipulation. By imperceptibly perturbing text using uncommon encoded representations, adversaries can control results across search engines for specific search queries. We demonstrate that this attack is successful against two major commercial search engines - Google and Bing - and one open source search engine - Elasticsearch. We further demonstrate that this attack is successful against LLM chat search including Bing's GPT-4 chatbot and Google's Bard chatbot. We also present a variant of the attack targeting text summarization and plagiarism detection models, two ML tasks closely tied to search. We provide a set of defenses against these techniques and warn that adversaries can leverage these attacks to launch disinformation campaigns against unsuspecting users, motivating the need for search engine maintainers to patch deployed systems.
△ Less
Submitted 27 July, 2023; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Threat Models over Space and Time: A Case Study of E2EE Messaging Applications
Authors:
Partha Das Chowdhury,
Maria Sameen,
Jenny Blessing,
Nicholas Boucher,
Joseph Gardiner,
Tom Burrows,
Ross Anderson,
Awais Rashid
Abstract:
Threat modelling is foundational to secure systems engineering and should be done in consideration of the context within which systems operate. On the other hand, the continuous evolution of both the technical sophistication of threats and the system attack surface is an inescapable reality. In this work, we explore the extent to which real-world systems engineering reflects the changing threat co…
▽ More
Threat modelling is foundational to secure systems engineering and should be done in consideration of the context within which systems operate. On the other hand, the continuous evolution of both the technical sophistication of threats and the system attack surface is an inescapable reality. In this work, we explore the extent to which real-world systems engineering reflects the changing threat context. To this end we examine the desktop clients of six widely used end-to-end-encrypted mobile messaging applications to understand the extent to which they adjusted their threat model over space (when enabling clients on new platforms, such as desktop clients) and time (as new threats emerged). We experimented with short-lived adversarial access against these desktop clients and analyzed the results with respect to two popular threat elicitation frameworks, STRIDE and LINDDUN. The results demonstrate that system designers need to both recognise the threats in the evolving context within which systems operate and, more importantly, to mitigate them by rescoping trust boundaries in a manner that those within the administrative boundary cannot violate security and privacy properties. Such a nuanced understanding of trust boundary scopes and their relationship with administrative boundaries allows for better administration of shared components, including securing them with safe defaults.
△ Less
Submitted 28 May, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Talking Trojan: Analyzing an Industry-Wide Disclosure
Authors:
Nicholas Boucher,
Ross Anderson
Abstract:
While vulnerability research often focuses on technical findings and post-public release industrial response, we provide an analysis of the rest of the story: the coordinated disclosure process from discovery through public release. The industry-wide 'Trojan Source' vulnerability which affected most compilers, interpreters, code editors, and code repositories provided an interesting natural experi…
▽ More
While vulnerability research often focuses on technical findings and post-public release industrial response, we provide an analysis of the rest of the story: the coordinated disclosure process from discovery through public release. The industry-wide 'Trojan Source' vulnerability which affected most compilers, interpreters, code editors, and code repositories provided an interesting natural experiment, enabling us to compare responses by firms versus nonprofits and by firms that managed their own response versus firms that outsourced it. We document the interaction with bug bounty programs, government disclosure assistance, academic peer review, and press coverage, among other topics. We compare the response to an attack on source code with the response to a comparable attack on NLP systems employing machine-learning techniques. We conclude with recommendations to improve the global coordinated disclosure system.
△ Less
Submitted 21 September, 2022;
originally announced September 2022.
-
Trojan Source: Invisible Vulnerabilities
Authors:
Nicholas Boucher,
Ross Anderson
Abstract:
We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by…
▽ More
We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers. 'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and of supply-chain compromise across the industry. We present working examples of Trojan Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We propose definitive compiler-level defenses, and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to block this attack. We document an industry-wide coordinated disclosure for these vulnerabilities; as they affect most compilers, editors, and repositories, the exercise teaches how different firms, open-source communities, and other stakeholders respond to vulnerability disclosure.
△ Less
Submitted 8 March, 2023; v1 submitted 30 October, 2021;
originally announced November 2021.
-
Bad Characters: Imperceptible NLP Attacks
Authors:
Nicholas Boucher,
Ilia Shumailov,
Ross Anderson,
Nicolas Papernot
Abstract:
Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indisti…
▽ More
Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems from neural machine-translation pipelines to web search engines. We find that with a single imperceptible encoding injection -- representing one invisible character, homoglyph, reordering, or deletion -- an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken. Our attacks work against currently-deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook, IBM, and HuggingFace. This novel series of attacks presents a significant threat to many language processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that given such systems are now being deployed rapidly at scale, the urgent attention of architects and operators is required.
△ Less
Submitted 10 December, 2021; v1 submitted 17 June, 2021;
originally announced June 2021.