Search | arXiv e-print repository

Assessing and Exploiting Domain Name Misinformation

Abstract: Cloud providers' support for network evasion techniques that misrepresent the server's domain name is more prevalent than previously believed, which has serious implications for security and privacy due to the reliance on domain names in common security architectures. Domain fronting is one such evasive technique used by privacy enhancing technologies and malware to hide the domains they visit, an… ▽ More Cloud providers' support for network evasion techniques that misrepresent the server's domain name is more prevalent than previously believed, which has serious implications for security and privacy due to the reliance on domain names in common security architectures. Domain fronting is one such evasive technique used by privacy enhancing technologies and malware to hide the domains they visit, and it uses shared hosting and HTTPS to present a benign domain to observers while signaling the target domain in the encrypted HTTP request. In this paper, we construct an ontology of domain name misinformation and detail a novel measurement methodology to identify support among cloud infrastructure providers. Despite several of the largest cloud providers having publicly stated that they no longer support domain fronting, our findings demonstrate a more complex environment with many exceptions. We also present a novel and straightforward attack that allows an adversary to man-in-the-middle all the victim's encrypted traffic bound to a content delivery network that supports domain fronting, breaking the authenticity, confidentiality, and integrity guarantees expected by the victim when using HTTPS. By using dynamic linker hijacking to rewrite the HTTP Host field, our attack does not generate any artifacts that are visible to the victim or passive network monitoring solutions, and the attacker does not need a separate channel to exfiltrate data or perform command-and-control, which can be achieved by rewriting HTTP headers. △ Less

Submitted 14 July, 2023; originally announced July 2023.

Comments: Presented at the 8th International Workshop on Traffic Measurements for Cybersecurity (WTMC 2023)

arXiv:2009.01939 [pdf, other]

Accurate TLS Fingerprinting using Destination Context and Knowledge Bases

Authors: Blake Anderson, David McGrew

Abstract: Network fingerprinting is used to identify applications, provide insight into network traffic, and detect malicious activity. With the broad adoption of TLS, traditional fingerprinting techniques that rely on clear-text data are no longer viable. TLS-specific techniques have been introduced that create a fingerprint string from carefully selected data features in the client_hello to facilitate pro… ▽ More Network fingerprinting is used to identify applications, provide insight into network traffic, and detect malicious activity. With the broad adoption of TLS, traditional fingerprinting techniques that rely on clear-text data are no longer viable. TLS-specific techniques have been introduced that create a fingerprint string from carefully selected data features in the client_hello to facilitate process identification before data is exchanged. Unfortunately, this approach fails in practice because hundreds of processes can map to the same fingerprint string. We solve this problem by presenting a TLS fingerprinting system that makes use of the destination address, port, and server name in addition to a carefully constructed fingerprint string. The destination context is used to disambiguate the set of processes that match a fingerprint string by applying a weighted naive Bayes classifier, resulting in far greater performance. △ Less

Submitted 3 September, 2020; originally announced September 2020.

arXiv:1805.11544 [pdf, other]

Limitless HTTP in an HTTPS World: Inferring the Semantics of the HTTPS Protocol without Decryption

Authors: Blake Anderson, Andrew Chi, Scott Dunlop, David McGrew

Abstract: We present new analytic techniques for inferring HTTP semantics from passive observations of HTTPS that can infer the value of important fields including the status-code, Content-Type, and Server, and the presence or absence of several additional HTTP header fields, e.g., Cookie and Referer. Our goals are twofold: to better understand the limitations of the confidentiality of HTTPS, and to explore… ▽ More We present new analytic techniques for inferring HTTP semantics from passive observations of HTTPS that can infer the value of important fields including the status-code, Content-Type, and Server, and the presence or absence of several additional HTTP header fields, e.g., Cookie and Referer. Our goals are twofold: to better understand the limitations of the confidentiality of HTTPS, and to explore benign uses of traffic analysis such as application troubleshooting and malware detection that could replace HTTPS interception and static private keys in some scenarios. We found that our techniques improve the efficacy of malware detection, but they do not enable more powerful website fingerprinting attacks against Tor. Our broader set of results raises concerns about the confidentiality goals of TLS relative to a user's expectation of privacy, warranting future research. We apply our methods to the semantics of both HTTP/1.1 and HTTP/2 on data collected from automated runs of Firefox 58.0, Chrome 63.0, and Tor Browser 7.0.11 in a lab setting, and from applications running in a malware sandbox. We obtain ground truth plaintext for a diverse set of applications from the malware sandbox by extracting the key material needed for decryption from RAM post-execution. We developed an iterative approach to simultaneously solve several multi-class (field values) and binary (field presence) classification problems, and we show that our inference algorithm achieves an unweighted $F_1$ score greater than 0.900 for most HTTP fields examined. △ Less

Submitted 29 May, 2018; originally announced May 2018.

arXiv:1706.08003 [pdf, other]

OS Fingerprinting: New Techniques and a Study of Information Gain and Obfuscation

Authors: Blake Anderson, David McGrew

Abstract: Passive operating system fingerprinting reveals valuable information to the defenders of heterogeneous private networks; at the same time, attackers can use fingerprinting to reconnoiter networks, so defenders need obfuscation techniques to foil them. We present an effective approach for passive fingerprinting that uses data features from TLS as well as the TCP/IP and HTTP protocols in a multi-ses… ▽ More Passive operating system fingerprinting reveals valuable information to the defenders of heterogeneous private networks; at the same time, attackers can use fingerprinting to reconnoiter networks, so defenders need obfuscation techniques to foil them. We present an effective approach for passive fingerprinting that uses data features from TLS as well as the TCP/IP and HTTP protocols in a multi-session model, which is applicable whenever several sessions can be observed within a time window. In experiments on a real-world private network, our approach identified operating system major and minor versions with accuracies of 99.4% and 97.5%, respectively, and provided significant information gain. We also show that obfuscation strategies can often be defeated due to the difficulty of manipulating data features from all protocols, especially TLS, by studying how obfuscation affects our fingerprinting system. Because devices running unpatched operating systems on private networks create significant vulnerabilities, their detection is critical; our approach achieved over 98% accuracy at this important goal. △ Less

Submitted 24 June, 2017; originally announced June 2017.

Comments: 10 pages, 5 figures

arXiv:1607.01639 [pdf, other]

Deciphering Malware's use of TLS (without Decryption)

Authors: Blake Anderson, Subharthi Paul, David McGrew

Abstract: The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages. However, TLS also introduces a complex set of observable data features that allow many inferences to be made about both the client and the server. We show that these features can be used to detect and understand malware communication,… ▽ More The use of TLS by malware poses new challenges to network threat detection because traditional pattern-matching techniques can no longer be applied to its messages. However, TLS also introduces a complex set of observable data features that allow many inferences to be made about both the client and the server. We show that these features can be used to detect and understand malware communication, while at the same time preserving the privacy of benign uses of encryption. These data features also allow for accurate malware family attribution of network communication, even when restricted to a single, encrypted flow. To demonstrate this, we performed a detailed study of how TLS is used by malware and enterprise applications. We provide a general analysis on millions of TLS encrypted flows, and a targeted study on 18 malware families composed of thousands of unique malware samples and ten-of-thousands of malicious TLS flows. Importantly, we identify and accommodate the bias introduced by the use of a malware sandbox. The performance of a malware classifier is correlated with a malware family's use of TLS, i.e., malware families that actively evolve their use of cryptography are more difficult to classify. We conclude that malware's usage of TLS is distinct from benign usage in an enterprise setting, and that these differences can be effectively used in rules and machine learning classifiers. △ Less

Submitted 6 July, 2016; originally announced July 2016.

Comments: 15 pages, 4 figures

Showing 1–5 of 5 results for author: McGrew, D