-
A Study of Malware Prevention in Linux Distributions
Authors:
Duc-Ly Vu,
Trevor Dunlap,
Karla Obermeier-Velazquez,
Paul Gibert,
John Speed Meyers,
Santiago Torres-Arias
Abstract:
Malicious attacks on open source software packages are a growing concern. This concern morphed into a panic-inducing crisis after the revelation of the XZ Utils backdoor, which would have provided the attacker with, according to one observer, a "skeleton key" to the internet. This study therefore explores the challenges of preventing and detecting malware in Linux distribution package repositories. To do so, we ask two research questions: (1) What measures have Linux distributions implemented to counter malware, and how have maintainers experienced these efforts? (2) How effective are current malware detection tools at identifying malicious Linux packages? To answer these questions, we conduct interviews with maintainers at several major Linux distributions and introduce a Linux package malware benchmark dataset. Using this dataset, we evaluate the performance of six open source malware detection scanners. Distribution maintainers, according to the interviews, have mostly focused on reproducible builds to date. Our interviews identified only a single Linux distribution, Wolfi OS, that performs active malware scanning. Using this new benchmark dataset, the evaluation found that the performance of existing open-source malware scanners is underwhelming. Most studied tools excel at producing false positives but only infrequently detect true malware. Those that avoid high false positive rates often do so at the expense of a satisfactory true positive rate. Our findings provide insights into Linux distribution package repositories' current practices for malware detection and demonstrate the current inadequacy of open-source tools designed to detect malicious Linux packages.
Submitted 25 November, 2024; v1 submitted 17 November, 2024;
originally announced November 2024.
-
A Benchmark Comparison of Python Malware Detection Approaches
Authors:
Duc-Ly Vu,
Zachary Newman,
John Speed Meyers
Abstract:
While attackers often distribute malware to victims via open-source, community-driven package repositories, these repositories do not currently run automated malware detection systems. In this work, we explore the security goals of the repository administrators and the requirements for deployments of such malware scanners via a case study of the Python ecosystem and PyPI repository, which includes interviews with administrators and maintainers. Further, we evaluate existing malware detection techniques for deployment in this setting by creating a benchmark dataset and comparing several existing tools, including the malware checks implemented in PyPI, Bandit4Mal, and OSSGadget's OSS Detect Backdoor.
We find that repository administrators have exacting technical demands for such malware detection tools. Specifically, they consider a false positive rate of even 0.01% to be unacceptably high, given the large number of package releases that might trigger false alerts. Measured tools have false positive rates between 15% and 97%; increasing thresholds for detection rules to reduce this rate renders the true positive rate useless. In some cases, these checks emitted alerts more often for benign packages than malicious ones. However, we also find a successful socio-technical malware detection system: external security researchers also perform repository malware scans and report the results to repository administrators. These parties face different incentives and constraints on their time and tooling. We conclude with recommendations for improving detection capabilities and strengthening the collaboration between security researchers and software repository administrators.
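The arithmetic behind the false positive concern can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, and the figure of 1,000,000 benign releases is an assumed round number chosen only to show scale. The 0.01% and 15% rates are the ones quoted in the abstract.

```python
def detection_rates(malicious_flagged, malicious_total, benign_flagged, benign_total):
    """Return (true positive rate, false positive rate) for a scanner.

    TPR = flagged malicious packages / all malicious packages
    FPR = flagged benign packages / all benign packages
    """
    tpr = malicious_flagged / malicious_total
    fpr = benign_flagged / benign_total
    return tpr, fpr


def expected_false_alerts(false_positive_rate, benign_releases):
    """Expected number of benign releases that would trigger an alert."""
    return false_positive_rate * benign_releases


# Assumed volume of benign releases (hypothetical, for illustration only).
ASSUMED_BENIGN_RELEASES = 1_000_000

# Rates quoted in the abstract: 0.01% (already considered too high) and
# 15% (the lowest false positive rate among the measured tools).
for label, fpr in [("0.01%", 0.0001), ("15%", 0.15)]:
    alerts = expected_false_alerts(fpr, ASSUMED_BENIGN_RELEASES)
    print(f"FPR {label}: about {alerts:,.0f} false alerts")
```

Under this assumed volume, even the 0.01% rate yields on the order of a hundred false alerts for administrators to triage, which suggests why the interviewed administrators treat it as unacceptable.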
Submitted 27 September, 2022;
originally announced September 2022.
-
Social Networks as a Collective Intelligence: An Examination of the Python Ecosystem
Authors:
Thomas Pike,
Robert Colter,
Mark Bailey,
Jackie Kazil,
John Speed Meyers
Abstract:
The Python ecosystem represents a global, data-rich, technology-enabled network. By analyzing Python's dependency network, its top 14 most imported libraries and cPython (or core Python) libraries, this research finds clear evidence that the Python network can be considered a problem-solving network. Analysis of the contributor network of the top 14 libraries and cPython reveals emergent specialization, where experts of specific libraries are isolated and focused while other experts link these critical libraries together, optimizing both local and global information exchange efficiency. As these networks are expanded, the local efficiency drops while the density increases, representing a possible transition point between exploitation (optimizing working solutions) and exploration (finding new solutions). These results provide insight into the optimal functioning of technology-enabled social networks and may have larger implications for the effective functioning of modern organizations.
Submitted 16 January, 2022;
originally announced January 2022.