-
HoneyWin: High-Interaction Windows Honeypot in Enterprise Environment
Authors:
Yan Lin Aung,
Yee Loon Khoo,
Davis Yang Zheng,
Bryan Swee Duo,
Sudipta Chattopadhyay,
Jianying Zhou,
Liming Lu,
Weihan Goh
Abstract:
Windows operating systems (OS) are ubiquitous in enterprise Information Technology (IT) and operational technology (OT) environments. Due to their widespread adoption and known vulnerabilities, they are often the primary targets of malware and ransomware attacks. With 93% of the ransomware targeting Windows-based systems, there is an urgent need for advanced defensive mechanisms to detect, analyze…
▽ More
Windows operating systems (OS) are ubiquitous in enterprise Information Technology (IT) and operational technology (OT) environments. Due to their widespread adoption and known vulnerabilities, they are often the primary targets of malware and ransomware attacks. With 93% of the ransomware targeting Windows-based systems, there is an urgent need for advanced defensive mechanisms to detect, analyze, and mitigate threats effectively. In this paper, we propose HoneyWin a high-interaction Windows honeypot that mimics an enterprise IT environment. The HoneyWin consists of three Windows 11 endpoints and an enterprise-grade gateway provisioned with comprehensive network traffic capturing, host-based logging, deceptive tokens, endpoint security and real-time alerts capabilities. The HoneyWin has been deployed live in the wild for 34 days and receives more than 5.79 million unsolicited connections, 1.24 million login attempts, 5 and 354 successful logins via remote desktop protocol (RDP) and secure shell (SSH) respectively. The adversary interacted with the deceptive token in one of the RDP sessions and exploited the public-facing endpoint to initiate the Simple Mail Transfer Protocol (SMTP) brute-force bot attack via SSH sessions. The adversary successfully harvested 1,250 SMTP credentials after attempting 151,179 credentials during the attack.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation
Authors:
Ye Dong,
Yan Lin Aung,
Sudipta Chattopadhyay,
Jianying Zhou
Abstract:
Internet of Things (IoT) has gained widespread popularity, revolutionizing industries and daily life. However, it has also emerged as a prime target for attacks. Numerous efforts have been made to improve IoT security, and substantial IoT security and threat information, such as datasets and reports, have been developed. However, existing research often falls short in leveraging these insights to…
▽ More
Internet of Things (IoT) has gained widespread popularity, revolutionizing industries and daily life. However, it has also emerged as a prime target for attacks. Numerous efforts have been made to improve IoT security, and substantial IoT security and threat information, such as datasets and reports, have been developed. However, existing research often falls short in leveraging these insights to assist or guide users in harnessing IoT security practices in a clear and actionable way. In this paper, we propose ChatIoT, a large language model (LLM)-based IoT security assistant designed to disseminate IoT security and threat intelligence. By leveraging the versatile property of retrieval-augmented generation (RAG), ChatIoT successfully integrates the advanced language understanding and reasoning capabilities of LLM with fast-evolving IoT security information. Moreover, we develop an end-to-end data processing toolkit to handle heterogeneous datasets. This toolkit converts datasets of various formats into retrievable documents and optimizes chunking strategies for efficient retrieval. Additionally, we define a set of common use case specifications to guide the LLM in generating answers aligned with users' specific needs and expertise levels. Finally, we implement a prototype of ChatIoT and conduct extensive experiments with different LLMs, such as LLaMA3, LLaMA3.1, and GPT-4o. Experimental evaluations demonstrate that ChatIoT can generate more reliable, relevant, and technical in-depth answers for most use cases. When evaluating the answers with LLaMA3:70B, ChatIoT improves the above metrics by over 10% on average, particularly in relevance and technicality, compared to using LLMs alone.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
Generative AI for Internet of Things Security: Challenges and Opportunities
Authors:
Yan Lin Aung,
Ivan Christian,
Ye Dong,
Xiaodong Ye,
Sudipta Chattopadhyay,
Jianying Zhou
Abstract:
As Generative AI (GenAI) continues to gain prominence and utility across various sectors, their integration into the realm of Internet of Things (IoT) security evolves rapidly. This work delves into an examination of the state-of-the-art literature and practical applications on how GenAI could improve and be applied in the security landscape of IoT. Our investigation aims to map the current state…
▽ More
As Generative AI (GenAI) continues to gain prominence and utility across various sectors, their integration into the realm of Internet of Things (IoT) security evolves rapidly. This work delves into an examination of the state-of-the-art literature and practical applications on how GenAI could improve and be applied in the security landscape of IoT. Our investigation aims to map the current state of GenAI implementation within IoT security, exploring their potential to fortify security measures further. Through the compilation, synthesis, and analysis of the latest advancements in GenAI technologies applied to IoT, this paper not only introduces fresh insights into the field, but also lays the groundwork for future research directions. It explains the prevailing challenges within IoT security, discusses the effectiveness of GenAI in addressing these issues, and identifies significant research gaps through MITRE Mitigations. Accompanied with three case studies, we provide a comprehensive overview of the progress and future prospects of GenAI applications in IoT security. This study serves as a foundational resource to improve IoT security through the innovative application of GenAI, thus contributing to the broader discourse on IoT security and technology integration.
△ Less
Submitted 12 February, 2025;
originally announced February 2025.
-
SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
Authors:
Michael Färber,
David Lamprecht,
Johan Krause,
Linn Aung,
Peter Haase
Abstract:
We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source…
▽ More
We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources. Moreover, we provide embeddings for knowledge graph entities using high-performance computing. SemOpenAlex enables a broad range of use-case scenarios, such as exploratory semantic search via our website, large-scale scientific impact quantification, and other forms of scholarly big data analytics within and across scientific disciplines. Additionally, it enables academic recommender systems, such as recommending collaborators, publications, and venues, including explainability capabilities. Finally, SemOpenAlex can serve for RDF query optimization benchmarks, creating scholarly knowledge-guided language models, and as a hub for semantic scientific publishing.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
DeepFire2: A Convolutional Spiking Neural Network Accelerator on FPGAs
Authors:
Myat Thu Linn Aung,
Daniel Gerlinghoff,
Chuping Qu,
Liwei Yang,
Tian Huang,
Rick Siow Mong Goh,
Tao Luo,
Weng-Fai Wong
Abstract:
Brain-inspired spiking neural networks (SNNs) replace the multiply-accumulate operations of traditional neural networks by integrate-and-fire neurons, with the goal of achieving greater energy efficiency. Specialized hardware implementations of those neurons clearly have advantages over general-purpose devices in terms of power and performance, but exhibit poor scalability when it comes to acceler…
▽ More
Brain-inspired spiking neural networks (SNNs) replace the multiply-accumulate operations of traditional neural networks by integrate-and-fire neurons, with the goal of achieving greater energy efficiency. Specialized hardware implementations of those neurons clearly have advantages over general-purpose devices in terms of power and performance, but exhibit poor scalability when it comes to accelerating large neural networks. DeepFire2 introduces a hardware architecture which can map large network layers efficiently across multiple super logic regions in a multi-die FPGA. That gives more control over resource allocation and parallelism, benefiting both throughput and energy consumption. Avoiding the use of lookup tables to implement the AND operations of an SNN, prevents the layer size to be limited by logic resources. A deep pipeline does not only lead to an increased clock speed of up to 600 MHz. We double the throughput and power efficiency compared to our previous version of DeepFire, which equates to an almost 10-fold improvement over other previous implementations. Importantly, we are able to deploy a large ImageNet model, while maintaining a throughput of over 1500 frames per second.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
Bridging the gap between prostate radiology and pathology through machine learning
Authors:
Indrani Bhattacharya,
David S. Lim,
Han Lin Aung,
Xingchen Liu,
Arun Seetharaman,
Christian A. Kunder,
Wei Shao,
Simon J. C. Soerensen,
Richard E. Fan,
Pejman Ghanouni,
Katherine J. To'o,
James D. Brooks,
Geoffrey A. Sonn,
Mirabela Rusu
Abstract:
Prostate cancer is the second deadliest cancer for American men. While Magnetic Resonance Imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements. Machine learning methods to detect and localize cancer on prostate MRI can help standardize…
▽ More
Prostate cancer is the second deadliest cancer for American men. While Magnetic Resonance Imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements. Machine learning methods to detect and localize cancer on prostate MRI can help standardize radiologist interpretations. However, existing machine learning methods vary not only in model architecture, but also in the ground truth labeling strategies used for model training. In this study, we compare different labeling strategies, namely, pathology-confirmed radiologist labels, pathologist labels on whole-mount histopathology images, and lesion-level and pixel-level digital pathologist labels (previously validated deep learning algorithm on histopathology images to predict pixel-level Gleason patterns) on whole-mount histopathology images. We analyse the effects these labels have on the performance of the trained machine learning models. Our experiments show that (1) radiologist labels and models trained with them can miss cancers, or underestimate cancer extent, (2) digital pathologist labels and models trained with them have high concordance with pathologist labels, and (3) models trained with digital pathologist labels achieve the best performance in prostate cancer detection in two different cohorts with different disease distributions, irrespective of the model architecture used. Digital pathologist labels can reduce challenges associated with human annotations, including labor, time, inter- and intra-reader variability, and can help bridge the gap between prostate radiology and pathology by enabling the training of reliable machine learning models to detect and localize prostate cancer on MRI.
△ Less
Submitted 3 December, 2021;
originally announced December 2021.
-
Farmland Parcel Delineation Using Spatio-temporal Convolutional Networks
Authors:
Han Lin Aung,
Burak Uzkent,
Marshall Burke,
David Lobell,
Stefano Ermon
Abstract:
Farm parcel delineation provides cadastral data that is important in developing and managing climate change policies. Specifically, farm parcel delineation informs applications in downstream governmental policies of land allocation, irrigation, fertilization, green-house gases (GHG's), etc. This data can also be useful for the agricultural insurance sector for assessing compensations following dam…
▽ More
Farm parcel delineation provides cadastral data that is important in developing and managing climate change policies. Specifically, farm parcel delineation informs applications in downstream governmental policies of land allocation, irrigation, fertilization, green-house gases (GHG's), etc. This data can also be useful for the agricultural insurance sector for assessing compensations following damages associated with extreme weather events - a growing trend related to climate change. Using satellite imaging can be a scalable and cost effective manner to perform the task of farm parcel delineation to collect this valuable data. In this paper, we break down this task using satellite imaging into two approaches: 1) Segmentation of parcel boundaries, and 2) Segmentation of parcel areas. We implemented variations of UNets, one of which takes into account temporal information, which achieved the best results on our dataset on farmland parcels in France in 2017.
△ Less
Submitted 20 April, 2020; v1 submitted 11 April, 2020;
originally announced April 2020.
-
HADES-IoT: A Practical Host-Based Anomaly Detection System for IoT Devices (Extended Version)
Authors:
Dominik Breitenbacher,
Ivan Homoliak,
Yan Lin Aung,
Nils Ole Tippenhauer,
Yuval Elovici
Abstract:
Internet of Things (IoT) devices have become ubiquitous and are spread across many application domains including the industry, transportation, healthcare, and households. However, the proliferation of the IoT devices has raised the concerns about their security, especially when observing that many manufacturers focus only on the core functionality of their products due to short time to market and…
▽ More
Internet of Things (IoT) devices have become ubiquitous and are spread across many application domains including the industry, transportation, healthcare, and households. However, the proliferation of the IoT devices has raised the concerns about their security, especially when observing that many manufacturers focus only on the core functionality of their products due to short time to market and low-cost pressures, while neglecting security aspects. Moreover, it does not exist any established or standardized method for measuring and ensuring the security of IoT devices. Consequently, vulnerabilities are left untreated, allowing attackers to exploit IoT devices for various purposes, such as compromising privacy, recruiting devices into a botnet, or misusing devices to perform cryptocurrency mining.
In this paper, we present a practical Host-based Anomaly DEtection System for IoT (HADES-IoT) that represents the last line of defense. HADES-IoT has proactive detection capabilities, provides tamper-proof resistance, and it can be deployed on a wide range of Linux-based IoT devices. The main advantage of HADES-IoT is its low performance overhead, which makes it suitable for the IoT domain, where state-of-the-art approaches cannot be applied due to their high-performance demands. We deployed HADES-IoT on seven IoT devices to evaluate its effectiveness and performance overhead. Our experiments show that HADES-IoT achieved 100% effectiveness in the detection of current IoT malware such as VPNFilter and IoTReaper; while on average, requiring only 5.5% of available memory and causing only a low CPU load.
△ Less
Submitted 2 May, 2019;
originally announced May 2019.