Search | arXiv e-print repository

SigN: SIMBox Activity Detection Through Latency Anomalies at the Cellular Edge

Authors: Anne Josiane Kouam, Aline Carneiro Viana, Philippe Martins, Cedric Adjih, Alain Tchana

Abstract: Despite their widespread adoption, cellular networks face growing vulnerabilities due to their inherent complexity and the integration of advanced technologies. One of the major threats in this landscape is Voice over IP (VoIP) to GSM gateways, known as SIMBox devices. These devices use multiple SIM cards to route VoIP traffic through cellular networks, enabling international bypass fraud with losses of up to $3.11 billion annually. Beyond financial impact, SIMBox activity degrades network performance, threatens national security, and facilitates eavesdropping on communications. Existing detection methods for SIMBox activity are hindered by evolving fraud techniques and implementation complexities, limiting their practical adoption in operator networks. This paper addresses the limitations of current detection methods by introducing SigN, a novel approach to identifying SIMBox activity at the cellular edge. The proposed method focuses on detecting remote SIM card association, a technique used by SIMBox appliances to mimic human mobility patterns. The method detects latency anomalies between SIMBox and standard devices by analyzing cellular signaling during network attachment. Extensive indoor and outdoor experiments demonstrate that SIMBox devices generate significantly higher attachment latencies, particularly during the authentication phase, where latency is up to 23 times greater than that of standard devices. We attribute part of this overhead to immutable factors such as LTE authentication standards and Internet-based communication protocols. Therefore, our approach offers a robust, scalable, and practical solution to mitigate SIMBox activity risks at the network edge.

Submitted 3 February, 2025; originally announced February 2025.

arXiv:2410.18053 [pdf, other]

doi 10.1145/3652892.3700761

B-Side: Binary-Level Static System Call Identification

Authors: Gaspard Thévenon, Kevin Nguetchouang, Kahina Lazri, Alain Tchana, Pierre Olivier

Abstract: System call filtering is widely used to secure programs in multi-tenant environments, and to sandbox applications in modern desktop software deployment and package management systems. Filtering rules are hard to write and maintain manually, hence generating them automatically is essential. To that aim, analysis tools able to identify every system call that can legitimately be invoked by a program are needed. Existing static analysis works lack precision because of a high number of false positives, and/or assume the availability of program/libraries source code -- something unrealistic in many scenarios such as cloud production environments. We present B-Side, a static binary analysis tool able to identify a superset of the system calls that an x86-64 static/dynamic executable may invoke at runtime. B-Side assumes no access to program/libraries sources, and shows a good degree of precision by leveraging symbolic execution, combined with a heuristic to detect system call wrappers, which represent an important source of precision loss in existing works. B-Side also statically detects phases of execution in a program in which different filtering rules can be applied. We validate B-Side and demonstrate its higher precision compared to state-of-the-art works: over a set of popular applications, B-Side's average $F_1$ score is 0.81, vs. 0.31 and 0.53 for competitors. Over 557 static and dynamically-compiled binaries taken from the Debian repositories, B-Side identifies an average of 43 system calls, vs. 271 and 95 for two state-of-the-art competitors. We further evaluate the strictness of the phase-based filtering policies that can be obtained with B-Side.

Submitted 23 October, 2024; originally announced October 2024.

Comments: Accepted to appear in the 25th ACM/IFIP International Middleware Conference (Middleware'24)

arXiv:2310.15582 [pdf, other]

doi 10.1145/3590140.3629116

SecV: Secure Code Partitioning via Multi-Language Secure Values

Authors: Peterson Yuhala, Pascal Felber, Hugo Guiroux, Jean-Pierre Lozi, Alain Tchana, Valerio Schiavoni, Gaël Thomas

Abstract: Trusted execution environments like Intel SGX provide enclaves, which offer strong security guarantees for applications. Running entire applications inside enclaves is possible, but this approach leads to a large trusted computing base (TCB). As such, various tools have been developed to partition programs written in languages such as C or Java into trusted and untrusted parts, which are run in and out of enclaves respectively. However, those tools depend on language-specific taint-analysis and partitioning techniques. They cannot be reused for other languages and there is thus a need for tools that transcend this language barrier. We address this challenge by proposing a multi-language technique to specify sensitive code or data, as well as a multi-language tool to analyse and partition the resulting programs for trusted execution environments like Intel SGX. We leverage GraalVM's Truffle framework, which provides a language-agnostic abstract syntax tree (AST) representation for programs, to provide special AST nodes called secure nodes that encapsulate sensitive program information. Secure nodes can easily be embedded into the ASTs of a wide range of languages via Truffle's polyglot API. Our technique includes a multi-language dynamic taint tracking tool to analyse and partition applications based on our generic secure nodes. Our extensive evaluation with micro- and macro-benchmarks shows that we can use our technique for two languages (JavaScript and Python), and that partitioned programs can obtain up to 14.5% performance improvement as compared to unpartitioned versions.

Submitted 20 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 12 pages

arXiv:2305.00766 [pdf, other]

doi 10.1145/3464298.3493406

Montsalvat: Intel SGX Shielding for GraalVM Native Images

Authors: Peterson Yuhala, Jämes Ménétrey, Pascal Felber, Valerio Schiavoni, Alain Tchana, Gaël Thomas, Hugo Guiroux, Jean-Pierre Lozi

Abstract: The popularity of the Java programming language has led to its wide adoption in cloud computing infrastructures. However, Java applications running in untrusted clouds are vulnerable to various forms of privileged attacks. The emergence of trusted execution environments (TEEs) such as Intel SGX mitigates this problem. TEEs protect code and data in secure enclaves inaccessible to untrusted software, including the kernel and hypervisors. To efficiently use TEEs, developers must manually partition their applications into trusted and untrusted parts, in order to reduce the size of the trusted computing base (TCB) and minimise the risks of security vulnerabilities. However, partitioning applications poses two important challenges: (i) ensuring efficient object communication between the partitioned components, and (ii) ensuring the consistency of garbage collection between the parts, especially with memory-managed languages such as Java. We present Montsalvat, a tool which provides a practical and intuitive annotation-based partitioning approach for Java applications destined for secure enclaves. Montsalvat provides an RMI-like mechanism to ensure inter-object communication, as well as consistent garbage collection across the partitioned components. We implement Montsalvat with GraalVM native-image, a tool for compiling Java applications ahead-of-time into standalone native executables that do not require a JVM at runtime. Our extensive evaluation with micro- and macro-benchmarks shows that our partitioning approach boosts performance in real-world applications.

Submitted 20 December, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 13 pages, Proceedings of the 22nd International Middleware Conference

arXiv:2305.00763 [pdf, other]

SGX Switchless Calls Made Configless

Authors: Peterson Yuhala, Michael Paper, Timothée Zerbib, Pascal Felber, Valerio Schiavoni, Alain Tchana

Abstract: Intel's software guard extensions (SGX) provide hardware enclaves to guarantee confidentiality and integrity for sensitive code and data. However, systems leveraging such security mechanisms must often pay high performance overheads. A major source of this overhead is SGX enclave transitions, which induce expensive cross-enclave context switches. The Intel SGX SDK mitigates this with a switchless call mechanism for transitionless cross-enclave calls using worker threads. Intel's SGX switchless call implementation improves performance but provides limited flexibility: developers need to statically fix the system configuration at build time, which is error-prone, and misconfigurations lead to performance degradations and waste of CPU resources. ZC-SWITCHLESS is a configless and efficient technique to drive the execution of SGX switchless calls. Its dynamic approach optimises the total switchless worker threads at runtime to minimise CPU waste. The experimental evaluation shows that ZC-SWITCHLESS obviates the performance penalty of misconfigured switchless systems while minimising CPU waste.

Submitted 7 July, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 10 pages, 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

arXiv:2301.02059 [pdf, other]

Zen: LSTM-based generation of individual spatiotemporal cellular traffic with interactions

Authors: Anne Josiane Kouam, Aline Carneiro Viana, Alain Tchana

Abstract: Although widely recognized for their high value in human presence and activity studies, cellular network datasets (i.e., Charging Data Records, or CDRs) present accessibility, usability, and privacy issues that restrict their exploitation and research reproducibility. This paper tackles such challenges by modeling CDRs that fulfill real-world data attributes. Our framework, named Zen, follows a four-fold methodology covering (i) the LSTM-based modeling of users' traffic behavior, (ii) the realistic and flexible emulation of spatiotemporal mobility behavior, (iii) the structure of lifelike cellular network infrastructure and social interactions, and (iv) the combination of the three previous modules into realistic per-individual CDR traces. Results show that Zen's first and third models accurately capture individual and global distributions of a fully anonymized real-world CDR dataset, while the second model is consistent with the features of human mobility reported in the literature. Finally, we validate the ability of Zen CDRs to reproduce daily cellular behaviors of the urban population, and their usefulness, as compared to real-world CDRs, in practical networking applications such as dynamic population tracing, Radio Access Network power savings, and anomaly detection.

Submitted 5 January, 2023; originally announced January 2023.

Report number: hal-03910141

arXiv:2205.10929 [pdf, other]

rgpdOS: GDPR Enforcement By The Operating System

Authors: Alain Tchana, Raphael Colin, Adrien Le Berre, Vincent Berger, Benoit Combemale, Natacha Crooks, Ludovic Pailler

Abstract: The General Data Protection Regulation (GDPR) forces IT companies to comply with a number of principles when dealing with European citizens' personal data. Non-compliant companies are exposed to penalties which may represent up to 4% of their turnover. Currently, it is very hard for companies driven by personal data to make their applications GDPR-compliant, especially if those applications were developed before the GDPR was established. We present rgpdOS, a GDPR-aware operating system that aims to bring GDPR-compliance to every application, while requiring minimal changes to application code.

Submitted 30 May, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.06842 [pdf, other]

Virtual Disk Snapshot Management at Scale

Authors: Kevin Nguetchouang, Theophile Dubuc, Stella Bitchebe, Alain Tchana, Pierre Olivier

Abstract: Contrary to the other resources such as CPU, memory, and network, for which virtualization is efficiently achieved through direct access, disk virtualization is peculiar. In this paper, we make four contributions. Our first contribution is the characterization of disk utilization in a public large-scale cloud infrastructure. It reveals the presence of long snapshot chains, sometimes composed of up to 1000 files. Our second contribution is to show that long chains lead to performance and memory footprint scalability issues by experimental measurements. Our third contribution is the extension of the Qcow2 format and its driver in Qemu to address the identified scalability challenges. Our fourth contribution is the thorough evaluation of our prototype, called sQemu, demonstrating that it brings significant performance enhancements and memory footprint reduction. For example, it improves the throughput of RocksDB by about 48% compared to vanilla Qemu on a snapshot chain of length 500. The memory overhead on that chain is also reduced by 15x.

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2202.13483 [pdf, other]

Out of Hypervisor (OoH): When Nested Virtualization Becomes Practical

Authors: Stella Bitchebe, Alain Tchana

Abstract: This paper introduces Out of Hypervisor (OoH), a new research axis close to nested virtualization. Instead of emulating a full virtual hardware inside a VM to support a hypervisor, the OoH principle is to individually expose current hypervisor-oriented hardware virtualization features to the guest OS so that its processes can also benefit from those features. In fact, several hardware virtualization features such as Intel PML, SPP, CAT, and EPT, which currently can only be used by the hypervisor, could also be beneficial for processes that run inside the VM. We illustrate OoH with Intel PML (Page Modification Logging), a feature which allows efficient dirty page tracking for improving VM live migration. Since dirty page tracking is at the heart of process checkpointing (CRIU) and concurrent garbage collection (Boehm), we present two OoH PML designs, namely Shadow PML (SPML) and Extended PML (EPML). The former requires no hardware changes but incurs significant overhead, justifying EPML, which extends PML. We evaluated and compared SPML and EPML with /proc and userfaultfd, two default solutions in Linux. We do this using a key-value store database as the benchmark. The results show that EPML reduces CRIU checkpointing time by about 14% while leading to a negligible overhead (of about 0.5%) compared to SPML, /proc, and userfaultfd.

Submitted 27 February, 2022; originally announced February 2022.

arXiv:2104.04060 [pdf, other]

Network in Disaggregated Datacenters

Authors: Brice Ekane, Yohan Pipereau, Boris Teabe, Alain Tchana, Gael Thomas, Noel De Palma, Daniel Hagimont

Abstract: Nowadays, datacenters lean on a computer-centric approach based on monolithic servers which include all necessary hardware resources (mainly CPU, RAM, network and disks) to run applications. Such an architecture comes with two main limitations: (1) difficulty to achieve full resource utilization and (2) coarse granularity for hardware maintenance. Recently, many works investigated a resource-centric approach called disaggregated architecture, where the datacenter is composed of self-contained resource boards interconnected using fast interconnection technologies, each resource board including instances of one resource type. The resource-centric architecture allows each resource to be managed (maintenance, allocation) independently. LegoOS is the first work which studied the implications of disaggregation on the operating system, proposing to disaggregate the operating system itself. Its authors demonstrated the suitability of this approach, considering mainly CPU and RAM resources. However, they did not study the implication of disaggregation on network resources. We reproduced a LegoOS infrastructure and extended it to support disaggregated networking. We show that networking can be disaggregated following the same principles, and that classical networking optimizations such as DMA, DDIO or loopback can be reproduced in such an environment. Our evaluations show the viability of the approach and the potential of future disaggregated infrastructures.

Submitted 15 March, 2021; originally announced April 2021.

Comments: 10 pages, 8 figures

arXiv:2104.02987 [pdf, other]

Plinius: Secure and Persistent Machine Learning Model Training

Authors: Peterson Yuhala, Pascal Felber, Valerio Schiavoni, Alain Tchana

Abstract: With the increasing popularity of cloud-based machine learning (ML) techniques there comes a need for privacy and integrity guarantees for ML data. In addition, the significant scalability challenges faced by DRAM coupled with the high access-times of secondary storage represent a huge performance bottleneck for ML systems. While solutions exist to tackle the security aspect, performance remains an issue. Persistent memory (PM) is resilient to power loss (unlike DRAM), provides fast and fine-granular access to memory (unlike disk storage) and has latency and bandwidth close to DRAM (in the order of ns and GB/s, respectively). We present PLINIUS, an ML framework using Intel SGX enclaves for secure training of ML models and PM for fault tolerance guarantees. PLINIUS uses a novel mirroring mechanism to create and maintain (i) encrypted mirror copies of ML models on PM, and (ii) encrypted training data in byte-addressable PM, for near-instantaneous data recovery after a system failure. Compared to disk-based checkpointing systems, PLINIUS is 3.2x and 3.7x faster respectively for saving and restoring models on real PM hardware, achieving robust and secure ML model training in SGX enclaves.

Submitted 8 April, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

arXiv:2006.00380 [pdf, ps, other]

Memory virtualization in virtualized systems: segmentation is better than paging

Authors: Boris Teabe, Peterson Yuhala, Alain Tchana, Fabien Hermenier, Daniel Hagimont, Gilles Muller

Abstract: The utilization of paging for virtual machine (VM) memory management is the root cause of memory virtualization overhead. This paper shows that paging is not necessary in the hypervisor. In fact, memory fragmentation, which explains paging utilization, is not an issue in virtualized datacenters thanks to VM memory demand patterns. Our solution Compromis, a novel Memory Management Unit, uses direct segment for VM memory management combined with paging for VM's processes. The paper presents a systematic methodology for implementing Compromis in the hardware, the hypervisor and the datacenter scheduler. Evaluation results show that Compromis outperforms the two popular memory virtualization solutions: shadow paging and Extended Page Table by up to 30% and 370% respectively.

Submitted 30 May, 2020; originally announced June 2020.

arXiv:2001.09991 [pdf, other]

Intel Page Modification Logging, a hardware virtualization feature: study and improvement for virtual machine working set estimation

Authors: Stella Bitchebe, Djob Mvondo, Alain Tchana, Laurent Réveillère, Noël De Palma

Abstract: Intel Page Modification Logging (PML) is a novel hardware feature for tracking virtual machine (VM) accessed memory pages. This task is essential in today's data centers since it allows, among others, checkpointing, live migration and working set size (WSS) estimation. Relying on the Xen hypervisor, this paper studies PML from three angles: power consumption, efficiency, and performance impact on user applications. Our findings are as follows. First, PML does not incur any power consumption overhead. Second, PML reduces both VM live migration and checkpointing time by up to 10.18%. Third, PML slightly reduces, by up to 0.95%, the performance degradation on applications incurred by live migration and checkpointing. Fourth, PML however does not allow accurate WSS estimation because read accesses are not tracked and hot pages cannot be identified. A naive extension of PML for addressing these limitations could lead to severe performance degradation (up to 34.8%) for the VM whose WSS is computed. This paper presents Page Reference Logging (PRL), a smart extension of PML for allowing both read and write accesses to be tracked. It does this without impacting user VMs. The paper also presents a WSS estimation system which leverages PRL and shows how this algorithm can be integrated into a data center which implements memory overcommitment. We implement PRL and the WSS estimation system under Gem5, a very popular hardware simulator. The evaluation results validate the accuracy of PRL in the estimation of WSS. They also show that PRL incurs no performance degradation for user VMs.

Submitted 26 January, 2020; originally announced January 2020.

arXiv:1901.01222 [pdf, other]

Efficient, Dynamic Multi-tenant Edge Computation in EdgeOS

Authors: Yuxin Ren, Vlad Nitu, Guyue Liu, Gabriel Parmer, Timothy Wood, Alain Tchana, Riley Kennedy

Abstract: In the future, computing will be immersed in the world around us -- from augmented reality to autonomous vehicles to the Internet of Things. Many of these smart devices will offer services that respond in real time to their physical surroundings, requiring complex processing with strict performance guarantees. Edge clouds promise a pervasive computational infrastructure a short network hop away from end devices, but today's operating systems are a poor fit to meet the goals of scalable isolation, dense multi-tenancy, and predictable performance required by these emerging applications. In this paper we present EdgeOS, a micro-kernel based operating system that meets these goals by blending recent advances in real-time systems and network function virtualization. EdgeOS introduces a Featherweight Process model that offers lightweight isolation and supports extreme scalability even under high churn. Our architecture provides efficient communication mechanisms, and low-overhead per-client isolation. To achieve high performance networking, EdgeOS employs kernel bypass paired with the isolation properties of Featherweight Processes. We have evaluated our EdgeOS prototype for running high scale network middleboxes using the Click software router and endpoint applications using memcached. EdgeOS reduces startup latency by 170X compared to Linux processes and over five orders of magnitude compared to containers, while providing three orders of magnitude latency improvement when running 300 to 1000 edge-cloud memcached instances on one server.

Submitted 4 January, 2019; originally announced January 2019.

Showing 1–14 of 14 results for author: Tchana, A