-
Inferring Networked Device Categories from Low-Level Activity Indicators
Authors:
Kyumars Sheykh Esmaili,
Jaideep Chandrashekar,
Pascal Le Guyadec
Abstract:
We study the problem of inferring the type of a networked device in a home network by leveraging low-level traffic activity indicators seen at commodity home gateways. We analyze a dataset of detailed device network activity obtained from 240 subscriber homes of a large European ISP and extract a number of traffic and spatial fingerprints for individual devices. We develop a two-level taxonomy to describe devices, onto which we map individual devices using a number of heuristics. We leverage the heuristically derived labels to train classifiers that distinguish device classes based on the traffic and spatial fingerprints of a device. Our results show accuracy of up to 91% for the coarse-level category and up to 84% for the fine-grained category. By incorporating information from other sources (e.g., MAC OUI), we are able to further improve accuracy to above 97% and 92%, respectively. Finally, we also extract a set of simple and human-readable rules that concisely capture the behaviour of these distinct device categories.
Submitted 1 September, 2017;
originally announced September 2017.
-
Kafka versus RabbitMQ
Authors:
Philippe Dobbelaere,
Kyumars Sheykh Esmaili
Abstract:
Publish/subscribe is a distributed interaction paradigm well adapted to the deployment of scalable and loosely coupled systems.
Apache Kafka and RabbitMQ are two popular open-source and commercially supported pub/sub systems that have been around for almost a decade and have seen wide adoption. Given the popularity of these two systems and the fact that both are branded as pub/sub systems, two frequently asked questions in the relevant online forums are: how do they compare against each other, and which one should be used?
In this paper, we frame the arguments in a holistic approach by establishing a common comparison framework based on the core functionalities of pub/sub systems. Using this framework, we then venture into a qualitative and quantitative (i.e., empirical) comparison of the common features of the two systems. We also highlight the distinct features that each of these systems has. After enumerating a set of use cases that are best suited for RabbitMQ or Kafka, we try to guide the reader through a determination table to choose the best architecture given their particular set of requirements.
Submitted 1 September, 2017;
originally announced September 2017.
-
On the Effectiveness of Polynomial Realization of Reed-Solomon Codes for Storage Systems
Authors:
Kyumars Sheykh Esmaili,
Anwitaman Datta
Abstract:
There are different ways to realize Reed-Solomon (RS) codes. While in the storage community, using generator matrices to implement RS codes is more popular, in the coding theory community generator polynomials are typically used to realize RS codes. Prominent exceptions include HDFS-RAID, which uses generator-polynomial-based erasure codes and extends Apache Hadoop's file system.
In this paper we evaluate the performance of an implementation of the polynomial realization of Reed-Solomon codes, along with our optimized version of it, against that of a widely used library (Jerasure) that implements the main matrix realization alternatives. Our experimental study shows that, despite significant performance gains yielded by our optimizations, the polynomial implementations' performance is consistently inferior to that of the matrix realization alternatives in general, and to that of Cauchy bit matrices in particular.
Submitted 16 December, 2013;
originally announced December 2013.
-
The CORE Storage Primitive: Cross-Object Redundancy for Efficient Data Repair & Access in Erasure Coded Storage
Authors:
Kyumars Sheykh Esmaili,
Lluis Pamies-Juarez,
Anwitaman Datta
Abstract:
Erasure codes are an integral part of many distributed storage systems aimed at Big Data, since they provide high fault tolerance for low overheads. However, traditional erasure codes are inefficient at reading stored data in degraded environments (when nodes might be unavailable) and at replenishing lost data (vital for long-term resilience). Consequently, novel codes optimized to cope with distributed storage system nuances are vigorously being researched. In this paper, we take an engineering alternative, exploring the use of simple and mature techniques: juxtaposing a standard erasure code with RAID-4-like parity. We carry out an analytical study to determine the efficacy of this approach over traditional as well as some novel codes. We build upon this study to design CORE, a general storage primitive that we integrate into HDFS. We benchmark this implementation in a proprietary cluster and in EC2. Our experiments show that, compared to traditional erasure codes, CORE uses 50% less bandwidth and is up to 75% faster when recovering a single failed node, while the gains are respectively 15% and 60% for double node failures.
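The RAID-4-like parity mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the CORE implementation: it assumes each object has already been split into equal-size blocks (the per-object erasure coding step is omitted), and the names `xor_blocks`, `cross_object_parity`, and `repair_block` are hypothetical helpers introduced only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a non-empty list of equal-length byte blocks together."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def cross_object_parity(objects):
    """Compute one XOR parity block per block position across objects.

    `objects` is a list of objects, each a list of equal-size blocks.
    (Assumption: the blocks already carry any per-object erasure coding.)
    """
    return [xor_blocks(list(column)) for column in zip(*objects)]


def repair_block(objects, parity, obj_idx, blk_idx):
    """Rebuild one lost block from the same position in the other
    objects plus the parity block for that position."""
    survivors = [obj[blk_idx] for i, obj in enumerate(objects) if i != obj_idx]
    return xor_blocks(survivors + [parity[blk_idx]])


if __name__ == "__main__":
    objects = [
        [b"aaaa", b"bbbb"],
        [b"cccc", b"dddd"],
        [b"eeee", b"ffff"],
    ]
    parity = cross_object_parity(objects)
    # Pretend block 1 of object 0 is lost and rebuild it.
    print(repair_block(objects, parity, obj_idx=0, blk_idx=1))
```

In this sketch a single lost block is rebuilt by reading only the blocks at the same position in the other objects, plus one parity block. This illustrates how cross-object parity could reduce the data read for a single failure; the bandwidth and speed figures in the abstract come from the authors' experiments, not from this example.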
Submitted 26 June, 2013; v1 submitted 21 February, 2013;
originally announced February 2013.
-
Challenges in Kurdish Text Processing
Authors:
Kyumars Sheykh Esmaili
Abstract:
Despite having a large number of speakers, the Kurdish language is among the less-resourced languages. In this work we highlight the challenges and problems in providing the required tools and techniques for processing texts written in Kurdish. From a high-level perspective, the main challenges are: the inherent diversity of the language, standardization and segmentation issues, and the lack of language resources.
Submitted 1 December, 2012;
originally announced December 2012.