Showing 1–42 of 42 results for author: State, R

Searching in archive cs.
  1. arXiv:2506.07295

    cs.CL

    Exploring the Impact of Temperature on Large Language Models:Hot or Cold?

    Authors: Lujun Li, Lama Sleem, Niccolo' Gentile, Geoffrey Nichil, Radu State

    Abstract: The sampling temperature, a critical hyperparameter in large language models (LLMs), modifies the logits before the softmax layer, thereby reshaping the distribution of output tokens. Recent studies have challenged the Stochastic Parrots analogy by demonstrating that LLMs are capable of understanding semantics rather than merely memorizing data and that randomness, modulated by sampling temperatur…

    Submitted 8 June, 2025; originally announced June 2025.

  2. arXiv:2505.16078

    cs.CL

    Small Language Models in the Real World: Insights from Industrial Text Classification

    Authors: Lujun Li, Lama Sleem, Niccolo' Gentile, Geoffrey Nichil, Radu State

    Abstract: With the emergence of ChatGPT, Transformer models have significantly advanced text classification and related tasks. Decoder-only models such as Llama exhibit strong performance and flexibility, yet they suffer from inefficiency on inference due to token-by-token generation, and their effectiveness in text classification tasks heavily depends on prompt quality. Moreover, their substantial GPU reso…

    Submitted 23 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  3. arXiv:2503.24102

    cs.CL

    Is LLM the Silver Bullet to Low-Resource Languages Machine Translation?

    Authors: Yewei Song, Lujun Li, Cedric Lothritz, Saad Ezzini, Lama Sleem, Niccolo Gentile, Radu State, Tegawendé F. Bissyandé, Jacques Klein

    Abstract: Low-Resource Languages (LRLs) present significant challenges in natural language processing due to their limited linguistic resources and underrepresentation in standard datasets. While recent advances in Large Language Models (LLMs) and Neural Machine Translation have substantially improved translation capabilities for high-resource languages, performance disparities persist for LRLs, particularl…

    Submitted 5 June, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

  4. LongKey: Keyphrase Extraction for Long Documents

    Authors: Jeovane Honorio Alves, Radu State, Cinthia Obladen de Almendra Freitas, Jean Paul Barddal

    Abstract: In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase extraction addresses this challenge by identifying representative terms within texts. However, most existing methods focus on short documents (up to 512 tokens), leaving a gap in processing long-context documents. In this paper, we in…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Accepted for presentation at the 2024 IEEE International Conference on Big Data (IEEE BigData 2024). Code available at https://github.com/jeohalves/longkey

  5. arXiv:2405.08044

    cs.LG cs.AI

    On the Volatility of Shapley-Based Contribution Metrics in Federated Learning

    Authors: Arno Geimer, Beltran Fiz, Radu State

    Abstract: Federated learning (FL) is a collaborative and privacy-preserving Machine Learning paradigm, allowing the development of robust models without the need to centralize sensitive data. A critical challenge in FL lies in fairly and accurately allocating contributions from diverse participants. Inaccurate allocation can undermine trust, lead to unfair compensation, and thus participants may lack the in…

    Submitted 26 May, 2025; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at IJCNN 2025

  6. arXiv:2401.07398

    cs.CV cs.LG eess.IV

    Cross Domain Early Crop Mapping using CropSTGAN

    Authors: Yiqun Wang, Hui Huang, Radu State

    Abstract: Driven by abundant satellite imagery, machine learning-based approaches have recently been promoted to generate high-resolution crop cultivation maps to support many agricultural applications. One of the major challenges faced by these approaches is the limited availability of ground truth labels. In the absence of ground truth, existing work usually adopts the "direct transfer strategy" that trai…

    Submitted 18 April, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  7. arXiv:2301.10209

    cs.NI cs.SI

    XRP-NDN Overlay: Improving the Communication Efficiency of Consensus-Validation based Blockchains with an NDN Overlay

    Authors: Lucian Trestioreanu, Wazen M. Shbair, Flaviene Scheidt de Cristo, Radu State

    Abstract: With the growing adoption of Distributed Ledger Technologies and the subsequent scaling of these networks, there is an inherent need for efficient and resilient communication used by the underlying consensus and replication mechanisms. While resilient and efficient communication is one of the main pillars of an efficient blockchain network as a whole, the Distributed Ledger Technology is still rel…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 8 pages (arxiv); IEEE NOMS 2023 conference (4 pages)

  8. arXiv:2206.10446

    cs.SE

    Deep dive into Interledger: Understanding the Interledger ecosystem

    Authors: Lucian Trestioreanu, Cyril Cassagnes, Radu State

    Abstract: At the technical level, the goal of Interledger is to provide an architecture and a minimal set of protocols to enable interoperability between any value transfer systems. The Interledger protocol is a protocol for inter-blockchain payments which can also accommodate FIAT currencies. To understand how it is possible to achieve this goal, several aspects of the technology require a deeper analysis.…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: 65 pages, 28 figures, 4 tables

  9. A Flash(bot) in the Pan: Measuring Maximal Extractable Value in Private Pools

    Authors: Ben Weintraub, Christof Ferreira Torres, Cristina Nita-Rotaru, Radu State

    Abstract: The rise of Ethereum has led to a flourishing decentralized marketplace that has, unfortunately, fallen victim to frontrunning and Maximal Extractable Value (MEV) activities, where savvy participants game transaction orderings within a block for profit. One popular solution to address such behavior is Flashbots, a private pool with infrastructure and design goals aimed at eliminating the negative…

    Submitted 28 September, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: 14 pages, ACM IMC 2022

  10. Topology Analysis of the XRP Ledger

    Authors: Vytautas Tumas, Sean Rivera, Damien Magoni, Radu State

    Abstract: XRP Ledger is one of the oldest, well-established blockchains. Despite the popularity of the XRP Ledger, little is known about its underlying peer-to-peer network. The structural properties of a network impact its efficiency, security and robustness. We aim to close the knowledge gap by providing a detailed analysis of the XRP overlay network. In this paper we examine the graph-theoretic propert…

    Submitted 10 January, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: Extended edition, 8 pages. In The 38th ACM/SIGAPP Symposium on Applied Computing, March 27 - March 31, 2023, Tallinn, Estonia

  11. arXiv:2110.09207

    cs.CR

    SPON: Enabling Resilient Inter-Ledgers Payments with an Intrusion-Tolerant Overlay

    Authors: Lucian Trestioreanu, Cristina Nita-Rotaru, Aanchal Malhotra, Radu State

    Abstract: Payment systems are a critical component of everyday life in our society. While in many situations payments are still slow, opaque, siloed, expensive or even fail, users expect them to be fast, transparent, cheap, reliable and global. Recent technologies such as distributed ledgers create opportunities for near-real-time, cheaper and more transparent payments. However, in order to achieve a global…

    Submitted 3 November, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: 9 pages, 14 figures, IEEE Conference on Communications and Network Security October 2021

  12. arXiv:2108.10071

    cs.CR

    Elysium: Context-Aware Bytecode-Level Patching to Automatically Heal Vulnerable Smart Contracts

    Authors: Christof Ferreira Torres, Hugo Jonker, Radu State

    Abstract: Fixing bugs is easiest by patching source code. However, source code is not always available: only 0.3% of the ~49M smart contracts that are currently deployed on Ethereum have their source code publicly available. Moreover, since contracts may call functions from other contracts, security flaws in closed-source contracts may affect open-source contracts as well. However, current state-of-the-art…

    Submitted 4 July, 2022; v1 submitted 23 August, 2021; originally announced August 2021.

  13. arXiv:2106.11036

    cs.CY cs.AI cs.LG

    Know Your Model (KYM): Increasing Trust in AI and Machine Learning

    Authors: Mary Roszel, Robert Norvill, Jean Hilger, Radu State

    Abstract: The widespread utilization of AI systems has drawn attention to the potential impacts of such systems on society. Of particular concern are the consequences that prediction errors may have on real-world scenarios, and the trust humanity places in AI systems. It is necessary to understand how we can evaluate trustworthiness in AI and how individuals and entities alike can develop trustworthy AI sys…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: 10 pages

  14. arXiv:2102.03347

    cs.CR

    Frontrunner Jones and the Raiders of the Dark Forest: An Empirical Study of Frontrunning on the Ethereum Blockchain

    Authors: Christof Ferreira Torres, Ramiro Camino, Radu State

    Abstract: Ethereum prospered the inception of a plethora of smart contract applications, ranging from gambling games to decentralized finance. However, Ethereum is also considered a highly adversarial environment, where vulnerable smart contracts will eventually be exploited. Recently, Ethereum's pool of pending transaction has become a far more aggressive environment. In the hope of making some profit, att…

    Submitted 3 June, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

  15. arXiv:2101.06204

    cs.CR

    The Eye of Horus: Spotting and Analyzing Attacks on Ethereum Smart Contracts

    Authors: Christof Ferreira Torres, Antonio Ken Iannillo, Arthur Gervais, Radu State

    Abstract: In recent years, Ethereum gained tremendously in popularity, growing from a daily transaction average of 10K in January 2016 to an average of 500K in January 2020. Similarly, smart contracts began to carry more value, making them appealing targets for attackers. As a result, they started to become victims of attacks, costing millions of dollars. In response to these attacks, both academia and indu…

    Submitted 15 January, 2021; originally announced January 2021.

  16. arXiv:2005.12156

    cs.CR

    ConFuzzius: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts

    Authors: Christof Ferreira Torres, Antonio Ken Iannillo, Arthur Gervais, Radu State

    Abstract: Smart contracts are Turing-complete programs that are executed across a blockchain. Unlike traditional programs, once deployed, they cannot be modified. As smart contracts carry more value, they become more of an exciting target for attackers. Over the last years, they suffered from exploits costing millions of dollars due to simple programming mistakes. As a result, a variety of tools for detecti…

    Submitted 10 March, 2021; v1 submitted 25 May, 2020; originally announced May 2020.

  17. arXiv:2005.03773

    cs.LG stat.ML

    Minority Class Oversampling for Tabular Data with Deep Generative Models

    Authors: Ramiro Camino, Christian Hammerschmidt, Radu State

    Abstract: In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A common method to treat imbalanced datasets is under- and oversampling. In this process, samples are either removed from the majority class or synthetic samples…

    Submitted 20 July, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

  18. arXiv:2003.09241

    cs.GT cs.CY

    Blockchain Governance: An Overview and Prediction of Optimal Strategies using Nash Equilibrium

    Authors: Nida Khan, Tabrez Ahmad, Anass Patel, Radu State

    Abstract: Blockchain governance is a subject of ongoing research and an interdisciplinary view of blockchain governance is vital to aid in further research for establishing a formal governance framework for this nascent technology. In this paper, the position of blockchain governance within the hierarchy of Institutional governance is discussed. Blockchain governance is analyzed from the perspective of IT g…

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: Accepted for publication in AUEIRC-Springer 2020

  19. arXiv:1910.01449

    cs.CR cs.LG stat.ML

    A Data Science Approach for Honeypot Detection in Ethereum

    Authors: Ramiro Camino, Christof Ferreira Torres, Mathis Baden, Radu State

    Abstract: Ethereum smart contracts have recently drawn a considerable amount of attention from the media, the financial industry and academia. With the increase in popularity, malicious users found new opportunities to profit by deceiving newcomers. Consequently, attackers started luring other attackers into contracts that seem to have exploitable flaws, but that actually contain a complex hidden trap that…

    Submitted 19 December, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

  20. arXiv:1908.09899

    cs.LG stat.ML

    SynGAN: Towards Generating Synthetic Network Attacks using GANs

    Authors: Jeremy Charlier, Aman Singh, Gaston Ormazabal, Radu State, Henning Schulzrinne

    Abstract: The rapid digital transformation without security considerations has resulted in the rise of global-scale cyberattacks. The first line of defense against these attacks are Network Intrusion Detection Systems (NIDS). Once deployed, however, these systems work as blackboxes with a high rate of false positives with no measurable effectiveness. There is a need to continuously test and improve these sy…

    Submitted 26 August, 2019; originally announced August 2019.

  21. arXiv:1905.13020

    cs.LG stat.ML

    Visualization of AE's Training on Credit Card Transactions with Persistent Homology

    Authors: Jeremy Charlier, Francois Petit, Gaston Ormazabal, Radu State, Jean Hilger

    Abstract: Auto-encoders are among the most popular neural network architecture for dimension reduction. They are composed of two parts: the encoder which maps the model distribution to a latent manifold and the decoder which maps the latent manifold to a reconstructed distribution. However, auto-encoders are known to provoke chaotically scattered data distribution in the latent manifold resulting in an inco…

    Submitted 12 August, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.09894

  22. arXiv:1905.12568

    cs.LG stat.ML

    Predicting Sparse Clients' Actions with CPOPT-Net in the Banking Environment

    Authors: Jeremy Charlier, Radu State, Jean Hilger

    Abstract: The digital revolution of the banking system with evolving European regulations have pushed the major banking actors to innovate by a newly use of their clients' digital information. Given highly sparse client activities, we propose CPOPT-Net, an algorithm that combines the CP canonical tensor decomposition, a multidimensional matrix decomposition that factorizes a tensor as the sum of rank-one te…

    Submitted 23 May, 2019; originally announced May 2019.

  23. arXiv:1905.12567

    cs.LG stat.ML

    MQLV: Optimal Policy of Money Management in Retail Banking with Q-Learning

    Authors: Jeremy Charlier, Gaston Ormazabal, Radu State, Jean Hilger

    Abstract: Reinforcement learning has become one of the best approach to train a computer game emulator capable of human level performance. In a reinforcement learning approach, an optimal value function is learned across a set of actions, or decisions, that leads to a set of states giving different rewards, with the objective to maximize the overall reward. A policy assigns to each state-action pairs an exp…

    Submitted 21 August, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

  24. arXiv:1905.10363

    math.NA cs.CE cs.LG stat.ML

    User-Device Authentication in Mobile Banking using APHEN for Paratuck2 Tensor Decomposition

    Authors: Jeremy Charlier, Eric Falk, Radu State, Jean Hilger

    Abstract: The new financial European regulations such as PSD2 are changing the retail banking services. Noticeably, the monitoring of the personal expenses is now opened to other institutions than retail banks. Nonetheless, the retail banks are looking to leverage the user-device authentication on the mobile banking applications to enhance the personal financial advertisement. To address the profiling of th…

    Submitted 23 May, 2019; originally announced May 2019.

  25. arXiv:1905.09894

    cs.LG stat.ML

    PHom-GeM: Persistent Homology for Generative Models

    Authors: Jeremy Charlier, Radu State, Jean Hilger

    Abstract: Generative neural network models, including Generative Adversarial Network (GAN) and Auto-Encoders (AE), are among the most popular neural network models to generate adversarial data. The GAN model is composed of a generator that produces synthetic data and of a discriminator that discriminates between the generator's output and the true data. AE consist of an encoder which maps the model distribu…

    Submitted 23 May, 2019; originally announced May 2019.

  26. arXiv:1905.09869

    cs.CE math.NA

    Non-Negative PARATUCK2 Tensor Decomposition Combined to LSTM Network For Smart Contracts Profiling

    Authors: Jeremy Charlier, Radu State, Jean Hilger

    Abstract: Smart contracts are programs stored and executed on a blockchain. The Ethereum platform, an open-source blockchain-based platform, has been designed to use these programs offering secured protocols and transaction costs reduction. The Ethereum Virtual Machine performs smart contracts runs, where the execution of each contract is limited to the amount of gas required to execute the operations descr…

    Submitted 23 May, 2019; originally announced May 2019.

  27. arXiv:1902.11212

    cs.GT cs.MA

    Infer Your Enemies and Know Yourself, Learning in Real-Time Bidding with Partially Observable Opponents

    Authors: Manxing Du, Alexander I. Cowen-Rivers, Ying Wen, Phu Sakulwongtana, Jun Wang, Mats Brorsson, Radu State

    Abstract: Real-time bidding, as one of the most popular mechanisms for selling online ad slots, facilitates advertisers to reach their potential customers. The goal of bidding optimization is to maximize the advertisers' return on investment (ROI) under a certain budget setting. A straightforward solution is to model the bidding function in an explicit form. However, the static functional solutions lack gen…

    Submitted 28 February, 2019; originally announced February 2019.

  28. arXiv:1902.10666

    cs.LG stat.ML

    Improving Missing Data Imputation with Deep Generative Models

    Authors: Ramiro D. Camino, Christian A. Hammerschmidt, Radu State

    Abstract: Datasets with missing values are very common on industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed interesting results in this domain, but it is not clear whic…

    Submitted 27 February, 2019; originally announced February 2019.

  29. arXiv:1902.06976

    cs.CR

    The Art of The Scam: Demystifying Honeypots in Ethereum Smart Contracts

    Authors: Christof Ferreira Torres, Mathis Steichen, Radu State

    Abstract: Modern blockchains, such as Ethereum, enable the execution of so-called smart contracts - programs that are executed across a decentralised network of nodes. As smart contracts become more popular and carry more value, they become more of an interesting target for attackers. In the past few years, several smart contracts have been exploited by attackers. However, a new trend towards a more proacti…

    Submitted 29 May, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

  30. arXiv:1807.01202

    stat.ML cs.LG

    Generating Multi-Categorical Samples with Generative Adversarial Networks

    Authors: Ramiro Camino, Christian Hammerschmidt, Radu State

    Abstract: We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers taking into accou…

    Submitted 4 July, 2018; v1 submitted 3 July, 2018; originally announced July 2018.

    Journal ref: Presented at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models, Stockholm, Sweden

  31. arXiv:1803.00897

    cs.LG

    Impact of Biases in Big Data

    Authors: Patrick Glauner, Petko Valtchev, Radu State

    Abstract: The underlying paradigm of big data-driven machine learning reflects the desire of deriving better conclusions from simply analyzing more data, without the necessity of looking at theory and models. Is having simply more data always helpful? In 1936, The Literary Digest collected 2.3M filled in questionnaires to predict the outcome of that year's US presidential election. The outcome of this big d…

    Submitted 2 March, 2018; originally announced March 2018.

    Journal ref: Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2018)

  32. arXiv:1801.05627

    cs.LG

    On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage

    Authors: Patrick Glauner, Radu State, Petko Valtchev, Diogo Duarte

    Abstract: In machine learning, a bias occurs whenever training sets are not representative for the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and covariate shift. In this work, we aim to shed light on this topic in order to increase the overall attention to this issue in the field of machine learning. We propose a scalable novel framework for r…

    Submitted 3 April, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

    Journal ref: Proceedings of the 13th International FLINS Conference on Data Science and Knowledge Engineering for Sensing Decision Support (FLINS 2018)

  33. arXiv:1709.03008

    cs.LG cs.AI cs.HC

    Identifying Irregular Power Usage by Turning Predictions into Holographic Spatial Visualizations

    Authors: Patrick Glauner, Niklas Dahringer, Oleksandr Puhachov, Jorge Augusto Meira, Petko Valtchev, Radu State, Diogo Duarte

    Abstract: Power grids are critical infrastructure assets that face non-technical losses (NTL) such as electricity theft or faulty meters. NTL may range up to 40% of the total electricity distributed in emerging countries. Industrial NTL detection systems are still largely based on expert knowledge when deciding whether to carry out costly on-site inspections of customers. Electricity providers are reluctant…

    Submitted 9 September, 2017; originally announced September 2017.

    Comments: Proceedings of the 17th IEEE International Conference on Data Mining Workshops (ICDMW 2017)

  34. arXiv:1707.09430

    stat.ML cs.LG

    Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms

    Authors: Christian A. Hammerschmidt, Radu State, Sicco Verwer

    Abstract: We present an interactive version of an evidence-driven state-merging (EDSM) algorithm for learning variants of finite state automata. Learning these automata often amounts to recovering or reverse engineering the model generating the data despite noisy, incomplete, or imperfectly sampled data sources rather than optimizing a purely numeric target function. Domain expertise and human knowledge abo…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: 4 pages, presented at the Human in the Loop workshop at ICML 2017

  35. arXiv:1703.10121

    cs.LG cs.AI stat.ML

    The Top 10 Topics in Machine Learning Revisited: A Quantitative Meta-Study

    Authors: Patrick Glauner, Manxing Du, Victor Paraschiv, Andrey Boytsov, Isabel Lopez Andrade, Jorge Meira, Petko Valtchev, Radu State

    Abstract: Which topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by doing a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use m…

    Submitted 29 March, 2017; originally announced March 2017.

    Journal ref: Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017)

  36. arXiv:1702.03767

    cs.LG cs.AI

    Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?

    Authors: Patrick Glauner, Angelo Migliosi, Jorge Meira, Petko Valtchev, Radu State, Franck Bettinger

    Abstract: Non-technical losses (NTL) occur during the distribution of electricity in power grids and include, but are not limited to, electricity theft and faulty meters. In emerging countries, they may range up to 40% of the total electricity distributed. In order to detect NTLs, machine learning methods are used that learn irregular consumption patterns from customer data and inspection results. The Big D…

    Submitted 25 July, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

    Comments: Proceedings of the 19th International Conference on Intelligent System Applications to Power Systems (ISAP 2017)

  37. arXiv:1611.07100

    stat.ML cs.AI

    Interpreting Finite Automata for Sequential Data

    Authors: Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State

    Abstract: Automaton models are often seen as interpretable models. Interpretability itself is not well defined: it remains unclear what interpretability means without first explicitly specifying objectives or desired attributes. In this paper, we identify the key properties used to interpret automata and propose a modification of a state-merging approach to learn variants of finite state automata. We apply…

    Submitted 24 November, 2016; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

    ACM Class: I.2.6

  38. arXiv:1607.00872

    cs.LG cs.AI

    Neighborhood Features Help Detecting Non-Technical Losses in Big Data Sets

    Authors: Patrick Glauner, Jorge Meira, Lautaro Dolberg, Radu State, Franck Bettinger, Yves Rangoni, Diogo Duarte

    Abstract: Electricity theft is a major problem around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which are losses that occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split t…

    Submitted 25 July, 2017; v1 submitted 4 July, 2016; originally announced July 2016.

    Comments: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT 2016)

  39. The Challenge of Non-Technical Loss Detection using Artificial Intelligence: A Survey

    Authors: Patrick Glauner, Jorge Augusto Meira, Petko Valtchev, Radu State, Franck Bettinger

    Abstract: Detection of non-technical losses (NTL) which include electricity theft, faulty meters or billing errors has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intellig…

    Submitted 25 July, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

    Journal ref: International Journal of Computational Intelligence Systems (IJCIS), vol. 10, issue 1, pp. 760-775, 2017

  40. arXiv:1602.08350

    cs.LG cs.AI

    Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets

    Authors: Patrick O. Glauner, Andre Boechat, Lautaro Dolberg, Radu State, Franck Bettinger, Yves Rangoni, Diogo Duarte

    Abstract: Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignore that the two classes of regular and non-re…

    Submitted 25 July, 2017; v1 submitted 26 February, 2016; originally announced February 2016.

    Comments: Proceedings of the Seventh IEEE Conference on Innovative Smart Grid Technologies (ISGT 2016)

  41. arXiv:1208.2877

    cs.CR

    Torinj : Automated Exploitation Malware Targeting Tor Users

    Authors: Gerard Wagener, Alexandre Dulaunoy, Radu State

    Abstract: We propose in this paper a new propagation vector for malicious software by abusing the Tor network. Tor is particularly relevant, since operating a Tor exit node is easy and involves low costs compared to attack institutional or ISP networks. After presenting the Tor network from an attacker perspective, we describe an automated exploitation malware which is operated on a Tor exit node targeting…

    Submitted 14 August, 2012; originally announced August 2012.

  42. arXiv:cs/0610109

    cs.NI

    Intrusion detection mechanisms for VoIP applications

    Authors: Mohamed El Baker Nassar, Radu State, Olivier Festor

    Abstract: VoIP applications are emerging today as an important component in business and communication industry. In this paper, we address the intrusion detection and prevention in VoIP networks and describe how a conceptual solution based on the Bayes inference approach can be used to reinforce the existent security mechanisms. Our approach is based on network monitoring and analyzing of the VoIP-specifi…

    Submitted 18 October, 2006; originally announced October 2006.

    Journal ref: In Third annual VoIP security workshop (VSW'06) (2006)