-
G2D: Generate to Detect Anomaly
Authors:
Masoud Pourreza,
Bahram Mohammadi,
Mostafa Khaki,
Samir Bouindour,
Hichem Snoussi,
Mohammad Sabokrou
Abstract:
In this paper, we propose a novel method for irregularity detection. Previous research treats this problem as a One-Class Classification (OCC) task: a reference model is trained on all of the available samples, and a test sample is considered an anomaly if it deviates from the reference model. Generative Adversarial Networks (GANs) have achieved the most promising results for OCC, but implementing and training such networks, especially for the OCC task, is cumbersome and computationally expensive. To address these challenges, we present a simple but effective method that solves irregularity detection as a binary classification task, which makes the implementation easier and improves detection performance. We train two deep neural networks (a generator and a discriminator) in a GAN-style setting on only the normal samples. During training, the generator gradually learns to generate samples similar to the normal ones. While the generator still fails to produce normal data (in the early stages of learning and, more generally, before full convergence), it can be regarded as an irregularity generator. In this way, irregular samples are generated during the same training run. Afterward, we train a binary classifier on the generated anomalous samples together with the normal instances so that it can detect irregularities. The proposed framework applies to related applications of outlier detection in images and anomaly detection in videos. The results confirm that our proposed method is superior to the baseline and state-of-the-art solutions.
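The abstract describes the training schedule only in words. The following is a toy, hedged illustration of that schedule, not the authors' implementation: it assumes one-dimensional "normal" data drawn from a Gaussian N(2, 0.5²), a generator that learns only an offset, a linear logistic discriminator, and a linear logistic-regression classifier, all trained with hand-written gradients in pure Python. The names `g2d_toy`, `train_classifier`, and the cut-off `collect_until` are hypothetical. Generator outputs from the first `collect_until` steps, while the generator still fails to produce normal data, are kept as pseudo-anomalies and used as the negative class.

```python
import math
import random


def sigmoid(s: float) -> float:
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def g2d_toy(steps=300, collect_until=30, batch=32, seed=0):
    """Toy 1-D GAN; early generator outputs are kept as pseudo-anomalies (hypothetical)."""
    rng = random.Random(seed)
    b = -2.0          # generator G(z) = b + 0.5 * z (only the offset b is learned)
    w, c = 0.0, 0.0   # linear discriminator D(x) = sigmoid(w * x + c)
    lr_d, lr_g = 0.1, 0.01
    pseudo_anomalies = []
    for step in range(steps):
        real = [rng.gauss(2.0, 0.5) for _ in range(batch)]  # assumed normal data
        fake = [b + 0.5 * rng.gauss(0.0, 1.0) for _ in range(batch)]
        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        pr = [sigmoid(w * x + c) for x in real]
        pf = [sigmoid(w * x + c) for x in fake]
        gw = (sum((p - 1.0) * x for p, x in zip(pr, real))
              + sum(p * x for p, x in zip(pf, fake))) / batch
        gc = (sum(p - 1.0 for p in pr) + sum(pf)) / batch
        w -= lr_d * gw
        c -= lr_d * gc
        # Generator step (non-saturating loss -log D(G(z))); dG/db = 1.
        gb = sum((sigmoid(w * x + c) - 1.0) * w for x in fake) / batch
        b -= lr_g * gb
        # G2D idea: before convergence, generator outputs act as irregular samples.
        if step < collect_until:
            pseudo_anomalies.extend(fake)
    return pseudo_anomalies


def train_classifier(normal, anomalous, epochs=200, lr=0.1):
    """Logistic regression p(normal | x) = sigmoid(a * x + d), full-batch gradient descent."""
    data = [(x, 1.0) for x in normal] + [(x, 0.0) for x in anomalous]
    a, d = 0.0, 0.0
    for _ in range(epochs):
        errs = [(sigmoid(a * x + d) - y, x) for x, y in data]
        a -= lr * sum(e * x for e, x in errs) / len(data)
        d -= lr * sum(e for e, _ in errs) / len(data)
    return lambda x: sigmoid(a * x + d)


rng = random.Random(1)
normal = [rng.gauss(2.0, 0.5) for _ in range(960)]
anomalies = g2d_toy()
score = train_classifier(normal, anomalies)
print(f"p(normal | 2.0) = {score(2.0):.3f}, p(normal | -2.0) = {score(-2.0):.3f}")
```

Because this classifier is linear, it can only separate along one direction: points far beyond the normal data on the side opposite the pseudo-anomalies would still score as normal. The paper uses deep networks for all three models, which this toy does not attempt to reproduce.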
Submitted 27 June, 2020; v1 submitted 20 June, 2020;
originally announced June 2020.
-
Progressive Cleaning and Mining of Uncertain Smart Water Meter Data
Authors:
Milad Khaki
Abstract:
Several municipalities have recently installed wireless 'smart' water meters that enable functionalities such as demand response, leak alerts, identification of characteristic demand patterns, and detailed consumption analysis. To achieve these benefits, the meter data must be error-free, which cannot be guaranteed in practice because some 'dirtiness' or 'uncertainty' in the data is mostly unavoidable.
This paper investigates practical solutions for mining uncertain data to obtain reliable results, and evaluates the impact of dirty data on filters. This evaluation yields valuable information that can support informed decision-making on water planning strategies. We perform a systematic study of the errors present in a large-scale smart water meter deployment, which helps to better understand the nature of these errors.
The main filter identifies the customers contributing to a load peak. The filter outputs are then combined with domain expert knowledge to evaluate their accuracy and validity and to look for potential errors. After discovering each error, we analyze its traces in the data and track down its source, which eventually leads to either removing the error or handling it appropriately. This procedure is applied progressively to ensure that all detectable errors are discovered and characterized in the data model.
We evaluate the performance of the proposed approach using smart water meter consumption data obtained from the City of Abbotsford, British Columbia, Canada. We present results for both the unprocessed and the cleaned data and analyze in detail the sensitivity of the selected filter to the errors.
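The abstract does not specify how the peak filter is computed. As a hedged illustration only, the sketch below assumes hourly per-customer volumes, treats missing or negative readings as errors (a hypothetical cleaning rule, not one taken from the paper), finds the hour of peak aggregate demand, and ranks customers by their consumption at that hour. The function `peak_contributors`, its `top_k` parameter, and the example readings are all hypothetical.

```python
def peak_contributors(readings, top_k=3):
    """Find the aggregate peak hour and the customers consuming most at that hour.

    readings: hypothetical dict customer_id -> list of hourly volumes (None = missing).
    Missing or negative readings are treated as errors (an assumed cleaning rule).
    """
    hours = max(len(series) for series in readings.values())
    totals = [0.0] * hours
    flagged = []  # (customer_id, hour, value) readings excluded as errors
    for cust, series in readings.items():
        for h, v in enumerate(series):
            if v is None or v < 0:
                flagged.append((cust, h, v))
            else:
                totals[h] += v
    peak = max(range(hours), key=lambda h: totals[h])

    def usage_at_peak(series):
        # An erroneous or absent reading contributes nothing to the peak.
        v = series[peak] if peak < len(series) else None
        return v if v is not None and v >= 0 else 0.0

    ranked = sorted(readings, key=lambda c: usage_at_peak(readings[c]), reverse=True)
    return peak, ranked[:top_k], flagged


readings = {
    "A": [1.0, 2.0, 8.0, 1.0],
    "B": [0.5, 0.5, 3.0, None],   # missing reading at hour 3
    "C": [2.0, -1.0, 1.0, 0.5],   # impossible negative volume at hour 1
    "D": [0.0, 0.2, 0.1, 0.3],
}
peak, top, flagged = peak_contributors(readings)
print(peak, top, flagged)
```

In the progressive procedure the abstract describes, each flagged error would be traced back to its source and handled before the filter is re-run; that loop and the expert review are outside this sketch.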
Submitted 6 February, 2020;
originally announced February 2020.
-
Measuring Personalization of Web Search
Authors:
Anikó Hannák,
Piotr Sapieżyński,
Arash Molavi Khaki,
David Lazer,
Alan Mislove,
Christo Wilson
Abstract:
Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing level of personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that the search engine's algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or of the user attributes that cause it.
In light of this situation, we make three contributions. First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, our methodology must handle numerous details in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users on Google Web Search and 100 users on Bing. We find that, on average, 11.7% of results show differences due to personalization on Google, while 15.8% of results are personalized on Bing, but that this varies widely by search query and by result ranking. Third, we investigate the user features used to personalize results on Google Web Search and Bing. Surprisingly, we find measurable personalization only as a result of searching with a logged-in account and of the searching user's IP address. Our results are a first step towards understanding the extent and effects of personalization on Web search engines today.
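The abstract does not state which metrics the methodology uses. As an assumption for illustration, the sketch below compares two ranked result lists with a Jaccard index (which URLs appear) and a Levenshtein edit distance (how they are ordered), then subtracts the difference observed between two identically configured control accounts, so that only the excess is attributed to personalization. The function names and the example result lists are hypothetical.

```python
def jaccard(a, b):
    """Set overlap of two result lists (1.0 = same URLs, order ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a, b):
    """Levenshtein distance between two ranked lists (order-sensitive)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def excess_difference(test, control, control_twin):
    """Difference between test and control beyond the control-vs-control noise."""
    noise = edit_distance(control, control_twin)
    return edit_distance(test, control) - noise


control = ["a", "b", "c", "d"]
control_twin = ["a", "b", "d", "c"]   # same query, same settings: pure noise
test = ["a", "x", "y", "b"]           # hypothetical account under study
print(jaccard(test, control), edit_distance(test, control),
      excess_difference(test, control, control_twin))
```

Using both metrics separates two kinds of change: the twin controls have a Jaccard index of 1.0 but an edit distance of 2, so they differ only in ordering, whereas the test account also sees different URLs.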
Submitted 15 June, 2017;
originally announced June 2017.