-
Benchmarking Hashing Algorithms for Load Balancing in a Distributed Database Environment
Authors:
Alexander Slesarev,
Mikhail Mikhailov,
George Chernishev
Abstract:
Modern high load applications store data using multiple database instances. Such an architecture requires data consistency, and it is important to ensure even distribution of data among nodes. Load balancing is used to achieve these goals.
Hashing is the backbone of virtually all load balancing systems. Since the introduction of classic Consistent Hashing, many algorithms have been devised for t…
▽ More
Modern high load applications store data using multiple database instances. Such an architecture requires data consistency, and it is important to ensure even distribution of data among nodes. Load balancing is used to achieve these goals.
Hashing is the backbone of virtually all load balancing systems. Since the introduction of classic Consistent Hashing, many algorithms have been devised for this purpose.
One of the purposes of the load balancer is to ensure storage cluster scalability. It is crucial for the performance of the whole system to transfer as few data records as possible during node addition or removal. The load balancer hashing algorithm has the greatest impact on this process.
In this paper we experimentally evaluate several hashing algorithms used for load balancing, conducting both simulated and real system experiments. To evaluate algorithm performance, we have developed a benchmark suite based on Unidata MDM~ -- a scalable toolkit for various Master Data Management (MDM) applications. For assessment, we have employed three criteria~ -- uniformity of the produced distribution, the number of moved records, and computation speed. Following the results of our experiments, we have created a table, in which each algorithm is given an assessment according to the abovementioned criteria.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
Revisiting Data Compression in Column-Stores
Authors:
Alexander Slesarev,
Evgeniy Klyuchikov,
Kirill Smirnov,
George Chernishev
Abstract:
Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing power for it. The main issue of this is a trade-off between the compression ratio and the decompression CPU cost. Existing results state that light-weight compre…
▽ More
Data compression is widely used in contemporary column-oriented DBMSes to lower space usage and to speed up query processing. Pioneering systems have introduced compression to tackle the disk bandwidth bottleneck by trading CPU processing power for it. The main issue of this is a trade-off between the compression ratio and the decompression CPU cost. Existing results state that light-weight compression with small decompression costs outperforms heavy-weight compression schemes in column-stores. However, since the time these results were obtained, CPU, RAM, and disk performance have advanced considerably. Moreover, novel compression algorithms have emerged.
In this paper, we revisit the problem of compression in disk-based column-stores. More precisely, we study the I/O-RAM compression scheme which implies that there are two types of pages of different size: disk pages (compressed) and in-memory pages (uncompressed). In this scheme, the buffer manager is responsible for decompressing pages as soon as they arrive from disk. This scheme is rather popular as it is easy to implement: several modern column and row-stores use it.
We pose and address the following research questions: 1) Are heavy-weight compression schemes still inappropriate for disk-based column-stores?, 2) Are new light-weight compression algorithms better than the old ones?, 3) Is there a need for SIMD-employing decompression algorithms in case of a disk-based system? We study these questions experimentally using a columnar query engine and Star Schema Benchmark.
△ Less
Submitted 19 May, 2021;
originally announced May 2021.
-
Neural Codes for Image Retrieval
Authors:
Artem Babenko,
Anton Slesarev,
Alexandr Chigorin,
Victor Lempitsky
Abstract:
It has been shown that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image. In this paper, we investigate the use of such descriptors (neural codes) within the image retrieval application. In the experiments with several standard retrieval benchmarks, we establish that neural codes perf…
▽ More
It has been shown that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image. In this paper, we investigate the use of such descriptors (neural codes) within the image retrieval application. In the experiments with several standard retrieval benchmarks, we establish that neural codes perform competitively even when the convolutional neural network has been trained for an unrelated classification task (e.g.\ Image-Net). We also evaluate the improvement in the retrieval performance of neural codes, when the network is retrained on a dataset of images that are similar to images encountered at test time.
We further evaluate the performance of the compressed neural codes and show that a simple PCA compression provides very good short codes that give state-of-the-art accuracy on a number of datasets. In general, neural codes turn out to be much more resilient to such compression in comparison other state-of-the-art descriptors. Finally, we show that discriminative dimensionality reduction trained on a dataset of pairs of matched photographs improves the performance of PCA-compressed neural codes even further. Overall, our quantitative experiments demonstrate the promise of neural codes as visual descriptors for image retrieval.
△ Less
Submitted 7 July, 2014; v1 submitted 7 April, 2014;
originally announced April 2014.
-
Link Graph Analysis for Adult Images Classification
Authors:
Evgeny Kharitonov,
Anton Slesarev,
Ilya Muchnik,
Fedor Romanenko,
Dmitry Belyaev,
Dmitry Kotlyarov
Abstract:
In order to protect an image search engine's users from undesirable results adult images' classifier should be built. The information about links from websites to images is employed to create such a classifier. These links are represented as a bipartite website-image graph. Each vertex is equipped with scores of adultness and decentness. The scores for image vertexes are initialized with zero, tho…
▽ More
In order to protect an image search engine's users from undesirable results adult images' classifier should be built. The information about links from websites to images is employed to create such a classifier. These links are represented as a bipartite website-image graph. Each vertex is equipped with scores of adultness and decentness. The scores for image vertexes are initialized with zero, those for website vertexes are initialized according to a text-based website classifier. An iterative algorithm that propagates scores within a website-image graph is described. The scores obtained are used to classify images by choosing an appropriate threshold. The experiments on Internet-scale data have shown that the algorithm under consideration increases classification recall by 17% in comparison with a simple algorithm which classifies an image as adult if it is connected with at least one adult site (at the same precision level).
△ Less
Submitted 19 July, 2010;
originally announced July 2010.