Search | arXiv e-print repository

doi 10.1016/j.patcog.2022.109269

How to Use K-means for Big Data Clustering?

Authors: Rustam Mussabayev, Nenad Mladenovic, Bassem Jarboui, Ravil Mussabayev

Abstract: K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drops drastically when applied to vast amounts of data. Therefore, it is crucial to improve K-means by scaling it to big data using as few of the following computational resources as possible: data, time, and algorithmic ingredients. We propose a new parallel scheme of using K-means and K-means++ algorithms for big data clustering that satisfies the properties of a "true big data" algorithm and outperforms the classical and recent state-of-the-art MSSC approaches in terms of solution quality and runtime. The new approach naturally implements global search by decomposing the MSSC problem without using additional metaheuristics. This work shows that data decomposition is the basic approach to solving the big data clustering problem. The empirical success of the new algorithm allowed us to challenge the common belief that more data is required to obtain a good clustering solution. Moreover, the present work questions the established trend that more sophisticated hybrid approaches and algorithms are required to obtain a better clustering solution.

Submitted 23 November, 2023; v1 submitted 14 April, 2022; originally announced April 2022.

Journal ref: Pattern Recognition, Volume 137, 2023, 109269, ISSN 0031-3203
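The abstract describes decomposing the MSSC problem by running K-means and K-means++ on parts of the data. The sketch below illustrates only that general idea under our own assumptions and is not the authors' algorithm: it draws random samples (of a hypothetical size `sample_size`), seeds each sample with K-means++, runs Lloyd's K-means iterations on the sample, and keeps the centroids with the lowest sum of squared distances. For simplicity it scores every candidate on the full data set, a pass that a true big-data method would avoid.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:
            # Every point coincides with an existing centre; fall back to a uniform draw.
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def lloyd(points, centres, max_iter=100):
    """Standard K-means (Lloyd) iterations starting from the given centres."""
    centres = [list(c) for c in centres]
    for _ in range(max_iter):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            clusters[nearest].append(p)
        # Move each centre to its cluster mean; an empty cluster keeps its old centre.
        new_centres = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for c, members in zip(centres, clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def mssc_objective(points, centres):
    """Minimum sum-of-squares clustering (MSSC) objective."""
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def sampled_kmeans(points, k, sample_size, rounds, seed=0):
    """Hypothetical sample-based scheme: K-means++ and K-means on random samples, keep the best."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(rounds):  # rounds are independent, so they could run in parallel
        sample = rng.sample(points, sample_size)
        candidate = lloyd(sample, kmeans_pp_init(sample, k, rng))
        cost = mssc_objective(points, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

Each round touches only `sample_size` points during clustering, which is the property a decomposition-based approach relies on; how samples are sized and combined in the actual method is not stated in the abstract.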

arXiv:1503.02009

Towards an intelligent VNS heuristic for the k-labelled spanning forest problem

Authors: Sergio Consoli, José Andrés Moreno Pérez, Nenad Mladenovic

Abstract: In an ongoing project, we investigate a new approach to solving the k-labelled spanning forest (kLSF) problem with an intelligent Variable Neighbourhood Search (Int-VNS) metaheuristic. In the kLSF problem, we are given an undirected input graph G with labelled edges and a positive integer k, and the aim is to find a spanning forest of G with the minimum number of connected components that uses at most k distinct labels. The problem is related to the minimum labelling spanning tree (MLST) problem, whose goal is to find a spanning tree of the input graph with the minimum number of labels, and it has several real-world applications in which connectivity must be ensured by means of homogeneous connections. The Int-VNS metaheuristic that we propose for the kLSF problem is derived from the promising intelligent VNS strategy recently proposed for the MLST problem. It integrates the basic VNS for the kLSF problem with complementary approaches from machine learning, statistics, and experimental algorithmics in order to achieve high-quality performance and to fully automate the resulting strategy.

Submitted 5 March, 2015; originally announced March 2015.

Comments: 2 pages, Fifteenth International Conference on Computer Aided Systems Theory (EUROCAST 2015), Las Palmas de Gran Canaria, Spain

Journal ref: Computer Aided Systems Theory, pages 79-80 (2015)
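To make the kLSF objective concrete, the sketch below (our own illustration, not code from the paper) represents the labelled graph as a list of hypothetical `(u, v, label)` triples, counts the connected components of the spanning forest induced by a chosen label set with a union-find structure, and, for tiny instances only, finds an optimal label set by exhaustive enumeration.

```python
from itertools import combinations


def count_components(num_vertices, edges, labels):
    """Connected components of the subgraph using only edges whose label is in `labels`."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v, label in edges:
        if label in labels:
            ru, rv = find(u), find(v)
            if ru != rv:  # the edge joins two trees of the forest
                parent[ru] = rv
                components -= 1
    return components


def klsf_bruteforce(num_vertices, edges, k):
    """Exact kLSF by enumeration; exponential in the number of labels, so tiny graphs only."""
    all_labels = sorted({label for _, _, label in edges})
    size = min(k, len(all_labels))  # adding a label never increases the component count
    best = min(
        combinations(all_labels, size),
        key=lambda subset: count_components(num_vertices, edges, set(subset)),
    )
    return set(best), count_components(num_vertices, edges, set(best))
```

For instance, with five vertices and `edges = [(0, 1, "a"), (1, 2, "a"), (2, 3, "b"), (3, 4, "c"), (0, 4, "b")]`, the call `klsf_bruteforce(5, edges, 2)` returns `({"a", "b"}, 1)`: those two labels already connect every vertex. Enumeration is hopeless at realistic sizes, which is why metaheuristics such as VNS are used.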

arXiv:1503.01376

BVNS for the k-labelled spanning forest problem (original Spanish title: BVNS para el problema del bosque generador k-etiquetado)

Authors: Sergio Consoli, Nenad Mladenović, José A. Moreno-Pérez

Abstract: In this paper we propose an efficient VNS-based solution for the k-labelled spanning forest problem. This problem is an extension of the minimum labelling spanning tree problem, with important applications in telecommunication networks and multimodal transport. Given an undirected graph whose edges are labelled and a positive integer k, the task is to find the spanning forest with the lowest number of connected components using at most k different labels. To address the problem, a Basic Variable Neighbourhood Search (BVNS) is proposed in which the maximum size of the neighbourhood space, n, is a key parameter. Different strategies for setting the value of n are studied. BVNS with the best selected strategy is compared experimentally with other metaheuristics from the literature that have been applied to this problem.

Submitted 4 March, 2015; originally announced March 2015.

Comments: 8 pages, in Spanish. X Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB 2015, Mérida - Almendralejo 4-6 Feb 2015; Proceedings of the X Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB 2015, Francisco Chavez de la O et al. (Eds.), ISBN: 978-84-697-2150-6, pages: 629-636, 2015
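The BVNS scheme in the abstract above can be outlined as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: shaking removes `s` randomly chosen labels from the incumbent for s = 1, ..., `n_max` (standing in for the parameter n), and a hypothetical local search, `greedy_fill`, refills the solution up to k labels by repeatedly adding the label that yields the fewest components.

```python
import random


def count_components(num_vertices, edges, labels):
    """Components of the subgraph restricted to edges whose label is in `labels` (union-find)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v, label in edges:
        if label in labels:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components


def greedy_fill(num_vertices, edges, labels, all_labels, k):
    """Local search: add labels greedily (fewest resulting components) until k labels are used."""
    labels = set(labels)
    while len(labels) < k:
        candidates = [lab for lab in all_labels if lab not in labels]
        if not candidates:
            break
        best = min(candidates, key=lambda lab: count_components(num_vertices, edges, labels | {lab}))
        labels.add(best)
    return labels


def bvns_klsf(num_vertices, edges, k, n_max, iterations, seed=0):
    """Basic VNS for kLSF: shake with neighbourhood size s = 1..n_max, then local search."""
    rng = random.Random(seed)
    all_labels = sorted({label for _, _, label in edges})
    current = greedy_fill(num_vertices, edges, set(), all_labels, k)
    current_value = count_components(num_vertices, edges, current)
    for _ in range(iterations):
        s = 1
        while s <= n_max:
            # Shaking: drop s random labels from the incumbent solution.
            dropped = rng.sample(sorted(current), min(s, len(current)))
            candidate = greedy_fill(num_vertices, edges, current - set(dropped), all_labels, k)
            value = count_components(num_vertices, edges, candidate)
            if value < current_value:  # improvement: move and restart from the smallest neighbourhood
                current, current_value = candidate, value
                s = 1
            else:
                s += 1
    return current, current_value
```

A larger `n_max` lets shaking escape deeper local optima at the cost of more evaluations per iteration, which is the trade-off the paper studies when choosing n.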

arXiv:1405.1980

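Improving the exploration and exploitation of constructive heuristics for the MLSTP (original Spanish title: Mejora de la exploracion y la explotacion de las heuristicas constructivas para el MLSTP)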

Authors: Sergio Consoli, Jose Andres Moreno-Perez, Kenneth Darby-Dowman, Nenad Mladenovic

Abstract: This paper studies constructive heuristics for the minimum labelling spanning tree (MLST) problem. The purpose is to find a spanning tree whose edges are as similar as possible. Given an undirected, connected, labelled graph (i.e., one with a label or colour for each edge), the MLST problem seeks a spanning tree whose edges have the smallest possible number of distinct labels. The model can represent many real-world problems in telecommunication networks, electric networks, and multimodal transportation networks, among others, and the problem has been shown to be NP-complete even for complete graphs. A primary heuristic, the maximum vertex covering algorithm, has been proposed, and several versions of this constructive heuristic have since been developed to improve its efficiency. Here we describe the problem, review the literature, and compare some variants of this algorithm.

Submitted 16 April, 2014; originally announced May 2014.

Comments: 9 pages, in Spanish. Quinto Congreso Espanol de Metaheuristicas, Algoritmos Evolutivos y Bioinspirados (MAEB 2007), Tenerife, Spain, available at: http://www.redheur.org/files/MAEBs/MAEB07.pdf; Proceedings of the Quinto Congreso Espanol de Metaheuristicas, Algoritmos Evolutivos y Bioinspirados, 2007
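The maximum vertex covering algorithm mentioned in the abstract above is a greedy constructive heuristic. The sketch below follows a modified version often used in the MLST literature (an assumption on our part, since the abstract does not say which variant is meant): starting from an empty label set, it repeatedly adds the label whose inclusion leaves the fewest connected components, stops once the chosen labels connect the graph, and then extracts a spanning tree that uses only those labels. Edges are hypothetical `(u, v, label)` triples.

```python
class DisjointSet:
    """Union-find over vertices 0..n-1 that tracks the number of components."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Merge the sets of x and y; return True if they were separate."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry
        self.count -= 1
        return True


def count_components(num_vertices, edges, labels):
    ds = DisjointSet(num_vertices)
    for u, v, label in edges:
        if label in labels:
            ds.union(u, v)
    return ds.count


def mvca(num_vertices, edges):
    """Add the label yielding the fewest components until the chosen labels connect the graph."""
    all_labels = sorted({label for _, _, label in edges})
    chosen = set()
    while count_components(num_vertices, edges, chosen) > 1:
        remaining = [lab for lab in all_labels if lab not in chosen]
        if not remaining:
            raise ValueError("the input graph is not connected")
        chosen.add(min(remaining, key=lambda lab: count_components(num_vertices, edges, chosen | {lab})))
    return chosen


def spanning_tree(num_vertices, edges, labels):
    """Kruskal-style pass: keep edges with allowed labels that join two components."""
    ds = DisjointSet(num_vertices)
    tree = []
    for u, v, label in edges:
        if label in labels and ds.union(u, v):
            tree.append((u, v, label))
    return tree
```

Ties between labels are broken here by sorted label order; different tie-breaking rules are one source of the variants the paper compares.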

arXiv:1201.2320

Solving the minimum labelling spanning tree problem using intelligent optimization

Authors: Sergio Consoli, Nenad Mladenovic, Jose Andres Moreno-Perez

Abstract: Given a connected, undirected graph whose edges are labelled (or coloured), the minimum labelling spanning tree (MLST) problem seeks a spanning tree whose edges have the smallest number of distinct labels (or colours). In recent work, the MLST problem has been shown to be NP-hard, and some effective heuristics have been proposed and analyzed. In this paper we present an intelligent optimization algorithm to solve the problem. It is obtained by integrating the basic Variable Neighbourhood Search heuristic with complementary techniques from machine learning, statistics, and experimental algorithmics, in order to achieve high-quality performance and to fully automate the resulting optimization strategy. We present experimental results on randomly generated graphs with different statistical properties, showing the crucial effects of the implementation, the robustness, and the empirical scalability of our intelligent algorithm. Furthermore, the computational experiments show that the proposed strategy outperforms the heuristics recommended in the literature and obtains optimal or near-optimal solutions in short computational running times.

Submitted 3 March, 2014; v1 submitted 11 January, 2012; originally announced January 2012.

Comments: This paper has been withdrawn by the authors due to major modifications to the algorithm that render the reported computational results obsolete and inconsistent.
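A VNS for the MLST problem needs a local search applied to feasible label sets. One simple move, shown here as an illustrative sketch rather than the procedure of the withdrawn paper, takes a label set whose edges connect the graph and tries to delete each label in turn, keeping a deletion whenever the remaining labels still connect all vertices. Edges are hypothetical `(u, v, label)` triples.

```python
def is_connected(num_vertices, edges, labels):
    """True if the edges whose label is in `labels` connect all vertices (union-find)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v, label in edges:
        if label in labels:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components == 1


def drop_redundant_labels(num_vertices, edges, labels):
    """Remove labels one at a time while the remaining labels keep the graph connected."""
    if not is_connected(num_vertices, edges, labels):
        raise ValueError("the starting label set must connect the graph")
    labels = set(labels)
    for label in sorted(labels):  # iterate over a sorted copy, so removal is safe
        if is_connected(num_vertices, edges, labels - {label}):
            labels.remove(label)
    return labels
```

Because labels are tried in a fixed order, the result is minimal (no single label can be removed) but not necessarily minimum; a VNS combines such a move with shaking to look for better label sets.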

Showing 1–5 of 5 results for author: Mladenovic, N