A parallel sampling based clustering

Sastry, Aditya AV; Netti, Kalyan

arXiv:1412.1947 (cs)

[Submitted on 5 Dec 2014]

Abstract: Automatically clustering data is an age-old problem, and numerous algorithms have been created to tackle it. The execution time of any of these algorithms grows with the number of input points and the number of cluster centers required. To reduce the number of input points, we could average the points locally and use the means, or local centers, as the input for clustering. However, since the required number of local centers is very high, running the clustering algorithm on the entire dataset to obtain these representative points is very time-consuming. To remedy this, we propose two subclustering schemes in which we subdivide the dataset into smaller sets and run the clustering algorithm on each smaller set to obtain the required number of data points for the final clustering. Because the dataset is subdivided, the clustering algorithm can be run on each piece in parallel. We found both parallel and serial execution of this method to be much faster than the original clustering algorithm, and the error introduced by running the clustering algorithm on the reduced set to be very small.
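The general idea in the abstract (subdivide, cluster each piece to get local centers, then cluster the local centers) can be illustrated with a minimal sketch. The abstract does not say which base clustering algorithm is used or how the two subclustering schemes divide the data, so everything specific here is an assumption made for illustration: plain Lloyd's k-means as the base algorithm, a random equal-sized partition, and the hypothetical names `kmeans`, `subcluster_then_cluster`, `n_chunks`, `local_k`, and `map_fn`.

```python
import random
from functools import partial


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed base algorithm); returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def subcluster_then_cluster(points, k, n_chunks, local_k, map_fn=map, seed=0):
    """Reduce the data to n_chunks * local_k local centers, then cluster those."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    # Random, roughly equal-sized partition (an assumed scheme, not the paper's).
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    # Each chunk is clustered independently, so map_fn may be a parallel map.
    local = map_fn(partial(kmeans, k=local_k, seed=seed), chunks)
    reduced = [c for centers in local for c in centers]
    # The final clustering runs on the much smaller set of local centers.
    return kmeans(reduced, k, seed=seed)
```

With the default `map_fn=map` the chunks are processed serially. Because `kmeans` is a module-level function, passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` should let the per-chunk runs execute in parallel. Each chunk must contain at least `local_k` points.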

Subjects:	Machine Learning (cs.LG)
MSC classes:	68Q32
Cite as:	arXiv:1412.1947 [cs.LG]
	(or arXiv:1412.1947v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1412.1947

Submission history

From: Aditya AV Sastry Mr.
[v1] Fri, 5 Dec 2014 10:50:31 UTC (68 KB)
