A1 Journal article (refereed)
Improving Scalable K-Means++ (2021)

Hämäläinen, J., Kärkkäinen, T., & Rossi, T. (2021). Improving Scalable K-Means++. Algorithms, 14(1), Article 6. https://doi.org/10.3390/a14010006

JYU authors or editors

Hämäläinen, Joonas
Kärkkäinen, Tommi
Rossi, Tuomo

Publication details

All authors or editors: Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo

Journal or series: Algorithms

eISSN: 1999-4893

Publication year: 2021

Publication date: 27/12/2020

Volume: 14

Issue number: 1

Article number: 6

Publisher: MDPI AG

Publication country: Switzerland

Publication language: English

DOI: https://doi.org/10.3390/a14010006

Publication open access: Openly available

Publication channel open access: Open Access channel

Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/73628

Abstract

Two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state-of-the-art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like the random one in the very high-dimensional cases.
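For orientation, the K-means++ baseline that the abstract compares against can be sketched as D² sampling: each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The sketch below is a minimal pure-Python illustration of that baseline only, not the authors' proposed methods or their implementation; the names `sq_dist` and `kmeanspp_seed` are hypothetical and chosen here for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (K-means++ style seeding).

    Hypothetical helper for illustration; not taken from the article.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from point i to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

Each round needs a full pass over the data, so seeding k centers takes k sequential passes. That cost is the usual motivation for K-means‖ style strategies, which sample several candidates per pass.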

Keywords: data mining; cluster analysis; algorithmics; algorithms

Free keywords: clustering initialization; K-means‖; K-means++; random projection

Fields of science:

113 Computer and information sciences (Natural sciences)

Contributing organizations

JYU units:

Faculty of Information Technology

Related projects

Competitive funding to strengthen universities’ research profiles. Profiling actions at the JYU, round 3
- Hämäläinen, Keijo
- Research Council of Finland
- 01/04/2017-30/11/2021

STRUCTURE PREDICTION OF HYBRID NANOPARTICLES VIA ARTIFICIAL INTELLIGENCE
- Kärkkäinen, Tommi
- Research Council of Finland
- 01/01/2018-31/07/2022

Ministry reporting: Yes

VIRTA submission year: 2021