A3 Part of a book or other compilation
Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods (2022)


Niemelä, M., & Kärkkäinen, T. (2022). Improving Clustering and Cluster Validation with Missing Data Using Distance Estimation Methods. In T. T. Tuovinen, J. Periaux, & P. Neittaanmäki (Eds.), Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges (pp. 123-133). Springer. Intelligent Systems, Control and Automation: Science and Engineering, 76. https://doi.org/10.1007/978-3-030-70787-3_9


JYU authors or editors


Publication details

All authors or editors of the publication: Niemelä, Marko; Kärkkäinen, Tommi

Parent publication: Computational Sciences and Artificial Intelligence in Industry : New Digital Technologies for Solving Future Societal and Economical Challenges

Parent publication editors: Tuovinen, Tero T.; Periaux, Jacques; Neittaanmäki, Pekka

ISBN: 978-3-030-70786-6

eISBN: 978-3-030-70787-3

Journal or series: Intelligent Systems, Control and Automation: Science and Engineering

ISSN: 2213-8986

eISSN: 2213-8994

Publication year: 2022

Number in series: 76

Article page numbers: 123-133

Total number of pages in the book: 275

Publisher: Springer

Place of publication: Cham

Publication country: Switzerland

Publication language: English

DOI: https://doi.org/10.1007/978-3-030-70787-3_9

Publication open access: Not open

Publication channel open access:

Publication is parallel published (JYX): https://jyx.jyu.fi/handle/123456789/84512

Additional information: The CSAI 2019 Conference (Computational Science and AI in Industry: New Digital Technologies for Solving Future Societal and Economical Challenges) took place in Jyväskylä, Finland, on June 12–14, 2019.


Abstract

Missing data introduces a challenge in the field of unsupervised learning. In clustering, when the form and the number of clusters are to be determined, one needs to deal with the missing values both in the clustering process and in the cluster validation. In the previous research, the clustering algorithm has been treated using robust clustering methods and available data strategy, and the cluster validation indices have been computed with the partial distance approximation. However, lately special methods for distance estimation with missing values have been proposed and this work is the first one where these methods are systematically applied and tested in clustering and cluster validation. More precisely, we propose, implement, and analyze the use of distance estimation methods to improve the discrimination power of clustering and cluster validation indices. A novel, robust prototype-based clustering process in two stages is suggested. Our results and conclusions confirm the usefulness of the distance estimation methods in clustering but, surprisingly, not in cluster validation.


YSO keywords: machine learning; cluster analysis; algorithms


Related organizations


Projects in which the publication was produced


Ministry reporting (OKM): Yes

Reporting year: 2022

Preliminary JUFO rating: 2


Last updated on 2022-20-12 at 08:53