The rapid growth of big data has heightened the need for effective clustering techniques that yield actionable insights. The K-Means clustering algorithm is popular for its simplicity and efficiency, but it is sensitive to the initial choice of centroids and scales poorly to very large datasets. This study seeks to enhance K-Means by integrating advanced initialization techniques and refining the clustering process, with the aim of improving both clustering quality and computational efficiency in big data contexts.
As organizations in sectors such as healthcare, finance, and marketing increasingly rely on data analysis, K-Means plays a crucial role in identifying patterns within large datasets. Our research addresses the algorithm's limitations by employing factor analysis for dimensionality reduction and Principal Component Analysis (PCA) to transform correlated variables into uncorrelated components, with the goal of greater accuracy in high-dimensional spaces. We evaluate the improved algorithm experimentally against standard K-Means and report significant improvements in clustering quality, particularly in applications such as customer segmentation and risk assessment. The contribution is a refined K-Means algorithm better suited to large-scale datasets, supporting informed decision-making across these domains.
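The sensitivity to initial centroids described above can be illustrated with a minimal sketch. The abstract does not name the initialization technique the study uses, so k-means++ seeding is an assumption chosen here for illustration only. The function names (`kmeans_pp_init`, `assign`, `lloyd`, `inertia`) and the synthetic data are hypothetical, and the sketch omits the factor analysis and PCA steps.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding (assumed technique, not confirmed by the source).

    The first centroid is picked uniformly at random; each later centroid is
    drawn with probability proportional to its squared distance from the
    nearest centroid chosen so far.
    """
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # All remaining points coincide with a centroid: fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def lloyd(points, centroids, max_iter=100):
    """Standard Lloyd iterations: assign points, then recompute cluster means."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable, so the means no longer change
        centroids = new_centroids
    return centroids, assign(points, centroids)


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (lower is tighter)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Synthetic demo data: three Gaussian blobs in 2-D (hypothetical, for illustration).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in blob_centers for _ in range(50)]

init = kmeans_pp_init(points, 3, rng)
centroids, labels = lloyd(points, init)
print("inertia after Lloyd iterations:", inertia(points, centroids, labels))
```

Replacing `kmeans_pp_init` with a uniform random choice of `k` points and comparing the resulting inertia over several seeds is one simple way to observe how much the starting centroids can affect the final clustering.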
- Cite this work
- Elhadi Suiam (Author), 2025, Improving K-Means Clustering Algorithm for Enhanced Performance in Big Data Analytics, München, GRIN Verlag, https://www.grin.com/document/1600454