In this paper, we present a methodology for clustering and grouping analysis of datasets subject to pre-defined classification constraints or definitions. The method is demonstrated with a complete example based on real data. We start from the observed differences between the CIA and UN subregional definitions of European countries and examine their impact from the perspective of subregional house price ratios. As documented in this report, we find the presented approach useful for clustering analysis of pre-identified subgroups, that is, for addressing subgroup-based clustering problems.
We present an approach to clustering analysis when the target data are subject to pre-defined subcategories. As an illustrative example, we perform a full analysis on real data to address real-world questions.
With the proposed methodology, one can quantify both relative and general distances between pre-defined subcategories within the dataset. Quantitative clustering analysis conditional on these subcategories therefore becomes possible, even when the subcategories are defined from a perspective unrelated to the underlying data.
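This summary does not give the paper's exact definitions of general and relative distance. As a minimal sketch under that caveat, the Python below uses two hypothetical stand-ins: the Euclidean gap between subgroup centroids (for a general distance) and that gap divided by the subgroups' mean within-group spread (for a relative distance). The function names and sample points are illustrative assumptions, not the author's method or IMF data.

```python
import math
from itertools import combinations

Point = tuple[float, float]  # hypothetical (price-to-rent, price-to-income) pair


def centroid(points: list[Point]) -> Point:
    """Arithmetic mean of a subgroup's points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def spread(points: list[Point]) -> float:
    """Mean Euclidean distance of a subgroup's points from its centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def subgroup_distances(
    groups: dict[str, list[Point]],
) -> dict[tuple[str, str], tuple[float, float]]:
    """For each pair of subgroups return (centroid gap, gap / mean spread).

    Both values are hypothetical stand-ins, not the paper's definitions.
    """
    out = {}
    for a, b in combinations(sorted(groups), 2):
        gap = math.dist(centroid(groups[a]), centroid(groups[b]))
        scale = (spread(groups[a]) + spread(groups[b])) / 2
        out[(a, b)] = (gap, gap / scale if scale > 0 else math.inf)
    return out


# Placeholder subgroups, not real house price ratios.
groups = {
    "North": [(0, 0), (2, 0), (0, 2), (2, 2)],
    "South": [(6, 0), (8, 0), (6, 2), (8, 2)],
}
print(subgroup_distances(groups))  # gap 6.0, relative gap about 4.24
```

Normalizing by spread means that two subgroups whose centroids are far apart but whose members are widely dispersed can still score as relatively close, which is one plausible reading of a "relative" distance.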
Table of Contents
- Motivation
- Geometric Representation of Data
- Relative Location of Geometric Representations
- General and Relative Distances
- Conclusions
Objectives and Key Themes
This paper explores a methodology for clustering and grouping analysis within datasets that incorporate pre-defined classifications or constraints. Using a real-world example based on housing price ratios, the study examines the impact of different subregional definitions on clustering outcomes.
- Clustering with constraints
- Geometric representation of data
- Relative location and distance measures
- Comparison of classification definitions
- Subgroup-based clustering
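Comparing two classification definitions starts with identifying which items change group between schemes. A minimal sketch, assuming each scheme is a plain mapping from a country identifier to a subregion label; the identifiers C1 to C4 and the labels are hypothetical placeholders, not actual CIA or UN assignments.

```python
from collections import Counter

Scheme = dict[str, str]  # item identifier -> subgroup label


def reassigned(scheme_a: Scheme, scheme_b: Scheme) -> dict[str, tuple[str, str]]:
    """Items labeled in both schemes whose labels differ, as (label_a, label_b)."""
    shared = sorted(scheme_a.keys() & scheme_b.keys())
    return {k: (scheme_a[k], scheme_b[k]) for k in shared if scheme_a[k] != scheme_b[k]}


def crosstab(scheme_a: Scheme, scheme_b: Scheme) -> Counter:
    """Count shared items for each (label_a, label_b) combination."""
    return Counter((scheme_a[k], scheme_b[k]) for k in scheme_a.keys() & scheme_b.keys())


# Placeholder labelings, not real CIA or UN assignments.
scheme_a = {"C1": "West", "C2": "Central", "C3": "East", "C4": "Central"}
scheme_b = {"C1": "West", "C2": "West", "C3": "East", "C4": "East"}
print(reassigned(scheme_a, scheme_b))
# {'C2': ('Central', 'West'), 'C4': ('Central', 'East')}
```

The cross-tabulation shows how a subgroup under one scheme splits across subgroups under the other; the reassigned items are the ones whose data points move between polygons.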
Chapter Summaries
- Motivation: This chapter introduces the challenges of traditional clustering methods when dealing with datasets subject to pre-defined properties. The paper proposes an approach to address these challenges and presents a real-world example based on IMF house price data and contrasting regional classifications from the CIA and UN. The chapter defines the data, target objectives, and challenges encountered using traditional methods like K-means clustering.
- Geometric Representation of Data: This chapter outlines a methodology to represent constrained subsets of data geometrically as convex polygons. The approach utilizes two numerical ratios (price-to-rent and price-to-income) to represent data on a 2D plane. The chapter provides a visual representation of these polygons based on both CIA and UN regional classifications.
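The convex-polygon representation can be sketched with a standard convex hull algorithm. The Python below uses Andrew's monotone chain hull and the shoelace formula for polygon area; this choice of algorithm is an assumption, since the summary does not say how the paper constructs its polygons. Each point is treated as a hypothetical (price-to-rent, price-to-income) pair.

```python
Point = tuple[float, float]  # (price-to-rent, price-to-income), hypothetical values


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq) -> list[Point]:
        chain: list[Point] = []
        for p in seq:
            # Drop the last vertex while it does not make a left turn.
            while len(chain) >= 2 and _cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(reversed(pts))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(poly: list[Point]) -> float:
    """Shoelace formula; 0.0 for degenerate polygons."""
    n = len(poly)
    if n < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2


# Placeholder subgroup: the interior point (1, 1) is not a hull vertex.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
print(hull, polygon_area(hull))  # [(0, 0), (2, 0), (2, 2), (0, 2)] 4.0
```

Applied per subgroup, for example `{label: convex_hull(pts) for label, pts in groups.items()}`, this yields one polygon per subregion under each classification scheme.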
Keywords
The paper focuses on data classification, grouping analysis, clustering, and the impact of predefined constraints on clustering outcomes. It utilizes real-world data on housing price ratios and compares regional classifications from the CIA and UN to illustrate the methodology.
- Yang Liu (Author), 2019, A Clustering Method for Analysis of Data Subject to Pre-defined Classifications, Munich, GRIN Verlag, https://www.grin.com/document/491428