In this paper, we present a methodology for clustering and grouping analysis of datasets subject to classification constraints or pre-defined classifications. The discussion is demonstrated with a full example based on real data. We start from the observed difference between the CIA and UN subregional definitions of European countries and consider its impact from a subregional house price ratio perspective. As documented in this report, we find the presented approach useful for clustering analysis of pre-identified subgroups, that is, for subgroup-based clustering problems.

We present an approach to clustering analysis when the target is subject to pre-defined subcategories in the underlying dataset. As an illustrative example, we perform a full analysis on real data to address real-world questions.

With the proposed methodology, one can quantify and measure relative and general distances between pre-defined subcategories within the dataset. Quantitative clustering analysis conditional on the pre-defined subcategories therefore becomes possible even when the subcategories are not defined from a perspective related to the underlying data.
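This excerpt does not state how the paper defines its general and relative distances, so the following is only a minimal sketch under stated assumptions: each subgroup is summarised by the mean of its 2D points (price-to-rent, price-to-income), the "general" distance is taken to be the Euclidean distance between those means, and the "relative" distance is that value divided by the largest pairwise distance. The group names `A`, `B`, `C`, the coordinates, and the helper functions are all hypothetical and are not the author's definitions or data.

```python
import math
from itertools import combinations

# Hypothetical toy data: (price-to-rent, price-to-income) pairs for three
# made-up subgroups. These are NOT values from the paper or the IMF dataset.
groups = {
    "A": [(20.0, 8.0), (22.0, 9.0), (21.0, 7.5)],
    "B": [(30.0, 12.0), (32.0, 13.0)],
    "C": [(15.0, 5.0), (16.0, 6.0), (14.0, 5.5)],
}

def centroid(points):
    """Arithmetic mean of a list of 2D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def pairwise_distances(groups):
    """Assumed 'general' distance: Euclidean distance between subgroup means."""
    centres = {name: centroid(pts) for name, pts in groups.items()}
    return {
        (a, b): math.dist(centres[a], centres[b])
        for a, b in combinations(sorted(centres), 2)
    }

def relative_distances(distances):
    """Assumed 'relative' distance: rescaled so the farthest pair equals 1.0."""
    largest = max(distances.values())
    return {pair: d / largest for pair, d in distances.items()}

general = pairwise_distances(groups)
relative = relative_distances(general)
for pair in general:
    print(pair, round(general[pair], 3), round(relative[pair], 3))
```

Normalising by the largest pairwise distance makes the relative values comparable across different subregional definitions (for example CIA versus UN groupings), but other normalisations would be equally plausible.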

Excerpt

Table of Contents

Motivation
Geometric Representation of Data
Relative Location of Geometric Representations
General and Relative Distances
Conclusions

Objectives and Key Themes

This paper explores a methodology for clustering and grouping analysis within datasets that incorporate pre-defined classifications or constraints. Using a real-world example based on housing price ratios, the study examines the impact of different subregional definitions on clustering outcomes.

Clustering with constraints
Geometric representation of data
Relative location and distance measures
Comparison of classification definitions
Subgroup-based clustering

Chapter Summaries

Motivation: This chapter introduces the challenges that traditional clustering methods face with datasets subject to pre-defined properties. The paper proposes an approach to address these challenges and presents a real-world example based on IMF house price data and contrasting regional classifications from the CIA and UN. The chapter defines the data, the target objectives, and the challenges encountered with traditional methods such as K-means clustering.
Geometric Representation of Data: This chapter outlines a methodology to represent constrained subsets of data geometrically as convex polygons. The approach utilizes two numerical ratios (price-to-rent and price-to-income) to represent data on a 2D plane. The chapter provides a visual representation of these polygons based on both CIA and UN regional classifications.
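The excerpt says each constrained subset is drawn as a convex polygon on the price-to-rent / price-to-income plane but does not describe the construction. A minimal sketch, assuming the polygon is the convex hull of the subgroup's points, is shown below using Andrew's monotone chain algorithm, with the shoelace formula giving the polygon's area. The point values and function names are hypothetical illustrations, not the author's code or data.

```python
def cross(o, a, b):
    """2D cross product of vectors OA and OB; > 0 means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise (or collinear) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

# Hypothetical subgroup: (price-to-rent, price-to-income) for five made-up countries.
group = [(10.0, 4.0), (14.0, 4.0), (14.0, 8.0), (10.0, 8.0), (12.0, 6.0)]
hull = convex_hull(group)
print(hull, polygon_area(hull))
```

Here the interior point (12, 6) is dropped and the hull is the 4 x 4 square with area 16. Running the same function once per subgroup under each classification (CIA or UN) would produce one polygon per subregion for plotting or comparison.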

Keywords

The paper focuses on data classification, grouping analysis, clustering, and the impact of predefined constraints on clustering outcomes. It utilizes real-world data on housing price ratios and compares regional classifications from the CIA and UN to illustrate the methodology.


Details

Title: A Clustering Method for Analysis of Data Subject to Pre-defined Classifications
Grade: A
Author: Yang Liu
Publication Year: 2019
Pages: 12
Catalog Number: V491428
ISBN (eBook): 9783668986466
ISBN (Book): 9783668986473
Language: English
Tags: clustering method analysis data subject pre-defined classifications
Product Safety: GRIN Publishing GmbH

Quote paper: Yang Liu (Author), 2019, A Clustering Method for Analysis of Data Subject to Pre-defined Classifications, Munich, GRIN Verlag, https://www.grin.com/document/491428

A Clustering Method for Analysis of Data Subject to Pre-defined Classifications