In this paper, we present a methodology to perform clustering and grouping analysis for datasets with classification constraints or definitions. The discussion is demonstrated with a full example based on real data. We start with the observed difference between the CIA and UN subregional definitions of European countries, and consider its impact from a subregional house price ratio perspective. As documented in this report, we find the presented approach useful for clustering analysis of pre-identified subgroups and for addressing subgroup-based clustering problems.

We present an approach to perform clustering analysis when the target is subject to pre-defined subcategories in the underlying dataset. As an illustrative example, we perform a full analysis on real data to address real-world questions.

With the proposed methodology, one is able to quantify and measure relative and general distances between pre-defined subcategories within the dataset. Quantitative clustering analysis conditional on the pre-defined subcategories is therefore made possible even if the subcategories are not defined from a perspective that is related to the underlying data.

Reading Sample

1 Motivation

2 Geometric Representation of Data

3 Relative Location of Geometric Representations

4 General and Relative Distances

5 Conclusions

Research Objectives and Themes

The primary objective of this paper is to introduce a robust methodology for performing clustering and grouping analysis on datasets that are subject to pre-existing classification constraints or definitions. By utilizing real-world housing price ratio data as a case study, the author aims to demonstrate how to quantitatively measure and compare the distances between these predefined subgroups to address complex clustering problems.

• Development of a geometric representation method for constrained data.
• Comparative analysis of different classification standards (CIA vs. UN subregions).
• Quantitative assessment of relative and general distances between data clusters.
• Application of linkage dendrograms for hierarchical clustering under constraints.
• Evaluation of real-world datasets regarding subregional house price performance.

Excerpt from the Publication

Clustering with Constraints

When geographical or industrial definitions are introduced, they often set the perspective from which an answer is expected. Given the subregion definitions in our example, it is intuitive to ask the following questions:

• if we were to categorize Europe into 2 subregions only, which of the existing subregions should we group together under the CIA and UN definitions, respectively?

• given that the CIA defines 5 subregions and the UN defines 4, which subregions can we group together to reduce the number of subregions by 1?

The linkage dendrogram plots can be obtained easily for clustering analysis with constraints. Figure 6 shows the links and paths of the clustering results, using the general distance between the pre-defined subregions under the CIA and UN definitions, respectively.
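Both questions above amount to reading merges off a hierarchical clustering built on pairwise distances between the pre-defined subregions. The following is a minimal sketch, not the paper's implementation: it assumes single linkage (the excerpt does not state which linkage rule is used), and the labels `R1` to `R4` and all distance values are hypothetical placeholders. The negative entry stands for a pair of overlapping subregions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def agglomerate(
    labels: List[str],
    dist: Dict[FrozenSet[str], float],
    k: int = 1,
) -> Tuple[List[Cluster], List[Tuple[List[str], List[str], float]]]:
    """Merge the closest clusters until only k remain (single linkage).

    `dist` maps an unordered pair of labels to their precomputed distance.
    Returns the remaining clusters and the ordered list of merges, which is
    the information a linkage dendrogram displays.
    """
    clusters: List[Cluster] = [frozenset([label]) for label in labels]
    merges: List[Tuple[List[str], List[str], float]] = []

    def linkage(a: Cluster, b: Cluster) -> float:
        # Single linkage: smallest pairwise distance between members.
        return min(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, merges


# Hypothetical subregions and distances (illustration only, not the paper's data).
labels = ["R1", "R2", "R3", "R4"]
dist = {
    frozenset(("R1", "R2")): 0.5,
    frozenset(("R1", "R3")): 2.0,
    frozenset(("R1", "R4")): 3.0,
    frozenset(("R2", "R3")): 1.5,
    frozenset(("R2", "R4")): 2.5,
    frozenset(("R3", "R4")): -0.2,  # overlapping pair: negative distance
}

# Reduce the number of subregions by 1: the first merge.
_, first_merge = agglomerate(labels, dist, k=len(labels) - 1)
# Categorize everything into 2 groups only.
two_groups, all_merges = agglomerate(labels, dist, k=2)
```

With `k = len(labels) - 1` the single recorded merge answers the "reduce by 1" question, and with `k = 2` the returned clusters answer the "2 subregions only" question. A different linkage rule (complete or average) would only change the `linkage` helper.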

Summary of Chapters

1 Motivation: Introduces the limitations of standard clustering algorithms when applied to data with predefined constraints and establishes the need for a new analytical approach.

2 Geometric Representation of Data: Explains how to model subset data as convex polygons on a 2D plane based on numerical ratios, utilizing CIA and UN classification examples.

3 Relative Location of Geometric Representations: Discusses how to assess the relative positions of polygons and whether they intersect, providing mathematical definitions for centroids and signed areas.

4 General and Relative Distances: Defines the mathematical framework for calculating distances between polygons, distinguishing between intersecting and non-intersecting scenarios.

5 Conclusions: Summarizes the effectiveness of the proposed methodology in enabling quantitative clustering analysis even when subcategories are not inherently derived from the underlying data.
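The geometric steps summarized for chapters 2 and 3 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own code: each country is assumed to be a point given by two numerical ratios, a subregion's polygon is taken to be the convex hull of its points (computed here with Andrew's monotone chain), and the signed area and centroid use the standard shoelace formulas, which may differ in notation from the paper's definitions. The sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive if vertices run counter-clockwise."""
    total = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def centroid(poly: List[Point]) -> Point:
    """Area centroid of a simple polygon with non-zero area."""
    area = signed_area(poly)
    if area == 0:
        raise ValueError("degenerate polygon: zero area")
    cx = cy = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        w = x0 * y1 - x1 * y0
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    return (cx / (6.0 * area), cy / (6.0 * area))


# Hypothetical points for one subregion (two ratios per country).
subregion = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(subregion)
area = signed_area(hull)
center = centroid(hull)
```

The interior point `(1.0, 1.0)` is dropped from the hull, and reversing the vertex order flips the sign of the area, which is why the sign can be used to check orientation.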

Keywords

Clustering, Data Classification, Grouping Analysis, Pre-defined Constraints, Geometric Representation, Convex Polygons, House Price Ratios, Subregional Analysis, Quantitative Finance, Distance Measurement, Linkage Dendrograms, European Classification, IMF Data, Pattern Recognition, Statistical Methodology

Frequently Asked Questions

What is the core purpose of this research paper?

The paper proposes a new methodology for performing clustering analysis on datasets that are already subject to external classification constraints, which standard methods often fail to handle effectively.

Which primary analytical methods are utilized in the study?

The author uses geometric representation of data as convex polygons, calculates centroids and signed areas to determine relative positions, and employs general distance matrices to construct linkage dendrograms.

What are the central themes addressed in the publication?

The central themes include constrained data clustering, the impact of differing geographical definitions (CIA vs. UN), and the quantitative measurement of relationships between predefined subgroups.

What is the main objective or research question?

The primary objective is to develop a system to group and measure data under pre-defined conditions and to demonstrate this system by comparing house price ratios across different regional classifications.

What is covered in the main body of the paper?

The main body details the data acquisition (IMF housing data), the geometric modeling of regions, the mathematical definition of distances, and the application of clustering results to solve real-world grouping problems.

Which keywords characterize this work?

Key terms include Clustering, Data Classification, Grouping Analysis, Convex Polygons, and Subregional Analysis.

How does this method handle overlapping clusters?

If polygons intersect, the method denotes the distance as negative, weighted by the intersect area, allowing for a nuanced understanding of how strongly different categories overlap.
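The overlap measure mentioned above needs the area of the intersection of two convex polygons. A minimal sketch of that ingredient follows, using Sutherland-Hodgman clipping, which is a standard technique and not necessarily the one used in the paper. How the paper weights the negative distance by this area is not stated in this excerpt, so the sketch stops at the area itself. Function names and sample polygons are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Clip `subject` against a convex, counter-clockwise `clipper`."""

    def inside(p: Point, a: Point, b: Point) -> bool:
        # True if p lies to the left of, or on, the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Intersection of line p-q with line a-b (called only when they cross).
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
        if not output:
            break  # nothing left: the polygons do not overlap
    return output


def intersection_area(poly_a: List[Point], poly_b: List[Point]) -> float:
    """Area shared by two convex polygons given in counter-clockwise order."""
    overlap = clip(poly_a, poly_b)
    return abs(signed_area(overlap)) if len(overlap) >= 3 else 0.0


# Hypothetical polygons: two overlapping squares and one far away.
square_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
square_b = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
square_c = [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (5.0, 6.0)]

shared = intersection_area(square_a, square_b)
disjoint = intersection_area(square_a, square_c)
```

A zero result means the polygons do not overlap in area, in which case the non-intersecting distance definition would apply instead.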

What practical application does the author suggest for this methodology?

Beyond regional housing analysis, the author suggests applications in corporate lending portfolio classification, where entities might be classified by both country of operation and country of incorporation.

End of the reading sample from 12 pages

Details

Title: A Clustering Method for Analysis of Data Subject to Pre-defined Classifications
Grade: A
Author: Yang Liu (Author)
Year of publication: 2019
Pages: 12
Catalog number: V491428
ISBN (eBook): 9783668986466
ISBN (Book): 9783668986473
Language: English
Keywords: clustering method analysis data subject pre-defined classifications
Product safety: GRIN Publishing GmbH

Cite this work: Yang Liu (Author), 2019, A Clustering Method for Analysis of Data Subject to Pre-defined Classifications, Munich, GRIN Verlag, https://www.grin.com/document/491428
