Abstract

The Spearman Correlation is a well-known approach for assessing the rank correlation of two data sets. One advantage of choosing Spearman over other correlation coefficients, such as the Pearson, is that it depends only on the relative ranks of the observations rather than on the differences between the original values. The Spearman correlation coefficient is often used to assess and validate models for which correct ordering matters more than accurate absolute estimates, e.g. loss prediction models or exposure models. Although other measures such as Kendall’s τ and Somers’ D are also used to measure rank ordering with tied observations, Spearman’s ρ is often calculated as an initial step of correlation analysis. In this short note we examine tied observations in the target data set and investigate their impact on the Spearman Correlation coefficient in three scenarios: single value ties, random multi-value ties, and bounded random ties.

Keywords: Spearman Correlation, Tied Observations, Sensitivity Analysis.

Introduction

The Spearman Correlation, or Spearman’s rank correlation coefficient, is considered the nonparametric counterpart of the Pearson correlation and is an appropriate measure of both the strength and the direction of association between ranked data sets.
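The link to the Pearson correlation can be made concrete: Spearman's ρ is the Pearson correlation computed on the ranks of the two series. The sketch below is a minimal standard-library illustration, assuming the common convention that tied values receive the average of the ranks they span; the function names `average_ranks`, `pearson` and `spearman` are our own and are not taken from this paper.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    first, count = {}, {}
    for i, v in enumerate(sorted(values)):
        first.setdefault(v, i)           # first sorted position of v (0-based)
        count[v] = count.get(v, 0) + 1   # number of occurrences of v
    # v occupies ranks first+1 .. first+count, whose mean is first + (count+1)/2
    return [first[v] + (count[v] + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]
    print(spearman(x, y))                    # 0.8
    print(spearman(x, [v ** 3 for v in y]))  # 0.8: a monotone transform keeps the ranks
```

The second call illustrates why only relative ranks matter: cubing `y` changes every value but none of the ranks, so ρ is unchanged.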

It is worth mentioning that although the Spearman correlation describes the strength and direction of a monotonic relationship, the data sets being compared often do not have a significant monotonic relationship between the value series of interest, e.g. loss and exposure estimates in finance. In such cases, a Spearman correlation analysis also helps to determine whether a monotonic relationship exists between the data series at all.
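As a small illustration of what "monotonic" means here, the sketch below compares a strictly increasing but highly nonlinear series with a U-shaped one. The example data are hypothetical and chosen only for clarity. The increasing series gets ρ = 1 while its Pearson correlation is noticeably lower; the U-shaped series gets ρ = 0 even though it is a deterministic function of `x`, so a ρ near zero indicates the absence of a monotonic trend, not independence.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    first, count = {}, {}
    for i, v in enumerate(sorted(values)):
        first.setdefault(v, i)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


x = list(range(1, 11))
monotone = [math.exp(v) for v in x]    # strictly increasing, far from linear
u_shape = [(v - 5.5) ** 2 for v in x]  # falls then rises: not monotonic

if __name__ == "__main__":
    print(spearman(x, monotone))  # 1.0
    print(pearson(x, monotone))   # below 1: the relationship is not linear
    print(spearman(x, u_shape))   # 0.0
```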

Because the Spearman Correlation is a well-known and widely accepted concept, we omit its technical details, its mathematical formulation, and the accompanying statistical significance test in the rest of this paper. For details of the Spearman Correlation, see references 1 and 2.
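For convenience only, the standard textbook definitions are summarised below; the notation (ranks $R_i$ and $S_i$ of the $i$-th observation in the two series, rank differences $d_i$, sample size $n$) is ours. The shortcut formula holds only when there are no ties. With ties, ρ is defined as the Pearson correlation of the average ranks, and the shortcut no longer equals it, which is one reason ties deserve separate attention. The $t$ statistic is the usual large-sample approximation for testing $H_0\colon \rho = 0$.

```latex
% No ties: shortcut formula
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}, \qquad d_i = R_i - S_i

% General case (average ranks for ties): Pearson correlation of the ranks
\rho = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}
            {\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^{2} \; \sum_{i=1}^{n} (S_i - \bar{S})^{2}}}

% Approximate significance test under H_0: \rho = 0
t = \rho \sqrt{\frac{n - 2}{1 - \rho^{2}}} \;\sim\; t_{n-2}
```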

To examine how the correlation coefficient and its level of significance change as both the tied value and the percentage of tied observations change, we consider three scenarios in the following sections: a) single value ties, b) random multi-value ties, and c) bounded random ties².

For each analysis we set the value range to (0, 1), randomly generate 2000 observations, and randomly select the number of tied observations in the sample. The Spearman Correlation is then calculated between the original values and the adjusted series containing the tied observations. We repeat this calculation a large number of times (5000 for all results in this paper); the final result for each scenario is summarized in the following sections.
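The simulation loop can be sketched roughly as follows. This is not the authors' code. The paper does not specify how the tied value is chosen, so `single_value_ties` is a hypothetical rule that overwrites randomly chosen observations with one shared value drawn uniformly; the other scenarios would swap in a different tie rule. `simulate` and its parameters are likewise illustrative, with defaults matching the 2000 observations and 5000 repetitions stated above.

```python
import math
import random
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    first, count = {}, {}
    for i, v in enumerate(sorted(values)):
        first.setdefault(v, i)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def single_value_ties(values, n_ties, rng):
    """Hypothetical tie rule: overwrite n_ties randomly chosen observations
    with one shared value drawn uniformly from [0, 1)."""
    tied = list(values)
    shared = rng.random()
    for i in rng.sample(range(len(values)), n_ties):
        tied[i] = shared
    return tied


def simulate(n_obs=2000, n_trials=5000, seed=0):
    """Return a (tie share, rho) pair for each simulated trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # random.random() draws from [0, 1); hitting exactly 0 is negligible
        original = [rng.random() for _ in range(n_obs)]
        # Random number of overwritten observations, at most n_obs - 1 so the
        # adjusted series is not constant and rho stays defined.
        n_ties = rng.randint(0, n_obs - 1)
        tied = single_value_ties(original, n_ties, rng)
        results.append((n_ties / n_obs, spearman(original, tied)))
    return results


if __name__ == "__main__":
    # 200 trials instead of 5000 to keep the pure-Python runtime short.
    results = simulate(n_trials=200)
    print(statistics.mean(rho for _, rho in results))
```

Returning the tie share alongside each ρ leaves the summary step open, e.g. averaging ρ within tie-percentage buckets.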

Single value ties

We start with the simple case in which all tied observations share the same value. Here we run the tests for tie percentages of 0%, 5%, 10%, ..., 100%, in increments of 5%. In Figure 1 below, we show some examples of the random samples in different trials.

Figure 1: Example of single value ties
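The fixed-percentage sweep for single value ties might look like the sketch below, again under the hypothetical assumption that the shared tied value is drawn uniformly in each trial; `sweep` and its parameters are illustrative names. One consequence is worth noting: at 100% ties the adjusted series is constant, so its ranks have zero variance and the Spearman coefficient is undefined. The sketch reports NaN there instead of a number.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    first, count = {}, {}
    for i, v in enumerate(sorted(values)):
        first.setdefault(v, i)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation; returns NaN when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def single_value_ties(values, n_ties, rng):
    """Hypothetical tie rule: n_ties random observations get one shared value."""
    tied = list(values)
    shared = rng.random()
    for i in rng.sample(range(len(values)), n_ties):
        tied[i] = shared
    return tied


def sweep(n_obs=2000, n_trials=5000, seed=0):
    """Mean rho between original and tied series for 0%, 5%, ..., 100% ties."""
    rng = random.Random(seed)
    summary = {}
    for pct in range(0, 101, 5):
        n_ties = n_obs * pct // 100  # integer arithmetic avoids float rounding
        rhos = []
        for _ in range(n_trials):
            original = [rng.random() for _ in range(n_obs)]
            rhos.append(spearman(original, single_value_ties(original, n_ties, rng)))
        # NaN at 100%: every value is tied, so rho is undefined
        summary[pct] = sum(rhos) / n_trials
    return summary


if __name__ == "__main__":
    # 20 trials per level instead of 5000 to keep the runtime short.
    for pct, mean_rho in sweep(n_trials=20).items():
        print(f"{pct:3d}%  {mean_rho:.4f}")
```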