
Europa Universität Viadrina

Faculty of Business Administration and Economics

Chair of Economics, in particular Empirical Economic Research

2006

Diploma Thesis in Economics

**Robust Methods in Regression Analysis – Theory and Application**

Robert Finger

**Abstract**

Regression Analysis is an important statistical tool for many applications. The most frequently used approach to Regression Analysis is the method of Ordinary Least Squares. But this method is vulnerable to outliers; even a single outlier can spoil the estimation completely. How can this vulnerability be described by theoretical concepts, and are there alternatives? This thesis gives an overview of the concepts and of alternative approaches. The three fundamental approaches to Robustness (qualitative, infinitesimal and quantitative Robustness) are introduced and applied to different estimators. The estimators under study are measures of location, scale and regression. The Robustness approaches are important not only for the theoretical assessment of particular estimators but also for the development of alternatives to classical estimators. This thesis focuses on the Robustness performance of estimators when outliers occur within the data set. Measures of location and scale provide necessary stepping stones into the topic of Regression Analysis. In particular, the median and trimming approaches are found to produce very robust results. These results are used in Regression Analysis to find alternatives to the method of Ordinary Least Squares. Its vulnerability can be overcome by applying the methods of Least Median of Squares or Least Trimmed Squares. Different outlier diagnostic tools are introduced to improve the poor efficiency of these Regression Techniques. Furthermore, this thesis presents a simulation of several Regression Techniques in different Regression Analysis situations. The simulation focuses in particular on changes in the regression estimates when outliers occur in the data.

Theoretically derived results as well as the results of the simulation lead to the recommendation of the method of Reweighted Least Squares. Applying this method regularly to problems of Regression Analysis provides outlier-resistant and efficient estimates.

**Contents**

List of Figures

List of Abbreviations

**1. Introduction**

**2. The Classical Linear Ordinary Least Squares Regression**

2.1. Introduction to OLS

2.2. Properties of the Least Squares Estimates

2.3. Problems of OLS

**3. Outliers and OLS**

3.1. Outlier definition and common error sources

3.2. Outliers in Regression Analysis and their influence on OLS results

**4. The concept of Robustness**

4.1. Introduction to Robustness

4.2. Qualitative Robustness

4.3. Infinitesimal Robustness

4.4. Quantitative Robustness

4.5. Robust Estimates

4.6. On asymptotic Results

**5. Some measures of location and scale – with regard to their robustness properties**

5.1. Introduction

5.2. Measures of location

5.2.1. A Definition

5.2.2. The Arithmetic Mean

5.2.3. The Median

5.2.4. Trimmed mean(s)

5.2.5. Other measures of location

5.3. Measures of scale

5.3.1. A Definition

5.3.2. The Standard deviation

5.3.3. The Median Absolute Deviation (MAD)

5.3.4. The t-Quantile Range

5.3.5. Other scale estimates

5.4. Higher Dimensions

**6. Robust Regression Techniques**

6.1. An Introduction and Definition

6.2. M-Estimates

6.3. The Repeated Median

6.4. The Least Median of Squares Regression

6.5. The Least Trimmed Squares Regression

6.6. The Coakley-Hettmansperger Estimator

6.7. Reweighted Least Squares

6.8. The Multivariate Reweighted Least Squares Approach

6.8.1. The Hat Matrix

6.8.2. The Minimum Volume Ellipsoid Estimator

6.9. Other Regression Methods and Limitations

6.10. Conclusions on Robust Regression

**7. Application to SAS and Simulation**

7.1. Introduction to Robustness Application and Simulation purposes

7.2. The initial data set – The zero contamination case

7.3. Seemingly negligible contamination in X-direction

7.4. Seemingly negligible contamination in Y-direction

7.5. High Leverage contamination

7.6. Large overall contamination

**8. Conclusions**

Appendices / References

**1. Introduction**

Regression Analysis is an important tool for all quantitative research. It explores the relationship between dependent and explanatory variables. Many hypotheses put forward by economic theories can be tested by applying a Regression Model to real-world data. The method of Ordinary Least Squares (OLS) is the most frequently applied Regression Technique. The application of this method requires several assumptions. Researchers are well aware that the OLS method performs poorly if these assumptions are not fulfilled. Over the last two centuries, various strategies have been introduced to test whether the model assumptions are fulfilled. In addition, various more general Regression Techniques are available which are based on less stringent conditions. Up to the middle of the twentieth century, violations of the model assumptions were treated independently of any common error source. But outlying observations within the data, in particular, can cause violations of the model assumptions and can thereby have a huge impact on Regression results. The intention of this thesis is to examine technically the effects of outliers on OLS Regression and to present alternative Regression Techniques. Furthermore, this thesis is meant to be a less mathematically demanding introduction to the field of Robust Statistics than is usually provided by mathematical statistics. The practical use of the methods considered here is always the central concern of this work.
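The sensitivity of OLS to a single outlying observation can be illustrated with a minimal sketch. The data below are hypothetical and not taken from the thesis, and the helper name `ols_fit` is an assumption introduced only for this illustration; it implements the textbook closed-form solution for a regression line with one explanatory variable.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical clean data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1 + 2 * xi for xi in x]
clean = ols_fit(x, y)

# Corrupt a single response, e.g. a misplaced decimal point (21 recorded as 210).
y_bad = y.copy()
y_bad[-1] = 210
contaminated = ols_fit(x, y_bad)

print("clean fit (intercept, slope):       ", clean)
print("contaminated fit (intercept, slope):", contaminated)
```

With nine of the ten observations still lying exactly on the original line, the fitted slope moves from 2 to roughly 12.3, so the estimate no longer describes the bulk of the data.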

Chapter 2 recalls the definition of the OLS method and the assumptions it requires. The definition of outliers and a first non-technical examination of their influence on OLS Regression are presented in chapter 3. The subsequent chapter introduces three approaches to the theory of Robustness: qualitative, infinitesimal and quantitative robustness. These concepts play an important role not only in the assessment but also in the development of Robust Regression Techniques. They enable us to demonstrate technically the poor performance of OLS Regression in the presence of outliers.
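For orientation, infinitesimal and quantitative robustness are commonly formalized through the influence function and the breakdown point. The definitions below are the standard textbook forms (influence function following Hampel, finite-sample breakdown point following Donoho and Huber); the notation $T$, $F$, $\delta_x$ and $Z$ is an assumption made here and may differ from the notation used in chapter 4.

$$
\mathrm{IF}(x; T, F) = \lim_{\varepsilon \downarrow 0} \frac{T\big((1-\varepsilon)F + \varepsilon\,\delta_x\big) - T(F)}{\varepsilon}
$$

$$
\varepsilon_n^{*}(T, Z) = \min\left\{ \frac{m}{n} \;:\; \sup_{Z'_m} \big\lVert T(Z'_m) - T(Z) \big\rVert = \infty \right\}
$$

Here $T$ is the estimator viewed as a functional of the distribution $F$, $\delta_x$ is the point mass at $x$, and $Z'_m$ ranges over all samples obtained from the sample $Z$ of size $n$ by replacing $m$ observations with arbitrary values. An unbounded influence function or a breakdown point of $1/n$, as for the arithmetic mean, indicates that a single observation can carry the estimate arbitrarily far.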

Chapters 5 through 7 deal with the application of the three Robustness concepts to estimators of location, scale and finally regression. As measures of location and scale can be seen as first stepping stones into Robust Estimation and Testing, chapter 5 presents several of these estimates in order to pave the way for Robust Regression Techniques. Furthermore, these sections provide valuable information for robust univariate and multivariate data analysis in general.
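The contrast between classical and robust measures of location and scale can be sketched in a few lines. The sample is hypothetical, the helper names `trimmed_mean` and `mad` are assumptions for this illustration, and the trimming convention (dropping the `int(alpha * n)` smallest and largest values) is one of several in use and may differ from the definition in chapter 5. The factor 1.4826 is the usual constant that makes the MAD consistent for the standard deviation under the normal distribution.

```python
from statistics import mean, median, stdev

def trimmed_mean(data, alpha):
    """Mean after dropping the int(alpha * n) smallest and largest values."""
    ordered = sorted(data)
    k = int(alpha * len(ordered))
    return mean(ordered[k:len(ordered) - k])

def mad(data, scale=1.4826):
    """Scaled median absolute deviation from the median."""
    m = median(data)
    return scale * median(abs(v - m) for v in data)

# Hypothetical measurements around 10, then one gross error in the last value.
clean = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 10.0, 9.9, 10.1]
dirty = clean[:-1] + [1000.0]

for name, sample in (("clean", clean), ("dirty", dirty)):
    print(name,
          "mean:", round(mean(sample), 3),
          "median:", round(median(sample), 3),
          "10% trimmed mean:", round(trimmed_mean(sample, 0.1), 3),
          "stdev:", round(stdev(sample), 3),
          "MAD:", round(mad(sample), 3))
```

In this sketch the mean and the standard deviation are dominated by the single gross error, while the median, the trimmed mean and the MAD stay close to their values on the clean sample.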

In chapter 6, six robust alternatives to OLS Regression are presented in detail and assessed with regard to both their Robustness properties and their efficiency properties. In particular, “on the top” improvements on high breakdown objective Regression estimators are studied in detail, as these methods are assumed to be the best performing Regression Techniques available. The high breakdown objective Regression estimators considered are the methods of Least Median of Squares and Least Trimmed Squares. The Regression Techniques introduced in chapter 6 are applied to simulated data sets in chapter 7 using the SAS software. Various types of outlier contamination are simulated in this chapter. The comparison of the Regression Techniques focuses on their susceptibility to outliers, the efficiency of the coefficient estimates and the computational demand of the methods. Chapter 7 also examines the availability of the introduced Regression Techniques in standard statistical software.
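The idea of improving a high breakdown fit “on the top” can be sketched as follows. This is an illustration under stated assumptions and not the procedure used in the thesis or in SAS: the Least Median of Squares fit is only approximated by searching over the lines through all pairs of points, the scale estimate 1.4826 times the square root of the median squared residual omits any finite-sample correction, and the cutoff of 2.5 is a commonly cited choice. The function names and the data are hypothetical.

```python
from itertools import combinations
from statistics import median

def ols_fit(x, y, weights=None):
    """Weighted least squares line; returns (intercept, slope)."""
    w = weights if weights is not None else [1.0] * len(x)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lms_fit(x, y):
    """Approximate LMS: among the lines through each pair of points, keep
    the one with the smallest median squared residual.
    Returns (intercept, slope, criterion)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yk - intercept - slope * xk) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2], best[0]

def rls_fit(x, y, cutoff=2.5):
    """Reweighted least squares: OLS on the points whose standardized
    residual from the initial LMS fit is at most `cutoff` in absolute value."""
    a, b, crit = lms_fit(x, y)
    scale = 1.4826 * crit ** 0.5  # simplified robust scale (assumption)
    weights = [1.0 if abs(yi - a - b * xi) <= cutoff * scale else 0.0
               for xi, yi in zip(x, y)]
    return ols_fit(x, y, weights), weights

# Hypothetical data: y is roughly 1 + 2x, with the last three responses corrupted.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
y = [3.1, 4.8, 7.15, 8.9, 11.05, 12.85, 15.2, 16.95, 2.0, 1.5, 0.5]

ols_line = ols_fit(x, y)
rls_line, weights = rls_fit(x, y)
print("OLS (intercept, slope):", ols_line)
print("RLS (intercept, slope):", rls_line)
print("weights:", weights)
```

The final OLS step on the retained observations is what recovers efficiency: the high breakdown fit is used only to decide which observations to trust, and the reported coefficients come from ordinary least squares on that subset.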

Taking into consideration in particular the results obtained in chapters 6 and 7, a recommendation for particular Regression Techniques is given in the concluding chapter. We found the method of Reweighted Least Squares (RLS) to be the most recommendable Regression Technique among those considered here. It combines good Robustness properties with good efficiency properties and is, moreover, available in some of the most frequently used statistical software packages, such as SAS and S-PLUS.

**[....]**

- Robert Finger (Author), 2006, Robust Methods in Regression Analysis – Theory and Application, Munich, GRIN Verlag, https://www.grin.com/document/73282
