Contents
1 Introduction
2 The model
2.1 The basic form
2.2 The disturbance term
3 Regression techniques
3.1 The method of least squares
3.1.1 Ordinary Least Squares
3.1.2 Generalized Least Squares
3.2 Alternative regression methods
4 Classical measures of performance
4.1 Bias
4.2 Variances
4.2.1 The variance of OLS
4.2.2 The variance of GLS
4.2.3 A remark on the variances
4.3 Confidence intervals
4.3.1 A remark on the critical values
4.3.2 A confidence interval for OLS
4.3.3 A confidence interval for GLS
4.4 Rate of convergence
5 The bootstrap
5.1 How does the bootstrap work?
5.2 When does the bootstrap work?
5.3 The non-parametric bootstrap
5.4 The parametric bootstrap
5.5 Why does the bootstrap work?
5.6 How many bootstrap repetitions?
5.7 On the size of each repetition
6 Regressions with the bootstrap
6.1 Case resampling
6.2 Residual resampling
6.3 Wild bootstrap
6.4 When to use which method?
7 Inference with the bootstrap
7.1 Variances with the bootstrap
7.2 Confidence intervals with the bootstrap
7.2.1 The percentile interval
7.2.2 The bootstrap-t interval
7.2.3 Other bootstrap intervals
7.3 Convergence with the bootstrap
8 Classical or bootstrap inference?
8.1 Which variance estimate?
8.2 Which confidence interval?
8.3 When the bootstrap fails
9 A practical test for the bootstrap
9.1 The datasets
9.1.1 The homoscedastic data
9.1.2 The heteroscedastic data
9.2 The simulations
9.2.1 Results simulation one
9.2.2 Results simulation two
9.3 Résumé of simulations
10 Concluding remarks
A Tables simulation one
A 1 Table of coefficient $\hat{\beta}_1$
A 2 Table of coefficient $\hat{\beta}_2$
B Tables simulation two
B 1 Table of coefficient $\hat{\beta}_1$
B 2 Table of coefficient $\hat{\beta}_2$
C Friedman test values
C 1 Test values simulation one
C 2 Test values simulation two
D Some further formulae
E Bibliography
1 Introduction
Imagine the kids are in the living room, watching Family Feud on TV. The voice of the presenter cuts through the tense silence: "Name a working method of a statistician!" One second, two, thr... the titleholder hits the buzzer and shouts: "Regressions, of course!" So how big a score did he get for this answer? Well, that is unknown, since this was just an imaginary scene in a game show. However, the scene gives a good cue to the content of this paper, because regression is indeed one of the most prominent tasks of statisticians.
They have to find a relationship between one or more explanatory variables and the response variable with the help of one of the various regression techniques. Unfortunately there exists no perfect technique, since none of them outperforms the rest in all possible settings. This is why different methods are used depending on the framework, which is described by some model assumptions.
But how can the performance of the different regression techniques be measured at all? And after choosing a method, what determines which is the best parameter estimate? Here, too, no definitive answers are available.
In the past one often relied on complicated formulae and the asymptotic behavior of an estimator to measure its performance with the (only finitely available) sample of observations. This was done for many years, often satisfactorily. Quite often, however, it was not, because for some estimators it was difficult to draw conclusions about their asymptotic distributions, or an inference formula simply could not be obtained. As a result, inferior regression methods regularly had to be chosen although there were indications that another technique would probably be the more effective choice, just because the performance of the presumably inferior estimator was known, whereas it was not possible to judge the performance of the other estimation technique conclusively. That was not a satisfying state of affairs, and it lasted until Bradley Efron constructed a technique named the bootstrap in the late seventies. 1 Like his predecessors he saw that it is sometimes not possible to use a sample and an estimator of a parameter to draw a conclusion on the real parameter value and its relationship to the real population. However, in contrast to his forerunners Efron also asked himself whether this relationship can be estimated, which it can. To do so, one takes the original sample and the original estimate and treats them as a new population and its parameter value respectively. From this 'new' population one can draw new samples, compute new parameter estimates and even draw inference, because all information about this population is known. With the help of the law of large numbers, those results can be used as an approximation of the behavior of the original population, which was often unknown in the past!
1 Cp. Efron (1979a, pp.1-26)
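The resampling idea described above can be sketched in a few lines of Python. This is only a minimal illustration and not Efron's original formulation: the sample values, the choice of the mean as the statistic, the number of repetitions `B` and the function name `bootstrap_se` are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_se(sample, statistic, B=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Treat the observed sample as the population: draw n values with replacement
    # and recompute the statistic, B times.
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(B)]
    # The spread of the replicated statistics approximates the sampling variability.
    return statistics.stdev(replicates)


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
se_boot = bootstrap_se(sample, statistics.mean)
# Classical formula for the standard error of the mean, for comparison.
se_classical = statistics.stdev(sample) / len(sample) ** 0.5
print(se_boot, se_classical)
```

For the sample mean a classical formula exists, so the two numbers can be compared directly; the appeal of the resampling approach is that the same function can be called with a statistic for which no such formula is available.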
This technique, called the bootstrap, is usable with a lot of statistical problems and it is the main topic of this paper. Since the bootstrap provides material for a whole series of books, it is essential to pick one special aspect of it and investigate it in depth; otherwise the analysis would inevitably become too general. This aspect is regression. Hence, this paper will introduce the bootstrap and compare the performance of the new inference methods which it provides with some classical methods of judging a regression which were used in the years before the bootstrap. 2 3
The remainder of this paper is organized as follows. Chapter two describes the basic model in which all of the following investigations will be done. Chapter three describes the different regression techniques which estimate the parameters of the model. The fourth chapter shows the behavior of these regression techniques in large samples, i.e. it presents some classical methods of statistical inference. Chapter five gives an introduction to the bootstrap, followed by a description of the bootstrap in regression problems in chapter six. The seventh chapter shows how inference is done with the help of the bootstrap. The eighth chapter compares the performance of classical and bootstrap inference in regressions. Before the concluding remarks of chapter ten, chapter nine presents a practical application which tests some observations of the preceding chapters.
2 Cp. Efron (1979b, pp.465-468)
3 In order to avoid confusion with bootstrap inference tools, all inference techniques other than the bootstrap are denoted as classical in this paper.
2 The model
As mentioned above, one task in statistics is to compute regressions. To do so, one typically builds a model that depicts a simplified version of the relationship under investigation. But is the relationship linear or non-linear? How many parameters should the regression have? What is the form of the disturbance term in the model? Which regression technique should be used?
These are some of the questions the statistician has to answer before starting to calculate. They are the topic of this chapter, which describes the basic model along with a few assumptions that hold throughout this paper. Nearly all of them are standard assumptions used in many statistical papers, which is why they put the subsequent evaluation of the bootstrap in the later chapters into a broader context.
2.1 The basic form
The basic model used in this paper is the standard linear regression model in matrix form:
$$y = X\beta + \epsilon \qquad (1)$$
Here the regressand y depends linearly on the covariate matrix X (the regressor) and a disturbance term ε. One also assumes that X has full rank, so that no covariate is an exact linear combination of the others, in order to avoid complications while estimating the parameters of the model. What the regression tries to do is find the best possible estimate for the parameter vector β. 4 To do this without overly complex calculations, various assumptions concerning the form of the disturbance term are required. 5
2.2 The disturbance term
Regarding the error term there are a few assumptions. The first concerns the conditional expected value, which is assumed to be zero:
$$E[\epsilon \mid X] = 0 \qquad (2)$$
4 Cp. Greene (2008, p.11)
5 Sometimes also named error term
That way the covariate does not convey any information concerning the form of the error term. This will hold throughout the paper which is why all conclusions will be drawn conditional on X. It is possible to show that they also hold in the unconditional case, but that will not be done here. 6
The next assumption is about the variance-covariance matrix Ω of ε. The covariances (the off-diagonal elements of Ω) of any two disturbances will be zero:
$$\operatorname{Cov}[\epsilon_i, \epsilon_j \mid X] = 0 \quad \forall\, i \neq j \qquad (3)$$
That way all problems which arise because of autocorrelation can be discarded since they do not matter here.
Then an assumption is usually made on the variance of the disturbance, the elements on Ω's diagonal. That will not be done here. The reason is that this paper uses different forms of variances. It assumes in one part that the error is homoscedastic, so that the variances are all constant and the same:
$$\operatorname{Var}[\epsilon_i \mid X] = \sigma^2 \quad \forall\, i \in \{1, \dots, n\} \qquad (4)$$
In the other part the estimators will also have to deal with heteroscedasticity, so that:
$$\exists\, i : \operatorname{Var}[\epsilon_i \mid X] = \sigma_i^2 \neq \sigma^2 \qquad (5)$$
These two different characteristics of the disturbance make it necessary to introduce two different regression techniques in the next chapter. They also have some influence on the choice of bootstrap methods in chapter 6.
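To make the two error structures concrete, the following sketch simulates data from model (1) with one regressor and an intercept. It is only an illustration under assumptions that are not taken from this paper: the coefficient values, the uniform regressor, the normal errors, the specific heteroscedastic form (standard deviation proportional to x) and the function name `simulate` are all hypothetical. Drawing the errors independently corresponds to assumption (3).

```python
import random


def simulate(n, beta1, beta2, sigma_fn, seed=0):
    """Draw (x, y) pairs from y_i = beta1 + beta2 * x_i + eps_i."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    # eps_i has mean zero given x_i (assumption (2)) and is drawn independently
    # for every observation; its standard deviation is sigma_fn(x_i).
    ys = [beta1 + beta2 * x + rng.gauss(0.0, sigma_fn(x)) for x in xs]
    return xs, ys


# Equation (4): the same standard deviation for every observation.
x_hom, y_hom = simulate(500, 1.0, 2.0, lambda x: 1.0)
# Equation (5): the standard deviation grows with the regressor (one possible form).
x_het, y_het = simulate(500, 1.0, 2.0, lambda x: 0.3 * x)
```

Plotting the errors against x would show a band of constant width in the first case and a fan that widens with x in the second.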
3 Regression techniques
The regression technique this paper concentrates on is least squares regression. There are many other techniques, and some of them will be discussed at the end of this chapter. However, the bulk of the investigations will be done with the method of least squares. The reason is that a satisfactory analysis is difficult with some other methods, since their statistical properties have not been obtained by a classical approach until now, as for example the standard error of a least median of squares regression. 7 Thus, in order to make one intensive investigation instead of an extensive but shallow one, this focus had to be chosen.
6 Cp. Greene (2008, pp.49-50)
3.1 The method of least squares
The two regression methods which will be discussed below are applicable to different settings and perform quite differently depending on the setting. In spite of this they have one thing in common which distinguishes them from other regression methods: they minimize the sum of squared residuals. 8 These two methods are Ordinary Least Squares (OLS) and Generalized Least Squares (GLS), and their fundamental ideas are described in sections 3.1.1 and 3.1.2 respectively.
3.1.1 Ordinary Least Squares
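Underlying the OLS minimization of the sum of squared residuals is the basic regression model, equation (1). Along with that model, the other important fact is that one assumes the error term to be homoscedastic. The minimization problem and its solution are:
$$\min_{\beta}\ (y - X\beta)^T (y - X\beta) \;\Rightarrow\; \hat{\beta}_{OLS} = (X^T X)^{-1} X^T y \qquad (6)$$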
The popularity of this solution depends crucially on the assumption of constant error variances, Ω = σ²I. 9 In all other cases OLS does not yield a good estimator, since it then fails to incorporate the form of the variance of the error term truthfully. However, among all linear unbiased estimators in a model with homoscedastic errors, the OLS estimator is the one with the smallest variance, a result known as the Gauss-Markov theorem. 10
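For a model with an intercept and a single regressor, the least squares solution reduces to two well-known closed-form expressions, which the following sketch implements. The restriction to one regressor, the data values and the function name `ols` are assumptions chosen to keep the example short; the general matrix case would need a linear algebra library.

```python
def ols(xs, ys):
    """OLS for y_i = beta1 + beta2 * x_i + eps_i; returns (beta1_hat, beta2_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums of squares and cross products.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta2 = s_xy / s_xx              # slope
    beta1 = y_bar - beta2 * x_bar    # intercept
    return beta1, beta2


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b1, b2 = ols(xs, ys)
# Residuals = regressand - fitted model.
residuals = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]
print(b1, b2)
```

At the minimum the residuals sum to zero and are orthogonal to the regressor, which is a quick way to check a least squares fit.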
7 Cp. Efron/Tibshirani (1993, pp.119-121)
8 $\hat{\epsilon} = y - X\hat{\beta}$ ⇔ residuals = regressand − fitted model
9 This term is the same as equation (4) but in matrix notation
10 Cp. Tanizaki (2004, pp.62-63)
3.1.2 Generalized Least Squares
The other least squares technique discussed here is GLS. As the name indicates, it is a generalized version of OLS. However, contrary to OLS it does not depend on a homoscedastic error term but allows the variance of the disturbance to vary. 11 A few matrix multiplications are necessary to transform model (1) into an equation to which OLS is again applicable. Consequently the new minimization problem, which differs from the OLS equation (6), is the following:
$$\min_{\beta}\ (y - X\beta)^T \Omega^{-1} (y - X\beta) \;\Rightarrow\; \hat{\beta}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} y \qquad (7)$$
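The transformation behind this estimator can be written out as follows. This is a sketch under the additional assumption that Ω is symmetric and positive definite, so that a symmetric inverse square root $\Omega^{-1/2}$ exists; the starred symbols $y^{*}$, $X^{*}$ and $\epsilon^{*}$ are notation introduced here for illustration only.

```latex
\begin{align*}
\Omega^{-1/2} y &= \Omega^{-1/2} X \beta + \Omega^{-1/2} \epsilon
  \quad\Longleftrightarrow\quad y^{*} = X^{*} \beta + \epsilon^{*}, \\
\operatorname{Var}[\epsilon^{*} \mid X]
  &= \Omega^{-1/2}\, \Omega\, \Omega^{-1/2} = I, \\
\hat{\beta}_{GLS} &= (X^{*T} X^{*})^{-1} X^{*T} y^{*}
  = (X^{T} \Omega^{-1} X)^{-1} X^{T} \Omega^{-1} y .
\end{align*}
```

The transformed disturbance is homoscedastic and uncorrelated, so applying OLS to the starred model yields exactly the estimator in (7).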
Hence one sees that the GLS estimator differs from its OLS counterpart through the appearance of $\Omega^{-1}$, the inverse of the disturbance's covariance matrix. In practice Ω is usually unknown, which is why it is typically estimated from a prior OLS run. Thus in practice $\hat{\beta}_{GLS}$ is not a function of the known quantities X, y and Ω but of X, y and the estimated covariance matrix $\hat{\Omega}$. Of course this procedure entails more calculations than a simple OLS. However, this is worth the cost, since in this way the heteroscedasticity is dealt with, which OLS is unable to do. Using OLS in spite of heteroscedasticity would give inefficient estimates and misleading variance estimates. 12 Among the linear unbiased estimators in the generalized model, $\hat{\beta}_{GLS}$ is the one with the smallest variance, a result known as Aitken's theorem. 13
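Because Ω is diagonal under the assumptions of this paper, GLS amounts to weighted least squares with weights $1/\sigma_i^2$. The sketch below shows this for one regressor plus intercept, together with a two-step version in which the variances are estimated from a prior OLS run. The paper does not specify at this point how $\hat{\Omega}$ is obtained, so the log-linear variance model, the simulated data and the function names `wls` and `feasible_gls` are assumptions made purely for illustration.

```python
import math
import random


def wls(xs, ys, weights):
    """Minimise sum_i w_i * (y_i - b1 - b2 * x_i)^2; returns (b1, b2).

    With w_i = 1 / sigma_i^2 this is GLS for a diagonal Omega;
    with all w_i equal it reduces to OLS.
    """
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


def feasible_gls(xs, ys):
    """Two-step estimate: OLS first, then WLS with estimated variances."""
    ones = [1.0] * len(xs)
    b1, b2 = wls(xs, ys, ones)  # step 1: prior OLS run
    # Assumed variance model (hypothetical): log(sigma_i^2) = a1 + a2 * x_i,
    # fitted by regressing the log squared OLS residuals on x.
    # The small constant guards against log(0) for an exactly fitted point.
    log_e2 = [math.log((y - b1 - b2 * x) ** 2 + 1e-12) for x, y in zip(xs, ys)]
    a1, a2 = wls(xs, log_e2, ones)
    weights = [1.0 / math.exp(a1 + a2 * x) for x in xs]  # diagonal of the estimated inverse
    return wls(xs, ys, weights)  # step 2: GLS with the estimated Omega


# Hypothetical heteroscedastic data with true coefficients (1, 2).
rng = random.Random(1)
xs = [rng.uniform(1.0, 10.0) for _ in range(500)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.3 * x) for x in xs]
b_ols = wls(xs, ys, [1.0] * len(xs))
b_fgls = feasible_gls(xs, ys)
print(b_ols, b_fgls)
```

Only the relative size of the weights matters: multiplying all of them by a constant leaves the estimate unchanged, which is why a variance model that is correct up to scale is sufficient.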
3.2 Alternative regression methods
These were two regression techniques to find the parameters in a linear model, and they are the most popular ones. But there are other regression methods as well, and not every one of them relies on the minimization of the squared residuals.
One could also, like Breidt et al. (2001) did, minimize the sum of absolute deviations. 14 This would have the advantage of less influence of outlying observations, which is a major problem with least squares, since a single observation is enough to influence the whole regression. 15 But although Least Absolute Deviations (LAD) performs much better against outlying regressands, it still has, like least squares, the low breakdown point of 1/n, because outlying regressors manage to influence the regression very easily. 16 A second alternative would be to use Least Median of Squares (LMS), where those coefficients are chosen which yield the smallest median of the squared residuals. 17 Rousseeuw and Leroy (1987) show that this technique is very robust against outlying observations, with a breakdown point of roughly 50%. 18 The reason why it is not used more widely is not its low efficiency, although it is indeed not very efficient. 19 The reason is that both techniques (LMS and also LAD) lack information about their performance, since for a long time it was very difficult to base inference on them.
11 All other elements in the covariance matrix Ω remain zero (Ω having different values on the diagonal only), since the assumption of non-autocorrelation is not relaxed
12 Cp. Davidson/MacKinnon (2004, pp.257-262)
13 Cp. Greene (2008, p.155)
14 $\min_{\beta} \sum_{i=1}^{n} |y_i - x_i^T \beta|$
These two regression methods will not be investigated any further. Their only purpose is to show two alternatives which could be used whenever least squares regressions fail to give satisfying results, because it may be the case that one of them is able to achieve the desired performance.
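The robustness of LMS against a single outlying observation can be illustrated with a small sketch. It does not compute the exact LMS solution: as a common approximation it only searches the lines passing through pairs of observations, and it is restricted to one regressor plus intercept. The data and the function names `lms_line` and `ols_line` are hypothetical.

```python
import itertools
import statistics


def lms_line(xs, ys):
    """Approximate LMS fit: search the lines through pairs of observations."""
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line cannot be written as y = b1 + b2 * x
        b2 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b1 = ys[i] - b2 * xs[i]
        # Criterion: median of the squared residuals for this candidate line.
        crit = statistics.median([(y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys)])
        if best is None or crit < best[0]:
            best = (crit, b1, b2)
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least squares for one regressor plus intercept, for comparison."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b2 * x_bar, b2


# Nine points exactly on y = 1 + 2x and one gross outlier at x = 10.
xs = [float(i) for i in range(1, 11)]
ys = [1.0 + 2.0 * x for x in xs[:-1]] + [0.0]
b_lms = lms_line(xs, ys)
b_ols = ols_line(xs, ys)
print(b_lms, b_ols)
```

In this constructed example the pairwise search recovers the line through the nine clean points, while the least squares slope is pulled well below 2 by the single outlier.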
4 Classical measures of performance
After introducing the OLS and GLS estimators the next step is to investigate their performance. There are many possible ways to evaluate this performance, so it is necessary to consider more than one measure. Otherwise the conclusions would depend on the performance under one criterion, while the estimator might perform quite differently under another. Thus, without a paramount criterion of performance it is reasonable and necessary to use more than one measure.
4.1 Bias
One such criterion could be the bias. However, under the present circumstances that would not be a good choice. Since the disturbance is assumed to have an expected value of zero it yields regression coefficients which are unbiased under both least squares
15 Cp. Breidt et al. (2001, pp.919-946)
16 Cp. Rousseeuw/Leroy (1987, pp.10-12)
17 $\min_{\beta} \operatorname{median}_i\, (y_i - x_i^T \beta)^2$
18 Cp. Rousseeuw/Leroy (1987, p.183)
19 Cp. Rousseeuw (1984, pp.871-880)
Quote paper:
Diplom Volkswirt Jonas Böhmer, 2009, Least Squares Regressions with the Bootstrap, Munich, GRIN Publishing GmbH