Least Squares Regressions with the Bootstrap

Subtitle: A Survey of their Performance

Diploma Thesis, 2009, 52 Pages
Author: Diplom Volkswirt Jonas Böhmer
Subject: Statistics

Details

Event: Diploma thesis supervised by Prof. Dr. Alois Kneip
Institution/College: University of Bonn (Statistics Department of the Faculty of Law and Economics)
Category: Diploma Thesis
Year: 2009
Pages: 52
Grade: 1,6
Language: English
Archive No.: V135688
ISBN (E-book): 978-3-640-42241-8
ISBN (Book): 978-3-640-42183-1

Abstract

The statistical technique known as the bootstrap is applicable to many inferential problems, and it is the main topic of this paper. Since the bootstrap provides enough material for a whole series of books, it is essential to pick one particular aspect and investigate it in depth; otherwise the analysis would inevitably become too general. That aspect is regression. Hence, this paper introduces the bootstrap and compares the performance of the new inference methods it provides with some classical methods of judging a regression that were in use before the bootstrap. The remainder of the paper is organized as follows. Chapter two describes the basic model on which all subsequent investigations are based. Chapter three describes the regression techniques used to estimate the model. Chapter four shows the behavior of these techniques in large samples, i.e. it presents some classical methods of statistical inference. Chapter five introduces the bootstrap, and chapter six describes its use in regression problems. Chapter seven shows how inference is carried out with the help of the bootstrap. Chapter eight compares the performance of classical and bootstrap inference in regressions. Chapter nine presents a practical application that tries to verify some observations of the preceding chapters, before chapter ten concludes.


Excerpt (computer-generated)

Rheinische Friedrich-Wilhelms-Universität Bonn

DIPLOMA THESIS

Least Squares Regressions with the Bootstrap

- A Survey of their Performance

Submitted by:

cand. rer. pol. Jonas Böhmer

Submission date: 25 March 2009


Contents

1 Introduction
2 The model
  2.1 The basic form
  2.2 The disturbance term
3 Regression techniques
  3.1 The method of least squares
    3.1.1 Ordinary Least Squares
    3.1.2 Generalized Least Squares
  3.2 Alternative regression methods
4 Classical measures of performance
  4.1 Bias
  4.2 Variances
    4.2.1 The variance of OLS
    4.2.2 The variance of GLS
    4.2.3 A remark on the variances
  4.3 Confidence intervals
    4.3.1 A remark on the critical values
    4.3.2 A confidence interval for OLS
    4.3.3 A confidence interval for GLS
  4.4 Rate of convergence
5 The bootstrap
  5.1 How does the bootstrap work?
  5.2 When does the bootstrap work?
  5.3 The non-parametric bootstrap
  5.4 The parametric bootstrap
  5.5 Why does the bootstrap work?
  5.6 How many bootstrap repetitions?
  5.7 On the size of each repetition
6 Regressions with the bootstrap
  6.1 Case resampling
  6.2 Residual resampling
  6.3 Wild bootstrap
  6.4 When to use which method?
7 Inference with the bootstrap
  7.1 Variances with the bootstrap
  7.2 Confidence intervals with the bootstrap
    7.2.1 The percentile interval
    7.2.2 The bootstrap-t interval
    7.2.3 Other bootstrap intervals
  7.3 Convergence with the bootstrap
8 Classical or bootstrap inference?
  8.1 Which variance estimate?
  8.2 Which confidence interval?
  8.3 When the bootstrap fails
9 A practical test for the bootstrap
  9.1 The datasets
    9.1.1 The homoscedastic data
    9.1.2 The heteroscedastic data
  9.2 The simulations
    9.2.1 Results simulation one
    9.2.2 Results simulation two
  9.3 Résumé of simulations
10 Concluding remarks
A Tables simulation one
  A.1 Table of coefficient $\hat{\beta}_1$
  A.2 Table of coefficient $\hat{\beta}_2$
B Tables simulation two
  B.1 Table of coefficient $\hat{\beta}_1$
  B.2 Table of coefficient $\hat{\beta}_2$
C Friedman test values
  C.1 Test values simulation one
  C.2 Test values simulation two
D Some further formulae
E Bibliography


1 Introduction

Imagine the kids are in the living room, watching Family Feud on TV. The voice of the presenter cuts through the tense silence: "Name a method of working of a statistician!" One second, two, thr... the titleholder hits the buzzer and shouts: "Regressions, of course!" So how many points did he get for this answer? Well, that is unknown, since this was just an imaginary scene in a game show. However, the scene gives a good cue to the content of this paper, because regression is indeed one of the most prominent problems statisticians face.

Statisticians have to find a relationship between some explanatory variables and the response variable with the help of one of the various regression techniques. Unfortunately, no perfect technique exists, since none of them outperforms the rest in all possible settings. Which method is used therefore depends on the framework, which is described by a set of model assumptions.

But how can the performance of the different regression techniques be measured at all? And after choosing a method, what determines which parameter estimate is best? Here, too, no definitive answers are available.

In the past, one often relied on complicated formulae and the asymptotic behavior of an estimator to measure its performance with the (only finitely available) sample of observations. This was done for many years, often satisfactorily. Quite often, however, it was not, because for some estimators it was difficult to draw conclusions about their asymptotic distributions, or an inference formula simply could not be obtained. As a result, inferior regression methods regularly had to be chosen, although there were indications that another technique would probably be the more effective choice, simply because the performance of the presumably inferior estimator was known, whereas the performance of the other estimation technique could not be judged conclusively.

That was not a satisfying state of affairs, and it lasted until Bradley Efron constructed a technique named the bootstrap in the late seventies, which offered a solution to this dilemma.1 Like his predecessors, he saw that it is sometimes not possible to use a sample and an estimator of a parameter to draw a conclusion about the real parameter value and its relationship to the real population. In contrast to his forerunners, however, Efron also asked himself whether this relationship can be estimated, which it can. To do so, one takes the original sample and the original estimate and treats them as a new population and its parameter value, respectively. From this "new" population one can draw new samples, compute new parameter estimates and even draw inference, because all information about this population is known. With the help of the law of large numbers, those results can be used as an approximation of the behavior of the original population, which was often unknown in the past!

1 Cp. Efron (1979a, pp.1-26)
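The resampling idea described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function name `bootstrap_estimates`, the exponential toy data, the choice of the sample mean as estimator and the number of 2000 repetitions are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_estimates(sample, estimator, n_boot=2000, seed=0):
    """Treat `sample` as the population and re-estimate on resamples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = sample[rng.integers(0, n, size=n)]
        estimates[b] = estimator(resample)
    return estimates

# Assumed toy data: 50 draws from a skewed distribution.
rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=50)

original_estimate = sample.mean()             # estimate from the original sample
boot_means = bootstrap_estimates(sample, np.mean)
boot_se = boot_means.std(ddof=1)              # bootstrap standard error of the mean

print(original_estimate, boot_se)
```

For the sample mean the bootstrap standard error can be checked against the textbook value s/sqrt(n); the appeal of the approach is that the same loop works for estimators where no such formula is available.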

This technique, called the bootstrap, is applicable to many statistical problems, and it is the main topic of this paper. Since the bootstrap provides enough material for a whole series of books, it is essential to pick one particular aspect and investigate it in depth; otherwise the analysis would inevitably become too general. That aspect is regression. Hence, this paper introduces the bootstrap and compares the performance of the new inference methods it provides with some classical methods of judging a regression that were in use before the bootstrap.2 3

The remainder of this paper is organized as follows. Chapter two describes the basic model on which all subsequent investigations are based. Chapter three describes the regression techniques used to estimate the model. Chapter four shows the behavior of these techniques in large samples, i.e. it presents some classical methods of statistical inference. Chapter five introduces the bootstrap, and chapter six describes its use in regression problems. Chapter seven shows how inference is carried out with the help of the bootstrap. Chapter eight compares the performance of classical and bootstrap inference in regressions. Chapter nine presents a practical application that tries to verify some observations of the preceding chapters, before chapter ten concludes.

2 Cp. Efron (1979b, pp.465-468)

3 In order to avoid confusion with bootstrap inference tools, all inference techniques other than the bootstrap are denoted as classical in this paper.


2 The model

As mentioned above, one task in statistics is to compute regressions. To this end, one typically builds a model that depicts a simplified version of the relationship under investigation. But is the relationship linear or non-linear? How many parameters should the regression have? What is the form of the disturbance term in the model? Which regression technique should be used?

These are some of the questions the statistician has to answer before starting to calculate. They are the topic of this chapter, which describes the basic model along with a few assumptions that are valid throughout this paper. Nearly all of them are standard assumptions used in many statistical papers, which is why they place the evaluation of the bootstrap in the later chapters in a broader context.

2.1 The basic form

The basic model used in this paper is the standard linear regression model in matrix form:

$$y = X\beta + \varepsilon \tag{1}$$

Here the regressand $y$ depends linearly on the covariate matrix $X$ (the regressor) and a disturbance term $\varepsilon$. One also assumes that $X$ has full rank, so that no covariate is an exact linear combination of the others (in particular, no two covariates are perfectly correlated), in order to avoid complications while estimating the parameters of the model. What the regression tries to do is find the best possible estimate for the parameter vector $\beta$.4 To do this without overly complex calculations, various assumptions are required concerning the form of the disturbance term.5

2.2 The disturbance term

Regarding the error term there are a few assumptions. The first concerns the conditional expected value, which is assumed to be zero:

$$E[\varepsilon \mid X] = 0 \tag{2}$$

4 Cp. Greene (2008, p.11)

5 Sometimes also named error term


That way the covariates do not convey any information about the expected value of the error term. This holds throughout the paper, which is why all conclusions are drawn conditional on $X$. It is possible to show that they also hold in the unconditional case, but that will not be done here.6

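The next assumption concerns the variance-covariance matrix of $\varepsilon$, written here as $\operatorname{Var}[\varepsilon \mid X]$. The covariance of any two distinct disturbances (the off-diagonal elements of this matrix) is zero:

$$\operatorname{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0 \quad \forall\, i \neq j \tag{3}$$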

That way all problems arising from autocorrelation can be disregarded, since they do not matter here.

Usually an assumption is then also made about the variances of the disturbances, the elements on the diagonal of the variance-covariance matrix. That will not be done here, because this paper uses different forms of variances. In one part it assumes that the error is homoscedastic, so that all variances are constant and equal:

$$\operatorname{Var}[\varepsilon_i \mid X] = \sigma^2 \quad \forall\, i \in \{1, \dots, n\} \tag{4}$$

In the other part the estimators will also have to deal with heteroscedasticity, so that:

$$\exists\, i: \quad \operatorname{Var}[\varepsilon_i \mid X] = \sigma_i^2 \neq \sigma^2 \tag{5}$$

These two different characteristics of the disturbance make it necessary to introduce two different regression techniques in the next chapter. They also influence the choice of bootstrap methods in chapter 6.
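As a rough illustration of the two disturbance settings in equations (4) and (5), the following Python sketch simulates data from the linear model (1). The sample size, the parameter vector `beta`, the uniform covariate and the specific heteroscedastic form (standard deviation proportional to the covariate) are assumptions chosen for this example and are not the datasets used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Covariate matrix X: intercept column plus one regressor (full column rank).
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])  # assumed "true" parameter vector

# Equation (4): homoscedastic errors, Var[eps_i | X] = sigma^2 for all i.
sigma = 1.0
eps_homo = rng.normal(0.0, sigma, size=n)

# Equation (5): heteroscedastic errors; the standard deviation grows with x_i
# (an assumed functional form, chosen only for illustration).
sigma_i = 0.5 * x
eps_hetero = rng.normal(0.0, sigma_i)

# Errors are drawn independently with mean zero, in line with equations (2) and (3).
y_homo = X @ beta + eps_homo
y_hetero = X @ beta + eps_hetero

print(np.linalg.matrix_rank(X))
print(eps_hetero[x <= 3.0].var(), eps_hetero[x > 3.0].var())
```

In the heteroscedastic case the spread of the errors visibly differs between small and large values of the covariate, which is the feature an estimator has to cope with in that setting.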

3 Regression techniques

The regression technique this paper concentrates on is least squares regression. There are many other techniques, and some of them will be discussed at the end of this chapter. However, the bulk of the investigations will be done with the method of least squares. The reason is that a satisfactory analysis is difficult with some other methods, since their statistical properties have not yet been obtained by a classical approach, as for example the standard error of a least median of squares regression.7 Thus, in order to carry out one intensive investigation instead of an extensive but shallow one, this focus had to be chosen.

6 Cp. Greene (2008, pp.49-50)

3.1 The method of least squares

The two regression methods discussed below are applicable to different settings and, depending on the setting, perform quite differently. Nevertheless, they have one thing in common that distinguishes them from other regression methods: they minimize the sum of squared residuals.8 These two methods are Ordinary Least Squares (OLS) and Generalized Least Squares (GLS), and their fundamental ideas are described in sections 3.1.1 and 3.1.2 respectively.

3.1.1 Ordinary Least Squares

Underlying the OLS minimization of the sum of squared residuals is the basic regression model, equation (1). In addition to that model, one assumes the error term to be homoscedastic:

$$\min_{\beta}\,(y - X\beta)^T (y - X\beta) \quad\Rightarrow\quad \hat{\beta}_{OLS} = (X^T X)^{-1} X^T y \tag{6}$$

The popularity of this solution depends crucially on the assumption of constant error variances, $\operatorname{Var}[\varepsilon \mid X] = \sigma^2 I$.9 In all other cases OLS is no longer the best choice, since it does not take the actual form of the error variance into account. However, among all linear unbiased estimators available in a linear model with homoscedasticity, the OLS estimator is the one with the smallest variance, a result known as the Gauss-Markov theorem.10

7 Cp. Efron/Tibshirani (1993, pp.119-121)

8 $\hat{\varepsilon} = y - X\hat{\beta}$, i.e. residuals = regressand - fitted model

9 This term is the same as equation (4) but in matrix notation

10 Cp. Tanizaki (2004, pp.62-63)
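The OLS solution in equation (6) can be sketched numerically as follows. This is a minimal illustration under assumed toy data (the sample size, `beta_true` and the normal errors are not from the thesis); the helper name `ols` is hypothetical. The normal equations are solved directly instead of forming the inverse explicitly, and the result is cross-checked against NumPy's least squares routine.

```python
import numpy as np

def ols(X, y):
    """OLS estimate: solve the normal equations (X^T X) beta = X^T y."""
    # np.linalg.solve avoids computing the explicit inverse (X^T X)^{-1}.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Assumed toy data with homoscedastic errors.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

beta_hat = ols(X, y)
residuals = y - X @ beta_hat  # residuals = regressand - fitted model

# Cross-check against NumPy's built-in least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat, beta_lstsq)
```

At the minimum the residuals are orthogonal to every column of X, so `X.T @ residuals` is zero up to rounding error; this is a quick sanity check for any implementation of equation (6).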



This text can be quoted and accessed from this url:

http://www.grin.com/e-book/135688/least-squares-regressions-with-the-bootstrap