Subtitle: A Survey of their Performance
Diploma Thesis, 2009, 52 Pages
Author: Diplom Volkswirt Jonas Böhmer
Subject: Statistics
Details
Institution/College: University of Bonn (Statistische Abteilung der Rechts- und Staatswissenschaftlichen Fakultät)
Year: 2009
Pages: 52
Grade: 1,6
Language: English
ISBN (E-book): 978-3-640-42241-8
ISBN (Book): 978-3-640-42183-1
Abstract
The statistical technique called the bootstrap is applicable to many inferential problems, and it is the main topic of this paper. Since the bootstrap provides material for a whole series of books, it is essential to pick one aspect of it and investigate that aspect in depth; otherwise the analysis would inevitably become too general. The aspect chosen here is regression. Hence, this paper introduces the bootstrap and compares the performance of the new inference methods it provides with some classical methods of judging a regression that were used before the bootstrap existed. The remainder of the paper is organized as follows. Chapter two describes the basic model in which all subsequent investigations take place. Chapter three describes the regression techniques that are used to estimate the model. Chapter four shows the behavior of these techniques in large samples, i.e. it presents some classical methods of statistical inference. Chapter five introduces the bootstrap, and chapter six describes the bootstrap in regression problems. Chapter seven shows how inference is done with the help of the bootstrap. Chapter eight compares the performance of classical and bootstrap inference in regressions. Chapter nine contains a practical application that tries to verify some observations of the preceding chapters, and chapter ten offers concluding remarks.
Excerpt (computer-generated)
Rheinische Friedrich-Wilhelms-Universität Bonn
DIPLOMA THESIS
Least Squares Regressions with the Bootstrap
- A Survey of their Performance
Submitted by:
cand. rer. pol. Jonas Böhmer
Submission date: 25 March 2009
Contents
1 Introduction
2 The model
2.1 The basic form
2.2 The disturbance term
3 Regression techniques
3.1 The method of least squares
3.1.1 Ordinary Least Squares
3.1.2 Generalized Least Squares
3.2 Alternative regression methods
4 Classical measures of performance
4.1 Bias
4.2 Variances
4.2.1 The variance of OLS
4.2.2 The variance of GLS
4.2.3 A remark on the variances
4.3 Confidence intervals
4.3.1 A remark on the critical values
4.3.2 A confidence interval for OLS
4.3.3 A confidence interval for GLS
4.4 Rate of convergence
5 The bootstrap
5.1 How does the bootstrap work?
5.2 When does the bootstrap work?
5.3 The non-parametric bootstrap
5.4 The parametric bootstrap
5.5 Why does the bootstrap work?
5.6 How many bootstrap repetitions?
5.7 On the size of each repetition
6 Regressions with the bootstrap
6.1 Case resampling
6.2 Residual resampling
6.3 Wild bootstrap
6.4 When to use which method?
7 Inference with the bootstrap
7.1 Variances with the bootstrap
7.2 Confidence intervals with the bootstrap
7.2.1 The percentile interval
7.2.2 The bootstrap-t interval
7.2.3 Other bootstrap intervals
7.3 Convergence with the bootstrap
8 Classical or bootstrap inference?
8.1 Which variance estimate?
8.2 Which confidence interval?
8.3 When the bootstrap fails
9 A practical test for the bootstrap
9.1 The datasets
9.1.1 The homoscedastic data
9.1.2 The heteroscedastic data
9.2 The simulations
9.2.1 Results simulation one
9.2.2 Results simulation two
9.3 Résumé of simulations
10 Concluding remarks
A Tables simulation one
A.1 Table of coefficient $\hat{\beta}_1$
A.2 Table of coefficient $\hat{\beta}_2$
B Tables simulation two
B.1 Table of coefficient $\hat{\beta}_1$
B.2 Table of coefficient $\hat{\beta}_2$
C Friedman test values
C.1 Test values simulation one
C.2 Test values simulation two
D Some further formulae
E Bibliography
1 Introduction
Imagine the kids are in the living room. They are watching TV, Family Feud. The voice
of the presenter cuts through the tense silence: "Name a working method of a statistician!"
One second, two, thr... the titleholder hits the buzzer and shouts: "Regressions,
of course!" So how big a score did he get for this answer? Well, that is unknown, since
this was just an imaginary scene in a game show. However, the scene gives a good cue
to the content of this paper, because regression is indeed one of the most prominent
problems statisticians deal with.
They have to find a relationship between some explanatory variables and the response
variable with the help of one of the various regression techniques. Unfortunately there
is no perfect technique, since none of them outperforms the rest in all possible
settings. Which method is used therefore depends on the framework, which is described
by a set of model assumptions.
But how can the performance of the different regression techniques be measured at all? And
after choosing a method, what determines which parameter estimate is the best? Here,
too, no definitive answers are available.
In the past one often relied on complicated formulae and the asymptotic behavior of an
estimator to measure its performance on the (only finitely available) sample of
observations. This was done for many years, often satisfactorily. Quite often, however,
it was not satisfactory, because for some estimators it was difficult to draw conclusions
about their asymptotic distributions, or an inference formula simply could not be obtained.
As a result, inferior regression methods regularly had to be chosen even though there
were indications that another technique would probably be more effective, simply
because the performance of the presumably inferior estimator was known, whereas the
performance of the other estimation technique could not be judged conclusively.
That was not a satisfying state of affairs, and it lasted until the late seventies, when
Bradley Efron constructed a technique named the bootstrap and thereby offered a solution
to this dilemma.1 Like his predecessors, he saw that it is sometimes not possible to use
a sample and an estimator of a parameter to draw a conclusion about the real parameter
value and its relationship to the real population. However, in contrast to his forerunners,
Efron also asked himself whether this relationship can be estimated, which it can. To do
so, one takes the original sample and the original estimate and treats them as a new
population and its parameter value, respectively. From this "new" population one can draw
new samples, compute new parameter estimates, and even draw inference, because all
information about the underlying population is known. With the help of the law of large
numbers, those results can be used as an approximation of the behavior of the original
population, which was often unknown in the past!
1 Cp. Efron (1979a, pp.1-26)
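The resampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the thesis: the data are simulated, the estimator is the sample mean, and the names `bootstrap_estimates`, `n_boot` and `seed` are hypothetical choices made for this example.

```python
import numpy as np

def bootstrap_estimates(sample, estimator, n_boot=2000, seed=0):
    """Resample `sample` with replacement and re-apply `estimator` each time."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # The original sample plays the role of the population:
        # draw n observations from it with replacement.
        idx = rng.integers(0, n, size=n)
        estimates[b] = estimator(sample[idx])
    return estimates

# Hypothetical data: 50 draws from an exponential distribution with mean 2.
rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=50)

original_estimate = sample.mean()
boot = bootstrap_estimates(sample, np.mean)

# The spread of the bootstrap estimates approximates the sampling
# variability of the estimator; compare it with the textbook formula.
boot_se = boot.std(ddof=1)
classical_se = sample.std(ddof=1) / np.sqrt(len(sample))
print(original_estimate, boot_se, classical_se)
```

For the sample mean a classical standard error formula exists, so the two numbers can be compared directly; the bootstrap becomes more interesting for estimators where no such formula is available.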
This technique, called the bootstrap, is applicable to many statistical problems, and it is
the main topic of this paper. Since the bootstrap provides material for a whole series of
books, it is essential to pick one aspect of it and investigate that aspect in depth;
otherwise the analysis would inevitably become too general. The aspect chosen here is
regression. Hence, this paper introduces the bootstrap and compares the performance
of the new inference methods it provides with some classical methods of judging a
regression that were used before the bootstrap existed.2 3
The remainder of this paper is organized as follows. Chapter two describes the basic
model in which all subsequent investigations take place. Chapter three describes the
regression techniques that are used to estimate the model. Chapter four shows the
behavior of these techniques in large samples, i.e. it presents some classical methods
of statistical inference. Chapter five introduces the bootstrap, and chapter six describes
the bootstrap in regression problems. Chapter seven shows how inference is done with
the help of the bootstrap. Chapter eight compares the performance of classical and
bootstrap inference in regressions. Chapter nine contains a practical application that
tries to verify some observations of the preceding chapters, and chapter ten offers
concluding remarks.
2 Cp. Efron (1979b, pp.465-468)
3 In order to avoid confusion with bootstrap inference tools, all inference techniques other than the
bootstrap are called classical in this paper.
2 The model
As mentioned above, one task in statistics is to compute regressions. To do so, one
typically builds a model that depicts a simplified version of the relationship under
investigation. But is the relationship linear or non-linear? How many parameters should
the regression have? What is the form of the disturbance term in the model? Which
regression technique should be used?
These are some of the questions the statistician has to answer before starting to calculate.
They are the topic of this chapter, which describes the basic model along with a few
assumptions that hold throughout this paper. Nearly all of them are standard
assumptions used in many statistical papers, which is why they put the evaluation of the
bootstrap in the later chapters into a broader context.
2.1 The basic form
The basic model used in this paper is the standard linear regression model in matrix form:
$$y = X\beta + \varepsilon \qquad (1)$$
Here the regressand $y$ depends linearly on the covariate matrix $X$ (the regressors) and a
disturbance term $\varepsilon$. One also assumes that $X$ has full column rank, so that no
covariate is a perfect linear combination of the others; this avoids complications while
estimating the parameters of the model. The regression tries to find the best possible
estimate of the parameter vector $\beta$.4 To do this without overly complex calculations,
various assumptions are required concerning the form of the disturbance term.5
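To make equation (1) and the full-rank assumption concrete, the following sketch simulates data from a small linear model and checks the rank of the covariate matrix. Everything specific here is an assumption for illustration: the sample size, the seed, the "true" parameter values and the normally distributed disturbance are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Covariate matrix X: a column of ones (intercept) plus one covariate.
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

# Full column rank: the rank equals the number of columns.
rank_X = np.linalg.matrix_rank(X)

beta = np.array([1.0, 0.5])          # hypothetical "true" parameter vector
eps = rng.normal(0.0, 1.0, size=n)   # disturbance term (assumed normal here)
y = X @ beta + eps                   # equation (1)

# Adding a column that is an exact multiple of x violates the assumption:
# the matrix now has three columns but its rank stays at two.
X_bad = np.column_stack([X, 2.0 * x])
rank_X_bad = np.linalg.matrix_rank(X_bad)
print(rank_X, rank_X_bad)
```

With a rank-deficient matrix such as `X_bad`, the parameters cannot be identified separately, which is the complication the full-rank assumption rules out.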
2.2 The disturbance term
Regarding the error term there are a few assumptions. The first concerns the conditional
expected value, which is assumed to be zero:
$$E[\varepsilon \mid X] = 0 \qquad (2)$$
4 Cp. Greene (2008, p.11)
5 Sometimes also named error term
That way the covariates do not convey any information about the expected value of the
error term. This assumption holds throughout the paper, which is why all conclusions are
drawn conditional on $X$. It is possible to show that they also hold in the unconditional
case, but that will not be done here.6
The next assumption concerns the variance-covariance matrix $\Sigma$ of $\varepsilon$. The
covariances of any two distinct disturbances (the off-diagonal elements of $\Sigma$) are zero:
$$\operatorname{Cov}[\varepsilon_i, \varepsilon_j \mid X] = 0 \quad \forall\, i \neq j \qquad (3)$$
That way all problems that arise from autocorrelation can be disregarded, since they
do not occur here.
Usually an assumption is then made about the variances of the disturbances, the elements
on the diagonal of $\Sigma$. That will not be done here, because this paper uses different
forms of variances. In one part it assumes that the error is homoscedastic, so that all
variances are equal to the same constant:
$$\operatorname{Var}[\varepsilon_i \mid X] = \sigma^2 \quad \forall\, i \in \{1, \dots, n\} \qquad (4)$$
In the other part the estimators also have to deal with heteroscedasticity, so that:
$$\exists\, i: \quad \operatorname{Var}[\varepsilon_i \mid X] = \sigma_i^2 \neq \sigma^2 \qquad (5)$$
These two different characteristics of the disturbance make it necessary to introduce two
different regression techniques in the next chapter. They also influence the choice of
bootstrap method in chapter 6.
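The difference between the two variance assumptions, equations (4) and (5), can be illustrated with simulated disturbances. The sketch below is an assumption-laden example rather than the data-generating process of the thesis: the normal distribution, the value `sigma = 2.0` and the heteroscedastic form (standard deviation proportional to the covariate) are chosen only for illustration, and `variance_by_half` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.uniform(1.0, 10.0, size=n)

sigma = 2.0
# Homoscedastic case: every disturbance has the same variance sigma^2.
eps_homo = rng.normal(0.0, sigma, size=n)

# Heteroscedastic case: the standard deviation depends on the covariate
# (assumed form 0.5 * x), so the variances differ across observations.
eps_hetero = rng.normal(0.0, 0.5 * x)

def variance_by_half(eps, x):
    """Empirical variance of eps for below-median and above-median x."""
    low = x < np.median(x)
    return eps[low].var(ddof=1), eps[~low].var(ddof=1)

homo_low, homo_high = variance_by_half(eps_homo, x)
hetero_low, hetero_high = variance_by_half(eps_hetero, x)
print(homo_low, homo_high)      # roughly equal
print(hetero_low, hetero_high)  # clearly different
```

Splitting the sample by the covariate is a crude diagnostic, but it shows why a single variance parameter cannot describe the second data set.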
3 Regression techniques
The regression technique this paper concentrates on is least squares regression. There
are many other techniques, and some of them are discussed at the end of this chapter.
However, the bulk of the investigations is done with the method of least squares. The
reason is that a satisfactory analysis is difficult with some other methods, since some of
their statistical properties, for example the standard error of a least median of squares
regression, have not yet been obtained by a classical approach.7 Thus, this focus was
chosen in order to make one intensive investigation instead of an extensive but shallow one.
6 Cp. Greene (2008, pp.49-50)
3.1 The method of least squares
The two regression methods discussed below are applicable to different settings and,
depending on the setting, perform quite differently. In spite of this they have one
thing in common that distinguishes them from other regression methods: they minimize
the sum of squared residuals.8 These two methods are Ordinary Least Squares (OLS) and
Generalized Least Squares (GLS); their fundamental ideas are described in sections
3.1.1 and 3.1.2, respectively.
3.1.1 Ordinary Least Squares
The OLS minimization of the sum of squared residuals is based on the basic regression
model, equation (1). In addition to that model, one assumes the error term to be
homoscedastic. The minimization problem and its solution are:
$$\min_{\beta}\,(y - X\beta)^T (y - X\beta)$$
$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y \qquad (6)$$
The popularity of this solution depends crucially on the assumption of constant error
variances, $\Sigma = \sigma^2 I$.9 In all other cases OLS is no longer an efficient estimator,
since it then fails to incorporate the form of the variance of the error term. However,
among all linear unbiased estimators in a linear model with homoscedasticity, the OLS
estimator is the one with the smallest variance, a result known as the
Gauss-Markov theorem.10
7 Cp. Efron/Tibshirani (1993, pp.119-121)
8 $\hat{\varepsilon} = y - X\hat{\beta}$: residuals = regressand - fitted model
9 This term is the same as equation (4) but in matrix notation
10 Cp. Tanizaki (2004, pp.62-63)
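As a minimal sketch of the OLS solution in equation (6), the following Python code computes the estimate on simulated data. The data, the seed and the "true" coefficients are hypothetical and not taken from the thesis; the function name `ols` is likewise an illustrative choice. Instead of forming the inverse explicitly, the sketch solves the normal equations, which gives the same estimate and is usually numerically preferable.

```python
import numpy as np

def ols(X, y):
    """OLS estimate obtained by solving the normal equations (X^T X) b = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical homoscedastic data set.
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])
beta = np.array([1.0, 0.5])
y = X @ beta + rng.normal(0.0, 1.0, size=n)

beta_hat = ols(X, y)

# Residuals = regressand - fitted model.
residuals = y - X @ beta_hat

# Cross-check against NumPy's least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)
```

A useful sanity check is that the residuals are orthogonal to every column of `X`, which is exactly the first-order condition of the minimization problem.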