Modelling the yield curve based on a partial conjecture of future yields
Ramtien Kalantar Nayestanaki
Abstract:
The reader is introduced to term structure modelling using
the Dynamic Nelson-Siegel model. Assuming, in turn, an independent and a correlated
specification for its factors, we estimate the factor dynamics by maximum
likelihood. Additionally, the factors are estimated by Kalman filtering.
We derive a closed-form distribution for future factors, forecast them
and present the in-sample and out-of-sample forecasts. Finally,
we discuss the main finding of the thesis, namely a stochastic model for the
predicted yield curve when a future yield with a certain maturity is given.
Keywords: yield curve modelling, Nelson-Siegel model, factor dynamics,
Kalman filter, forecasting.
Personal note
As my knowledge of mathematics and statistics has grown over the years, so
has my passion for finance. It has led me to believe that I might become a
practitioner or researcher in the field. In order to check if this desire harmo-
nizes with my experience, I have chosen to write my thesis at a firm called
TKP Investments (TKPI): a Dutch fiduciary manager, originally founded as a
pension fund for the Dutch state-owned posts and telecommunications com-
pany PTT in 1989. The firm currently manages over 25 billion euros for
pension funds of PostNL, KPN and other institutional clients. As one of the
few large financial institutions located in the area of my university, it was an
ideal opportunity to gain experience in the financial industry and to do my
academic research in an experienced and skilled environment.
Acknowledgement
First of all, I would like to show my gratitude to my supervisor Professor
dr. Paul Bekker, an expert on the subjects discussed in this paper. At an
early stage of my curriculum he taught me probability theory,
probability distributions, linear models in statistics and stochastic calculus.
Equipped with a solid foundation in these areas, I have been able to complete
my research. It has been a privilege to work with him and his comments have
substantially improved this paper.
I wish to express my sincere thanks to Dr. Sibrand Drijver for his continuous
encouragement, extensive guidance throughout my research and for making
my internship a joyful experience. Without his supervision, this research
would not have been possible. It is my hope that this thesis is beneficial to
him and the rest of the department of Investment Strategy at TKPI.
I am also grateful to André Broersma (CFA) who has been a guiding mentor
from a very early stage in my bachelor's degree. The opportunity to intern at
the Investment Strategy department of TKPI would not have been granted
without his trust and support.
Last, but certainly not least, I show my gratitude to my father, colleagues
and fellow students for their interest in my thesis and their valuable support.
Contents
1 Introduction
2 Definitions
2.1 The zero-coupon bond
2.2 General notation
2.3 Root Mean Squared Error (RMSE)
3 Data
4 Term structure of interest rates
4.1 Introduction
4.2 Economic and historical remarks
5 Model
5.1 Introduction
5.2 Dynamic Nelson-Siegel model
5.3 Factors
5.4 Choice of $\lambda$
6 Estimation
6.1 Introduction
6.2 OLS
6.3 Factor Dynamics
6.4 ML estimates
6.5 State-space representation
6.6 Kalman Filter
6.7 Kalman Filter log-likelihood function
6.8 Kalman Filter estimation results
7 Forecasts
7.1 Introduction
7.2 Distribution of factors under iDNS model
7.3 In-sample forecasts
7.4 Out-of-sample forecast
7.5 Biasedness of forecast results
8 The yield curve based on a partial conjecture of future yields
8.1 Introduction
8.2 Derivation
8.3 In-sample example
9 Concluding remarks
10 Appendix
10.1 Distribution of forecasted factors at time t + h given factors at time t
11 References
1 Introduction
Traders, asset managers, pension fund managers and anyone whose work involves
fixed income rely heavily on the term structure of interest rates
(Christensen [5]). Improvements in yield curve modelling result in enhanced
pricing of assets and liabilities, hedging of risks and allocation of assets
(Christensen [5]). For these reasons, along with its theoretical complexities, it has
been a topic of extensive research in both academia and financial institutions.
A widely known class of term structure models is the affine term structure
model (ATSM), in which the short rate is an affine function of the state vector,
so that bond prices are exponentially affine in it. A variety of ATSMs are discussed in Chen and
Scott [3], Cox, Ingersoll and Ross [6], Hull and White [22] and Vasicek [33].
As solutions for bond prices can be found in closed form, ATSMs are popular
for their theoretical tractability and arbitrage-free assumptions¹. However, in
Duffee [14] and Dai and Singleton [8], this class of models is shown to perform
poorly in forecasting.
A parametric class of models was introduced in Nelson and Siegel [27],
called the Nelson-Siegel model. Despite lacking these theoretical properties (no
arbitrage-free assumption; no closed-form solution for bond prices), as shown in
Filipovic [16], it is empirically superior to the affine class of models, as it
captures yield curve dynamics remarkably well. In Diebold and Li [10], the
Nelson-Siegel model is put into a panel-data context, referred to as the
Dynamic Nelson-Siegel model, where the factors are interpreted as level, slope
and curvature. There appears to be a trade-off between the theoretical
shortcomings of one class of models and the empirical shortcomings of the other.
Therefore, a setup is introduced in Christensen, Diebold and Rudebusch [4] where
the Dynamic Nelson-Siegel model is combined with arbitrage-free restrictions,
giving rise to a specification containing 'the best of both worlds'.
In this thesis, we solely work with the Dynamic Nelson-Siegel model as in
Diebold and Li [10]. Assuming an independent and correlated specification
for the factors of the Dynamic Nelson-Siegel model, we estimate the factor
dynamics by maximum likelihood. Additionally, estimation of the factors is
done by Kalman filtering. We derive a closed-form distribution for future fac-
tors, forecast them and present the in-sample and out-of-sample forecasts. As
we find ourselves at a time in which interest rates in general are at an all-time
low, associated forward curves will also reflect these low interest rates. We
are, therefore, challenged to find a model in which the reader is able to use
his or her conjecture of future yields as an input. Consequently, we present
the main finding of the thesis, namely a stochastic model for the predicted
yield curve, when a future yield with certain maturity is given.
¹ Bonds are traded by well-informed institutions in highly liquid and transparent markets.
Therefore, arbitrage opportunities are unlikely to exist (Christensen, Diebold and
Rudebusch [4]). This gives the arbitrage-free assumption in ATSMs a powerful allure.
2 Definitions
2.1 The zero-coupon bond
The relation between interest rates and varying maturities is generally re-
ferred to as the term structure of interest rates. We shall briefly discuss three
main concepts that arise frequently in term structure modelling. We assume
continuously compounded rates.
Define $P_t(T)$ as the price of a zero-coupon bond (ZCB) at time $t$, which
expires at time $T > t$. Assume that the price of the bond at maturity equals
unity and let $\tau \equiv T - t$ denote the time to maturity. We define $y_t(T)$ as the
nominal yield of a bond at time $t$ with expiration at $T$.
The price of a ZCB contract and its yield are related by:
$$P_t(T) = e^{-\tau y_t(T)}, \tag{2.1}$$
referred to as the discount curve². Reformulation gives us:
$$y_t(T) = -\frac{\ln P_t(T)}{\tau}. \tag{2.2}$$
For fixed $t$ and varying $\tau$ this expression gives us the yield curve, which shall
be used extensively throughout this thesis. The instantaneous spot rate or
short rate is the limit:
$$r_t = \lim_{T \to t} y_t(T) = \lim_{T \to t} -\frac{\ln P_t(T)}{T - t} = \lim_{T \to t} -\frac{1}{P_t(T)}\frac{\partial P_t(T)}{\partial T} = -\left.\frac{\partial \ln P_t(T)}{\partial T}\right|_{T=t}, \tag{2.3}$$
where the third equality follows from l'Hôpital's rule.
Forward prices are obtained using no-arbitrage arguments. For three points
in time $t < T < T + \tau$ and some number $c$ we have
$$P_t(T + \tau) = c \cdot P_t(T).$$
By replication and excluding arbitrage, it follows that $c$ must be the forward
price of a bond contract starting at time $T$ and maturing at $T + \tau$. Hence,
we rewrite the former to obtain a general expression for forward prices:
$$P_t(T, T + \tau) = \frac{P_t(T + \tau)}{P_t(T)}.$$
A forward rate is defined as the future yield on a bond. In line with earlier
notation, we define $f_t(T, T + \tau)$ to be the forward rate at time $t$ of a forward
contract starting at time $T$ and maturing at time $T + \tau$. Then the unique
continuously compounded forward rate is:
$$f_t(T, T + \tau) = \frac{1}{\tau} \ln \frac{P_t(T)}{P_t(T + \tau)}. \tag{2.4}$$
² As the yield of a ZCB represents the continuously compounded rate of return we have
the relation $P_t(T) e^{\tau y_t(T)} = 1$, from which (2.1) immediately follows.
We obtain the instantaneous forward rate by taking the limit of $\tau$ to zero:
$$f_t(T) = \lim_{\tau \to 0} f_t(T, T + \tau) = -\frac{\partial \ln P_t(T)}{\partial T}.$$
The yield of a bond contract and its forward rate are, therefore, related by
(with $\tau = T - t$ as before):
$$y_t(T) = \frac{1}{\tau} \int_t^T f_t(u)\, du.$$
We have derived expressions for the discount curve (2.1), the yield curve
(2.2) and the forward curve (2.4). Knowing any one of these curves, we can
construct each of the other representations of the term structure of interest rates.
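The relations (2.1), (2.2) and (2.4) can be illustrated with a minimal Python sketch. The function names and the two example yields are hypothetical and chosen only for illustration; rates are expressed as decimals and maturities in years.

```python
import math

def price_from_yield(y, tau):
    """Discount curve (2.1): ZCB price for yield y and time to maturity tau."""
    return math.exp(-tau * y)

def yield_from_price(price, tau):
    """Yield curve (2.2), the inverse of (2.1)."""
    return -math.log(price) / tau

def forward_rate(price_near, price_far, tenor):
    """Forward rate (2.4) between two maturities lying `tenor` years apart."""
    return math.log(price_near / price_far) / tenor

# Hypothetical yields: 2% for a 2-year bond and 3% for a 5-year bond.
p2 = price_from_yield(0.02, 2.0)
p5 = price_from_yield(0.03, 5.0)

# Forward rate for the period between year 2 and year 5.
f_2_5 = forward_rate(p2, p5, 3.0)
```

Under these assumed inputs the forward rate equals (5 · 0.03 − 2 · 0.02) / 3, which shows how one curve determines the others.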
2.2 General notation
Vectors and matrices are denoted with lower- and uppercase bold symbols,
respectively. For natural numbers $m$ and $n$, let $\mathbf{A} \in \mathbb{R}^{m \times n}$ with $\{\mathbf{A}\}_{ij}$ being
the $(i, j)$th entry of matrix $\mathbf{A}$. Furthermore, let $|\mathbf{A}|$, $\mathbf{A}^{-1}$ and $\mathbf{A}^T$ denote
the determinant, the inverse and the transpose of $\mathbf{A}$, respectively. For the
sake of readability, if the dimensions of a matrix are clear from its definition,
we shall not specify the dimensions explicitly. For a random variable $X$ we
use $\mathbb{E}_X[X]$ and $\mathbb{V}_X[X]$ to denote the expected value and variance of $X$. If it
is unambiguous over which distribution the operators are used, we leave out
the subscript. Let $\hat{\theta}$ denote the parameter estimate for some parameter $\theta$.
For two time series $X_t$ and $Y_t$ ($t = t_1, \dots, t_N$) let $\bar{x}$ and $\bar{y}$ denote their
sample means. Then $\rho(X_t, Y_t) = \sigma_{XY} / (\sigma_X \sigma_Y)$ is the sample correlation, with
sample covariance
$$\sigma_{XY} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_{t_i} - \bar{x})(y_{t_i} - \bar{y})$$
and sample variances
$$\sigma_X^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_{t_i} - \bar{x})^2 \quad \text{and} \quad \sigma_Y^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (y_{t_i} - \bar{y})^2.$$
Any other operation
will be presented to the reader when needed. Dates are represented in the
format DD-MM-YYYY.
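As a small illustration of the sample correlation defined above, the following Python sketch computes it from two equally long series. The function name and the example series are hypothetical; the sketch assumes no missing values.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation with the 1/(N-1) normalisation used in this section."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov_xy / math.sqrt(var_x * var_y)

# Hypothetical series, for illustration only.
series_x = [4.0, 3.5, 3.0, 2.0, 1.0]
series_y = [-1.0, -0.5, -1.5, -2.5, -3.0]
rho = sample_correlation(series_x, series_y)
```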
2.3 Root Mean Squared Error (RMSE)
In this thesis, we use the root mean squared error (RMSE) as the measure of
error. On the one hand, we treat the RMSE as a time series:
$$RMSE_t \equiv \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( y_t(\tau_i) - y_t^K(\tau_i) \right)^2} \tag{2.5}$$
for $t = t_1, \dots, t_N$, with $y_t(\tau_i)$ the observed yield at time $t$ for maturity $\tau_i$ and $y_t^K(\tau_i)$
the yield under model $K$ (e.g. Ordinary Least Squares, Kalman Filter, etc.).
On the other hand, we also work with the RMSE per maturity level:
$$RMSE_k \equiv \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_{t_i}(\tau_k) - y_{t_i}^K(\tau_k) \right)^2} \tag{2.6}$$
for $k = 1, \dots, M$, with $y_{t_i}(\tau_k)$ the observed yield at time $t_i$ for fixed maturity $\tau_k$
and $y_{t_i}^K(\tau_k)$ the yield under model $K$ at $t_i$.
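The two error measures (2.5) and (2.6) can be sketched in Python as follows. The panel layout (one row per date, one column per maturity), the use of `None` for a missing yield and the averaging over available observations only are assumptions made for this illustration; they are not prescribed by the definitions above.

```python
import math

def rmse(observed, fitted):
    """Root mean squared error over the pairs with an available observation."""
    pairs = [(o, f) for o, f in zip(observed, fitted) if o is not None]
    return math.sqrt(sum((o - f) ** 2 for o, f in pairs) / len(pairs))

def rmse_per_date(obs_panel, fit_panel):
    """RMSE_t as in (2.5): one value per row (date) of the panel."""
    return [rmse(o_row, f_row) for o_row, f_row in zip(obs_panel, fit_panel)]

def rmse_per_maturity(obs_panel, fit_panel):
    """RMSE_k as in (2.6): one value per column (maturity) of the panel."""
    obs_cols = list(zip(*obs_panel))
    fit_cols = list(zip(*fit_panel))
    return [rmse(o_col, f_col) for o_col, f_col in zip(obs_cols, fit_cols)]

# Hypothetical 2 x 2 panel with one missing observation.
observed = [[1.0, 2.0], [3.0, None]]
fitted = [[1.0, 2.2], [2.5, 4.0]]
by_date = rmse_per_date(observed, fitted)
by_maturity = rmse_per_maturity(observed, fitted)
```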
3 Data
The data used for this thesis are zero-coupon German Bund yields sourced from
Bloomberg. We have end-of-day daily data spanning the time period
$\{t_1, \dots, t_N\} = \{\text{02-01-1998}, \dots, \text{09-03-2016}\}$³.
The aim is to create yield curves in order to model them and, later on, to forecast
them. Therefore we collect yields at maturity levels of $\frac{1}{4}$, $\frac{1}{2}$, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 15, 20 and 30 years, leaving us with 15 data points per time $t$. For some
time points, yields are missing from the yield curve. Summary statistics
of the yield data are given in Table 3.1 and a graphical representation is given
in Figure 1; their discussion is postponed until the next section⁴. Throughout the
thesis the range of $t$ is left out when it is used as a subscript in a panel-data
context. Hereafter, assume $t \in \{t_1, \dots, t_N\}$ unless specified differently.
Maturity   N_i     Mean    St. Dev.
3M         3,465   1.362   1.474
6M         3,469   1.414   1.469
1Y         4,735   2.121   1.661
2Y         4,733   2.259   1.644
3Y         4,740   2.417   1.656
4Y         4,742   2.606   1.642
5Y         4,740   2.757   1.585
6Y         4,740   2.921   1.576
7Y         4,742   3.067   1.549
8Y         4,741   3.195   1.511
9Y         4,739   3.294   1.459
10Y        4,739   3.367   1.407
15Y        2,757   3.104   1.443
20Y        4,287   3.721   1.281
30Y        4,738   3.970   1.313
Table 3.1: Summary statistics of German Bund yields in percentage points (pp)
corresponding to different maturity levels. $N_i$ is the number of yield observations per maturity
level, with $i = 1, \dots, 15$. The third column corresponds to the sample mean of the maturity
level and the fourth column to its sample standard deviation.
Figure 1: Three-dimensional plot of German Bund yield data. TTM is the time to
maturity of the bond. The yield is given in percentage points.
³ We define $N \equiv \max\{N_1, \dots, N_{15}\} = 4742$.
⁴ Note the smaller number of observations for yields of bonds with 15Y maturity ($N_{13} = 2{,}757$).
Data on these government bonds were recorded from 19-02-2001 onwards.
4 Term structure of interest rates
4.1 Introduction
A term structure of interest rates or yield curve relates bonds with different
maturities to their yields. These curves come in different shapes: normal,
inverted, steep, flat and humped. Figure 2 serves as an exemplification of
a normal yield curve. Different economic scenarios correspond to different
shapes of yield curves, see Cwik [7]. Naturally, we would expect yields to
increase with maturity. However, when the economy experiences a transition
from one economic state to another, varying shapes in the yield curve emerge
such as the more flattened curve in Figure 3. The economics behind the shape
of the yield curve is discussed briefly in the upcoming section.
Figure 2: German Bund yield curve on 02-01-1998; an example of a normal yield curve.
The yield in percentage points (pp) is plotted against the time to maturity (TTM) in years.
Figure 3: German Bund yield curve on 03-08-2007; an example of a flat yield curve.
The yield in pp is plotted against the time to maturity (TTM) in years.
4.2 Economic and historical remarks
To gain a more refined understanding of the dynamics of yield curves, we inspect
their shape and overall level in Figure 1. Generally, a downward trend
in the level of yield curves is observed, regardless of their shape. During the
dot-com bubble in the late 1990s (DeLong and Magin [9]) and the subprime
crisis in 2007 (Gerardi et al. [18]) these curves tend to be flat, implying small
spreads between short and long rates, which in turn is a sign of weak
economic prospects (Smets and Tsatsaronis [30]). At the cost of not readily
identifying shapes of curves, we turn to Figure 4 for a time series of yields.
The gradual decrease in interest rates from 2001 was caused by central banks
implementing loose monetary policies after the bust of the dot-com bubble
(Govetto and Walcher [20]). In late 2005, this was followed by a tight monetary
policy of the European Central Bank (ECB), which resulted in a sudden
surge of short rates by 2007 (Stark [31]). The bankruptcy of Lehman Brothers
led to the global crisis in September 2008. The ECB reacted swiftly by cutting
the policy rate to 1%; the resulting abrupt fall of the overnight
rate is readily spotted in Figure 4. The impact of the decreasing policy rates⁵
since the credit crunch is visible throughout the rest of the time series: from
2012 yields of bonds with short maturities fell below zero, and from 2015
bonds with longer maturities followed.
⁵ Published on: https://www.ecb.europa.eu/stats/monetary/rates/html/index.en.html
Figure 4: Plot of yields (in pp) against date; dashed line corresponds to null level.
5 Model
5.1 Introduction
Nelson and Siegel [27] introduced a parsimonious three-factor model for
zero-coupon bond yields, referred to as the Nelson-Siegel (NS) model. Since its
introduction the model has been used extensively in both academic research
and at financial institutions⁶. Diebold and Li [10], hereafter referred to as
DL, extend the NS model to a dynamic framework, where the three factors
are assumed to be time-variant. It should be noted that the model is reformulated
in DL [10] in terms of factorization, to lower the coherence between
the model's components (Koopman, Mallee and van der Wel [24]). In addition,
the reformulation allows for straightforward interpretation of the latent
factors as level, slope and curvature. We adopt the assumption of time-varying factors
and hence use the Dynamic Nelson-Siegel (DNS)
model throughout this thesis.
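5.2 Dynamic Nelson-Siegel model
We introduce the DNS model formulated in DL [10]:
$$y_t(\tau_i) = \beta_{1t} + \beta_{2t} \left( \frac{1 - e^{-\lambda \tau_i}}{\lambda \tau_i} \right) + \beta_{3t} \left( \frac{1 - e^{-\lambda \tau_i}}{\lambda \tau_i} - e^{-\lambda \tau_i} \right), \tag{5.1}$$
where $y_t$ represents the yield at time $t$, $\tau_i$ denotes the time to maturity of
bond $i$ ($i = 1, \dots, M$), $\lambda$ is a fixed parameter that represents the exponential
decay rate and $\beta_{1t}$, $\beta_{2t}$ and $\beta_{3t}$ are referred to as the latent factors of the
model. In our case the number of maturity levels is $M = 15$.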
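Equation (5.1) translates directly into code. The following Python sketch is illustrative only: the function names and the factor values are hypothetical, and the decay parameter 0.924 is the value adopted later in this thesis for maturities expressed in years.

```python
import math

def dns_loadings(tau, lam):
    """Loadings of the three factors in (5.1) at time to maturity tau."""
    x = lam * tau
    slope = -math.expm1(-x) / x        # (1 - e^{-x}) / x, stable for small x
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature

def dns_yield(beta, tau, lam):
    """Yield implied by (5.1) for the factor triplet beta = (b1, b2, b3)."""
    level, slope, curvature = dns_loadings(tau, lam)
    return beta[0] * level + beta[1] * slope + beta[2] * curvature

# Hypothetical factor values, for illustration only.
beta_example = (3.0, -1.0, 0.5)
curve = [dns_yield(beta_example, tau, 0.924) for tau in (0.25, 1.0, 5.0, 10.0, 30.0)]
```

For very short maturities the sketch returns approximately the sum of the first two factors, and for very long maturities approximately the first factor.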
⁶ E.g., the Bank for International Settlements (2005) noted that central banks of countries
such as Germany, France and Spain use the NS model to model yield curves. See:
http://www.bis.org/publ/bppdf/bispap25.pdf.
5.3 Factors
In this section we follow the formalism outlined in DL [10]. The loading of $\beta_{1t}$
is equal to unity. A change in the value of $\beta_{1t}$ changes the yields at all
maturities in equal measure. Naturally, we refer to $\beta_{1t}$ as the level factor.
The importance of $\beta_{1t}$ increases with $\tau_i$. This becomes more apparent when
we take $\lim_{\tau_i \to \infty} y_t(\tau_i) = y_t(\infty) = \beta_{1t}$. It is, therefore, also natural to refer
to the latter as the long-term factor. Practitioners refer to this result as the
long rate, which is usually compared to the short rate defined in (2.3).
$\beta_{2t}$ has a factor loading of $(1 - e^{-\lambda \tau_i}) / (\lambda \tau_i)$. For increasing $\tau_i$, the loading
decays rapidly and monotonically to zero. Therefore, $\beta_{2t}$ is called the short-term factor.
Since it changes the slope of the yield curve, we may also refer to it as the
slope factor.
The factor loading of $\beta_{3t}$ is $(1 - e^{-\lambda \tau_i}) / (\lambda \tau_i) - e^{-\lambda \tau_i}$. This function starts at
zero, so it cannot be short-term; it rises to a maximum at intermediate maturities
and then decays to zero as $\tau_i$ grows, implying that it cannot be long-term either.
Naturally, we refer to this factor as
the medium-term factor. An increase in $\beta_{3t}$ will also increase the curvature of
the yield curve, which leads to its alternative name, the curvature
factor. A graphical summary of the factor loadings is given in Figure 5.
Figure 5: An example of factor loadings of the DNS model. For some fixed $\lambda$, the
loadings of $\beta_{1t}$, $\beta_{2t}$ and $\beta_{3t}$ are plotted against increasing maturities (arbitrary scale).
5.3.1 Empirical counterparts
DL [10] define expressions for the empirical level, slope and curvature, which
we shall refer to as the empirical counterparts of the factors. Using equation
(5.1), we define these at time $t$ as
$$l_t \equiv y_t(30) = \beta_{1t} + 0.036(\beta_{2t} + \beta_{3t})$$
$$s_t \equiv y_t(30) - y_t(1) = -0.617\beta_{2t} - 0.220\beta_{3t}$$
$$c_t \equiv 2y_t(3) - y_t(1) - y_t(30) = -0.012\beta_{2t} + 0.259\beta_{3t}.$$
These results are used in Section 6.2.4.
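The numerical coefficients of the empirical counterparts can be checked by evaluating the factor loadings of (5.1) at maturities of 1, 3 and 30 years. The Python sketch below does so under the assumption that the decay parameter equals 0.924 (the value fixed in this thesis for maturities in years); the function and variable names are hypothetical.

```python
import math

def slope_curvature_loadings(tau, lam=0.924):
    """Loadings of the second and third factor in (5.1); lam=0.924 is assumed."""
    x = lam * tau
    slope = -math.expm1(-x) / x
    return slope, slope - math.exp(-x)

s1, c1 = slope_curvature_loadings(1.0)
s3, c3 = slope_curvature_loadings(3.0)
s30, c30 = slope_curvature_loadings(30.0)

# Coefficients on (beta_2t, beta_3t); the level factor cancels in slope and curvature.
level_coef = (s30, c30)                              # l_t = y_t(30)
slope_coef = (s30 - s1, c30 - c1)                    # s_t = y_t(30) - y_t(1)
curv_coef = (2 * s3 - s1 - s30, 2 * c3 - c1 - c30)   # c_t = 2 y_t(3) - y_t(1) - y_t(30)
```

Rounded to three decimals, the computed coefficients agree with those stated above, including the negative signs in the slope and curvature expressions.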
5.4 Choice of $\lambda$
The loading parameter $\lambda$ determines the position of the hump in the yield
curve. In particular, it represents the decay in the curve, where large values
of $\lambda$ produce a fast decay and small values produce a slow decay. Let
$\boldsymbol{\beta}_t = [\beta_{1t} \;\; \beta_{2t} \;\; \beta_{3t}]^T$ with $\beta_{it}$ as in (5.1) for $i = 1, 2, 3$. It is tempting to treat $\lambda$ as a
free parameter, $\lambda_t$, and estimate it together with $\boldsymbol{\beta}_t$ as a non-linear problem,
which results in superior fits, see Christensen [5]. However, treating $\lambda_t$ as a
free parameter can lead to collinearity in $\boldsymbol{\beta}_t$, as for many values of $\lambda_t$ the
factor loadings are highly correlated⁷. Gilli, Große and Schumann [19] raise
this issue and illustrate it graphically. Furthermore, estimating $\lambda$ for each $t$
tends to cause overfitting, which in turn leads to poor forecasts (Bilger and
Manning [1]). For simplicity, we restrict ourselves to a fixed $\lambda$. Note that $\lambda$
controls the maturity at which the loading on the curvature factor
attains its maximum. Naturally, if we choose to fix $\lambda$, we should choose a
value such that the curvature loading is maximized at some given term. For instance,
fix $\lambda$ to the value 0.0609 for a term of 30 months (DL [10]) or fix $\lambda$ to the
value 0.0770 for a term of 23.3 months (Diebold, Rudebusch and Aruoba [12]).
In this thesis we set the value of $\lambda$ to 0.9240. This is the value of $\lambda$ which
maximizes curvature at 23.3 months, transformed as $0.0770 \cdot 12 = 0.9240$
for our setup, where the maturities are expressed in years instead of months.
Additionally, it turns out that this choice of $\lambda$ results in only mildly correlated
$\boldsymbol{\beta}_t$.
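The link between the decay parameter and the maturity at which the curvature loading peaks can be explored numerically. The Python sketch below locates the peak with a ternary search, which is an assumption of this illustration (any one-dimensional maximizer would do); the function names and search bounds are hypothetical.

```python
import math

def curvature_loading(tau, lam):
    """Loading of the third factor in (5.1)."""
    x = lam * tau
    return -math.expm1(-x) / x - math.exp(-x)

def peak_maturity(lam, iterations=200):
    """Maturity maximising the curvature loading, found by ternary search.

    The loading rises from zero to a single maximum and then decays, so the
    search interval (0, 20 / lam] is assumed to bracket the peak.
    """
    lo, hi = 1e-6, 20.0 / lam
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if curvature_loading(m1, lam) < curvature_loading(m2, lam):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

peak_dl = peak_maturity(0.0609)               # in months
peak_dra = peak_maturity(0.0770)              # in months
peak_thesis = 12.0 * peak_maturity(0.9240)    # years converted to months
```

Under these assumptions the peaks lie at roughly 29.4 and 23.3 months for the two monthly values, and the yearly value 0.9240 reproduces the 23.3-month peak, as expected from the rescaling by 12.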
6 Estimation
6.1 Introduction
In order to forecast future term structures of interest rates, DL [10] use an
approach in which $\lambda$ is fixed and $\boldsymbol{\beta}_t$ are estimated by Ordinary Least Squares
(OLS). This estimation procedure is carried out for different values of $t$, i.e. for
various cross-sections of yields, thereby obtaining a time series of estimates
for $\boldsymbol{\beta}_t$. Assuming a process for the time series, future values of $\boldsymbol{\beta}_t$ can be
predicted, and from these follow predictions of future yield curves. This forecasting
procedure is called a two-step approach in Christensen [5]. Christensen [5],
Diebold, Rudebusch and Aruoba [12], Koopman, Mallee and Van der Wel [24],
Levant and Ma [26] and Yu and Zivot [35] extend the framework to a so-called
one-step approach, in which a state-space representation of the DNS model
is introduced and estimation of the parameters is done by the Kalman Filter
(KF) based on maximum likelihood. This thesis comprises both the one- and
two-step approach.
⁷ Correlation in this context is used interchangeably with sample correlation.
6.2 OLS
We proceed with OLS estimation of $\boldsymbol{\beta}_t$, which minimizes the sum of squared
errors, i.e.
$$\hat{\boldsymbol{\beta}}_t = \arg\min_{\boldsymbol{\beta}_t} \sum_{i=1}^{M} \left( y_t(\tau_i) - y_t^{DNS}(\tau_i) \right)^2, \tag{6.1}$$
with $y_t(\tau_i)$ being the observed yield for the bond with maturity $\tau_i$ and $y_t^{DNS}(\tau_i)$
the yield estimated by the DNS model.
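Because (5.1) is linear in the factors once the decay parameter is fixed, the minimization in (6.1) reduces to a three-regressor least-squares problem. The Python sketch below solves the normal equations for a single cross-section. It is an illustration under stated assumptions, not the implementation used for the results in this thesis: the function names are hypothetical, missing yields are assumed to be encoded as `None` and skipped, and the example yields are synthetic.

```python
import math

def loadings(tau, lam):
    """Regressor row [1, slope loading, curvature loading] from (5.1)."""
    x = lam * tau
    slope = -math.expm1(-x) / x
    return [1.0, slope, slope - math.exp(-x)]

def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_dns_ols(maturities, yields, lam):
    """OLS estimate of the factor triplet for one date; None yields are skipped."""
    rows = [loadings(tau, lam) for tau, y in zip(maturities, yields) if y is not None]
    ys = [y for y in yields if y is not None]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(xtx, xty)

# Synthetic cross-section generated from known (hypothetical) factors.
maturities = [0.25, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30]
true_beta = [4.0, -2.0, 1.5]
yields = [sum(b * l for b, l in zip(true_beta, loadings(t, 0.924))) for t in maturities]
yields[12] = None  # mimic a missing 15Y observation
beta_hat = fit_dns_ols(maturities, yields, 0.924)
```

Repeating the fit for every date yields the time series of factor estimates used in the remainder of this section.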
6.2.1 Single time point: OLS example
For every $t$ we obtain a triplet $\hat{\boldsymbol{\beta}}_t$, where each triplet represents a curve. We
illustrate the OLS fit on the data for two different time points. Figure 6
illustrates a superior fit, as opposed to Figure 7, which shows a poor fit (c.f.
$RMSE_t$ in the respective figure captions)⁸.
Figure 6: German Bund yield curve on 02-01-1998 with superior NS fit ($\lambda = 0.924$); yield (pp)
against TTM (years). Taking $K = OLS$ in (2.5) gives $RMSE_t = 5.5$ basis points (bp).
Figure 7: German Bund yield curve on 26-09-2008 with poor NS fit ($\lambda = 0.924$); yield (pp)
against TTM (years). Taking $K = OLS$ in (2.5) gives $RMSE_t = 23.4$ bp.
We shall now turn to all yield curves in the time dimension, where we dis-
cuss an outlier issue that arises and present various estimation results after
resolving this issue.
⁸ The data plotted in Figure 7 contain three additional data points compared with the data
plotted in Figure 6: {3M, 6M, 15Y}.
6.2.2 Time series of factors: outlier problem
Turning to all yield curves we obtain a time series of $\hat{\boldsymbol{\beta}}_t$, where each triplet of
factors at $t$ satisfies (6.1) for German Bund yields at $t$, see Figure 8. As the
$N = 4742$ obtained triplets $\hat{\boldsymbol{\beta}}_t$ each represent a fit of (5.1), we also consider
a plot of the corresponding $RMSE_t$ over time, c.f. Figure 9.
Figure 8: Time series of OLS estimates of the $\hat{\boldsymbol{\beta}}_t$ triplets ($\hat{\beta}_{1t}$, $\hat{\beta}_{2t}$ and $\hat{\beta}_{3t}$ against date).
Figure 9: $RMSE_t$ in basis points for $t = t_1, \dots, t_N$ and $K = OLS$, c.f. (2.5).
A closer look at Figure 9 reveals the existence of outliers in our data. These
seem to be caused by large differences between the yields of bonds with 4Y and 5Y
maturities, which the DNS curve is not able to capture⁹. Also, observe the
spike in Figure 8 that is caused by this anomaly. At first this was thought
to be caused by a data-entry error at Bloomberg. This idea was motivated by the
exclusion of arbitrage: as bond markets for the mentioned maturities are
highly liquid, it would be unusual for such a large difference in yields to occur.
However, as two consecutive business days show this occurrence,
there is no obvious reason to simply exclude these outliers from the dataset.
On the other hand, these outliers may affect estimation results in an undesired
manner. To avoid problems with convergence during Kalman filtering later
on, we simply choose a more recent starting date for our data collection. This
is carried out in the next section.
6.2.3 Time series of factors: data exclusion
To resolve the outlier problem raised in the last section we redefine the time
period in which we collect the data:
$$\{t_1, \dots, t_N\} \equiv \{\text{03-01-2000}, \dots, \text{09-03-2016}\}, \tag{6.2}$$
i.e. discarding the first 521 observations¹⁰. This leaves us with
$N \equiv 4742 - 521 = 4221$ yield curves.
⁹ On the 13th and 14th of January 1999.
¹⁰ Observations in this context indicate yield curves.
6.2.4 Time series of factors: OLS results
Having resolved the outlier issue by exclusion of data, we turn to the OLS
results in Table 6.1 and Figure 10.
Maturity   ȳ       ȳ^OLS   RMSE_k
3M         1.362   2.024   0.173
6M         1.414   1.894   0.052
1Y         1.972   1.803   0.208
2Y         2.103   1.956   0.171
3Y         2.251   2.236   0.055
4Y         2.441   2.499   0.091
5Y         2.600   2.714   0.133
6Y         2.758   2.883   0.156
7Y         2.905   3.013   0.142
8Y         3.036   3.115   0.108
9Y         3.139   3.197   0.078
10Y        3.223   3.262   0.060
15Y        3.104   3.461   0.132
20Y        3.549   3.561   0.211
30Y        3.805   3.660   0.182
Table 6.1: OLS results in percentage points. $\bar{y}$ and $\bar{y}^{OLS}$ represent the sample mean of
the observed and the OLS-estimated yield for the specified maturity level. We use $RMSE_k$
defined in (2.6) for $k = 1, \dots, M$ and $K = OLS$.
Figure 10: $RMSE_t$ in basis points for $t = t_1, \dots, t_N$ and $K = OLS$, c.f. (2.5).
According to Figure 11(a) the estimated level factor appears to be downward
sloping, which agrees with the fact that interest rates have decreased
drastically since the 2008 financial crisis, c.f. Section 4.2. The estimated
factors tend to move in a similar manner as their empirical counterparts:
$\rho(\hat{\beta}_{1t}, \hat{l}_t) = 0.995$, $\rho(\hat{\beta}_{2t}, \hat{s}_t) = 0.883$ and $\rho(\hat{\beta}_{3t}, \hat{c}_t) = 0.949$, similar results as
in DL [10].
Figure 11: (a), (b) and (c). Plot of estimated DNS level, slope and curvature factors
versus their empirical counterparts (c.f. Section 5.3.1): (a) empirical level and $\hat{\beta}_{1t}$,
(b) empirical slope and $-\hat{\beta}_{2t}$, (c) empirical curvature and $0.3\hat{\beta}_{3t}$. Rescaling of estimated factors
according to DL [10].
Turning to the time series of the factors, the sample correlation matrix of
$\{\hat{\beta}_{1t}, \hat{\beta}_{2t}, \hat{\beta}_{3t}\}$ in Table 6.2 (a) indicates mild correlation between the factors
over time. Furthermore, we stress that the post-2008 crisis environment has
led to more heavily negatively correlated level and slope factors, identified
by Figure 8 and also supported by the post-2008 sample correlation matrix,
Table 6.2 (b). This may contradict our assumption of the factors being
mutually independent when we turn to the factor dynamics, discussed in more
detail in the next section.
(a)
Factors   β_1t     β_2t     β_3t
β_1t       1.000
β_2t      -0.020    1.000
β_3t       0.304    0.426    1.000
(b)
Factors   β_1t     β_2t     β_3t
β_1t       1.000
β_2t      -0.939    1.000
β_3t      -0.132    0.139    1.000
Table 6.2: Sample correlation matrix of $\hat{\boldsymbol{\beta}}_t$ (a) for $t = t_1, \dots, t_N$ and (b) from 26-09-2008
onwards.
6.3 Factor Dynamics
Introduction
For the purpose of using the Kalman Filter later on we shall refrain from
continuous time notation and stick to discrete time notation when defining
the factor dynamics. In line with Koopman, Mallee and van der Wel [24]
we assume a demeaned vector autoregressive process of order 1, abbreviated
VAR(1) process, for the factors. In this section we define the independent
VAR(1) process with uncorrelated factors, the unrestricted VAR(1) process
where we relax the assumption of uncorrelatedness of the factors and the log-
likelihood function associated with the assumed model. In Christensen [5],
the independent and correlated processes for the DNS model are referred to
as iDNS and cDNS, respectively. We follow this style of notation and define
these accordingly.
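6.3.1 iDNS
We introduce the VAR(1) process with uncorrelated factors:
$$\beta_t = \Phi\beta_{t-1} + (I - \Phi)\mu + \eta_t, \tag{6.3}$$
$$\beta_t = \begin{bmatrix}\beta_{1t}\\ \beta_{2t}\\ \beta_{3t}\end{bmatrix},\quad \Phi = \begin{bmatrix}\phi_{11} & 0 & 0\\ 0 & \phi_{22} & 0\\ 0 & 0 & \phi_{33}\end{bmatrix},\quad \mu = \begin{bmatrix}\mu_1\\ \mu_2\\ \mu_3\end{bmatrix},\quad \eta_t = \begin{bmatrix}\eta_{1t}\\ \eta_{2t}\\ \eta_{3t}\end{bmatrix},$$
where $I$ is a $3 \times 3$ identity matrix and $\mu$ is the unconditional mean of $\beta_t$. Furthermore, $\eta_t$ is assumed to have Gaussian error structure with mean
$$E[\eta_t] = 0 = [0 \;\; 0 \;\; 0]^T \tag{6.4}$$
and variance-covariance matrix
$$V[\eta_t] = \Sigma_\eta = \begin{bmatrix}\sigma_1^2 & 0 & 0\\ 0 & \sigma_2^2 & 0\\ 0 & 0 & \sigma_3^2\end{bmatrix}. \tag{6.5}$$
To ensure positive semi-definiteness of $\Sigma_\eta$ during the estimation procedure, we use the decomposition
$$\Sigma_\eta = qq^T \tag{6.6}$$
with diagonal matrix
$$q = \begin{bmatrix}q_{11} & 0 & 0\\ 0 & q_{22} & 0\\ 0 & 0 & q_{33}\end{bmatrix}. \tag{6.7}$$
In the case of uncorrelated factors we estimate nine parameters: the three diagonal entries of $\Phi$, the three entries of $\mu$ and the three diagonal entries of $q$.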
6.3.2 cDNS
In order to allow for correlated factors in (6.3) we relax the assumption that the off-diagonal entries of $\Phi$ and $\Sigma_\eta$ are zero, in particular,
$$\{\Phi\}_{ij} \neq 0 \quad\text{and}\quad \{\Sigma_\eta\}_{ij} \neq 0 \tag{6.8}$$
for all $i, j = 1, 2, 3$ and $i \neq j$. Cholesky decomposition ensures positive semi-definiteness and symmetry of $\Sigma_\eta$ (Krishnamoorthy [25]) and is shown to be the most efficient decomposition in the estimation procedure, see Jung [23]. Therefore, we use the Cholesky decomposition
$$\Sigma_\eta = qq^T \tag{6.9}$$
with lower-triangular matrix
$$q = \begin{bmatrix}q_{11} & 0 & 0\\ q_{21} & q_{22} & 0\\ q_{31} & q_{32} & q_{33}\end{bmatrix}. \tag{6.10}$$
In the case of correlated factors we estimate twelve parameters. We caution that unrestricted processes (i.e. processes allowing for correlated factors) generally result in poor forecasts even in the presence of cross-variable interaction (DL [10]). The increase in the number of parameters tends to cause in-sample overfitting (Bilger and Manning [1]). The choice of a model specification is, however, postponed until the forecast results are presented.
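The role of the decomposition $\Sigma_\eta = qq^T$ can be illustrated with a minimal Python sketch. The entries of `q` below are hypothetical numbers chosen for illustration, not estimates from this thesis, and the helper names `sigma_from_q` and `quad_form` are introduced here only for the example. Whatever real values the optimizer proposes for `q`, the product is symmetric and $x^T \Sigma_\eta x = \lVert q^T x \rVert^2 \geq 0$.

```python
def sigma_from_q(q):
    """Return Sigma = q q^T for a square matrix q given as nested lists."""
    n = len(q)
    return [[sum(q[i][k] * q[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def quad_form(sigma, x):
    """Quadratic form x^T Sigma x, which is non-negative when Sigma = q q^T."""
    n = len(x)
    return sum(x[i] * sigma[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical lower-triangular factor as in (6.10); illustrative values only.
q = [[0.04, 0.00, 0.00],
     [0.02, 0.07, 0.00],
     [0.01, 0.05, 0.20]]

sigma = sigma_from_q(q)
```

Setting the off-diagonal entries of `q` to zero recovers the diagonal iDNS case of (6.7).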
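6.3.3 Estimation based on maximum likelihood
As in Christensen [5], Diebold, Rudebusch and Aruoba [12] and Koopman, Mallee and Van der Wel [24], we assume
$$\eta_t \sim N(0, \Sigma_\eta) \tag{6.11}$$
which leads to the following density for $\eta_t$ (see footnote 11),
$$f_{t|t-1}(\eta_t) = (2\pi)^{-\frac{3}{2}}\, |\Sigma_\eta|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\,\eta_t^T \Sigma_\eta^{-1} \eta_t\right). \tag{6.12}$$
For $N$ observations we have the conditional multivariate normal log-likelihood function
$$l_{t|t-1}(\Phi, \Sigma_\eta, \mu; \beta_t) = -\frac{3N}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma_\eta| - \frac{1}{2}\sum_{t=1}^{N}\left(\beta_t - \Phi\beta_{t-1} - (I-\Phi)\mu\right)^T \Sigma_\eta^{-1} \left(\beta_t - \Phi\beta_{t-1} - (I-\Phi)\mu\right). \tag{6.13}$$
After substitution of $\Sigma_\eta = qq^T$ the maximization problem becomes
$$\max_{\Phi,\,\mu,\,q}\; l_{t|t-1}. \tag{6.14}$$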
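As a minimal sketch of the objective in (6.13), the Python function below evaluates the log-likelihood for the iDNS case only, where $\Phi$ and $q$ are diagonal and the expression separates into three univariate AR(1) terms. The function name `idns_loglik`, the list-based data layout and the convention that `beta[0]` is the starting value (so that the number of transitions plays the role of $N$) are assumptions made for this illustration; it is not the estimation code used for the results reported here.

```python
import math


def idns_loglik(beta, phi, mu, q):
    """Conditional Gaussian log-likelihood (6.13) for diagonal Phi and Sigma = q q^T.

    beta       : list of [b1, b2, b3] factor values; beta[0] is the starting value
    phi, mu, q : length-3 lists holding the diagonals of Phi and q, and the vector mu
    """
    n = len(beta) - 1                       # number of one-step transitions
    total = 0.0
    for i in range(3):
        var = q[i] ** 2                     # i-th diagonal entry of Sigma = q q^T
        ssr = 0.0
        for t in range(1, len(beta)):
            resid = beta[t][i] - phi[i] * beta[t - 1][i] - (1.0 - phi[i]) * mu[i]
            ssr += resid ** 2
        total += (-0.5 * n * math.log(2.0 * math.pi)
                  - 0.5 * n * math.log(var)
                  - 0.5 * ssr / var)
    return total


# Tiny hypothetical example: two dates, factors equal to their mean.
example = idns_loglik([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
                      phi=[0.5, 0.5, 0.5], mu=[1.0, 1.0, 1.0], q=[1.0, 1.0, 1.0])
```

A derivative-free optimizer such as Nelder-Mead would be applied to the negative of this function; the cDNS case needs the full matrix form of (6.13) instead.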
Footnote 11: The Shapiro-Wilk test indicates non-normal $\eta_t$ (p-value $< 2.2 \cdot 10^{-16}$). Even though the test is downward biased by sample size (Field [15]), we emphasize that this result may raise concern about the assumption of normality for the error terms. However, given the existing literature on normally distributed error terms for yields, we prefer to retain this assumption. Having raised the issue, we leave its investigation to future research.
The parameters found through maximization of the log-likelihood function are referred to as the maximum likelihood (ML) estimates.
6.4 ML estimates
We use the Nelder-Mead algorithm to maximize the log-likelihood function (6.13). The interested reader is referred to Gao and Han [17] for an introduction to the Nelder-Mead algorithm. $N = 2608$ yield curves are used in the in-sample to estimate the parameters, with
$$t \in \{t_1, \dots, t_N\} = \{\text{03-01-2000}, \dots, \text{01-01-2010}\}, \tag{6.15}$$
i.e. ten years of daily yield data.
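6.4.1 Estimates of iDNS model
The ML estimates of the iDNS are
$$\hat\Phi = \begin{bmatrix}0.9994 & 0.0000 & 0.0000\\ 0.0000 & 0.9982 & 0.0000\\ 0.0000 & 0.0000 & 0.9900\end{bmatrix}, \tag{6.16}$$
$$\hat\Sigma_\eta = \begin{bmatrix}0.0019 & 0.0000 & 0.0000\\ 0.0000 & 0.0057 & 0.0000\\ 0.0000 & 0.0000 & 0.0385\end{bmatrix}, \tag{6.17}$$
$$\hat\mu = \begin{bmatrix}2.7413\\ 1.6266\\ 3.2785\end{bmatrix}. \tag{6.18}$$
Reparametrisation of $\Phi$ and $\Sigma_\eta$ ensures stationarity of the process (Koopman, Mallee and Van der Wel [24]) (see footnote 12).

6.4.2 Estimates of cDNS model
The ML estimates of the cDNS are
$$\hat\Phi = \begin{bmatrix}0.9998 & 0.0024 & 0.0010\\ 0.0008 & 0.9927 & 0.0060\\ 0.0022 & 0.0039 & 0.9865\end{bmatrix}, \tag{6.19}$$
$$\hat\Sigma_\eta = \begin{bmatrix}0.0019 & 0.0018 & 0.0003\\ 0.0018 & 0.0061 & 0.0101\\ 0.0003 & 0.0101 & 0.0429\end{bmatrix}, \tag{6.20}$$
$$\hat\mu = \begin{bmatrix}2.5537\\ 1.9733\\ 3.2101\end{bmatrix}. \tag{6.21}$$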
Footnote 12: Augmented Dickey-Fuller (ADF) tests suggest the factors may be non-stationary. The ADF test with lag of order 4 gives p-values 0.3629, 0.3804 and 0.0892 for level, slope and curvature, respectively. We use the rule of thumb introduced by Schwert [29] to set an upper bound for the lag order in the ADF test: $\text{lag}_{\max} = \lfloor (12N/100)^{1/4} \rfloor = \lfloor (12 \cdot 4742/100)^{1/4} \rfloor = \lfloor 4.884113 \rfloor = 4$, with $\lfloor c \rfloor$ denoting the integer part of $c$. If a process $x_t$ has a unit root, or $x_t \sim I(1)$, this process can be transformed into a stationary process by taking first differences, $x_t - x_{t-1} \sim I(0)$. Having highlighted the latter, we emphasize that in this work we did not take first differences.
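The arithmetic of the lag bound in footnote 12 can be reproduced with a short Python sketch. The function name `max_adf_lag` is hypothetical and the formula is coded exactly as written in the footnote. Note that Schwert's rule is often quoted in the form $\lfloor 12\,(N/100)^{1/4} \rfloor$, which would give a larger bound; the sketch does not take a position on which form was intended.

```python
import math


def max_adf_lag(n_obs):
    """Integer part of (12 * n_obs / 100) ** (1/4), as written in footnote 12."""
    return math.floor((12.0 * n_obs / 100.0) ** 0.25)


lag_max = max_adf_lag(4742)   # 4742 observations, as in the footnote
print(lag_max)                # 4
```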
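6.5 State-space representation
For every point in time $\{t_1, t_2, \dots, t_N\}$ we observe a term structure of interest rates. Each curve comes with a set of parameters which satisfies a maximization problem (see footnote 13). This panel data structure is referred to as a Linear Gaussian Model (Roweis and Ghahramani [28]).
We add measurement error, $\varepsilon_t$, to (5.1) and rewrite the equation in matrix notation:
$$y_t = C\beta_t + \varepsilon_t \tag{6.22}$$
for $t \in \{t_1, t_2, \dots, t_N\}$, with
$$y_t = \begin{bmatrix} y_t(\tau_1)\\ \vdots\\ y_t(\tau_M)\end{bmatrix},\quad C = \begin{bmatrix} 1 & \dfrac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} & \dfrac{1-e^{-\lambda\tau_1}}{\lambda\tau_1} - e^{-\lambda\tau_1}\\ \vdots & \vdots & \vdots\\ 1 & \dfrac{1-e^{-\lambda\tau_M}}{\lambda\tau_M} & \dfrac{1-e^{-\lambda\tau_M}}{\lambda\tau_M} - e^{-\lambda\tau_M}\end{bmatrix},\quad \beta_t = \begin{bmatrix}\beta_{1t}\\ \beta_{2t}\\ \beta_{3t}\end{bmatrix}.$$
Furthermore,
$$\varepsilon_t = [\varepsilon_{1t} \;\cdots\; \varepsilon_{Mt}]^T \tag{6.23}$$
is Gaussian white noise with mean vector
$$E[\varepsilon_t] = 0 = [0 \;\; 0 \;\cdots\; 0]^T \tag{6.24}$$
and variance-covariance matrix
$$V[\varepsilon_t] = \Sigma_\varepsilon = \operatorname{diag}[\sigma^2_{\varepsilon_1}, \dots, \sigma^2_{\varepsilon_M}]. \tag{6.25}$$
In Section 6.3 we defined the factor dynamics. Assume the iDNS model and let its elements be defined as in Section 6.3.1; then the state vector evolves as
$$\beta_t = \Phi\beta_{t-1} + (I-\Phi)\mu + \eta_t. \tag{6.26}$$
Combining (6.22) and (6.26) gives the desired state-space representation.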
Footnote 13: For fixed $\lambda$ over time we emphasize that the estimation problem becomes linear.
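The loading matrix $C$ of the measurement equation (6.22) can be built in a few lines. The following Python sketch is illustrative only: the function names `ns_loadings` and `fitted_yields` are hypothetical, the maturity grid is an arbitrary subset, and it is an assumption of this example that maturities are measured in years and that $\lambda = 0.924$ (the value shown in the legend of Figure 12) is expressed in matching units.

```python
import math


def ns_loadings(maturities, lam):
    """Rows [1, slope loading, curvature loading] of C, one row per maturity tau."""
    rows = []
    for tau in maturities:
        x = lam * tau
        slope = (1.0 - math.exp(-x)) / x
        curvature = slope - math.exp(-x)
        rows.append([1.0, slope, curvature])
    return rows


def fitted_yields(C, beta):
    """Noise-free part of the measurement equation, C beta_t."""
    return [sum(c * b for c, b in zip(row, beta)) for row in C]


maturities = [0.25, 1.0, 5.0, 10.0, 30.0]   # years; illustrative subset
C = ns_loadings(maturities, lam=0.924)      # lambda assumed fixed over time
flat_curve = fitted_yields(C, [3.0, 0.0, 0.0])
```

With $\lambda$ held fixed, $C$ does not depend on $t$, so each cross-section is linear in $\beta_t$, which is the point made in footnote 13.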
6.6 Kalman Filter
Since $\beta_t$ is latent and only $y_t$ is observed at time $t$, measurement error has to be taken into account. In our context such measurement errors may stem from bid-ask spreads or errors in data entry, see Bolder [2]. The problem becomes filtering out the desired true signal in the time series. Such a filtering problem can be resolved by implementation of the Kalman Filter (KF). This paper serves as an application of the KF. For an introduction to the KF we direct the reader to Hamilton [21] or Welch & Bishop [34].
Measurement and transition equation, respectively:
$$y_t = C\beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_\varepsilon) \tag{6.27}$$
$$\beta_t = \Phi\beta_{t-1} + (I-\Phi)\mu + \eta_t, \quad \eta_t \sim N(0, \Sigma_\eta) \tag{6.28}$$
To familiarize the reader with the filtering process, let $\hat\beta_{t|\cdot}$ be the mean square linear estimator of the latent $\beta_t$ and $\Sigma_{t|\cdot}$ its mean square error matrix (Koopman, Mallee and Van der Wel [24]). Define the forecasting error and its variance-covariance matrix,
$$v_t = y_t - C\hat\beta_{t|t-1} \tag{6.29}$$
$$V[v_t] = F_t = C\,\Sigma_{t|t-1}\,C^T + \Sigma_\varepsilon. \tag{6.30}$$
The update step is
$$\hat\beta_{t|t} = \hat\beta_{t|t-1} + \Sigma_{t|t-1}\,C^T F_t^{-1} v_t \tag{6.31}$$
$$\Sigma_{t|t} = \Sigma_{t|t-1} - \Sigma_{t|t-1}\,C^T F_t^{-1} C\,\Sigma_{t|t-1}. \tag{6.32}$$
The predictive step is
$$\hat\beta_{t+1|t} = \Phi\hat\beta_{t|t} + (I-\Phi)\mu \tag{6.33}$$
$$\Sigma_{t+1|t} = \Phi\,\Sigma_{t|t}\,\Phi^T + \Sigma_\eta. \tag{6.34}$$
We use the initial values $\hat\beta_{1|0} = [\hat\beta_{1,0} \;\; \hat\beta_{2,0} \;\; \hat\beta_{3,0}]^T$, where $\hat\beta_{i,0}$ is the OLS estimate of $\beta_{i,0}$, $i = 1, 2, 3$, and $\Sigma_{1|0} = 0$, where $0$ is a $3 \times 3$ matrix with entries equal to zero.
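The update and prediction recursions (6.29) to (6.34) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: all helper names are hypothetical, vectors are stored as column matrices (lists of one-element lists), the loading matrix `C`, the observation `y`, the start value `b_pred` and the vector `mu` are made-up numbers, and only the diagonals of `Phi` and `S_eta` are borrowed from (6.16) and (6.17).

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B, sign=1.0):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def kf_update(b_pred, P_pred, y, C, S_eps):
    """Update step (6.29)-(6.32)."""
    v = add(y, matmul(C, b_pred), -1.0)                        # forecast error v_t
    F = add(matmul(matmul(C, P_pred), transpose(C)), S_eps)     # F_t
    K = matmul(matmul(P_pred, transpose(C)), inverse(F))        # Sigma C^T F^-1
    b_filt = add(b_pred, matmul(K, v))
    P_filt = add(P_pred, matmul(matmul(K, C), P_pred), -1.0)
    return b_filt, P_filt, v, F


def kf_predict(b_filt, P_filt, Phi, mu, S_eta):
    """Prediction step (6.33)-(6.34)."""
    n = len(Phi)
    I_minus_Phi = [[float(i == j) - Phi[i][j] for j in range(n)] for i in range(n)]
    b_pred = add(matmul(Phi, b_filt), matmul(I_minus_Phi, mu))
    P_pred = add(matmul(matmul(Phi, P_filt), transpose(Phi)), S_eta)
    return b_pred, P_pred


# Illustrative inputs (hypothetical, not the thesis data): three maturities.
C = [[1.0, 0.9, 0.1], [1.0, 0.5, 0.3], [1.0, 0.1, 0.1]]
Phi = [[0.9994, 0.0, 0.0], [0.0, 0.9982, 0.0], [0.0, 0.0, 0.9900]]
S_eta = [[0.0019, 0.0, 0.0], [0.0, 0.0057, 0.0], [0.0, 0.0, 0.0385]]
S_eps = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
mu = [[3.0], [-1.0], [-1.0]]
b_pred = [[3.2], [-1.5], [-0.5]]             # stands in for the OLS start value
P_pred = [[0.0] * 3 for _ in range(3)]       # Sigma_{1|0} = 0, as in the text
y = [[2.0], [2.6], [3.1]]

b_filt, P_filt, v, F = kf_update(b_pred, P_pred, y, C, S_eps)
b_next, P_next = kf_predict(b_filt, P_filt, Phi, mu, S_eta)
```

One consequence of the start value $\Sigma_{1|0} = 0$ is visible in the sketch: the first update leaves the state unchanged, because the gain $\Sigma_{1|0} C^T F_1^{-1}$ is zero, and the first predicted error matrix equals $\Sigma_\eta$.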
6.7 Kalman Filter log-likelihood function
For $N$ observations and $M$ maturity levels we have the conditional multivariate normal log-likelihood function
$$l_{t|t-1}\left(\Phi, \Sigma_\varepsilon, \Sigma_\eta, \mu; \beta_t\right) = -\frac{NM}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{N}\log|F_t| - \frac{1}{2}\sum_{t=1}^{N} v_t^T F_t^{-1} v_t. \tag{6.35}$$
Parameter estimates are found through maximization of (6.35). Assuming the iDNS specification we estimate a total of 24 parameters. The results are presented in the upcoming section.
6.8 Kalman Filter estimation results
Estimation by the KF is done to introduce the reader to its procedure and to
form a solid comparison to the OLS results presented in Section 6.2.4. These
results should agree with each other. In this paper forecasting of the factors
is done by assuming specifications (6.3) and (6.8) for the factor dynamics and
using their ML estimates to predict, i.e. we do not forecast with the Kalman
Filter. The KF results are presented in Table 6.3.
Maturity   $\bar y$   $\bar y^{KF}$   $RMSE_k$
3M         1.362      2.153           0.239
6M         1.414      1.964           0.036
1Y         1.972      1.791           0.273
2Y         2.103      1.867           0.279
3Y         2.251      2.122           0.153
4Y         2.441      2.38            0.079
5Y         2.600      2.596           0.025
6Y         2.758      2.766           0.018
7Y         2.905      2.899           0.016
8Y         3.036      3.004           0.045
9Y         3.139      3.087           0.090
10Y        3.223      3.155           0.132
15Y        3.104      3.359           0.304
20Y        3.549      3.461           0.372
30Y        3.805      3.563           0.342

Table 6.3: Kalman Filter results in percentage points. $\bar y$ and $\bar y^{KF}$ represent the sample mean of the observation and the KF estimated yield for the specified maturity level. We use $RMSE_k$ defined in (2.6) for $k = 1, \dots, M$ and $K = KF$.
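The ML estimates of the KF are
$$\hat\Phi = \begin{bmatrix}1.0012 & 0.0000 & 0.0000\\ 0.0000 & 0.9997 & 0.0000\\ 0.0000 & 0.0000 & 0.9917\end{bmatrix},\quad \hat\Sigma_\eta = \begin{bmatrix}0.0024 & 0.0000 & 0.0000\\ 0.0000 & 0.0061 & 0.0000\\ 0.0000 & 0.0000 & 0.0399\end{bmatrix},\quad \hat\mu = \begin{bmatrix}2.7413\\ 1.6241\\ 3.2800\end{bmatrix}.$$
Furthermore,
$$\hat\Sigma_\varepsilon = \operatorname{diag}[0.0039, 0.0014, 0.0099, 0.0023, 0.0014, 0.0006, 0.0000, 0.0000, 0.0012, 0.0001, 0.0006, 0.0007, 0.0092, 0.0113, 0.0058].$$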
As $\{\hat\Phi\}_{11} > 1$, the KF suggests that the process for $\beta_{1t}$ is non-stationary. We mentioned earlier that the OLS results in Table 6.1 should be similar to the KF results presented in Table 6.3. For the mid maturity levels (4Y to 9Y) the KF seems to fit the yield curve better than OLS, cf. the fourth columns of Tables 6.1 and 6.3. However, as is also seen in the fourth column of Table 6.3, for the low (3M to 3Y) and high maturity levels (10Y to 30Y) the KF fits the yield curve poorly. Either the KF is not implemented properly, or the initial values for the parameters lead the KF to a local rather than a global maximum of its log-likelihood function. Due to time constraints, neither possibility could be investigated thoroughly.
7 Forecasts
7.1 Introduction
In this section we derive the distribution of future values of $\beta_t$ under the assumed model. We also present the forecasting results for the key maturity levels. The aim is to forecast yields in an Asset Liability Management (ALM) context, implying that we work with long forecast horizons.
7.2 Distribution of factors under iDNS model
Let $h \in \mathbb{N}$ be the number of days in the future and assume the iDNS model defined in (6.3). The future values of the factors conditioned on their present values follow a Gaussian distribution with the following specification,
$$\beta_{t+h} \mid \beta_t \sim N\left(\Phi^h\beta_t + \sum_{k=0}^{h-1}\Phi^k (I-\Phi)\mu,\;\; \sum_{k=0}^{h-1}\Phi^k\,\Sigma_\eta\,(\Phi^k)^T\right). \tag{7.1}$$
For ease of readability the proof is postponed to the Appendix.
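Because $\Phi$ and $\Sigma_\eta$ are diagonal under the iDNS specification, the sums in (7.1) can be evaluated factor by factor. The Python sketch below does this with explicit sums; the function name `idns_forecast` and the current factor values `beta_t` are hypothetical, while `phi`, `var_eta` and `mu` reuse the numbers of (6.16) to (6.18) as printed. It is a sketch of the formula, not of the forecasting code behind Tables 7.1 and 7.2.

```python
def idns_forecast(beta_t, phi, mu, var_eta, h):
    """Mean and variance of beta_{t+h} given beta_t from (7.1), per factor."""
    means, variances = [], []
    for b, p, m, s2 in zip(beta_t, phi, mu, var_eta):
        # Phi^h beta_t + sum_k Phi^k (I - Phi) mu, for one diagonal entry
        mean = p ** h * b + sum(p ** k * (1.0 - p) * m for k in range(h))
        # sum_k Phi^k Sigma_eta (Phi^k)^T, for one diagonal entry
        var = sum(p ** k * s2 * p ** k for k in range(h))
        means.append(mean)
        variances.append(var)
    return means, variances


phi = [0.9994, 0.9982, 0.9900]        # diagonal of (6.16)
var_eta = [0.0019, 0.0057, 0.0385]    # diagonal of (6.17)
mu = [2.7413, 1.6266, 3.2785]         # (6.18) as printed
beta_t = [5.0, 1.0, 2.0]              # hypothetical current factor values

means, variances = idns_forecast(beta_t, phi, mu, var_eta, h=260)
```

For a diagonal entry with $|\phi| < 1$ the geometric sums collapse to $\mu + \phi^h(\beta_t - \mu)$ for the mean and $\sigma^2 (1 - \phi^{2h})/(1 - \phi^2)$ for the variance, so the forecast mean reverts to $\mu$ as $h$ grows.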
7.3 In-sample forecasts
As mentioned in Section 6.4 we use $\{\text{03-01-2000}, \dots, \text{01-01-2010}\}$ as the in-sample period. Starting at 03-01-2000 we forecast 1-year ahead ($h = 260$), 5-years ahead ($h = 1304$) and 10-years ahead ($h = 2608$). The in-sample leaves us with $N = 2608$ yield curves to estimate upon (cf. Section 6.4). The initial condition for our forecast is
$$\hat\beta_0 = \begin{bmatrix}\hat\beta_{1,0}\\ \hat\beta_{2,0}\\ \hat\beta_{3,0}\end{bmatrix}, \tag{7.2}$$
where $\hat\beta_{i,0}$ is the OLS estimate of $\beta_{i,0}$, $i = 1, 2, 3$. The in-sample results are presented in Table 7.1 with their discussion given in Section 7.5.
           1-year ahead                          5-years ahead                         10-years ahead
           iDNS               cDNS               iDNS               cDNS               iDNS               cDNS
Maturity   $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$
1Y         3.391   1.243      3.818   0.808      2.799   0.650      3.953   1.671      2.287   1.606      4.089   3.295
3Y         4.087   0.430      4.442   0.034      3.364   0.897      4.517   1.916      2.798   1.349      4.643   3.017
5Y         4.587   0.200      4.891   0.399      3.838   0.975      4.948   1.952      3.255   1.085      5.069   2.650
10Y        5.117   0.312      5.366   0.517      4.358   0.877      5.410   1.772      3.761   0.805      5.527   2.145
20Y        5.402   0.399      5.620   0.589      4.638   0.761      5.659   1.588      4.035   0.723      5.774   1.675
30Y        5.497   0.210      5.705   0.360      4.731   0.708      5.742   1.502      4.126   0.723      5.856   1.749

Table 7.1: In-sample forecast results in percentage points. For the key maturity levels, $\bar y^K$ is the sample mean of specification $K$'s estimated yield, $K \in \{\text{iDNS}, \text{cDNS}\}$. We use $RMSE_k$ defined in (2.6) for $k \in \{1Y, 3Y, 5Y, 10Y, 20Y, 30Y\}$ and $K \in \{\text{iDNS}, \text{cDNS}\}$. Let $N = 2608$ be the number of in-sample observations. For the in-sample period we use $t \in \{t_1, \dots, t_N\} = \{\text{03-01-2000}, \dots, \text{01-01-2010}\}$.
7.4 Out-of-sample forecast
The out-of-sample period spans $\{\text{01-01-2010}, \dots, \text{01-01-2015}\}$. Starting at 01-01-2010 we forecast 1-year ahead ($h = 260$), 3-years ahead ($h = 780$) and 5-years ahead ($h = 1302$). We take the same initial condition as in (7.2). In this case it is not possible to produce forecasts for 10-years ahead, since we would need data until 01-01-2020. We could generate a simulated trajectory for the factors to obtain a so-called pseudo out-of-sample dataset (Stock and Watson [32]), but this implies making assumptions about the (inter)dependency of the factors. We prefer to limit our forecast horizon to a maximum of five years. The out-of-sample forecasts are presented in Table 7.2 and discussed in Section 7.5.
           1-year ahead                          3-years ahead                         5-years ahead
           iDNS               cDNS               iDNS               cDNS               iDNS               cDNS
Maturity   $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$  $\bar y^K$ $RMSE_k$
1Y         1.038   0.521      0.676   0.180      1.298   0.985      0.787   0.575      1.365   1.182      0.830   0.702
3Y         2.037   1.046      1.622   0.620      2.120   1.437      1.524   0.909      2.081   1.623      1.496   1.083
5Y         2.735   1.025      2.296   0.573      2.719   1.502      2.085   0.947      2.632   1.720      2.020   1.167
10Y        3.470   0.772      3.010   0.328      3.356   1.182      2.686   0.643      3.225   1.381      2.586   0.841
20Y        3.863   0.603      3.392   0.232      3.698   0.882      3.009   0.459      3.544   1.015      2.891   0.549
30Y        3.995   0.668      3.519   0.239      3.812   0.949      3.117   0.453      3.650   1.033      2.993   0.520

Table 7.2: Out-of-sample forecast results in percentage points. For the key maturity levels, $\bar y^K$ is the sample mean of specification $K$'s estimated yield, $K \in \{\text{iDNS}, \text{cDNS}\}$. We use $RMSE_k$ defined in (2.6) for $k \in \{1Y, 3Y, 5Y, 10Y, 20Y, 30Y\}$ and $K \in \{\text{iDNS}, \text{cDNS}\}$. Let $N = 1303$ be the number of out-of-sample observations. For the out-of-sample period we use $t \in \{t_1, \dots, t_N\} = \{\text{01-01-2010}, \dots, \text{01-01-2015}\}$.
7.5 Biasedness of forecast results
Table 7.2 shows that the cDNS model dominates the iDNS model in the out-of-sample forecast: its RMSE is lower for every key maturity at all three forecast horizons. Generally, this would come as a surprise, since the larger number of parameters could cause overfitting to the data, which usually results in poor forecasts. However, in Section 6.2.4 we showed that from 26-09-2008 onwards, the OLS estimates of the factors become heavily correlated. The conjecture is that the cDNS, which assumes correlated factors, would in turn perform better in terms of forecasting. For the in-sample forecasts, on the other hand, we conclude from Table 7.1 that the iDNS produces superior forecasts: its RMSE is lower for all key maturities at the 5-year and 10-year horizons and for maturities of 5Y and longer at the 1-year horizon. Again, this is not strange, since before 2008 the factors were only mildly correlated. A key hypothesis is that switching flexibly between models (iDNS and cDNS) depending on the correlation environment of the factors results in superior forecasts, i.e. assume the iDNS when there is little to mild correlation between the factors and assume the cDNS when there is medium to heavy correlation. However, to test this empirically we would need an out-of-sample dataset in which the factors show little correlation. Only then could we check whether the forecasting power of the iDNS is superior to that of the cDNS. Therefore the statement remains a hypothesis, albeit an important one.
8 The yield curve based on a partial conjecture of future yields
8.1 Introduction
At $t$ we observe a yield curve:
$$y_t = \begin{bmatrix} y_t(\tau_1)\\ \vdots\\ y_t(\tau_M)\end{bmatrix}, \tag{8.1}$$
with $M$ maturity levels. Let $y^*_{t+h}(\tau_i) \in \mathbb{R}$ be a prespecified, i.e. given, yield for maturity $\tau_i$ ($i = 1, \dots, M$), $h \in \mathbb{N}$ days ahead. In this section we derive a distribution of $y_{t+h}$ given $y_{t+h}(\tau_i) = y^*_{t+h}(\tau_i)$.
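8.2 Derivation
Let $y_{t+h} \in \mathbb{R}^{M \times 1}$ be the yield curve at time $t + h$ and define $e_i \in \mathbb{R}^{M \times 1}$ with zeros at all indices $j$ and unity only at index $i$ ($i, j = 1, \dots, M$ and $i \neq j$). In order to extract the $i$-th yield of the yield curve we write $e_i^T y_{t+h} = y_{t+h}(\tau_i)$. For a given $y^*_{t+h}(\tau_i)$ we want to find the distribution of:
$$y_{t+h} \mid \{e_i^T y_{t+h} = y^*_{t+h}(\tau_i)\}. \tag{8.2}$$
Let $C$, $\beta_t$ and $\varepsilon_t$ be as in (6.22) and write $I$ for the $M \times M$ identity matrix. For the mean vector and variance-covariance matrix of the conditional distribution in (7.1), we write $\mu_{t+h}$ and $\Sigma_{t+h}$, respectively. Let $\Sigma_\varepsilon$ be defined as in (6.25). Then, given the information $\beta_t$ at time $t$ and given $\Sigma_\varepsilon$, we find:
$$\begin{bmatrix} I\\ e_i^T\end{bmatrix} y_{t+h} = \begin{bmatrix} C\\ e_i^T C\end{bmatrix}\beta_{t+h} + \begin{bmatrix} I\\ e_i^T\end{bmatrix}\varepsilon_{t+h}$$
$$\sim N\left(\begin{bmatrix} C\\ e_i^T C\end{bmatrix}\mu_{t+h},\;\; \begin{bmatrix} C\\ e_i^T C\end{bmatrix}\Sigma_{t+h}\begin{bmatrix} C^T & C^T e_i\end{bmatrix} + \begin{bmatrix} I\\ e_i^T\end{bmatrix}\Sigma_\varepsilon\begin{bmatrix} I & e_i\end{bmatrix}\right)$$
$$= N\left(\begin{bmatrix} C\mu_{t+h}\\ e_i^T C\mu_{t+h}\end{bmatrix},\;\; \begin{bmatrix} C\Sigma_{t+h}C^T + \Sigma_\varepsilon & \left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i\\ e_i^T\left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right) & e_i^T\left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i\end{bmatrix}\right).$$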
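The passage from this joint distribution to the conditional distribution rests on the standard conditioning result for jointly Gaussian vectors, recalled here as a reminder. The partition symbols $x_1$, $x_2$, $m_1$, $m_2$, $S_{11}$, $S_{12}$, $S_{21}$, $S_{22}$ and the conditioning value $a$ are introduced only for this note and are not part of the notation of the thesis; the result requires $S_{22}$ to be invertible.

```latex
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} m_1 \\ m_2 \end{pmatrix},
  \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}
\right)
\quad\Longrightarrow\quad
x_1 \mid \{x_2 = a\}
\sim N\!\left(
  m_1 + S_{12} S_{22}^{-1} (a - m_2),\;
  S_{11} - S_{12} S_{22}^{-1} S_{21}
\right)
```

In the present setting $x_1 = y_{t+h}$, $x_2 = e_i^T y_{t+h}$ and $a$ is the given yield, so that $S_{22} = e_i^T\left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i$ is a scalar and the inverse is an ordinary division.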
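Hence, given $\beta_t$ and $\Sigma_\varepsilon$, we find the conditional distribution:
$$y_{t+h} \mid \{e_i^T y_{t+h} = y^*_{t+h}(\tau_i)\} \sim N\left(\mu_{y_{t+h}},\, \Sigma_{y_{t+h}}\right) \tag{8.3}$$
with
$$\mu_{y_{t+h}} = C\mu_{t+h} + \left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i\left(e_i^T\left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i\right)^{-1}\left(y^*_{t+h}(\tau_i) - e_i^T C\mu_{t+h}\right) \tag{8.4}$$
and
$$\Sigma_{y_{t+h}} = C\Sigma_{t+h}C^T + \Sigma_\varepsilon - \left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i\left(e_i^T\left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right)e_i\right)^{-1}e_i^T\left(C\Sigma_{t+h}C^T + \Sigma_\varepsilon\right). \tag{8.5}$$
Note that equations (8.3), (8.4) and (8.5) are the main findings of this thesis. In the next section we discuss an application of this result.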
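Since the conditioning variable is a single yield, (8.4) and (8.5) involve only a scalar division. The Python sketch below makes this explicit. The function name `condition_on_yield` is hypothetical, `mean` stands for $C\mu_{t+h}$ and `cov` for $C\Sigma_{t+h}C^T + \Sigma_\varepsilon$, and the three-maturity numbers are made up for illustration; they are not taken from the thesis data.

```python
def condition_on_yield(mean, cov, i, y_star):
    """Mean (8.4) and covariance (8.5) of the curve given that yield i equals y_star.

    mean : list, plays the role of C mu_{t+h}
    cov  : nested list, plays the role of C Sigma_{t+h} C^T + Sigma_eps
    """
    m = len(mean)
    col_i = [cov[r][i] for r in range(m)]     # (C Sigma C^T + Sigma_eps) e_i
    s_ii = cov[i][i]                          # e_i^T (C Sigma C^T + Sigma_eps) e_i
    gap = y_star - mean[i]                    # given yield minus its forecast mean
    cond_mean = [mean[r] + col_i[r] / s_ii * gap for r in range(m)]
    cond_cov = [[cov[r][c] - col_i[r] * cov[i][c] / s_ii for c in range(m)]
                for r in range(m)]
    return cond_mean, cond_cov


# Hypothetical three-maturity example; the middle yield is assumed to be known.
mean = [2.0, 3.0, 4.0]
cov = [[1.0, 0.8, 0.5],
       [0.8, 1.0, 0.8],
       [0.5, 0.8, 1.0]]
cond_mean, cond_cov = condition_on_yield(mean, cov, i=1, y_star=3.5)
```

In this example the conditional mean at the known maturity equals the given yield and its conditional variance is zero, while the neighbouring yields shift in proportion to their covariance with it.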
8.3 In-sample example
Let $t$ = 07-01-2000 and $h = 520$ days (i.e. two years), then $t + h$ = 07-01-2002. In order to check how closely the shape of the expected yield curve matches the shape of the observed yield curve we set $y^*_{t+520}(10) = y_{t+520}(10)$, i.e. as if we observe the ten-year yield now, rather than two years later. In this example we have $M = 12$ maturity levels. At $t$ we observe the OLS estimate $\hat\beta_t$, which is used as the estimate for $\beta_t$. Furthermore, the OLS estimate $\hat\Sigma_\varepsilon$ of $\Sigma_\varepsilon$, estimated over the in-sample, is used in Figure 14. We define
$$RMSE_{t+520} = \sqrt{\frac{1}{12}\sum_{i=1}^{12}\left(y_{t+520}(\tau_i) - y^K_{t+520}(\tau_i)\right)^2} \tag{8.6}$$
where $y^K_t(\tau_i)$ denotes the yield under model $K$ for $\tau_i$ ($i = 1, \dots, 12$), where $K$ stands for the OLS fit, the forecasted curve using the iDNS specification and the model using a partial conjecture. The following figures show the OLS fit, the forecasted yield curve using (7.1) and the improved forecasted curve when $y_{t+520}(10)$ is given; all at $t + 520$ = 07-01-2002.
Figure 12: DNS OLS fit of the yield curve versus the observed yield curve, with $RMSE_{t+520} = 3.56$ bp. (Axes: yield in pp against maturity in years, 0Y to 30Y; series: German bund yield, OLS fit (NS model, lambda=0.924).)

Figure 13: Forecasted yield curve using the iDNS specification versus the observed yield curve, with $RMSE_{t+520} = 35.54$ bp. (Axes: yield in pp against maturity in years, 0Y to 30Y; series: German bund yield, forecasted curve (iDNS).)

Figure 14: Plot of $E\left[y_{t+520} \mid y_{t+520}(10) = 4.86 \text{ pp}\right]$ with 99.5% confidence interval versus the observed yield curve. $RMSE_{t+520} = 3.81$ bp. (Axes: yield in pp against maturity in years, 0Y to 30Y; series: German bund yield, forecasted curve given partial conjecture, 99.5% CI.)
9 Concluding remarks
Assuming the independent specification for the factor dynamics we have derived a distribution for future values of the factors. A crucial test remains to check whether error terms in the DNS fits and its factor dynamics are Gaussian. These tests seem absent in a wide range of literature on the matter. For our dataset the Shapiro-Wilk test suggested the idiosyncratic shocks to be non-Gaussian (we emphasize that this holds for our dataset), which poses the question of how robust the findings presented in this paper truly are when they are generalized to a wider context (U.S. Treasury yields rather than German Bund yields; closing prices rather than end-of-day prices; a wider time-frame rather than data collection starting from 03-01-2000; monthly data rather than daily data). For both the independent and correlated specification an overview of the forecast performance was presented. The correlated specification tends to perform better in out-of-sample forecasting. This may stem from biased forecasting results, where the biasedness is discussed in Section 7.5. The main finding of the paper shows that if we know a particular yield in the future, we are able to forecast the yield curve as a whole. As an illustrative application on the policy side, the ECB might want to know what happens to the yield curve as a whole when $y_{t+h}(\tau_i)$ is limited to $y^*_{t+h}(\tau_i)$. On the other hand, we turn to an institution, e.g. a pension fund. The pension fund manager wishes to analyze his liabilities in a wide variety of economic scenarios. In an ALM context this implies that the manager values the fund's liabilities for changes in the yield curves (usually up to ten or fifteen years ahead). From this perspective computations become substantially more efficient, since the manager does not have to simulate a wide range of yield curves as a whole, but only has to simulate different scenarios for his conjecture of $y_{t+h}(\tau_i)$, say $y^*_{t+h}(\tau_i)$, i.e. a single point. The rest of the yield curve is then generated by model (8.3). We have found that if the shape of the yield curve does not change drastically during the forecasting procedure, then with a decent conjecture of future yields our model produces superior forecasts to the iDNS and cDNS.
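10 Appendix
10.1 Distribution of forecasted factors at time t+h given factors at time t
Here we present the proof for the conditional distribution of forecasted factors at time $t + h$ given factors at time $t$, presented in Section 7.2.
PROOF
Assume (6.3) and (6.11) and let $h \in \mathbb{N}$. Repeated backward substitution leads to
$$\beta_{t+h-1} = \Phi^{h-1}\beta_t + \left(I + \Phi + \Phi^2 + \cdots + \Phi^{h-2}\right)(I-\Phi)\mu + \eta_{t+h-1} + \Phi\eta_{t+h-2} + \Phi^2\eta_{t+h-3} + \cdots + \Phi^{h-2}\eta_{t+1}$$
$$= \Phi^{h-1}\beta_t + \sum_{k=0}^{h-2}\Phi^k(I-\Phi)\mu + \sum_{k=0}^{h-2}\Phi^k\eta_{t+h-k-1}$$
$$= \Phi^{h-1}\beta_t + \sum_{k=0}^{h-2}\Phi^k\left((I-\Phi)\mu + \eta_{t+h-k-1}\right). \tag{10.1}$$
Applying (6.3) once more, i.e. replacing $h - 1$ by $h$ in (10.1), and using that the shocks $\eta_{t+1}, \dots, \eta_{t+h}$ have mean zero, variance $\Sigma_\eta$ and are uncorrelated with each other and with $\beta_t$, conditioning on the factors at time $t$ gives
$$E\left[\beta_{t+h} \mid \beta_t\right] = \Phi^h\beta_t + \sum_{k=0}^{h-1}\Phi^k(I-\Phi)\mu \tag{10.2}$$
and
$$V\left[\beta_{t+h} \mid \beta_t\right] = \sum_{k=0}^{h-1}\Phi^k\,\Sigma_\eta\,(\Phi^k)^T.$$
Hence,
$$\beta_{t+h} \mid \beta_t \sim N\left(\Phi^h\beta_t + \sum_{k=0}^{h-1}\Phi^k(I-\Phi)\mu,\;\; \sum_{k=0}^{h-1}\Phi^k\,\Sigma_\eta\,(\Phi^k)^T\right). \tag{10.3}$$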
11 References
[1] M. Bilger, W. G. Manning, et al. Measuring overfitting and mispecification in nonlinear models. Tech. rep. HEDG, c/o Department of Economics, University of York, 2011.
[2] D. J. Bolder. "Affine term-structure models: Theory and implementation". In: Available at SSRN 1082826 (2001).
[3] R.-R. Chen and L. Scott. "Pricing interest rate options in a two-factor Cox-Ingersoll-Ross model of the term structure". In: Review of Financial Studies 5.4 (1992), pp. 613-636.
[4] J. H. Christensen, F. X. Diebold, and G. D. Rudebusch. "The affine arbitrage-free class of Nelson-Siegel term structure models". In: Journal of Econometrics 164.1 (2011), pp. 4-20.
[5] J. M. Christensen. "Nelson Siegel and Affine Processes". In: (2012).
[6] J. C. Cox, J. E. Ingersoll Jr, and S. A. Ross. "A theory of the term structure of interest rates". In: Econometrica: Journal of the Econometric Society (1985), pp. 385-407.
[7] P. F. Cwik. "An Investigation of Inverted Yield Curves and Economic Downturns". In: Graduate Faculty of Auburn University (2004).
[8] Q. Dai and K. J. Singleton. "Expectation puzzles, time-varying risk premia, and affine models of the term structure". In: Journal of financial Economics 63.3 (2002), pp. 415-441.
[9] J. B. DeLong and K. Magin. A short note on the size of the dot-com bubble. Tech. rep. National Bureau of Economic Research, 2006.
[10] F. X. Diebold and C. Li. "Forecasting the term structure of government bond yields". In: Journal of econometrics 130.2 (2006), pp. 337-364.
[11] F. X. Diebold, M. Piazzesi, and G. Rudebusch. Modeling bond yields in finance and macroeconomics. Tech. rep. National Bureau of Economic Research, 2005.
[12] F. X. Diebold, G. D. Rudebusch, and S. B. Aruoba. "The macroeconomy and the yield curve: a dynamic latent factor approach". In: Journal of econometrics 131.1 (2006), pp. 309-338.
[13] R. DiFazio et al. Approximate cholesky decomposition-based block linear equalizer. US Patent App. 11/427,217. June 2006.
[14] G. R. Duffee. "Term premia and interest rate forecasts in affine models". In: The Journal of Finance 57.1 (2002), pp. 405-443.
[15] A. Field. Discovering statistics using SPSS. Sage publications, 2009.
[16] D. Filipović. "A note on the Nelson-Siegel family". In: Mathematical finance 9.4 (1999), pp. 349-359.
[17] F. Gao and L. Han. "Implementing the Nelder-Mead simplex algorithm with adaptive parameters". In: Computational Optimization and Applications 51.1 (2012), pp. 259-277.
[18] K. Gerardi et al. "Making sense of the subprime crisis". In: Brookings Papers on Economic Activity 2008.2 (2008), pp. 69-159.
[19] M. Gilli, S. Große, and E. Schumann. "Calibrating the nelson-siegel-svensson model". In: Available at SSRN 1676747 (2010).
[20] M. Govetto and T. Walcher. "Analysis and interpretation of the US monetary policy during the dot. com bubble and the subprime crisis". In: Diss. Copenhagen Business School (2009).
[21] J. D. Hamilton. Time series analysis. Vol. 2. Princeton university press Princeton, 1994.
[22] J. Hull and A. White. "Pricing interest-rate-derivative securities". In: Review of financial studies 3.4 (1990), pp. 573-592.
[23] J. H. Jung. Cholesky decomposition and linear programming on a GPU. Tech. rep. 2006.
[24] S. J. Koopman, M. I. Mallee, and M. Van der Wel. "Analyzing the term structure of interest rates using the dynamic Nelson-Siegel model with time-varying parameters". In: Journal of Business & Economic Statistics 28.3 (2010), pp. 329-343.
[25] A. Krishnamoorthy and D. Menon. "Matrix inversion using Cholesky decomposition". In: arXiv preprint arXiv:1111.4144 (2011).
[26] J. Levant and J. Ma. "A Dynamic Nelson-Siegel Yield Curve Model with Markov Switching". In: (2013).
[27] C. R. Nelson and A. F. Siegel. "Parsimonious modeling of yield curves". In: Journal of business (1987), pp. 473-489.
[28] S. Roweis and Z. Ghahramani. "A unifying review of linear Gaussian models". In: Neural computation 11.2 (1999), pp. 305-345.
[29] G. W. Schwert. "Tests for unit roots: A Monte Carlo investigation". In: Journal of Business & Economic Statistics 20.1 (2002), pp. 5-17.
[30] F. Smets, K. Tsatsaronis, et al. Why does the yield curve predict economic activity?: Dissecting the evidence for Germany and the United States. Citeseer, 1997.
[31] J. Stark. Monetary Policy before, during and after the financial crisis. https://www.ecb.europa.eu/press/key/date/2009/html/sp091109.en.html. 2009.
[32] J. H. Stock and M. W. Watson. Phillips curve inflation forecasts. Tech. rep. National Bureau of Economic Research, 2008.
[33] O. Vasicek. "An equilibrium characterization of the term structure". In: Journal of financial economics 5.2 (1977), pp. 177-188.
[34] G. Welch and G. Bishop. "An introduction to the kalman filter". In: Proceedings of the Siggraph Course, Los Angeles (2001).
[35] W.-C. Yu and E. Zivot. "Forecasting the term structures of Treasury and corporate yields using dynamic Nelson-Siegel models". In: International Journal of Forecasting 27.2 (2011), pp. 579-591.