List of Figures
List of Tables
List of Abbreviations
1 Introduction and Overview
2 Nonparametric regression
3 Nonparametric quantile estimation based on surrogate models
3.1 Introduction to Surrogate Models
3.2 Order Statistics
3.2.1 Asymptotic distribution of a central order statistic
3.3 A general error bound
3.3.1 Theorem
4 Neural Networks
4.1 Introduction
4.2 Biological neural networks
4.3 Historical Background
4.3.1 McCulloch-Pitts Model
4.3.2 Perceptron
4.4 Elements of an artificial neural network
4.5 Definitions
4.5.1 Sigmoid function
4.5.2 Squashing function
4.5.3 Artificial neuron
4.5.4 Feedforward neural network with hidden layers
4.5.5 A recursive definition of multilayered feedforward neural networks
4.6 Approximation characteristics of neural networks
4.6.1 Idea
4.6.2 Lemma 3 (An approximation result)
4.6.3 Lemma 4
5 Implementation
5.1 The Quantile estimates
5.2 Test Settings
5.3 Application on Simulated Data
5.4 Backpropagation algorithm
5.4.1 Gradient descent method
5.4.2 Phases of the Backpropagation algorithm
5.4.3 Training and Test Phase
5.4.4 Implementation
5.4.5 Initialisation of the weights
5.4.6 Structure of the Parameters
5.4.7 Partial Derivatives
5.5 Comparison
5.6 Discussion
6 Conclusion and Outlook
Order Statistic Estimate
Backpropagation Algorithm
Monte Carlo Estimate
Application on Simulated Data
1 Introduction and Overview
The estimation of a specific quantile of a data population is a widely used statistical task and an informative way to characterise the relationship among variables. It can be classified as nonparametric regression, where it is one of the standard tasks. The most commonly selected levels for estimation are the first, second and third quartile (25, 50 and 75 percent). The quantile level is denoted by alpha. A 25 percent quantile, for example, has 25 percent of the data distribution below it and 75 percent of the data distribution above it. Sometimes the tail regions of a population characteristic are of interest rather than the core of the distribution. Quantile estimation is applied in many different contexts - financial economics, survival analysis and environmental modelling are only a few of them.
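The empirical alpha-quantile can be read off directly from the sorted sample, i.e. from its order statistics. A minimal Python sketch (the implementation in this thesis uses MATLAB; the standard normal sample here is purely hypothetical):

```python
import numpy as np

# Hypothetical sample: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)

# The empirical alpha-quantile is the value below which a fraction
# alpha of the sample lies; for alpha = 0.25, 0.50, 0.75 these are
# the first, second and third quartile.
for alpha in (0.25, 0.50, 0.75):
    q = np.quantile(sample, alpha)
    below = np.mean(sample < q)
    print(f"alpha={alpha:.2f}: quantile={q:+.3f}, fraction below={below:.3f}")
```

For alpha = 0.25 roughly a quarter of the sample falls below the reported value, matching the description above.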
This thesis deals with the development of an alpha-quantile estimate based on a surrogate model using artificial neural networks. An estimate based on artificial neural networks is considered a nonparametric approach.
In reference  the following examples of nonparametric quantile estimation can be found:
• A device manufacturer may wish to know what the 10% and 90% quantiles are for some feature of the production process, so as to tailor the process to cover 80% of the devices produced.
• For risk management and regulatory reporting purposes, a bank may need to estimate a lower bound on the changes in the value of its portfolio, which will hold with high probability.
• A pediatrician requires a growth chart for children given their age and perhaps even medical background, to help determine whether medical interventions are required, e.g. while monitoring the progress of a premature infant.
These examples show the wide range of fields in which quantile estimation is needed.
In this thesis a function m is considered which is to be estimated, where it is assumed that the function is costly to evaluate. That is why an estimate is required. In this context m could be the deterministic outcome of costly computations for scientific problems. For a random variable X, the interest lies in information about m(X), which is also called a complex system. Ideally one would like to obtain sufficiently many evaluations of m, but this stands in contrast to their cost. The solution to this costly, time-consuming and uneconomic situation is the construction of a surrogate model. The advantage is that a surrogate model can be constructed from a small sample. The aim is to obtain a good approximation mn of m which is faster and cheaper to evaluate. Then mn can be used to generate sufficiently large samples. In our context the error introduced by using a surrogate model is acceptable, since the advantages of the larger sample size outweigh it.
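The workflow just described - few costly evaluations of m, a cheap surrogate mn fitted to them, then a large Monte Carlo sample drawn through mn - can be sketched in a few lines. The following Python example is illustrative only: the thesis uses MATLAB and a neural-network surrogate, while the function m and the polynomial surrogate below are stand-in assumptions.

```python
import numpy as np

# Hypothetical "costly" function m; in the thesis, m is the outcome of
# expensive scientific computations. Here it is cheap so that the
# surrogate can be checked against the truth.
def m(x):
    return np.sin(2 * np.pi * x) + x ** 2

rng = np.random.default_rng(1)

# Step 1: a small number of costly evaluations of m.
x_train = rng.uniform(0.0, 1.0, size=50)
y_train = m(x_train)

# Step 2: fit a cheap surrogate mn; a polynomial least-squares fit
# stands in for the neural-network surrogate used in the thesis.
m_n = np.poly1d(np.polyfit(x_train, y_train, deg=7))

# Step 3: use the cheap surrogate to generate a large Monte Carlo
# sample of m_n(X) and read off its empirical 90 percent quantile.
x_big = rng.uniform(0.0, 1.0, size=100_000)
q_surrogate = np.quantile(m_n(x_big), 0.90)
q_true = np.quantile(m(x_big), 0.90)  # only available because m is cheap here
print(f"surrogate quantile: {q_surrogate:.4f}, true quantile: {q_true:.4f}")
```

With a surrogate that approximates m well, the quantile of m_n(X) is close to the quantile of m(X), which is exactly the situation the error bound in chapter (3) quantifies.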
To give a brief overview of the chapters: First, chapter (2) clarifies what nonparametric regression means. After that, simulation models and surrogate models are described and defined in chapter (3). Furthermore, the order statistic estimate and the Monte Carlo estimate are introduced and distinguished. Finally, a general error bound is given and proved.
Chapter (4) provides the background required for artificial neural networks. The biological model is explained, the historical background is presented, and the elements of artificial neural networks follow. Artificial neural networks allow the estimation of possibly nonlinear models. All of this provides the basis for stating and proving an approximation result.
The implementation of an order statistic estimate and a Monte Carlo surrogate estimate based on neural networks is described in chapter (5). The finite sample size behaviour of these estimates is examined by applying them to simulated data. The errors of the estimates, together with their medians and interquartile ranges, are compared in boxplots, and the results are discussed.
Finally, a conclusion and a possible outlook are given.
The MATLAB code for the implementation, along with some definitions that are useful at various points, is listed in the appendix. Since chapter (5) mainly deals with the results, the procedure and the workings of the MATLAB functions are explained in the appendix alongside the code.