1.2 Objective and structure
2 Basic principles and state of the art
2.1 Genetic Programming
2.1.1 Program Structure
2.1.2 Initialization of the GP Population
2.1.3 The Genetic Operators
2.1.4 Fitness Function
2.1.6 Process of the algorithm
2.1.7 Crossover, building blocks and schemata
2.1.8 Approaches against macromutation
2.1.10 Further approaches for improvement
2.2 Artificial Neural Networks
2.2.1 Components of neural networks
2.2.2 Network topologies
2.2.3 Learning methods
2.3 Trading Systems
2.3.1 Tape Reader
2.3.2 Market timing
2.3.3 Position sizing
2.3.4 Comparison of trading systems
2.3.5 Fundamental versus technical analysis .
2.3.6 The Currency Market
2.3.7 Approaches for the development of trading systems .
3.2 Requirements on the software
3.3 Conception of software
3.3.1 The Evolutionary Algorithm
3.3.2 The fitness function
4.1 Components of the developed software
4.2 Classes of the exchange rate data server . .
4.3 Classes of the Evolutionary Algorithm
4.4 Overview over the framework ECJ
4.5 Problems during experiments
5 Experiment results
5.1 Results with node weights
5.1.1 Results of the training time period .
5.1.2 Results of the validation time period .
5.1.3 Results of the test time period
5.1.4 Results as monthly turnovers
5.1.5 Created trading rules
5.2 Results without node weights
5.2.1 Results of the training period
5.2.2 Results of the validation time periods
5.2.3 Results of the test period
5.2.4 Results as monthly returns
5.2.5 Created trading rules
5.3 Identification and application of optimal f
6 Discussion and evaluation
List of figures
Chapter 1 Introduction
The natural evolution has turned out to be a most successful mechanism for the engenderment and adaptation of creatures to the environment. Without receiving any particular instructions or even precise objective definitions, it has succeeded in finding sophisticated solutions for problems existing in the real world.
Genetic Programming (GP) is an approach for using the creative power within the natural evolution for the automatic development of computer programs (cf. (Koz92, Chapter 1-6)). It is used to try to simulate mechanisms of the natural evolution in order to generate automatic programs solving a given problem. In a series of applications, GP has been used for solving mathemat- ical problems as well as for solving real-world problems successfully. Among them are counted such problems as symbolic regression, (cf. (Koz92, cf. Chap- ter 10)), classification (cf. (Koz92, Chapter 17)), the synthesis of artificial neural networks (cf. (Gru94, Chapter 2 following)), pattern recognition ((Tac93, pages 2 to 10)), robot control (cf. (BNO97, pages 2 to 10)) and the generation of im- ages (cf. (GH97, pages 2 to 7)) are counted among these problems. Automated learning by means of GP can be interpreted as heuristic search al- gorithm finding out of the set of all possible programs those offering the best solution for the given problem. Dependent on the given problem, the search range is very large and oftentimes neither continuous nor differentiable and thus the search range of all possible programs is ill-fitting for classical search algorithms (cf. (LP02, page 2 following)).
In this thesis, the GP is applied in the scope of the generation of trading sys- tems for the financial market, especially for the currency market. In the finan- cial markets, successful speculative dealers use to act on the basis of a certain set of rules. However, these rules are subject to a relatively broad interpreta- tion. After a careful review it becomes evident that, in decisive situations, the dealers deviate from the regulations they believe to determine their action and act according to their "gut feeling" as it were. It is possible that this portion of intuitive action distinguishes an experienced and profitable dealer from an in- experienced and unprofitable dealer, even if both of them believe that they are working on the basis of the same set of regulations. The definition of a trading system by a human being is subject to some difficulties, for he cannot repro- duce all of the rules unequivocally. This is why the transfer of action defining sets of rules to the computer has turned out to be unsuccessful. There is another approach to have the computer learn the rules for action auto- matically. To do so, e.g. Artificial Neural Networks (NN) are applied success- fully (cf. (Ska01, pages 2 to 5), (MK, pages 2 to 7)). However, it is not possible to extract the several rules from the network in a way that allows an easy inter- pretation without difficulties. Users criticize this "black box" property of NN. GP is proposing itself as alternative to NN, for it can generate rules directly and they can be interpreted in a better way despite a certain complexity (cf. (YCK05, page 23 following)). As far as the solution of difficult problems is concerned, both approaches are comparable (cf. (BB98, page 13)).
1.2 Objective and structure
It is the objective of this thesis to apply GP, in order to generate trading systems and to analyze their profitability in the framework of a historical simulation. A software system, that solves this task, is designed and interesting implementation aspects are presented.
To be able to develop trading systems with GP, the software, that is to be devel- oped, has to meet a number of requirements. The development of the trading systems is to be made on the basis of historical time series of exchange rates and prices. The market is supposed to change in the course of time and so trading systems, that have been profitable before, are going to lose their prof- itability. For that reason, the development system of the trading system has to be conceived in a way to allow the generation of new trading systems adapted to the changed market conditions. To guarantee the provision of current price data, the relevant market data have to be collected continuously and provided to the development system. The development of profitable trading systems is supported by preprocessed data of the price data, that are securities dealers known and used, are provided to the system. To enable a visual verification, the preprocessed data as well as the transactions of the trading systems should be represented graphically. An over-optimization at the development of the trading systems is to be avoided by subdividing the available price history in a training, validation and test period. In order to get a trading system for the test period, the best trading systems of the training period are applied to the validation period. The best trading system during the validation is chosen for the trade in the test period. It should be possible to reproduce the development process and it should be transparent by means of log-files of the intermediate results. Trading systems of a too high complexity make users sceptical, as it is difficult to retrace the decisions of the system. Even if more complex trading systems would provide a higher return, a certain degree of traceability of the trading systems is to be targeted. In addition to the limitation of the size of the trading systems, the standard GP is expanded by means of node weights. This is how one tries, on the one hand, to reduce the macromutation by means of the crossover operator and, on the other hand, to simplify the capability of interpretation of the generated trading systems (cf. 3.3.1 on page 41).
Structure of this thesis To begin with, the basic principles and the state of the art of the Genetic Programming and the Artificial Neural Networks will be presented in Chapter 2.1 on page 5. The design and the function of NN will be analyzed, for this approach is widely spread in the context of the development of trading systems. After that the basic principles of technical trading systems will be presented from Chapter 2.3 on page 18 onwards. On the one hand, the technical analysis is presented as an instrument for finding favourable mo- ments for trading, on the other hand, an approach is shown for defining the optimum trade position size. About both of these presented approaches for the automated learning, different authors have reported successful applica- tions. Some of these successful applications are described in Chapter 2.3.7 on page 33.
From Chapter 3 on page 37 onwards, one will find the design of a system for generating, optimizing and testing trading systems by means of GP on the basis of historical price and exchange rate data. After an overview of the system, the requirements will be presented and a concept for the software is developed. The properties of the evolutionary algorithm relevant for the Genetic Programming will be defined. The implementation details are described from Chapter 4 on page 47 onwards. The results of the experiments made with the system are shown from Chapter 5 on page 55 onwards.
After that there is a discussion and evaluation of the results as well as an out- look on possible further developments of the system in Chapter 6 on page 73.
Chapter 2 Basic principles and state of the art
In this Chapter, the basic principles of Evolutionary Algorithms and artificial neural networks are presented and the current state of the art is described. After that the relevant principles of the application area, the technical trading systems, are discussed.
2.1 Genetic Programming
Genetic Programming (GP) is an algorithm out of the family of the evolution- ary algorithms. In the nature, the evolution has turned out to be a very suc- cessful system for the further development and optimization of all creatures. Evolutionary algorithms are simulating the essential and successful features of the natural evolutionary process by means of simple models. In this way, they make it possible to find good solutions with little effort, even if there are prob- lems with a great search range. Data structures and algorithms are generated and optimized by the Evolutionary Algorithms, in order to solve given prob- lems. This introduction follows the studies of Banzhaf et al. (BNKF98, Chapter 1-8) as well as Koza, who has described GP in (Koz92) for the first time. In the following passages there is a brief description of how a program is structured in the GP and how the evolution works. After that existing problems of the procedure as well as the approaches to a solution for them are discussed.
2.1.1 Program Structure
The individuals evolutionarily developed during the GP are programs. These programs are made up of functions and terminals (cf. (BNKF98, pages 109 to 118)).
The selection of the used functions within a genetic program is dependent on the specific application area. An essential requirement for the selection of the functions, provided to the genetic program, is that it should be possible to make up a solution for the application problem out of them. To avoid an unnecessary enlargement of the search range, however, there should not be provided too many functions. According to Banzhaf et al. in (BNKF98, page 111 following) as well, it does not make sense to develop functions that are specially-tailored to the given problem from the very outset of an application. Since GP is very creative as far as the combination of functions is concerned, it may be sufficient to provide simple functions such as the Boolean and arithmetical functions to achieve amazing results.
The arguments of the applied functions are terminals. On the one hand, they are made up of the input data used by the system for training and, on the other hand, of constants changed in the course of the evolution. These constants are called "ephemeral random constants" (ERC) (cf. (Koz92, page 242 following)). The closure of the functions with respect to the terminals is an essential re- quirement for the error-free function of the generated programs. The applied functions must be designed in a way that enables them to process all types of input values: for example, the division often is adjusted in order to keep the program from aborting in case there are divisions by zero.
To determine the sequence of the evaluation of the program functions, the functions and terminals of a program are stored in a corresponding data struc- ture. Most often, tree structures are used to do so. However, also linear struc- tures as well as general graphs are used as organization forms. In the case of tree structures, the usual sequence of the evaluation is from left to right: the furthermost left node furthermost within the tree structure, for which all entries are available, is executed. The most prominent representa- tive of a linear structuring of the functions and terminals is the simulation of a register machine. It disposes of several registers, a linear memory for the al- location of terminals to the registers as well as functions that use the registers for the input/output.
A relatively new variant of the structuring are directional graphs that may include cycles (cf. (BNKF98, page 116 following)). What makes this structure interesting is the fact that loops and recursion are resulting in the course of the evolution and do not have to be provided by special functions beforehand. However, superfluous cycles could become problematic for they are extending the program run unnecessarily.
2.1.2 Initialization of the GP Population
Since the tree structure is the most wide-spread data structure for individuals and is used for the application example as well, only this data structure will be considered in the following.
The methods, introduced by Koza in (Koz92, pages 91 to 94), for the initial- ization of the specific individuals of the population are called "Full", respec- tively "Growth". For the whole population, the maximum size of a single pro- gram is determined by the maximum depth of the specific trees. In the case of the "Full" method for the initialization of an individual, a tree with maximum depth is created, for that all nodes are selected from the set of functions at ran- dom, with the exception being the nodes of maximum depth, that are selected from the set of the terminals at random. At the "Growth" method, the tree is built up from the root, with selecting for every node an element from function, respectively from terminal, at random. If a terminal is selected, the structuring of this branch is finished and the procedure continues at the last node of the branch, that is no terminal. For the nodes of the maximum depth of the tree, the stochastic selection is limited to the terminals.
To achieve the highest possible variety within the population, both structur- ing methods described here are applied in combination. This method is called "Ramped Half-and-Half". At a given maximum depth of N, the population is divided by equal parts into trees with maximum depths of 2,3...N. Each one of these groups is initialized half and half according to the "Growth" and to the "Full" method.
2.1.3 The Genetic Operators
The genetic operators are tools that are available to the algorithm to improve the fitness of the population in the course of time. The most frequently used operators are crossover, mutation and reproduction. The easiest operator is the reproduction, that is selecting an individual with respect to the fitness and is transmitting an identical copy into the next generation. The crossover operator is combining the genetic material of two individuals by exchanging subtrees, thus generating two new individuals. Doing so, there is a higher probability for functions to be selected rather than terminals. Even the generation of just one single descendant is quite usual (cf. (BNKF98, page 241)).
Mutation is applied to a single individual. Quite often this is made subse- quent to the crossover operator. There is only a small probability that a node is selected at random from the individual and the subtree is replaced by new, stochastic nodes, as it has been done at the initialization. In addition to that, there is also a series of other genetic operators of wide-spread use. For exam- ple, Banzhaf et al. (BNKF98, page 241) are listing the following operators for mutation:
- Point Mutation: A single node is exchanged by a stochastic node of the same class.
- Permutation: Die Positionen von Argumenten einer Funktion werden vertauscht.
- Hoist: A new individual is generated from a subtree.
- Expansion Mutation: A terminal is exchanged by a random subtree.
- Collapse Subtree Mutation: a subtree is exchanged by a stochastic termi- nal.
- Subtree Mutation: A subtree is exchanged by a stochastic subtree.
- Gene Duplication: A subtree is exchanged by a stochastic terminal.
As alternatives for the standard crossover operator, Banzhaf is listing the following operators:
- Subtree Exchange Crossover: exchange of two subtrees among individ- uals
- Self Crossover: exchange of subtrees within one individual
- Module Crossover: exchange of modules among individuals
- Context-Preserving Crossover: Exchange of two subtrees of two individ- uals, if their coordinates are corresponding or if at least they are similar.
2.1.4 Fitness Function
Corresponding to the selection taking place in the natural evolution, there is a selection process in the GP on the basis of a fitness value that is allocated to every specific individual of the population. The fitness value of an individ- ual is determined by the evaluation of the individual by means of the fitness function. To do so, in most cases the genetic program of the individual is executed with entries resulting from a training data set and a fitness value is determined on the basis of the output of the program. In order to establish a se- lection pressure similar to the natural evolution for getting better solutions for the given problem, it is determined by means of a selection procedure based on the fitness of the individuals which individuals are reproduced in the next generation. An essential requirement both for the problem to be solved and for the design of the fitness function is the continuity of the fitness function (cf. (BNKF98, page 127)). This means that the improvement of an individual with respect to the problem should correspond to an improvement of the fit- ness value. The continuity of the fitness function is an essential requirement for the iteration of the individuals’ improvement. A very frequent fitness func- tion is the error-fitness-function. It can be applied if an optimum, that is to be reached, is known and the still existing error of the achieved solution for this optimum can be determined. An example for this is the symbolic regression. Hereby a polynomial is given and the GP system shall approximate this func- tion as exactly as possible by combining the provided functions (cf. (Koz92, Chapter 10)). As the deviation from the given polynomial decreases, the fit- ness value of the individual increases.
The selection determines which individuals of a population remain, respec- tively will reproduce and thus will conserve or even proliferate their genetic material. The selection pressure can be controlled with respect to the type and parameter setting of the selection algorithm. A high selection pressure is distributing the properties of the superior individuals within the population rapidly. This may lead to the result that the Evolutionary Algorithm will only find a suboptimal solution, because the existing solution dominates the pop- ulation and better properties will not be able to prevail. A selection pressure that is too small is unnecessarily extending the running of the algorithm. In this case it may occur that good properties are unable to spread within the population and that they are destructed by the genetic operators again.
The fitness-proportional selection For a long time, the fitness-proportional selection has been the most frequently used method for the selection in the field of genetic algorithms, after its introduction by Holland (cf. (Hol75)). With this method, an individual of the population is selected at random. The prob- ability is determined by the relation of the fitness of the individual to the sum of the fitness of all the individuals. Blickle et al. conclude that the fitness- proportional selection is unfit for the following reasons (cf. (BT95, pages 40 to 42)):
- The reproduction rate is proportional to the fitness of an individual. If the fitness values are very close to each other, there is, as it were, just a stochastic selection.
- There is no translation invariance: while the fitness values one, respectively ten, still mean a great difference in the selection probability, this difference disappears for the most part, if both fitness values are increased by relatively high value.
- Despite a high variance at the beginning of the optimization, the selection intensity is too small, sometimes even negative, a fact that may lead to a deterioration of the average fitness of the population.
Truncation Selection The truncation selection, that is also known as (µ, σ)- selection (cf. (Sch95, page 158 following)), uses µ individuals as parents to generate σ new individuals of which the µ individuals serve as parents in the next generation. The absolute fitness values do not play a role at this selection, but the sequence of the individuals on the basis of the fitness values.
Selection by Rank Selection by rank is also based on the sequence of the individuals that is defined on the basis of the fitness values. There is a distinction made between linear and exponential ranking. At the linear ranking, the probability for an individual to be selected is depending on its rank within the population linearly way. At the exponential ranking, the probability of an individual to be selected is increasing exponentially.
Tournament Selection At the tournament selection, a group of individuals, selected at random out of the population, is competing. The size of this group is called the tournament size. The individual with the greatest fitness value of the chosen group is selected and used for reproduction. The descendant is replacing the inferior individuals of the tournament group in the population. The tournament size determines the selection pressure, with a tournament size of two corresponding to a minor selection pressure. In the GP, a tournament size of seven has established itself as default size.
2.1.6 Process of the algorithm
After the presentation of the several components of the evolutionary algo- rithm, it follows a description of the progress of the algorithm. There is a distinction made between the generational GP and the steady state GP. In the generational GP, at a certain moment a population is creating a pop- ulation that is transmitted to population of the next generation by the genetic operators:
1. Initialization of the population
2. Evaluation of the population’s individuals and allocation of a fitness value for every individual
3. Application of selection and genetic operators, until the population of the next generation is complete
4. Unless the abort criterium of the algorithm is met, continue with item 2
In the steady state algorithm, there is no distinction made between genera- tions. Out of a randomly selected group of individuals of the population the best ones are chosen by means of tournament selection. The genetic operators are applied to them and the resulting descendants are replacing the losers of the tournament:
1. Initialization of the population
2. Stochastic selection of a group of individuals from the population for the tournament selection
3. Evaluation of the fitness of the selected individuals
4. Application of the genetic operators to the winners of the tournament
5. Replacement of the losers of the tournament by the generated descen- dants of the winners
6. Unless the abort criterium of the algorithm is met, continue with item 2
With comparing generational GP and steady state GP on the basis of a sorting problem, Kinnear concludes that steady state GP is achieving better results (cf. (Kin93a, page 6 following))).
2.1.7 Crossover, building blocks and schemata
Crossover in its various characteristics is the most frequently applied genetic operator in the GP. The probability for the choice of the crossover operator for the generation of the descendants of an individual usually lies in the range of 90% (cf. (Koz92, S. pages 114 to 116)). The strong application of crossover in the GP is supported by the natural evolution in the framework of the biolog- ical sexual reproduction. As a consequence, mutation occurs in the course of the biological reproduction, but only at a small degree. This fact is reflected by the GP by a small probability for mutation (cf. (BNKF98, Chapter 2). The crossover operator is regarded as the reason why GP is more effectively spread than other procedures that are based on mere random changes of the solution candidates. According to Koza’s description in (Koz92, pages 116 to 119), the population of a GP includes "Building Blocks". Good building blocks improve the fitness of the individuals they include. For that reason there is a higher probability that these individuals are selected for the reproduction. This is how a good building block is succeeds in spreading within the popu- lation. With this argumentation, Koza follows the argumentation of Holland, who put up the building block hypothesis for genetic algorithms for the first time (cf. (Hol75)). Because of this spreading of good building blocks it is pos- sible for the GP to create good problem solutions more rapidly than it would be possible for mere mutation-based procedure. Koza demonstrates in (Koz92, Chapter 9), that, with the exception of very simple problems, GP is superior to the random search. Goldberg tries to explain the function of genetic algorithms by means of the building block hypothesis. According to this hypothesis, the crossover operator is combining small good building blocks to form bigger and better building blocks, in order to finally create almost optimal individuals (cf. (Gol89, page 41)). For example, in (Gre93, page 3 following)) Grefenstette criticizes the building block hypothesis as misleading, for it would describe the actual process during the evolutionary development not sufficiently and it could easily happen that wrong conclusions are made.
Many authors give the reason for the better function of GP as compared to a mere random search by means of schemata. Schemata are templates that rep- resent a complete group of similar code sections. A scheme theorem describes assumptions of how these schemata develop from generation to generation, while crossover, mutation and selection are influencing them. The presum- ably most well-known scheme theorem was put up by Holland for genetic algorithms (cf. (Hol75, pages 66 to 88)). In (Koz92, pages 116 to 119), Koza made the first attempt to transfer this scheme theorem to GP. As Langdon de- scribes in (LP02, page 27)), the scheme theorem has been criticized by many authors in the meantime and some of them even classify it as completely use- less. However, Langdon supposes that the problem is the over-interpretation of the scheme theorem and not the theorem itself. There are a number of ap- proaches for the transfer of the scheme theorem to GP. For example, there is an overview and a detailed discussion of the different approaches in (LP02, Chapter 3-6). In (LP02, Chapter 6), Langdon introduces the "macroscopic ex- act schema theorem" and shows that standard crossover schemata of higher rank are made up of schemata of lower rank. However, he admits that the notion "building block" may cause misunderstandings for the GP in as far as it suggests that building blocks are assembled step by step to form better and bigger blocks. According to Langdon, his schema theorem suggests that the selection process of the applied schemata can not be reproduced and is occur- ring randomly. Moreover, there is not necessarily a selection of schemata that are particularly short or fit over average.
2.1.8 Approaches against macromutation
An essential problem of the crossover operator as compared to its model in the biological reproduction is the fact that, on the one hand, it is creating good combinations, but, on the other hand, it is working as macromutation and is destroying created building blocks. In the biological reproduction, much en- ergy is used for protecting already created good building blocks against the harmful influences of the crossover. Biological crossover is homologous, i. e. the crossover is happening almost exclusively on the gene level. It hardly occurs that two genes with completely different purposes are exchanged (cf. (BNKF98, page 156 following)).
There is a number of approaches that try to introduce protection mechanisms against the macromutation in the GP. A naturally occurring phenomenon is the attachment of effect-free code to the building blocks in the course of the simulation. Hereby there is a decreasing probability for a destruction of the building blocks. This process can be observed at the natural model as well. More than 90% of a genome of higher creatures is made up of so-called introns that neither encode genes nor contain control sequences. While the EA is in process, it can be observed that the quantity of effect-free code is increasing exponentially (BNKF98, pages 191 to 193). This phenomenon is called "bloat". Besides the protection function by the introns, bloat has the problem that it blocks the development and the execution time of the generated program is unnecessarily extended.
Banzhaf identifies the macromutation of the crossover operator as big difference as compared to the naturally occurring crossover at the biological sexual reproduction. Almost all natural crossovers are successful, whereas in GP about 75% of them would have to be classified as "deadly" according to biological standards (cf. (BNKF98, page 157)). For that reason, the reduction of the macromutation of the crossover operator in the GP is an important field of research. In the following passages, there is the discussion of some of the approaches that have been examined for the improvement of the crossover operator, respectively for avoiding its negative effects.
Brood Recombination In (Tac94, Chapter 5) by means of "brood combina- tion" Tackett proposes a method that actually does not improve the crossover operator, but reduces negative effects. In nature, there are some species that have by large more descendants than the survival of the species would require. Some descendants die soon after the birth because of several mechanisms, for example, while fighting each other for nourishment. Hereby it is guaranteed that the parents invest the energy needed for the upbringing only in the fittest descendants.
Tackett transfers this approach to GP. Instead of procreating only one or two descendants, a whole series of descendants is generated. The fitness values are calculated for this group called "brood" and the best one or two descen- dants are selected. The rest of the brood cancelled. In order to counteract to the increased calculation requirements of this method, Tackett proposes not to make the calculation for the brood on the basis of the complete training data, but only with a small part. He argues that it is legitimate to do so, because the goal is sorting out of the unfit individuals and not actually finding the best individual of the brood (cf. (Tac94, page 87)). Tackett reports that the expected improvement as compared to the GP without brood selection has actually occurred at the executed experiments.
Apart from approaches for avoiding the harmful effects of crossover, there are numerous attempts to make the crossover operator "more intelligent".
Strong Context-Preserving Crossover In (D’h94), D’haeseleer develops the "strong context-preserving crossover" (CPC) operator, that only allows crossover between nodes that are exactly at the same position of two par- ents. D’haeseleer reports to have achieved improvements by mixing the nor- mal crossover and SCPC. The application of SCPC alone would cause prob- lems, because in that case the variety within the population would suffer. D’haeseleer thinks that a certain degree of macromutation is required (cf. (D’h94, page 2)).
Explicit Multiple Gene Systems In (Alt95), Altenberg presents a system where fitness components are influenced by a subset of available genes. Dur- ing the evolution it is tried periodically to add a new gene. However, this gene is integrated only if it leads to a fitness increase. During the evolution of the population, genes are exchanged between the individuals by means of crossover. The "constructing selection" with subsequent exchange of genes by crossover proposed by Altenberg (cf. (Alt95, page 39)) shows a strong sim- ilarity with the homologous biological crossover. Instead of permitting the destruction of the majority of the good building blocks by the macromutation of the crossover, in this model, the crossover is limited to genes consisting of one or several building blocks. Since there is no risk for the building blocks to be destroyed, it is supposed that bloat is no problem at this approach.
Explicitly Defined Introns What belongs to the interesting measures against the macro-manipulation by the crossover operator is the approach "explicitly defined introns" (EDI). An integer value is stored between two nodes of a tree without any effect on calculation of the tree. The crossover operator is modi- fied; so the probability of a crossover incident between two nodes is depending on the the value of the EDI between them. Just as the rest of the individual, the EDIs are developed evolutionarily In the course of the evolution. Hereby building blocks are forming, that are protected against a crossover by their linking EDIs. In (NFB95, page 16), Nordin et al. are making a series of tests concluding in the assumption that EDIs have an important role in finding indi- viduals of high fitness. As far as fitness, generalization and processing time are concerned, EDIs have achieved better results than individuals in tests without EDIs.
GP 1-Point Crossover Operator In (PL98, page 19), Poli and Langdon pro- pose a 1-point crossover operator that considers structural similarities in the parent trees for the selection of an appropriate crossover point. Their experiments indicate point out that hereby the destructive influences of the crossover can be reduced in the course of the evolution.
In many program languages there is the possibility to combine parts of the program and to use these modules at another place. Modularization is used for the abstraction, the division of the problems into subordinate problems and the retrieval of the program code. But in the context of the problem with the macromutation by means of the crossover operator, modularization has an interesting role. Some approaches are using specific crossover operators that consider the modules and thus prevent the macromutation. Different authors have examined modularization in the GP. A choice of different techniques is presented in the following passages.
Encapsulation The idea of encapsulation is described in (Koz92, pages 110 to 112). Hereby an asexual genetic operator is considered for selecting a function node of an individual and replacing it by a terminal node that includes the subtree of the selected function node. Now the generated terminal can also be applied in other individuals. The encapsulated subtree can no longer be improved or destroyed by the crossover operator. In case the replaced subtree contains a useful code, this could be an advantage.
Module Acquisition Angeline and Pollack propose a technique called "mod- ule acquisition" (cf. (AP92, pages 2 to 4)) for generating reusable modules. To do so, a subtree for a determined depth is defined as module in a selected indi- vidual. The parts of the subtree under the module are considered as arguments of the module. Apart from the exclusive application within the individual, the module can be provided to the complete population by means of a function library. As long as one individual is using a module at least, it will remain in the library, otherwise it will be deleted. Exactly as being done at the encap- sulation, a defined module is not developed further at the module acquisition and it is protected against the macromutation.
Automatically Defined Functions In (Koz92, page 534 following), Koza gives a description of "automatically defined functions" and a very detailed one in (Koz94, Chapter 4-16). In order to use ADFs in the GP, the program trees are divided into two parts: a branch selected for the calculation of the fitness and a branch containing the function definitions of the ADFs. Both branches take part in the evolution. Hereby the crossover operator has to con- sider specific characteristics of the branches and the relation between them. The definition of an ADF in the function definition branch of the program tree is made up of three parts: the function name, a list of arguments as well as a branch including the body of the function set. The function name of the ADF becomes a part of the function set of the branch, which is generating results. The variables defined in the argument list become a part of the terminal set of their ADF function body. The evolution exclusively is taking place in the func- tion body of the ADF and in the result-generating branch. Only this branch as well as the function body of the ADFs is generated randomly. Before the evolution can start, the number of the names of the ADF as well as the number and names of the function arguments has to be defined for every ADF.
λ Abstraction In (Yu99a, Chapter 6,7), Yu presents an approach to use λ ab- stractions, based on the λ calculus, in the GP. Hereby every λ abstraction is considered as an independent module within the GP tree and protected as a unit in the course of the evolutionary process. These modules are develop- ing in a way similar to the rest of the GP tree. Crossover is only permitted between modules that have the same number and type of the input and out- put. Yu documents the success of this approach, e. g. in (Yu99b). One reason for this success could be found in the homologous crossover that conserves structure and function, in the context with the λ abstraction.
2.1.10 Further approaches for improvement
Among others, further approaches for the extension of the standard GP are the following ones:
Loops and Recursion Although loops and recursion have an enormous im- portance in the manual programming, their application is up against difficul- ties in the GP. Long loops, especially endless loops, can cause the abort of the evolution of a program. In (Kin93b, page 5), Kinnear describes a possible approach in the framework of the evolutionary development of a sorting al- gorithm: the applied loop can only receive the start and end index of a finitely great list as parameter.
Strongly Typed Genetic Programming The extension of the traditional GP to the strongly typed GP (STGP) is introducing types for terminals and functions into the genetic programming (Mon93, page 7). This extension is particularly useful, because it reduces the search range of the evolutionary algorithm considerably. Functions are allocated to specific input and output types and can only combined with terminals of the appropriate type.
Cultural GP The Cultural GP, described by Spector and Luke (SL96), is using an indexed memory with access for all individuals of the population over all generations. The memory is initialized at the start of the evolutionary devel- opment and is provided to the population as data storage and communication medium between the generations afterwards. The authors come to the con- clusion that the processing effort for the considered problems is reduced as compared to the standard GP by means of the shared memory.
2.2 Artificial Neural Networks
In the context of the decision-making for financial transactions, there is a wide- spread use of artificial neural networks (NN). Their application ranges from trend prognostics systems, as described for example for the gold market in (MK, pages 2 to 7), over point prognostics systems, as proposed for example in (NM02, pages 501 to 511) for the currency market, to the development of automated trading systems, as described in (Ska01, pages 2 to 8). Because of the wide-spread use and the numerous successful applications, the principles of the NN are presented in the following passages in short.
Artificial neural networks are motivated by the natural brains. Hereby it is not the biologically correct simulation of a brain that is in the foreground, but the simulation of the mathematically relevant elements on an abstract level. The neurons, the links between the neurons as well as the method that rules the training of the network are all essential parts that have to be defined for a simu- lation of a NN. These components and frequently applied network topologies are presented in the following. These explanations follow the description of Zell (Zel94, Chapter 5 to 8).
2.2.1 Components of neural networks Neurons The neurons are characterized by:
- the activation status, that indicates the current degree of the activation of an neuron.
- the threshold value, that is included in the activation function for the calculation of the next activation status. In many models it indicates the threshold of the strong activation of a neuron. In biological neurons this is the equivalent threshold that must be reached to make the neuron fire.
- the activation function, that is calculating a new activation status on the basis of the current activation status, the input by other neurons as well as the threshold of the neuron.
- the activation function, that is calculating an output value on the basis of the current activation of the neuron.
Connection Network Due to the abstraction from the natural example, in the NN there is no simulation of the axons, dendrites and synapses found in the human brain, but the connection between pairs of neurons is reduced to one value, the connection weight. On the whole, the connection network often is represented by a matrix of connection weights. The strength of the connection between neurons is characterized by the value of the connection weight be- tween two neurons. Depending on the model, the weight may be in different value ranges. Apart from positive values of the weights, the value 0 stands for a not existing connection and negative values are used for retardant connec- tions that reduce the activation of the following neuron.
2.2. ARTIFICIAL NEURAL NETWORKS 17
During the simulation of a NN, the output values of all neurons that are con- nected to the considered neuron, is provided as input for it with the connec- tion weights. Most often, only the connection weights are changed during the learning phase. The acquired knowledge is distributed to the network of the connection weights and it is not possible without problems to deduce single statements. Some authors, like e. g. Gruau in (Gru94, page 8, page 22), state that even components of the neuron, such as the activation function or the threshold value, should be alterable.
2.2.2 Network topologies
Very often NN are structured by in layers, with an input, and output and at least one intermediate layer. The communication with the environment is made by means of the input and output layer, whereas the information processing is taken place in the internal layers. At the topological organization of NN, the essential point is the distinction between networks with and without feedback. Feedback means that there are neurons having a path through the network that is leading back to the neuron.
2.2.3 Learning methods
After having determined a specific network topology for a given problem, it is typical that the connection weights are initialized with random values. In this initial situation, the NN will find a satisfying solution only in rare cases. Thus it is necessary to change the connection weights for achieving a better solution of the problem. This operation is called training. There are different learning methods for performing the training. In the scientific literature, there are many different approaches for training an NN for the given task. The theoretically changeable components of a NN are the connection weights, threshold value, activation function, output function as well as the adding or deleting of neu- rons. Among the most comprehensive approaches including all these possible changes, there is e. g. the one of Gruau in (Gru94, Chapter 2-3). There geneti- cally developed programs are encoding the organization and the components of the NN. If such a program is applied to an initial neuron, the NN is created by the execution of the instructions of the program by and by. To the orders be- long instructions for the division of a neuron into two neurons, the deletion of neurons, creating and deleting of connections between neurons, the changing of connection weights and some more operations.
Types of Learning Many applications are limited to exclusively training the connection weights of a given network topology. Hereby a distinction is made between different types of learning according to the provided training data:
- During the supervised learning, input and output data are presented to the NN at the same time and the weights are changed by the learning method in a way to enable the network to make the association of these data sets automatically. The goal is the right classification of slightly changed input data, thus achieving a generalization by the NN.
- At the reinforcement learning, only input data are provided to the NN and after the evaluation by the NN, the right or wrong classification by the network is announced.
- The unsupervised learning is characterized by the fact that the learning is made by self-organization. The chosen learning method ensures the allo- cation to one class for similar input patterns that are represented by one or more neurons. At this type of learning, statistical properties of the in- put pattern are extracted and, on this basis, generalizing classes are built. Interesting works from this field are, among others, Learning-Vector- Quantisation (LVQ) or self-organized cards from Kohonen (cf. (Koh89, Chapter 5)) as well as the growing cell structures (Fri92), respectively the growing neural gas (Fri95) from Fritzke.
Hebb’s Learning Rule The principle of many learning rules was described in its first form by Donald O. Hebb in 1949 already and he gave his name to Hebb’s learning rule: if a neuron receives an input by another neuron and if both of them are strongly activated at the same time, then the weight between the these neurons is increasing (cf. (Zel94, page 84)). The size of the weight change is called learning rate. It is either given as constant or is resulting from a function, which considers the fact that at the start of a learning operation a high learning rate is an advantage, however, the size of the changes should decrease, as the end of the training is reached. Hereby it is possible to bring about a convergence of the NN into an optimal status.
Backpropagation Rule A wide-spread learning rule for supervised or sup- ported learning is the backpropagation rule (cf. (Zel94, pages 99 to 114)). It is a generalization of the so-called delta-learning rule that can only be applied to single-layer networks with linear activation functions. Networks with linear functions as activation function can only calculate linear functions. Multi-layer networks are more powerful than single-layer networks. Usually the back- propagation procedure, respectively one of its further developments, is ap- plied to them, for it is able to train networks with semi-linear activation func- tions and several layers. At the backpropagation procedure, after the alloca- tion of input and output data to the network, the prevailing error between the given output and the actual output is determined and sent backwards through the network with the connection weights, with changing the weights of the connections respectively depending on the learning rate in a way to reduce the error.
2.3 Trading Systems
After the description of the principles of neural networks and evolutionary algorithms, the following passages will offer an introduction to the application area. At first there is a discussion of trading systems in general, then there is a view on how mechanical systems are structured in particular. After that, there is a discussion about the question concerning the determination of position sizes of a trading system, followed by a description of different measures for the comparison of trading systems.
In the scientific literature, there are different opinions on the characteristics of a trading system. In the framework of this thesis, a trading system shall be defined as a set of rules that determines the circumstances that have a user of a trading system initiate specific actions on the capital market. If the user of the system has made a decision in favour of one trading system, the decision finding process is taken from him and he just has to execute the proposals of the trading system. If the user himself has to check the rules, then this is called a manual trading system, respectively he is called a discretionary trader. Apart from trading systems that offer more or less concrete proposals for trading, there can be found prognostics systems that try to predict, for example, the stock exchange rate at a certain moment in the future.
Yet the user has to decide himself if and how he is going to act on the basis of this prognostic. The execution of the proposals of the trading system by the human being is subject to errors and takes a lot of time. For that reason, the emergence of computers with higher performance has led to a propagation of the trading by means of automated trading systems, the "mechanical trading systems", that do not require any operations by the user. The user of an auto- mated trading system, the so-called "systematic trader", just has to assume a superior supervising function. According to Vince, the application of a com- pletely automated trading system is even absolutely necessary, because other- wise it is difficult to determine whether the optimal position size is traded (cf. (Vin90, page 41, page 116)). In order to verify whether a mechanical trading system is functioning, a so-called back-testing is made. Thereby historical data are presented to the trading system and an evaluation is made on the basis of the decisions of the trading system to find out whether the real operation is possible.
At the development of a trading system for a considered security, the ques- tion has to be answered, which size the position, that is held of this security, should have at any moment. In principle, the trading system is representing a function that offers, for example, positive values for a corresponding big posi- tion, negative values for a corresponding empty sold quantity of the security or zero for no position in this security. The problem is subdivided into sev- eral parts. Beside the selection of an appropriate security, there is the "market timing", that indicates when a position shall be accepted or deleted and the "position sizing", that indicates the size of the position to be accepted. Many authors merely try to optimize the market timing and leave the at least equally important position sizing almost completely aside (cf. (Tha99, page 6 follow- ing)). Doing so, as Vince is pointing out, it may easily happen that considerable losses are made, respectively hardly any returns are gained, because of a bad position sizing while having a good market timing. On the other hand, it is sufficient to have a market timing with a slightly positive expectation value, in order to enable considerable gains in the long run, if the position sizing is performed correctly. For that reason, the selection of the stock to be traded will not be considered in the following passages.
2.3.1 Tape Reader
In his book (Wyc01, Chapter 4 to 12), published for the first time in 1919 and with a new edition in 2001, Wyckoff describes his method to make money as Tape Reader at the New York Stock Exchange. At that time, the actual ex- change rates and put/pull orders were transmitted to the dealers by means of so-called "tickers" and printed out on a tape. From this usage the designa- tion tick as expression for a single rate has remained until today. Tape readers made their trading decisions exclusively on the basis of the exchange rates that entered on their tape constantly and at times they have been very successful with it. As Lefevre reports in (LeF23, Chapter 2), the at that time very well- known speculator Livermore also belonged to the successful users of the tape reading.
According to (Wyc01, page 9), messages that were relevant for the exchange rates, could be read on the tape of the ticker minutes, hours and days before appearing in the press. Wyckoff’ explanation for this is the action of better in- formed investors, before a message was given to the public. The necessary re- quirement was that the tape reader knew how to interpret the data of the tape correctly. On the basis of a series of quotations, the tape readers recognized a leading trend and followed it. To do so, some experience was necessary as well as the permanent observation of the market, i. e. a multitude of different papers of stocks at the same time.
It cannot be said clearly which processes exactly are going on in the brains of a tape reader and lead up to put or pull decisions. It is hard to extract rules from a neural network without simulating the whole system, for the information is distributed over the complete network. Even if the tape reader tried to repre- sent the guiding rules by himself with all integrity, it is easy to figure out that someone who makes his decisions by the rules alone may decide in a different way as the tape reader himself.
Unlike the tape readers, who had to perform the complete information pro- cessing of the stock exchange rates by head, today’s traders can dispose of computer-based preprocessing of the market data. There is a wide-spread use of different forms of rate/time diagrams, called charts, that allow the recogni- tion of trends or resistance and support ranges of a price movement. Very often one finds an integration of different time series that are deduced from the quo- tation data, so-called technical indicators, in these charts. Apart from these time series deduced from the rates investors are using a series of additional data, used for recognizing the current market opinion in any way whatsoever. In the following chapters, only the technical indicators shall be considered that are deduced from the technical indicators directly. The charts are a graphic tool and, as the tape readers have shown, the additional market data are not necessarily required, for the essential information is already included in the exchange rates.
2.3.2 Market timing
Market timing stands for the finding a decision, whether a position in a stock or paper is accepted or deleted at a certain moment. In addition to that, it is possible to determine targeted quotations for taking profit (Take Profit) or for limiting losses (Stoploss) by means of different indicators. Take profit or stop loss are often given along with the order to open the position. The tape readers could only dispose of the mere information of the prices that were the basis of their decisions. However, traders have started very early with using charts as a simple illustration of the price development. Based on the resistances, supports, trends and different formations of the price develop- ment that could be recognized in the charts, Dow has developed the so-called Dow theory that is discussed, for example, in (Mur99, pages 23 to 33). This kind of the (chart) technical analysis of a stock or paper is still in use today. In addition to that, many technical indicators have been developed, that are meant to permit a support or even a direct deduction of trading signals.
Technical Indicators Technical indicators are a preprocessing of the mere price information. Each indicator is based on a particular view and assump- tions about the market that were determined by the developer of the indicator. An overview of a multitude of various indicators can be found, for example, in (Mue93), respectively (Flo00). As for the technical indicators, Floret makes a distinction between trend followers, momentum oscillators, trend determina- tion indicators as well as volatility indicators and for these classes he describes the historical development and differences between the indicators. The com- mon point of all indicators is the fact that they are representing certain statisti- cal properties of the historical price data. The authors of the indicators allocate instructions to specific constellations of the indicator progress. The idea is that the participants’ behaviour on the market is similar in comparison to similar situations and that there is a collection of these patterns by means of the in- dicator constellation. Apart from profitable signals of an indicator, there are times when the indicator is generating default signals. For example, this could occur when a market passes from a trend to a lateral movement. This is why a permanent check of the applied indicators is indispensable.
In the following passages, the function of a technical indicator is shown on the basis of the structure of moving averages and the trend follower MACD, built up of these averages, is described as example.