Contents
1 Introduction 1
1.1 Motivation 1
1.2 Objective and structure 2
2 Basic principles and state of the art 5
2.1 Genetic Programming 5
2.1.1 Program Structure 5
2.1.2 Initialization of the GP Population 6
2.1.3 The Genetic Operators 7
2.1.4 Fitness Function 8
2.1.5 Selection 8
2.1.6 Process of the algorithm 10
2.1.7 Crossover, building blocks and schemata 10
2.1.8 Approaches against macromutation 12
2.1.9 Modularization 14
2.1.10 Further approaches for improvement 15
2.2 Artificial Neural Networks 16
2.2.1 Components of neural networks 16
2.2.2 Network topologies 17
2.2.3 Learning methods 17
2.3 Trading Systems 18
2.3.1 Tape Reader 20
2.3.2 Market timing 21
2.3.3 Position sizing 22
2.3.4 Comparison of trading systems 26
2.3.5 Fundamental versus technical analysis 29
2.3.6 The Currency Market 32
2.3.7 Approaches for the development of trading systems 33
i
3 Draft 37
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.2 Requirements on the software . . . . . . . . . . . . . . . . . . . . 37 3.3 Conception of software . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Implementation 47
4.1 Components of the developed software . . . . . . . . . . . . . . 47 4.2 Classes of the exchange rate data server . . . . . . . . . . . . . . 48 4.3 Classes of the Evolutionary Algorithm . . . . . . . . . . . . . . . 49 4.4 Overview over the framework ECJ . . . . . . . . . . . . . . . . . 49 4.5 Problems during experiments . . . . . . . . . . . . . . . . . . . . 53
5 Experiment results 55
5.1 Results with node weights . . . . . . . . . . . . . . . . . . . . . . 55
5.2 Results without node weights . . . . . . . . . . . . . . . . . . . . 62
5.3 Identification and application of optimal f . . . . . . . . . . . . . 65
6 Discussion and evaluation 73
6.1 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7 Summary 77
List of figures 82
Bibliography 88
Glossary 89
Chapter 1
Introduction
1.1 Motivation
The natural evolution has turned out to be a most successful mechanism for the engenderment and adaptation of creatures to the environment. Without receiving any particular instructions or even precise objective definitions, it has succeeded in finding sophisticated solutions for problems existing in the real world.
Genetic Programming (GP) is an approach for using the creative power within the natural evolution for the automatic development of computer programs (cf. (Koz92, Chapter 1-6)). It is used to try to simulate mechanisms of the natural evolution in order to generate automatic programs solving a given problem. In a series of applications, GP has been used for solving mathematical problems as well as for solving real-world problems successfully. Among them are counted such problems as symbolic regression, (cf. (Koz92, cf. Chapter 10)), classification (cf. (Koz92, Chapter 17)), the synthesis of artificial neural networks (cf. (Gru94, Chapter 2 following)), pattern recognition ((Tac93, pages 2 to 10)), robot control (cf. (BNO97, pages 2 to 10)) and the generation of images (cf. (GH97, pages 2 to 7)) are counted among these problems. Automated learning by means of GP can be interpreted as heuristic search al-gorithm finding out of the set of all possible programs those offering the best solution for the given problem. Dependent on the given problem, the search range is very large and oftentimes neither continuous nor differentiable and thus the search range of all possible programs is ill-fitting for classical search algorithms (cf. (LP02, page 2 following)).
In this thesis, the GP is applied in the scope of the generation of trading systems for the financial market, especially for the currency market. In the financial markets, successful speculative dealers use to act on the basis of a certain set of rules. However, these rules are subject to a relatively broad interpretation. After a careful review it becomes evident that, in decisive situations, the dealers deviate from the regulations they believe to determine their action and act according to their "gut feeling" as it were. It is possible that this portion of intuitive action distinguishes an experienced and profitable dealer from an inexperienced and unprofitable dealer, even if both of them believe that they are
1
CHAPTER 1. INTRODUCTION 2
working on the basis of the same set of regulations. The definition of a trading system by a human being is subject to some difficulties, for he cannot reproduce all of the rules unequivocally. This is why the transfer of action defining sets of rules to the computer has turned out to be unsuccessful. There is another approach to have the computer learn the rules for action automatically. To do so, e.g. Artificial Neural Networks (NN) are applied successfully (cf. (Ska01, pages 2 to 5), (MK, pages 2 to 7)). However, it is not possible to extract the several rules from the network in a way that allows an easy interpretation without difficulties. Users criticize this "black box" property of NN. GP is proposing itself as alternative to NN, for it can generate rules directly and they can be interpreted in a better way despite a certain complexity (cf. (YCK05, page 23 following)). As far as the solution of difficult problems is concerned, both approaches are comparable (cf. (BB98, page 13)).
1.2 Objective and structure
It is the objective of this thesis to apply GP, in order to generate trading systems and to analyze their profitability in the framework of a historical simulation. A software system, that solves this task, is designed and interesting implementation aspects are presented.
To be able to develop trading systems with GP, the software, that is to be developed, has to meet a number of requirements. The development of the trading systems is to be made on the basis of historical time series of exchange rates and prices. The market is supposed to change in the course of time and so trading systems, that have been profitable before, are going to lose their profitability. For that reason, the development system of the trading system has to be conceived in a way to allow the generation of new trading systems adapted to the changed market conditions. To guarantee the provision of current price data, the relevant market data have to be collected continuously and provided to the development system. The development of profitable trading systems is supported by preprocessed data of the price data, that are securities dealers known and used, are provided to the system. To enable a visual verification, the preprocessed data as well as the transactions of the trading systems should be represented graphically. An over-optimization at the development of the trading systems is to be avoided by subdividing the available price history in a training, validation and test period. In order to get a trading system for the test period, the best trading systems of the training period are applied to the validation period. The best trading system during the validation is chosen for the trade in the test period. It should be possible to reproduce the development process and it should be transparent by means of log-files of the intermediate results. Trading systems of a too high complexity make users sceptical, as it is difficult to retrace the decisions of the system. Even if more complex trading systems would provide a higher return, a certain degree of traceability of the trading systems is to be targeted. In addition to the limitation of the size of the trading systems, the standard GP is expanded by means of node weights. This is how one tries, on the one hand, to reduce the macromutation by means
1.2. OBJECTIVE AND STRUCTURE 3
of the crossover operator and, on the other hand, to simplify the capability of interpretation of the generated trading systems (cf. 3.3.1 on page 41).
Structure of this thesis To begin with, the basic principles and the state of the art of the Genetic Programming and the Artificial Neural Networks will be presented in Chapter 2.1 on page 5. The design and the function of NN will be analyzed, for this approach is widely spread in the context of the development of trading systems. After that the basic principles of technical trading systems will be presented from Chapter 2.3 on page 18 onwards. On the one hand, the technical analysis is presented as an instrument for finding favourable moments for trading, on the other hand, an approach is shown for defining the optimum trade position size. About both of these presented approaches for the automated learning, different authors have reported successful applications. Some of these successful applications are described in Chapter 2.3.7 on page 33.
From Chapter 3 on page 37 onwards, one will find the design of a system for generating, optimizing and testing trading systems by means of GP on the basis of historical price and exchange rate data. After an overview of the system, the requirements will be presented and a concept for the software is developed. The properties of the evolutionary algorithm relevant for the Genetic Programming will be defined. The implementation details are described from Chapter 4 on page 47 onwards. The results of the experiments made with the system are shown from Chapter 5 on page 55 onwards.
After that there is a discussion and evaluation of the results as well as an out- look on possible further developments of the system in Chapter 6 on page 73.
CHAPTER 1. INTRODUCTION 4
Chapter 2
Basic principles and state of the
art
In this Chapter, the basic principles of Evolutionary Algorithms and artificial neural networks are presented and the current state of the art is described. After that the relevant principles of the application area, the technical trading systems, are discussed.
2.1 Genetic Programming
Genetic Programming (GP) is an algorithm out of the family of the evolutionary algorithms. In the nature, the evolution has turned out to be a very successful system for the further development and optimization of all creatures. Evolutionary algorithms are simulating the essential and successful features of the natural evolutionary process by means of simple models. In this way, they make it possible to find good solutions with little effort, even if there are problems with a great search range. Data structures and algorithms are generated and optimized by the Evolutionary Algorithms, in order to solve given problems. This introduction follows the studies of Banzhaf et al. (BNKF98, Chapter 1-8) as well as Koza, who has described GP in (Koz92) for the first time. In the following passages there is a brief description of how a program is structured in the GP and how the evolution works. After that existing problems of the procedure as well as the approaches to a solution for them are discussed.
2.1.1 Program Structure
The individuals evolutionarily developed during the GP are programs. These programs are made up of functions and terminals (cf. (BNKF98, pages 109 to 118)).
The selection of the used functions within a genetic program is dependent on the specific application area. An essential requirement for the selection of the functions, provided to the genetic program, is that it should be possible to make up a solution for the application problem out of them. To avoid an
5
CHAPTER 2. BASIC PRINCIPLES AND STATE OF THE ART 6
unnecessary enlargement of the search range, however, there should not be provided too many functions. According to Banzhaf et al. in (BNKF98, page 111 following) as well, it does not make sense to develop functions that are specially-tailored to the given problem from the very outset of an application. Since GP is very creative as far as the combination of functions is concerned, it may be sufficient to provide simple functions such as the Boolean and arithmetical functions to achieve amazing results.
The arguments of the applied functions are terminals. On the one hand, they are made up of the input data used by the system for training and, on the other hand, of constants changed in the course of the evolution. These constants are called "ephemeral random constants" (ERC) (cf. (Koz92, page 242 following)). The closure of the functions with respect to the terminals is an essential requirement for the error-free function of the generated programs. The applied functions must be designed in a way that enables them to process all types of input values: for example, the division often is adjusted in order to keep the program from aborting in case there are divisions by zero. To determine the sequence of the evaluation of the program functions, the functions and terminals of a program are stored in a corresponding data structure. Most often, tree structures are used to do so. However, also linear structures as well as general graphs are used as organization forms. In the case of tree structures, the usual sequence of the evaluation is from left to right: the furthermost left node furthermost within the tree structure, for which all entries are available, is executed. The most prominent representative of a linear structuring of the functions and terminals is the simulation of a register machine. It disposes of several registers, a linear memory for the allocation of terminals to the registers as well as functions that use the registers for the input/output.
A relatively new variant of the structuring are directional graphs that may include cycles (cf. (BNKF98, page 116 following)). What makes this structure interesting is the fact that loops and recursion are resulting in the course of the evolution and do not have to be provided by special functions beforehand. However, superfluous cycles could become problematic for they are extending the program run unnecessarily.
2.1.2 Initialization of the GP Population
Since the tree structure is the most wide-spread data structure for individuals and is used for the application example as well, only this data structure will be considered in the following.
The methods, introduced by Koza in (Koz92, pages 91 to 94), for the initialization of the specific individuals of the population are called "Full", respectively "Growth". For the whole population, the maximum size of a single program is determined by the maximum depth of the specific trees. In the case of the "Full" method for the initialization of an individual, a tree with maximum depth is created, for that all nodes are selected from the set of functions at random, with the exception being the nodes of maximum depth, that are selected from the set of the terminals at random. At the "Growth" method, the tree is
2.1. GENETIC PROGRAMMING 7
built up from the root, with selecting for every node an element from function, respectively from terminal, at random. If a terminal is selected, the structuring of this branch is finished and the procedure continues at the last node of the branch, that is no terminal. For the nodes of the maximum depth of the tree, the stochastic selection is limited to the terminals. To achieve the highest possible variety within the population, both structuring methods described here are applied in combination. This method is called "Ramped Half-and-Half". At a given maximum depth of N, the population is divided by equal parts into trees with maximum depths of 2,3...N. Each one of these groups is initialized half and half according to the "Growth" and to the "Full" method.
2.1.3 The Genetic Operators
The genetic operators are tools that are available to the algorithm to improve the fitness of the population in the course of time. The most frequently used operators are crossover, mutation and reproduction. The easiest operator is the reproduction, that is selecting an individual with respect to the fitness and is transmitting an identical copy into the next generation. The crossover operator is combining the genetic material of two individuals by exchanging subtrees, thus generating two new individuals. Doing so, there is a higher probability for functions to be selected rather than terminals. Even the generation of just one single descendant is quite usual (cf. (BNKF98, page 241)). Mutation is applied to a single individual. Quite often this is made subsequent to the crossover operator. There is only a small probability that a node is selected at random from the individual and the subtree is replaced by new, stochastic nodes, as it has been done at the initialization. In addition to that, there is also a series of other genetic operators of wide-spread use. For example, Banzhaf et al. (BNKF98, page 241) are listing the following operators for mutation:
• Point Mutation: A single node is exchanged by a stochastic node of the same class.
• Permutation: Die Positionen von Argumenten einer Funktion werden vertauscht.
• Hoist: A new individual is generated from a subtree.
• Expansion Mutation: A terminal is exchanged by a random subtree.
• Collapse Subtree Mutation: a subtree is exchanged by a stochastic terminal.
• Subtree Mutation: A subtree is exchanged by a stochastic subtree.
• Gene Duplication: A subtree is exchanged by a stochastic terminal.
As alternatives for the standard crossover operator, Banzhaf is listing the following operators:
CHAPTER 2. BASIC PRINCIPLES AND STATE OF THE ART 8
• Subtree Exchange Crossover: exchange of two subtrees among individuals
• Self Crossover: exchange of subtrees within one individual
• Module Crossover: exchange of modules among individuals
• Context-Preserving Crossover: Exchange of two subtrees of two individuals, if their coordinates are corresponding or if at least they are similar.
2.1.4 Fitness Function
Corresponding to the selection taking place in the natural evolution, there is a selection process in the GP on the basis of a fitness value that is allocated to every specific individual of the population. The fitness value of an individual is determined by the evaluation of the individual by means of the fitness function. To do so, in most cases the genetic program of the individual is executed with entries resulting from a training data set and a fitness value is determined on the basis of the output of the program. In order to establish a selection pressure similar to the natural evolution for getting better solutions for the given problem, it is determined by means of a selection procedure based on the fitness of the individuals which individuals are reproduced in the next generation. An essential requirement both for the problem to be solved and for the design of the fitness function is the continuity of the fitness function (cf. (BNKF98, page 127)). This means that the improvement of an individual with respect to the problem should correspond to an improvement of the fitness value. The continuity of the fitness function is an essential requirement for the iteration of the individuals’ improvement. A very frequent fitness function is the error-fitness-function. It can be applied if an optimum, that is to be reached, is known and the still existing error of the achieved solution for this optimum can be determined. An example for this is the symbolic regression. Hereby a polynomial is given and the GP system shall approximate this function as exactly as possible by combining the provided functions (cf. (Koz92, Chapter 10)). As the deviation from the given polynomial decreases, the fitness value of the individual increases.
2.1.5 Selection
The selection determines which individuals of a population remain, respectively will reproduce and thus will conserve or even proliferate their genetic material. The selection pressure can be controlled with respect to the type and parameter setting of the selection algorithm. A high selection pressure is distributing the properties of the superior individuals within the population rapidly. This may lead to the result that the Evolutionary Algorithm will only find a suboptimal solution, because the existing solution dominates the population and better properties will not be able to prevail. A selection pressure that is too small is unnecessarily extending the running of the algorithm. In
2.1. GENETIC PROGRAMMING 9
this case it may occur that good properties are unable to spread within the population and that they are destructed by the genetic operators again.
The fitness-proportional selection For a long time, the fitness-proportional selection has been the most frequently used method for the selection in the field of genetic algorithms, after its introduction by Holland (cf. (Hol75)). With this method, an individual of the population is selected at random. The probability is determined by the relation of the fitness of the individual to the sum of the fitness of all the individuals. Blickle et al. conclude that the fitness-proportional selection is unfit for the following reasons (cf. (BT95, pages 40 to 42)):
• The reproduction rate is proportional to the fitness of an individual. If the fitness values are very close to each other, there is, as it were, just a stochastic selection.
• There is no translation invariance: while the fitness values one, respectively ten, still mean a great difference in the selection probability, this difference disappears for the most part, if both fitness values are increased by relatively high value.
• Despite a high variance at the beginning of the optimization, the selection intensity is too small, sometimes even negative, a fact that may lead to a deterioration of the average fitness of the population.
Truncation Selection The truncation selection, that is also known as (µ, σ)- selection(cf. (Sch95, page 158 following)), uses µ individuals as parents to generate σ new individuals of which the µ individuals serve as parents in the next generation. The absolute fitness values do not play a role at this selection, but the sequence of the individuals on the basis of the fitness values.
Selection by Rank Selection by rank is also based on the sequence of the individuals that is defined on the basis of the fitness values. There is a distinction made between linear and exponential ranking. At the linear ranking, the probability for an individual to be selected is depending on its rank within the population linearly way. At the exponential ranking, the probability of an individual to be selected is increasing exponentially.
Tournament Selection At the tournament selection, a group of individuals, selected at random out of the population, is competing. The size of this group is called the tournament size. The individual with the greatest fitness value of the chosen group is selected and used for reproduction. The descendant is replacing the inferior individuals of the tournament group in the population. The tournament size determines the selection pressure, with a tournament size of two corresponding to a minor selection pressure. In the GP, a tournament size of seven has established itself as default size.
CHAPTER 2. BASIC PRINCIPLES AND STATE OF THE ART 10
2.1.6 Process of the algorithm
After the presentation of the several components of the evolutionary algorithm, it follows a description of the progress of the algorithm. There is a distinction made between the generational GP and the steady state GP. In the generational GP, at a certain moment a population is creating a population that is transmitted to population of the next generation by the genetic operators:
1. Initialization of the population
2. Evaluation of the population’s individuals and allocation of a fitness value for every individual
3. Application of selection and genetic operators, until the population of the next generation is complete
4. Unless the abort criterium of the algorithm is met, continue with item 2
In the steady state algorithm, there is no distinction made between generations. Out of a randomly selected group of individuals of the population the best ones are chosen by means of tournament selection. The genetic operators are applied to them and the resulting descendants are replacing the losers of the tournament:
1. Initialization of the population
2. Stochastic selection of a group of individuals from the population for the tournament selection
3. Evaluation of the fitness of the selected individuals
4. Application of the genetic operators to the winners of the tournament
5. Replacement of the losers of the tournament by the generated descendants of the winners
6. Unless the abort criterium of the algorithm is met, continue with item 2
With comparing generational GP and steady state GP on the basis of a sorting problem, Kinnear concludes that steady state GP is achieving better results (cf. (Kin93a, page 6 following))).
2.1.7 Crossover, building blocks and schemata
Crossover in its various characteristics is the most frequently applied genetic operator in the GP. The probability for the choice of the crossover operator for the generation of the descendants of an individual usually lies in the range of 90% (cf. (Koz92, S. pages 114 to 116)). The strong application of crossover in the GP is supported by the natural evolution in the framework of the biolog- ical sexual reproduction. As a consequence, mutation occurs in the course of
2.1. GENETIC PROGRAMMING 11
the biological reproduction, but only at a small degree. This fact is reflected by the GP by a small probability for mutation (cf. (BNKF98, Chapter 2). The crossover operator is regarded as the reason why GP is more effectively spread than other procedures that are based on mere random changes of the solution candidates. According to Koza’s description in (Koz92, pages 116 to 119), the population of a GP includes "Building Blocks". Good building blocks improve the fitness of the individuals they include. For that reason there is a higher probability that these individuals are selected for the reproduction. This is how a good building block is succeeds in spreading within the population. With this argumentation, Koza follows the argumentation of Holland, who put up the building block hypothesis for genetic algorithms for the first time (cf. (Hol75)). Because of this spreading of good building blocks it is possible for the GP to create good problem solutions more rapidly than it would be possible for mere mutation-based procedure. Koza demonstrates in (Koz92, Chapter 9), that, with the exception of very simple problems, GP is superior to the random search. Goldberg tries to explain the function of genetic algorithms by means of the building block hypothesis. According to this hypothesis, the crossover operator is combining small good building blocks to form bigger and better building blocks, in order to finally create almost optimal individuals (cf. (Gol89, page 41)). For example, in (Gre93, page 3 following)) Grefenstette criticizes the building block hypothesis as misleading, for it would describe the actual process during the evolutionary development not sufficiently and it could easily happen that wrong conclusions are made. Many authors give the reason for the better function of GP as compared to a mere random search by means of schemata. Schemata are templates that represent a complete group of similar code sections. A scheme theorem describes assumptions of how these schemata develop from generation to generation, while crossover, mutation and selection are influencing them. The presumably most well-known scheme theorem was put up by Holland for genetic algorithms (cf. (Hol75, pages 66 to 88)). In (Koz92, pages 116 to 119), Koza made the first attempt to transfer this scheme theorem to GP. As Langdon describes in (LP02, page 27)), the scheme theorem has been criticized by many authors in the meantime and some of them even classify it as completely useless. However, Langdon supposes that the problem is the over-interpretation of the scheme theorem and not the theorem itself. There are a number of approaches for the transfer of the scheme theorem to GP. For example, there is an overview and a detailed discussion of the different approaches in (LP02, Chapter 3-6). In (LP02, Chapter 6), Langdon introduces the "macroscopic exact schema theorem" and shows that standard crossover schemata of higher rank are made up of schemata of lower rank. However, he admits that the notion "building block" may cause misunderstandings for the GP in as far as it suggests that building blocks are assembled step by step to form better and bigger blocks. According to Langdon, his schema theorem suggests that the selection process of the applied schemata can not be reproduced and is occurring randomly. Moreover, there is not necessarily a selection of schemata that are particularly short or fit over average.
CHAPTER 2. BASIC PRINCIPLES AND STATE OF THE ART 12
2.1.8 Approaches against macromutation
An essential problem of the crossover operator as compared to its model in the biological reproduction is the fact that, on the one hand, it is creating good combinations, but, on the other hand, it is working as macromutation and is destroying created building blocks. In the biological reproduction, much energy is used for protecting already created good building blocks against the harmful influences of the crossover. Biological crossover is homologous, i. e. the crossover is happening almost exclusively on the gene level. It hardly occurs that two genes with completely different purposes are exchanged (cf. (BNKF98, page 156 following)).
There is a number of approaches that try to introduce protection mechanisms against the macromutation in the GP. A naturally occurring phenomenon is the attachment of effect-free code to the building blocks in the course of the simulation. Hereby there is a decreasing probability for a destruction of the building blocks. This process can be observed at the natural model as well. More than 90% of a genome of higher creatures is made up of so-called introns that neither encode genes nor contain control sequences. While the EA is in process, it can be observed that the quantity of effect-free code is increasing exponentially (BNKF98, pages 191 to 193). This phenomenon is called "bloat". Besides the protection function by the introns, bloat has the problem that it blocks the development and the execution time of the generated program is unnecessarily extended.
Banzhaf identifies the macromutation of the crossover operator as big difference as compared to the naturally occurring crossover at the biological sexual reproduction. Almost all natural crossovers are successful, whereas in GP about 75% of them would have to be classified as "deadly" according to biological standards (cf. (BNKF98, page 157)). For that reason, the reduction of the macromutation of the crossover operator in the GP is an important field of research. In the following passages, there is the discussion of some of the approaches that have been examined for the improvement of the crossover operator, respectively for avoiding its negative effects.
Brood Recombination In (Tac94, Chapter 5) by means of "brood combination" Tackett proposes a method that actually does not improve the crossover operator, but reduces negative effects. In nature, there are some species that have by large more descendants than the survival of the species would require. Some descendants die soon after the birth because of several mechanisms, for example, while fighting each other for nourishment. Hereby it is guaranteed that the parents invest the energy needed for the upbringing only in the fittest descendants.
Tackett transfers this approach to GP. Instead of procreating only one or two descendants, a whole series of descendants is generated. The fitness values are calculated for this group called "brood" and the best one or two descendants are selected. The rest of the brood cancelled. In order to counteract to the increased calculation requirements of this method, Tackett proposes not to make the calculation for the brood on the basis of the complete training data,
2.1. GENETIC PROGRAMMING 13
but only with a small part. He argues that it is legitimate to do so, because the goal is sorting out of the unfit individuals and not actually finding the best individual of the brood (cf. (Tac94, page 87)). Tackett reports that the expected improvement as compared to the GP without brood selection has actually occurred at the executed experiments.
Apart from approaches for avoiding the harmful effects of crossover, there are numerous attempts to make the crossover operator "more intelligent".
Strong Context-Preserving Crossover In (D’h94), D’haeseleer develops the "strong context-preserving crossover" (CPC) operator, that only allows crossover between nodes that are exactly at the same position of two parents. D’haeseleer reports to have achieved improvements by mixing the normal crossover and SCPC. The application of SCPC alone would cause problems, because in that case the variety within the population would suffer. D’haeseleer thinks that a certain degree of macromutation is required (cf. (D’h94, page 2)).
Explicit Multiple Gene Systems In (Alt95), Altenberg presents a system where fitness components are influenced by a subset of available genes. During the evolution it is tried periodically to add a new gene. However, this gene is integrated only if it leads to a fitness increase. During the evolution of the population, genes are exchanged between the individuals by means of crossover. The "constructing selection" with subsequent exchange of genes by crossover proposed by Altenberg (cf. (Alt95, page 39)) shows a strong similarity with the homologous biological crossover. Instead of permitting the destruction of the majority of the good building blocks by the macromutation of the crossover, in this model, the crossover is limited to genes consisting of one or several building blocks. Since there is no risk for the building blocks to be destroyed, it is supposed that bloat is no problem at this approach.
Explicitly Defined Introns What belongs to the interesting measures against the macro-manipulation by the crossover operator is the approach "explicitly defined introns" (EDI). An integer value is stored between two nodes of a tree without any effect on calculation of the tree. The crossover operator is modified; so the probability of a crossover incident between two nodes is depending on the the value of the EDI between them. Just as the rest of the individual, the EDIs are developed evolutionarily In the course of the evolution. Hereby building blocks are forming, that are protected against a crossover by their linking EDIs. In (NFB95, page 16), Nordin et al. are making a series of tests concluding in the assumption that EDIs have an important role in finding individuals of high fitness. As far as fitness, generalization and processing time are concerned, EDIs have achieved better results than individuals in tests without EDIs.
GP 1-Point Crossover Operator In (PL98, page 19), Poli and Langdon pro- pose a 1-point crossover operator that considers structural similarities in the
CHAPTER 2. BASIC PRINCIPLES AND STATE OF THE ART 14
parent trees for the selection of an appropriate crossover point. Their experiments indicate point out that hereby the destructive influences of the crossover can be reduced in the course of the evolution.
2.1.9 Modularization
In many program languages there is the possibility to combine parts of the program and to use these modules at another place. Modularization is used for the abstraction, the division of the problems into subordinate problems and the retrieval of the program code. But in the context of the problem with the macromutation by means of the crossover operator, modularization has an interesting role. Some approaches are using specific crossover operators that consider the modules and thus prevent the macromutation. Different authors have examined modularization in the GP. A choice of different techniques is presented in the following passages.
Encapsulation The idea of encapsulation is described in (Koz92, pages 110 to 112). Hereby an asexual genetic operator is considered for selecting a function node of an individual and replacing it by a terminal node that includes the subtree of the selected function node. Now the generated terminal can also be applied in other individuals. The encapsulated subtree can no longer be improved or destroyed by the crossover operator. In case the replaced subtree contains a useful code, this could be an advantage.
Module Acquisition Angeline and Pollack propose a technique called "module acquisition" (cf. (AP92, pages 2 to 4)) for generating reusable modules. To do so, a subtree for a determined depth is defined as module in a selected individual. The parts of the subtree under the module are considered as arguments of the module. Apart from the exclusive application within the individual, the module can be provided to the complete population by means of a function library. As long as one individual is using a module at least, it will remain in the library, otherwise it will be deleted. Exactly as being done at the encapsulation, a defined module is not developed further at the module acquisition and it is protected against the macromutation.
Automatically Defined Functions In (Koz92, page 534 following), Koza gives a description of "automatically defined functions" and a very detailed one in (Koz94, Chapter 4-16). In order to use ADFs in the GP, the program trees are divided into two parts: a branch selected for the calculation of the fitness and a branch containing the function definitions of the ADFs. Both branches take part in the evolution. Hereby the crossover operator has to consider specific characteristics of the branches and the relation between them. The definition of an ADF in the function definition branch of the program tree is made up of three parts: the function name, a list of arguments as well as a branch including the body of the function set. The function name of the ADF
2.1. GENETIC PROGRAMMING 15
becomes a part of the function set of the branch, which is generating results. The variables defined in the argument list become a part of the terminal set of their ADF function body. The evolution exclusively is taking place in the function body of the ADF and in the result-generating branch. Only this branch as well as the function body of the ADFs is generated randomly. Before the evolution can start, the number of the names of the ADF as well as the number and names of the function arguments has to be defined for every ADF.
λ Abstraction In (Yu99a, Chapter 6,7), Yu presents an approach to use λ abstractions, based on the λ calculus, in the GP. Hereby every λ abstraction is considered as an independent module within the GP tree and protected as a unit in the course of the evolutionary process. These modules are developing in a way similar to the rest of the GP tree. Crossover is only permitted between modules that have the same number and type of the input and output. Yu documents the success of this approach, e. g. in (Yu99b). One reason for this success could be found in the homologous crossover that conserves structure and function, in the context with the λ abstraction.
2.1.10 Further approaches for improvement
Among others, further approaches for the extension of the standard GP are the following ones:
Loops and Recursion Although loops and recursion have an enormous im-portance in the manual programming, their application is up against difficulties in the GP. Long loops, especially endless loops, can cause the abort of the evolution of a program. In (Kin93b, page 5), Kinnear describes a possible approach in the framework of the evolutionary development of a sorting al-gorithm: the applied loop can only receive the start and end index of a finitely great list as parameter.
Strongly Typed Genetic Programming The extension of the traditional GP to the strongly typed GP (STGP) is introducing types for terminals and functions into the genetic programming (Mon93, page 7). This extension is particularly useful, because it reduces the search range of the evolutionary algorithm considerably. Functions are allocated to specific input and output types and can only combined with terminals of the appropriate type.
Cultural GP The Cultural GP, described by Spector and Luke (SL96), is using an indexed memory with access for all individuals of the population over all generations. The memory is initialized at the start of the evolutionary development and is provided to the population as data storage and communication medium between the generations afterwards. The authors come to the conclusion that the processing effort for the considered problems is reduced as compared to the standard GP by means of the shared memory.
Quote paper:
Diplom Informatiker Holger Hartmann, 2007, Development of Trading Systems using Genetic Programming with a Case Study, Munich, GRIN Publishing GmbH
This text can be quoted and accessed from this url:
Embed
DOI
Formatvorlage (Microsoft Word) für eine Diplomarbeit, Masterarbeit, Ha...
Für MS Word 2003 - Update 2010
Presentations, Models, Tutorials, Instructions
Elaboration, 25 Pages
Formatvorlage (OpenOffice) für eine Diplomarbeit, Masterarbeit, Hausar...
Presentations, Models, Tutorials, Instructions
Elaboration, 35 Pages
Formatvorlage / Vorlage zur Erstellung einer Diplomarbeit, Bachelorarb...
Presentations, Models, Tutorials, Instructions
Elaboration, 15 Pages
Formatvorlage / Vorlage für eine Diplomarbeit / Hausarbeit
Für MS Word 2007 - dotx
Presentations, Models, Tutorials, Instructions
Elaboration, 25 Pages
Anleitung zum Erstellen schriftlicher Arbeiten: Der Aufbau einer wisse...
Presentations, Models, Tutorials, Instructions
Elaboration, 20 Pages
Erstellen einer schriftlichen Hausarbeit
Presentations, Models, Tutorials, Instructions
Term paper, 14 Pages
Grundtechniken wissenschaftlichen Arbeitens
Bibliografieren - Reden - Schr...
Presentations, Models, Tutorials, Instructions
Script, 46 Pages
Ratgeber zur Erstellung wissenschaftlicher Arbeiten. Diplomarbeiten - ...
Presentations, Models, Tutorials, Instructions
Elaboration, 39 Pages
Holger Hartmann's text Development of Trading Systems using Genetic Programming with a Case Study is now available as a printed book
Holger Hartmann has published the text Development of Trading Systems using Genetic Programming with a Case Study
Holger Hartmann has uploaded a new text
Preparing Educators to Engage Families: Case Studies Using an Ecologic...
Holly Marie Kreider, M. Elena Lopez, Celina M. Chatman-Nelson
Numerical Simulations and Case Studies Using Visual C++.Net
Shaharuddin Salleh, Albert Y. Zomaya, Stephan Olariu
Integration of Transport and Trade Facilitation: Selected Regional Cas...
T. R. Lakshmanan, Uma Subramanian, William P. Anderson
HbA1c in Diabetes: Case Studies Using IFCC Units
Stephen Gough, Susan Manley, Irene Stratton
Studies in Contemporary Socio-Economic and Educational Problems in the...
Solomon V. Kobiowu
The Advanced Technology Program: A Case Study in Federal Technology Po...
Loren Yager, Rachel Schmidt
Development of Port Systems: Viet Nam Case Study
Enhancing and Satisficing Stra...
Ngoc-Hien DO, Ki-Chan NAM
The WTO Primer: Tracing Trade's Visible Hand Through Case Studies
Kevin Buterbaugh, Richard Fulton
0 comments