Excerpt

## Contents

1 Introduction

1.1 Background

1.2 Fundamental Ideas

2 The Core Immune Learning Algorithm

2.1 Problem Description

2.2 Deciding the Immune Principles to be used for Problem Solving

2.3 Engineering the Artificial Immune System

2.3.1 Defining the Types of Immune Components to be used

2.3.2 Defining the Mathematical Representation for the Elements of the AIS

2.3.3 Applying Immune Principles to Problem Solving

2.3.4 The Metadynamics of the AIS

2.4 Reverse Mapping from AIS to the Real Problem

2.5 The Full Learning Algorithm in Pseudocode

2.5.1 The aiNet Algorithm

2.5.2 The Isis Algorithm

2.5.3 Comparison of aiNet and Isis

3 The Architecture

3.1 The Concept

3.2 The Ra Algorithms

3.2.1 The Learning Algorithm: raCreate

3.2.2 The Classification Algorithm: raCluster

4 Implementation

4.1 Implemented Classes

4.2 Invoking the Algorithms

4.2.1 Isis

4.2.2 raCreate

4.2.3 raCluster

4.2.4 raGraph

5 Data Preprocessing

6 Results

6.1 Results with the Identity Shapespace

6.1.1 Level 4 Clusters with Identity Shapespace

6.1.2 Comments

6.2 Results with the Convolution Shapespace

6.2.1 Level 2 Clusters with Convolution Shapespace

6.2.2 Comments

6.3 Results with the Fourier Shapespace

6.3.1 Level 1 Clusters with Fourier Shapespace

6.3.2 Comments

6.4 Comparison of the Algorithm Behaviour Across the Different Shapespaces

7 Further Possibilities for the Ra/Isis Approach

7.1 Parameter Tuning

7.1.1 Hierarchical Parameters

7.2 Shapespaces

7.2.1 Hierarchical Shapespaces

7.3 Clustering Unusual Data

7.4 Data Analysis and Other Learning Algorithms

7.5 Classification

7.5.1 Hierarchical Classification

7.5.2 Classification Networks

7.5.3 Subsequent Classification

8 Analysis of the Algorithm Behaviour

8.1 Ra/Isis vs. Common Agglomerative Clustering

8.2 Comparison of Ra/Isis with Self Organizing Maps

8.2.1 Results of Applying the Data to a SOM

9 Conclusion

## List of Figures

1 Ra/Isis Single Loop Architecture

2 Ra/Isis Concept: Example for Complex and Specialized Setups

3 Preprocessing: Moving the Patterns to the Upper Left

4 Patterns 30 and 63 before Dilatation

5 Patterns 30 and 63 after Dilatation

6 Patterns 0, 33 and 66

7 Patterns 1, 34 and 67

8 Patterns 2, 35 and 68

9 Patterns 3, 36 and 69

10 Patterns 4, 37 and 70

11 Patterns 5, 38 and 71

12 Patterns 6, 39 and 72

13 Patterns 7, 40 and 73

14 Patterns 8, 41 and 74

15 Patterns 9, 42 and 75

16 Patterns 10, 43 and 76

17 Patterns 11, 44 and 77

18 Patterns 12, 45 and 78

19 Patterns 13, 46 and 79

20 Patterns 14, 47 and 80

21 Patterns 15, 48 and 81

22 Patterns 16, 49 and 82

23 Patterns 17, 50 and 83

24 Patterns 18, 51 and 84

25 Patterns 19, 52 and 85

26 Patterns 20, 53 and 86

27 Patterns 21, 54 and 87

28 Patterns 22, 55 and 88

29 Patterns 23, 56 and 89

30 Patterns 24, 57 and 90

31 Patterns 25, 58 and 91

32 Patterns 26, 59 and 92

33 Patterns 27, 60 and 93

34 Patterns 28, 61 and 94

35 Patterns 29, 62 and 95

36 Patterns 30, 63 and 96

37 Patterns 31, 64 and 97

38 Patterns 32, 65 and 98

39 Hierarchical Clustering using Identity Shapespace

40 Hierarchical Clustering using Convolution Shapespace

41 Hierarchical Clustering using Fourier Shapespace

42 Parameters within the Databionics ESOM Analyser

43 ESOM Result: Identity Shapespace, Gaussian Neighbourhood

44 ESOM Result: Identity Shapespace, Mexican Hat Neighbourhood

45 ESOM Result: Convolution Shapespace, Gaussian Neighbourhood

46 ESOM Result: Convolution Shapespace, Mexican Hat Neighbourhood

47 ESOM Result: Fourier Shapespace, Gaussian Neighbourhood

48 ESOM Result: Fourier Shapespace, Mexican Hat Neighbourhood

49 ESOM Result: Fourier Shapespace, Gaussian Neighbourhood, Manhattan Distance

## List of Tables

1 Values for Selecting the Distance Measures

2 raCreate Parameters for Figure 39

3 Level 4 Clusters in Identity Shapespace

4 raCreate Parameters for Figure 40

5 Level 2 Clusters in Convolution Shapespace

6 raCreate Parameters for Figure 41

7 Level 1 Clusters in Fourier Shapespace

## Listings

1 Isis constructors

2 Isis.setShapespace method

3 Isis.setSelfPatterns method

4 Parameters of the Isis.genIsis method

5 Ra constructors

6 Parameters of the Ra.raCreate method

7 Parameters of the Ra.raCluster* methods

8 Parameters of the Ra.raGraph method

## 1 Introduction

This section will explain the background and the fundamental ideas of this work.

### 1.1 Background

This work is based on a diploma thesis by Changxing Dong [Dong04]. In [Dong04], the effects of different shapespaces on the behaviour of artificial immune system algorithms are evaluated. For that purpose, a pattern recognition task was used: Chinese characters were to be learned and recognized. Each character was adapted individually using the CLONALG algorithm [dCT02]. The resulting set of antibody patterns, the duration of running the algorithm and the quality of recognition were then used to evaluate the differences between the shapespaces.

### 1.2 Fundamental Ideas

In [Dong04], each character was modelled as a single antigen pattern. Each of these patterns was adapted by the algorithm into one antibody pattern, a pattern that would match the incoming antigen. This approach results in a one-to-one mapping of each character to its recognizing pattern.

Although this is fine if one is only interested in analysing the behaviour of the algorithms, it is of little practical use, since the optimal output of the algorithm is equivalent to the input.

In [Dong04], the characters are used in three different fonts to allow further comparisons. The initial focus of this work was to create a system that adapts all the incoming antigen patterns using the same set of antibody patterns. That idea is based on discrete models of the immune network theory as proposed in [dCT02], especially on the aiNet algorithm [dCT00, dCT02].

The system would receive all the Chinese characters as input antigens and create a set of antibody patterns that match these input patterns. A specific antibody would then represent a cluster across the input patterns: that cluster contains all the patterns matched by the specific antibody.

Following that approach, the goal was to create a set of antibody patterns that adapts to the structure of the Chinese characters. The Chinese characters used were taken from the GB2312-80 font set that ships with Microsoft Windows XP. By their nature, some characters look very similar. The algorithm should be able to group similar patterns together, at best across different fonts.

With that goal in mind, the algorithm has to find an equilibrium between generalizing over different but similar patterns on the one hand, and discriminating between very distinct patterns on the other.

Since each font renders each character differently, the grouping must be done on a higher, more abstract level, using conceptual properties rather than pixels alone.

The highlighting of conceptual properties shall be provided by the shapespace concept, which provides a transformation of the input data that emphasizes conceptual properties.

Each antibody matches antigens in its recognition region, which is determined by the cross reactivity threshold in use.

More complex recognition regions can be created by grouping the recognition regions of several antibodies together.

That grouping can be created using the same clustering algorithm. For that purpose, matching by similarity instead of matching by complementarity is used. The output antibodies can then be seen as higher level antigens that can be grouped together by mapping them to higher level antibodies. Through iteration of that idea, a conceptual hierarchy is created. As the artificial immune algorithms used are heuristic by design, these clusters cannot be guaranteed to be disjoint.

## 2 The Core Immune Learning Algorithm

To design the algorithm, I followed the Guidelines for the Design of AIS from the book Artificial Immune Systems: A New Computational Intelligence Approach [dCT02].

The algorithm developed in this section is the core clustering algorithm. The use of this algorithm to create the hierarchy is explained in a later section of this document.

### 2.1 Problem Description

The task to be solved is, as already outlined, to create a system capable of grouping similar input patterns together. The base algorithm only has to be able to map one or more input patterns to one recognizing pattern. The input patterns are binary pixel matrices with a dimension of 16 in both the x and y directions.

Two main elements can be identified: the patterns to be mapped and the patterns onto which the input patterns are mapped. The algorithm is called Isis, an abbreviation of Independent Single Immune System.

### 2.2 Deciding the Immune Principles to be used for Problem Solving

The following conceptual immune principles will be used:

1. Abstract models of Immune Cells, Molecules and their Interactions

- Only two types of cells are used:

(a) antigens, corresponding to the input patterns

(b) antibodies, corresponding to the recognizer patterns

- The matching between the cells is done using an affinity measure. Matching by similarity is used: the more alike antigen and antibody are, the higher the affinity. The distance measure used for calculating the affinity can be varied by the user.

- The input patterns come as 16x16 pixel matrices. These can be interpreted as binary strings with a length of 256. On these matrices a shapespace transformation can be calculated that weights each pixel with an integer value. In general, the data on which the algorithm works is an integer matrix. The shapespace transformation can be varied by the user.

2. Algorithms and Processes

The following algorithms and processes will be used in the full algorithm:

- Clonal Selection

- Clonal Expansion

- Affinity Maturation

- Somatic Hypermutation

- Cell Aging

- Simulating an Immune Reaction

Note that all of these are just subprocesses, used as parts of the complete learning algorithm that is developed here.

3. Immune Network Models

The algorithm is inspired by the Immune Network Theory [dCT02] and its proposals for discrete models [dCT02]. It is also inspired by the aiNet algorithm [dCT00, dCT02].

According to [dCT02], there are two levels of interactions in the network:

(a) The interaction with the environment (foreign antigens): In this system, any antigen is a specific character that should be recognized. For each antigen presented to the antibody set, the best matching antibodies are selected, cloned and mutated. The lifetime of those antibodies is set to the maximum. Once there is an antibody that matches an antigen according to a specific cross reaction threshold, the antigen is removed and the antibody is preserved. That reflects a simulated immune reaction. This process guarantees that all antibodies match their corresponding antigen with roughly the same affinity. That is useful, because otherwise some antibodies would match their antigen with a much higher affinity than other antibodies match theirs, and the emerging clusters would be biased. That may be desired in other circumstances, but in this case the generalization performance shall be optimal, not the exact matching. If we allowed the bias, overfitting would be more likely to occur.

(b) The interaction with other network elements:

These processes work without the external antigenic stimuli:

- Aging: All antibodies have a lifetime counter that is reduced in each cycle of the algorithm. That process is more of an implementation issue: if an antibody is not used, that is, it did not match any antigen for a given period, it is removed. It would not hurt if it stayed, yet keeping it creates unnecessary memory and CPU usage. Aging can also be seen as an interaction between network cells, as the lifetime of a cell is always relative to the lifetime of other cells.

- Network Suppression: The affinity between each pair of antibodies is calculated. If the affinity is maximal, the antibodies are equal and one of the two is removed. If the affinity is very high, both are melted into a single antibody. That feature should be handled with great care and is explained in more detail in a later section of this document.

- Diversity: A certain number of new random antibodies is created.

The whole network structure is given not by the affinities and bindings between the antibodies, but by the existence of the antibodies themselves. The resulting antibody set of the whole algorithm contains only antibodies which, in a sense, survive. Survival is granted by their ability to match at least one antigen. That makes sense here, because the input set is static and the antibodies adapt to the input set.

### 2.3 Engineering the Artificial Immune System

The engineering step includes the actual implementation of the system. For convenience, the main components are only defined in this section. Details about the implementation and usage of the developed software are given in a later section.

#### 2.3.1 Defining the Types of Immune Components to be used

Only two types of immune components are used: antigens, relating to the input, and antibodies, which match the incoming antigens.

#### 2.3.2 Defining the Mathematical Representation for the Elements of the AIS

As specified, there are only two types of elements in this AIS. Since the affinity between them is calculated by checking how similar these patterns are, both types are represented in the same fashion.

The main input are 16x16 pixel matrices, each matrix representing one Chinese character. Examples are shown in figures 6 to 38. These could be treated as 256 bit binary vectors, but for easy application of shapespace transformations, they are not implemented that way. Each antigen and antibody pattern is implemented as a matrix of integer values. As this system is implemented in Java, each integer contains 32 bits.

The size of the matrix can be set at runtime. Here, the patterns are always 16 pixels wide and 16 pixels high, as that is the nature of the input data used.
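Because the shapespace transformation can be varied by the user, it is natural to model it as an interchangeable component. The following sketch illustrates that idea; the names are hypothetical and differ from the Matrix class actually used in the implementation.

```java
// A minimal sketch of an interchangeable shapespace component; the names
// are hypothetical and differ from the thesis's actual Matrix class.
interface Shapespace {
    int[][] transform(int[][] pixels); // maps the 0/1 pixel matrix to integers
}

// The identity shapespace returns the data unchanged.
class IdentityShapespace implements Shapespace {
    public int[][] transform(int[][] pixels) {
        return pixels;
    }
}
```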

On these integer matrices, which contain only 0 or 1 to indicate black or white pixels, a shapespace transformation is applied. The following shapespace transformations are possible in this system; more transformations can be added if desired.

- Identity Shapespace

The Identity Shapespace transformation does nothing to the input data; the data is used as it is.

- Convolution Shapespace

The Convolution Shapespace transformation calculates a convolution on the input pixel matrix. As such, it is a specific Integer Shapespace.

The convolution operation is defined as follows. This definition is based on [Ste04]:

Let L, M be odd natural numbers.

Let GInput = (gInput(i, j)) with i = 0, ..., I − 1; j = 0, ..., J − 1 be a pixel matrix with I rows and J columns, called the input picture. The elements gInput(i, j) are real numbers.

[Figure not included in this excerpt]

The linear convolution GC = (gC(i, j)) of GInput with the convolution kernel H is defined by the following sum:

[Figure not included in this excerpt]

All results using this shapespace that are presented in this document were created using the following convolution matrix:

[Figure not included in this excerpt]

By using that simple convolution matrix, every pixel is transformed into a count of how many pixels surround it.

- Fourier Shapespace

The Fourier Shapespace transformation calculates a two dimensional Fourier transformation on the input patterns. The result is a complex matrix, which is then reduced to an integer matrix. By that, this shapespace transformation also creates an Integer Shapespace.

A two dimensional Fourier transformation can be calculated by applying a one dimensional Fourier transformation to the row vectors of the matrix and then a one dimensional Fourier transformation to the column vectors of the resulting matrix [Cla97]. To calculate the one dimensional Fourier transformation, the fast Fourier transform (FFT) algorithm, as in [La97], is used.

The resulting integer matrix GT is calculated from the complex matrix using this equation:

[Figure not included in this excerpt]

where the symbol (not shown in this excerpt) denotes the elements of the Fourier matrix.

The values gT are then rounded to convert them to pure integer values.
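The two integer shapespaces above can be sketched as follows. The all-ones 3x3 kernel, the zero padding at the border and the rounded-magnitude reduction of the Fourier coefficients are assumptions made for illustration, since the exact kernel and equations are not included in this excerpt.

```java
// Sketch of the convolution and Fourier-reduction steps; kernel, padding
// and reduction formula are assumptions (not shown in this excerpt).
class IntegerShapespaces {
    // convolution with an all-ones 3x3 kernel and zero padding: each output
    // value counts the set pixels in the 3x3 neighbourhood
    static int[][] convolve(int[][] g) {
        int rows = g.length, cols = g[0].length;
        int[][] out = new int[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int l = -1; l <= 1; l++)
                    for (int m = -1; m <= 1; m++) {
                        int r = i + l, c = j + m;
                        if (r >= 0 && r < rows && c >= 0 && c < cols)
                            out[i][j] += g[r][c];
                    }
        return out;
    }

    // one plausible reduction of a complex Fourier matrix to integers:
    // the rounded magnitude of each coefficient
    static int[][] reduce(double[][] re, double[][] im) {
        int rows = re.length, cols = re[0].length;
        int[][] out = new int[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                out[i][j] = (int) Math.round(Math.hypot(re[i][j], im[i][j]));
        return out;
    }
}
```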

To calculate affinities between the antigen and antibody patterns, a distance measure is required. The following distance measures are implemented in the system outlined here.

[Figure not included in this excerpt]

- Frequency Filter Distance

The Frequency Filter Distance is basically the same as the other distance measures, with the special property that small values in either the antigen or the antibody are set to zero.

That makes particular sense when using the Fourier shapespace transformation. In that case, the distance calculation only takes strong frequencies into account; noise and small differences between patterns are thus smoothed out.

In addition to the filtering, any other distance measure may be used. In the system created for this work, only the Manhattan distance was implemented with the frequency filter.

By that, the implemented Frequency Filter Distance follows these equations:

[Figure not included in this excerpt]
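Under the stated assumptions (Manhattan metric combined with the filter), the distance can be sketched as follows; the threshold name and the exact filtering rule are assumptions, since the equations are not included in this excerpt.

```java
// Sketch of the frequency filter distance: values with magnitude below the
// threshold theta are set to zero, then the Manhattan distance is taken.
// The exact filtering rule is an assumption.
class FrequencyFilterDistance {
    static int distance(int[][] ag, int[][] ab, int theta) {
        int d = 0;
        for (int i = 0; i < ag.length; i++)
            for (int j = 0; j < ag[0].length; j++) {
                int a = Math.abs(ag[i][j]) < theta ? 0 : ag[i][j];
                int b = Math.abs(ab[i][j]) < theta ? 0 : ab[i][j];
                d += Math.abs(a - b); // Manhattan distance on filtered values
            }
        return d;
    }
}
```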

#### 2.3.3 Applying Immune Principles to Problem Solving

The task for this AIS is to search for antibodies that match at least one antigen. That search is basically done by the CLONALG strategy [dCT02], which is very similar to evolutionary algorithms.

Two operations have to be defined, as they are used in the search and for the metadynamics of the algorithm.

1. Mutation:

The mutation operator is responsible for actually changing the antibody patterns. The change happens on the input pattern, which contains only black and white pixels. One of these bits is changed, and the selected shapespace transformation is applied to the changed matrix.

That core operator is then used in the somatic hypermutation process, which may change more than one pixel according to the current affinity of the selected antibody.

The operation can thus be described as:

(a) Select a position in the binary pixel matrix at random.

(b) Toggle the selected pixel.

(c) Repeat the above steps as often as desired.

(d) Apply the shapespace transformation to the new pixel matrix.

2. Melt:

The melt operation is used to melt two patterns into one.

(a) for all pixels do:

i. if the two source pixels are equal, the target pixel is set to that source value

ii. if the two source pixels are different, the target pixel is set to either 0 or 1 at random

(b) Apply the shapespace transformation to the new pixel matrix.

Comments on the melt operation:

- This operation should be handled with great care, since it works against the clonal expansion and against small mutations.

- It is mainly useful to reduce memory and CPU costs, as it reduces the size of the antibody set.

- It should only be applied if the two source patterns are very similar. It can reduce the search effort, because the distance is calculated between the shapespace transformed patterns. If we have two patterns that differ in only one binary pixel, and another two patterns that also differ in only one binary pixel, the distance between each pair in the shapespace can be different. The melt operator can thus focus the search on those one pixel mutations which make a real difference within the shapespace.
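The two operations above can be sketched as follows; the shapespace transformation step is left to the caller, and the class name is hypothetical.

```java
import java.util.Random;

// Sketch of the mutation and melt operations on the binary pixel matrices;
// applying the shapespace transformation afterwards is left to the caller.
class Operators {
    // toggle `flips` randomly selected pixels in place
    static void mutate(int[][] pixels, int flips, Random rnd) {
        for (int k = 0; k < flips; k++) {
            int i = rnd.nextInt(pixels.length);
            int j = rnd.nextInt(pixels[0].length);
            pixels[i][j] ^= 1; // toggle the selected black/white pixel
        }
    }

    // equal pixels are kept, differing pixels are chosen at random
    static int[][] melt(int[][] a, int[][] b, Random rnd) {
        int[][] out = new int[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                out[i][j] = (a[i][j] == b[i][j]) ? a[i][j] : rnd.nextInt(2);
        return out;
    }
}
```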

The clonal expansion requires calculating the number of clones NC in proportion to the antibody's affinity. That is done using equation 4.

[Figure not included in this excerpt]

where

ζ denotes the maximal number of clones to create.

ρ denotes a coefficient to control the shape of the exponential function.

α denotes the current affinity.

αmax denotes the maximum affinity possible in the used shapespace.

Please note that the highest affinity is represented by α = 0.

The somatic hypermutation requires calculating the number of mutations NM in inverse proportion to the antibody's affinity. That is done using equation 5.

[Figure not included in this excerpt]

ζ denotes the minimal number of mutations to do.

ρ denotes a coefficient to control the shape of the exponential function.

α denotes the current affinity.

αmax denotes the maximum affinity possible in the used shapespace.

Note again that the highest affinity is represented by α = 0.
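Equations 4 and 5 are not included in this excerpt. The following exponential forms are one plausible reading of the description above (highest affinity at α = 0, ρ shaping the curve) and should be treated as an assumption, not as the thesis's exact formulas.

```java
// Plausible exponential schedules for clonal expansion and hypermutation;
// the exact equations (4) and (5) are not shown in this excerpt.
class CloneSchedules {
    // number of clones: maximal (zeta) at alpha = 0, decaying as alpha grows
    static int numClones(double zeta, double rho, double alpha, double alphaMax) {
        return (int) Math.round(zeta * Math.exp(-rho * alpha / alphaMax));
    }

    // number of mutations: minimal (zeta) at alpha = 0, growing with alpha
    static int numMutations(double zeta, double rho, double alpha, double alphaMax) {
        return (int) Math.round(zeta * Math.exp(rho * alpha / alphaMax));
    }
}
```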

#### 2.3.4 The Metadynamics of the AIS

The metadynamics are described by the processes which add new cells to or remove cells from the current set of cells. In the Isis algorithm, this happens through the following processes:

- Antigens are removed, meaning they are no longer presented, once they are matched by an antibody with an affinity that is high enough.

- Antibodies are preserved, once they are capable of matching at least one antigen with a high enough affinity.

- Antibodies are removed from the working set, once their lifetime expires.

- An antibody is removed if the same antibody is already in the working set.

- Two antibodies are melted into one if they are very similar.

- New random antibodies are introduced into the system in each lifecycle of the cells.

These processes control the creation of new cells and the removal of unnecessary cells. In addition to that, the set of cells is changed in the antigenic presentation phase, where selection, cloning and mutation operators are used to adapt to the incoming antigens.

### 2.4 Reverse Mapping from AIS to the Real Problem

The output of the algorithm is a set of antibodies. Each of these antibodies represents a cluster containing input patterns. The input patterns in each cluster are found by checking which antigens (representing the input patterns) match each antibody according to a specific cross reaction threshold. The clusters may require further interpretation to be useful for solving a pattern recognition or categorization task, but that interpretation is not within the scope of an AIS.
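This reverse mapping can be sketched as a threshold test per antibody. The Manhattan distance stands in for the configurable affinity measure, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reverse mapping: each antibody collects the antigens it
// matches within the cross reactivity threshold.
class ReverseMapping {
    static int manhattan(int[][] a, int[][] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                d += Math.abs(a[i][j] - b[i][j]);
        return d;
    }

    // returns, for each antibody, the indices of the antigens in its cluster
    static List<List<Integer>> clusters(List<int[][]> antibodies,
                                        List<int[][]> antigens, int threshold) {
        List<List<Integer>> result = new ArrayList<>();
        for (int[][] ab : antibodies) {
            List<Integer> cluster = new ArrayList<>();
            for (int k = 0; k < antigens.size(); k++)
                if (manhattan(ab, antigens.get(k)) <= threshold)
                    cluster.add(k);
            result.add(cluster);
        }
        return result;
    }
}
```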

### 2.5 The Full Learning Algorithm in Pseudocode

The Isis algorithm developed here was inspired by aiNet [dCT00, dCT02]. To make it easy to compare the two algorithms, a very brief version of that algorithm is included.

#### 2.5.1 The aiNet Algorithm

1. Initialization: create a set of random antibodies

2. repeat for a given number of iterations:

(a) Antigenic presentation: for each antigen do:

i. Affinity evaluation: calculate the affinities between all antibodies and the current antigen

ii. Clonal selection: select the best antibodies

iii. Clonal expansion: clone the selected antibodies, the better the antibody, the more it is cloned

iv. Affinity maturation: mutate the cloned antibodies, the better the antibody, the fewer changes are made

v. calculate the affinities between the new antibodies and the current antigen

vi. select the best new antibodies

vii. Metadynamics: remove those antibodies from that selection whose affinity with the current antigen is too low

viii. Clonal interactions: calculate the affinities, which the new antibodies have with each other

ix. Clonal suppression: remove those new antibodies whose affinity to the other new antibodies is too low

x. Network construction: the network consists of the existing antibody set and the new antibodies, that were just created

(b) Network interactions: calculate the affinities which the antibodies in the network have with each other

(c) Network suppression: remove those antibodies from the network whose affinity to the other network antibodies is too low

(d) Diversity: create new random antibodies

(e) add the network set and the new random ones to the antibody set

3. return the network set

#### 2.5.2 The Isis Algorithm

1. Initialization: The initial antibody set can be given before running the algorithm. If it is not given, the antibodies are created at random. All antibodies get the maximum remaining lifetime.

2. as long as the antigen set is not empty, do:

(a) Antigenic presentation: for each antigen do:

i. Affinity evaluation: calculate the affinities between all the antibodies and the current antigen

ii. Clonal selection: take the best antibodies that do not match self patterns out of the antibody set into a clonal selection set

iii. Clonal expansion: clone the selected antibodies, the better the antibody, the more clones are created

iv. Affinity maturation: mutate the clones, the better the clone, the less changes are done

v. if the mutated clone has improved its affinity to the current antigen, it is allowed back into the antibody set and its remaining lifetime is set to maximum

(b) Network aging:

i. remove all antibodies whose lifetime has expired.

ii. decrease the remaining lifetime of the surviving antibodies.

(c) Network interactions: calculate affinities between the antibody patterns

(d) Network suppression:

i. if two antibodies are equal, remove one of them

ii. if two antibodies are very similar, melt them into one antibody

(e) Diversity: add new random antibodies to the antibody set

(f) Immune reaction: for each antigen do:

i. calculate affinities between all antibodies and the current antigen

ii. select the best matching antibody

iii. if the selected antibody matches the antigen with an affinity that is above a certain threshold level, then:

A. remove the antigen from the antigen set

B. copy the antibody into a set of memory antibodies

3. return the memory antibody set
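Step 2(f), the simulated immune reaction, can be sketched as follows. The distance function stands in for the configurable affinity measure, and the names are illustrative.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the immune reaction step: antigens matched within the threshold
// are removed and the best matching antibody is copied into the memory set.
class ImmuneReaction {
    static int distance(int[][] a, int[][] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                d += Math.abs(a[i][j] - b[i][j]);
        return d;
    }

    static List<int[][]> react(List<int[][]> antigens, List<int[][]> antibodies,
                               int threshold) {
        List<int[][]> memory = new ArrayList<>();
        Iterator<int[][]> it = antigens.iterator();
        while (it.hasNext()) {
            int[][] ag = it.next();
            int[][] best = null;
            int bestD = Integer.MAX_VALUE;
            for (int[][] ab : antibodies) {        // select the best antibody
                int d = distance(ag, ab);
                if (d < bestD) { bestD = d; best = ab; }
            }
            if (best != null && bestD <= threshold) {
                it.remove();        // the antigen is no longer presented
                memory.add(best);   // the antibody is preserved as memory
            }
        }
        return memory;
    }
}
```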

#### 2.5.3 Comparison of aiNet and Isis

The aiNet algorithm bears some resemblance to Isis, but there are also significant differences. Both are compared now. When comparing the pseudocode directly, keep in mind that aiNet is designed to use matching by complementarity.

1. The initializing step is basically identical.

2. aiNet runs for a given number of iterations; Isis runs until all antigens can be handled by the antibody set.

3. Both algorithms start by presenting each antigen to the antibodies to increase their affinity with the appropriate cloning and mutation processes.

4. The result of the cloning and mutation processes is handled slightly differently. aiNet selects just the best clones; Isis accepts all clones which improved in affinity. The Isis algorithm may also use the aiNet selection strategy if the search space is unknown or known to contain local minima.

5. aiNet features some metadynamic steps that allow the cloning process to be the driving force behind the construction of a network of antibodies, which should represent the structure of the input patterns. These steps are not contained within Isis, as it has a different goal: to cut the structure inherent in the input patterns into pieces, with the intent to extract clusters. This is the core point discriminating the two algorithms and is described in greater detail after this step-by-step comparison.

6. Isis features a step to decrease the remaining lifetime of the antibodies. aiNet does not care about cell lifetimes and thus does not contain such a step.

7. Both algorithms perform network metadynamics. Yet, aiNet removes surplus antibodies, while Isis merges those antibodies using the melt operation.

8. The diversity ensuring step is equal in both algorithms.

9. At the end of the antibody creation process, the Isis algorithm simulates an immune reaction, to unfocus from the input patterns that can already be matched. As aiNet wants to represent the input pattern structure as a whole, it does not contain such a process.

[Figure not included in this excerpt]

Figure 1: Ra/Isis Single Loop Architecture

As just mentioned, the goals of aiNet and Isis are different.

aiNet tries to map the structure of the input using a set of mostly, though not necessarily, interconnected antibodies. By that, the output of the algorithm describes the inner topology of the input data. That is explained and shown in detail in [dCT00] and is not depicted here. The output of aiNet thus captures the inner coherence of the data and can classify new data as known or different from the known data patterns. aiNet assumes that the input data describes an important part of the data universe, and that this part should be pictured.

Extracting different clusters from that topology is possible by interpreting the aiNet output as a graph and pruning that graph using the minimum spanning tree algorithm [dCT00]. This pruning cuts the aiNet structure into pieces, which then mark different areas in the input space.

On the contrary, Isis tries to directly cleave the input data into pieces, foregoing the need to apply graph pruning to its output. Isis assumes that the input data is representative of the whole input universe, with each input pattern representing a different part of the whole. It further assumes that any unseen data can be mapped to one of the pieces it created from the learning data. The output of Isis is a set of focal points in the input universe which the algorithm labels as important.

Due to this upfront breakdown of the input data structure, unlike with aiNet, only small connections between the patterns can be seen, namely when one antibody can match more than one antigen. To create a coherent structure out of the Isis output, this very output can be clustered again. That widens the focus of each recognizer element in the system and makes it possible to form complex, non-circular recognition regions. An approach to this is given in the next section.

Anticipating what will be outlined there, aiNet can be contrasted with Isis by saying that aiNet follows a top-down strategy, while Isis advances a bottom-up pursuit.

## 3 The Architecture

### 3.1 The Concept

The Isis algorithm can now be used to create a set of recognizing patterns, onto which the input patterns are mapped. These recognizing patterns can then be mapped onto higher level recognizer patterns.

This can be done in a single loop (see figure 1) or in a more complex setup (see figure 2).

This overall conceptual idea is called Ra/Isis, Recurrent Architecture of Independent Single Immune Systems. The name also emphasizes that it is a setup which wraps around the core Isis algorithm.

[Figure not included in this excerpt]

Figure 2: Ra/Isis Concept: Example for Complex and Specialized Setups

In this work, only the single loop setup is used. Yet, the conceptual idea is much more general. Whether the more complex approach is useful is a question that may require further investigation. It may be especially useful in conjunction with other possible enhancements, some of which are outlined later.

### 3.2 The Ra Algorithms

In this section, the main algorithms for the proposed single-loop architecture are outlined. They are iterative, but that is just an implementation convenience; they could easily be made recursive. A recursive algorithm requires a termination condition, which should be defined in the context of the actual algorithm and its purpose, so no general definition is given here.

For such a definition, it is important to note that the raCreate algorithm does not guarantee that it can always create a single final cluster covering all input patterns. If two patterns are very distinct, they may never be grouped together because of the constraints set by the cross reactivity threshold. That is not a bug but a feature, and it depends strongly on the used shapespace.

#### 3.2.1 The Learning Algorithm: raCreate

1. Initialize the input antigens.

2. repeat for a given number of levels:

(a) Create a set of memory antibodies using the Isis algorithm.

(b) Save the memory antibodies.

(c) The just created memory set is used as input set for the next iteration.
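The loop above can be sketched as follows. This is an illustrative sketch only, not the actual implementation: the `runIsis` method is a hypothetical stand-in for running the real Isis algorithm on one level, and patterns are represented as plain `double[]` arrays rather than the `Matrix`/`PatternManager` classes described later.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the raCreate loop. runIsis() is a placeholder for the
// real Isis algorithm; it only stands in so the loop terminates.
public class RaCreateSketch {
    // Each hierarchical level is represented as a list of patterns.
    static List<List<double[]>> raCreate(List<double[]> antigens, int levels) {
        List<List<double[]>> hierarchy = new ArrayList<>();
        List<double[]> input = antigens;                 // step 1: initial antigens
        for (int level = 0; level < levels; level++) {   // step 2: repeat per level
            List<double[]> memory = runIsis(input);      // (a) create memory antibodies
            hierarchy.add(memory);                       // (b) save the memory set
            input = memory;                              // (c) use it as next input
        }
        return hierarchy;
    }

    // Placeholder only: halves the set instead of running Isis.
    static List<double[]> runIsis(List<double[]> input) {
        return new ArrayList<>(input.subList(0, Math.max(1, input.size() / 2)));
    }
}
```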

#### 3.2.2 The Classification Algorithm: raCluster

Each iteration of raCreate creates one hierarchical level. raCluster extracts the original input data patterns for each cluster in a given level.

1. The hierarchical level to be read out is given.

2. for each antibody Ab in that level do:

(a) Set the current set to the current antibody Ab.

(b) repeat until the lowest hierarchical level is reached:

i. Find all patterns of the next lower level that are matched by the antibodies in the current set.

ii. Set the current set to those found patterns.

(c) Add the current set and the corresponding current antibody Ab to the list of clusters.

3. Return the list of clusters.
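The descent described above can be sketched as follows, again only as an illustration: the `matches` predicate is a hypothetical stand-in for the real shapespace affinity test, a pattern is taken as matched if any antibody in the current set matches it, and the hierarchy is assumed to be stored as a list of levels with level 0 holding the original input patterns.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of raCluster. matches() is a placeholder for the real
// affinity / cross-reactivity test of the implementation.
public class RaClusterSketch {
    static boolean matches(double[] ab, double[] pattern) {
        return Math.abs(ab[0] - pattern[0]) <= 1.0;   // stand-in affinity test
    }

    static List<List<double[]>> raCluster(List<List<double[]>> hierarchy, int level) {
        List<List<double[]>> clusters = new ArrayList<>();
        for (double[] ab : hierarchy.get(level)) {        // step 2: each antibody Ab
            List<double[]> current = List.of(ab);         // (a) start from Ab
            for (int l = level - 1; l >= 0; l--) {        // (b) walk down the levels
                List<double[]> found = new ArrayList<>();
                for (double[] p : hierarchy.get(l))       // i. collect matched patterns
                    for (double[] cur : current)
                        if (matches(cur, p)) { found.add(p); break; }
                current = found;                          // ii. descend with them
            }
            clusters.add(current);                        // (c) record the cluster
        }
        return clusters;                                  // step 3
    }
}
```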

## 4 Implementation

The described single-loop architecture has been implemented for demonstration purposes in the context of the Chinese character pattern recognition task. The implementation was done in the Java programming language.

### 4.1 Implemented Classes

The implementation consists of six classes.

- raisis: This is the main class containing only a public static void main() method. It is used to invoke the various worker methods of the other classes.

- Matrix: This class handles a single attribute matrix. Besides methods for handling the attribute matrix, such as changing its contents or writing them to some output device, it provides methods for preprocessing, for calculating the shapespace transformations, and for somatic hypermutation.

- PatternManager: The PatternManager class extends the Java class Vector and provides a container environment for Matrix objects. It is used to provide and handle sets of antigens and antibodies. It offers methods for file input and output, for sorting the patterns according to their affinities using the QuickSort algorithm, and for applying the shapespace transformation to all patterns.

- Isis: This is the main class implementing the Isis algorithm. Additionally, it provides methods for each distance measure and a method to save a text file to disk showing which input patterns are mapped to each generated recognizer pattern.

- Ra: The architectural setup around the Isis algorithm is implemented in this class. It contains methods for creating the cluster hierarchy and for extracting information from it. These have been explained in the section about the Ra algorithms.

- Complex: Complex is just a helper class which implements complex numbers and methods for dealing with them. It is not necessary for the Ra/Isis algorithmic setup. In this sample implementation, the class is used for calculating a fast Fourier transformation (FFT), which in turn is used for calculating one specific shapespace. This Complex class was implemented by Andrew G. Bennett [Ben01].
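As a hedged illustration of one detail mentioned above, sorting patterns by affinity, the following sketch uses a standard comparator instead of the hand-written QuickSort of the real PatternManager; the `Pattern` record and its `affinity` field are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a pattern paired with an affinity value.
// The real implementation stores Matrix objects and sorts them
// with its own QuickSort.
record Pattern(double[] data, double affinity) {}

class AffinitySortSketch {
    static List<Pattern> sortByAffinity(List<Pattern> patterns) {
        List<Pattern> sorted = new ArrayList<>(patterns);
        // highest affinity first
        sorted.sort(Comparator.comparingDouble(Pattern::affinity).reversed());
        return sorted;
    }
}
```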

### 4.2 Invoking the Algorithms

#### 4.2.1 Isis

To invoke the Isis algorithm, the Isis class has to be instantiated. The class has two different constructors:

Listing 1: Isis constructors

[Listing not included in this excerpt]

The first one only supplies the antigens and causes the system to initialize the antibodies at random. The second constructor allows the initial antibodies to be given from outside.

That can be useful if one or more clusters are already known and the system should build the clustering around them. It may also be used to test a created set of antibodies on a different set of antigens.
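Since the listing itself is not included in this excerpt, the following is only a hypothetical sketch of what the two constructor forms described above could look like, with antigens and antibodies represented as plain `double[][]` arrays instead of the real PatternManager class.

```java
import java.util.Random;

// Hypothetical sketch of the two Isis constructor forms; the
// actual signatures are not shown in this excerpt.
public class IsisSketch {
    double[][] antigens;
    double[][] antibodies;

    // First form: only antigens given; antibodies initialized at random.
    IsisSketch(double[][] antigens) {
        this.antigens = antigens;
        Random rnd = new Random(42);   // fixed seed just for reproducibility here
        this.antibodies = new double[antigens.length][antigens[0].length];
        for (double[] ab : antibodies)
            for (int i = 0; i < ab.length; i++)
                ab[i] = rnd.nextDouble();
    }

    // Second form: initial antibodies supplied from outside, e.g. to
    // build the clustering around clusters that are already known.
    IsisSketch(double[][] antigens, double[][] antibodies) {
        this.antigens = antigens;
        this.antibodies = antibodies;
    }
}
```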

The next step is to call the setShapespace method to turn on the usage of a shapespace transformation. In the current implementation, it just sets a flag.

Listing 2: Isis.setShapespace method

[Listing not included in this excerpt]

The shapespace transformation to use is configured directly in the code of the Matrix class by changing the doShapespace() method. It is also possible to supply a set of self patterns that the antibodies are required not to match. This feature has not been used in the experiments shown in this document; possible uses are outlined in a later section.

Listing 3: Isis.setSelfPatterns method

[Listing not included in this excerpt]
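Since these listings are also missing from the excerpt, here is a hypothetical sketch of the two configuration methods as the surrounding text describes them: setShapespace only raises a flag (the transformation itself being chosen in Matrix.doShapespace()), and setSelfPatterns stores the set of patterns the antibodies must not match. Names and types are assumptions.

```java
import java.util.List;

// Hypothetical sketch of the Isis configuration methods described
// in the text; the real signatures are not shown in this excerpt.
public class IsisConfigSketch {
    boolean useShapespace = false;
    List<double[]> selfPatterns = List.of();

    void setShapespace() {
        // only a flag: the concrete transformation is configured
        // in the Matrix class (doShapespace() method)
        useShapespace = true;
    }

    void setSelfPatterns(List<double[]> self) {
        // patterns the generated antibodies are required not to match
        selfPatterns = self;
    }
}
```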

**[...]**

- Quote paper
- Stefan Schadwinkel (Author), 2005, A Recurrent Architecture of Independent Single Immune Systems, Munich, GRIN Verlag, https://www.grin.com/document/48178
