Excerpt

## Content

1 Introduction

1.1 Motivation

2 The Architecture of ConvNets and Data Processing

2.1 The Convolutional Layer

2.1.1 Hyperparameters and filter weights

2.1.2 Activation functions and Biases

2.2 The Pooling Layer

2.3 The Fully-Connected Layer

2.4 Processing of colored images

3 Advantages of Convolutional Neural Networks

3.1 Parameter Reduction

3.1.1 Weight Sharing in Convolutional Layers

3.1.2 Dimensionality Reduction via Pooling

3.2 Object Detection

4 Application to the MNIST Dataset

5 Summary

6 Literature

7 Appendix

7.1 Python Code

## 1 Introduction

Over the past two decades in particular, artificial neural networks have produced new approaches and methods in many areas of machine learning, replacing many established techniques and in some areas even exceeding human performance. Impressive progress has been made in image recognition and classification, above all through the introduction of convolutional neural networks (ConvNets), a class of neural networks. The first ConvNet was developed by LeCun et al. in 1989.^{1} ConvNets were designed specifically for image processing and therefore have a distinctive architecture. Due to their structure and functionality, ConvNets are particularly well suited to this field of application compared to other methods.

### 1.1 Motivation

Ordinary neural networks basically consist of an input layer, a series of hidden layers, and an output layer. Each hidden layer consists of a number of neurons, and each neuron is connected to every neuron of the preceding and following layer. The neurons operate independently and do not share weights with other neurons of the same layer. The output layer represents the results, for example the class scores in classification tasks. Although impressive results are achievable in many applications using classic neural networks, they are not suitable for all scenarios due to their rather simple structure.

[Figure not included in this excerpt]

Figure 1: Image processing in ordinary neural networks^{2}

For example, consider a set of 1000 x 1000-pixel images that have to be classified using an ordinary neural network. Each pixel of an image corresponds to a neuron of the input layer, so the first layer alone consists of one million neurons. Hence, for every additional neuron in the first hidden layer, one million edge weights would have to be recalculated in each training step. For colored images, the computational effort becomes even more acute, because each pixel is mapped into a three-dimensional color space. At such magnitudes, a normal computer quickly reaches its limits. ConvNets compensate for this disadvantage. They are similar to ordinary neural networks: they have an input layer, a certain number of hidden layers, and an output layer, and their layers are likewise equipped with neurons, weights, and biases. The main difference lies in the architecture and the way the first few hidden layers work. In contrast to ordinary neural networks, most layers in ConvNets are only locally connected to the preceding layers, which significantly reduces the training effort. The following section explains the individual components of a ConvNet and its way of processing data.
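The parameter arithmetic above can be sketched in a few lines. The 1000 x 1000 image is the example from the text; the 3 x 3 filter size is a hypothetical choice for comparison:

```python
# One dense neuron needs a weight for every input pixel of a
# 1000 x 1000 grayscale image; a convolutional filter's weights are
# shared across all positions, so its count is independent of image size.
dense_weights_per_neuron = 1000 * 1000      # 1,000,000 weights
conv_weights_per_filter = 3 * 3             # 9 shared weights (hypothetical 3x3 filter)
ratio = dense_weights_per_neuron // conv_weights_per_filter
print(dense_weights_per_neuron, conv_weights_per_filter, ratio)
```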

## 2 The Architecture of ConvNets and Data Processing

ConvNets essentially consist of filter layers (called *convolutional layers*) and aggregation layers (called *pooling layers*), which alternate repeatedly. They are followed by one or more fully connected layers.

[Figure not included in this excerpt]

Figure 2: An example ConvNet architecture^{3}

In the following, the case of grayscale images is dealt with first. To process colored images with ConvNets, only minimal changes are necessary; the general structure and the individual processing steps remain the same.

### 2.1 The Convolutional Layer

First, the input matrix is analyzed by a predefined number of *filters* (also called kernels) of fixed size. During processing, they slide like a window with a constant step size (called the *stride*) over the pixel matrix of the input, moving from left to right and jumping to the next row down after each pass. *Padding* determines how the filter behaves when hitting the edges of the matrix.

[Figure not included in this excerpt]

Figure 3: Stride and Padding^{4}

If padding is used, a margin of zeros is added around the original matrix. While processing, this allows the original size of the input to be retained.
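The padding step can be sketched with NumPy (the 3 x 3 input values are made up for illustration):

```python
import numpy as np

# Zero-padding with p = 1: a margin of zeros around the original matrix,
# so a 3x3 filter with stride 1 produces an output of the original size.
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)
```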

[Figure not included in this excerpt]

Figure 4: The convolution operation^{5}

The filter has a fixed weight for every point in its viewing window, and these weights do not change while the filter runs across the input matrix. As a result, the *feature maps* are calculated by *convolution*. The convolution operation consists of forming pointwise products between the filter weights and the values of the local image section and then summing them up. Given an *m* x *n* input image *I* and a filter *K* of dimensions *h* x *w*, the discrete 2D convolution at point (*i*, *j*) is defined by:

$$(I * K)(i, j) = \sum_{u=1}^{h} \sum_{v=1}^{w} I(i + u - 1,\, j + v - 1) \cdot K(u, v)$$

The dimensionality of the output matrix depends on the input size *n*, the filter size *f*, the padding (*p*) and the stride (*s*). For a square input and filter, each output side length is:

$$n_{\text{out}} = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$$

For example, a stride of 2 in combination with a filter size of 2 x 2 halves each spatial dimension, so the output matrix has a quarter of the entries of the input matrix. The output of the convolutional layer serves as input for the subsequent pooling layer.^{7}
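The convolution described above can be sketched as a plain NumPy loop. This is a minimal reference implementation for illustration, not how libraries compute it in practice; like most ConvNet libraries, it slides the filter without flipping it:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over the image; at each position, multiply the
    filter weights elementwise with the local image section and sum."""
    if padding > 0:
        image = np.pad(image, padding, mode="constant")
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + kh,
                           j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
# Stride 2 with a 2x2 filter: each side is halved, a quarter of the entries.
print(conv2d(image, kernel, stride=2))  # [[10. 18.] [42. 50.]]
```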

#### 2.1.1 Hyperparameters and filter weights

The hyperparameters of a ConvNet include the number of filters, the size of the filter windows, the stride, and the padding. The number of filters can be interpreted as the number of features to be detected. It is typically a power of two between 32 and 1024 and is usually set to 32 or 64. The filter dimensions are not fixed and are by convention often set to a small odd number, depending on the application.^{8} One general approach is to use larger filters on high-dimensional data and smaller filters otherwise. Another approach is to start with small filters and gradually increase their size in subsequent layers. The filter weights are chosen randomly at the beginning and optimized further during training; negative values are also permitted. Stochastic gradient descent in combination with backpropagation is commonly used as the training method. By minimizing the loss function, an optimal (or at least locally optimal) set of weights can be found.

As mentioned in a previous section, padding prevents the shrinking of output matrices by adding zeros around the input image before sliding the filter window across it. It can be considered a trade-off between information loss and reduced dimensionality.^{9} The other hyperparameter of a convolutional layer mentioned above is the stride, which indicates the number of pixels the window moves in each step.^{10} As with padding, there is a trade-off between information loss and dimensional shrinkage to reduce computational effort: large stride values decrease the amount of information passed to the next layer. The most common way to optimize these hyperparameters is to use a validation set.
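The stride/padding trade-off can be checked numerically. This is a sketch assuming the standard output-size formula; the 28-pixel input is a hypothetical example:

```python
def conv_output_size(n, f, p, s):
    """Output side length for input size n, filter size f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

# Stride 1 with "same" padding p = (f - 1) // 2 preserves the input size:
print(conv_output_size(28, 3, p=1, s=1))  # 28
# A larger stride shrinks the output, reducing computation but also
# the amount of information passed to the next layer:
print(conv_output_size(28, 3, p=0, s=2))  # 13
```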

#### 2.1.2 Activation functions and Biases

Before the results of a convolutional layer are loaded into the next layer, they usually go through a further intermediate step.

[Figure not included in this excerpt]

Figure 5: Bias and RELU-Activation Function^{11}

A bias is assigned to each filter. In a ConvNet, the bias is added elementwise to the individual values of the result matrix, and the sum is passed through an activation function. Its function value serves as input for the pooling layer. ReLU or sigmoid is commonly used as the activation function.
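The bias-and-activation step can be sketched as follows (the feature-map values and the bias are made up for illustration):

```python
import numpy as np

# The filter's scalar bias is added elementwise to its feature map,
# then the ReLU activation max(0, x) is applied before pooling.
feature_map = np.array([[ 2.0, -1.5],
                        [-3.0,  0.5]])
bias = 1.0
activated = np.maximum(0.0, feature_map + bias)  # ReLU zeroes negative sums
```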

### 2.2 The Pooling Layer

A pooling layer aggregates the results of the convolutional layer. Its effect is to pass only the most relevant signals from a given pixel region to the next layers. There are several pooling techniques to choose from; the most commonly used are *Max-Pooling* and *Average-Pooling*. In a Max-Pooling layer, the highest value in each window is selected and all others are discarded; accordingly, average pooling uses the mean of the convolutional values. This pooling process not only reduces the computational effort but also protects against overfitting.

[Figure not included in this excerpt]

Figure 6: Pooling Operation^{12}

For example, a 2 x 2 Max-Pooling layer reduces each 2 x 2 block of a convolutional layer's output to a single number (the maximum value). This provides a more abstract representation of the content and significantly reduces the amount of data to be processed.^{13}
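Non-overlapping max-pooling can be sketched in NumPy (the input values are made up for illustration):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Max-pooling with a size x size window and matching stride:
    each block is reduced to its maximum value."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [1, 2, 9, 8],
               [0, 1, 7, 3]])
print(max_pool(fm))  # [[6 5] [2 9]]
```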

There are different ConvNet designs in which the exact sequence of convolutional and pooling layers varies strongly.^{14}

The last pooling layer is followed by one or more fully connected layers.

### 2.3 The Fully-Connected Layer

The fully connected layer has the structure of an ordinary neural network: all neurons are connected to all neurons of the previous and subsequent layers. In order to load the outputs of the pooling layer into a fully connected layer, they first have to be rolled out into a vector. This process is known as *flattening*.

*[Figure not included in this excerpt]*

Figure 7: Flattening

The individual matrix values of the various pooling outputs are transformed into one large input vector, which then serves as the input for the first layer of the neural network. As in ordinary neural networks, there may be additional hidden layers and an output layer.^{15} In classification problems, this output layer usually has a softmax activation function. That is, the outputs of all neurons in the last layer add up to one and indicate the probability of class membership. The number of neurons in the output layer corresponds to the fixed number of classes. The edge weights between neurons are chosen randomly at the beginning and optimized further during the training phase. Stochastic gradient descent in combination with backpropagation is the most commonly used training method, and *categorical cross-entropy* is used to measure the error.
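Flattening and a softmax output layer can be sketched as follows (the shapes and random weights are placeholders, not a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.standard_normal((8, 4, 4))        # 8 pooled feature maps of size 4x4
x = pooled.reshape(-1)                         # flattening: one 128-dim vector

W = rng.standard_normal((10, x.size)) * 0.01   # 10 classes, untrained weights
b = np.zeros(10)
logits = W @ x + b
probs = np.exp(logits - logits.max())          # numerically stable softmax
probs /= probs.sum()
print(probs.shape, probs.sum())  # class probabilities add up to 1
```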

**[...]**

^{1} Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard,W., & Jackel, L. J. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1, 541–551

^{2} Nikolić, Zoran (2019). Convolutional Neural Networks, URL: http://www.mi.uni-koeln.de/wp-znikolic/wp-content/uploads/2019/06/11-Odenthal.pdf, June 22, 2020

^{3} Torres.AI, Jordi (2018). Convolutional Neural Networks for Beginners, URL: https://towardsdatascience.com/convolutional-neural-networks-for-beginners-practical-guide-with-python-and-keras-dc688ea90dca, June 22, 2020

^{4} Based on: Nikolić, Zoran (2019)

^{5} Based on: Nikolić, Zoran (2019)


^{7} Skalski, Piotr. (2019). Gentle Dive into Math Behind Convolutional Neural Networks, URL: https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9, June 22, 2020

^{8} Common sizes are 2 x 2, 3 x 3, 5 x 5 and 7 x 7.

^{9} Padding is almost always used to prevent information loss.

^{10} The stride is usually set to one.

^{11} Based on: Nikolić, Zoran (2019)

^{12} Based on: Nikolić, Zoran (2019)

^{13} Becker, Roland. (2019). Convolutional Neural Networks – Aufbau, Funktion und An-wendungsgebiete, URL: https://jaai.de/convolutional-neural-networks-cnn-aufbau-funktion-und-anwendungsgebiete-1691/, June 22, 2020

^{14} Li, Z., Yang, W., Peng, S., & Liu, F. (2020). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. Retrieved from University Library Archives, June 22, 2020

^{15} Sometimes an additional dropout layer is used for regularization.

Anonymous (2020). The Architecture of Convnets and Data Processing. Advantages of Convolutional Neural Networks. Munich: GRIN Verlag. https://www.grin.com/document/914160
