Vision-based pedestrian detection and estimation with a blind corner camera


Research Paper (undergraduate), 2006

82 Pages, Grade: 1,0


Excerpt


Contents

Abstract

1 Introduction
1.1 Background
1.2 An Approach for Blind Corner Pedestrian Detection
1.2.1 Pedestrian Detection
1.2.2 Blind Corner Problem
1.3 A Detection and Estimation System for the Blind Corner Problem
1.3.1 System Description
1.3.2 Basic Components
1.3.3 Detection and Estimation
1.4 Outline of the Thesis

2 Basic Components
2.1 Blind Corner Camera
2.1.1 General Description
2.1.2 Geometric Information and Field of View
2.2 Pedestrian Detection Method
2.2.1 Vision-Based Pedestrian Detection Methods
2.2.2 Neural Networks (NN)
2.2.3 Pedestrian Detection with a Convolutional Neural Network

3 Image Preprocessing
3.1 Detection Problems
3.1.1 CNN Testing
3.1.2 Evaluation of Test Results
3.1.3 Problem Summary
3.2 Image Padding
3.2.1 Method
3.2.2 Testing
3.2.3 Evaluation
3.2.4 Summary
3.3 Image Enhancement
3.3.1 Approaches
3.3.2 Methods and Testing
3.3.3 Evaluation
3.3.4 Summary

4 Tracking and Estimation
4.1 Tracking
4.1.1 Model and Approach
4.1.2 Testing
4.1.3 Evaluation & Improvement
4.1.4 Summary
4.2 Prediction
4.2.1 Model and Approach
4.2.2 Testing
4.2.3 Evaluation
4.2.4 Summary
4.3 Estimation
4.3.1 Model and Approach
4.3.2 Testing
4.3.3 Evaluation
4.3.4 Summary

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

A Algorithms

A.1 Image Enhancement Methods
A.1.1 Laplacian Filter
A.1.2 Averaging Filter
A.1.3 Median Filter
A.1.4 Gamma Correction
A.1.5 Histogram Equalization

A.2 Regression Analysis: Least Squares Method (LSM)

B Bibliography

List of Figures

1.1 Blind Corner area

1.2 Speed precondition

1.3 Gateway and intersection situation

1.4 System structure

2.1 Camera attachment and field of view

2.2 Geometric information about the Blind Corner camera field of view

2.3 The perceptron

2.4 Common activation functions of the perceptron

2.5 Multi-layer Perceptron (MLP)

2.6 Application of candidate extraction and NN classification

2.7 CNN architecture

3.1 CNN result

3.2 CNN threshold

3.3 Lower body feature definition

3.4 Result evaluation

3.5 Result evaluation by ROC curve

3.6 Camera related system problems

3.7 CNN related system problems

3.8 Performance of the CNN, applied to the Blind Corner Camera

3.9 Margin information in the training data

3.10 Image padding methods

3.11 Visual evaluation of a small test sequence

3.12 ROC curve evaluation of padding methods

3.13 Computing time for padded images

3.14 Comparison of training data images with testing data images

3.15 Angular and blurred regions in BCC images

3.16 Histogram of BCC desired feature (left) and training feature (right)

3.17 Convolution kernels for filter methods

3.18 Sigmoid mapping functions

3.19 Influence of gamma correction on histograms

3.20 Sharpening: Laplacian ROC result

3.21 Smoothing: results from averaging

3.22 Smoothing: median filter results

3.23 Gamma correction and sigmoid gray level transform results

3.24 Histogram equalization and adaptive histogram equalization results

4.1 The TRACKER program in the tracking process

4.2 Tracking model example

4.3 Flow chart overview of TRACKER

4.4 Flow chart of the tracking algorithm

4.5 Example for the tracking algorithm

4.6 Tracking of the CNN result

4.7 Tracking problems: crossing and distraction

4.8 Tracking problems: unstable CNN result

4.9 Reference, detection and tracking results (adapted threshold)

4.10 Detection result (threshold not adapted)

4.11 Detection result (threshold not adapted)

4.12 Detection result depending on the threshold

4.13 Center of gravity and root point of the detection frame

4.14 Problems with the center of gravity calculation

4.15 x/y trend calculated with the LSM method

4.16 Average x/y position of the last three detection frames

4.17 Prediction of new positions

4.18 Flow chart of the prediction function

4.19 Tracking with prediction

4.20 Comparison of tracking with prediction and tracking without prediction

4.21 Prediction result

4.22 The danger level function

4.23 Y-position areas in the camera image

4.24 The danger level model

4.25 Flow chart of the danger level estimation function

4.26 Danger level output

4.27 Danger level plot

A.1 Laplacian kernels

A.2 Examples for average filter kernels

Abstract

Avoiding collision accidents is becoming an increasingly important topic in the research field of driver assistance systems. Especially for vision-based detection systems, various approaches exist, built upon many different methods. This thesis deals with the avoidance of pedestrian accidents caused by Blind Corner view problems. The presented approach comprises a pedestrian detection subsystem, which is part of a larger camera system framework covering observation of the car environment. Based on a Blind Corner Camera and a neural network classification method, the research in this thesis focuses on two aspects: detection improvement and danger level estimation. Since vision-based classification methods are usually still unable to yield perfect results, owing to the complexity of this task, the detection result has to be improved by preprocessing and post-processing. In this work, first, the effects of image enhancement methods on detection are tested as preprocessing steps and, secondly, a new approach for a simple tracking and estimation strategy is presented, which improves detection as a post-processing method. Finally, information from tracking and prediction is used to estimate a danger level for pedestrians, which indicates how collision-prone the current situation is.

Chapter 1

Introduction

1.1 Background

Car accidents often cause tragic disruptions in the lives of the persons involved.

This is the reason why car manufacturers spend increasing amounts on research into car safety. Especially in the last 20 years, cars have become safer and safer for drivers and passengers. Thanks to the introduction of systems like belt pretensioners, ABS, airbags, or ESP, the number of serious or fatal injuries in car accidents has decreased.

Whereas a lot of research has been done for car passengers, research on safety systems for persons who are not inside the car at the time of the accident (e.g. pedestrians, cyclists or motorbike riders) has often been somewhat underrepresented until now. On the one hand, some passive safety systems already exist that are supposed to mitigate the consequences of a pedestrian or cyclist accident; for example, design features alleviate collisions with cars for these opponents. On the other hand, road users like pedestrians or cyclists are relatively unprotected in comparison to the passengers of a car, so that injuries in this group are often severe even when the accident car is equipped with such systems. These facts imply that accident-avoiding systems should have the greatest influence on the consequences of potential accidents with unprotected opponents such as pedestrians.

1.2 An Approach for Blind Corner Pedestrian Detection

As mentioned, this thesis deals with the problem of avoiding car accidents with pedestrians by pedestrian detection and estimation in the Blind Corner area. Therefore, this section provides background information about the Blind Corner problem.

1.2.1 Pedestrian Detection

In the research field of pedestrian detection, vision-based systems are the most common.

Since one or two single cameras cannot cover the complete area around the car, only the most dangerous areas are regarded as regions of interest. At present, frontal and rear cameras are the most conventional. The reason for this is that these cameras are supposed to examine the driving direction, i.e. the area where an accident would happen. In the case of the rear camera, another reason may be that the examined area is accident-prone because of view problems. In fact, rear cameras without a detection system are already available as accessories for some of the latest car models. Thus, pedestrian detection methods usually focus on accidents that can happen in the frontal or rear area.

1.2.2 Blind Corner Problem

Another dangerous area is the Blind Corner (which should not be mistaken for the “blind spot” area). Figure 1.1 shows the location of the Blind Corner in the area around the car: it lies at the right and the left side of the car front. Pedestrian accidents often happen when pedestrians cross the road and, thus, the movement direction of cars. Although the Blind Corner area is not an area where accidents happen, it

illustration not visible in this excerpt

Figure 1.1: Blind Corner area

is an area from which opponents can emerge right before an accident situation. Among other things (e.g. view problems), this fact makes the Blind Corner problem particularly interesting for accident avoidance.

View Problems

As the name “Blind Corner” suggests, a proper examination of this area is often difficult for the driver. View problems result from the following causes:

1. In most traffic situations the region to which the driver pays attention is the front. When driving straight, the Blind Corner area is usually not observed.
2. Moreover, many drivers cannot pay much attention to the side areas when it is necessary, because of distraction. Difficult situations like city traffic or unknown areas can strongly affect driving ability; in these cases the driver’s attention is distracted. Especially then, the Blind Corner area often remains disregarded, because a proper examination would consume too much time.
3. A specific problem at gateways or narrow intersections is that the Blind Corner area is occluded, e.g. by house walls. Limited view can also be a problem when there are obstacles like parked cars or trucks at the side of the road, because pedestrians sometimes use narrow spaces between parked cars to cross the road. Occlusion problems are also caused by the car design: in the case of the Blind Corner, this is the A-pillar. Such occluded views make it impossible for the driver to examine the areas from which pedestrians could come to cross the car’s path.

Accident Situations

Given that the average speed of pedestrians is slow in comparison to cars, pedestrians approaching from the Blind Corner area can only be involved in car accidents under certain circumstances: either the pedestrian has to be very close to the road, or the car may only be driving at a very slow speed.

An example of this important precondition is given in Figure 1.2. Assume that the cars on the left and on the right of Figure 1.2 are driving at a typical speed for cars (e.g. 50 km/h). From this example it becomes clear that accidents are unlikely to happen unless a pedestrian is walking very close to the road, as in the case on the right. Since the pedestrian in the case on the right is outside the Blind Corner area, it can be concluded that the car speed has to be very slow in the case of a Blind Corner accident (the high-speed case would rather be an application for a frontal-view camera).
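
The speed precondition above can be checked with a simple back-of-the-envelope calculation. The sketch below uses illustrative values only (an urban car speed of 50 km/h, a walking speed of about 5 km/h, and arbitrary distances); none of these numbers are measurements from this thesis.

```python
# Back-of-the-envelope check of the speed precondition. All speeds
# and distances are assumed illustration values, not measurements.

def time_to_reach(distance_m, speed_kmh):
    """Time in seconds to cover distance_m at speed_kmh."""
    return distance_m / (speed_kmh / 3.6)

# Car approaching at a typical urban speed, 20 m from the crossing point.
t_car = time_to_reach(20.0, 50.0)          # ~1.44 s

# Pedestrian walking at ~5 km/h, 5 m away from the road edge.
t_pedestrian = time_to_reach(5.0, 5.0)     # ~3.6 s

# The pedestrian cannot reach the road before the car has passed, so a
# collision is only plausible if the car is much slower or the
# pedestrian is already very close to the road.
```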

As the example implies, the Blind Corner area plays a more important role in accidents that happen in areas where cars drive at slow speeds, such as gateways or intersections. Another fact that makes gateways and intersections interesting for the Blind Corner problem is that these situations are characterized both by pedestrians approaching from the Blind Corner area and by the occurrence of view problems. Therefore, two of the most common situations for Blind Corner accidents are those shown in Figure 1.3. The left of Figure 1.3 shows a situation at a gateway where the car is about to start or is driving at slow speed in order to turn into the road. Especially here, view problems can cause accidents; distraction of the driver by traffic on the road is usually present in this situation as well. The right of Figure 1.3 shows a similar situation at an intersection. Again, the car is about to start or is driving at slow speed in order to cross the intersection. In situations like these it sometimes happens that signal lights and traffic rules are ignored or overlooked by drivers or pedestrians. View problems and distraction also increase the likelihood of an accident with pedestrians.

illustration not visible in this excerpt

Figure 1.2: Speed precondition

illustration not visible in this excerpt

Figure 1.3: Gateway and intersection situation

1.3 A Detection and Estimation System for the Blind Corner Problem

The goal of this thesis is to present an approach for a driver assistance system that should help to avoid pedestrian accidents by examining the Blind Corner area and assigning danger levels to detected pedestrians. Of course, it is not possible to cover the avoidance of every type of accident by examining the Blind Corner area alone. This system should rather be a subsystem that can be combined with other systems, e.g. frontal or rear systems, to form a greater framework of accident-avoiding systems.

1.3.1 System Description

The system is based on two external components: a camera and a special classification method, called “convolutional neural network” (CNN), for detecting pedestrians. The work is divided into a detection part, represented by preprocessing methods for the CNN, and an estimation part, which contains several post-processing methods. Figure 1.4 shows the system structure, describing the relations between the basic components and the detection and estimation parts.

illustration not visible in this excerpt

Figure 1.4: System structure

It should be mentioned that this thesis presents an approach rather than a complete system. One reason for this is that the classification method is still at the research stage, as is almost every other classification approach. Furthermore, the approach is presented as an offline system, since the CNN consumes considerable computing time.

1.3.2 Basic Components

Developing a pedestrian detection system requires at least two basic components: a sensor device to acquire information about the environment, and a classification method to process this information. Both of these basic components are external, which means that they are not part of this work itself.

Camera

Since a vision-based method was chosen, the sensor device had to be a camera. Frontal or rear cameras obviously cannot cover the Blind Corner area, so a different camera type had to be used. Conveniently, for some new Toyota car models (e.g. the Japanese model “Wish”) [URL 1] a camera accessory called “Blind Corner Camera” is available. For this reason, this specific camera was chosen. See section 2.1 for more detailed information about the camera.

Classification Method

Currently, a lot of research is being done on classification methods, and a wide variety of approaches exists. The system presented in this thesis uses a classification method called “convolutional neural network” (CNN). The purpose of the classification method is to detect objects or features in the camera image using a robust approach. More detailed information about pedestrian detection and the CNN can be found in section 2.2.

1.3.3 Detection and Estimation

Detection (preprocessing)

As mentioned, the CNN is still at the research stage. Thus, it is not yet able to produce a detection result adequate for a safety-relevant system. The detection part consists of finding and evaluating methods for improving the pedestrian detection result. Here, the approach is to improve the image quality by applying image enhancement methods to the image data and to solve regional CNN-related problems by padding the enhanced pictures with additional information.
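
As a rough illustration of these two preprocessing ideas, the sketch below applies a gamma correction (one of the image enhancement methods evaluated later) and then pads the image margins by border replication. The tiny 8-bit grayscale "image", the gamma value and the padding width are arbitrary illustration values; border replication is just one plausible padding method, not necessarily the one the thesis settles on.

```python
# Minimal sketch of enhancement followed by padding. All values are
# arbitrary illustration values (assumption, not the thesis setup).

def gamma_correct(img, gamma):
    """Apply gamma correction to an 8-bit grayscale image (nested lists)."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]

def pad_replicate(img, width):
    """Pad the image by replicating its border pixels."""
    # Extend each row horizontally, then replicate the top/bottom rows.
    rows = [[row[0]] * width + row + [row[-1]] * width for row in img]
    return [rows[0]] * width + rows + [rows[-1]] * width

image = [[10, 200], [50, 120]]
enhanced = gamma_correct(image, 0.5)   # gamma < 1 brightens dark pixels
padded = pad_replicate(enhanced, 1)    # 2x2 -> 4x4
```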

Estimation (post processing)

After information about pedestrians in the image data, or in the Blind Corner area respectively, has been acquired, this information can be used in a further step for estimation.

Estimation here means finding methods that automatically evaluate the detection result and categorize the present situation. The goal is to create a system that can decide whether a situation should be considered dangerous with regard to an accident or not.

1.4 Outline of the Thesis

The remainder of this thesis is organized as follows.

Chapter 2 describes the basic components in more detail. Information about the Blind Corner Camera is provided here, for example about its field of view and the results of geometric measurements. Furthermore, a general introduction to pedestrian detection methods with a focus on neural networks should help to familiarize the reader with the CNN. Chapter 3 deals with approaches for improving the detection result using preprocessing methods. In order to provide a better understanding of the later preprocessing approaches, the major problems and peculiarities that arise when the CNN is applied to the Blind Corner Camera are outlined first. On that basis, preprocessing approaches using padding and image enhancement methods are explained and their evaluation results are discussed.

Chapter 4 represents the main part of the thesis. Here, novel methods for tracking, prediction and danger level estimation are introduced and test results are evaluated and discussed.

Chapter 5 contains the conclusion about this work and gives an outlook for future work on this pedestrian detection system.

Chapter 2

Basic Components

As described, the system in Figure 1.4 uses two external components. These components and their usage are described in this chapter in more detail. Section 2.1 gives an account of the Blind Corner Camera. After that, knowledge-based pedestrian detection systems with a focus on neural networks are introduced in section 2.2.

2.1 Blind Corner Camera

One of the first decisions to be made when developing a vision-based pedestrian detection system is choosing the type of camera. The camera type should be understood here first from the application point of view (i.e. regarding the area that should be covered) and secondly from the algorithm point of view (i.e. choosing between a monocular camera and a stereo camera). As mentioned before, a special type of camera has to be used to examine the Blind Corner area. A monocular camera designed exactly for this purpose is the “Blind Corner Camera” (in this thesis also referred to as BCC) made by Sumitomo Electric Industries [URL 2]. This particular camera is manufactured for Toyota and available for some new Toyota car models on the Japanese market.

2.1.1 General Description

The top of Figure 2.1 shows the Blind Corner Camera and how it is attached to the car. As shown here for the Toyota model “Wish”, the camera is attached on top of the front side bumper. The pictures at the bottom of the figure show the field of view covered by the camera. On the left side, the Blind Corner area is shown again. The camera’s field of view covers both sides of this area by means of a special construction with a prism inside. A screenshot of an image captured by the camera is shown in the right picture. Here it can be seen how the two sides of the Blind Corner area (that is, the two Blind Corners) are separated by the prism. Unfortunately, Sumitomo Electric provides no information about the lens or sensor types. Thus, the inner workings of the Blind Corner Camera could not be understood or analyzed in detail, which would have been helpful for choosing image enhancement methods.

illustration not visible in this excerpt

Figure 2.1: Camera attachment and field of view

Since the Blind Corner Camera is only a sensor device without any recording function, a commercially available video camera was used to record the sensor output at a resolution of 720 by 480 px.

2.1.2 Geometric Information and Field of View

Since this information is useful for the application, some geometric information about the relation between points in the camera image and points in the area covered by the camera field of view is given here. This information was obtained by measurements (they are not explained in detail, because they are not part of the thesis itself). Figure 2.2 shows some of the measurement results, which give a geometric account of the camera field of view. The left picture at the top of the figure shows that the horizontal view angle is 26° for each Blind Corner and that the vertical view angle is 39°. The right top picture is a screenshot of the right part of the camera image, taken to assess the distortion of the camera. The recorded object is a pattern of regular squares. According to the screenshot, the lens of the camera creates some distortion at the margins of the camera image, which is why the pattern does not appear regular. Finally, the table at the bottom of Figure 2.2 shows the heights of objects at certain distances. Here, a height of 1.75 m represents an average person and 1 m the average height of the legs. It should be noted that very close pedestrians cannot be fully displayed in the camera image (for example, an average person at a distance of 2 m) and that the ground is not visible until a distance of about 1.8 m, if the camera is attached at bumper height (about 60 cm) and adjusted parallel to the ground.
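
The relation between object height, distance and image size can be approximated with a simple pinhole model, using the measured vertical view angle (39°) and the 480 px recording height mentioned above. This ignores the BCC's prism and the lens distortion just described, so it is only a plausibility sketch under those assumptions, not a reproduction of the measurements in Figure 2.2.

```python
import math

# Rough pinhole-model estimate of the projected height of an upright
# object in the camera image. Assumes an ideal (distortion-free) lens,
# which the BCC is not; illustration only.

V_FOV_DEG = 39.0          # measured vertical view angle
IMAGE_HEIGHT_PX = 480     # recording resolution height

def apparent_height_px(object_height_m, distance_m):
    """Approximate projected height of an upright object in pixels."""
    focal_px = (IMAGE_HEIGHT_PX / 2) / math.tan(math.radians(V_FOV_DEG / 2))
    return focal_px * object_height_m / distance_m

# A 1.75 m person: the projection shrinks with 1/distance.
h5 = apparent_height_px(1.75, 5.0)
h10 = apparent_height_px(1.75, 10.0)
```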

Apart from this, it is also important to note that this research focuses only on adult persons. The reasons for this are that the CNN component is only trained to detect adult persons and that this system is intended as a prototype covering the general case.

2.2 Pedestrian Detection Method

In principle, many sensor systems can be used to detect objects in the car environment. Current research on pedestrian detection mainly uses the following methods: vision-based systems, infrared-vision-based systems, radar and ultrasonic sensors. Of these approaches, vision-based systems are often preferred, because they yield relatively good results and use comparatively cheap hardware. For example, infrared-band cameras are much more expensive than visible-band cameras.

illustration not visible in this excerpt

Figure 2.2: Geometric information about the Blind Corner camera field of view

2.2.1 Vision-Based Pedestrian Detection Methods

Categorizing pedestrian detection methods is not easy, since there are many different approaches and a whole system often consists of several processing steps. Generally, detection methods can be broken down into three parts: candidate extraction, classification and tracking. However, not all three subdivisions are implemented in every detection method.

Candidate Extraction Methods

Candidate extraction means analyzing the image and dividing it into regions of two types: candidate regions and non-candidate regions. Candidate regions contain only objects that hold pedestrian features, while non-candidate regions simply form the remainder of the image. The features themselves can be represented by various approaches, for example by filter methods [1] or by segmentation methods [2].

Classification Methods

The classifier is the most important component of the system. The task of the classification method is to evaluate the extracted candidates and to give a response in the form of an output score. In the ideal case, the higher the score, the more likely it is that the candidate is a pedestrian. Most classification methods are learning-based, which means that they have to be trained to produce a reasonable output. Using a boosting algorithm like Real AdaBoost [3] can support the training and thus improve the performance of the classifier. Very common among classification methods are support vector machines (SVM) [4] [5] [6], but to some extent other methods are used as well, e.g. neural networks (NN) or template matching. The proposal of hierarchical methods using cascaded classifiers to increase the speed of detection systems should also be mentioned [1].

Tracking Methods

Tracking, or multi-frame processing, is often rather an extension of pedestrian detection systems, since it usually operates on the detection result and does not perform any classification itself. Tracking can thus be used either as feedback to define regions of interest for the classifier, or to increase the system speed by applying the detection algorithm less frequently and using a prediction strategy instead. Another benefit is that tracking provides time-related information about detected objects, which can be used to analyze the behavior of tracked objects over time. A widely used method for tracking is the Kalman filter [6] [7], but simple methods such as first-order prediction can also be used [8].
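
The first-order prediction idea mentioned above can be sketched in a few lines: the next object position is extrapolated from the last two observed positions under a constant-velocity assumption. The (x, y) detection-frame centers below are arbitrary illustration values.

```python
# Minimal sketch of first-order (constant-velocity) prediction.
# Positions are (x, y) detection-frame centers; values are illustrative.

def predict_first_order(prev, last):
    """Predict the next (x, y) position from the last two observations."""
    vx = last[0] - prev[0]
    vy = last[1] - prev[1]
    return (last[0] + vx, last[1] + vy)

track = [(100, 240), (104, 238)]           # two most recent detections
predicted = predict_first_order(*track)    # -> (108, 236)
```

A Kalman filter refines the same idea by weighting prediction and measurement according to their uncertainties, at the cost of more bookkeeping.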

2.2.2 Neural Networks (NN)

This work is built upon a novel pedestrian detection approach using a CNN. The CNN is a knowledge-based system, which combines the candidate extraction method and the classification method. This section gives a brief introduction to neural networks (NN) in order to provide a better understanding of the CNN.

General Description of NN

The philosophy behind NNs is to model a simplified biological nervous system in an artificial way, with the intention of creating weak artificial intelligence. Therefore, NNs are also referred to as artificial neural networks (ANN).

The network consists of a number of small components called “neurons”, which are interconnected. The intelligence of the network is represented by its ability to learn to classify an input with a certain output score, an ability that is acquired in a process called training.

The Perceptron

The smallest part of a NN is the neuron. Taken as a single unit, the neuron is referred to as a perceptron. As shown in Figure 2.3, the perceptron consists of several inputs (i_j), edges with weights (w_j) including an offset (w_b), an activation function (f) and an output (o).

The functional description of a perceptron is the following:

o = f( Σ_j w_j · i_j + w_b )

where the activation function f is often chosen to be one of the functions in Figure 2.4. A perceptron can be compared to a single nerve cell, which generates an impulse signal depending on its stimulation.

illustration not visible in this excerpt

Figure 2.3: The perceptron

illustration not visible in this excerpt

Figure 2.4: Common activation functions of the perceptron
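
The perceptron described above can be transcribed almost directly into code. The sketch below uses the sigmoid (one of the common activation functions of Figure 2.4); the input and weight values are arbitrary examples.

```python
import math

# Direct transcription of the perceptron: inputs i_j, weights w_j,
# an offset w_b, and an activation function f. Values are arbitrary.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def perceptron(inputs, weights, offset, f=sigmoid):
    """o = f( sum_j w_j * i_j + w_b )"""
    s = sum(w * i for w, i in zip(weights, inputs)) + offset
    return f(s)

o = perceptron(inputs=[0.5, -1.0], weights=[0.8, 0.3], offset=0.1)
```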

NN Architecture

There is a variety of NN architectures, which can roughly be distinguished by the following characteristics: the number of neurons, the number of layers, the connection of the edges and the type of activation function in the neurons. The task of the network is to define a function that maps the input vector X to the output vector Y. The simplest architecture is the single-layer perceptron. It consists of a single layer of parallel neurons, which are usually all connected to the input vector X.

Figure 2.5 shows a more sophisticated network, the multi-layer perceptron (MLP).

illustration not visible in this excerpt

Figure 2.5: Multi-layer Perceptron (MLP)

The MLP architecture consists of an input layer of neurons, which is connected to the input vector, an output layer, which is connected to the output vector, as well as one or more hidden layers between the input and output layers. Like the simple perceptron, MLPs are usually fully connected and feed-forward, i.e. each neuron of one layer is connected to each neuron of the next layer. The number of neurons and hidden layers has to be chosen carefully, because a too simple architecture may not be able to approximate the desired result (underfitting problem), while a too complex architecture may fit the training data too closely (overfitting problem).
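
The fully connected, feed-forward structure can be sketched as a chain of layers, where each neuron of one layer receives every output of the previous layer. The weights and layer sizes below are arbitrary example values, not a network from this thesis.

```python
import math

# Sketch of a fully connected, feed-forward MLP: one hidden layer,
# one output layer. Weights and offsets are arbitrary examples.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows, offsets):
    """One fully connected layer: each row of weights is one neuron."""
    return [sigmoid(sum(w * i for w, i in zip(row, inputs)) + b)
            for row, b in zip(weight_rows, offsets)]

def mlp(x):
    hidden = layer(x, [[0.5, -0.2], [0.3, 0.8]], [0.0, -0.1])  # hidden layer
    return layer(hidden, [[1.0, -1.0]], [0.2])                  # output layer

y = mlp([1.0, 0.5])   # a single score between 0 and 1
```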

Further architectures of NNs are, for example, radial basis function networks (RBFN), ADALINE networks or Hopfield networks.

Training of NN

As mentioned before, a NN is a knowledge-based system, which means that it is capable of acquiring knowledge by learning. Neural networks can be used for several tasks, e.g. pattern recognition, function approximation or process control. In all application cases, NNs first have to learn their task through training before they can be put into practice.

In theory, learning can be subdivided into three categories: supervised learning, unsupervised learning and reinforcement learning. Supervised learning means learning from a given set of examples (X, Y) of input values X and desired output values Y. In contrast, unsupervised learning uses a set of learning data that contains only input values X; in this case, the minimization of a cost function determines the desired output. Finally, reinforcement learning is performed by interacting with the environment instead of using a set of given learning data. It can roughly be described as a procedure of performing actions and being “rewarded” by a learning algorithm according to the result.

Learning algorithms usually try to solve an optimization problem by minimizing a cost function. However, the result produced by a learning algorithm is not always optimal. The reason is that many learning algorithms use a gradient descent procedure (e.g. the back-propagation algorithm), which can terminate in local minima. The effect of training on the NN is that the weights of the edges are adapted over several steps until the result is satisfactory. It should be noted that almost all NNs have in common that they are too complex for their exact way of working to be understood after training. Thus, surprising reactions can occasionally occur when they are put into practice.
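
The weight-adaptation idea can be illustrated on the smallest possible case: gradient descent on the squared error of a single sigmoid neuron (back-propagation extends this same update rule to multiple layers). The training pairs, learning rate and step count below are toy values chosen for illustration.

```python
import math

# Minimal supervised-learning sketch: gradient descent on the squared
# error of one sigmoid neuron. Toy data, not from the thesis.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(pairs, steps=2000, lr=1.0):
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, target in pairs:
            o = sigmoid(w * x + b)
            grad = (o - target) * o * (1 - o)  # dE/ds for E = 0.5*(o-t)^2
            w -= lr * grad * x                  # adapt the edge weight
            b -= lr * grad                      # adapt the offset
    return w, b

# Learn a simple threshold: inputs near 1 -> 1, inputs near 0 -> 0.
w, b = train([(0.0, 0), (0.25, 0), (0.75, 1), (1.0, 1)])
```

Because the cost surface of deeper networks is not convex, the same procedure can get stuck in local minima, which is the caveat noted above.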

Digital Image Processing with NN

As we already know, an important component of a vision-based detection system is the classifier. A well-working classifier has to meet high demands. It should be able to recognize patterns in images that are subject to changes in position, shape, size, color, illumination and, ideally, occlusion, among others. Thus, the major problem in development is that it is very hard or almost impossible to describe patterns such as pedestrians with conventional methods. Because of their high generalization ability and their ability to learn, knowledge-based systems like SVMs or NNs are the method of choice when building a classifier for complex patterns like pedestrians.

In the application case, candidate features are extracted in a first step, as explained in section 2.2.1, and then fed to the trained NN as an input vector. Since the input vector of the NN has to be of a defined size (for example 60x30 pixels), every candidate feature has to be normalized to this size. After processing the input vector, the NN gives an output in the form of a score value, which usually ranges from 0.0 to 1.0. See Figure 2.6 for the whole process.

illustration not visible in this excerpt

Figure 2.6: Application of candidate extraction and NN classification
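
The normalization step mentioned above, resampling every candidate to the fixed classifier input size (e.g. 60x30 pixels), can be sketched with a simple nearest-neighbor resize. The candidate size and pixel values are arbitrary, and the thesis does not specify which resampling method is actually used, so this is only one plausible realization.

```python
# Sketch of candidate normalization to a fixed classifier input size.
# Nearest-neighbor resampling is an assumption for illustration.

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a grayscale image (nested lists)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

candidate = [[p for p in range(90)] for _ in range(200)]  # 200x90 cut-out
normalized = resize_nearest(candidate, 60, 30)            # 60x30 input
```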

As a training method for NNs in classification tasks, supervised learning is usually chosen. This means that example pairs of input vectors and output values are needed, represented by manually cut-out parts of images and the binary values true (in case of the target feature) or false (in case of any other feature).

2.2.3 Pedestrian Detection with a Convolutional Neural Network (CNN)

CNN Description

The CNN proposed by LeCun [9] is an NN with an architecture similar to MLP networks. The main difference from the MLP is that the first hidden layers of the CNN are not fully connected (as is common for MLPs) and have shared weights. According to [9], the benefit of the CNN architecture is that it provides high shift, scale and distortion invariance.

Figure 2.7 shows an example of this architecture. The first, second, third and fourth layers contain regions which are directly connected to regions of contiguous layers. Without these directly connected layers, the input layer, the fifth hidden layer and the output layer form an MLP architecture. The idea behind this architecture is that several (three in the proposal of LeCun) different convolution methods can be applied to the input information in parallel and in several stages (two in [9]), before the information is processed with fully connected layers.

illustration not visible in this excerpt

Figure 2.7: CNN architecture

Image Processing with a CNN

Several tests have shown that CNNs can yield better performance than other NNs (e.g. MLPs) in image processing applications [9]. One reason for this may be that the first layers, also called “feature extraction layers”, perform a feature extraction similar to convolutional filter methods [10] and thus offer a higher generalization ability. After the feature extraction layers, one or more hidden layers perform the classification part. Thus, CNNs can be seen as a combination of a feature extraction method and a classification method. For a more detailed description and an account of the performance of the CNN see [11].
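The filter-like behaviour of the feature-extraction layers can be illustrated with a plain 2-D sliding-window filter, where one shared kernel is applied at every image position; the kernel values below are illustrative, not those of the trained CNN:

```python
def convolve2d(image, kernel):
    """Apply one shared kernel at every valid image position (without
    kernel flipping, i.e. cross-correlation, as is common in CNNs).
    This mimics a CNN feature-extraction layer with shared weights."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]

# Illustrative vertical-edge-like kernel on a tiny image with one edge.
kernel = [[0, 1], [0, 1]]
img = [[0, 0, 5, 5]] * 3
feat = convolve2d(img, kernel)
print(feat[0])  # → [0, 10, 10]
```

Because the same kernel weights are reused at every position, the response to a pattern is the same wherever it appears in the image, which is the source of the shift invariance mentioned above.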

For the application, two CNNs were available. The first one had been trained to detect full pedestrians (as described in [11]) and the second one had been trained to detect the lower part of the body, that is, the body up to the waist. Because of the limited view of the BCC for near pedestrians (section 2.1.2), the lower-body CNN was chosen.

Chapter 3

Image Preprocessing

According to the system description in Figure 1.4, the first part of the system should improve the detection result of the CNN with the help of preprocessing methods. After outlining the major detection problems in section 3.1, section 3.2 presents an approach to solving the margin problem. Finally, section 3.3 shows how common image enhancement methods influence the CNN detection result.

3.1 Detection Problems

Before dealing with preprocessing, it has to be outlined what the specific problems of the system are and where they occur. Thus, this section gives an account of detection problems and weaknesses of the CNN, as well as some camera-related difficulties, in order to provide a basis for thinking about solutions to these problems.

3.1.1 CNN Testing

Before the testing procedure is explained, it is important to mention that the CNN used in this work as a basic component is trained with features of the lower body (see also section 2.2.3). CNN training itself is not part of the thesis and is therefore not considered here, except for the feature definition, since it is important for the evaluation procedure.

Recording and Capturing

The first step required for the CNN tests was to record image material with the blind corner camera. To this end, several scenes were recorded in outdoor areas of Tokyo. In order to make the later evaluation easier, most of the scenes were taken of predefined actions, such as approaching the camera, walking away and crossing persons, and with a predefined number of persons (e.g. the approach of one single person). For recording, the camera was either mounted on a test car on top of the bumper or placed on a camera stand at bumper height.

[...]

Excerpt out of 82 pages

Details

Title: Vision-based pedestrian detection and estimation with a blind corner camera
College: University Karlsruhe (TH)
Grade: 1,0
Author: Bastian Hartmann
Year: 2006
Pages: 82
Catalog Number: V176752
ISBN (eBook): 9783640981472
ISBN (Book): 9783640981618
Language: English
Keywords: pedestrian detection, image processing, tracking, camera
Quote paper: Bastian Hartmann (Author), 2006, Vision-based pedestrian detection and estimation with a blind corner camera, Munich, GRIN Verlag, https://www.grin.com/document/176752
