In this project, I created and augmented a dataset from a set of given images in order to train and test a convolutional neural network that classifies images of scanned documents into five classes. To generate the dataset, I applied several image processing techniques: sliding-window cropping, rotating, flipping and pyramid-sizing. The result of this phase is a set of images that all have the same size, 244x224x3. After being labeled, these images were divided into three datasets for training, validating and testing the network.
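The dataset generation steps named above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the report: the window size, stride, number of pyramid levels, 90-degree rotation angle and 70/15/15 split are all assumptions, and every function name is hypothetical. The image is a list of pixel rows; each pixel could equally be an RGB tuple.

```python
import random

def sliding_windows(image, size, stride):
    """Yield size x size crops of an image given as a list of pixel rows."""
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield [row[left:left + size] for row in image[top:top + size]]

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image clockwise by 90 degrees (assumed angle)."""
    return [list(column) for column in zip(*image[::-1])]

def downscale(image, factor=2):
    """One pyramid level: keep every factor-th pixel, without smoothing."""
    return [row[::factor] for row in image[::factor]]

def augment(image, size, stride, levels=2):
    """Crop each pyramid level and add a flipped and a rotated copy per crop."""
    samples = []
    level = image
    for _ in range(levels):
        if len(level) < size or len(level[0]) < size:
            break  # this pyramid level is smaller than one window
        for crop in sliding_windows(level, size, stride):
            samples.extend([crop, flip_horizontal(crop), rotate_90(crop)])
        level = downscale(level)
    return samples

def split(samples, train=0.7, val=0.15, seed=0):
    """Shuffle and divide samples into training, validation and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Toy 8x8 "page" standing in for a scanned document.
page = [[r * 8 + c for c in range(8)] for r in range(8)]
samples = augment(page, size=4, stride=4)
train_set, val_set, test_set = split(samples)
```

Because every crop has the same window size, all generated samples share one shape, which is what a network with a fixed input size requires.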

The network is a simple convolutional neural network, also called LeNet. It has three convolutional layers and one fully connected layer. After training and validation, the best state of the network was identified and tested on the testing dataset and on some real images. The results showed that LeNet was able to classify images of documents with fairly high accuracy. At the end of the project, I modified the network and discussed the effect of those changes, with the aim of creating a similar network that performs better than the original one. The results showed that the new network performed slightly better than the original version.
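To make the layer structure more concrete, the sketch below traces how one spatial dimension shrinks through three convolution and pooling stages before reaching the fully connected layer. Only the layer counts and the five output classes come from the text; the 5x5 kernels, 2x2 pooling, channel counts and the 224-pixel input side are assumptions chosen for illustration, and the names are hypothetical.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed stack: kernel sizes, pooling and channel counts are NOT from the report.
STAGES = [
    {"conv_kernel": 5, "pool": 2, "channels": 16},
    {"conv_kernel": 5, "pool": 2, "channels": 32},
    {"conv_kernel": 5, "pool": 2, "channels": 64},
]
NUM_CLASSES = 5  # five document classes, as stated in the text

def trace(size, stages):
    """Return the spatial size at the input and after each conv + pool stage."""
    sizes = [size]
    for stage in stages:
        size = output_size(size, stage["conv_kernel"])
        size = output_size(size, stage["pool"], stride=stage["pool"])
        sizes.append(size)
    return sizes

sizes = trace(224, STAGES)  # 224 is an assumed input side length
fc_inputs = sizes[-1] ** 2 * STAGES[-1]["channels"]  # flattened features
fc_weights = fc_inputs * NUM_CLASSES                 # weights, excluding biases
```

Under these assumptions the single fully connected layer holds most of the weights, which is one reason changes to that layer and to the convolutional layers can affect the network differently.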


Table of Contents

Introduction
- Context
  - About ICTLab
  - ARCHIVES project
  - Internship context
- Report organization
State of the art
- Artificial intelligence & machine learning
- Artificial neural network (ANN)
  - History
  - Regular neural network
  - Convolutional neural network (LeNet)
  - Training and evaluating
Contribution
- Data creation and augmentation
  - ARCHIVES dataset
  - Creating data
  - Augmenting the data
  - Preparing data
- Constructing the convolution neural network (LeNet)
  - The model
  - Training
  - Validation and testing
- Developing the network
Results
- The basic network
  - Testing on the dataset
  - Testing on real images
- The network modifications
  - Fully connected layer
  - Convolutional layers
- The new network

Objectives and Key Themes

This internship report focuses on the application of convolutional neural networks (CNNs) for document classification. The primary objective is to develop and evaluate a CNN model capable of accurately classifying scanned documents into five distinct categories.

- Image processing techniques for dataset generation
- CNN architecture and training methodology
- Evaluation and analysis of model performance
- Network optimization and development
- Application of CNNs in document classification

Chapter Summaries

The report begins with an introduction to the project's context, highlighting the ARCHIVES project and its significance in document classification. Chapter 2 provides a comprehensive overview of artificial intelligence, machine learning, and particularly convolutional neural networks. This chapter delves into the history of neural networks, the structure of regular neural networks, and the specific architecture of LeNet, the chosen CNN model for this project. Chapter 3 details the creation and augmentation of the dataset, including image processing techniques like sliding window, rotating, flipping, and pyramid-sizing. The chapter also elaborates on the construction of the LeNet network, its training process, and validation and testing methods. Finally, Chapter 4 presents the results of the network's performance, both on the generated dataset and on real images. It further explores the impact of modifications to the network, including changes to the fully connected and convolutional layers, leading to the development of a new, improved network.

Keywords

This internship report focuses on the application of convolutional neural networks (CNNs), image processing techniques, document classification, dataset creation, and model optimization for achieving high accuracy in document classification tasks. The project employs a LeNet architecture for training and evaluation, utilizing techniques like sliding window, rotating, flipping, and pyramid-sizing for data augmentation. The research explores the impact of network modifications, aiming to improve the performance of the CNN model.


Details

Title: Convolutional Neural Network in classifying scanned documents
College: University of Science and Technology of Hanoi
Course: Internship
Author: Tai Doan (Author)
Publication Year: 2016
Pages: 33
Catalog Number: V349852
ISBN (eBook): 9783668371675
ISBN (Book): 9783668371682
Language: English
Tags: machine learning, deep learning, classification, internship, computer science, neural network, convolutional neural network, LeNet
Product Safety: GRIN Publishing GmbH

Quote paper: Tai Doan (Author), 2016, Convolutional Neural Network in classifying scanned documents, Munich, GRIN Verlag, https://www.grin.com/document/349852
