
Convolutional Neural Network in classifying scanned documents


Internship report, 2016, 33 pages

Author: Tai Doan

Computer Science - Applied Computer Science
Abstract

In this project, I created and augmented a dataset from a set of given images in order to train and test a convolutional neural network that classifies images of scanned documents into five classes. To generate the dataset, several image-processing techniques were applied: sliding window, rotation, flipping and pyramid resizing. The result of this phase is a set of images of identical size, 224x224x3. After labeling, these images were divided into three datasets for training, validating and testing the network.
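The split into training, validation and test sets can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration, not the author's code: the function name `split_dataset`, the 70/15/15 ratios and the sample file names are all hypothetical, since the report does not state how the split was made.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle labeled samples and split them into train/validation/test lists.

    The default fractions are hypothetical placeholders; whatever remains
    after the train and validation slices becomes the test set.
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical (filename, label) pairs covering the five document classes.
classes = ["graph", "map", "photo", "handwritten", "printed"]
samples = [(f"img_{i:04d}.png", classes[i % 5]) for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before slicing matters because sub-images cut from the same page are generated consecutively; without it, one split could end up dominated by a few source documents.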

The network is a simple convolutional neural network, also called LeNet, with three convolutional layers and one fully connected layer. After training and validation, the best state of the network was selected and tested on the testing dataset and on some real images. The results showed that LeNet was able to classify images of documents with fairly high accuracy. At the end of the project, I modified the network and discussed the effect of those changes, with the goal of creating a similar network that performs better than the original one. The results showed that the modified network worked slightly better than the original version.
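To make the architecture concrete, the sketch below traces feature-map shapes through three convolution + max-pooling blocks and one fully connected layer for a 224x224x3 input. Only the input size, the three convolutional layers, the single fully connected layer and the five classes come from the report; the helper names `conv_out` and `lenet_shapes`, the filter counts (32, 32, 64), the 2x2 pooling, the FC width of 500 and the final softmax layer are hypothetical assumptions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size: floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def lenet_shapes(input_size=224, kernel=5, filters=(32, 32, 64), pool=2,
                 fc_units=500, n_classes=5):
    """Return the tensor shape after each stage of a LeNet-style network.

    Filter counts, pooling size and FC width are hypothetical placeholders.
    """
    size, channels = input_size, 3
    shapes = [(size, size, channels)]                # RGB input
    for n_filters in filters:
        size = conv_out(size, kernel)                # 'valid' convolution, stride 1
        size = conv_out(size, pool, stride=pool)     # non-overlapping max pooling
        channels = n_filters
        shapes.append((size, size, channels))
    shapes.append((size * size * channels,))         # flattened input to the FC layer
    shapes.append((fc_units,))                       # the fully connected hidden layer
    shapes.append((n_classes,))                      # assumed softmax over 5 classes
    return shapes


# Flattened FC input for the kernel sizes the report experiments with.
for k in (3, 5, 7):
    print(k, lenet_shapes(kernel=k)[4])
# 3 (43264,)
# 5 (36864,)
# 7 (30976,)
```

Under these assumptions, a smaller kernel leaves larger feature maps, so the vector feeding the fully connected layer (and therefore that layer's weight count) grows; kernel size and FC width are not independent choices.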

Excerpt


Table of Contents

1 Introduction

1.1 Context

1.1.1 About ICTLab

1.1.2 ARCHIVES project

1.1.3 Internship context

1.2 Report organization

2 State of the art

2.1 Artificial intelligence & machine learning

2.2 Artificial neural network (ANN)

2.2.1 History

2.2.2 Regular neural network

2.2.3 Convolutional neural network (LeNet)

2.2.4 Training and evaluating

3 Contribution

3.1 Data creation and augmentation

3.1.1 ARCHIVES dataset

3.1.2 Creating data

3.1.3 Augmenting the data

3.1.4 Summary and Result

3.2 Constructing the convolutional neural network (LeNet)

3.2.1 The model

3.2.2 Preparing data

3.2.3 Training

3.2.4 Validation and testing

3.3 Developing the network

4 Results

4.1 The basic network

4.1.1 Testing on the dataset

4.1.2 Testing on real images

4.2 The network modifications

4.2.1 Fully connected layer

4.2.2 Convolutional layers

4.3 The new network

5 Conclusion

Objectives & Core Topics

The primary objective of this internship report is to design and implement a Convolutional Neural Network (CNN) to automatically classify digitized historical documents from the ARCHIVES project into five distinct categories: graphs, maps, photos, hand-written text, and printed text. The research addresses the challenge of creating a robust classifier by augmenting limited raw data through techniques like sliding windows, rotation, and resizing, ultimately aiming to achieve high classification accuracy for historical hazard documentation.

  • Application of Convolutional Neural Networks (LeNet) for image classification.
  • Comprehensive data augmentation pipeline including resizing, rotation, and flipping.
  • Supervised learning methodology with validation and testing dataset separation.
  • Evaluation of network performance using real-world scanned document imagery.
  • Experimental analysis of hyperparameter tuning, specifically kernel size and fully connected layer width.
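The rotation, flipping and resizing steps of the augmentation pipeline can be sketched on an image stored as a list of pixel rows. This is an assumption-laden illustration: the report does not give rotation angles or its resizing method, so rotations here are multiples of 90 degrees and each pyramid level simply keeps every second pixel (nearest-neighbor halving). All function names are hypothetical.

```python
def rotate90(img):
    """Rotate an image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror top-bottom (rows are copied, not shared)."""
    return [list(row) for row in img[::-1]]


def augment(img):
    """Return the original plus three rotations and two flips (6 variants)."""
    variants = [img]
    rotated = img
    for _ in range(3):
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants.append(flip_horizontal(img))
    variants.append(flip_vertical(img))
    return variants


def pyramid(img, min_size=224):
    """Yield successively halved copies while both sides stay >= min_size."""
    while len(img) >= min_size and len(img[0]) >= min_size:
        yield img
        img = [row[::2] for row in img[::2]]  # keep every second row and column
```

Rotating a non-square page changes its height and width, which is harmless here because every variant is later cut into fixed-size windows; the `min_size` default of 224 mirrors the window size so that no pyramid level is too small to crop.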

Excerpt from the Book

3.1.2.2 Sliding window

In practice, when looking at an image of a scanned document, we do not need to examine every detail to tell what kind of document it is; usually a single region of the image is enough. This means that if we cut the image into pieces that are large enough, most of them will still contain enough information to be classified correctly.

Furthermore, the CNN described in the next section requires all inputs to have the same size, so the solution applied here is a sliding window. The size of the window is the input size of the CNN model. In this project, I chose 224x224x3 (the factor 3 corresponds to the three channels of an RGB image).

Figure 3.1 shows how the sliding window worked. It started at the top-left corner of the image. At each step, it moved to the right by a fixed distance (112 pixels in this case). When the window reached the right edge of the image, it returned to the left edge, shifted down by 112 pixels rather than back to its previous row. This loop was repeated until the window reached the bottom-right corner of the image, and then the process started over with the next image.

At each position where the window stopped, the pixels it covered were saved as a new sub-image file. These sub-images were used as inputs to the CNN model. Figure 3.2 displays a group of sub-images generated by the sliding-window approach.
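The scan described above (224-pixel window, 112-pixel step, left to right and then down) can be sketched as follows. Because the step is half the window size, horizontally and vertically adjacent windows overlap by 50%. The function names are hypothetical, the crops are returned as lists instead of being written to files, and skipping the leftover strip at the right and bottom edges when the image size is not a multiple of the step is an assumption, since the excerpt does not say how edges were handled.

```python
def window_positions(height, width, window=224, stride=112):
    """Top-left (row, col) of every window that fits entirely inside the image.

    Scans left to right, then moves down by `stride`. Pixels beyond the last
    full window position at the right/bottom edges are not covered.
    """
    positions = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            positions.append((top, left))
    return positions


def crop(img, top, left, window=224):
    """Copy the window x window block whose top-left corner is (top, left)."""
    return [row[left:left + window] for row in img[top:top + window]]


def sliding_window(img, window=224, stride=112):
    """Cut an image (list of pixel rows) into equally sized sub-images."""
    height, width = len(img), len(img[0])
    return [crop(img, top, left, window)
            for top, left in window_positions(height, width, window, stride)]


# A hypothetical 448x560 page yields 3 rows x 4 columns of windows.
print(len(window_positions(448, 560)))  # 12
```

An image smaller than the window in either dimension produces no sub-images at all, which is one reason resizing has to be handled separately from cropping.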

Summary of Chapters

1 Introduction: Provides an overview of the internship context at ICTLab, the ARCHIVES project objectives, and the report structure.

2 State of the art: Explains foundational concepts of Artificial Intelligence, Machine Learning, and the architectural principles of Convolutional Neural Networks.

3 Contribution: Details the practical implementation, including data preparation, augmentation strategies, network construction, and training methodologies.

4 Results: Presents the performance metrics of the initial LeNet model and compares these results with various modified network architectures.

5 Conclusion: Summarizes the project findings and acknowledges that the original network design performed best for the document classification task.

Keywords

Convolutional Neural Network, LeNet, Document Classification, Data Augmentation, Sliding Window, Machine Learning, Feature Extraction, Image Processing, Supervised Learning, ARCHIVES Project, Pattern Recognition, Neural Network Architecture.

Frequently Asked Questions

What is the core focus of this research?

The work focuses on building an automated system to classify scanned historical documents into five predefined classes (graph, map, photo, hand-written text, and printed text) using deep learning techniques.

What are the main thematic areas covered in the report?

The report covers computer vision fundamentals, the ARCHIVES historical document project, neural network architecture design, and systematic data augmentation workflows.

What is the primary objective of this project?

The goal is to develop a classifier that can accurately identify the type of historical document to facilitate better data extraction and management within the ARCHIVES simulation project.

Which scientific method is utilized in this study?

The project employs a supervised learning approach using a Convolutional Neural Network (LeNet), specifically trained on pre-processed and augmented image datasets.

What topics are discussed in the main body?

The main body details the data acquisition, the application of sliding window and bounding box techniques, the training process of the LeNet model, and a comparative analysis of network performance after hyperparameter modifications.

Which keywords best characterize this work?

Key terms include Convolutional Neural Network, Image Classification, Data Augmentation, LeNet, and Historical Document Analysis.

Why was the sliding window technique implemented?

The sliding window was necessary to standardize input sizes for the CNN and to capture local structural features of the documents, as a single large image contained too much redundant or irrelevant detail.

How does the author attempt to improve the network?

The author experimented with varying the width of the fully connected layer and testing different kernel sizes (3x3, 5x5, 7x7) to see if these adjustments would yield better classification accuracy than the original configuration.

What was the conclusion regarding the model modifications?

Interestingly, while individual modifications showed potential, the final combination of these changes did not result in a superior network, leading the author to conclude that the original LeNet configuration remained the most effective.

End of the excerpt (33 pages in total)

Details

Title
Convolutional Neural Network in classifying scanned documents
University
University of Science and Technology of Hanoi (Trường Đại học Khoa học và Công nghệ Hà Nội)
Course
Internship
Author
Tai Doan
Year
2016
Pages
33
Catalog number
V349852
ISBN (eBook)
9783668371675
ISBN (Book)
9783668371682
Language
English
Keywords
machine learning deep learning classification internship computer science neural network convolutional neural network LeNet
Product safety
GRIN Publishing GmbH
Cite this work
Tai Doan (Author), 2016, Convolutional Neural Network in classifying scanned documents, Munich, GRIN Verlag, https://www.grin.com/document/349852