A study on network intrusion detection using classifiers

Research Paper (undergraduate), 2019

36 Pages



Executive Summary

Table of contents

List of figures

List of Tables


1.1 Objective
1.2 Motivation
1.3 Background


3.1 Functional Requirements
3.2 Assumptions, Dependencies and Constraints
3.3 User Requirements and Product Specific System Requirements
3.4 Domain Requirements
3.5 Non-functional Requirements
3.6 Engineering Standard Requirements
3.7 System Requirements







Executive Summary

With internet usage rising, almost everyone now has easy and ready access to the internet. Along with its growing popularity and importance comes more risk and greater susceptibility to unwanted attacks. Networks and servers are more prone to malicious attacks than ever, which makes cyber security vital. Many organizations now interact and communicate with people via the internet and store huge amounts of data on computers and devices connected to the network. This data should only be accessed by authorized members of the organization, yet hackers may gain unauthorized access to it, and much of it is sensitive information that could cause harm in the wrong hands. It is therefore important to protect the network from such attacks. Network security is the part of cyber security that provides services to keep organizations safe from these attacks, and intrusion detection systems work alongside firewalls in the network to detect and prevent them. In this project, we aim to identify a suitable machine learning technique for detecting such attacks that can be used in a state-of-the-art system.

List of Figures

[Not included in this excerpt]

List of Tables

[Not included in this excerpt]

List of Abbreviations

[Not included in this excerpt]


1.1 Objective

There are several intrusion detection systems in use today [7]. Researchers are trying to develop systems that use machine learning techniques to identify the signatures of attackers [6]. The proposed system aims to find a suitable novel technique to serve as the backend of such a system.

1.2 Motivation

It is difficult, if not impossible, to develop an intrusion detection system with a 100 percent success rate. Most systems today have many security flaws, and not all kinds of intrusions are known. Hackers are also finding new ways into networks using machine learning techniques [5]. Quick detection of these attacks helps to identify possible intruders and limit the damage they cause. Developing an efficient and accurate intrusion detection system will therefore help to reduce network security threats.

1.3 Background

With the world becoming steadily more reliant on computers and automation, it is a challenge to build networks and systems that are secure for everyday use. The number of security threats to organizations is growing rapidly along with online markets and services, and there are numerous solutions to combat them. Intrusion detection systems are placed alongside firewalls in networks for this purpose. They scan all incoming and outgoing network traffic and analyze the signatures of the packets to decide whether they are malicious or normal. Machine learning is used to help the system learn the signatures of known attacks and profile normal network packets [3]. Some widely used intrusion detection systems are:

- Snort
- Bro
- Suricata
- Sagan
- Security Onion
- Samhain

Most intrusion detection systems only detect attacks and make the intruder's presence known to the system. These are called passive intrusion detection systems. Other intrusion detection systems are active (reactive): they respond to an attack by cutting off access or by working with the network manager to reset the network settings.

Intrusion detection systems can be categorized in two ways: by their position in the network, and by whether they detect signatures or anomalies.

By Position in the network:

Network intrusion detection system: It is placed at critical points in the network so that it can monitor incoming and outgoing network packets for unwanted activity.

Host intrusion detection system: It is installed on every system that is directly connected to the network in question. It can detect unwanted network packets sent anywhere in the network, including the local network, and it also detects attacks originating from the system on which it is installed. In this respect it is better than a network intrusion detection system, which cannot monitor traffic within the local area network.

By type of detection:

Signature-based intrusion detection system: Signature-based intrusion detection systems learn the signatures of commonly occurring types of attacks and use them to identify the specific kind of attack on the system. Their disadvantage is that they often fail to identify new types of attacks.

Anomaly-based intrusion detection system: Anomaly-based intrusion detection systems learn the characteristics of normal network packets and classify anything that does not resemble normal traffic as an anomaly. Anomaly-based intrusion detection systems can therefore identify new types of attacks, although they cannot identify the signature of a specific attack.
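The anomaly-based idea can be illustrated with a deliberately simple sketch. It is not the method used in this project (which trains supervised classifiers); it assumes two hypothetical numeric features and treats "not similar to normal" as any feature lying more than 3 standard deviations from the mean of normal traffic.

```python
import numpy as np


class NormalProfile:
    """Toy anomaly detector: learns per-feature mean and spread of normal traffic only."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold  # how many standard deviations still count as normal

    def fit(self, normal: np.ndarray) -> "NormalProfile":
        self.mean_ = normal.mean(axis=0)
        self.std_ = normal.std(axis=0) + 1e-9  # avoid division by zero for constant features
        return self

    def is_anomaly(self, x: np.ndarray) -> np.ndarray:
        z = np.abs((x - self.mean_) / self.std_)
        # Flag a record if any feature lies far outside the learned normal profile
        return (z > self.threshold).any(axis=1)


rng = np.random.default_rng(0)
# Hypothetical features [src_bytes, count] for normal connections only
normal = np.column_stack([rng.normal(500, 50, 1000), rng.normal(10, 2, 1000)])
detector = NormalProfile().fit(normal)

new = np.array([[510.0, 11.0],    # close to the normal profile
                [500.0, 250.0]])  # unusually many connections, e.g. a flood
print(detector.is_anomaly(new).tolist())  # [False, True]
```

Because only normal traffic is used for fitting, the second record is flagged without the detector ever having seen an example of that attack, which is the property the paragraph above describes.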


The project contains four modules: data preprocessing, feature selection, model training, and attack prediction and testing.

The dataset selected is NSL-KDD [2], a refined version of the KDD Cup 99 dataset. The KDD Cup 99 dataset is one of the most widely used datasets for training intrusion detection systems (IDS) and intrusion prevention systems (IPS). Labelled datasets for network security are scarce, because it is difficult to anticipate new types of attacks and know their signatures. Other popular datasets include DARPA, CAIDA, LBNL, CDX, Kyoto, UMASS, ISCX2012 and ADFA.

The KDD Cup 99 dataset contains many redundant records, and NSL-KDD was created by removing them. It has 41 attributes, and each record is labelled either as normal traffic or as one of 4 types of attack.
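The four attack types are conventionally named DoS, Probe, R2L and U2R, and the raw labels in the data are individual attack names that have to be grouped into them. The sketch below shows one way to do that; the table covers only the attack names of the KDD Cup 99 training data (the test data contains further names), and the helper `to_category` is hypothetical.

```python
# Attack names from the KDD Cup 99 training data, grouped into the four categories.
# Assumption: names that are not listed here (e.g. test-only attacks) map to "unknown".
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing / surveillance
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_category(label: str) -> str:
    """Map a raw label to 'normal', one of the four attack categories, or 'unknown'."""
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")


print([to_category(name) for name in ["normal", "neptune", "satan", "guess_passwd", "rootkit", "mscan"]])
# ['normal', 'DoS', 'Probe', 'R2L', 'U2R', 'unknown']
```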

The next stage is data preprocessing. All values in the dataset must be numerical for the classifiers to accept them as input, but some of the features are categorical and have string values. These are first converted into integer codes using a label encoder, and a one-hot encoder then splits each categorical column into one binary column per category.
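The two encoding steps can be sketched as follows, assuming scikit-learn's `LabelEncoder` and `OneHotEncoder` and pandas (the document does not name the library). The four rows are made-up stand-ins for NSL-KDD records, reduced to a few columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy rows standing in for NSL-KDD records (values are illustrative only)
df = pd.DataFrame({
    "duration": [0, 2, 0, 5],
    "protocol_type": ["tcp", "udp", "icmp", "tcp"],
    "service": ["http", "domain_u", "ecr_i", "ftp"],
    "flag": ["SF", "SF", "SF", "REJ"],
    "src_bytes": [181, 45, 1032, 0],
})
categorical = ["protocol_type", "service", "flag"]

encoded_parts = []
for col in categorical:
    # Step 1: map each string category to an integer code (classes are sorted)
    le = LabelEncoder()
    codes = le.fit_transform(df[col]).reshape(-1, 1)
    # Step 2: expand the integer codes into one binary column per category
    onehot = OneHotEncoder().fit_transform(codes).toarray()
    names = [f"{col}_{c}" for c in le.classes_]
    encoded_parts.append(pd.DataFrame(onehot, columns=names, index=df.index))

# Numeric columns are kept as they are; categorical ones are replaced by their one-hot columns
X = pd.concat([df.drop(columns=categorical)] + encoded_parts, axis=1)
print(X.shape)  # (4, 11)
```

Recent scikit-learn versions can one-hot encode string columns directly, so the label-encoding step is optional there; it is kept here because it mirrors the two-stage process described above.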

Next comes feature selection. Having many unimportant features in the training set can reduce the accuracy of the predictive model, so the features that contribute most to the characteristics of an attack are chosen. There are several methods for feature selection and dimensionality reduction:

1) Univariate Feature Selection
2) Recursive Feature Elimination
3) Principal Component Analysis

An optimal feature selection method suited to the dataset has to be chosen [9].
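The three methods listed above can be compared side by side with scikit-learn (an assumption; the document does not name the library). The data is synthetic with 41 features to match the width of NSL-KDD, and keeping 10 features is an arbitrary illustrative choice, not the project's setting.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed data (assumption: 41 numeric features)
X, y = make_classification(n_samples=500, n_features=41, n_informative=8, random_state=0)

# 1) Univariate selection: score each feature on its own with an ANOVA F-test, keep the top 10
X_uni = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# 2) Recursive feature elimination: refit a model and drop the weakest feature until 10 remain
rfe = RFE(estimator=DecisionTreeClassifier(random_state=0), n_features_to_select=10)
X_rfe = rfe.fit_transform(X, y)

# 3) PCA: project onto the 10 directions of largest variance (new features, not a subset)
X_pca = PCA(n_components=10).fit_transform(X)

print(X_uni.shape, X_rfe.shape, X_pca.shape)
```

The first two methods return a subset of the original columns, so the chosen features keep their meaning; PCA instead builds new combined features, which is why it is strictly a feature extraction technique.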

The next step is selecting and training various machine learning and deep learning models to find the best model for predicting malicious attacks [1]. An important requirement is a model with a low false positive rate.

Three machine learning models and two deep learning models are selected to train and test on the dataset.

1) Random Forest
2) Decision Tree
3) Naïve Bayes
5) Deep Neural Network

The dataset is split into 85% for training and 15% for testing, and the above models are implemented.
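A minimal sketch of the 85/15 split and the three classical models, assuming scikit-learn and synthetic binary data in place of the encoded NSL-KDD features. Gaussian naïve Bayes, 100 trees, stratified splitting and the random seeds are assumptions; the document does not specify these settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for encoded features and labels (0 = normal, 1 = attack)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 85% training / 15% testing; stratify keeps the class ratio equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out 15%
    print(f"{name}: {scores[name]:.3f}")
```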

The performance metrics used are:

1) Accuracy
2) Precision
3) Recall
4) f1-score

The confusion matrix is also generated for all the models.
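How the four metrics and the confusion matrix relate can be checked on a small hand-made example, assuming scikit-learn's metric functions. The eight labels are hypothetical. The false positive rate, which the document names as an important requirement, is derived from the confusion matrix.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels for eight test connections (1 = attack, 0 = normal)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 5/8
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/5
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall = 2/3

# Rows are true classes (0, 1); columns are predicted classes (0, 1)
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
fpr = fp / (fp + tn)  # false positive rate: normal traffic wrongly flagged = 2/4

print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f} fpr={fpr:.2f}")
print(cm)
```

A model can have high accuracy and still a poor false positive rate when normal traffic dominates the test set, which is why the confusion matrix is worth inspecting alongside the summary scores.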

Deep learning models require a lot of computational power, and training them on most CPUs is slow, so either GPUs or a cloud service should be used to train the models on the dataset. Recent datasets are very large and contain many redundant fields, so the data must be preprocessed and converted into a format suitable for the model.


- Collect a good dataset with a large number of entries and appropriate risk factors.
- Prepare the data: the acquired data cannot be used as is, so suitable data preparation techniques must be applied to obtain accurate predictive models.
- Prune unwanted or unrelated attributes from the data, then convert it into a numerical format that the classifier can take as input.
- Perform feature extraction and feature selection, since with large amounts of data it becomes crucial to identify the most significant features.
- Implement several machine learning and deep learning techniques to identify a suitable algorithm for classifying the attacks.
- Achieve an accuracy above 75% in the testing phase.
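One simple pruning rule is to drop columns that hold the same value in every row, since they cannot help a classifier separate classes. The sketch below assumes pandas; the helper name `drop_constant_columns` and the three-row frame are hypothetical, and in practice further pruning (for example of attributes judged unrelated to attacks) would still need domain judgment.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that hold a single value in every row; they carry no information."""
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant)


# Hypothetical three-row sample using NSL-KDD column names
df = pd.DataFrame({
    "src_bytes": [181, 239, 0],
    "num_outbound_cmds": [0, 0, 0],
    "logged_in": [1, 1, 0],
})
pruned = drop_constant_columns(df)
print(list(pruned.columns))  # ['src_bytes', 'logged_in']
```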


3.1 Functional Requirements

3.1.1 Product Perspective

This product detects anomalies in network traffic.

3.1.2 Product features

- High speed on large volumes of data
- Compatible with most PCs
- High accuracy

3.1.3 Assumption, Dependencies & Constraints

The names of the feature columns in the dataset will differ from those used in the visualization layer. The model depends on all the feature columns given in the dataset.

3.1.4 Domain Requirements

Intrusion detection systems are placed alongside firewalls in networks. In host intrusion detection systems they run on the host itself; in network intrusion detection systems they are distributed at critical points throughout the network.

3.2 Non-functional Requirements

Non-functional requirements do not describe specific functions of the system. Instead, they specify qualities such as system performance, maintainability and security.

System performance:

The user interface should be smooth and there should not be any crashes in the system.


The system should be compatible with any PC and should work in any environment and under any conditions.


Delivering the software is not the final step; maintenance also matters. Maintenance costs should be low, and services should be available at all times without interruption.


The output should be accurate, with a low false positive rate.

3.3 Engineering Standard Requirements

- Economic

Detecting and identifying hackers will help prevent losses to the companies hosting the networks.

- Social

The project aims to be open source and available to all users.

- Sustainability

The system should remain useful over a long period, with new developments added from time to time. It should keep pace with current systems by delivering updates to users, which should improve data cleaning and therefore efficiency.

3.4 System Requirements

3.4.1 H/W Requirements (Application-Specific Hardware)

Table 3.1 Hardware Specifications

[Not included in this excerpt]

3.4.2 S/W Requirements (Application-Specific Software)

1) TensorFlow-GPU library
2) Keras API
3) Jupyter Notebook
4) NVIDIA CUDA Toolkit 10.1
5) NVIDIA cuDNN 7.0


4.1 Introduction

As discussed earlier, intrusion detection systems can be classified in two ways: by detection type and by position.

In this project, several machine learning and deep learning models are used to construct a system with a good level of performance.

An anomaly-based intrusion detection system is selected for this project.

[Figure not included in this excerpt]

Figure 4.1 IDS Classification

The dataset used is NSL-KDD, a refined version of the KDD 99 dataset created by removing a large number of redundant samples from KDD 99. The dataset is split into 85% for the training set and 15% for the testing set. The characteristics and details of the dataset are discussed below, along with the data preprocessing required to work with it.

The dataset can be downloaded from the online repository in either ARFF or CSV format.

We use the CSV format, which is convenient to read in a Python environment.

Dataset Description and Details

The NSL-KDD dataset has 41 attributes (features) plus a class label. They are listed below.

1. duration
2. protocol_type
3. service
4. flag
5. src_bytes
6. dst_bytes
7. land
8. wrong_fragment
9. urgent
10. hot
11. num_failed_logins
12. logged_in
13. num_compromised
14. root_shell
15. su_attempted
16. num_root
17. num_file_creations
18. num_shells
19. num_access_files
20. num_outbound_cmds
21. is_host_login
22. is_guest_login
23. count
24. srv_count
25. serror_rate
26. srv_serror_rate
27. rerror_rate
28. srv_rerror_rate
29. same_srv_rate
30. diff_srv_rate
31. srv_diff_host_rate
32. dst_host_count
33. dst_host_srv_count
34. dst_host_same_srv_rate
35. dst_host_diff_srv_rate
36. dst_host_same_src_port_rate
37. dst_host_srv_diff_host_rate
38. dst_host_serror_rate
39. dst_host_srv_serror_rate
40. dst_host_rerror_rate
41. dst_host_srv_rerror_rate
42. label
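Since the CSV file has no header row, the column names have to be supplied when it is read. The sketch below assumes pandas and uses the 42 names listed above; the single illustrative record is embedded as text so the example runs without the download.

```python
import io

import pandas as pd

# Column names in the order listed above (41 features + label)
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

# One illustrative record, built piece by piece so the field count is easy to check
sample = (
    "0,tcp,ftp_data,SF,491,"                          # duration .. src_bytes
    + "0," * 17                                       # dst_bytes .. is_guest_login
    + "2,2,"                                          # count, srv_count
    + "0.00,0.00,0.00,0.00,1.00,0.00,0.00,"           # serror_rate .. srv_diff_host_rate
    + "150,25,"                                       # dst_host_count, dst_host_srv_count
    + "0.17,0.03,0.17,0.00,0.00,0.00,0.05,0.00,"      # dst_host_same_srv_rate .. dst_host_srv_rerror_rate
    + "normal\n"                                      # label
)

df = pd.read_csv(io.StringIO(sample), header=None, names=COLUMNS)
print(df.shape)  # (1, 42)
```

With the real file the call becomes `pd.read_csv(path, header=None, names=COLUMNS)`, where `path` is wherever the downloaded CSV is stored. Some NSL-KDD distributions append an extra trailing column (a difficulty score), in which case a name for it would need to be added to the list.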




VIT University
Keywords: machine learning, network security, intrusion detection
Dr. Balamurugan Rengeswaran (Author), 2019, A study on network intrusion detection using classifiers, Munich, GRIN Verlag, https://www.grin.com/document/469095

