The dataset in this module is used for computer network intrusion detection.
The dataset has the following attributes: duration, protocol type, service, source bytes, destination bytes, flag, land, wrong fragments, urgent, number of hot indicators, number of failed login attempts, logged in, number of compromised conditions, number of root accesses, number of shell prompts, number of outbound commands, hot login, guest login, number of connections to the same host, percentage of connections that have "SYN" errors, percentage of connections that have "REJ" errors, percentage of connections to the same service, percentage of connections to different services, and percentage of connections to different hosts.
In this module, we will implement neural network algorithms for network DoS detection in a new Google Colab notebook. Our dataset contains many different attributes that will help us decide whether a connection is malicious or benign. In the dataset, the attributes include the following:
duration, protocol_type, service, flag, dst_host_srv_rerror_rate, label, and many more.
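Once the CSV file has been loaded (the upload and loading steps are shown later in this module), you can list the attribute names yourself. A minimal sketch, assuming kddcup99.csv contains a header row with the attribute names:
import pandas as pd

# Print every column name and its inferred type.
# Assumes the file has a header row, as the read_csv call
# later in this module does.
dataset = pd.read_csv("kddcup99.csv", sep=',')
print(dataset.columns.tolist())  # 42 columns: 41 features + the label
print(dataset.dtypes)            # shows which columns are strings vs. numbers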
Copy and paste the following link into your browser to open Google Colab: https://colab.research.google.com
Then click File --> New notebook.
Click the notebook name at the top of the page and change the file name to Neural network algorithms for network DoS detection.
Next, click Runtime --> Change runtime type and set Hardware accelerator to GPU, because training will run much faster on a GPU than on a CPU.
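If you want to confirm that the GPU is actually available to the notebook, you can run a quick check in any code cell. A minimal sketch using TensorFlow, which backs Keras in Colab:
import tensorflow as tf

# An empty list means the runtime is still on CPU; one or more
# entries means the GPU accelerator is active.
print(tf.config.list_physical_devices('GPU'))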
In the first code cell, copy and paste the following code to upload the dataset to Google Colab.
from google.colab import files
# Open a file-chooser dialog in the cell output
uploaded = files.upload()
Then click the run button (it looks like a play button) to run this code cell.
After successful execution of that cell, you should see a Choose Files button in the cell output. Click it and upload the DoS dataset (kddcup99.csv) into Google Colab. This is the link for the kddcup99 dataset:
https://drive.google.com/drive/folders/1-V4PZys9AVvHsCletTcNnEBBCM4h13bX?usp=sharing
(we will find the kddcup99.csv file in the archive after unzipping it).
It might take more than 10 minutes to upload this file because the dataset is extremely large (494,020 samples).
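Once the upload finishes, you can verify that the file arrived intact before moving on. A small sketch, assuming the file was uploaded under the name kddcup99.csv:
import os

# Confirm the uploaded file exists and report its size in megabytes.
print(os.path.exists("kddcup99.csv"))
print(os.path.getsize("kddcup99.csv") / 1e6, "MB")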
Next, create a new cell, copy and paste the following code, and run it.
The purpose of this code is to read the data from the CSV file, convert the string-valued columns to integers, and normalize the dataset.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.utils import to_categorical

dataset = pd.read_csv("kddcup99.csv", sep=',')
dataset.head()

X = dataset.iloc[:, 0:41]  # the 41 feature columns
y = dataset.iloc[:, 41]    # the label column
X = X.to_numpy()
y = y.to_numpy()

le = LabelEncoder()
sc = StandardScaler()
# Encode the three string-valued features as integers:
# column 1 = protocol_type, column 2 = service, column 3 = flag
X[:, 1] = le.fit_transform(X[:, 1])
X[:, 2] = le.fit_transform(X[:, 2])
X[:, 3] = le.fit_transform(X[:, 3])
# Normalize every feature to zero mean and unit variance
X = sc.fit_transform(X)

# Encode class values as integers
encoder = LabelEncoder()
encoded_y = encoder.fit_transform(y)
# Convert integers to dummy (one-hot) variables
dummy_y = to_categorical(encoded_y)
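If the encoding steps above are unclear, here is a toy illustration (the values are hypothetical, not taken from the real dataset) of what LabelEncoder and to_categorical do:
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

labels = ['tcp', 'udp', 'icmp', 'tcp']  # hypothetical protocol_type values
enc = LabelEncoder()
ints = enc.fit_transform(labels)        # [1, 2, 0, 1]: classes sorted alphabetically
print(ints)
print(to_categorical(ints))             # one one-hot row per sample, one column per class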
Create a new cell, copy and paste the following code, and run it.
We split the dataset into a training set and a test set with test_size = 0.1, which means the training set is 90% of the whole dataset and the test set is the remaining 10%.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, dummy_y, test_size=0.1)
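You can confirm the 90/10 split by printing the array shapes. A quick sketch; with 494,020 total samples, expect roughly 444,618 rows for training and 49,402 for testing:
# Each X row is one connection record; each y row is a one-hot label.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)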
After feature scaling, we will build a neural network and train it.
Each epoch means one complete pass of the entire training set through the model; with 444,618 training samples and a batch size of 64, one epoch is about 6,948 weight-update steps.
Create a new cell, copy and paste the following code, and run it.
import keras
from keras.models import Sequential
from keras.layers import Dense

# Build the neural network: 41 input features, two ReLU hidden layers,
# and a softmax output over the 23 label classes
model = Sequential()
model.add(Dense(16, input_dim=41, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(23, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
c = model.fit(X_train, y_train, epochs=5, batch_size=64)
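The fit() call returns a history object (stored in c above), so you can review how the loss and accuracy evolved across the five epochs. A minimal sketch (note: on older standalone Keras versions the accuracy key may be 'acc' instead of 'accuracy'):
# c.history maps each metric name to one value per epoch.
for epoch, (loss, acc) in enumerate(zip(c.history['loss'],
                                        c.history['accuracy']), start=1):
    print("epoch {}: loss={:.4f}, accuracy={:.4f}".format(epoch, loss, acc))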
After training, we convert the one-hot encoded labels back to class indices so they match the model's predictions. Finally, we run our prediction on the test set and report the accuracy.
import numpy as np
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)

# Convert the softmax predictions to class labels
pred = list()
for i in range(len(y_pred)):
    pred.append(np.argmax(y_pred[i]))

# Convert the one-hot encoded test labels back to class labels
test = list()
for i in range(len(y_test)):
    test.append(np.argmax(y_test[i]))

accuracy = accuracy_score(test, pred)
print("The accuracy is: {:.2%}".format(accuracy))
As we can see, the accuracy is 99.89%.
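Because the KDD Cup 99 labels are highly imbalanced (a few attack types such as smurf and neptune, plus normal traffic, make up most of the samples), overall accuracy alone can be misleading. A sketch of a per-class breakdown, reusing the test and pred lists from the previous cell and the encoder fitted earlier:
from sklearn.metrics import classification_report

# encoder.classes_ holds the original string labels in encoded order.
print(classification_report(test, pred,
                            labels=list(range(len(encoder.classes_))),
                            target_names=encoder.classes_,
                            zero_division=0))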