CHIKONDI SEPULA

Darknet Traffic Detection Using Histogram-Based Gradient Boosting Ensemble

_______________________________________

Supervisor: Dr. Dane Brown

____________________________________________

Context of Research

The rise in malicious activity originating from the darknet calls for better-performing encrypted traffic detection systems to enhance system security. Attackers often use The Onion Router (TOR) or a Virtual Private Network (VPN) when carrying out malicious activities because of the anonymity these modes of communication provide (Arash et al., 2020). Network intrusion detection systems (NIDS) are designed to detect malicious network traffic and prevent it from accessing computer systems. Traditionally, these systems rely on databases of signatures, or patterns, of known malicious network traffic (Al-Enazi and El Khediri, 2022). When the signature of incoming traffic matches a signature in the database, the traffic is flagged as malicious. This approach works well until the system encounters malicious traffic whose signature is not in the database. In that case, the system fails to flag the traffic, which therefore gains access. For this reason, security specialists are beginning to build network intrusion detection systems based on machine learning techniques. Models trained on network data can detect both known and previously unseen malicious traffic. Several machine learning techniques have been applied to darknet traffic detection. However, the performance of these systems remains an open area of research, and researchers continue to propose machine learning techniques to improve it. Motivated by this background, this research proposes histogram-based gradient boosting ensembles (histGBoost) to classify darknet traffic. The CIC-Darknet 2020 dataset is used to train the proposed classifier because it reflects real-world network traffic. First, traffic is classified into two classes: benign (non-VPN or non-TOR) and darknet (VPN or TOR). Second, the traffic is identified by application category.

Proposed Approach: