Detection and Classification of Malicious Traffic over the Tor Network using Machine Learning Approaches with Packet Payload and Statistical Data as Features

Background of the Study
The internet has been continuously evolving and expanding its reach into people’s everyday lives. Users can now easily access new ideas, information, and activities, which enables innovation in areas such as communication, information sharing, and education. As internet usage increases, anonymity has become one of its most appealing services for many users. By keeping their identities private, users can exercise freedom of speech, avoid network surveillance, and seek out sensitive information. There are many ways for users to obtain anonymity on the internet, one of which is Tor.

Tor is a popular anonymous proxy service with millions of users worldwide. It was originally developed to protect government communications. Today, a wide range of users, including journalists, activists, and business executives, rely on Tor to protect their identities, avoid surveillance, and serve various other purposes. Tor can also host websites through its hidden services, which are accessible only to other Tor users.

Statement of the Problem
The anonymity that Tor provides also makes it attractive to users with illegal intent. Numerous onion websites offer unlawful services such as selling illegal drugs and weapons, hiring hitmen, promoting child pornography, counterfeiting personal identification, spreading malware, and hacking. Because these services operate over the Tor network, they are difficult to detect and trace.

Objectives of the Study 

General Objective:
The main objective of this study is to implement a system that can detect and classify malicious traffic over the Tor network using machine learning algorithms.

Specific Objectives:
  • To create a system that detects and processes Tor traffic;
  • To extract statistical and payload content features using DPI-based tools;
  • To determine the feature subset and machine learning algorithm that produce the most accurate results.
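
As a rough illustration of the second objective, the sketch below computes a few statistical features and one payload-content feature for a single traffic flow. It is a minimal sketch under stated assumptions, not the system built in this study. The function names `flow_features` and `byte_entropy`, the input format (a time-ordered list of timestamp and payload pairs), and the chosen features are all hypothetical. In practice the packets would come from a DPI-based tool rather than a hand-written list.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def byte_entropy(data):
    """Shannon entropy (bits per byte) of a bytes object; 0.0 if empty."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flow_features(packets):
    """Summarize one flow as a feature dictionary.

    Assumption: `packets` is a non-empty list of (timestamp_seconds, payload)
    tuples, already sorted by timestamp, where payload is a bytes object.
    """
    if not packets:
        raise ValueError("flow has no packets")
    times = [t for t, _ in packets]
    sizes = [len(p) for _, p in packets]
    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    payload = b"".join(p for _, p in packets)
    return {
        # Statistical features
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "duration": times[-1] - times[0],
        "mean_gap": mean(gaps) if gaps else 0.0,
        # Payload-content feature
        "payload_entropy": byte_entropy(payload),
    }


if __name__ == "__main__":
    # Hypothetical two-packet flow, for illustration only.
    example_flow = [(0.0, b"ab"), (0.5, b"abab")]
    print(flow_features(example_flow))
```

Feature dictionaries of this kind, one per flow, could then be assembled into a table from which candidate feature subsets and machine learning algorithms are compared, as the third objective describes.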

Significance of the Study
As the number of Tor users increases every year, it becomes important to ensure that the services Tor provides adhere to legal policies and contribute to the positive development of society. The results of this study can provide more reliable data on the behavior of Tor users in the Philippines, particularly in Higher Educational Institutions (HEIs), and will be useful for future research on the detection, classification, and blocking of malicious traffic over Tor.