While professional environments deploy network firewalls to "gatekeep" incoming and outgoing traffic, these firewalls are not specifically trained to classify IoT traffic (Tahaei, Hamid, et al., 2020). Our project proposes a second-stage solution: once network traffic has passed through an organizational network firewall, it is fed into our ML model, which takes a second pass at classifying IoT device traffic as either malicious or benign. Traffic that the model classifies as malicious then needs to be handled and re-reviewed.
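The second-stage flow described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Flow` fields, the `SecondStage` class, and `placeholder_model` are hypothetical names, and the port-based rule only stands in for the trained ML model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Flow:
    """One network flow that has already passed the firewall (hypothetical fields)."""
    src_ip: str
    dst_port: int
    bytes_sent: int


@dataclass
class SecondStage:
    """Second pass: classify each flow and queue malicious ones for re-review."""
    classify: Callable[[Flow], str]  # must return "malicious" or "benign"
    review_queue: List[Flow] = field(default_factory=list)
    passed: List[Flow] = field(default_factory=list)

    def process(self, flow: Flow) -> str:
        label = self.classify(flow)
        if label == "malicious":
            self.review_queue.append(flow)  # held back for handling and re-review
        else:
            self.passed.append(flow)
        return label


def placeholder_model(flow: Flow) -> str:
    # Stand-in rule, NOT the trained model: flag Telnet traffic (port 23).
    return "malicious" if flow.dst_port == 23 else "benign"


stage = SecondStage(classify=placeholder_model)
flows = [Flow("10.0.0.5", 443, 1200), Flow("10.0.0.9", 23, 64)]
labels = [stage.process(f) for f in flows]
```

In a real deployment, `classify` would wrap the trained model's prediction call; keeping it as a plain callable means the routing logic does not depend on any particular ML library.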
The machine learning solution we propose can provide KPMG and its clients with several potential benefits.
By implementing the ML model we have proposed, KPMG could develop a standardized approach to better predict malicious network traffic between specific networks. This in turn benefits clients, who can be confident that both they and their IoT devices are secure.
In addition, our model has the potential to help KPMG mitigate future malware attacks by making data, logs, and similar records easily accessible and readable for all stakeholders involved.
Many existing log files are not readily understandable or readable for human users, so we expect our model to provide valuable insight for users of all levels.
Splunk is made up of three components: the forwarder, the indexer, and the search head. The forwarders are responsible for collecting the data we need to use (in this case, the machine log files). The indexers then receive the data, store it, and turn it into events. Metadata files are also generated during indexing so that a search head (the top level) can execute user queries.
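The three roles described above can be illustrated with a toy model in plain Python. This is only a conceptual sketch and does not use or reproduce any Splunk API: the `forward`, `Indexer`, and `search` names, the log line format, and the per-host count used as "metadata" are all assumptions made for illustration.

```python
from typing import Dict, List


def forward(raw_log: str) -> List[str]:
    """Forwarder role: collect raw log lines and pass them on unchanged."""
    return [line for line in raw_log.splitlines() if line.strip()]


class Indexer:
    """Indexer role: store lines as events and keep simple metadata about them."""

    def __init__(self) -> None:
        self.events: List[Dict[str, str]] = []
        self.hosts: Dict[str, int] = {}  # metadata: number of events per host

    def ingest(self, lines: List[str]) -> None:
        for line in lines:
            # Assumed line format: "<timestamp> <host> <message>"
            timestamp, host, message = line.split(" ", 2)
            self.events.append({"time": timestamp, "host": host, "message": message})
            self.hosts[host] = self.hosts.get(host, 0) + 1


def search(indexer: Indexer, host: str, keyword: str) -> List[Dict[str, str]]:
    """Search head role: run a user query against the indexed events."""
    if host not in indexer.hosts:  # metadata lets us skip the scan entirely
        return []
    return [e for e in indexer.events
            if e["host"] == host and keyword in e["message"]]


raw = """2020-01-01T00:00:01 cam-01 connection accepted
2020-01-01T00:00:02 cam-02 login failed
2020-01-01T00:00:03 cam-01 login failed
"""
indexer = Indexer()
indexer.ingest(forward(raw))
hits = search(indexer, "cam-01", "login failed")
```

Here `hits` holds the single matching event from `cam-01`; the point of the sketch is only the division of labor between collecting, indexing, and querying.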
The ELK stack is made up of three different open-source products: Elasticsearch, Logstash, and Kibana. Elasticsearch is the stack's search engine. It is based on Apache Lucene, a fully featured search engine library. Logstash collects the data on the fly, transforms it, and sends it to Elasticsearch along its data pipelines. This data populates Elasticsearch.
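To make the collect, transform, and index flow concrete, the sketch below imitates it in plain Python. It is a simplified illustration under stated assumptions, not the Logstash or Elasticsearch API: `transform`, `TinyIndex`, and the `LEVEL: message` log format are hypothetical, and the inverted index is only a rough stand-in for the kind of structure a Lucene-based engine uses to answer term queries.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def transform(line: str) -> Dict[str, str]:
    """Logstash-like step: turn a raw 'LEVEL: message' line into a document."""
    level, message = line.split(":", 1)
    return {"level": level.strip().lower(), "message": message.strip()}


class TinyIndex:
    """Inverted index: maps each term to the ids of documents containing it."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, str]] = []
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc: Dict[str, str]) -> None:
        doc_id = len(self.docs)
        self.docs.append(doc)
        for term in tokenize(doc["message"]):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[Dict[str, str]]:
        """Return documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.docs[i] for i in sorted(ids)]


lines = [
    "WARN: device cam-01 sent malformed packet",
    "INFO: device cam-02 heartbeat ok",
    "ERROR: malformed packet dropped",
]
index = TinyIndex()
for line in lines:
    index.add(transform(line))  # pipeline: collect -> transform -> index

results = index.search("malformed packet")
```

Because the lookup goes from term to document ids, a query touches only the postings for its own terms rather than scanning every stored log line.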