Malware And Benign Windows PE Cuckoo Reports

I built a dataset that contains Cuckoo Sandbox reports for 3103 malicious Windows PE files (.dll and .exe) and 1890 benign programs. The malware samples were acquired in , and the benign files were scraped from different websites like


Alazab, Venkataraman & Watters (2010) used a fully automated four-step methodology to extract API call features. The authors disassemble the malware binaries with the static analysis tool IDA Pro, then analyze and extract the API function calls in order to classify a program executable as malicious or benign. Statistical tests were performed on the extracted calls to determine the malware class based on suspicious behavior. A sample of 386 malware files was used for the experimental tests. Based on these preliminary tests, the authors derived six categories of suspicious behavior from the API call features (Alazab, Venkataraman & Watters, 2010). Because the approach relies on static analysis alone, it is exposed to the many evasion techniques that attackers use to defeat analysts.

Peiravian & Zhu (2013) proposed a framework that uses permissions and API calls to detect malicious Android applications. The permissions are extracted from each Android application and combined with its API calls to characterize the application as either malware or benign. The inherent advantage of this framework is that it does not require any dynamic tracing of system calls but only uses simple static analysis to find the system functions involved in the application. Experiments on real-world applications demonstrate good malware detection performance. Furthermore, the framework can be generalized to all mobile applications for malware detection (Peiravian & Zhu, 2013).

Alazab et al. (2010) proposed an approach to detect obfuscated malware by investigating the structural and behavioral features of API calls. The authors used n-gram statistical analysis of API calls to measure the similarity and distance between unknown malware and known behavior so that obfuscated malware could be detected efficiently. They used a dataset of 242 malware and 72 benign files to obtain experimental results. The approach achieves an accuracy of 96.5% for the unigram model (Alazab et al., 2010).
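To make the n-gram idea concrete, here is a minimal sketch that counts n-grams over a sequence of API call names and compares two samples. The cosine similarity used here is my own choice for illustration; the distance measure actually used by Alazab et al. is not given in this summary, and the call sequences are made up.

```python
from collections import Counter
from math import sqrt


def api_ngrams(calls, n):
    """Count contiguous n-grams in a sequence of API call names."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (assumed metric)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical call sequences, for illustration only.
known = ["CreateFileA", "WriteFile", "CloseHandle", "RegSetValueExA"]
unknown = ["CreateFileA", "WriteFile", "CloseHandle", "Sleep"]

unigram_similarity = cosine_similarity(api_ngrams(known, 1), api_ngrams(unknown, 1))
bigram_similarity = cosine_similarity(api_ngrams(known, 2), api_ngrams(unknown, 2))
```

With n = 1 the vectors are plain call frequencies (the unigram model mentioned above). Larger n also captures the order of calls, which produces sparser vectors.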

An application developed to run on the Windows operating system must call the interfaces exposed as APIs to use a function offered by the operating system. While running, an application calls several APIs to complete an action. For example, when an application needs to create a file, the CreateFileA Windows API ( -us/windows/desktop/api/FileAPI/nf-fileapi-createfilea) is called. Taken together, the API calls an application makes on the system show its overall behavior. For this reason, API call-based approaches are widely applied in dynamic malware analysis to show accurately how malware behaves.
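As an illustration of how such API calls can be pulled out of a sandbox report, the sketch below counts calls per API name. It assumes the JSON layout commonly produced by Cuckoo 2.x (behavior, then processes, then calls, each with an "api" field). Verify this against your own reports, since the layout can differ between versions. The sample report is invented.

```python
from collections import Counter


def api_call_counts(report):
    """Count API calls across all processes in a Cuckoo-style report dict.

    Assumes report["behavior"]["processes"][i]["calls"][j]["api"] holds the
    API name; missing keys are treated as "no data" rather than an error.
    """
    counts = Counter()
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            name = call.get("api")
            if name:
                counts[name] += 1
    return counts


# Hypothetical, heavily trimmed report used only to exercise the function.
sample_report = {
    "behavior": {
        "processes": [
            {"process_name": "sample.exe",
             "calls": [{"api": "CreateFileA"}, {"api": "WriteFile"},
                       {"api": "CreateFileA"}]},
            {"process_name": "child.exe", "calls": [{"api": "Sleep"}]},
        ]
    }
}

counts = api_call_counts(sample_report)
```

For a real report, load the file first with `json.load` and pass the resulting dictionary to `api_call_counts`.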

EldeRan (Sgandurra et al., 2016) works with Cuckoo Sandbox, machine learning and negative feedback to determine a set of key features for ransomware. Training data, consisting of benign software and malware, are dynamically analysed based on five attributes: API invocations, use of registry keys, file or directory operations, Internet download activity, and hardcoded strings. EldeRan was trained on Windows XP SP3 32-bit, which is more vulnerable than later editions of Windows. However, since that OS has been unsupported since 2014, it would have been beneficial to also train or test a version on Windows 7 or later. This would have shown how well the system works across different generations of the OS.

Abstract: The growing sophistication of malware has resulted in diverse challenges, especially among security researchers who are expected to develop mechanisms to thwart these malicious attacks. While security researchers have turned to machine learning to combat this surge in malware attacks and enhance detection and prevention methods, they often encounter limitations when it comes to sourcing malware binaries. This limitation places the burden on malware researchers to create context-specific datasets and detection mechanisms, a time-consuming and intricate process that involves a series of experiments. The lack of accessible analysis reports and a centralized platform for sharing and verifying findings has resulted in many research outputs that can neither be replicated nor validated. To address this critical gap, a malware analysis data curation platform was developed. This platform offers malware researchers a highly customizable feature generation process drawing from analysis data reports, particularly those generated in sandbox-based environments such as Cuckoo Sandbox. To evaluate the effectiveness of the platform, a replication of existing studies was conducted in the form of case studies. These studies revealed that the developed platform offers an effective approach that can aid malware detection research. Moreover, a real-world scenario involving over 3000 ransomware and benign samples for ransomware detection based on PE entropy was explored. This yielded an impressive accuracy score of 98.8% and an AUC of 0.97 when employing the decision tree algorithm, with a low latency of 1.51 ms. These results emphasize the necessity of the proposed platform while demonstrating its capacity to construct a comprehensive detection mechanism. By fostering community-driven interactive databanks, this platform enables the creation of datasets as well as the sharing of reports, both of which can substantially reduce experimentation time and enhance research repeatability.

Keywords: malware; malware feature engineering; malware datasets; malware detection; machine learning; artificial intelligence
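The abstract does not say how the PE entropy features were computed. As a rough illustration, the usual choice is Shannon entropy over byte values, which ranges from 0 bits (all bytes identical) to 8 bits (uniformly distributed bytes, typical of packed or encrypted content). The sketch below is that standard formula and should not be read as the platform's actual feature extractor.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


low = shannon_entropy(b"\x00" * 1024)          # constant padding
high = shannon_entropy(bytes(range(256)) * 4)  # every byte value equally likely
```

In practice one would apply this per PE section or to the whole file and feed the resulting values to a classifier such as a decision tree.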

The summary page contains details that would otherwise be gathered through static malware analysis, such as file sizes and hashes. The right side of the summary page shows a score assigned to the file according to how malicious the tool deems it. The score ranges from zero, which means the file is benign or harmless, to ten, for highly malicious files.
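The same summary details can be read from the JSON report instead of the web page. The sketch below assumes the Cuckoo 2.x layout, with the score under "info" and the file metadata under "target" and "file". Treat these key names as assumptions to check against your own reports. The sample values are placeholders.

```python
def summarize_report(report):
    """Collect the summary-page fields from a Cuckoo-style report dict."""
    info = report.get("info", {})
    target_file = report.get("target", {}).get("file", {})
    return {
        "score": info.get("score"),
        "duration": info.get("duration"),
        "name": target_file.get("name"),
        "size": target_file.get("size"),
        "sha256": target_file.get("sha256"),
    }


# Hypothetical report fragment; the hash is a placeholder, not a real sample.
sample_report = {
    "info": {"score": 7.4, "duration": 132},
    "target": {"file": {"name": "sample.exe", "size": 48128,
                        "sha256": "0" * 64}},
}

summary = summarize_report(sample_report)
```

Missing fields come back as `None`, so the function also works on reports where part of the analysis failed.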

Cuckoo Sandbox also highlights specific details of the analysis, such as when the file was analyzed, the time taken, and the type of routing used. The summary page also lists notable malware signatures in the further details section. Signatures are color-coded blue, yellow, or red. A blue signature shows that the file is benign, a yellow signature indicates medium risk, and a red signature means that Cuckoo Sandbox has identified malicious activity, such as keylogging or leaking the IP address. At the end of the summary page, Cuckoo Sandbox shows screenshots taken on the guest machine on which the malware ran. These screenshots are useful when analyzing ransomware, since most ransomware displays its ransom message on screen.
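The color codes can be reproduced from the report data. The sketch below groups signatures by color under two assumptions: that each entry in the report's "signatures" list carries an integer "severity", and that severities 1, 2 and 3 (or higher) correspond to blue, yellow and red. Both the mapping and the signature names are illustrative and not taken from Cuckoo's documentation.

```python
# Assumed mapping from signature severity to the color shown in the web UI.
SEVERITY_COLORS = {1: "blue", 2: "yellow", 3: "red"}


def signatures_by_color(report):
    """Group signature names by the color their severity would be shown in."""
    grouped = {"blue": [], "yellow": [], "red": []}
    for signature in report.get("signatures", []):
        severity = signature.get("severity", 1)
        # Clamp to the 1..3 range so unexpected values still get a color.
        color = SEVERITY_COLORS[min(max(severity, 1), 3)]
        grouped[color].append(signature.get("name", "unknown"))
    return grouped


# Hypothetical signatures, for illustration only.
sample_report = {
    "signatures": [
        {"name": "example_checks_version", "severity": 1},
        {"name": "example_creates_autorun_key", "severity": 2},
        {"name": "example_keylogger", "severity": 3},
        {"name": "example_leaks_ip", "severity": 5},
    ]
}

grouped = signatures_by_color(sample_report)
```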

The network analysis page has multiple tabs that filter reports based on specific network traffic protocols. Malware analysts can filter the network analysis reports by the TCP, DNS, ICMP, IRC, UDP, and HTTP traffic generated by the malware. The platform also allows analysts to download the PCAP file from the page.
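A similar per-protocol view can be built from the JSON report. The sketch below assumes the report's "network" section holds one list per protocol under lowercase keys ("tcp", "dns", and so on), as in Cuckoo 2.x reports. Check the key names against your own data. The sample entries are invented and use documentation-reserved addresses.

```python
PROTOCOLS = ("tcp", "dns", "icmp", "irc", "udp", "http")


def traffic_by_protocol(report, protocols=PROTOCOLS):
    """Return the recorded network entries for each requested protocol."""
    network = report.get("network", {})
    return {protocol: network.get(protocol, []) for protocol in protocols}


# Hypothetical network section, for illustration only.
sample_report = {
    "network": {
        "dns": [{"request": "c2.example.com", "type": "A"}],
        "http": [{"host": "c2.example.com", "method": "POST", "uri": "/gate"}],
        "tcp": [{"dst": "203.0.113.10", "dport": 80},
                {"dst": "203.0.113.10", "dport": 443}],
    }
}

traffic = traffic_by_protocol(sample_report)
entry_counts = {protocol: len(entries) for protocol, entries in traffic.items()}
```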

Currently, we are witnessing a significant rise in various types of malware, which has an impact not only on companies, institutions, and individuals, but also on entire countries and societies. Malicious software developers try to devise increasingly sophisticated ways to perform nefarious actions. In consequence, the security community is under pressure to develop more effective defensive solutions and to continuously improve them. To accomplish this, the defenders must understand and be able to recognize the threat when it appears. That is why, in this paper, a large dataset of recent real-life malware samples was used to identify anomalies in the HTTP traffic produced by the malicious software. The authors analyzed malware-generated HTTP requests, as well as benign traffic of the popular web browsers, using 3 groups of features related to the structure of requests, header field values, and payload characteristics. It was observed that certain attributes of the HTTP traffic can serve as an indicator of malicious actions, including lack of some popular HTTP headers and their values or usage of the protocol features in an uncommon way. The findings of this paper can be conveniently incorporated into the existing detection systems and network traffic forensic tools, making it easier to spot and eliminate potential threats.
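To illustrate the kind of request-level attributes described here, the sketch below parses a raw HTTP request and derives a few simple features from its structure, headers and payload. The list of "common" headers and the feature names are hypothetical; the paper's actual feature set is not reproduced in this summary.

```python
# Headers that mainstream browsers normally send (an assumed list).
COMMON_HEADERS = ("host", "user-agent", "accept", "accept-language", "accept-encoding")


def parse_request(raw):
    """Split a raw HTTP/1.x request into request line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, separator, value = line.partition(":")
        if separator:
            headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


def header_features(raw):
    """Derive structure, header and payload features from one request."""
    method, _path, version, headers, body = parse_request(raw)
    return {
        "method": method,
        "version": version,
        "header_count": len(headers),
        "missing_common": [h for h in COMMON_HEADERS if h not in headers],
        "payload_length": len(body),
    }


# Hypothetical request with only two headers, unlike typical browser traffic.
suspicious = header_features(
    "GET /gate.php HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Mozilla/4.0\r\n\r\n"
)
```

A request missing several headers that browsers always send is not proof of malware, but it is the kind of anomaly the abstract points to as a useful indicator.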

HTTP is used by malware for various purposes, for example, for connecting with the Command and Control (C&C) server to register or download commands, checking the external IP address of the infected host, and downloading additional modules. It is also used to perform DDoS (Distributed Denial of Service) attacks or to generate revenue by clicking on referral links. Such communication is masked by benign HTTP traffic, which can vary widely depending on the application and its purpose. It must be noted that the HTTP protocol can be used by applications other than web browsers, for example, updaters, operating system mechanisms, application shops, and messengers. The main difference between the network traffic of such applications and the network traffic of web browsers lies in the characteristics of the addresses used. Browser traffic can potentially be directed to any address, while for the other applications the addresses are constant: they are either a set of domain names or an IP range. For example, addresses of servers used by Windows telemetry services or Windows update mechanisms are widely known and are listed in many manuals focusing on blocking these services with network firewalls [4, 5] or dedicated tools such as WindowsSpyBlocker ( -max/WindowsSpyBlocker/). Network traffic of these applications can be easily identified using, for example, publicly available address lists or a short analysis of the traffic in the network proxy log. Considering the above, the authors decided to focus only on web browser traffic, as the other popular HTTP-based applications are relatively easy to identify and filter out of the network traffic.
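The filtering step described above can be sketched as a lookup against a list of known domains and IP ranges. The entries below are placeholders drawn from documentation-reserved names and address blocks, not real telemetry or update servers. In practice they would come from a published address list.

```python
import ipaddress

# Placeholder allow-lists (hypothetical); substitute a real published list.
KNOWN_DOMAINS = {"update.example.com", "telemetry.example.net"}
KNOWN_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_known_application_host(host):
    """True if host is a listed domain (or subdomain) or inside a listed range."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so compare it as a domain name.
        name = host.lower().rstrip(".")
        return any(name == d or name.endswith("." + d) for d in KNOWN_DOMAINS)
    return any(address in network for network in KNOWN_NETWORKS)


def browser_candidates(hosts):
    """Keep only hosts that the known-application lists do not explain."""
    return [host for host in hosts if not is_known_application_host(host)]


remaining = browser_candidates(
    ["update.example.com", "203.0.113.7", "news.example.org", "198.51.100.4"]
)
```

Whatever remains after this step is the traffic that could plausibly come from a browser or from malware imitating one.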
