Supervisor(s): Dr. Subhash Bagui, Dr. Sikha Bagui
Department of Mathematics and Statistics
University of West Florida
We leveraged open-source tools and big data analytics to address the rapid growth of IoT devices and the resulting need to monitor network traffic and detect malicious activity. Our research uses Zeek, Hadoop Distributed File System (HDFS), and Apache Spark to develop an Extended Isolation Forest (EIF) approach for detecting cyber threats in IoT network traffic, demonstrated through an analysis of the UWF-ZeekData22 dataset.
Address the IoT Data Challenge:
In response to the rapid proliferation of Internet of Things (IoT) devices, our primary goal was to tackle the challenges posed by the exponential growth of IoT data. We aimed to devise effective methods for monitoring and detecting malicious activities within the expanding IoT network traffic.
Leverage Open-Source Tools:
We sought to harness the power of open-source tools such as Zeek, Hadoop Distributed File System (HDFS), and Apache Spark for efficient and scalable data collection, storage, and analysis. Our objective was to create a robust framework that could handle the enormous volume of data generated by IoT devices.
Develop a Novel Detection Approach:
Our central focus was on developing an innovative approach for the detection of malicious activities in IoT network traffic. Specifically, we aimed to adapt and apply the Extended Isolation Forest (EIF) algorithm, a variation of the Isolation Forest method, known for its effectiveness in detecting anomalies in high-dimensional data. Our objective was to assess the performance of this model in identifying cyber threats within IoT data.
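The two building blocks of an isolation-based detector can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's Spark implementation: the function names are hypothetical, the split rule and score formula follow the published Isolation Forest and EIF definitions, and tree construction is omitted.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Normalizing constant c(n): expected depth of an unsuccessful BST search."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_depth, sample_size):
    """Score in (0, 1]; shallow mean depth (easy to isolate) gives a score near 1."""
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


def random_hyperplane(dim, extension_level, rng):
    """Draw a split normal and zero out (dim - 1 - extension_level) components.

    Extension level 0 leaves a single nonzero component, which is an
    axis-parallel cut as in the standard Isolation Forest.
    """
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in rng.sample(range(dim), dim - 1 - extension_level):
        normal[i] = 0.0
    return normal


def goes_left(point, normal, intercept):
    """Branch test at a tree node: (x - p) . n <= 0."""
    return sum((x - p) * n for x, p, n in zip(point, intercept, normal)) <= 0.0
```

A point whose mean depth across trees equals `average_path_length(sample_size)` scores exactly 0.5; points isolated in fewer splits score higher.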
Utilize the MITRE ATT&CK Framework:
We aimed to label our dataset using the MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) framework, a globally accessible knowledge base for characterizing adversary tactics and techniques. This framework guided our research in focusing on three critical adversary tactics: reconnaissance, discovery, and resource development.
Contribute to Anomaly Detection Literature:
Our ultimate objective was to contribute to the field of anomaly detection, particularly in the context of IoT network traffic. We aimed to provide insights into the efficacy of the EIF model and its potential for identifying cyber threats in this specific domain.
The Extended Isolation Forest (EIF) proved effective in identifying malicious activities in IoT network traffic, particularly for the reconnaissance, discovery, and resource development tactics. The EIF model performed best at extension level 0, which corresponds to the standard Isolation Forest; this underlines the need to tailor models to the characteristics of the data. These findings can inform future work on securing IoT networks against cyber threats.
Supervisor(s): Dr. Petko Bogdanov, Dr. Stacy Copp (UC Irvine)
Data Mining and Management (DMM) Lab
Department of Computer Science
University at Albany SUNY
DNA-stabilized silver clusters (Ag-DNAs) are novel fluorophores that are finding numerous applications in nanophotonics, chemical sensing, and bioimaging. The fluorescence colors of Ag-DNAs can be tuned from blue-green into the near-infrared by selecting the sequence of the single-stranded DNA that templates the cluster. Using a training set of DNA template strands and the fluorescence spectrum associated with each strand, we mine discriminative multi-base DNA motifs that correlate with fluorescent cluster brightness. Furthermore, using such motifs to parameterize DNA templates, we develop a machine learning-based tool to design novel DNA templates that stabilize brightly fluorescent Ag-DNAs.
The primary objective of this research project was to design DNA-stabilized silver nanoclusters (AgN-DNAs) with specific properties, particularly targeting AgN-DNAs with near-infrared (NIR) fluorescence emission. The initial situation was characterized by limited understanding of the DNA sequences that govern the properties of AgN-DNAs, such as their fluorescence color and brightness. Researchers faced challenges due to the complex interactions between DNA and silver atoms, making it difficult to design AgN-DNAs using conventional chemical calculations.
Model Development:
We aimed to develop a generative model to design AgN-DNAs with specific properties. To achieve this, we employed a variational autoencoder (VAE) architecture, which was trained using a dataset of 2661 DNA sequences and their corresponding AgN-DNA properties, including fluorescence wavelength (WAV) and fluorescence brightness (LII).
Property Regularization:
To ensure that the VAE model learned to correlate DNA sequences with AgN-DNA properties, we introduced property regularization. This step helped the model capture the relationships between DNA sequences and the desired properties.
Addressing Imbalanced Data:
One significant challenge was dealing with imbalanced training data, especially for NIR-emissive AgN-DNAs, which represented only 2% of the dataset. To overcome this, we implemented a weighting scheme that gave more importance to rare property observations during training.
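One common way to give rare observations more importance is inverse-frequency weighting. The sketch below is an assumption for illustration: the weighting formula actually used in the project is not specified here, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Hypothetical toy labels: 2 NIR-emissive sequences among 100 (2%).
labels = ["NIR"] * 2 + ["other"] * 98
weights = inverse_frequency_weights(labels)
```

With this scheme each class contributes the same total weight to the training loss, so the rare NIR class is not drowned out by the majority.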
Sampling and Synthesis:
After training the VAE model, we employed it to generate new DNA template sequences. The model produced 1000 samples, and we selected the top 20 sequences with the highest re-encoded WAV proxy values for experimental synthesis.
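The selection step amounts to ranking generated candidates by a score and keeping the best 20. The sketch below uses placeholder data: the random strands, their length, and `placeholder_proxy` are hypothetical stand-ins, since the real candidates come from the trained VAE decoder and the real score from re-encoding them.

```python
import heapq
import random


def select_top_candidates(candidates, proxy, k=20):
    """Return the k candidates with the highest proxy value, best first."""
    return heapq.nlargest(k, candidates, key=proxy)


def placeholder_proxy(seq):
    # Stand-in score with no chemical meaning; the real WAV proxy
    # would come from re-encoding the sequence with the trained VAE.
    return seq.count("G") / len(seq)


# Placeholder data: 1000 random 10-base strands instead of VAE samples.
rng = random.Random(0)
samples = ["".join(rng.choice("ACGT") for _ in range(10)) for _ in range(1000)]

top20 = select_top_candidates(samples, placeholder_proxy, k=20)
```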
Experimental Validation:
Wet lab experiments were conducted to synthesize AgN-DNAs based on the 20 DNA template sequences generated by the VAE model. The results were highly successful, with all 20 sequences yielding brightly fluorescent AgN-DNAs with wavelengths between 695 nm and 845 nm. Importantly, one sequence resulted in a NIR-emissive AgN-DNA with a peak fluorescence wavelength of 840 nm.
The VAE-based model successfully increased the representation of NIR-emissive AgN-DNAs by 240% compared to the training dataset. This development presents substantial potential for applications in diverse scientific and technological domains. Additionally, our work enhanced our understanding of the intricate relationship between DNA sequences and AgN-DNA properties, contributing valuable insights for future nanomaterial customization. We addressed imbalanced training data effectively, leading to improved machine learning model performance and underlining the importance of data preprocessing. These outcomes position our research at the forefront of materials science and its interdisciplinary applications, promising transformative impacts in fields like medical diagnostics, in vivo imaging, and drug delivery systems.
This project originated from a collaboration between the University of South Florida (USF) and the USF Cybersecurity Operations Center (CSOC). Dr. Ankit Shah's idea, supported by Dr. Xinming Ou, led to the development of an innovative cybersecurity solution.
I began as an intern at the CSOC, where I worked on data acquisition, and Soumyadeep Hore, a Ph.D. student, later joined the effort. We are grateful to the CSOC team for sharing their invaluable vulnerability data, and we offer special thanks to Dr. Ankit Shah and Dr. Xinming Ou for their guidance. This collaboration has resulted in a robust cybersecurity framework with potential for future advancements.
Enhance Vulnerability Prioritization:
Our project aimed to revolutionize vulnerability management. Organizations have largely relied on the Common Vulnerability Scoring System (CVSS) alone, which left some critical vulnerabilities neglected. We sought to change this by introducing a holistic approach that considers both organizational context and severity.
Develop an Advanced VPSS Model:
A significant focus was placed on constructing a cutting-edge Vulnerability Priority Scoring System (VPSS). We leveraged TF-IDF for feature engineering and harnessed the Random Forest algorithm for precise vulnerability scoring.
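To illustrate the feature-engineering step, here is a minimal TF-IDF sketch in plain Python. It uses the basic `tf * log(N / df)` weighting, which may differ from the variant used in the project, and the vulnerability descriptions are invented for illustration. The Random Forest scoring step, which would take such vectors as input, is not shown.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical vulnerability descriptions, invented for illustration.
descriptions = [
    "remote code execution in web server",
    "sql injection in web application login",
    "buffer overflow in print service",
]
vectors = tfidf(descriptions)
```

Terms that appear in every description (here, "in") receive a weight of zero, while terms unique to one description receive the highest weight.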
Optimize Vulnerability Selection:
We designed a decision-support system with optimization models to select vulnerabilities for mitigation in a context-aware manner. The ultimate goal was to minimize total vulnerability exposure while accounting for the security personnel proficient in handling each vulnerability.
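A selection problem of this kind can be sketched as a 0/1 knapsack: choose the set of vulnerabilities that removes the most exposure within a fixed effort budget. This is a simplified illustration, not the project's optimization model; the per-vulnerability scores, the hour estimates, and the single integer hour budget are all assumptions.

```python
def select_vulnerabilities(vulns, budget_hours):
    """Pick vulnerabilities maximizing mitigated score within an hour budget.

    vulns is a list of (name, priority_score, hours_to_fix) tuples with
    integer hours. Returns (total_score, chosen_names).
    """
    # best[h] = (total score, chosen names) achievable with at most h hours
    best = [(0.0, [])] * (budget_hours + 1)
    for name, score, hours in vulns:
        updated = list(best)
        for h in range(hours, budget_hours + 1):
            candidate = best[h - hours][0] + score
            if candidate > updated[h][0]:
                updated[h] = (candidate, best[h - hours][1] + [name])
        best = updated
    return best[budget_hours]


# Hypothetical scores and effort estimates.
vulns = [("V1", 9.0, 4), ("V2", 6.0, 2), ("V3", 5.0, 2), ("V4", 3.0, 1)]
total, chosen = select_vulnerabilities(vulns, budget_hours=5)
```

Maximizing the mitigated score is equivalent to minimizing the exposure that remains, because the total score across all vulnerabilities is fixed. In this toy case, three lower-scored vulnerabilities together remove more exposure than the single highest-scored one.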
Streamline Resource Allocation:
Our allocation model aimed to maximize the match between vulnerability types and security analysts' skills, thereby reducing response time. This optimized resource allocation was vital for efficient vulnerability management.
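Matching analysts to vulnerabilities by skill can be framed as an assignment problem. The sketch below solves a tiny instance by exhaustive search; the skill matrix is hypothetical, and the one-analyst-per-vulnerability setup is a simplifying assumption rather than the project's actual allocation model.

```python
from itertools import permutations


def best_assignment(skill):
    """Find the one-to-one assignment with the highest total skill match.

    skill[a][v] is analyst a's proficiency for vulnerability v (square matrix).
    Returns (total, perm) where perm[a] is the vulnerability given to analyst a.
    """
    n = len(skill)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(skill[a][perm[a]] for a in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Hypothetical proficiency scores in [0, 1].
skill = [
    [0.9, 0.2, 0.4],
    [0.3, 0.8, 0.5],
    [0.6, 0.1, 0.7],
]
total, assignment = best_assignment(skill)
```

Exhaustive search takes factorial time, so it is only suitable for small examples; a polynomial-time method such as the Hungarian algorithm would be the usual choice at realistic scale.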
Strengthen Cybersecurity Operations:
The project's core objective was to fortify an organization's cybersecurity operations by systematically reducing security risk exposure. Through selection and allocation optimization, we aimed to create a robust security environment.
Our project delivered concrete results with data-driven impact. The machine learning-based Vulnerability Priority Scoring System (VPSS) outperformed traditional methods like CVSS, consistently reducing organizational vulnerability exposure scores. The VPSS-based selection model prioritized vulnerabilities more effectively, while the allocation model optimized resource assignment. This approach highlighted the limitations of CVSS in overlooking critical vulnerabilities.