Research Work

Mining Imbalanced Big Data with Julia (2019)

Julia Conference 2019 

In the era of big data, classifying imbalanced real-world data is a challenging problem in supervised learning. The standard data sampling methods, under-sampling and over-sampling, have several limitations when applied to big data. Under-sampling removes instances from the majority class, while over-sampling generates artificial minority class instances to balance the data. However, under-sampling may discard informative instances, and over-sampling can cause overfitting. In this work, we present a new cluster-based under-sampling approach combined with ensemble learning (e.g., a RandomForest classifier) for classifying imbalanced data, implemented in Julia. We collected real telecom fraud data on illegal money transactions, which is highly imbalanced, with only 8,213 minority class instances among 6,362,620 instances. The proposed method first splits the data into majority class and minority class instances. It then groups the majority class instances into several clusters and draws a set of instances from each cluster to create several balanced sub-datasets. Finally, a classifier is trained on each balanced dataset, and majority voting is applied to classify new, unseen instances. Evaluated on a separate test dataset, the proposed method achieved 97% accuracy.
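
The steps described above can be sketched in code. The following is a minimal Python sketch, not the Julia implementation from this work. It assumes k-means for the clustering step (the abstract does not name the clustering algorithm) and, as a stand-in for the RandomForest base learner, uses a simple nearest-centroid classifier so that it runs with the standard library alone. All function names (kmeans, balanced_subsets, fit_centroids, predict), the proportional per-cluster sampling rule, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return clusters


def balanced_subsets(majority, minority, k, n_subsets, seed=0):
    """Build n_subsets roughly balanced datasets by sampling every cluster."""
    rng = random.Random(seed)
    clusters = [c for c in kmeans(majority, k, seed=seed) if c]
    subsets = []
    for _ in range(n_subsets):
        sample = []
        for members in clusters:
            # Assumed rule: each cluster contributes in proportion to its
            # size, so the majority sample is about as large as the minority.
            share = max(1, round(len(minority) * len(members) / len(majority)))
            sample.extend(rng.sample(members, min(share, len(members))))
        subsets.append((sample, list(minority)))
    return subsets


def fit_centroids(majority_sample, minority):
    """Stand-in base learner (NOT RandomForest): one centroid per class."""
    return {0: mean(majority_sample), 1: mean(minority)}


def predict(models, x):
    """Majority vote over the ensemble; label 1 is the minority class."""
    votes = Counter(
        min(model, key=lambda label: sq_dist(x, model[label])) for model in models
    )
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): 300 majority points near (0, 0) and
# 15 minority points near (6, 6).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
minority = [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(15)]

subsets = balanced_subsets(majority, minority, k=3, n_subsets=5)
models = [fit_centroids(maj, mino) for maj, mino in subsets]
label = predict(models, (6.0, 6.0))  # vote of the 5 base classifiers
```

An odd number of sub-datasets is used here so that the two-class vote cannot tie. In practice, each base classifier would be replaced by a stronger learner such as a random forest trained on its balanced sub-dataset.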


Link: https://github.com/atikul-islam-sajib/Undergraduate-Thesis-/blob/main/Fraud_Classification_Imbalanced_Big_Data.pdf

CRISPRforecast: An effective method to predict high on-target sgRNA activity in CRISPR/Cas9 system (2020)

Structure:

    Employing CRISPRforecast Models:


Link: https://github.com/zahid6454/CRISPRforecast