An Initial Coin Offering (ICO) is a crowdfunding method in which companies issue digital coins or cryptocurrencies to raise funds for projects online. This project aims to predict whether an ICO will succeed, where success is defined as the company's ability to raise funds by selling its digital coins or cryptocurrencies.
Data Understanding:
Used a dataset describing various crowdfunding projects, including start and end dates, number of coins issued, price per coin, and country of origin, among other attributes.
The dataset consists of 2,767 observations with 16 attributes each.
Success rate analysis showed that approximately 63% of the projects were unsuccessful.
Data Preparation and Cleaning:
Removed leading and trailing whitespace from text attributes.
Converted dates from strings to date objects and the success variable into a binary factor for analysis.
Handled missing values by removal or imputation, particularly for the 'platform' and 'countryRegion' variables.
Identified and removed extreme outliers and invalid values in numerical variables like 'coinNum', 'priceUSD', and 'distributedPercentage'.
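
The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the "Y"/"N" success labels, the day/month/year date format, the "Unknown" placeholder used for imputation, and the 0 to 1 scale for 'distributedPercentage' are all assumptions, and the sample rows are invented.

```python
from datetime import date, datetime

# Hypothetical raw rows. Column names follow the attributes named above;
# the values, label coding and date format are invented for illustration.
raw_rows = [
    {"success": " Y", "startDate": "01/03/2018", "endDate": "31/03/2018",
     "platform": " Ethereum ", "coinNum": "1000000", "priceUSD": "0.50",
     "distributedPercentage": "0.40"},
    {"success": "N", "startDate": "15/05/2018", "endDate": "15/06/2018",
     "platform": "Waves", "coinNum": "500000", "priceUSD": "0.10",
     "distributedPercentage": "1.70"},  # invalid: share above 100%
    {"success": "N ", "startDate": "01/07/2018", "endDate": "01/08/2018",
     "platform": "", "coinNum": "250000", "priceUSD": "2.00",
     "distributedPercentage": "0.25"},  # missing platform
]


def parse_date(text: str) -> date:
    """Convert a day/month/year string (assumed format) to a date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def clean_row(row):
    """Return a cleaned copy of the row, or None if it holds invalid values."""
    row = {key: value.strip() for key, value in row.items()}  # trim whitespace
    cleaned = {
        "success": 1 if row["success"] == "Y" else 0,  # binary target
        "startDate": parse_date(row["startDate"]),
        "endDate": parse_date(row["endDate"]),
        "platform": row["platform"] or "Unknown",  # assumed imputation value
        "coinNum": float(row["coinNum"]),
        "priceUSD": float(row["priceUSD"]),
        "distributedPercentage": float(row["distributedPercentage"]),
    }
    is_valid = (
        cleaned["coinNum"] > 0
        and cleaned["priceUSD"] > 0
        and 0 <= cleaned["distributedPercentage"] <= 1
        and cleaned["endDate"] >= cleaned["startDate"]
    )
    return cleaned if is_valid else None


cleaned_rows = [row for row in map(clean_row, raw_rows) if row is not None]
print(len(cleaned_rows), "of", len(raw_rows), "rows kept")
```

The validity rules here are simple range checks; detecting extreme outliers would need an additional rule (for example, a percentile cut-off), which the summary does not specify.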
Feature Engineering and Selection:
Introduced a new variable 'projectDuration' to capture the length of each crowdfunding campaign.
Grouped the many-valued categorical variables 'countryRegion' and 'platform' into broader categories, then applied one-hot encoding so that machine learning models could use them.
Used statistical tests (ANOVA and chi-square) to select statistically significant variables for modelling.
Ethereum was the most used platform, accounting for 86% of the dataset.
ICOs that ran on their own separate blockchain platform had a high success rate (53%).
Singapore had 312 ICOs with a 49% success rate, the highest recorded.
Estonia, despite its smaller economy, had 191 ICOs and a 47% success rate.
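
The feature engineering and selection steps can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the project: the sample records are invented, `MIN_COUNT` is a hypothetical cut-off for merging rare categories, and `platformGroup` and `chi_square_statistic` are illustrative names.

```python
from collections import Counter
from datetime import date

# Hypothetical cleaned records: (startDate, endDate, platform, success).
# All values are invented for illustration.
records = [
    (date(2018, 3, 1), date(2018, 3, 31), "Ethereum", 1),
    (date(2018, 4, 1), date(2018, 5, 1), "Ethereum", 1),
    (date(2018, 5, 15), date(2018, 6, 15), "Ethereum", 1),
    (date(2018, 6, 1), date(2018, 6, 21), "Ethereum", 0),
    (date(2018, 7, 1), date(2018, 8, 1), "Waves", 0),
    (date(2018, 9, 1), date(2018, 9, 21), "Stellar", 0),
]
projects = [
    {"startDate": start, "endDate": end, "platform": platform, "success": label}
    for start, end, platform, label in records
]

MIN_COUNT = 2  # assumed cut-off: rarer categories are merged into "Other"

# New feature: campaign length in days.
for project in projects:
    project["projectDuration"] = (project["endDate"] - project["startDate"]).days

# Generalise rare platforms into a single "Other" category.
platform_counts = Counter(project["platform"] for project in projects)
for project in projects:
    is_common = platform_counts[project["platform"]] >= MIN_COUNT
    project["platformGroup"] = project["platform"] if is_common else "Other"

# One-hot encode the generalised category.
levels = sorted({project["platformGroup"] for project in projects})
for project in projects:
    for level in levels:
        project[f"platform_{level}"] = int(project["platformGroup"] == level)


def chi_square_statistic(rows, feature, target="success"):
    """Pearson chi-square statistic for a feature/target contingency table."""
    observed = Counter((row[feature], row[target]) for row in rows)
    feature_totals = Counter(row[feature] for row in rows)
    target_totals = Counter(row[target] for row in rows)
    statistic = 0.0
    for f_value, f_total in feature_totals.items():
        for t_value, t_total in target_totals.items():
            expected = f_total * t_total / len(rows)
            statistic += (observed[(f_value, t_value)] - expected) ** 2 / expected
    return statistic


statistic = chi_square_statistic(projects, "platformGroup")
print(levels, statistic)
```

To decide whether a variable is significant, the statistic would be compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, which is 1 for this two-by-two example. A statistics library would normally supply the p-value.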
Modelling and Evaluation:
Employed several machine learning models, including K Nearest Neighbors (KNN), Decision Trees, Support Vector Machines (SVM), and Neural Networks, to predict ICO success.
Conducted cross-validation to assess model performance and adjust parameters to optimize results.
The neural network performed best, reaching 70% accuracy with two hidden layers (2 neurons in the first, 1 neuron in the second) and the tanh activation function.
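
Cross-validated parameter tuning can be sketched with KNN, the simplest of the models listed. The example below is an illustration under assumptions, not the project's code: the two-feature data is synthetic, and the 5-fold split, the candidate values of k, and the helper names `knn_predict` and `cross_val_accuracy` are all hypothetical choices.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        range(len(train_X)), key=lambda i: math.dist(train_X[i], point)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean KNN accuracy over n_folds folds of a shuffled dataset."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Synthetic data: two overlapping clusters standing in for the two classes.
rng = random.Random(42)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
X += [(rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)) for _ in range(50)]
y = [0] * 50 + [1] * 50

# Evaluate each candidate k on the same folds and keep the best one.
candidate_ks = [1, 3, 5, 7]
cv_scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best k:", best_k)
```

The same loop applies to the other models by swapping the prediction function and the parameter being tuned, such as the number of hidden neurons in the neural network.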
Conclusion:
The project successfully applied machine learning techniques to predict ICO success, with the neural network model showing the best performance.
Data quality and variable selection were found to have a significant impact on model performance.
The findings can aid both fundraising teams and potential investors in making more informed decisions regarding ICO participation.