In a world filled with brilliant entrepreneurs eager to make their ideas a reality, finding the funding necessary for such initiatives can be a challenging endeavour. Crowd-funding websites such as Kickstarter provide a way for savvy entrepreneurs to leverage the collective resources of the crowd to obtain the capital needed to get their projects started. Kickstarter's core function is to act as a global crowd-funding platform where users (also known as backers) are offered tangible rewards or experiences in exchange for pledges of monetary support. While Kickstarter publishes many articles outlining best practices for project creation, large-scale analysis has shown that approximately half of all campaigns still fail.
We believe that meaningful indicators exist that can predict whether a project is more likely than others to be successfully funded. By identifying these indicators, our goal is to provide insight into which decisions project creators should focus their scarce time and resources on. We believe that project creators across all categories would both want and benefit from these insights when planning their campaigns.
We feel that many ideas in the world, for one reason or another, lack the financial resources or social visibility to be recognized by the public. It is our hope that by providing project creators with information about which features are most predictive of a successful campaign, all good ideas will be given equal representation and net social good will be promoted through the flourishing of novel innovations.
The dataset for the project was obtained from Web Robots, which uses scraper robots to crawl Kickstarter projects and collect the data in CSV format. The data consists of all projects from April 2009 to August 2019. Our dataset has a total of 169,962 projects spanning various categories, of which 96,768 were successful. A successful project is one where the project owners were able to crowdfund their desired goal amount, an amount decided entirely by the project owners. Each project has 37 features/attributes available.
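As a rough illustration of how the raw export can be loaded, the sketch below concatenates the scraped CSV files with pandas. The file pattern, the `id` column used for de-duplication, and the `state` column holding the campaign outcome are assumptions based on the typical Web Robots export, not guarantees about our exact files.

```python
import glob
import pandas as pd

# Assumed layout: the Web Robots export ships as a set of CSV files in one folder.
csv_files = glob.glob("data/Kickstarter*.csv")
df = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

# Drop duplicate project rows (the scraper can revisit the same project across runs).
df = df.drop_duplicates(subset="id")

# 'state' is assumed to hold the campaign outcome, e.g. 'successful' or 'failed'.
print(len(df), "projects,", (df["state"] == "successful").sum(), "successful")
```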
Following is the description of each column:
Through group discussion and previous literature review, we have determined that there are 4 main questions that we wish to address through this work.
The recent popularity and success of crowdfunding platforms such as Kickstarter has led to a rapid increase in research focused on predicting the success or failure of Kickstarter projects and on identifying which features are most important to that prediction. Works such as [2], [5], and [9] use a combination of features extracted directly from the Kickstarter project page alongside data collected from various social media in order to make this prediction. [2] and [5] in particular are able to leverage these features with relative success. Other works such as [1] use only information available directly from the Kickstarter project page and achieve results comparable to the previously mentioned projects. This analysis of previous work informed our decision to use only features that were directly available from the Kickstarter project page.
A key aspect that we noticed was missing from all of the previously mentioned works is a modern application of natural language processing to the applicable Kickstarter project page features (specifically "name" and "blurb"). Works like [3] and [4] address certain aspects of natural language (notably persuasion theory) but stop short of in-depth application and analysis using the features. We decided to fill this gap in knowledge in our work by applying different variations and techniques of sentiment analysis on the applicable features.
Most of the previously mentioned works use supervised learning models as the driving force behind their predictions. Though the creation of these models is useful, we chose both to build a model and to highlight the key features indicative of success, along with the specific aspects of those features that matter most. By doing this, we hoped to provide actionable insights to real project creators about which aspects of their campaign deserve the most effort, rather than simply creating a classifier that could predict success or failure after the project was already created.
We decided to drop some of the features that were not useful. The features discarded are:
The following attributes were extracted from the existing data for further exploration:
We had 169,962 unique projects, of which 96,768 were successful. This includes projects across all 15 categories.
Our dataset has information on Kickstarter projects from April 2009 to August 2019.
It is observed that the number of projects launched on the website grew steadily until 2013, followed by a sharp increase between 2013 and 2015. Launches peaked in 2015, and the platform's popularity appears to have been waning since then.
Generally, the percentage of successful projects has been decreasing over the years. However, in 2018 and 2019 there was a slight improvement.
In 2015, Kickstarter had the highest engagement in terms of projects launched. However, it was also the year with the lowest success rate (~40%).
Film and video, music, art, technology, and publishing had the highest number of projects launched. However, categories such as comics, dance, and music performed better in terms of success rate when compared to other categories.
Most projects were launched between March and July. Projects launched in March and April had the highest success rate. December had the fewest projects launched and also the lowest success rate of all months.
Most projects were launched on normal workdays, i.e. Monday-Friday. However, success rates are similar across the days of the week, so there is no strong evidence that launching a project on a specific day would benefit a campaign.
Kickstarter's website has a dedicated section called "Projects We Love" which lists their staff's current favorite projects. This also happens to be the first suggestion on the explore page.
In our dataset, we observed that ~90% of the staff-picked projects successfully met their goal amounts.
~20% of the successful projects are staff picked.
Only ~3% of the failed projects are staff picked.
The above graphs present the distribution of project goal amounts over the years for two cases: successful and failed projects together, and successful projects only. The number of projects and their goal amounts follow a similar pattern of growth and decay over the years. Overall, the goal amount for projects has consistently increased over the years (shown by the 75th-percentile mark). The median goal amount was nearly the same between 2015 and 2018 in both cases.
Looking at goal amounts across all categories, 'technology' has the projects with the highest goal amounts. In fact, 75% of technology projects set their goal amount as high as $50,000, yet 50% of the successful projects in this category have a goal amount of only $10,000.
Projects under categories such as 'dance', 'comics', 'art', and 'crafts' set much smaller goal amounts. Under these categories, 75% of the projects have a goal amount of ~$10,000, and 50% of the successful projects have goal amounts of less than $5,000. Interestingly, if we look at the section on the distribution of projects over categories, these four categories have a much higher success rate than the 'technology' category.
For successful and failed projects combined, the median average amount pledged remains quite consistent over the years, lying between $45 and $60. For successful projects only, the median amount pledged is around $60. The 25th percentile lies around $40 for successful projects but fluctuates considerably when failed projects are also taken into consideration.
The average amount pledged per category seems to be mostly proportional to the goal amount set for that category (compared over successful projects only). The only exception seems to be the 'games' category: its average amount pledged is one of the lowest despite its goal amounts being among the top five highest.
There were additional features that we decided to engineer for our model. Since one of our objectives was to find a relationship between the success of a project and its duration, we calculated a feature named campaign_days, which is the difference between the deadline of the project and its launch date. We chose days as the granularity level because a month is too long a unit to consider for a project and seconds/hours are too short. A number of days can easily be managed by project owners.
We also wanted to test whether a longer waiting period between a project's creation date and its official launch for crowdfunding negatively affects the project. To this end, we included another feature titled creation_to_launch_days. Again, for the reasons mentioned above, we settled on days as the appropriate granularity.
In addition to the sentiment surrounding the project and its description, we wanted to test whether there is a pattern in the length of the various pieces of text describing the project. As a result, we calculated the following fields (sketched in the code below) and included them in our dataset:
- name_length: the number of characters in the project title ("name")
- blurb_length: the number of characters in the project description ("blurb")
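A minimal pandas sketch of these derived fields follows. It assumes the timestamp columns (`created_at`, `launched_at`, `deadline`) are Unix epoch seconds and that the title and description live in the `name` and `blurb` columns, as in the Web Robots export.

```python
import pandas as pd

# Convert the assumed Unix-epoch timestamp columns to datetimes.
for col in ["created_at", "launched_at", "deadline"]:
    df[col] = pd.to_datetime(df[col], unit="s")

# Campaign length: days between the official launch and the deadline.
df["campaign_days"] = (df["deadline"] - df["launched_at"]).dt.days

# Waiting period: days between project creation and its official launch.
df["creation_to_launch_days"] = (df["launched_at"] - df["created_at"]).dt.days

# Text-length features on the title and blurb (missing text treated as empty).
df["name_length"] = df["name"].fillna("").str.len()
df["blurb_length"] = df["blurb"].fillna("").str.len()
```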
Since our models require numeric inputs, we used one-hot encoding for the following fields: country, category, launch_month, launch_day, deadline_month, and deadline_day. Moreover, since our existing features were all on different scales, we decided to standardize them by centering each feature at 0 and scaling it to unit variance. The following formula was used to achieve this:

z = (x - u) / s

where u is the mean and s is the standard deviation of the feature.
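The sketch below illustrates this preprocessing with pandas and scikit-learn's `StandardScaler` (which implements exactly z = (x - u) / s); the numeric column list shown here is a placeholder for our full feature set.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

categorical = ["country", "category", "launch_month", "launch_day",
               "deadline_month", "deadline_day"]
numeric = ["goal", "campaign_days", "creation_to_launch_days",
           "name_length", "blurb_length"]  # assumed numeric feature subset

# One-hot encode the categorical fields.
X = pd.get_dummies(df[categorical + numeric], columns=categorical)

# Standardize the numeric fields to zero mean and unit variance.
X[numeric] = StandardScaler().fit_transform(X[numeric])
```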
One of the more complex features that we chose to engineer was the sentiment of both the project name and the project blurb. The name is the string displayed as the project's title, and the blurb is a string giving a brief summary of the project. We chose to treat sentiment analysis for this dataset as a classification problem, with positive, neutral, and negative as the three potential sentiment classes. Ultimately, our goal was to calculate the sentiment of names and blurbs respectively using TF-IDF vectors in order to feed this feature into our final classifier. TF-IDF was chosen as the text representation because it can be computed efficiently and has previously been successful for tweet sentiment classification (both names and blurbs tend to be similar in length to tweets) [7].
Our first attempt at engineering this feature was to use hand-labelled data to train a classifier and then use that trained model to predict the sentiment labels for the remaining data. We randomly sampled 1,000 projects from our dataset and hand-labelled both the name and blurb of each. Due to the tendency of project creators to use very similar wording in both their name and blurb, the two labels were often the same for a given project. The results of training and testing on the hand-labelled data were ultimately nonviable due to poor accuracy scores. There are several potential reasons for this, but the most apparent are the subjective nature of hand-labelled data (especially for this application) and our choice to randomly sample the data rather than use stratified sampling. Given the time constraints of the project and the lack of any guarantee that hand-labelling would yield a viable result, we chose to attempt an automated process instead.
Of the many different paradigms of sentiment mining that exist, we felt that our problem was most in line with opinion mining [8]. SentiWordNet is an automated sentiment tagger that assigns positive, negative, and objective scores to a given statement [6]. We used the scores assigned by SentiWordNet to classify a given name or blurb as positive, negative, or neutral (where the objective score corresponded to our neutral classification). Though less labour intensive, this process ultimately resulted in poorly correlated labels. Hand analysis showed that the sentiment tags provided by SentiWordNet were generally not in line with human annotations. The most prominent deficiency was that it was unable to meaningfully identify and label neutral sentiment (which is sometimes difficult even for a human annotator).
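A rough sketch of how SentiWordNet scores can be aggregated through NLTK is shown below; the token-level aggregation rule here is our own simplification for illustration, not the exact procedure we used.

```python
import nltk
from nltk.corpus import sentiwordnet as swn

nltk.download("punkt")
nltk.download("wordnet")
nltk.download("sentiwordnet")

def sentiwordnet_label(text):
    """Tag a short text as positive / negative / neutral from summed SentiWordNet scores."""
    pos = neg = 0.0
    for token in nltk.word_tokenize(text.lower()):
        synsets = list(swn.senti_synsets(token))
        if not synsets:
            continue
        first = synsets[0]  # take the most frequent sense; a simplifying assumption
        pos += first.pos_score()
        neg += first.neg_score()
    if pos == neg:  # covers purely objective text and tokens with no synsets
        return "neutral"
    return "positive" if pos > neg else "negative"

# Example usage on the blurb column (column name assumed).
df["blurb_swn_label"] = df["blurb"].fillna("").apply(sentiwordnet_label)
```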
After these two failed attempts at assigning representative sentiment labels to a string, we changed our approach. Since we had data confirming whether or not a project was successful, we used this information to assign the sentiment labels of positive, neutral, or negative to the respective name and blurb fields. Projects that were successful were deemed to have positive sentiment; failed projects that nevertheless raised 30% or more of their goal amount were deemed neutral; and failed projects that raised less than 30% of their goal amount were deemed negative. Based on previously observed trends, we assigned the same label to both the name and blurb sentiment for each project (since this was what occurred in the vast majority of the hand-labelled cases). The intuition behind this method is that we wanted to exploit the already labelled class structure of "successful" and "failed" to create subsets within our data where we knew the language used in the "name" and "blurb" sections was indicative of success.
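A sketch of this outcome-based labelling rule is shown below, assuming `state`, `pledged`, and `goal` columns; the 30% threshold is the one described above.

```python
import numpy as np

# Funding ratio of each project (column names assumed; currency conversion ignored).
funded_ratio = df["pledged"] / df["goal"]

# Successful -> positive; failed but >= 30% funded -> neutral; otherwise negative.
df["text_sentiment"] = np.where(
    df["state"] == "successful", "positive",
    np.where(funded_ratio >= 0.3, "neutral", "negative"),
)

# The same label is assigned to both the name and the blurb of a project.
df["name_sentiment"] = df["text_sentiment"]
df["blurb_sentiment"] = df["text_sentiment"]
```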
Once we had the newly labelled data, we split it into stratified training and test sets that were used to train a random forest ensemble classifier. This classifier achieved 65% predictive accuracy on unseen test data. The predicted labels were then fed into the final model, which ultimately led to an increase in predictive accuracy.
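A minimal version of this step is sketched below: TF-IDF vectors for the blurb text feed a random forest trained on a stratified split. The vectorizer settings, tree count, and split size are illustrative assumptions, not our exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# TF-IDF representation of the blurb text (the same approach applies to names).
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X_text = vectorizer.fit_transform(df["blurb"].fillna(""))
y_sent = df["text_sentiment"]

# Stratified split so all three sentiment classes appear in both train and test sets.
Xtr, Xte, ytr, yte = train_test_split(
    X_text, y_sent, test_size=0.2, stratify=y_sent, random_state=42)

sent_clf = RandomForestClassifier(n_estimators=100, random_state=42)
sent_clf.fit(Xtr, ytr)
print("sentiment accuracy:", accuracy_score(yte, sent_clf.predict(Xte)))
```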
A wide variety of factors contribute to the success or failure of a project. Some of these factors can be quantified, which allows for the construction of a model to predict whether or not a project will be successful. The models also give potential project creators insight into which features are important for making their project successful. This section details the models we experimented with. We started out with vanilla Logistic Regression and moved on to experiment with SVMs, Decision Trees, and Random Forests. During training, we used the state of the project (successful or failed) as the output of the models.
Logistic Regression is an appropriate regression analysis to conduct when the dependent variable is binary. It is widely used as a binary classifier to predict which of two categories a given data point falls into. Our first model is a Logistic Regression model with the default parameters.
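A minimal sketch with scikit-learn's defaults, assuming the preprocessed feature matrix `X` from the standardization sketch above and a binary outcome derived from the assumed `state` column:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Binary target: 1 if the campaign was successful, 0 otherwise (column name assumed).
y = (df["state"] == "successful").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# scikit-learn defaults otherwise: L2 penalty with the lbfgs solver.
logreg = LogisticRegression(max_iter=1000)  # higher max_iter only to ensure convergence
logreg.fit(X_train, y_train)
print("logistic regression accuracy:", logreg.score(X_test, y_test))
```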
A Support Vector Machine (SVM) is used for classification problems, similar to Logistic Regression. While Logistic Regression focuses on maximizing the likelihood of the data, an SVM tries to find a separating hyperplane that maximizes the margin, i.e., the distance to the closest data points on either side. The focus of an SVM is on making the right decisions rather than predicting probabilities, and it tends to create a more balanced boundary between the categories than Logistic Regression. For linearly separable data, Logistic Regression and SVMs give similar performance; if the data is not linearly separable, an SVM with a non-linear kernel performs better than Logistic Regression. For our experiments, we used an SVM with an RBF kernel.
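A corresponding sketch with an RBF-kernel SVM, reusing the train/test split from the logistic regression sketch above (scikit-learn's default C and gamma):

```python
from sklearn.svm import SVC

# RBF-kernel SVM on the standardized feature matrix; defaults C=1.0, gamma="scale".
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
print("SVM (RBF) accuracy:", svm.score(X_test, y_test))
```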
A decision tree is a widely used non-parametric supervised learning technique for classification and regression problems. The primary advantage of decision trees is that they are interpretable and can be easily visualized. They can also effectively handle both numerical and categorical data.
While decision trees are simple and intuitive, they sometimes create complex tree structures that fail to generalize well beyond the training data; hence, decision trees are prone to over-fitting. This can be countered by using an ensemble of diverse learners that have very little correlation with each other. A random forest consists of a large number of individual decision trees, where each tree gives a class prediction and the model's prediction is determined by majority voting. To ensure that the individual trees are not correlated, Random Forest uses the following methods (see the sketch after this list):
- Bagging (bootstrap aggregation): each tree is trained on a random sample of the training data drawn with replacement.
- Feature randomness: each split considers only a random subset of the features rather than all of them.
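The sketch below contrasts a single unconstrained tree with a 200-tree forest (the tree count mirrors our best model reported later); the bootstrap sampling and per-split feature subsampling are made explicit, though they match scikit-learn's defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# A single, fully grown tree tends to overfit the training data...
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# ...while an ensemble of decorrelated trees generalizes better.
forest = RandomForestClassifier(
    n_estimators=200,        # number of trees
    bootstrap=True,          # each tree sees a bootstrap sample of the data
    max_features="sqrt",     # each split considers a random subset of features
    random_state=42,
)
forest.fit(X_train, y_train)

print("decision tree accuracy:", tree.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```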
For evaluating the prediction of project success, we used the accuracy of the classifiers as the evaluation metric. For evaluating the prediction of a project being staff-picked, we used the micro-averaged F1 score, as there is a class imbalance problem for the staff-picked projects.
The above table summarizes the results of the various models with and without the sentiment features. Including the sentiment features improves performance across all models when deciding whether a project will be successful or not. Of all the models, Random Forest with 200 trees gave us the best performance, with an accuracy of 77.25% when the sentiment features were included.
For staff-picked projects, we noticed that the sentiment-related features are not useful for our models. This is evident in the F1 scores we calculated for all our models; in fact, our Random Forest model trained without sentiment features performed the best, with a score of 64.04%. We believe this is because only ~13% of all projects are staff-picked, and ~90% of those are successful projects. Our sentiment features were constructed such that successful projects are associated with the 'positive' label and failed projects with the 'negative' label. Therefore, these sentiment features do not play a key role in deciding whether a project will be staff-picked or not.
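For reference, a sketch of how the staff-pick task and its micro-averaged F1 score might be set up with scikit-learn is shown below; the `staff_pick` column name is an assumption based on the scraped data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Staff-pick target (boolean column assumed in the scraped data).
y_staff = df["staff_pick"].astype(int)

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y_staff, test_size=0.2, stratify=y_staff, random_state=42)

staff_model = RandomForestClassifier(n_estimators=200, random_state=42)
staff_model.fit(Xs_train, ys_train)

print("staff-pick micro-averaged F1:",
      f1_score(ys_test, staff_model.predict(Xs_test), average="micro"))
```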
To help project owners set themselves up for success, we decided to explore the important features and gather useful insights. We plotted the feature importance graph using the best performing random forest model. The top 15 most important features are shown in the plot above. We can clearly see that features like goal, creation-to-launch duration, campaign days, sentiment, and staff pick are important. Upon drawing a similar visualization for the staff-picked model, we found a similar set of features to be important. The following sub-sections explore these important features further.
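A sketch of how such a plot can be produced from the fitted forest and feature frame in the sketches above (impurity-based importances, top 15 features):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rank features by the impurity-based importances of the fitted random forest.
importances = pd.Series(forest.feature_importances_, index=X.columns)
top15 = importances.sort_values(ascending=False).head(15)

top15.sort_values().plot(kind="barh", figsize=(8, 6))
plt.title("Top 15 feature importances (random forest)")
plt.xlabel("Importance")
plt.tight_layout()
plt.show()
```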
The box plot on the left clearly shows that projects with a smaller goal are more likely to succeed. The 75th percentile of the goal amount for successful projects is at $10,000. Let's explore the ideal amount per category to see if we can go a bit higher for some categories of projects.
The per-category box plot again shows that $10,000 is the amount at which most projects (75%) can expect success. This is true for categories like photography, journalism, and fashion. As expected, technology is a deviation from this, where we can set a higher goal amount, with $30,000 being the 75th percentile. We can also go higher than $10,000 for projects belonging to food, design, and games.
However, we should be careful with other categories where the goal amounts should be lower. This is evident from the plots for music, comics, art, crafts, theater, and dance. In fact, for a few of these categories, the ideal maximum is around $5,000.
Recall that we engineered features to find how duration affects the success of a project. One such feature was creation_to_launch_days, which is the difference (in days) between the time owners registered the project on Kickstarter and the time it officially started accepting funds. Contrary to what we believed, the results show that a longer duration between the creation and launch dates is indicative of success. The ideal duration is about 40 days.
Another duration-related feature we created was campaign_days, the duration between the official launch date and the deadline. The data clearly indicates that the ideal campaign length is between 30 and 35 days.
The plot on the left shows the percentage of successful projects in each quarter from 2010 onward. We had to discard data from 2009 and 2019 because we did not have data for the complete year in those cases. The success rate is distributed fairly uniformly across the quarters, although Q1 has been more volatile than the others. Combined with the fact that the ideal creation-to-launch duration is about 40 days and the ideal campaign duration is 30-35 days, this suggests it is better to avoid ending projects in Q1.
Please refer to the Feature Engineering section to understand how we trained our sentiment models to generate the required features from the project name/title and its description (blurb). The two doughnut charts above clearly indicate a strong correlation between a positive sentiment prediction and project success; we conclude that positive sentiment predictions tend to be indicative of successful projects.
When it comes to the length of the title and the project description/blurb, we could not find strong differentiating factors. The distribution of blurb length is quite similar for successful and failed projects, with the majority of projects having blurb lengths between 100 and 135 characters. The title lengths of successful projects were, however, slightly longer than those of failed ones. The ideal title length for a project is between 25 and 55 characters.
The notebooks containing all the code related to this project can be found on our GitHub repository.