Video Script

Hello, welcome to Daniel and Alex’s presentation video for Virginia Tech’s Fall 2017 Machine Learning course. In this project, we explored machine learning techniques for modeling altruistic requests: essentially, the factors that contribute to a successful appeal for aid. Our intention was to develop a practically useful model for predicting public perception of a request.

These requests are diverse and often see mixed results, pulling through or falling flat based on their ability to connect with donors’ hearts and wallets through various means.

Our investigation centered on a dataset titled “Reddit Pizza Requests”, provided by the Stanford Network Analysis Project. This package contains information on about 5,600 posts in the “Random Acts of Pizza” subreddit spanning December 2010 to September 2013.

Each example consists of a post containing a (usually humble) request for free pizza, along with background or reasoning. Any user seeing the post may fulfill the request, resulting in a binary outcome: whether or not the poster received a pizza.

The case study “How to Ask for a Favor” by Althoff and colleagues examines this data, extracting social features and modeling a predictor of success via logistic regression. We used their ideas as a foundation, expanding the binary predictor to multiple models for comparison. Ultimately, though, we aimed for a related but higher-level prediction of public perception, which may represent a more general solution for altruistic requests.

Additionally, Maguire and Michelson conducted a multi-model analysis predicting the popularity of social news posts from HackerNews, which offered features relevant to our needs despite coming from a different online forum domain.

On our pizza dataset, we implemented models for this binary classification as a baseline, allowing us to extract general traits of successes and failures. Our code encompassed modeling with Decision Trees, Naive Bayes, Logistic Regression, and Perceptron learning. Afterward, the same techniques were used as a foundation for multiclass prediction based on the number of user votes and the ratio of likes to dislikes per post. All of the modeling techniques built on algorithms provided through the course.

We converted our dataset from its base JSON format to a dict where each key is a descriptor of a trait and the value is the corresponding array holding that trait for all examples, maintaining order.
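As a rough sketch of that conversion (the file name and field key shown are illustrative of the SNAP release rather than guaranteed to match our exact code):

```python
import json

def load_columnar(path):
    """Load the JSON list of posts and pivot it into a dict of parallel
    lists, one list per field, preserving the original example order."""
    with open(path) as f:
        posts = json.load(f)  # list of per-post dicts

    keys = posts[0].keys()
    return {key: [post.get(key) for post in posts] for key in keys}

# e.g. data = load_columnar("pizza_request_dataset.json")
# data["requester_received_pizza"] is then one binary label per post, in order.
```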

For the Decision Tree and Naive Bayes methods, we established over 100 boolean arrays. These covered the presence of particular keywords in the body or title of the post; various time features such as relative time of day, day of week, month, and proximity to holidays; relative length of the message title and body; and features of the requester’s account such as age and status.
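A minimal sketch of a few of these boolean features is shown below; the specific keyword, hour range, and length threshold are illustrative placeholders rather than our exact choices.

```python
from datetime import datetime, timezone

def keyword_feature(texts, word):
    """Boolean array: does each post's text mention the given keyword?"""
    return [word in text.lower() for text in texts]

def afternoon_or_evening_feature(unix_timestamps):
    """Boolean array: was each post made between noon and midnight (UTC)?"""
    return [12 <= datetime.fromtimestamp(t, tz=timezone.utc).hour <= 23
            for t in unix_timestamps]

def long_body_feature(texts, threshold=400):
    """Boolean array: is each request body longer than the character threshold?"""
    return [len(text) > threshold for text in texts]
```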

For multiclass reception prediction, features including the number of upvotes, downvotes, and comments had to be strictly omitted to prevent a strong dependence between feature and label.

For logistic regression and perceptron modeling, data was taken more directly and represented in floating point format. Rather than relative categories, we could use actual counts, such as the number of words found within predefined narratives - categorizations of the reasoning behind the post, such as ‘money’, ‘family’, or ‘school’.
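For example, a narrative-count feature could be computed along these lines; the word lists here are hypothetical stand-ins for the lexicons we actually used.

```python
# Hypothetical narrative lexicons - the real word lists were hand-chosen.
NARRATIVES = {
    "money":  {"money", "broke", "rent", "bills", "paycheck"},
    "family": {"family", "kids", "wife", "husband", "mom", "dad"},
    "school": {"school", "college", "university", "finals", "semester"},
}

def narrative_counts(text):
    """Return one float per narrative: how many of its words appear in the post."""
    tokens = text.lower().split()
    return [float(sum(tok in words for tok in tokens))
            for words in NARRATIVES.values()]
```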

For multiclass predictions, certain measures were taken to reduce noise and imbalance in the dataset. First, we determined that it would be necessary to prune entries from the set based on a vote threshold - if a post only has a handful of votes or fewer, the positivity ratio may not truly reflect public perception, because, intuitively, receiving 100% upvotes means something different when the total is 1 versus 20.
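A simple version of that pruning step might look like the following; the field names and the threshold of 10 total votes are assumptions for illustration.

```python
def prune_low_vote_posts(upvotes, downvotes, min_votes=10):
    """Return the indices of posts with enough total votes for their
    upvote ratio to meaningfully reflect public perception."""
    return [i for i, (up, down) in enumerate(zip(upvotes, downvotes))
            if up + down >= min_votes]

# e.g. kept = prune_low_vote_posts(data["number_of_upvotes"],
#                                  data["number_of_downvotes"])
```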

Additionally, in the base data only around a quarter of posts find success, which could skew results. To account for this, particularly in our binary classification, we equalize the classes by randomly removing failing entries until the proportions have evened out.
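A minimal sketch of that balancing step, assuming the dataset’s received-pizza flag as the binary label:

```python
import random

def balance_binary(labels, seed=0):
    """Return indices with failing entries randomly dropped until the
    success and failure counts match."""
    rng = random.Random(seed)
    success_idx = [i for i, y in enumerate(labels) if y]
    failure_idx = [i for i, y in enumerate(labels) if not y]
    rng.shuffle(failure_idx)
    kept = success_idx + failure_idx[:len(success_idx)]
    rng.shuffle(kept)
    return kept
```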

Fixing distributions for multiclass was more involved. We tested 3- and 4-class separations based on vote ratio, sorting the fractions and selecting the appropriate break values. Because these vote proportions are often repeated by many entries, we test four permutations of how to divide the classes at these breakpoints and keep the distribution whose class sizes have the lowest standard deviation.
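One way to realize that breakpoint search is sketched below; it tries each combination of sending tied vote ratios to the lower or upper class and keeps the most balanced result, which approximates, rather than reproduces, our exact routine.

```python
import statistics
from itertools import product

def choose_class_labels(ratios, n_classes=3):
    """Assign each vote ratio to a class so that class sizes are as even
    as possible, given that many entries share the same ratio."""
    ordered = sorted(ratios)
    breaks = [ordered[len(ordered) * k // n_classes] for k in range(1, n_classes)]

    best_labels, best_spread = None, float("inf")
    # Each breakpoint's ties can fall into the class below it or above it.
    for tie_up in product((False, True), repeat=len(breaks)):
        labels = [sum(r > b or (r == b and up) for b, up in zip(breaks, tie_up))
                  for r in ratios]
        spread = statistics.pstdev([labels.count(c) for c in range(n_classes)])
        if spread < best_spread:
            best_labels, best_spread = labels, spread
    return best_labels
```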

All tests were conducted primarily with an 80/20 split between training and testing data. After extracting the relevant features and pruning where necessary, the entries were shuffled to ensure an even spread and kept independent.
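A shuffled split along these lines (the fraction and seed are illustrative parameters):

```python
import random

def train_test_split(features, labels, train_frac=0.8, seed=0):
    """Shuffle examples, then split them into training and testing sets."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    cut = int(train_frac * len(order))
    train_idx, test_idx = order[:cut], order[cut:]
    return ([features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx])
```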

Finally, as an aside, we incorporated optional principal component analysis code for the logistic regression modeling. Practically, for our dataset, reducing dimensionality was not vital, but we found that results with a heavily pruned feature set usually lost only a few percentage points of accuracy, so scaling could be viable.
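For reference, a compact SVD-based PCA projection (not necessarily the exact routine in our code) looks like this:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project a float feature matrix onto its top principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # The rows of Vt from the SVD of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T
```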

Our parsing showed a relatively positive correlation between successful requests and features such as account age and previous activity, post timing within the afternoon and evening hours, relatively detailed message bodies, proximity to holidays, and the use of words implying low income, waiting for a paycheck, immediate need, or immediate family.


Now, the results, all representing uniform class sizes:

For our binary prediction of success, all four models came in well above the expected random results, averaging near 70% for every model other than the perceptron. Here, pruning out less popular entries actually tended to lower accuracy across the board, likely because the features of unpopular posts are highly correlated with failing requests.

Things are different for multiclass prediction. Pruning eliminates outliers, so the features become more identifiable with user reception. In both the three- and four-class distributions, pruning tended to enhance the output of each algorithm, often by a significant margin in the best case.

On the whole, our perceptron model fell short in predicting public response. Between the nature of the data and our implementation, results were effectively random.

However, our other three models showed sparks of life. While the results are by no means incredible, there is a clear enough relation between features and reception to help them consistently outperform random classification. Logistic regression appears to have a slight edge in general, but the relative performance of the other models may indicate that this complex problem does not need a complex solution.

That said, this is a complicated issue that deserves a more detailed analysis and much more training data. Our biggest setback was not being able to find another relevant dataset that held sufficient metadata for our project. Based on these results, though, a deeper dive into using machine learning for altruistic request modeling is warranted.


Average results (20 runs each):

  • Binary success prediction, no pruning:
    • Decision Tree: 69.2%
    • Naive Bayes: 71.8%
    • Logistic Regression: 72.2%
    • Perceptron: 63.4%
  • Binary success prediction, optimal pruning:
    • Decision Tree: 68.2%
    • Naive Bayes: 70.1%
    • Logistic Regression: 69.7%
    • Perceptron: 62.2%
  • Three-class reception prediction, no pruning:
    • Decision Tree: 39.4%
    • Naive Bayes: 39.6%
    • Logistic Regression: 42.8%
    • Perceptron: 35.1%
  • Three-class reception prediction, pruning:
    • Decision Tree: 42.5%
    • Naive Bayes: 40.7%
    • Logistic Regression: 49.8%
    • Perceptron: 38.2%
  • Four-class reception prediction, no pruning:
    • Decision Tree: 30.3%
    • Naive Bayes: 30.0%
    • Logistic Regression: 34.6%
    • Perceptron: 27.5%
  • Four-class reception prediction, pruning:
    • Decision Tree: 35.8%
    • Naive Bayes: 33.3%
    • Logistic Regression: 37.8%
    • Perceptron: 29.4%