Animated GIFs

A real-time animated GIF suggestion system for messaging

Abstract

Texting is the dominant mode of online communication, especially in the time of COVID-19. While texting provides an easy, low-cost way to connect with one another, it also loses a tremendous amount of non-verbal information, which is estimated to make up over sixty percent of our communication content. We present a multi-modal GIF suggestion system that harnesses GIFs as an effective medium for conveying non-verbal information by suggesting GIFs based on single text messages. We show that sentiment features are more effective than low-level visual features at predicting GIFs, and that this multi-modal GIF suggestion system works with decent performance and great flexibility. We envision that the insights and methods we learned from building this system can inspire future work on using animated GIFs to improve the online communication experience.

Video description of project

Final Results

Demo

Visual evaluation results

We used bootstrapping to train and test the predicted results 20 times for each regression method and each feature. Figures 4 and 5 show the distributions of mean squared error and R-squared for OLS regression, SVM regression with a polynomial kernel, and elastic net regression with the optimal L1 and L2 parameters selected, all with text tokenized by BERT and features standardized with mean normalization. Table 3 shows which feature each number on the x-axis represents.
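To make this evaluation procedure concrete, the sketch below shows one way the bootstrapped comparison could be run with scikit-learn. The variable names, the sentence-transformers model used as a stand-in BERT encoder, the ElasticNetCV parameter grid, and the out-of-bag test split are all assumptions made for illustration, not taken from our actual codebase.

```python
# Hypothetical sketch of the bootstrapped regression comparison described above.
# `texts` (list of messages) and `y` (one target feature value per message) are
# assumed to be loaded already; all names here are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, ElasticNetCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sentence_transformers import SentenceTransformer  # stand-in BERT text encoder

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = np.asarray(encoder.encode(texts))          # one embedding vector per message
y = np.asarray(y)

models = {
    "OLS": LinearRegression(),
    "SVR (poly)": SVR(kernel="poly"),
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5),  # tunes L1/L2 mix
}

results = {name: {"mse": [], "r2": []} for name in models}
for b in range(20):                            # 20 bootstrap rounds
    rng = np.random.default_rng(b)
    train_idx = rng.choice(len(X), size=len(X), replace=True)  # resample with replacement
    test_idx = np.setdiff1d(np.arange(len(X)), train_idx)      # out-of-bag rows as test set
    scaler = StandardScaler().fit(X[train_idx])                 # standardize (mean normalization)
    X_train, X_test = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
    for name, model in models.items():
        pred = model.fit(X_train, y[train_idx]).predict(X_test)
        results[name]["mse"].append(mean_squared_error(y[test_idx], pred))
        results[name]["r2"].append(r2_score(y[test_idx], pred))
```

Collecting the 20 per-round MSE and R-squared values for each model is what produces the distributions plotted in Figures 4 and 5.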

As reflected in the figures, all three regression methods showed consistent results. The MSE for every feature and every regression method is centered around 1.2. To our surprise, the R-squared values for all features and all regression methods are centered near 0, which means the text vectors, used as predictors, explain very little of the variability in the feature values.

There are a few additional observations to be made from the initial results. First, ordinary least squares yields a much wider distribution of MSE and R-squared for the features Brightness, Contrast, Entropy, Number of Faces, and FPS/Duration, while elastic net does not produce results with such a wide range. While we do not know the reason behind this, we suspect it may be caused by the OLS implementation in the scikit-learn library rather than the OLS method itself. Second, SVM regression took significantly longer to train than OLS and elastic net. Also, elastic net is the only method that occasionally gives an R-squared above 0.

Overall pipeline evaluation results

25 users participated in our study in total. The average score for the GIFs suggested by each model combination for each conversation is shown in Figure 12; the scores ranged from 1.78 to 3.58. We also plotted the distribution of the average scores for the four models in Figure 13. Since some models did not suggest any GIFs for some conversations, there are missing data: the boxplot on the top shows the raw data without any imputation, and the boxplot on the bottom shows the complete data with missing values filled with the mean of the column. The top boxplot shows that the median ratings are all around 2, with no significant difference between the four models, but the bottom boxplot shows a very different result: Model 1 has the best average score of 3 while Model 2 has the worst average score of 2.5. Table 9 shows the average scores across all conversations for the four models and the confidence intervals of the average scores.
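For reference, a minimal sketch of the column-mean imputation and the per-model 95% confidence intervals (as reported in Table 9) is shown below. The ratings.csv file name and the one-column-per-model layout are assumptions made for illustration.

```python
# Hypothetical sketch: mean imputation of missing ratings and 95% CIs per model.
# Assumes ratings.csv has one row per conversation and one column per model
# (M1..M4); a missing entry means that model suggested no GIF for that conversation.
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

raw = pd.read_csv("ratings.csv")     # illustrative file name
full = raw.fillna(raw.mean())        # fill each column's gaps with its column mean

for model in full.columns:
    scores = full[model]
    ci = stats.t.interval(0.95, df=len(scores) - 1,
                          loc=scores.mean(), scale=stats.sem(scores))
    print(f"{model}: mean={scores.mean():.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")

# Figure 13-style boxplots: raw scores (top) vs. mean-imputed scores (bottom)
fig, (top, bottom) = plt.subplots(2, 1, sharey=True)
raw.boxplot(ax=top)
full.boxplot(ax=bottom)
plt.show()
```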


Lastly, we ran t-tests on the differences between the average scores of the different models, with null hypothesis H0: mu_i = mu_j for i, j in {1, 2, 3, 4}. We used the full data with missing values filled by mean imputation, because the raw data only have complete rows for five conversations, while the full data allow us to compare across all 13 conversations.
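A minimal sketch of these pairwise tests, reusing the mean-imputed full DataFrame from the previous snippet, could look like the following; the choice of a paired t-test (since every model is rated on the same 13 conversations) is our assumption about the setup.

```python
# Hypothetical sketch of the pairwise t-tests between the four models,
# reusing the mean-imputed `full` DataFrame from the previous snippet.
from itertools import combinations
from scipy import stats

for m_i, m_j in combinations(full.columns, 2):
    t, p = stats.ttest_rel(full[m_i], full[m_j])   # paired: same 13 conversations
    print(f"{m_i} vs. {m_j}: t = {t:.2f}, p = {p:.3f}")
```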

The t-tests show significant differences in average scores between M1 vs. M2, M1 vs. M4, M2 vs. M3, and M3 vs. M4. Combined with the 95% confidence intervals of the average scores, we conclude that Model 1, which uses the Sentiment and Emotion features, suggests the GIFs that users rated as fitting the conversation best, while Model 2 gives the worst-fitting GIFs.

In conclusion, our overall pipeline evaluation shows that the system works better than randomly suggesting GIFs and that different combinations of features can have a significant impact on suggestion quality. In the future, we can improve the evaluation methodology and use higher-level visual features to improve our pipeline.


More Information

Questions?

Contact boyuan@vt.edu, srg4rv@vt.edu, or joannafg@vt.edu for more information about the project.