KDD 2020 Tutorial
Image and Video Understanding for Recommendation and Spam Detection Systems
Ananth Sankar
Aman Gupta
Sirjan Kafle
Di Wen
Sumit Srivastava
Suhit Sinha
Nikita Gupta
Bharat Jain
Dylan Wang
Liang Zhang
Instructors (in-person) - Aman Gupta (LinkedIn), Sirjan Kafle (LinkedIn), Di Wen (LinkedIn), Ananth Sankar (LinkedIn), Sumit Srivastava (LinkedIn)
Tutors - Dylan Wang (LinkedIn), Suhit Sinha (LinkedIn), Nikita Gupta (LinkedIn), Bharat Jain (LinkedIn), Liang Zhang (LinkedIn)
Image and video content has become ubiquitous in domains such as news, entertainment and education. Users typically discover and engage with this content through search and recommendation systems. It is also important to serve users high-quality content by filtering out irrelevant or harmful material. There is therefore a growing need to leverage the rich information in images and videos to power search and recommendation systems. At the same time, the effectiveness and efficiency of these systems have improved rapidly thanks to the availability of large-scale labeled datasets and sophisticated deep learning models.
This tutorial provides an overview of image and video understanding and its practical applications in industry. We focus on state-of-the-art deep learning techniques for image and video understanding, covering tasks such as image classification and segmentation, image-based content retrieval and video classification. We also cover applications of these technologies to large-scale recommendation and low-quality content detection systems. We present concrete examples from several LinkedIn production systems and discuss the associated practical challenges. The tutorial concludes with a discussion of emerging trends and future directions.
Questions? - Contact Aman Gupta at amagupta@linkedin.com
Outline
Introduction (Slides)
Theory
Image understanding (Slides)
Tasks - image classification, object detection, semantic/instance segmentation, visual question answering (VQA), image captioning
Image representations
Before Deep Learning - HOG, SIFT, VLAD
Deep Learning and CNNs
Self-supervised learning
Optimization for CNNs - implicit regularization of SGD, double descent, flooding
Image embeddings
Metric learning for images
Visio-linguistic representations
Video understanding (Slides)
Tasks - video classification, action recognition, temporal topic localization, video captioning
Video embeddings and networks
Before Deep Learning - SIFT, Fisher Vectors, Optical Flow
3D CNNs
Two-stream networks
Improvements to 3D CNNs and two-stream networks
Non-local networks and SlowFast
Self-supervised video embeddings
DSGMM and deep cluster-and-aggregate method
Speech technologies for video understanding
Applications (Slides)
Introduction - feed, ads, search and spam
Multimedia Infrastructure @ LinkedIn
Multimedia Search @ LinkedIn
Common technologies for feed and ads recommendation @ LinkedIn
Video representations used in production
Feed recommendation @ LinkedIn
Ads recommendation @ LinkedIn
Spam and low-quality content detection @ LinkedIn