This section presents the graphs used to explore the dataset, covering both the distributions of individual variables and the relationships between them.
Fig 1: Histogram of views vs. frequency (number of videos):
This histogram shows how view counts are distributed across the dataset, that is, how many videos fall into each view-count range. Although the distribution is broadly even, a substantial proportion of videos cluster around 600k views. This concentration suggests that roughly 600k views may be a common level of popularity for videos in this dataset.
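A histogram such as Fig 1 comes down to counting how many videos fall into each fixed-width range of view counts. The sketch below shows that binning step in plain Python. The `histogram` helper, the 100k bin width, and the view counts are hypothetical: they illustrate the idea and are not the code or data behind Fig 1.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical view counts, not taken from the dataset.
views = [120_000, 580_000, 610_000, 640_000, 590_000, 1_250_000, 605_000]

bins = histogram(views, bin_width=100_000)
for start, n in bins.items():
    # One row per non-empty bin, drawn as a text bar of '#' characters.
    print(f"{start:>9,} to {start + 100_000:>9,}: {'#' * n}")

# The modal bin is the range that contains the most videos.
modal_start = max(bins, key=bins.get)
print("Most common range starts at", modal_start)  # 600000
```

The bin width matters: a cluster such as the one near 600k views only stands out if the bins are narrow enough to separate it from neighboring ranges.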
Fig 2: Histogram of video duration vs. frequency (number of videos):
Video durations are also fairly evenly distributed, reflecting the variety of content lengths in the dataset. However, a significant proportion of videos cluster around 750 seconds (12.5 minutes), which suggests that creators commonly favor this length. The histogram alone cannot show whether this duration also affects viewer engagement and retention, but understanding why it is so common could help content creators and platform managers decide on video lengths.
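If the durations come from the YouTube Data API (an assumption, since the section does not state the data source), they arrive as ISO 8601 strings such as `PT12M30S` and must be converted to seconds before they can be binned. A minimal sketch of that conversion, using a hypothetical `duration_seconds` helper that only handles day, hour, minute, and whole-second components:

```python
import re

# Matches durations like "PT12M30S" or "P1DT2H"; weeks, years, months and
# fractional seconds do not match and are rejected with ValueError.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")

def duration_seconds(iso_duration: str) -> int:
    """Convert an ISO 8601 duration string to a whole number of seconds."""
    match = _DURATION.match(iso_duration)
    if match is None:
        raise ValueError(f"unsupported duration: {iso_duration!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

print(duration_seconds("PT12M30S"))  # 750
print(duration_seconds("PT1H5M"))    # 3900
```

For example, `PT12M30S` converts to 750 seconds, the length around which videos cluster in Fig 2. If the dataset already stores durations in seconds, this step is unnecessary.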
Fig 3: Histogram of views vs. frequency (number of videos), with a line plot showing the trend:
This histogram shows the same distribution as Fig 1. The difference is the overlaid line plot, which traces the overall shape of the view-count distribution and makes its trend easier to read than the bars alone.
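The section does not say how the line in Fig 3 was produced. One simple option is a frequency polygon, which connects the tops of the histogram bars at their midpoints; a smoothed kernel density estimate is another common choice. A sketch of the frequency-polygon points, with a hypothetical `frequency_polygon` helper and made-up view counts:

```python
from collections import Counter

def frequency_polygon(values, bin_width):
    """Return (bin midpoint, count) points for a line drawn over a histogram.

    Empty bins between the smallest and largest occupied bins are included
    with a count of 0, so the line drops to zero across gaps in the data.
    """
    counts = Counter(v // bin_width for v in values)  # bin index -> count
    first, last = min(counts), max(counts)
    return [((i + 0.5) * bin_width, counts[i]) for i in range(first, last + 1)]

# Hypothetical view counts, not taken from the dataset.
views = [120_000, 580_000, 610_000, 640_000, 590_000, 605_000]
points = frequency_polygon(views, bin_width=100_000)
print(points[0], points[-1])  # (150000.0, 1) (650000.0, 3)
```

Keeping the empty bins is a deliberate choice: skipping them would draw a straight segment across the gap and hide the fact that no videos fall in that range.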
Fig 4: Plot showing number of videos published over time:
This plot shows how video publishing has changed over time. The number of videos published each year rises consistently, with a sharp increase in recent years that peaks in 2022, the year with the most uploads in the dataset. This growth reflects the rapid expansion of online video sharing.
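Fig 4 needs one count per calendar year. The sketch below assumes publication times are UTC timestamps in the form `2022-07-30T12:00:00Z`, as returned in the YouTube Data API's `publishedAt` field; both that format and the sample timestamps are assumptions, and `uploads_per_year` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

def uploads_per_year(timestamps):
    """Count uploads per calendar year from UTC timestamps like 2022-07-30T12:00:00Z."""
    years = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").year for ts in timestamps
    )
    return dict(sorted(years.items()))

# Hypothetical publication timestamps, not taken from the dataset.
published_at = [
    "2019-05-02T10:00:00Z",
    "2021-11-20T08:15:00Z",
    "2022-01-09T17:45:00Z",
    "2022-07-30T12:00:00Z",
    "2022-12-31T23:59:59Z",
]

per_year = uploads_per_year(published_at)
print(per_year)  # {2019: 1, 2021: 1, 2022: 3}
```

Years without uploads (2020 here) do not appear in the result, so a plot should fill them with zeros rather than connect 2019 directly to 2021.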
Fig 5: Plot showing the cumulative number of videos published over time:
This plot shows the running total of videos published up to each point in time. Because each year's uploads are added to all earlier ones, the cumulative curve can only rise, and its slope reflects how many videos were published in that period. The curve steepens markedly in recent years, consistent with the surge in uploads that peaks in 2022 in Fig 4.
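The cumulative value for a year is the sum of the yearly counts up to and including that year. A sketch using `itertools.accumulate`, with a hypothetical `cumulative_by_year` helper and made-up yearly counts:

```python
from itertools import accumulate

def cumulative_by_year(per_year):
    """Running total of uploads, filling years with no uploads with 0."""
    years = range(min(per_year), max(per_year) + 1)
    yearly = [per_year.get(year, 0) for year in years]
    return dict(zip(years, accumulate(yearly)))

# Hypothetical yearly counts, not taken from the dataset.
cumulative = cumulative_by_year({2019: 1, 2021: 1, 2022: 3})
print(cumulative)  # {2019: 1, 2020: 1, 2021: 2, 2022: 5}
```

Because every yearly count is non-negative, the running total never decreases, which is why the cumulative plot shows growth as a change in steepness rather than as a peak.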
Fig 6: Pie chart showing the percentage of videos by live-broadcast status:
This chart breaks down videos by broadcast status: not live, live, and upcoming. The vast majority of videos, about 99.3%, are not live broadcasts. A small fraction, about 0.6%, are live broadcasts, and a negligible 0.1% are scheduled as upcoming live broadcasts. Although creators use all three delivery modes, live streaming plays only a minor role in this dataset.
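The pie chart's percentages are each status's count divided by the total number of videos. The sketch below assumes the status values are `none`, `live`, and `upcoming`, as in the YouTube Data API's `liveBroadcastContent` field; the counts are hypothetical, chosen only to reproduce the shares reported above, and `share_by_status` is a hypothetical helper.

```python
from collections import Counter

def share_by_status(statuses):
    """Percentage of videos per broadcast status, rounded to one decimal place."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.most_common()}

# Hypothetical statuses for 1,000 videos, chosen to match the reported shares.
statuses = ["none"] * 993 + ["live"] * 6 + ["upcoming"] * 1
shares = share_by_status(statuses)
print(shares)  # {'none': 99.3, 'live': 0.6, 'upcoming': 0.1}
```

A bar chart such as Fig 7 plots the underlying counts (`Counter(statuses)`) instead of these percentages, which is why the two figures tell the same story.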
Fig 7: Bar chart showing the number of videos by live-broadcast status:
The bar chart agrees with the pie chart: the vast majority of videos are not live broadcasts, a small but visible fraction are live broadcasts, and a negligible number are scheduled to be streamed live in the future. This confirms that pre-recorded content dominates, while live broadcasts are comparatively rare.
Fig 8: Word Cloud:
Fig 8.1: Word cloud for channelTitle:
Here, a word cloud generated from the "channelTitle" column shows the words most commonly used in the names of the channels that published the videos. Since this dataset supports a music recommendation system, it is unsurprising that "Music" is the most frequent word in channel names, followed by "Genre" and "Collection." The channel name "VanguardMaster 1" also stands out as a distinctive entry. The exact frequencies of these words are shown in the accompanying bar chart below.
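A word cloud scales each word by how often it occurs, so both Fig 8.1 and its bar chart rest on a table of word counts. The sketch below computes such counts. The tokenizing regular expression, the `word_frequencies` helper, and the sample channel titles are hypothetical, and the section does not say which word-cloud tool produced Fig 8.

```python
import re
from collections import Counter

def word_frequencies(texts, stopwords=frozenset()):
    """Count lower-cased word occurrences across texts, skipping stopwords."""
    return Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z0-9']+", text.lower())
        if word not in stopwords
    )

# Hypothetical channel titles, not taken from the dataset.
channel_titles = ["Genre Music", "Music Collection", "Relax Music", "Genre Collection"]
freq = word_frequencies(channel_titles)
print(freq.most_common(1))  # [('music', 3)]
```

Lower-casing before counting merges "Music" and "music" into one entry, which is usually what a frequency chart should show.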
Fig 8.2: Word cloud for video titles:
This word cloud, built from the video titles, shows which words recur most often. Given the music recommendation context, it is unsurprising that "Genres" is the most frequent term, closely followed by "Music." "Top" and "Best" also appear prominently, likely because titles use them to advertise the best music from a particular period. The distribution of these words is shown in the corresponding bar chart below.
Fig 8.3: Word cloud for video descriptions:
This word cloud shows the distribution of words in the video descriptions. Terms such as "genres" and "video music" are the most prominent, reflecting how descriptions summarize the content. "Please" and "Subscribe" also appear frequently, reflecting the common practice of channels asking viewers to subscribe so that they can reach a wider audience, and "enjoy" is used to wish viewers a pleasant viewing experience. The frequencies of these words are shown in the accompanying bar chart below.
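Because "video music" appears as a single entry, the description word cloud likely counts frequent two-word phrases (bigrams) as well as single words. A sketch of bigram counting with a small stopword list; the list, the `bigram_frequencies` helper, and the sample descriptions are all hypothetical:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = frozenset({"the", "and", "to", "a", "of", "this", "for", "our"})

def bigram_frequencies(texts):
    """Count adjacent word pairs in each text after removing stopwords."""
    counts = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical descriptions, not taken from the dataset.
descriptions = [
    "Enjoy this video music collection of all genres.",
    "Please subscribe for more video music!",
]
bigrams = bigram_frequencies(descriptions)
print(bigrams.most_common(1))  # [(('video', 'music'), 2)]
```

Removing stopwords before pairing means that words separated only by a stopword count as adjacent (here, "collection all" from "collection of all"); word-cloud libraries may detect phrases differently.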
Fig 9: Correlation heatmap of views, likes, dislikes, and video duration:
The correlation heatmap shows the pairwise correlations among video views, likes, dislikes, and duration. The most noteworthy result is a substantial negative correlation between video duration and dislikes: longer videos tend to receive fewer dislikes. Because this is a correlation, it does not show that length itself reduces dislikes, but it may indicate that audiences respond favorably to longer content.
There is also a notable negative correlation between views and dislikes: in this dataset, videos with more views tend to have fewer dislikes. This may reflect higher audience satisfaction with widely viewed videos, although the correlation alone cannot establish that.
Likes, in turn, are negatively correlated with video duration: longer videos tend to receive fewer likes. This suggests that engagement measured by likes may decline as videos get longer. Taken together with the relationship between duration and dislikes, longer videos appear to receive fewer reactions of either kind.
Lastly we see a negative correlation between video duration and
In summary, this heatmap shows how video duration, views, likes, and dislikes relate to one another, and these relationships offer a starting point for understanding viewer engagement and content strategy in online video.
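A heatmap like Fig 9 colors a matrix of pairwise correlation coefficients. The section does not say which coefficient or tool was used. The sketch below assumes Pearson's r (the default of pandas' `DataFrame.corr()`) and computes it directly from its definition, r = cov(x, y) / sqrt(var(x) · var(y)), so the formula is visible. The `pearson` and `correlation_matrix` helpers and the five-video sample are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Symmetric dict mapping (name_a, name_b) to r, with 1.0 on the diagonal."""
    matrix = {(name, name): 1.0 for name in columns}
    for a, b in combinations(columns, 2):
        matrix[(a, b)] = matrix[(b, a)] = pearson(columns[a], columns[b])
    return matrix

# Hypothetical values for five videos, not taken from the dataset.
videos = {
    "views":    [600_000, 550_000, 720_000, 480_000, 610_000],
    "likes":    [9_000, 8_200, 7_500, 6_900, 8_800],
    "dislikes": [150, 170, 90, 210, 120],
    "duration": [640, 700, 900, 520, 760],
}

matrix = correlation_matrix(videos)
for (a, b), r in sorted(matrix.items()):
    if a < b:  # print each off-diagonal pair once
        print(f"{a:>8} vs {b:<8} r = {r:+.2f}")
```

Pearson's r only measures linear association, so a strong negative value, such as the one between duration and dislikes, describes a tendency in the data rather than a cause.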