The Spotify Podcast Dataset, A. Clifton, A. Pappu, S. Reddy, Y. Yu, J. Karlgren, B. Carterette, and R. Jones
Podcasts are a relatively new form of audio media. Episodes appear on a regular cadence, and come in many different formats and levels of formality. They can be formal news journalism or conversational chat; fiction or non-fiction. They are rapidly growing in popularity and yet have been relatively little studied. As an audio format, podcasts are more varied in style and production types than, say, broadcast news, and contain many more genres than typically studied in video research. The medium is therefore a rich domain with many research avenues for the IR and NLP communities. We present the Spotify Podcasts Dataset, a set of approximately 100K podcast episodes comprised of raw audio files along with accompanying ASR transcripts. This represents over 47,000 hours of transcribed audio, and is an order of magnitude larger than previous speech-to-text corpora. [Full text]
Trajectory Based Podcast Recommendation, G. Benton, G. Fazelnia, A. Wang, and B. Carterette
Podcast recommendation is a growing area of research that presents new challenges and opportunities. Individuals interact with podcasts in a way that is distinct from most other media and, most importantly for our purposes, distinct from music consumption. We show that successful and consistent recommendations can be made by viewing users as moving through the podcast library sequentially. Recommendations for future podcasts are then made using the trajectory taken from their sequential behavior. Our experiments provide evidence that user behavior is confined to local trends, and that listening patterns tend to be found over short sequences of similar types of shows. Ultimately, our approach gives a 450% increase in effectiveness over a collaborative filtering baseline. [Full text]
A Baseline Analysis for Podcast Abstractive Summarization, C. Zheng, H. J. Wang, K. Zhang, and L. Fan
The podcast summary, an important factor affecting end-users’ listening decisions, has often been considered a critical feature in podcast recommendation systems, as well as in many downstream applications. Existing abstractive summarization approaches are mainly built on models fine-tuned on professionally edited texts such as CNN and DailyMail news. Unlike news, podcasts are often longer, more colloquial and conversational, and noisier, with content such as commercials and sponsorships, which makes automatic podcast summarization extremely challenging. This paper presents a baseline analysis of podcast summarization using the Spotify Podcast Dataset provided by TREC 2020. It aims to help researchers understand current state-of-the-art pre-trained models and hence build a foundation for creating better models. [Full text]
A review of metadata fields associated with podcast RSS feeds, M. Sharpe
Podcasts are traditionally shared through RSS feeds. As well as pointing to the audio files, RSS gives a creator a way of providing metadata about the podcast shows and episodes. We investigate how certain metadata fields associated with podcasts are currently used and comment on their applicability to recommendations. Specifically, we examine the itunes:type field and suggest that it isn’t being widely used in the expected fashion by many creators. Then we do the same with the season number associated with a podcast (with the same result), and with the category associated with a podcast (with the same result). Finally, we examine the notion that a single podcast show is the same as a single RSS feed. This also turns out to not be strictly true in all cases. In short, the metadata associated with many podcasts isn’t always reflective of the show and should be used with caution. [Full text]
PodSumm: Podcast Audio Summarization, A. Vartakavi and A. Garg
The topical nature combined with a blend of varied multi-media types in podcasts presents a unique challenge to content discovery systems. Podcast episodes are often diverse in the range of topics, presentation style, and the number of speakers. We believe non-textual characteristics like presentation style are significant indicators of subjective user preferences, though difficult to quantify. Therefore, we propose the automated creation of ‘podcast audio summaries’ to aid in content discovery and help listeners to quickly preview the podcast content before investing time in listening to an entire episode. In this paper, we present a method to automatically construct a podcast summary via guidance from the text-domain. Lack of datasets for this task led us to curate an internal dataset, find an effective scheme for data augmentation, and design a protocol for user preference annotation. Our method performs two key steps, namely, audio to text transcription and text summary generation. We perform model fine-tuning with our augmented dataset and perform ablation experiments to test for robustness. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset. We hope these results may inspire future research in this direction. [Full text]
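For readers unfamiliar with the ROUGE-F scores quoted in the PodSumm abstract, the following is a minimal sketch of how a ROUGE-N F-score can be computed from n-gram overlap between a generated summary and a reference. The abstract does not say which ROUGE implementation or tokenization the authors used, so the function names and the lowercased whitespace tokenization here are assumptions made for illustration. ROUGE-L, which is based on the longest common subsequence, is not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two strings.

    Assumption: tokens are lowercased and split on whitespace; real
    toolkits usually apply more careful tokenization and stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram,
    # i.e. the clipped number of matching n-grams.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Calling `rouge_n_f1(summary, reference, n=1)` and `rouge_n_f1(summary, reference, n=2)` gives values comparable in kind to the first two numbers of a ROUGE-F(1/2/L) triple, although scores from this simplified sketch should not be expected to match those of a standard toolkit exactly.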