Computer Science
Using OpenAI’s GPT-3 to Summarize and Simplify Sets of News Articles for Efficiency and Educational Use
Bryanna Huang
Multi-document summarization is the task of condensing a set of documents on the same topic into a subset that contains the most important information. This can be achieved with deep learning techniques from Natural Language Processing (NLP), the subcategory of artificial intelligence (AI) concerned with teaching computers to process, understand, and produce text much as humans do. Recently, OpenAI, an artificial intelligence research laboratory, developed an algorithm called the Generative Pre-trained Transformer Version 3 (GPT-3), whose main function is to generate text from a prompt. GPT-3 is a pre-trained model: it is trained on a very large general dataset so that it can later be applied to more specific tasks. It is also a transformer, an architecture that allows the model to capture long-range correlations between words, making it especially suitable for long texts such as news articles. Many people today are pressed for time and cannot read long articles in full, which is where text summarization comes into play. Text summarization has been attempted with various methods in the past, but many lacked the ability to capture the deeper meaning of words and phrases the way a human summary would. Because of its deep learning based architecture and the large amount of data it was trained on, GPT-3 can recognize a word's meaning in context. Since GPT-3 is still a relatively new development in NLP, it has not yet been specifically applied to summarizing news articles. In this proposed research, GPT-3 will summarize news articles, and those summaries will be compared against professionally written human summaries with the goal of achieving high semantic similarity.
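The proposed evaluation compares machine-generated summaries with human-written ones by semantic similarity. As a rough illustration of that comparison, the sketch below computes a cosine similarity between two summaries. It is a simplified stand-in: it uses bag-of-words vectors rather than the sentence embeddings a genuine semantic-similarity evaluation would require, and the two summary strings are invented examples, not actual GPT-3 output.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts.

    Returns a value in [0, 1]: 1.0 for identical word distributions,
    0.0 for texts sharing no words.
    """
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical example: a model-generated summary vs. a human-written one.
model_summary = "The city council approved funding for a new public library."
human_summary = "Council members voted to fund construction of a public library."
score = cosine_similarity(model_summary, human_summary)
print(f"similarity: {score:.3f}")
```

A full evaluation would replace the bag-of-words vectors with embeddings from a model such as Sentence-BERT, so that paraphrases with little word overlap still score as similar.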