D7 Midterm - Milestone #1

Algorithm Research/Decision Process

Algorithm Choice

  • While neural networks are the state of the art for text summarization, Kevin recommends we first approach the problem with a simpler, statistics-based method
  • Chose Text Rank Algorithm
  • Implement by end of Fall '18 Semester

Text Rank - Graph Based Algorithm

  • Provides a relevancy score for each sentence in an article
  • Takes the highest-scoring sentences and orders them as they appear in the article to form the summary
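
The selection step can be sketched as follows. This is a minimal illustration, assuming the relevancy scores have already been computed; the function name build_summary and the parameter k (number of sentences to keep) are hypothetical and not part of our implementation yet.

```python
def build_summary(sentences, scores, k=3):
    """Return the k highest-scoring sentences in their original article order.

    `sentences` and `scores` are parallel lists; `k` is an assumed parameter
    for the summary length.
    """
    # Rank sentence indices by score, highest first (ties keep article order
    # because Python's sort is stable).
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top k indices, then sort them so the summary reads in the
    # same order as the article.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Sorting the chosen indices rather than the sentences themselves is what keeps the summary in article order even when the top-scoring sentence appears late in the text.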

Text Rank - How It Works

  • Based on Google's PageRank algorithm
  • Each sentence in an article is a node in the graph
  • Each node has a relevancy weight based on keywords, sentence placement within the article, etc.
  • Starting from a random node, repeatedly move to the next most similar neighbor node, incrementing a node's relevancy counter each time it is visited
  • Once the traversals are complete, output the sentences with the highest relevancy scores
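
The traversal above can be sketched as a random walk with visit counters. This is one possible reading under stated assumptions, not a definitive implementation: similarity is taken to be the Jaccard overlap of the two sentences' word sets, the next node is drawn at random with probability proportional to similarity (always jumping to the single most similar neighbor would trap the walk between two sentences), and the walk occasionally restarts at a random node, as PageRank does. The names similarity and random_walk_scores and the parameters steps, restart, and seed are all hypothetical. Published TextRank computes the same kind of score by iterating the PageRank update until it converges rather than by simulating a walk.

```python
import random
import re


def tokenize(sentence):
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap of the two sentences' word sets (an assumed measure)."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def random_walk_scores(sentences, steps=10000, restart=0.15, seed=0):
    """Estimate a relevancy score per sentence from random-walk visit counts."""
    n = len(sentences)
    if n == 0:
        return []
    rng = random.Random(seed)  # seeded so repeated runs give the same scores

    # Edge weights: similarity between every pair of distinct sentences.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    visits = [0] * n
    current = rng.randrange(n)  # random starting node
    for _ in range(steps):
        visits[current] += 1  # increment the visited node's counter
        row = weights[current]
        if rng.random() < restart or sum(row) == 0:
            # Restart at a random node (also the way out of isolated nodes).
            current = rng.randrange(n)
        else:
            # Move to a neighbor, favoring the more similar ones.
            current = rng.choices(range(n), weights=row, k=1)[0]

    # Normalize counts so the scores sum to 1.
    return [count / steps for count in visits]
```

A sentence that shares no words with the rest of the article is reached only through restarts, so its score stays well below those of sentences that overlap with their neighbors.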