Milestone 2

Block Diagram/Flowchart

Single Document:

Gantt Chart (LINKED)

MNRVA Gantt Chart

Current State of the MNRVA Algorithm

Input Article:

https://www.cnn.com/2018/12/03/business/trade-war-business-china/index.html

Output Summary:

The US-China ceasefire is not a breakthrough that ends the trade war entirely.

The trade war between the United States and China has been a major concern for executives and investors alike.

And President Donald Trump had threatened to impose tariffs on additional $267 billion of Chinese goods.

The US-China ceasefire "creates an off-ramp for de-escalation of the trade war," Chris Krueger, analyst at Cowen Washington Research Group, wrote to clients on Monday.

"We are hopeful this will lead to both important reforms in China and a de-escalation in trade tensions between the US and China," the Business Roundtable wrote in a statement on Sunday.


Work Breakdown Structure



Business Objectives

User Requirements:

  • Reduce time spent on platform obtaining necessary info
  • Ensure accuracy of provided info

FinTech Studios Requirements:

  • Add a unique feature to the platform to improve marketability
  • Increase and maintain number of platform users

Combined Measurable Success Criteria:

  • Decreased time spent by users obtaining information
  • Preserved information accuracy (measured by identical sentiment analysis results)
  • Improved platform visibility to potential customers (increased media attention, company website traffic)
  • Increased user satisfaction (measured by renewed subscriptions)
  • Increased account creation/revenue for FinTech Studios


Dataset and Algorithm Performance Metrics

Dataset Selection

  • Finding a dataset for algorithm performance evaluation has been challenging
  • Need an extractive summarization dataset whose reference summaries are longer than one sentence

Performance Metrics

  • Finding a way to measure summary performance with metrics has also been a challenge
  • ROUGE - https://github.com/kavgan/ROUGE-2.0/blob/master/docs/usage-documentation.md
    • How it Works
      • Recall
        • Number of words shared by the system and reference summaries / total number of words in the reference summary
      • Precision
        • Number of words shared by the system and reference summaries / total number of words in the system summary
      • Issue: ROUGE scores are only as reliable as the reference summaries in the evaluation dataset
  • Eyeball Approach
    • As we iterate on the algorithm design, we will re-summarize a fixed test group of ~20 articles and track how the summaries change with each iteration
  • Sentiment Approach
    • Compare the sentiment of the original article to the sentiment of the output summary to ensure the summary stays consistent with the article's context
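
The ROUGE recall and precision definitions above can be illustrated with a minimal sketch. This is not the linked ROUGE-2.0 toolkit and not part of the MNRVA code; the function names `tokenize` and `rouge_n` and the simple regex tokenizer are assumptions made for illustration, and overlapping words are counted with clipping (a word is matched at most as many times as it appears in both summaries).

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens are good enough for a rough check.
    return re.findall(r"[a-z0-9']+", text.lower())


def rouge_n(system, reference, n=1):
    """Return (recall, precision) for n-gram overlap between two summaries."""

    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    sys_counts = ngrams(tokenize(system))
    ref_counts = ngrams(tokenize(reference))

    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((sys_counts & ref_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(sys_counts.values()), 1)
    return recall, precision


if __name__ == "__main__":
    recall, precision = rouge_n("the cat sat", "the cat sat on the mat")
    print(recall, precision)
```

In this toy case every word of the short system summary appears in the reference, so precision is high while recall is lower. This is why the two numbers are usually reported together.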

Next Steps

Technical Plans - Winter Break and Moving Into Spring '19

  • Algorithm will remain statistical for the foreseeable future
  • Improving the Current Approach
    • Use recursion to add multi-document functionality to the current algorithm, together with a method to eliminate redundant sentences from the output summary
    • Incorporate Word Vectors - Word2Vec
  • Future Improvements for Adding Multi-Document Summarization
    • Clustering Approach
      • Tokenize sentences from each document and perform similarity analysis
      • Group each sentence by similarity and rank importance similarly to our current approach
      • Reduce redundancy in the multi-document output summary by selecting the most important sentences that are also at least a certain distance away from each other
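
The redundancy-reduction step in the clustering approach above could be sketched as a greedy selection: rank sentences by importance, then keep a sentence only if it is sufficiently dissimilar from those already chosen. Everything specific here is an assumption for illustration, not the MNRVA implementation: the word-frequency importance score, bag-of-words cosine similarity as the distance measure, and the names `select_sentences`, `k`, and `max_similarity`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_sentences(sentences, k=3, max_similarity=0.5):
    """Pick up to k important sentences that are not too similar to each other."""
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Hypothetical importance score: average corpus frequency of a sentence's words.
    corpus = Counter()
    for vec in vectors:
        corpus.update(vec)
    scores = [
        sum(corpus[w] * c for w, c in vec.items()) / max(sum(vec.values()), 1)
        for vec in vectors
    ]

    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        # Keep the sentence only if it is far enough from every chosen one.
        if all(cosine(vectors[i], vectors[j]) <= max_similarity for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        "Tariffs on Chinese goods were delayed.",
        "Tariffs on Chinese goods were delayed for now.",
        "Investors welcomed the news.",
    ]
    print(select_sentences(docs, k=2))
```

With Word2Vec incorporated, the bag-of-words vectors could be swapped for averaged word embeddings without changing the selection loop.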