Milestone 2
Block Diagram/Flowchart
Single Document:
Gantt Chart (LINKED)
Current State of the MNRVA Algorithm
Input Article:
https://www.cnn.com/2018/12/03/business/trade-war-business-china/index.html
Output Summary:
The US-China ceasefire is not a breakthrough that ends the trade war entirely.
The trade war between the United States and China has been a major concern for executives and investors alike.
And President Donald Trump had threatened to impose tariffs on additional $267 billion of Chinese goods.
The US-China ceasefire "creates an off-ramp for de-escalation of the trade war," Chris Krueger, analyst at Cowen Washington Research Group, wrote to clients on Monday.
"We are hopeful this will lead to both important reforms in China and a de-escalation in trade tensions between the US and China," the Business Roundtable wrote in a statement on Sunday.
Work Breakdown Structure
Business Objectives
User Requirements:
- Reduce time spent on platform obtaining necessary info
- Ensure accuracy of provided info
FinTech Studios Requirements:
- Add a unique feature to the platform to improve marketability
- Increase and maintain number of platform users
Combined Measurable Success Criteria:
- Decreased time spent by users obtaining information
- Preserved information accuracy (measured by identical sentiment analysis results)
- Improved platform visibility to potential customers (increased media attention, company website traffic)
- Increased user satisfaction (measured by renewed subscriptions)
- Increased account creation/revenue for FinTech Studios
Dataset and Algorithm Performance Metrics
Dataset Selection
- Finding a dataset for algorithm performance evaluation has been challenging
- Need extractive summary data with output summaries greater than one sentence long
- Cornell Newsroom Dataset - https://summari.es/
- DUC Document Summarization Dataset - https://duc.nist.gov/data.html
Performance Metrics
- Finding a way to measure summary performance with metrics has also been a challenge
- ROUGE - https://github.com/kavgan/ROUGE-2.0/blob/master/docs/usage-documentation.md
- How it Works
- Recall
- Number of words that overlap between the system and reference summaries / total number of words in the reference summary
- Precision
- Number of words that overlap between the system and reference summaries / total number of words in the system summary
- Issue: ROUGE is only as good as the dataset used for evaluation
- Eyeball Approach
- As we continue to iterate on the algorithm design, we will re-summarize a fixed test group of ~20 articles and track how the summaries change with each iteration
- Sentiment Approach
- Compare the sentiment of the original article to the sentiment of the output summary to check that the summary stays consistent with the article's context
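The ROUGE recall and precision definitions above can be sketched in a few lines of Python. This is a simplified illustration only, not the implementation used by the linked ROUGE-2.0 tool: it assumes lowercase whitespace tokenization with no stemming or stop-word removal, and the function name `rouge_n` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_n(system_summary, reference_summary, n=1):
    """Return (recall, precision) from n-gram overlap between two summaries.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    def ngram_counts(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    system_counts = ngram_counts(system_summary)
    reference_counts = ngram_counts(reference_summary)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated word is only credited as often as it appears in both.
    overlap = sum((system_counts & reference_counts).values())

    recall = overlap / max(sum(reference_counts.values()), 1)
    precision = overlap / max(sum(system_counts.values()), 1)
    return recall, precision


# Hypothetical example: 4 shared words, 7 reference words, 6 system words.
recall, precision = rouge_n(
    "the trade war is not over",
    "the trade war is a major concern",
)
```

Because both scores are computed against a reference summary, they are only meaningful when the dataset's reference summaries are themselves good extractive summaries, which is the issue noted above.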
Next Steps
Technical Plans - Winter Break and Moving Into Spring '19
- Algorithm will remain statistical for the foreseeable future
- Improving the Current Approach
- Use recursion to add multi-document functionality to the current algorithm, along with a method to eliminate redundant sentences in the output summary
- Incorporate Word Vectors - Word2Vec
- Future Improvements for Adding Multi Document Summarization
- Clustering Approach
- Tokenize sentences from each document and perform similarity analysis
- Group each sentence by similarity and rank importance similarly to our current approach
- Reduce redundancy in the output multi-document summary by selecting the most important sentences that are also at least a certain distance away from each other
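The redundancy-reduction step described above could look roughly like the following sketch. It is a greedy selection under stated assumptions, not the planned MNRVA implementation: importance scores are taken as given inputs, similarity is plain bag-of-words cosine similarity (word vectors such as Word2Vec could be swapped in), and the names `select_non_redundant`, `k`, and `max_similarity` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Bag-of-words cosine similarity between two sentences (0.0 to 1.0)."""
    counts_a = Counter(sentence_a.lower().split())
    counts_b = Counter(sentence_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_non_redundant(sentences, scores, k=3, max_similarity=0.5):
    """Pick up to k sentences in descending score order, skipping any
    sentence that is too similar to one already selected."""
    ranked = sorted(zip(scores, sentences), key=lambda pair: pair[0], reverse=True)
    selected = []
    for _, sentence in ranked:
        if all(cosine_similarity(sentence, kept) <= max_similarity for kept in selected):
            selected.append(sentence)
        if len(selected) == k:
            break
    return selected


# Hypothetical sentences pooled from several documents, with made-up scores.
pooled = [
    "tariffs on chinese goods were delayed",
    "tariffs on chinese goods were postponed",
    "investors welcomed the ceasefire",
]
summary = select_non_redundant(pooled, scores=[0.9, 0.8, 0.5], k=2)
```

In this example the second sentence is skipped because it nearly duplicates the first, so the lower-scored but distinct third sentence is chosen instead. The `max_similarity` threshold plays the role of the "certain distance" between selected sentences.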