WEEKLY STATUS REPORT
Project Title: Cluster Computers & Distributed Computing
Total number of person-hours spent on project by group during past week: 34
Total number of person-hours spent on project by group leader: 10
Total number of person-hours spent on project by team member 1: 8
Total number of person-hours spent on project by team member 2: 8
Total number of person-hours spent on project by team member 3: 8
Is project on schedule: Yes [ X ] No [ ]
Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
1) Previous report’s summary:
· We found another software tool that can download tweets from Twitter by user ID.
2) Group’s accomplishments:
· Built our word dictionary using Hadoop Wordcount.
· Became more familiar with Oracle, the database software we need for our data analysis. We also received help from Prof. Gu, Prof. Dong, and a graduate student, Mr. Ye.
· Detected all the incorrectly formatted records in the TXT file.
· Obtained almost all the data we want.
3) Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings
· Word Dictionary
We used a tool to segment the sentences into individual words. (Chinese is more complicated than English: a single character can mean something on its own, but two together may mean something entirely different.) We then ran the Hadoop Wordcount job we set up last semester over all the text files we downloaded from Sina Microblog and built our own word dictionary.
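The counting step can be illustrated with a minimal local sketch in Python. It is not the Hadoop job we actually ran; it only mirrors the map and reduce idea behind Wordcount. It assumes the segmentation tool has already written each sentence as whitespace-separated words, and the function names (`map_words`, `reduce_counts`, `build_dictionary`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every token in a pre-segmented line."""
    # Assumption: words are already separated by whitespace.
    for word in line.split():
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the counts emitted for each word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def build_dictionary(lines: Iterable[str]) -> Counter:
    """Run the map step over every line, then reduce into one dictionary."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    # Made-up sample sentences, already segmented.
    sample = ["我 喜欢 微博", "我 喜欢 数据 分析"]
    for word, count in build_dictionary(sample).most_common():
        print(word, count)
```

The resulting `Counter` plays the role of the word dictionary: each word maps to the number of times it appears across all input lines.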
We tried to compute some statistics on the data we collected, but this is hard to do manually. Prof. Dong and Prof. Gu suggested that we try a database tool. We chose Oracle, which is powerful enough to handle our project. We ran into a problem at this step: the data format has to be uniform. We fixed the formatting by hand and overcame the problem.
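We fixed the formatting by hand, but the detection part could be automated. The sketch below is only an illustration under assumptions: it supposes each record is one line with a fixed number of tab-separated fields. The field count, the delimiter, and the name `find_malformed` are hypothetical and would have to be adapted to the real file layout.

```python
from typing import Iterable, List, Tuple

EXPECTED_FIELDS = 3  # hypothetical: e.g. user ID, timestamp, text
DELIMITER = "\t"     # hypothetical field separator


def find_malformed(
    lines: Iterable[str],
    expected_fields: int = EXPECTED_FIELDS,
    delimiter: str = DELIMITER,
) -> List[Tuple[int, str]]:
    """Return (line number, record) for every line that breaks the format."""
    bad = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        fields = record.split(delimiter)
        # A record is malformed if the field count is wrong
        # or if any field is empty after trimming whitespace.
        if len(fields) != expected_fields or any(not f.strip() for f in fields):
            bad.append((number, record))
    return bad
```

Printing the returned line numbers gives a short list of records to correct before loading the file into the database.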
· Data Ready
We also did some data analysis and exported the useful data to Excel, ready for the final combination and for making a figure.
4) Problems faced:
· Oracle seems to limit the size of the files we can load: only 100 MB is permitted. There may be something wrong in our configuration or in the primary key setting.
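If the limit turns out to be real rather than a configuration mistake, one possible workaround is to split the large text file into smaller parts before loading. The sketch below is an assumption on our side, not something we have tested against Oracle: the name `split_by_size` and the part naming scheme are hypothetical, and the 100 MB limit is taken as 100,000,000 bytes to stay on the safe side.

```python
from pathlib import Path
from typing import List


def split_by_size(
    source: Path,
    out_dir: Path,
    max_bytes: int = 100_000_000,  # assumed meaning of "100 MB"
) -> List[Path]:
    """Split a text file into parts of at most max_bytes, on line boundaries."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    current = None
    written = 0
    with source.open("rb") as src:
        for line in src:
            # Start a new part at the beginning, or when this line would
            # push the current part over the limit.
            if current is None or written + len(line) > max_bytes:
                if current is not None:
                    current.close()
                name = f"{source.stem}.part{len(parts) + 1:03d}{source.suffix}"
                part_path = out_dir / name
                current = part_path.open("wb")
                parts.append(part_path)
                written = 0
            # Note: a single line longer than max_bytes still gets its own
            # part, which would then exceed the limit.
            current.write(line)
            written += len(line)
    if current is not None:
        current.close()
    return parts
```

Because the split happens only at line boundaries, no record is cut in half, and concatenating the parts in order reproduces the original file.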
5) Next week’s goals:
· Get our figure (the statistical analysis) ready and prepare for the source presentation.