Ninth Week

posted May 10, 2011, 7:31 PM by KAI CHEN
EENG 491 
Spring 2011

Week ending




Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1:      Hua Fang
Team Member 2:      Hong Chen
Team Member 3:      Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

1)      Previous report’s summary:

·        Installed software to download information related to a keyword from Twitter

·        Found another program that can download tweets from Twitter by user ID

2)      Group’s accomplishments:

·           Built our word dictionary using Hadoop Wordcount.

·           Became more familiar with the Oracle database software, which we need for our data analysis. We also received help from Prof. Gu, Prof. Dong, and a graduate student, Mr. Ye.

·           Detected all the wrongly formatted records in the TXT file.

·           Collected almost all the data we want.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·  Word Dictionary

We used a segmentation tool to split the sentences into individual words. (Chinese is more complicated than English in this respect: a single character can carry a meaning on its own, but two combined may mean something totally different.) We then ran all the text files we downloaded from Sina Microblog through the Hadoop Wordcount we set up last semester to build our own word dictionary.
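The counting step can be illustrated with a small, single-machine sketch of the map and reduce phases that a word count job performs. It assumes the text has already been segmented into space-separated words; the function names and the sample lines are hypothetical and are not taken from our Hadoop job.

```python
from collections import Counter
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every token in an already-segmented line."""
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts for each word, as a word count reducer would."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def build_dictionary(lines):
    """Run the map phase on every line and reduce all pairs into one dictionary."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(pairs)


# Hypothetical, already-segmented microblog lines (assumption for illustration).
sample = ["我们 喜欢 微博", "我们 使用 微博"]
dictionary = build_dictionary(sample)
for word, count in dictionary.most_common():
    print(word, count)
```

On the cluster the map and reduce phases run in parallel across nodes; the sketch only shows the logic of each phase.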

·  Oracle

We tried to compute some statistics on the data we collected, which is hard to do manually. Prof. Dong and Prof. Gu suggested that we try a database tool. We chose Oracle, which is powerful enough to handle our project. We ran into a problem at this step: the data format has to be consistent across all records. We fixed the formatting by hand and overcame the problem.
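We did the format check by hand, but the same kind of check could be scripted. The sketch below is only an illustration under assumed conditions: it supposes each record is one tab-separated line with three fields (user ID, timestamp, text) and that the user ID is numeric. The field layout, the function name, and the sample records are hypothetical.

```python
EXPECTED_FIELDS = 3  # assumption: user_id, timestamp, text


def find_bad_lines(lines, expected_fields=EXPECTED_FIELDS, sep="\t"):
    """Return (line_number, reason) for every record that breaks the layout."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) != expected_fields:
            reason = f"expected {expected_fields} fields, found {len(fields)}"
            problems.append((number, reason))
        elif not fields[0].isdigit():
            problems.append((number, "user ID is not numeric"))
    return problems


# Hypothetical records: the second is missing a field, the third has a bad ID.
sample = [
    "1001\t2011-05-01 10:00\thello\n",
    "1002\t2011-05-01 10:05\n",
    "abc\t2011-05-01 10:07\ttext\n",
]
bad = find_bad_lines(sample)
for number, reason in bad:
    print(f"line {number}: {reason}")
```

Listing the offending line numbers first makes it easier to correct the records before loading them into the database.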

·  Data Ready

We also did some data analysis and exported the useful data into Excel, ready for the final combination and for making a figure.

4)      Problems faced:

·  Oracle appears to limit the size of the files we can load: only 100 MB is permitted. There may be something wrong in our configuration or in the primary key setting.
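If the limit turns out to be real rather than a configuration mistake, one possible workaround, which we have not tried, would be to split the data into smaller parts at line boundaries before loading. The sketch below is a minimal illustration; the function name and the tiny sample are hypothetical, and a real run would use a limit somewhat below 100 MB.

```python
def split_by_size(lines, max_bytes):
    """Group lines into chunks whose UTF-8 size is at most max_bytes.

    A single line larger than max_bytes is placed in a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample: three lines of 5 bytes each, with a 10-byte limit.
sample = ["aaaa\n", "bbbb\n", "cccc\n"]
chunks = split_by_size(sample, max_bytes=10)
print(len(chunks))
```

Splitting at line boundaries keeps every record intact, so each part can be loaded separately.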

5)      Next week’s goals:

·         Get our figure (the statistical analysis) ready and prepare for the source presentation.