Website of Intrepid

Project Updates

Here is the list of our weekly status reports.

Ninth Week

posted May 10, 2011, 7:31 PM by KAI CHEN

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

04/14/11

Report

9    

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·        Installed software to download information related to a keyword from Twitter.

·        Found another program that can download tweets from Twitter by user ID.

2)      Group’s accomplishments:

·           Built our word dictionary using Hadoop WordCount.

·           Became more familiar with Oracle, the database software we need for our data analysis. We also received help from Prof. Gu, Prof. Dong, and a graduate student, Mr. Ye.

·           Detected all the formatting errors in the TXT file.

·           Collected almost all the data we need.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·  Word Dictionary

We used a tool to split the sentences into individual words. (Chinese is more complicated than English: a single character can carry a meaning on its own, but two characters together may mean something entirely different.) We then ran the Hadoop WordCount job we set up last semester over all the text files we downloaded from Sina Microblog and built our own word dictionary from the result.

·  Oracle

We tried to compute some statistics on the data we collected. This is hard to do by hand, so Prof. Dong and Prof. Gu suggested that we try a database tool. We chose Oracle, which is powerful enough to handle our project. We ran into one problem at this step: all records must follow one consistent format. We corrected the formatting manually and overcame the problem.

·  Data Ready

We also did some data analysis and exported the useful data to Excel, ready to be combined and plotted in a figure.

4)      Problems faced:

·  Oracle appears to limit the size of the files we can load: only 100MB is permitted. The cause may be an error in our configuration or in the primary key setting.

5)      Next week’s goals:

·         Get our figure (the statistical analysis) ready and prepare for the presentation.

Eighth Week

posted May 10, 2011, 7:30 PM by KAI CHEN

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

03/31/11

Report

8    

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·        We rewrote our program to run WordCount on all the information we retrieved from Sina Microblog.

2)      Group’s accomplishments:

·        We rewrote a program so that it can download tweets from Twitter by user ID.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·           Program

        We wanted to download tweets from Twitter. After searching the Internet, we found several programs. Because we wanted to search for tweets by user ID, we rewrote the program, and in the end we achieved our goal.

4)      Problems faced:

·           We achieved all the goals this week.

5)      Next week’s goals:

Build our word dictionary using Hadoop WordCount, and learn how to use Oracle, the database software we need for our data analysis.

 

Seventh Week

posted May 10, 2011, 7:28 PM by KAI CHEN

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

03/24/11

Report

7    

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·        Simulated our program in a two-level network with two subnets.

·        We wrote a Java program that searches Sina Microblog for information by keyword.

2)      Group’s accomplishments:

·           We rewrote our program to run WordCount on all the information we retrieved from Sina Microblog.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·           Program

We want to run WordCount on all the information we retrieved from Sina Microblog, but Chinese differs from English. In English, words are separated by spaces, so we can find the words simply by splitting on spaces. Chinese text has no such separators. We therefore rewrote our program to insert a boundary between adjacent words; by counting the resulting segments, we obtain the number of words.
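
The idea of inserting boundaries and then counting segments can be sketched as follows. This is only a minimal illustration, not the program we wrote: it assumes forward maximum matching against a small word list, and the names `VOCAB`, `segment`, and `count_words` as well as the sample vocabulary are hypothetical. Characters that match no vocabulary entry fall back to single-character segments.

```python
from collections import Counter

# Hypothetical vocabulary; a real run would load a much larger word list.
VOCAB = {"我们", "喜欢", "微博", "新浪"}
MAX_WORD_LEN = max(len(word) for word in VOCAB)


def segment(text):
    """Split text into words using forward maximum matching."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until it is in VOCAB;
        # a single character is always accepted as a fallback.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in VOCAB:
                words.append(candidate)
                i += size
                break
    return words


def count_words(lines):
    """Count segmented words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(w for w in segment(line.strip()) if not w.isspace())
    return counts


print(" / ".join(segment("我们喜欢新浪微博")))
```

Once the text is segmented this way, the segments can be joined with spaces so that an ordinary space-based WordCount job can process them.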

4)      Problems faced:

·           We achieved all the goals this week.

5)      Next week’s goals:

·           We will contact Oracle and send them all the information we collected, to see whether we can build a query.

 

Sixth Week

posted May 10, 2011, 7:26 PM by KAI CHEN   [ updated May 10, 2011, 7:27 PM ]

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

03/17/11

Report

6    

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·        Installed software to download information related to a keyword from Twitter.

·        Found another program that can download tweets from Twitter by user ID.

2)      Group’s accomplishments:

·           Simulated our program in a two-level network with two subnets.

·           We wrote a Java program that searches Sina Microblog for information by keyword.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·  Simulation

We had already run our program successfully on a single network. How would it behave in a more complicated network environment? This week we built a two-level network with two subnets, with the master in one subnet and a slave in the other. It worked. We then ran WordCount on a 1GB text file in this system; the job took 20 minutes.

·  Program

We plan to search Sina Microblog for Chinese-language information by keyword, so the software we installed earlier does not work for us. This week our team leader wrote a new Java program to do this.

4)      Problems faced:

·  We achieved all the goals this week.

5)      Next week’s goals:

·         Rewrite WordCount.

Fifth Week

posted May 10, 2011, 7:24 PM by KAI CHEN

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

03/09/11

Report

5    

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·        We tried to run Hadoop on two remote machines in PlanetLab, but we could not establish the SSH connection.

·        We settled on an objective: use our system to search for a keyword in the mass of information downloaded from Twitter and analyze that information. We also did some research on this topic.

2)      Group’s accomplishments:

·           We installed software to download information related to a keyword from Twitter.

·           We also found another program that can download tweets from Twitter by user ID.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·  Software

We installed two programs, Twitter-to-PDF and Archivist. Both can download tweets from Twitter and export the results to a TXT file, which suits our WordCount project.

4)      Problems faced:

·  A large volume of text posts from Twitter would be good for analysis, but unfortunately we could only download a 227KB file, which is too small. We will use new software to fix this problem.

5)      Next week’s goals:

·         Use new software to fix the download problem.

 

Fourth Week

posted May 10, 2011, 7:23 PM by KAI CHEN

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

03/03/11

Report

4

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·         We installed and tried to run CoDeploy, a tool that provides a means to efficiently and scalably distribute content from one source to many receivers.

2)      Group’s accomplishments:

·        We tried to run Hadoop on two remote machines in PlanetLab, but we could not establish the SSH connection.

·        We settled on an objective: use our system to search for a keyword in the mass of information downloaded from Twitter and analyze that information. We also did some research on this topic.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·  PlanetLab

We have successfully built our system on local computers. But will it still work normally if the master and slave machines are not in the same place, or not even in the same country? To find out, we tried to install Hadoop on two computers provided by PlanetLab, one as master and the other as slave. Everything went almost the same as before until we tried to set up an SSH connection between the two machines. We found many suggestions on the Internet, but in the end we could not fix the problem.

·  Objective

Twitter is widely used. Users update their status anytime and anywhere, so it holds a mass of information. Our idea is to download this information and analyze it. In this way, people can find out what the most popular topics are right now, and the government can learn what concerns people most. We did some research on this topic.

4)      Problems faced:

·  We will try to fix the SSH connection problem.

5)      Next week’s goals:

·         Try to achieve our objective.

Third Week

posted May 10, 2011, 7:21 PM by KAI CHEN

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Monday)

02/24/11

Report

3

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 

1)      Previous report’s summary:

·         We built two “One Master Two Slaves” systems with different computers, tested them repeatedly, and compared the results.

2)      Group’s accomplishments:

·        We installed and tried to run CoDeploy, a tool that provides a means to efficiently and scalably distribute content from one source to many receivers.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·  CoDeploy

CoDeploy allows you to push content to hundreds of nodes without having to consume lots of bandwidth at the source. In general, these techniques can be used for efficient peer-to-peer hosting of arbitrary content.

·  Install

We followed the steps at http://codeen.cs.princeton.edu/codeploy/ to install CoDeploy.

4)      Problems faced:

·  Because this is a remote machine, we don’t know whether it will work normally. We are trying to set up a local machine.

5)      Next week’s goals:

·         Run CoDeploy and set up a local machine.

·         Define the objective of our project. Use a multi-node cluster to solve a concrete problem.

Second Week

posted May 10, 2011, 7:17 PM by KAI CHEN   [ updated May 10, 2011, 7:20 PM ]

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Thursday)

02/16/11

Report

2

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 
 

1)      Previous report’s summary:

·        We built “One Master Two Slaves”, “One Master Three Slaves”, and “One Master Five Slaves” systems, tested these 3 systems repeatedly, and compared the test results.

2)      Group’s accomplishments:

·        We built two “One Master Two Slaves” systems with different computers, tested them repeatedly, and compared the results.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·         Multi-node systems

Since we have already solved the problem of building multi-node systems, it took us little effort this week to build these two “One Master Two Slaves” systems. Building a “One Master Two Slaves” system is similar to building a “One Master One Slave” system. We install, configure, and test a “local” Hadoop setup on each of the Ubuntu boxes, and then merge these single-node clusters into one multi-node cluster in which one Ubuntu box becomes the designated master and the other boxes become slaves.

·         Test

We built these two systems using the same master computer. One system uses two desktops with better hardware; the other uses two laptops. To make the differences in the test results obvious, we used a series of TXT documents with a total size of about 1GB to test the 2 systems repeatedly. The system with desktops needed about 5 minutes 6 seconds to finish the task, whereas the system built with laptops needed 14 minutes 8 seconds.
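
The two timings can be compared with a few lines of arithmetic. This is a small sketch only; it assumes that “about 1GB” means 1024 MB, and the variable names are illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


DATA_MB = 1024  # assumption: "about 1GB" is treated as 1024 MB

desktop_s = to_seconds(5, 6)   # system with two desktop slaves
laptop_s = to_seconds(14, 8)   # system with two laptop slaves

# How many times longer the laptop system needed for the same input.
ratio = laptop_s / desktop_s

print(f"desktops: {desktop_s} s, {DATA_MB / desktop_s:.2f} MB/s")
print(f"laptops:  {laptop_s} s, {DATA_MB / laptop_s:.2f} MB/s")
print(f"the laptop run took {ratio:.2f} times as long")
```

In other words, the laptop system needed a little under three times as long as the desktop system for the same input.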

4)      Problems faced:

·         Because we test the systems repeatedly, we have plenty of data to record, and we wanted to take a snapshot of each test. We consulted our professor and found a way: writing a short script on the Linux system and running it in the terminal. In this way we saved the data from each test, and it saved us a lot of time.
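
A recording script of this kind might look like the following. This is a hedged sketch in Python rather than the shell script we actually used; the function name `run_and_record`, the CSV column layout, and the log file name are assumptions made for illustration.

```python
import csv
import subprocess
import sys
import time
from datetime import datetime


def run_and_record(command, log_path, label):
    """Run a command, time it, and append one row to a CSV log."""
    start = time.perf_counter()
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    row = [
        datetime.now().isoformat(timespec="seconds"),  # when the run ended
        label,                                         # which system was tested
        " ".join(command),                             # what was run
        result.returncode,                             # 0 means success
        f"{elapsed:.2f}",                              # wall-clock seconds
    ]
    # Append so that repeated tests accumulate in the same file.
    with open(log_path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
    return row
```

For example, `run_and_record([sys.executable, "-c", "print('done')"], "runs.csv", "desktops")` records a stand-in command; for a real test, the command list would be replaced by the actual Hadoop job invocation.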

5)      Next week’s goals:

·         We created accounts on Planet-lab.org, where we can get much more useful information for our project.

·         Define the objective of our project. Use a multi-node cluster to solve a concrete problem.

First Week

posted May 10, 2011, 7:14 PM by KAI CHEN   [ updated May 10, 2011, 7:21 PM ]

EENG 491 
Spring 2011
WEEKLY STATUS REPORT
Team
Intrepid

Week ending
(Monday)

02/07/11

Report

1

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 34    

Total number of person-hours spent on project by group leader: 10

Total number of person-hours spent on project by team member 1: 8

Total number of person-hours spent on project by team member 2: 8

Total number of person-hours spent on project by team member 3: 8

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)
 
 

1)      Previous report’s summary:

·         We successfully built a multi-node cluster. It was a “One Master One Slave” system.

2)      Group’s accomplishments:

·         We built a “One Master Two Slaves” system, and it ran smoothly. On this basis, we successfully built “One Master Three Slaves” and “One Master Five Slaves” systems.

·         We tested these 3 systems repeatedly and compared the test results.

3)      Group’s current week’s accomplishments: (Provide detailed description of your work) Structure your description using subheadings

·         Multi-node systems

Building a “One Master Two Slaves” system is similar to building a “One Master One Slave” system. We install, configure, and test a “local” Hadoop setup on each of the Ubuntu boxes, and then merge these single-node clusters into one multi-node cluster in which one Ubuntu box becomes the designated master and the other boxes become slaves. Using this method, we successfully built the “One Master Three Slaves” and “One Master Five Slaves” systems.

·         Test

To make the differences in the test results obvious, we used a series of TXT documents with a total size of about 1GB to test these 3 systems repeatedly. The “One Master Two Slaves” system needed about 15 minutes to finish the task, while the “One Master Three Slaves” and “One Master Five Slaves” systems needed only about 5 minutes.

4)      Problems faced:

·         After building the “One Master One Slave” system, we used the same method to build a “One Master Two Slaves” system, but it did not work. Following suggestions from the Internet, we tried changing some of the configuration. We then found that the cause was a network misconfiguration in the “/etc/hosts” and “/etc/hostname” files. The problem occurs when the name in “/etc/hostname” differs from the name mapped to the machine’s IP address in “/etc/hosts”: Hadoop uses the names “master” and “slave” to refer to the nodes listed in “/etc/hosts”, but Map-Reduce or any other Hadoop class may use the computer’s hostname from “/etc/hostname”. The fix is easy: we updated the hostname in “/etc/hostname” to match “/etc/hosts”.

·         We cannot explain why the “One Master Two Slaves” system takes about 3 times as long as the “One Master Three Slaves” system, while the results of the “One Master Three Slaves” and “One Master Five Slaves” systems are almost the same.
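
The hostname mismatch described in the first item above can be checked mechanically. The following is a minimal sketch, assuming hosts-file text in the usual “IP name [alias ...]” layout; the function names and the sample addresses are made up for illustration and are not taken from our cluster.

```python
def parse_hosts(hosts_text):
    """Map each host name in /etc/hosts-style text to its IP address."""
    names = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *aliases = line.split()
        for alias in aliases:
            names.setdefault(alias, ip)  # first mapping wins
    return names


def check_hostname(hostname_text, hosts_text):
    """Return (ok, message): does the hostname appear in the hosts text?"""
    hostname = hostname_text.strip()
    names = parse_hosts(hosts_text)
    if hostname in names:
        return True, f"{hostname} resolves to {names[hostname]}"
    return False, f"{hostname} is missing from the hosts file"


# Hypothetical file contents for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
192.168.0.1  master   # hypothetical address
192.168.0.2  slave    # hypothetical address
"""

print(check_hostname("master\n", SAMPLE_HOSTS))  # name matches an entry
print(check_hostname("ubuntu\n", SAMPLE_HOSTS))  # name has no entry
```

On a real node, the two arguments would be the contents of “/etc/hostname” and “/etc/hosts”.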

5)      Next week’s goals:

·         Define the objective of our project. Use a multi-node cluster to solve a concrete problem.

Week Six

posted Dec 1, 2010, 11:29 PM by KAI CHEN

EENG 489
Fall 2010
WEEKLY STATUS REPORT
Group

Intrepid

Week ending
(Monday)

11/29/10

Report

6

Project Title:  Cluster Computers & Distributed Computing
Group Leader:       Hao Liu
Team Member 1   Hua Fang
Team Member 2   Hong Chen
Team Member 3   Kai Chen

Total number of person-hours spent on project by group during past week: 18    

Total number of person-hours spent on project by group leader: 6

Total number of person-hours spent on project by team member 1: 4

Total number of person-hours spent on project by team member 2: 4

Total number of person-hours spent on project by team member 3: 4

Is project on schedule: Yes [ X ]  No [  ]

Weekly status:

Include discussion of what you accomplished this week, who you contacted, what decisions were made, what obstacles were identified, how they are being addressed. (Use multiple pages if necessary)

1. Previous report’s summary:

  • We successfully built the single-node cluster, and it ran smoothly.
  • We tried our first MapReduce job, WordCount, and it ran well.

2. Group’s accomplishments:

  • Redid our website.
  • Ran Hadoop on Ubuntu Linux and set up a multi-node cluster using the Hadoop Distributed File System (HDFS).

3. Group’s current week’s accomplishments: (Provide detailed description of your work)Structure your description using subheadings

  • HDFS: HDFS is a highly fault-tolerant distributed file system that, like Hadoop in general, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.
  • SSH access: SSH stands for Secure Shell, a widely trusted protocol for data confidentiality and security. It gives administrators a more secure way to access their servers: you log in to your account over an encrypted connection, so anyone intercepting the traffic sees only unreadable data, which makes it hard for attackers to get anything from it.
  • Set up a multi-node cluster: We built a multi-node cluster using two Ubuntu boxes. First we installed, configured, and tested a “local” Hadoop setup on each of the two Ubuntu boxes; in a second step we “merged” these two single-node clusters into one multi-node cluster in which one Ubuntu box became the designated master and the other box became only a slave.

4. Problems faced:

  • Everything went well this week. We fully achieved our aim.
 
5. Next week’s goals:
  • We will go on to code a simple MapReduce job in the Python programming language, which can serve as the basis for writing our own MapReduce programs.
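
As an illustration of what such a job could look like, here is a minimal word-count sketch in the map/sort/reduce style. It is an assumption about the shape of the planned program, not code from our project: it runs locally without Hadoop, the function names are hypothetical, and the call to `sorted` stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(sorted_pairs):
    """Sum the counts for each word; the input must be sorted by word."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort (standing in for the shuffle), and reduce locally."""
    return dict(reducer(sorted(mapper(lines))))


print(word_count(["hadoop runs on ubuntu", "hadoop counts words"]))
```

Keeping the mapper and reducer as separate functions mirrors how the two phases would be split into separate programs when the job is run on the cluster.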
