Website of Intrepid


Appendix

A.    Single-Node Set Up

Procedures

There are some prerequisites we need to take care of first.

-  Install the Ubuntu operating system and Hadoop.

-  Install a working Java 1.6.x.

Since Hadoop is written in Java, Java 6 must be installed. There are two ways to install it. The first is to run a command in the terminal such as “sudo apt-get install sun-java6-jdk”; for this to work, the package repository that provides the sun-java6 packages must be enabled first. Our group chose the second way: we used the Synaptic Package Manager that comes with Ubuntu to download and install Java 6. We then checked that it was installed by running “java -version” in the terminal.
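As an alternative to “java -version”, a small Java program can report the version of the JVM it runs on. The sketch below reads the standard java.version system property; the class name and the parsing rule (the old 1.x scheme such as 1.6.0_24 versus the single-number scheme used from Java 9 on) are our own assumptions.

```java
public class JavaVersionCheck {

    // Extracts the major version from a java.version string:
    // "1.6.0_24" -> 6 (old 1.x scheme), "11.0.2" -> 11, "17" -> 17.
    public static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("java.version = " + version + " (major " + major + ")");
        if (major < 6) {
            System.out.println("This is older than the Java 6 (1.6.x) this setup expects.");
        }
    }
}
```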

 

After finishing the prerequisites, we come to the Hadoop part. First, for security reasons, we add a dedicated group for running Hadoop (“sudo addgroup hadoop”). We then add a user called hadoop (you can choose any name you want) to that group by running “sudo adduser --ingroup hadoop hadoop” in the terminal.

The next step, mentioned earlier, is SSH, which Hadoop's control scripts use to reach the nodes they manage. SSH is not only recommended but required by Hadoop. In the single-node setup, we configure SSH access to localhost for the hadoop user created in the previous step. With SSH installed, the command “ssh-keygen -t rsa -P ""”, run as the hadoop user, generates an RSA key pair for that user. Here "" means the passphrase is empty, so Hadoop can log in without being prompted; you can set a passphrase instead, but then it has to be entered (or supplied by ssh-agent) whenever Hadoop connects. To enable SSH access to the local machine with the new key, we append the public key to the authorized keys with “cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys”. The last step is to test the connection with “ssh localhost”.
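The “cat” step appends the public key to ~/.ssh/authorized_keys. The Java sketch below performs the same append but skips keys that are already present, so running it twice does not duplicate entries. The class name and the duplicate check are our own additions; the paths are the OpenSSH defaults used in the commands above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AuthorizeKey {

    // Appends the key lines of pubKey to authorizedKeys, skipping keys already listed.
    // Returns true if at least one key was appended.
    public static boolean authorize(Path pubKey, Path authorizedKeys) throws IOException {
        Set<String> existing = new HashSet<>();
        boolean needsNewline = false;
        if (Files.exists(authorizedKeys)) {
            for (String line : Files.readAllLines(authorizedKeys, StandardCharsets.UTF_8)) {
                existing.add(line.trim());
            }
            byte[] bytes = Files.readAllBytes(authorizedKeys);
            // Avoid gluing the new key onto a last line that lacks a newline.
            needsNewline = bytes.length > 0 && bytes[bytes.length - 1] != '\n';
        }
        List<String> toAdd = new ArrayList<>();
        for (String line : Files.readAllLines(pubKey, StandardCharsets.UTF_8)) {
            String key = line.trim();
            if (!key.isEmpty() && existing.add(key)) {   // add() is false for duplicates
                toAdd.add(key);
            }
        }
        if (toAdd.isEmpty()) {
            return false;
        }
        StringBuilder out = new StringBuilder();
        if (needsNewline) {
            out.append('\n');
        }
        for (String key : toAdd) {
            out.append(key).append('\n');
        }
        Files.write(authorizedKeys, out.toString().getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path ssh = Paths.get(System.getProperty("user.home"), ".ssh");
        boolean added = authorize(ssh.resolve("id_rsa.pub"), ssh.resolve("authorized_keys"));
        System.out.println(added ? "Key added to authorized_keys" : "Key already authorized");
    }
}
```

Running it a second time reports that the key is already authorized and leaves the file unchanged.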

 

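Nothing can run until Hadoop itself is installed on the machine. After the installation, a few configuration files need to be edited; the settings can be found online. (For Hadoop 0.20.x these are typically conf/hadoop-env.sh, where JAVA_HOME is set, plus conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml.)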
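The Hadoop *-site.xml files share one simple XML layout: a configuration element holding property elements, each with a name and a value. As a sketch, the class below renders such a file from a map of settings. The property names fs.default.name, mapred.job.tracker and dfs.replication are the Hadoop 0.20-era names; the localhost ports 54310 and 54311 are example values (any free ports work), an assumption rather than the values we used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HadoopSiteXml {

    // Escapes the five XML special characters in a property name or value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Renders properties in the <configuration>/<property>/<name>/<value> layout.
    public static String siteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> core = new LinkedHashMap<>();
        core.put("fs.default.name", "hdfs://localhost:54310");   // example port
        Map<String, String> mapred = new LinkedHashMap<>();
        mapred.put("mapred.job.tracker", "localhost:54311");     // example port
        Map<String, String> hdfs = new LinkedHashMap<>();
        hdfs.put("dfs.replication", "1");                        // one copy per block on a single node
        System.out.println("core-site.xml:\n" + siteXml(core));
        System.out.println("mapred-site.xml:\n" + siteXml(mapred));
        System.out.println("hdfs-site.xml:\n" + siteXml(hdfs));
    }
}
```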

 

The Hadoop engine is almost ready. The first step in starting up the new installation is to format the file system (HDFS) with “bin/hadoop namenode -format”. Pay attention to the spelling here: if you type formate instead of format, the system prints output that looks similar to a real format, which can be very confusing.
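Because a misspelled option such as “-formate” is easy to overlook, one option is to launch the format step from a wrapper that builds the command from fixed strings. The sketch below does this with ProcessBuilder; the class name is ours, and it assumes a HADOOP_HOME environment variable pointing at the unpacked Hadoop directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FormatNamenode {

    // Builds "<hadoopHome>/bin/hadoop namenode -format" from fixed strings,
    // so the option cannot be mistyped.
    public static List<String> formatCommand(String hadoopHome) {
        String hadoop = new File(hadoopHome, "bin" + File.separator + "hadoop").getPath();
        return Arrays.asList(hadoop, "namenode", "-format");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String home = System.getenv("HADOOP_HOME");   // assumed to be set by the user
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        ProcessBuilder pb = new ProcessBuilder(formatCommand(home));
        pb.directory(new File(home));
        pb.inheritIO();   // show Hadoop's output and allow answering any confirmation prompt
        int exit = pb.start().waitFor();
        System.out.println("namenode -format exited with code " + exit);
    }
}
```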

 

With everything in place, let the car start! Running “bin/start-all.sh” starts all the Java processes on the machine: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. You can check which of them are running with the “jps” command. If all of them are listed, the single node has been set up successfully.
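Reading the jps listing by eye makes it easy to miss a daemon that failed to start. The sketch below runs jps, takes the class name from each “pid name” line, and reports which expected daemons are absent. The class name is ours, and the expected list assumes the 0.20-style daemons that start-all.sh launches on a single node.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JpsCheck {

    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker");

    // Returns the expected daemons that do not appear in the given jps output.
    public static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);   // each line is "<pid> <main class>"
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("jps").redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        List<String> missing = missingDaemons(out.toString());
        System.out.println(missing.isEmpty() ? "All Hadoop daemons are running" : "Missing: " + missing);
    }
}
```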

 

B.    Multi-Node Set Up

A multi-node cluster is a group of single-node setups connected through a common network. We first tried a wireless network: the router allocated a different IP address to each laptop we were using, and we put these addresses into the hosts file so that the computers could find each other by name. The wireless setup failed, for reasons explained later in the report. We then switched to a wired connection, which worked with one master and one slave. However, when we added more slaves, they behaved unpredictably.
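Each machine resolves the cluster hostnames through its hosts file, so every laptop needs the same name-to-IP entries. As a sketch, the class below parses hosts-file text and checks the names master and slave (the hostnames used in the commands of this appendix). The class name is ours, and the loopback warning reflects a common pitfall rather than something we observed: a node name mapped to a 127.x address may make its daemons unreachable from other machines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class HostsFile {

    // Parses "IP name [alias ...]" lines into hostname -> IP. Text after '#'
    // and blank lines are ignored; the first entry for a name wins.
    public static Map<String, String> parseHosts(String text) {
        Map<String, String> ipByName = new LinkedHashMap<>();
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.replaceFirst("#.*", "").trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                ipByName.putIfAbsent(fields[i], fields[0]);
            }
        }
        return ipByName;
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("/etc/hosts")), StandardCharsets.UTF_8);
        Map<String, String> hosts = parseHosts(text);
        for (String name : new String[] {"master", "slave"}) {
            String ip = hosts.get(name);
            if (ip == null) {
                System.out.println(name + ": not listed in /etc/hosts");
            } else if (ip.startsWith("127.")) {
                System.out.println(name + " -> " + ip + " (loopback, may be unreachable from other nodes)");
            } else {
                System.out.println(name + " -> " + ip);
            }
        }
    }
}
```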

 

The computers cannot work together without the SSH access we set up in the single-node part. Before the nodes can communicate, we run “ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@slave” on the master machine, which copies the master's public key (generated earlier) to the slave. The command prompts for the login password of user hadoop on the slave, then installs the public key, creating the correct directory and fixing the permissions as necessary. We can test the setup by running “ssh master” and “ssh slave” on the master, which checks that the master can log in to itself and to the slave without a password.

A few configuration files differ from the single-node configuration; we found the required changes online. The next step is similar to the single-node setup: format the namenode and start the multi-node cluster. Starting up differs slightly from the single-node case and is done in two parts. First, run “bin/start-dfs.sh” on the machine you want the (primary) namenode to run on. This brings up HDFS, with the namenode running on that machine and datanodes on the machines listed in the conf/slaves file. Second, run “bin/start-mapred.sh” on the master machine, which starts the JobTracker there and TaskTrackers on the slaves. Running “jps” on each machine shows which Java processes are running on it.

Once the cluster has started successfully, it can be stopped again (with “bin/stop-mapred.sh” followed by “bin/stop-dfs.sh”) when it is no longer needed. We also ran a MapReduce job, the WordCount example, following the same steps as in the single-node setup.
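To make the three phases of the WordCount job concrete, the sketch below simulates them in memory with plain Java: map emits (word, 1) for each whitespace-separated token, shuffle groups the values by word in sorted order, and reduce sums them. This is an illustration only; it does not use the Hadoop API, and the class name is ours.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Map: emit (word, 1) for every whitespace-separated token of a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Reduce: sum all the counts emitted for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle: group the mapped values by key, keys kept in sorted order.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

Running main prints {hadoop=1, hello=2, world=1}.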

 

C.    Commercial Topics

Commercial topics mostly appear in the form of advertisements, and they come up randomly and irregularly. We collected all tweets about Apple, iPad and iPhone posted on Twitter between April 1st and April 15th. The blue line in the line graph below shows the number of tweets we collected each day. The daily counts rise and fall irregularly, so the lifetime of such topics is irregular as well.
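The daily counts behind such a line can be computed with a simple per-day tally. The sketch below counts, for each day in a date range, the tweets whose text mentions any of the keywords (case-insensitive), and reports 0 for days without matches. The Tweet record, the class name and the sample data in main are hypothetical; note that plain substring matching also counts words such as “pineapple” for the keyword Apple.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class DailyTopicCount {

    // A collected tweet reduced to the two fields needed here (hypothetical record).
    public static final class Tweet {
        final LocalDate day;
        final String text;

        public Tweet(LocalDate day, String text) {
            this.day = day;
            this.text = text;
        }
    }

    // Counts, for each day in [from, to], the tweets mentioning any keyword.
    public static Map<LocalDate, Integer> countPerDay(List<Tweet> tweets, List<String> keywords,
                                                      LocalDate from, LocalDate to) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            counts.put(d, 0);   // days without matching tweets still appear, with 0
        }
        for (Tweet t : tweets) {
            if (t.day.isBefore(from) || t.day.isAfter(to)) {
                continue;
            }
            String lower = t.text.toLowerCase(Locale.ROOT);
            for (String k : keywords) {
                if (lower.contains(k.toLowerCase(Locale.ROOT))) {
                    counts.merge(t.day, 1, Integer::sum);
                    break;   // count each tweet once even if it matches several keywords
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        LocalDate apr1 = LocalDate.of(2011, 4, 1);   // year chosen for illustration
        List<Tweet> sample = Arrays.asList(
                new Tweet(apr1, "New iPad ad spotted"),
                new Tweet(apr1.plusDays(2), "Apple keynote rumours"));
        System.out.println(countPerDay(sample, Arrays.asList("Apple", "iPad", "iPhone"),
                apr1, apr1.plusDays(14)));
    }
}
```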

 

Another Example from Twitter

 

Unlike the other graphs and charts, the line graph above is based on tweets collected from Twitter. It shows the number of tweets we collected each day about two global-issue topics and one commercial topic. The red line (Libya and Gaddafi) stays fairly flat, because no major event concerning Libya or Gaddafi happened during this period. The green line represents the number of tweets about the nuclear leak in Japan. It is quite similar to the previous line chart: both curves peak on April 7th and April 11th, because two aftershocks occurred on those days. This suggests that people all over the world tend to be concerned about the same events, whether on the Chinese Sina microblog or on the worldwide Twitter.
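Peaks like those of the green line can also be located programmatically. The sketch below reports the days whose count is strictly higher than on both neighbouring days and at least a minimum height, which filters out small day-to-day wiggles. The class name, the threshold idea and the counts in main are hypothetical, not our collected data.

```java
import java.util.ArrayList;
import java.util.List;

public class PeakDays {

    // Returns indices i where counts[i] >= minHeight and counts[i] is strictly
    // higher than both neighbours; the first and last day are never reported.
    public static List<Integer> localPeaks(int[] counts, int minHeight) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 1; i + 1 < counts.length; i++) {
            if (counts[i] >= minHeight && counts[i] > counts[i - 1] && counts[i] > counts[i + 1]) {
                peaks.add(i);
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        // Hypothetical daily counts for April 1..15 (index 0 = April 1).
        int[] counts = {5, 6, 5, 7, 6, 8, 20, 9, 7, 8, 15, 9, 6, 5, 5};
        for (int i : localPeaks(counts, 10)) {
            System.out.println("Peak on April " + (i + 1) + ": " + counts[i] + " tweets");
        }
    }
}
```

With a threshold of 10, main reports April 7 and April 11; with a threshold of 0, the small bumps on April 2 and April 4 would be reported as well.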

 

D.    Core Code

a)      Database Connection & Query


 

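import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

import weibo4j.Status;

public class DBcon {

    Connection conn = null;
    String tablename = "test";

    public void setTableName(String s) {
        tablename = s;
    }

    // Inserts one status into table s. A table name cannot be bound as a
    // parameter, so it is concatenated; the values are bound through a
    // PreparedStatement, so quotes in the tweet text cannot break the SQL.
    public void insert(Status str, String s) {
        String sql = "insert into " + s
                + " (id, username, text, time, location) values (?, ?, ?, ?, ?)";
        PreparedStatement ps = null;
        try {
            ps = getconn().prepareStatement(sql);
            ps.setObject(1, str.getUser().getId());
            ps.setString(2, str.getUser().getName());
            ps.setString(3, str.getText());
            ps.setString(4, String.valueOf(str.getCreatedAt()));
            ps.setString(5, str.getUser().getLocation());
            ps.executeUpdate();
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            close(ps);
        }
    }

    // Creates a new table with the same columns as the original "test" table.
    public void newTable(String s) {
        String sql = "create table " + s + " (ID NUMBER, USERNAME VARCHAR2(60), "
                + "TEXT VARCHAR2(600), TIME VARCHAR2(60), LOCATION VARCHAR2(60))";
        System.out.println(sql);
        Statement stmt = null;
        try {
            stmt = getconn().createStatement();
            stmt.executeUpdate(sql);
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            close(stmt);
        }
    }

    // Opens the Oracle connection on first use, or reopens it if it was closed.
    public Connection getconn() {
        String driver = "oracle.jdbc.driver.OracleDriver";
        try {
            if (conn == null || conn.isClosed()) {
                Class.forName(driver);   // registers the driver (needed before JDBC 4)
                conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "123");
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        return conn;
    }

    // Closes a statement if it was opened, reporting any failure.
    private static void close(Statement st) {
        if (st != null) {
            try {
                st.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}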


b)      Bar Chart

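import org.jfree.chart.JFreeChart;
import org.jfree.data.category.DefaultCategoryDataset;

public class BarChart {

    // Builds a bar chart titled s, with one bar of height i[k] for each label str[k].
    // createChart, drawToFrame and drawToOutputStream are project helper methods
    // that are not shown in this appendix.
    public void generate(String s, int[] i, String[] str) {
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        for (int k = 0; k < i.length; k++) {
            dataset.setValue(i[k], str[k], "");
        }
        JFreeChart chart = createChart(s, dataset);
        drawToFrame(chart);
        drawToOutputStream("c:\\intrepid\\001.png", chart);
    }
}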

c)       Pie Chart

 

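import org.jfree.chart.JFreeChart;
import org.jfree.data.general.DefaultPieDataset;

public class PieChart {

    // Builds a pie chart titled s, with one slice of size i[k] for each label str[k].
    // createChart and drawToFrame are project helper methods not shown here.
    public void generate(String s, int[] i, String[] str) {
        DefaultPieDataset dataset = new DefaultPieDataset();
        for (int k = 0; k < i.length; k++) {
            dataset.setValue(str[k], i[k]);
        }
        JFreeChart chart = createChart(s, dataset);
        drawToFrame(chart);
        // drawToOutputStream("c:\\intrepid\\001.png", chart);
    }
}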

d)      User Window & Web Applet

 

import java.awt.*;
import java.awt.event.*;
import java.util.List;
import javax.swing.*;

public class Window extends JFrame implements ActionListener {

    BorderLayout borderLayout1 = new BorderLayout();
    // GridBagLayout gridbag = new GridBagLayout();
    …...

    public static void main(String[] args) {
        Window a = new Window();
        a.setSize(new Dimension(800, 700));
        a.setVisible(true);   // setVisible(true) is the non-deprecated form of show()
    }

    public Window() {
        addWindowListener(new WindowDestroyer());
        this.getContentPane().setLayout(borderLayout1);
        // GridBagConstraints c = new GridBagConstraints();
        ……
    }

    // Dispatches each button's action command to the matching stage.
    public void actionPerformed(ActionEvent e) {
        String cmd = e.getActionCommand();
        if (cmd.equals("Enter")) {
            String str = jTextField1.getText().trim();
            if (str.equals("")) {
                jLabel1.setText("You entered nothing, please reenter");
            } else {
                stage1();
            }
        } else if (cmd.equals("Start")) {
            stage2();
        } else if (cmd.equals("AutoGenerate")) {
            stage3();
        } else if (cmd.equals("Generate")) {
            stage6();
        } else if (cmd.equals("Place")) {
            stage4();
        } else if (cmd.equals("Next")) {
            String str = jTextField1.getText().trim();
            if (str.equals("")) {
                jLabel1.setText("You entered nothing, please reenter");
            } else {
                stage5();
                count++;
            }
        } else if (cmd.equals("Clear")) {
            clear();
            clear = false;
        } else {
            System.exit(0);
        }
    }

    private void stage1() {
        ……
        // Single quotes in the keyword are doubled so they cannot end the SQL string early.
        s = "select * from test where text like '%" + kw.replace("'", "''") + "%'";
        ta.setText("The topic you entered is: " + kw);
    }

    // Runs the query built in stage1 and lists every matching tweet.
    private void stage2() {
        JO jo = new JO();
        List<?> list = jo.Query(s);
        int n = 0;
        for (int i = 0; i < list.size(); i++) {
            n++;
            ta.append("\n" + n + "\t" + list.get(i).toString());
        }
        // The status labels are updated once, after all results are listed.
        jLabel2.setText("  We got " + n + " results on " + kw + " in our database");
        jLabel3.setText("  Click on AutoGenerate to see results based on places such as:");
        jLabel4.setText("  Beijing, Shanghai, Guangdong, Overseas");
        jLabel6.setText("  Click on Place to specify 4 locations that you're interested in");
    }

    …….
}

e)       Insert Tweets into Database

 

import java.util.List;

import weibo4j.Status;
import weibo4j.Weibo;

public class InsertPublicTweets {

    public static void main(String[] args) {
        System.setProperty("weibo4j.oauth.consumerKey", Weibo.CONSUMER_KEY);
        System.setProperty("weibo4j.oauth.consumerSecret", Weibo.CONSUMER_SECRET);

        Weibo weibo = new Weibo();
        weibo.setToken("2b94246580dc09063c4d187688c606de", "a99fc5f19e8b0fa380e6acb5b41c266d");
        int k = 0, changeName = 0;
        String tablename = "test";
        DBcon dbc = new DBcon();

        // Poll the public timeline every 10 seconds; after more than 500,000 rows,
        // continue in a new table (test1, test2, ...).
        while (true) {
            try {
                List<Status> p = weibo.getPublicTimeline();
                for (Status status : p) {
                    k++;
                    dbc.insert(status, tablename);
                }
                Thread.sleep(10000);
                if (k > 500000) {
                    changeName++;
                    tablename = "test" + changeName;
                    dbc.newTable(tablename);
                    k = 0;
                }
            } catch (Exception e) {
                // Report the error and keep polling.
                e.printStackTrace();
            }
        }
    }
}

f)      Chinese Word Segmentation

import java.io.*;
import java.util.Collection;
import java.util.LinkedList;
import java.util.Vector;

// IKAnalyzer 3.2.x package layout; other IKAnalyzer releases use different packages.
import org.wltea.analyzer.IKSegmentation;
import org.wltea.analyzer.Lexeme;
import org.wltea.analyzer.dic.Dictionary;

public class MyTest {

    public static void main(String[] args) {
        int changeName = 0;
        BufferedReader br = null;
        BufferedWriter writer = null;
        try {
            br = new BufferedReader(new InputStreamReader(
                    new FileInputStream(new File("C:\\test.txt")), "UTF-8"));
            writer = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream("splitted" + changeName + ".txt", false), "UTF-8"));

            // test and test1 (extra stop words) and test2 and test3 (extra dictionary
            // words) are String values defined elsewhere in the program (not shown).
            Collection<String> list = new LinkedList<String>();
            list.add(test);
            list.add(test1);
            Collection<String> vet = new Vector<String>();
            vet.add(test2);
            vet.add(test3);
            Dictionary.loadExtendStopWords(list);
            Dictionary.loadExtendWords(vet);

            // Segment the whole input and write one word (lexeme) per line.
            IKSegmentation is = new IKSegmentation(br);
            Lexeme le;
            while ((le = is.next()) != null) {
                System.out.println(le.getLexemeText());
                writer.write(le.getLexemeText());
                writer.newLine();
            }
        } catch (IOException e) {
            // Also covers Lucene's CorruptIndexException and LockObtainFailedException,
            // which are subclasses of IOException.
            e.printStackTrace();
        } finally {
            close(br);
            close(writer);
        }
    }

    // Closes a reader or writer if it was opened, reporting any failure.
    private static void close(Closeable c) {
        if (c != null) {
            try {
                c.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
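IKAnalyzer segments text with the help of a dictionary. To illustrate the basic idea only, the sketch below implements forward maximum matching, a much simpler classic dictionary method: at each position it takes the longest dictionary word it can find, falling back to a single character. It is not IKAnalyzer's algorithm, and the class name and three-word dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatch {

    // Forward maximum matching: at each position take the longest dictionary
    // word of at most maxLen chars; if none matches, emit one character.
    public static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the window until it is a dictionary word or a single character.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: "Beijing", "university", "Peking University".
        Set<String> dict = new HashSet<>(Arrays.asList(
                "\u5317\u4EAC", "\u5927\u5B66", "\u5317\u4EAC\u5927\u5B66"));
        // Input text: "Peking University" followed by the character for "student".
        System.out.println(segment("\u5317\u4EAC\u5927\u5B66\u751F", dict, 4));
    }
}
```

With a maximum word length of 4, the input 北京大学生 is split into 北京大学 and 生, although 北京 plus 大学生 (“Beijing college student”) is another valid reading; resolving such ambiguities is what more elaborate segmenters add on top of dictionary lookup.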

g)      Encoding Format Converter

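import java.io.*;

public class Converter {

    // Re-encodes every file in c:\intrepid\UTFdata from UTF-8 to GBK, writing
    // each result under the same name into c:\intrepid\GBKoutput.
    public static void main(String[] args) {
        File file = new File("c:\\intrepid\\UTFdata");
        String[] filelist = file.list();
        if (filelist == null) {
            System.err.println("Input directory not found: " + file);
            return;
        }
        for (int k = 0; k < filelist.length; k++) {
            BufferedReader br = null;
            BufferedWriter writer = null;
            try {
                br = new BufferedReader(new InputStreamReader(
                        new FileInputStream(new File(file, filelist[k])), "UTF-8"));
                writer = new BufferedWriter(new OutputStreamWriter(
                        new FileOutputStream("c:\\intrepid\\GBKoutput\\" + filelist[k], false), "gbk"));
                String s;
                // Keep each line that is read so it can be written out.
                while ((s = br.readLine()) != null) {
                    System.out.println(s);
                    writer.write(s);
                    writer.newLine();
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                close(br);
                close(writer);
            }
        }
    }

    // Closes a reader or writer if it was opened, reporting any failure.
    private static void close(Closeable c) {
        if (c != null) {
            try {
                c.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}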
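An OutputStreamWriter does not report characters that the target charset cannot represent; it substitutes a replacement byte (usually '?') instead. Characters outside GBK, such as emoji in tweets, are therefore silently lost during the conversion above. The sketch below, with a class name of our own, uses CharsetEncoder.canEncode to find the lines that would be affected before converting them.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GbkCheck {

    // True if every character of the line can be represented in GBK.
    public static boolean gbkCanEncode(String line) {
        return Charset.forName("GBK").newEncoder().canEncode(line);
    }

    // Returns the 0-based indices of lines that would lose characters in GBK.
    public static List<Integer> lossyLines(List<String> lines) {
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!encoder.canEncode(lines.get(i))) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Sample lines: plain ASCII, two Chinese characters, and text with an emoji.
        List<String> lines = Arrays.asList("plain ASCII", "\u4E2D\u6587", "emoji \uD83D\uDE00");
        System.out.println("Lines that GBK cannot represent: " + lossyLines(lines));
    }
}
```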

h)      Text Re-Structuring

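import java.io.*;

public class Split {

    // Flattens the multi-line records of every file in c:\intrepid\UTFdata into one
    // tab-separated line per record, written GBK-encoded to c:\intrepid\GBKoutput.
    public static void main(String[] args) {
        File file = new File("c:\\intrepid\\UTFdata");
        String[] filelist = file.list();
        if (filelist == null) {
            System.err.println("Input directory not found: " + file);
            return;
        }
        for (int k = 0; k < filelist.length; k++) {
            BufferedReader br = null;
            BufferedWriter writer = null;
            try {
                br = new BufferedReader(new InputStreamReader(
                        new FileInputStream(new File(file, filelist[k])), "UTF-8"));
                writer = new BufferedWriter(new OutputStreamWriter(
                        new FileOutputStream("c:\\intrepid\\GBKoutput\\" + filelist[k], false), "gbk"));
                // Per record: the line read by the loop condition is discarded, then seven
                // lines follow; line 0 starts the output, line 2 is skipped, and lines 1
                // and 3 to 6 are appended with tab separators.
                while (br.readLine() != null) {
                    String s = "";
                    boolean complete = true;
                    for (int i = 0; i <= 6; i++) {
                        String line = br.readLine();
                        if (line == null) {
                            complete = false;   // truncated record at the end of the file
                            break;
                        }
                        if (i == 2) {
                            continue;
                        }
                        line = stripLabels(line);
                        s = (i == 0) ? line : s + "\t" + line;
                    }
                    if (!complete) {
                        break;
                    }
                    System.out.println(s);
                    writer.write(s);
                    writer.newLine();
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                close(br);
                close(writer);
            }
        }
    }

    // Removes the field labels and every '-' character (including any in the tweet text).
    private static String stripLabels(String line) {
        return line.replace("-", "").replace("ID:", "").replace("Name:", "")
                .replace("Text:", "").replace("Time:", "").replace("Location:", "");
    }

    // Closes a reader or writer if it was opened, reporting any failure.
    private static void close(Closeable c) {
        if (c != null) {
            try {
                c.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}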
