hadoop + Hbase + thrift + php 安裝設定與程式設計
此教材包含兩個部份
第一部份:Hadoop v 0.20 + Hbase + Thrift + PHP 的安裝、設定與測試
第二部份:透過 Thrift 用 PHP 的程式碼存取 Hbase 資料
第一部份
安裝編譯及測試
一、前言
本篇作業系統為Ubuntu (9.04)
假設已經安裝好 Hadoop (0.20 ) Hbase (0.20) ,完成設定,並且已在運作中
安裝設定 hadoop 0.20
本篇的hadoop 安裝於 /opt/hadoop
安裝設定 hbase 0.20
本篇的Hbase 安裝於 /opt/hbase
二、安裝 thrift
編譯 thrift
目前 thrift 最新版本為 0.2
將原始碼解壓縮後的完整路徑為 /opt/thrift,並進入該目錄
$ cd /opt/thrift
安裝之前請先確定有裝了 libboost (c++的函式庫),以及make時會用到的yacc flex
$ apt-get install libboost-dev automake libtool flex bison g++ python python-all-dev
接著編譯與安裝thrift
$ ./bootstrap.sh
$ ./configure
$ make
$ sudo make install
產生出可以存取hbase的php
$ cp -r /opt/hbase/src/java/org/apache/hadoop/hbase/thrift ./hbase_thrift_src
$ cd hbase_thrift_src
$ thrift --gen php Hbase.thrift
以上若動作都確實完成,可以看到已經產生出一個資料夾: gen-php/Hbase/ 並且包含兩個php檔案,此兩個php檔可以幫助你存取hbase,不過此 php 檔還是需要其他檔案當函式庫
三、透過 thrift 存取 hbase
檢查hadoop 與 hbase 是否正常運作中
$ jps
26582 HQuorumPeer
26658 HMaster
16222 TaskTracker
16138 JobTracker
15952 NameNode
16047 DataNode
啟動 hbase 的 thrift daemon
$ /opt/hbase/bin/hbase thrift start &
複製 thrift 的php 專案到網頁伺服器
複製到 /var/www/hbase 內
$ cp -r /opt/thrift-0.2.0/lib/php/src /var/www/hbase/thrift
$(請用 chown 指令來改/var/www/hbase 下的權限)
$ mkdir /var/www/hbase/thrift/packages
$ cp -r hbase_thrift_src/gen-php/* /var/www/hbase/thrift/packages/
完成後/var/www/hbase /的目錄結構為
./
|-- DemoClient.php (下一節會提到)
`-- thrift
|-- Thrift.php
|-- autoload.php
|-- ext
| `-- thrift_protocol
| |-- config.m4
| |-- php_thrift_protocol.cpp
| `-- php_thrift_protocol.h
|-- packages
| `-- Hbase
| |-- Hbase.php
| `-- Hbase_types.php
|-- protocol
| |-- TBinaryProtocol.php
| `-- TProtocol.php
`-- transport
|-- TBufferedTransport.php
|-- TFramedTransport.php
|-- THttpClient.php
|-- TMemoryBuffer.php
|-- TNullTransport.php
|-- TPhpStream.php
|-- TSocket.php
|-- TSocketPool.php
`-- TTransport.php
四、測試
複製DemoClient.php 到測試目錄/var/www/hbase/
$ cp /opt/hbase/src/examples/thrift/DemoClient.php /var/www/hbase/DemoClient.php
修改 /var/www/hbase/DemoClient.php 的THRIFT_ROOT 參數
$GLOBALS['THRIFT_ROOT'] = '/var/www/hbase/thrift';
$socket = new TSocket( 'secuse.nchc.org.tw', 9090 );
執行
用瀏覽器打開 http://localhost/hbase/DemoClient.php 並檢視網頁內容
出現錯誤訊息為正常現象,可修改程式碼解決
PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
<html>
<head>
<title>DemoClient</title>
</head>
<body>
<pre>
scanning tables...
creating table: demo_table
column families in demo_table:
column: entry, maxVer: 10
column: unused, maxVer: 3
Fatal error: Uncaught exception 'Exception' with message 'shouldn't get here!' in /var/www/DemoClient.php:158
Stack trace:
#0 {main}
thrown in /var/www/DemoClient.php on line 158
參考
第二部份
程式碼解析
零、前言
thrift 是透過非java的其他程式語言,直接對hbase 進行存取的中介函式庫
此篇介紹的是如何用php透過 thrift 對 hbase 操作
程式安裝測試方法已紀錄於: 安 裝編譯及測試
因此目錄結構為
測試程式之前,請先確定
hbase , hadoop 都有正常運作中
$ bin/hbase thrift start 尚在執行
一、php引用thrift lib
<?
$GLOBALS['THRIFT_ROOT'] = '/var/www/hbase/thrift';
require_once( $GLOBALS['THRIFT_ROOT'].'/Thrift.php' );
require_once( $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php' );
require_once( $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php' );
require_once( $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php' );
require_once( $GLOBALS['THRIFT_ROOT'].'/packages/Hbase/Hbase.php' );
$socket = new TSocket( 'secuse.nchc.org.tw', 9090 );
$socket->setSendTimeout( 10000 ); // Ten seconds (too long for production, but this is just a demo ;)
$socket->setRecvTimeout( 20000 ); // Twenty seconds
$transport = new TBufferedTransport( $socket );
$protocol = new TBinaryProtocol( $transport );
$client = new HbaseClient( $protocol );
$transport->open();
?>
........
其他html 碼
<?
或是 下面提到的各式讀寫操作
?>
.......
<?
$transport->close();
?>
所有的程式碼都必須包含這些引入函式庫、開啟關閉socket 的敘述
二、各種對hbase的操作
2.1 列出hbase 裡的所有 table
<?
echo( "listing tables...\n" );
$tables = $client->getTableNames();
sort( $tables );
foreach ( $tables as $name ) {
echo( " found: {$name}\n" );
}}
?>
2.2 刪除table
<?
$name = "hbase table name";
if ($client->isTableEnabled( $name )) {
echo( " disabling table: {$name}\n");
$client->disableTable( $name );
}
echo( " deleting table: {$name}\n" );
$client->deleteTable( $name );
}
?>
2.3 新增table
我們先定義columns 的物件結構如下
<?
$columns = array(
new ColumnDescriptor( array(
'name' => 'entry:',
'maxVersions' => 10
) ),
new ColumnDescriptor( array(
'name' => 'unused:'
) )
);
?>
將剛剛的column 放到table 內
<?
$t = "table name";
echo( "creating table: {$t}\n" );
try {
$client->createTable( $t, $columns );
} catch ( AlreadyExists $ae ) {
echo( "WARN: {$ae->message}\n" );
}
?>
2.4 列出 table內的家族成員 family
<?
$t = "table name";
echo( "column families in {$t}:\n" );
$descriptors = $client->getColumnDescriptors( $t );
asort( $descriptors );
foreach ( $descriptors as $col ) {
echo( " column: {$col->name}, maxVer: {$col->maxVersions}\n" );
}
?>
2.5 寫入資料
<?
$t = "table name";
$row = "row name"
$valid = "foobar-\xE7\x94\x9F\xE3\x83\x93";
$mutations = array(
new Mutation( array(
'column' => 'entry:foo',
'value' => $valid
) ),
);
$client->mutateRow( $t, $row, $mutations );
?>
2.6 讀取資料
get 取得一個 column value
get 取得一個 column value 的用法
<?
$table_name = 't1';
$row_name = '1';
$fam_col_name = 'f1:c1';
$arr = $client->get($table_name, $row_name , $fam_col_name);
// $arr = array
foreach ( $arr as $k=>$v ) {
// $k = TCell
echo ("value = {$v->value} , <br> ");
echo ("timestamp = {$v->timestamp} <br>");
}}
?>
getRow 取得一整個row
getRow($tableName, $row) 用法
<?
$table_name = "table name";
$row_name = "row name";
$arr = $client->getRow($table_name, $row_name);
// $client->getRow return a array
foreach ( $arr as $k=>$TRowResult ) {
// $k = 0 ; non-use
// $TRowResult = TRowResult
printTRowResult($TRowResult);
}
?>
scan 一整個table
<?
$table_name = 't1';
$start_row = ""; // 從row 的起點開始
$family = array( "f1","f2","f3" );
$scanner = $client->scannerOpen( $table_name, $start_row , $family );
// $scanner 是一個遞增數字 for open socket
// scannerGet() 一次只抓一row,因此要用while迴圈不斷地抓
while (true ){
$get_arr = $client->scannerGet( $scanner );
// get_arr is an array
if($get_arr == null) break;
// 沒有回傳值代表已經沒有資料可抓,跳脫此無限迴圈
foreach ( $get_arr as $TRowResult ){
// $TRowResult = TRowResult
echo (" row = {$TRowResult->row} ; <br> ");
$column = $TRowResult->columns;
foreach ($column as $family_column=>$Tcell){
echo ("family:column = $family_column ");
// $family_column = family_column
// $Tcell = Tcell
echo (" value = {$Tcell->value} ");
echo (" timestamp = {$Tcell->timestamp} <br>");
} }
}
echo( "<br> ----------------- " );
echo( "<br> Scanner finished <br>" );
$client->scannerClose( $scanner );
?>
補充
TCell
<?
class TCell {
static $_TSPEC;
public $value = null;
public $timestamp = null;
public function __construct($vals=null) {
if (!isset(self::$_TSPEC)) {
self::$_TSPEC = array(
1 => array(
'var' => 'value',
'type' => TType::STRING,
),
2 => array(
'var' => 'timestamp',
'type' => TType::I64,
),
);
}
if (is_array($vals)) {
if (isset($vals['value'])) {
$this->value = $vals['value'];
}
if (isset($vals['timestamp'])) {
$this->timestamp = $vals['timestamp'];
} }
}}
?>
TRowResult
<?
class TRowResult {
static $_TSPEC;
public $row = null;
public $columns = null;
public function __construct($vals=null) {
if (!isset(self::$_TSPEC)) {
self::$_TSPEC = array(
1 => array(
'var' => 'row',
'type' => TType::STRING,
),
2 => array(
'var' => 'columns',
'type' => TType::MAP,
'ktype' => TType::STRING,
'vtype' => TType::STRUCT,
'key' => array(
'type' => TType::STRING,
),
'val' => array(
'type' => TType::STRUCT,
'class' => 'TCell',
), ), );
}
if (is_array($vals)) {
if (isset($vals['row'])) {
$this->row = $vals['row'];
}
if (isset($vals['columns'])) {
$this->columns = $vals['columns'];
} } }
}
?>