Subpages: configurations, co-processors in hbase, hbase shell commands, mapreduce fixes
Related pages: use hbase in python
HBase, a key-value store that uses HDFS for its underlying storage, was the first Hadoop component to provide online access to data in HDFS. It offers both online read/write access to individual rows and batch operations for reading and writing data in bulk, which makes it a good foundation for building applications.
How do I connect to, write to, and read an HBase table from a Java app?
import java.io.IOException;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// hbase-site.xml, if available on the classpath, is picked up by HBaseConfiguration.create().
Configuration configuration = HBaseConfiguration.create();
try (HConnection connection = HConnectionManager.createConnection(configuration);
     HTableInterface table = connection.getTable("testMQC")) {
    // Store a cell in the table.
    Put p = new Put(Bytes.toBytes("Rowkey"));
    p.add(Bytes.toBytes("ColumnFamily"), Bytes.toBytes("Qualifier"), Bytes.toBytes("Value"));
    table.put(p);

    // Read from the table using Get.
    Get g = new Get(Bytes.toBytes("Rowkey"));
    Result rs = table.get(g);
    System.out.println(rs.getFamilyMap(Bytes.toBytes("ColumnFamily")).size());
    for (Entry<byte[], byte[]> qualifierValue : rs.getFamilyMap(Bytes.toBytes("ColumnFamily")).entrySet()) {
        // Decode the raw byte[] key/value; printing the arrays directly only shows their hashes.
        System.out.println(Bytes.toString(qualifierValue.getKey()) + "=" + Bytes.toString(qualifierValue.getValue()));
    }

    // Read from the table using Scan.
    Scan s = new Scan();
    // Various filters can be applied to both Get and Scan objects.
    RowFilter rowFilter = new RowFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Rowkey")));
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filterList.addFilter(rowFilter);
    s.setFilter(filterList);
    // getScanner never returns null; close the scanner when done.
    try (ResultScanner scnr = table.getScanner(s)) {
        for (Result scnres = scnr.next(); scnres != null; scnres = scnr.next()) {
            System.out.println("Scanned size : " + scnres.getFamilyMap(Bytes.toBytes("ColumnFamily")).size());
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
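The snippet assumes the testMQC table already exists with a ColumnFamily family; if not, it can be created from the hbase shell first:
create 'testMQC', 'ColumnFamily'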
Co-processors in HBase:
1. Observer (trigger): server-side hooks that run before/after operations such as Put and Get; a minimal sketch follows this list.
2. Endpoint (stored procedure): custom RPC that the client invokes explicitly.
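A minimal observer sketch against the 0.98/1.x coprocessor API; the class name and the tmp_ validation rule are illustrative assumptions, not from these notes:
import java.io.IOException;

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer: rejects any Put whose row key starts with "tmp_".
public class PutValidatingObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx, Put put,
            WALEdit edit, Durability durability) throws IOException {
        if (Bytes.toString(put.getRow()).startsWith("tmp_")) {
            throw new IOException("rows with a tmp_ prefix are not allowed");
        }
    }
}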
Steps for an endpoint:
1. Create the interface (service definition).
2. Create the server-side implementation (deployed on the region servers).
3. Create the client (a sketch follows this list):
   1. Create the Call object.
   2. Call the endpoint.
   3. Iterate over the result.
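A client-side sketch against the 0.96+ protobuf endpoint API, assuming the RowCountEndpoint example that ships with hbase-examples is deployed on the table; the ExampleProtos classes are protobuf-generated, and table is the HTableInterface from the connection example above:
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;
import org.apache.hadoop.hbase.util.Bytes;

// Null start/end keys run the endpoint on every region of the table.
Map<byte[], Long> counts = table.coprocessorService(
        ExampleProtos.RowCountService.class, null, null,
        // 1. Create the Call object.
        new Batch.Call<ExampleProtos.RowCountService, Long>() {
            public Long call(ExampleProtos.RowCountService service) throws IOException {
                ServerRpcController controller = new ServerRpcController();
                BlockingRpcCallback<ExampleProtos.CountResponse> callback =
                        new BlockingRpcCallback<ExampleProtos.CountResponse>();
                // 2. Call the endpoint.
                service.getRowCount(controller, ExampleProtos.CountRequest.getDefaultInstance(), callback);
                ExampleProtos.CountResponse response = callback.get();
                if (controller.failedOnException()) {
                    throw controller.getFailedOn();
                }
                return (response != null && response.hasCount()) ? response.getCount() : 0L;
            }
        });
// 3. Iterate over the result: one entry per region.
for (Map.Entry<byte[], Long> perRegion : counts.entrySet()) {
    System.out.println(Bytes.toString(perRegion.getKey()) + " -> " + perRegion.getValue());
}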
For deployment steps, see the co-processors in hbase subpage.
# How to export/import a table (the source table must exist for Export; the destination table must exist for Import).
hbase org.apache.hadoop.hbase.mapreduce.Export 'extInst' '/tmp/ei'
hbase org.apache.hadoop.hbase.mapreduce.Import 'extInst' '/tmp/ei'
# Steps to create a copy of a table. Let's assume we have a table called student:
1. Run desc 'student' in the hbase shell and copy the DESCRIPTION value from the result (the output has two columns, DESCRIPTION and ENABLED) into a text editor.
2. Remove any leading spaces from the copied lines. (*Be careful not to merge words.)
3. Remove any trailing spaces and newlines. (*Be careful not to merge words.)
4. Replace TTL => 'FOREVER' with TTL => org.apache.hadoop.hbase.HConstants::FOREVER.
5. Replace any alphanumeric TTL value with its numeric part (e.g. '604800 SECONDS (7 DAYS)' becomes 604800).
6. Prefix the updated description with create 'new_student', .
7. Execute the complete text in the hbase shell; a hypothetical end result is shown below.
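As a purely hypothetical illustration (the family name and attributes are made up, not from a real desc output), the reassembled statement might look like:
create 'new_student', {NAME => 'd', BLOOMFILTER => 'ROW', VERSIONS => '1', TTL => 604800}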
To explore: CopyTable, snapshot, clone_snapshot, security in HBase, and HBase row copy (creating duplicates) using CellUtil; a sketch of the last idea follows.
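A sketch of that row-copy idea, assuming the 0.98+ Cell API (needs org.apache.hadoop.hbase.Cell and org.apache.hadoop.hbase.CellUtil imports) and running inside the try block of the connection example so IOException is handled there; the row keys are illustrative:
byte[] newRow = Bytes.toBytes("RowkeyCopy");
Result source = table.get(new Get(Bytes.toBytes("Rowkey")));
Put duplicate = new Put(newRow);
for (Cell cell : source.rawCells()) {
    // Rebuild each cell with the new row key but the original family, qualifier, timestamp and value.
    duplicate.add(CellUtil.createCell(newRow,
            CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            cell.getTimestamp(), cell.getTypeByte(), CellUtil.cloneValue(cell)));
}
table.put(duplicate);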
Problem#1: Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server cdh5-00,60020,1462927222851 has been rejected; Reported time is too far out of sync with master. Time difference of 671194ms > max allowed of 30000ms
Sol: Rebooted the cluster, after which the clocks auto-synced.
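The underlying cause is clock skew between the region server and the master; an alternative to rebooting is to re-sync each drifting node manually, for example against a public NTP pool (assumes ntpdate is installed):
sudo ntpdate pool.ntp.org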
Problem#2: HMaster fails to start. A /NODATA4U_SECUREYOURSHIT folder was found in HDFS and /hbase was missing (a known symptom of an unsecured HDFS instance being wiped by attackers). ZooKeeper was intact and still had the hbase znodes.
Sol:
Problem#3: hbase Export failed with: Connection refused on localhost:9000.
Sol: Removed/commented the Hadoop path in bash.bashrc, logged in again, and stopped/started HBase.
Problem#4: Deleting a counter qualifier only removed the qualifier after a long delay.
Sol: A call to tableInterface.flushCommits() worked; a sketch follows.
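A minimal sketch of that fix, assuming the HTableInterface handle from the connection example (row key and qualifier names are illustrative; needs org.apache.hadoop.hbase.client.Delete):
// Delete the counter qualifier, then flush any pending client-side operations.
Delete d = new Delete(Bytes.toBytes("Rowkey"));
d.deleteColumns(Bytes.toBytes("ColumnFamily"), Bytes.toBytes("CounterQualifier"));
table.delete(d);
table.flushCommits(); // per the note above, the delete only took effect after this call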