What is HBase?

HBase is a distributed, column-oriented database built on top of HDFS. It is the Hadoop application to use when you need real-time, random read/write access to very large datasets.

HBase is a scalable data store targeted at random read and write access to (fairly) structured data. It is modeled after Google’s Bigtable and is designed to support very large tables, on the order of billions of rows and millions of columns.
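To make the Bigtable-style model more concrete, here is a minimal in-memory sketch in Python. It assumes a table can be pictured as a sparse map from row key to column name ("family:qualifier") to timestamped cell versions, with row keys kept in sorted order. The class ToyTable and its methods put, get, and scan are hypothetical names chosen for illustration; this is not the HBase client API, and it ignores distribution, persistence, and regions entirely.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of a Bigtable-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest first
        self._rows = defaultdict(dict)
        # row keys kept in sorted order so range scans are cheap
        self._sorted_keys = []

    def put(self, row, column, value, ts):
        """Random write: add a new version of a single cell."""
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first

    def get(self, row, column):
        """Random read: newest value of a single cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, stop):
        """Yield (row key, newest cells) for start <= row key < stop."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        for row in self._sorted_keys[lo:hi]:
            yield row, {c: v[0][1] for c, v in self._rows[row].items()}


table = ToyTable()
table.put("row-001", "info:name", "alice", ts=1)
table.put("row-001", "info:name", "alice b.", ts=2)   # newer version of the same cell
table.put("row-002", "info:email", "bob@example.com", ts=1)

latest_name = table.get("row-001", "info:name")
first_rows = [row for row, _ in table.scan("row-001", "row-002")]
```

Each row stores only the columns that were actually written, which is why a table can have millions of possible columns without every row paying for them.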

It uses HDFS as the underlying filesystem and is designed to be fully distributed and highly available. Version 0.20 introduced significant performance improvements.

HBase’s TableInputFormat is designed to allow a MapReduce program to operate on data stored in an HBase table. TableOutputFormat is for writing MapReduce outputs into an HBase table.
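The data flow can be pictured with a small pure-Python simulation. This is a conceptual sketch only: plain dictionaries stand in for the source and output tables, and the names source_table, mapper, reducer, and run_job are hypothetical. In a real job, the mapper would receive each row from TableInputFormat as a row key plus its cells, and the reducer would emit writes that TableOutputFormat applies to the output table.

```python
from collections import defaultdict

# Hypothetical source table: row key -> {column name: value}
source_table = {
    "page-1": {"meta:lang": "en"},
    "page-2": {"meta:lang": "de"},
    "page-3": {"meta:lang": "en"},
}


def mapper(row_key, columns):
    """Called once per table row, as a mapper reading from a table would be."""
    yield columns["meta:lang"], 1


def reducer(key, values):
    """Emits one (row key, columns) write destined for the output table."""
    yield key, {"stats:count": sum(values)}


def run_job(table):
    """Simulate the job: read rows, map, group by key, reduce, write back."""
    grouped = defaultdict(list)
    for row_key in sorted(table):  # rows are read in row-key order
        for key, value in mapper(row_key, table[row_key]):
            grouped[key].append(value)

    output_table = {}
    for key in sorted(grouped):
        for out_row, out_columns in reducer(key, grouped[key]):
            output_table.setdefault(out_row, {}).update(out_columns)
    return output_table


result = run_job(source_table)
```

Here the job counts pages per language and stores each count as a row in the output table, keyed by language code.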

HBase has different storage characteristics than HDFS, such as the ability to do row updates and column indexing, so we can expect to see these features used by Hive in future releases. It is already possible to access HBase tables from Hive.