BIG DATA - HBase
What is HBase?
HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS). It is an open-source, horizontally scalable project that leverages the fault tolerance provided by HDFS.
It is part of the Hadoop ecosystem and provides random, real-time read/write access to data stored in HDFS.
HBase delivers high throughput and low latency for reads and writes on very large data sets, which makes it a good choice for applications that need fast, random access to large amounts of data.
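To make "random, real-time read/write access" concrete, here is a minimal in-memory sketch of the HBase logical data model: each cell is addressed by row key, column (family:qualifier), and timestamp, and a read returns the newest version. The ToyTable class and its put/get methods are hypothetical illustrations, not the HBase client API (the real Java client works with classes such as Table, Put, and Get).

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of an HBase table (not the real client API).

    Cells are addressed by (row key, "family:qualifier", timestamp).
    """

    def __init__(self, families):
        # Column families must be declared up front, as in HBase.
        self.families = set(families)
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]


t = ToyTable(["info"])
t.put("user#1", "info:name", "Ada", ts=1)
t.put("user#1", "info:name", "Ada L.", ts=2)
print(t.get("user#1", "info:name"))  # Ada L.
print(t.get("user#2", "info:name"))  # None
```

Qualifiers need no declaration; only the family must exist, which mirrors how HBase fixes column families in the schema but lets each row carry its own columns.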
HBase Architecture
In HBase, tables are split horizontally by row-key range into regions, which are served by region servers.
Regions are divided vertically by column family into “Stores”, and each Store's data is saved as files (HFiles) in HDFS. Shown below is the architecture of HBase.
HBase has three major components:
Client library
Master server (HMaster)
Region servers (these can be added or removed as needed)
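Because regions cover contiguous, sorted row-key ranges, finding the region (and therefore the region server) for a row is a search over region start keys. The sketch below assumes a hypothetical region map with made-up server names; in a real cluster the client library reads this mapping from the hbase:meta table and caches it. Python strings stand in for HBase's byte-array row keys, which sort the same way for ASCII keys.

```python
import bisect

# Hypothetical region map: each region covers [start_key, next start_key).
# The first region starts at the empty key, so every row key falls in some region.
REGIONS = [
    ("", "regionserver-1"),
    ("g", "regionserver-2"),
    ("n", "regionserver-3"),
    ("t", "regionserver-1"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate(row_key):
    """Return (start_key, server) of the region that contains row_key."""
    # bisect_right - 1 finds the last region whose start key is <= row_key.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx]


for key in ["apple", "g", "melon", "zebra"]:
    print(key, "->", locate(key))
```

Adding a region server does not change this lookup; the master simply reassigns some regions to the new server, so only the server column of the map changes.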
HBase is a NoSQL database
HBase has several components that communicate with each other, such as the HMaster, ZooKeeper, the HDFS NameNode, and Region Servers.
HBase is optimized for reads, and all writes to a row go through the single Region Server that hosts it, which gives strict consistency. HBase supports range-based scans, which make scanning contiguous rows faster.
HBase supports ordered partitioning, in which rows of a Column Family are stored in RowKey order.
By default, HBase does not load-balance reads: one Region Server serves all read requests for a region, and the replicas are used only in case of failure.
In terms of the CAP (Consistency, Availability, Partition tolerance) theorem, HBase maintains Consistency and Partition tolerance, making it a CP system.
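Because rows are kept in RowKey order, a range scan only needs to locate the start key and read forward until the stop key, rather than examining every row. Below is a minimal sketch over a sorted in-memory list; as with HBase's Scan, the start row is inclusive and the stop row is exclusive. The rows and the range_scan helper are hypothetical.

```python
import bisect

# Hypothetical rows, kept sorted by row key as HBase stores them.
rows = sorted([
    ("user#001", {"info:name": "Ada"}),
    ("user#002", {"info:name": "Alan"}),
    ("user#010", {"info:name": "Grace"}),
    ("user#100", {"info:name": "Linus"}),
    ("video#001", {"info:title": "Intro"}),
])
keys = [k for k, _ in rows]


def range_scan(start_row, stop_row):
    """Yield rows with start_row <= key < stop_row (stop row exclusive)."""
    i = bisect.bisect_left(keys, start_row)  # jump straight to the start key
    while i < len(rows) and rows[i][0] < stop_row:
        yield rows[i]
        i += 1


# All "user#" rows: '$' is the character right after '#', so it bounds the prefix.
for key, cols in range_scan("user#", "user$"):
    print(key, cols)
```

Note that ordering is lexicographic, which is why the sample keys are zero-padded ("user#010" sorts before "user#100"); row-key design in HBase relies on the same property.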
HBase Configuration for Performance Tuning
HBase offers many configuration properties for fine-tuning a cluster. Common adjustments include:
Decrease ZooKeeper timeout.
Increase handlers.
Increase heap settings.
Enable data compression.
Increase region size.
Adjust block cache size.
Adjust memstore limits.
Increase blocking store files.
Increase block multiplier.
Decrease maximum log files.
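Most of the adjustments above map to properties in hbase-site.xml; heap size is set separately (for example with HBASE_HEAPSIZE in hbase-env.sh), and compression is configured per column family rather than cluster-wide. The sketch below writes those properties out as hbase-site.xml content. The property names are real HBase settings, but every value is an illustrative placeholder, not a recommendation; defaults and even property names differ between HBase versions, so check the reference guide for yours.

```python
import xml.etree.ElementTree as ET

# Property names are real HBase settings; the VALUES are illustrative
# placeholders only. Tune them for your own workload and HBase version.
TUNING = {
    "zookeeper.session.timeout": "60000",              # ms; decrease ZooKeeper timeout
    "hbase.regionserver.handler.count": "60",          # increase RPC handlers
    "hbase.hregion.max.filesize": "21474836480",       # bytes; increase region size
    "hfile.block.cache.size": "0.35",                  # fraction of heap for block cache
    "hbase.regionserver.global.memstore.size": "0.4",  # fraction of heap for memstores
    "hbase.hstore.blockingStoreFiles": "30",           # increase blocking store files
    "hbase.hregion.memstore.block.multiplier": "8",    # increase block multiplier
    "hbase.regionserver.maxlogs": "16",                # decrease maximum WAL files
}


def build_site_xml(props):
    """Return hbase-site.xml content for the given properties as a string."""
    # HBase refuses to start if block cache + memstore exceed 80% of the heap.
    cache = float(props.get("hfile.block.cache.size", 0))
    memstore = float(props.get("hbase.regionserver.global.memstore.size", 0))
    if cache + memstore > 0.8:
        raise ValueError("block cache + memstore fractions must not exceed 0.8")

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_xml(TUNING))
```

The heap check is there because block cache size and memstore limits are tuned together: raising one usually means lowering the other.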