BIG DATA - Hive
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It allows users to write queries in an SQL-like language called HiveQL (or HQL). It resides on top of Hadoop to summarize Big Data, and it makes querying and analysis easy. It is a platform used to develop SQL-type scripts to perform MapReduce operations.
Hive is not:
A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP.
It provides SQL type language for querying called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Hive efficiently converts queries into MapReduce tasks at the backend.
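To make the features above concrete, here is a minimal HiveQL session; the table and column names are hypothetical, chosen only for illustration:

```sql
-- Create a simple managed table (hypothetical names)
CREATE TABLE employees (
  id INT,
  name STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- A familiar SQL-style query; Hive compiles this into MapReduce tasks behind the scenes
SELECT name, salary
FROM employees
WHERE salary > 50000;
```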
All the data types in Hive are classified into four categories, given as follows:
Column Types
Literals
Null Values
Complex Types
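The column types and complex types can be sketched in a table definition like the following (the table and field names are hypothetical):

```sql
-- Hypothetical table mixing primitive column types and complex types
CREATE TABLE user_profile (
  id            INT,                               -- primitive column type
  phone_numbers ARRAY<STRING>,                     -- complex type: array
  properties    MAP<STRING, STRING>,               -- complex type: map
  address       STRUCT<city:STRING, zip:STRING>    -- complex type: struct
);
```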
Hive organizes tables into partitions: a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. Using partitions, it is easy to query only a portion of the data.
What are Partitions
Hive partitioning organizes tables by dividing them into different parts based on partition keys.
Partitioning is helpful when the table has one or more partition keys. Partition keys are the basic elements that determine how the data is stored in the table.
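A minimal sketch of partitioning in HiveQL, assuming a hypothetical sales table and input path:

```sql
-- Hypothetical table partitioned by date and city
CREATE TABLE sales (
  order_id INT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING, city STRING);

-- Load data into one partition (the input path is an assumption)
LOAD DATA LOCAL INPATH '/tmp/sales_data.csv'
INTO TABLE sales
PARTITION (order_date = '2023-01-01', city = 'London');

-- Filtering on the partition columns reads only that partition's directory
SELECT * FROM sales
WHERE order_date = '2023-01-01' AND city = 'London';
```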
What are Buckets
Buckets in Hive are used to segregate table data into multiple files or directories, which enables more efficient querying.
The data present in a partition can be divided further into buckets.
The division is performed based on a hash of a particular column selected in the table.
Buckets use a hashing algorithm at the back end to read each record and place it into a bucket.
In Hive, bucketing must be enabled with SET hive.enforce.bucketing = true;
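The bucketing setup above can be sketched as follows; the table, column names, and bucket count are hypothetical:

```sql
-- Enable bucket enforcement (needed on older Hive versions;
-- newer versions enforce bucketing by default)
SET hive.enforce.bucketing = true;

-- Hypothetical table: each country partition is split into 4 buckets;
-- rows are assigned to a bucket by hashing the user_id column
CREATE TABLE users_bucketed (
  user_id INT,
  name    STRING
)
PARTITIONED BY (country STRING)
CLUSTERED BY (user_id) INTO 4 BUCKETS;
```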
https://www.guru99.com/hive-partitions-buckets-example.html
MSCK REPAIR for Hive External Tables
When the partition directories still exist in HDFS but are missing from the metastore, run a metastore check with the repair table option:
hive> MSCK REPAIR TABLE <db_name>.<table_name>;
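A short worked example of the repair flow, assuming a hypothetical external logs table and HDFS location:

```sql
-- Hypothetical external table whose partition directories live in HDFS
CREATE EXTERNAL TABLE logs (
  msg STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/data/logs';

-- Suppose the directory /data/logs/dt=2023-01-01/ was written directly to HDFS;
-- MSCK REPAIR TABLE registers the missing partition in the metastore
MSCK REPAIR TABLE logs;

-- The recovered partition is now visible to queries
SHOW PARTITIONS logs;
```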