Hive - Data Types
- All the data types in Hive are classified into four types, given as follows:
- Column Types
- Literals
- Null Values
- Complex Types
Column Types
Column type are used as column data types of Hive. They are as follows:
Integral Types
- Integer type data can be specified using integral data types, INT.
- When the data range exceeds the range of INT, you need to use BIGINT and if the data range is smaller than the INT, you use SMALLINT.
- TINYINT is smaller than SMALLINT.
The following table depicts various INT data types:
Type Postfix Example
TINYINT Y 10Y
SMALLINT S 10S
INT - 10
BIGINT L 10L
String Types
- String type data types can be specified using single quotes (' ') or double quotes (" ").
- It contains two data types: VARCHAR and CHAR. Hive follows C-types escape characters.
The following table depicts various CHAR data types:
Data Type Length
VARCHAR 1 to 65355
CHAR 255
Timestamp
- It supports traditional UNIX timestamp with optional nanosecond precision.
- It supports java.sql.Timestamp format “YYYY-MM-DD HH:MM:SS.fffffffff” and format “yyyy-mm-dd hh:mm:ss.ffffffffff”.
Dates
- DATE values are described in year/month/day format in the form {{YYYY-MM-DD}}.
Decimals
- The DECIMAL type in Hive is as same as Big Decimal format of Java. It is used for representing immutable arbitrary precision. The syntax and example is as follows:
- DECIMAL(precision, scale)
- decimal(10,0)
Union Types
- Union is a collection of heterogeneous data types.
- You can create an instance using create union.
- The syntax and example is as follows:
UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}
Literals
- The following literals are used in Hive:
- Floating Point Types
- Floating point types are nothing but numbers with decimal points.
- Generally, this type of data is composed of DOUBLE data type.
- Decimal Type
- Decimal type data is nothing but floating point value with higher range than DOUBLE data type.
- The range of decimal type is approximately -10-308 to 10308.
- Null Value
- Missing values are represented by the special value NULL.
Complex Types
- The Hive complex data types are as follows:
- Arrays
- Arrays in Hive are used the same way they are used in Java.
- Syntax: ARRAY<data_type>
- Maps
- Maps in Hive are similar to Java Maps.
- Syntax: MAP<primitive_type, data_type>
- Structs
- Structs in Hive is similar to using complex data with comment.
- Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>
Hive - Create Database
- Hive is a database technology that can define databases and tables to analyze structured data.
- The theme for structured data analysis is to store the data in a tabular manner, and pass queries to analyze it.
- This chapter explains how to create Hive database.
- Hive contains a default database named default.
Create Database Statement
- Create Database is a statement used to create a database in Hive.
- A database in Hive is a namespace or a collection of tables.
- The syntax for this statement is as follows:
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
- Here, IF NOT EXISTS is an optional clause, which notifies the user that a database with the same name already exists.
- We can use SCHEMA in place of DATABASE in this command.
- The following query is executed to create a database named userdb:
hive> CREATE DATABASE [IF NOT EXISTS] userdb;
or
hive> CREATE SCHEMA userdb;
- The following query is used to verify a databases list:
hive> SHOW DATABASES;
default
userdb
Drop Database Statement
- Drop Database is a statement that drops all the tables and deletes the database. Its syntax is as follows:
DROP DATABASE StatementDROP (DATABASE|SCHEMA) [IF EXISTS] database_name
[RESTRICT|CASCADE];
- The following queries are used to drop a database. Let us assume that the database name is userdb.
hive> DROP DATABASE IF EXISTS userdb;
- The following query drops the database using CASCADE. It means dropping respective tables before dropping the database.
hive> DROP DATABASE IF EXISTS userdb CASCADE;
- The following query drops the database using SCHEMA.
hive> DROP SCHEMA userdb;
create Table Statement
- Create Table is a statement used to create a table in Hive. The syntax and example are as follows:
- Syntax
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
Let us assume you need to create a table named employee using CREATE TABLE statement.
hive> CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT Employee details
ROW FORMAT DELIMITED
FIELDS TERMINATED BY \t
LINES TERMINATED BY \n
STORED AS TEXTFILE;
Load Data Statement
- Generally, after creating a table in SQL, we can insert data using the Insert statement. But in Hive, we can insert data using the LOAD DATA statement.
- While inserting data into Hive, it is better to use LOAD DATA to store bulk records.
- There are two ways to load data: one is from local file system and second is from Hadoop file system.
- Syntax
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
- LOCAL is identifier to specify the local path. It is optional.
- OVERWRITE is optional to overwrite the data in the table.
- PARTITION is optional.
Example
hive> LOAD DATA LOCAL INPATH '/home/user/sample.txt' OVERWRITE INTO TABLE employee;
Alter Table Statement
- It is used to alter a table in Hive.
- The statement takes any of the following syntax based on what attributes we wish to modify in a table.
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Rename To… Statement
- The following query renames the table from employee to emp.
hive> ALTER TABLE employee RENAME TO emp;
******Page is under construction******