Install Hive with MySQL metadata store

Successfully installed Hive 0.11 on a 3-node Hadoop cluster with a standalone MySQL metadata store.

1. Download a stable Hive release, hive-x.y.z.tar.gz.
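
    For example, with wget (the URL below is an assumption; check the Apache Hive download page for the actual mirror and file name):

    wget https://archive.apache.org/dist/hive/hive-x.y.z/hive-x.y.z.tar.gz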

    

2. Unpack the gz file and move the hive folder to /usr/local.

   tar -xzf hive-x.y.z.tar.gz

   sudo mv hive-x.y.z /usr/local/hive

   sudo chown -R hduser:hadoop /usr/local/hive

   

3. Set HIVE_HOME in .bashrc. Make sure HADOOP_HOME has been set as well.

   export HIVE_HOME=/usr/local/hive

   export PATH=$HIVE_HOME/bin:$PATH  
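
   To pick up the new variables in the current shell (assuming the edits went into ~/.bashrc):

   source ~/.bashrc

   echo $HIVE_HOME    # should print /usr/local/hive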

    

4. Go to $HIVE_HOME/conf. If there is no hive-default.xml or hive-env.sh, then

   cp hive-default.xml.template hive-default.xml

   cp hive-env.sh.template hive-env.sh

   

   Note: hive-default.xml is for documentation purposes only. You must make your configuration 

   changes in hive-site.xml instead.

5. Configure hive-env.sh

   export HADOOP_HOME=/usr/local/hadoop

   export HIVE_CONF_DIR=/usr/local/hive/conf

   

6. Create the /tmp and warehouse folders on HDFS. Make sure the Hadoop instance has been started.

   $HADOOP_HOME/bin/hadoop dfs -mkdir /tmp

   $HADOOP_HOME/bin/hadoop dfs -mkdir /user/hive/warehouse 

   $HADOOP_HOME/bin/hadoop dfs -chmod g+w /tmp

   $HADOOP_HOME/bin/hadoop dfs -chmod g+w /user/hive/warehouse 

   

   The default location for the warehouse is /user/hive/warehouse (the hive.metastore.warehouse.dir property).
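
   A quick sanity check that the folders exist with the right permissions (listing output will vary with your setup):

   $HADOOP_HOME/bin/hadoop dfs -ls /

   $HADOOP_HOME/bin/hadoop dfs -ls /user/hive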

      

7. Create hive-site.xml in /usr/local/hive/conf and set the mapred.job.tracker property. It should point to the Hadoop JobTracker.

   

<configuration>

 <property>

  <name>mapred.job.tracker</name>

  <value>master:50030</value>

 </property>

</configuration>

8. At this point, you are ready to run a Hive instance.

  /usr/local/hive/bin/hive

9. Test HiveQL. By default, Hive delimits fields with Ctrl-A; here we use ",".

  >create table test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

  >show tables;

  >describe test;  

  >load data local inpath '/usr/local/hive/test.txt' overwrite into table test;

  >select * from test;
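
  The load statement above expects a comma-delimited file. A minimal /usr/local/hive/test.txt could look like this (sample rows matching the two-column schema above):

  1,hello

  2,world

  3,hive

  After the load, select * from test should return these three rows.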

  

10. Have a look at HDFS.

  $HADOOP_HOME/bin/hadoop dfs -ls /user/hive/warehouse/test

  

  Hive creates a new folder 'test' under 'warehouse' to store the raw text file.

  Hive maps the table schema (metadata) onto the raw file so users can run SQL-like queries

  over it.
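
  To confirm that the loaded file is stored as-is, cat it from HDFS (load keeps the original file name, so the exact path depends on the file you loaded):

  $HADOOP_HOME/bin/hadoop dfs -cat /user/hive/warehouse/test/test.txt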

11. With the default settings, Hive uses an embedded Derby database instance to keep 

  the metadata. Derby runs in the same JVM and supports only one session at a time. 

   

   To support multiple sessions (users), configure a standalone database (MySQL) for the metadata.

   11.1 Download and install MySQL.

   11.2 Download the MySQL JDBC driver mysql-connector-java-x.y.z-bin.jar and put it in /usr/local/hive/lib/.

   11.3 Add the following properties to hive-site.xml. Specify the MySQL URL, JDBC driver, username 

       and password. Don't use the root account, as it typically cannot access the database remotely; 

       a sketch for creating a dedicated hive user follows the properties.

   

 <property>

  <name>javax.jdo.option.ConnectionURL</name>

  <value>jdbc:mysql://192.168.10.10/metastore_db?createDatabaseIfNotExist=true</value>

  <description>JDBC connect string for a JDBC metastore</description>

 </property>

 <property>

  <name>javax.jdo.option.ConnectionDriverName</name>

  <value>com.mysql.jdbc.Driver</value>

  <description>Driver class name for a JDBC metastore</description>

 </property>

 <property>

  <name>javax.jdo.option.ConnectionUserName</name>

  <value>hive</value>

 </property>

 <property>

  <name>javax.jdo.option.ConnectionPassword</name>

  <value>hive</value>

 </property>

 <property>

  <name>hive.metastore.warehouse.dir</name>

  <value>/user/hive/warehouse</value>

  <description>location of default database for the warehouse</description>

 </property>
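
   The hive username and password above must exist in MySQL. A minimal sketch of creating them from the MySQL console (the '%' host wildcard and the broad privilege grant are assumptions; tighten them for your environment):

    >CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';

    >GRANT ALL PRIVILEGES ON metastore_db.* TO 'hive'@'%';

    >FLUSH PRIVILEGES;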

12. From the MySQL console, check that metastore_db has been created and modify the character set.

    >show databases;

    >alter database metastore_db character set latin1;

    

    Sometimes the default character set can cause an exception:

     Error in metadata: MetaException(message:Got exception: 

     org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDODataStoreException: 

     An exception was thrown while adding/validating class(es) : 

     Specified key was too long; max key length is 767 bytes

13. Test create table, load, select, etc. from Hive.
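
    For example, create a table from the Hive shell as in step 9, then verify from the MySQL console that the metadata landed in MySQL (TBLS is a table in the standard Hive metastore schema; the exact layout can vary by Hive version):

    >use metastore_db;

    >select TBL_NAME, TBL_TYPE from TBLS;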