This page explains how to download, install, and set up Apache Pig on your system.
Prerequisites:
Hadoop and Java must be installed on your system before you install Apache Pig.
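One quick way to confirm both prerequisites is to check that the commands are available on the PATH. The following is a minimal sketch: the helper name check_tool is hypothetical, and it only tests that a command can be found, not that its version is suitable for Pig.

```shell
# check_tool is a hypothetical helper: it reports whether a command
# is available on the PATH of the current user.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}
```

For example, check_tool java and check_tool hadoop should both report "found" before you continue with Step 1.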
Step 1: Download Apache Pig
Download the latest version of Apache Pig from the following website:
Then go to News --> Release Page --> Download --> Download a release now.
Select any one of the mirror websites, for example: http://www-eu.apache.org/dist/pig
Then open the folder for the latest version of Pig, for example: pig-0.16.0/
Download the following two tar files:
i) pig-0.16.0-src.tar.gz
ii) pig-0.16.0.tar.gz
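The same download can be done from the command line. The sketch below only builds the URL of the binary tarball from the mirror and version named above; the variable names are illustrative, and the URL layout (mirror, then version folder, then file) is an assumption based on those steps. A mirror may no longer carry an older release, so check that the URL resolves before relying on it.

```shell
# Values taken from the steps above; change PIG_VERSION for another release.
PIG_VERSION="0.16.0"
PIG_MIRROR="http://www-eu.apache.org/dist/pig"

# Assumed layout: <mirror>/pig-<version>/pig-<version>.tar.gz
PIG_URL="$PIG_MIRROR/pig-$PIG_VERSION/pig-$PIG_VERSION.tar.gz"
echo "$PIG_URL"

# Uncomment to download (requires network access and wget):
# wget "$PIG_URL"
```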
Step 2: Install Apache Pig
Log in as hduser using the following command:
$su hduser
Create a directory for Pig in the location where the Hadoop installation directory resides:
$mkdir /usr/local/Pig
Extract the tar file using the following command:
$tar zxvf pig-0.16.0.tar.gz
Move the extracted files from the current directory to the Pig directory:
$mv pig-0.16.0/* /usr/local/Pig
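The create, extract, and move steps can also be combined. The following is a sketch under stated assumptions: install_pig is a hypothetical helper, and it relies on the --strip-components option of tar (available in GNU tar and bsdtar) to drop the top-level pig-0.16.0/ folder while extracting.

```shell
# install_pig is a hypothetical helper.
#   $1: path to the binary tarball, e.g. pig-0.16.0.tar.gz
#   $2: target directory, e.g. /usr/local/Pig
install_pig() {
    tarball="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # --strip-components=1 removes the leading pig-<version>/ path element,
    # so the files land directly in the target directory.
    tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

A call such as install_pig pig-0.16.0.tar.gz /usr/local/Pig should give the same layout as the separate commands. The current user needs write permission on the target directory.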
Step 3: Configure Apache Pig
After installing Apache Pig, we have to configure it. To do so, we need to edit two files: .bashrc and pig.properties.
.bashrc file
In the .bashrc file, set the following variables:
PIG_HOME to Apache Pig’s installation folder,
PATH to include the bin folder of Pig, and
PIG_CLASSPATH to the etc (configuration) folder of your Hadoop installation (the directory that contains the core-site.xml, hdfs-site.xml and mapred-site.xml files).
$nano ~/.bashrc
(or: $vi ~/.bashrc)
#add these lines at the end of the .bashrc file
export PIG_HOME=/usr/local/Pig
export PATH=$PATH:/usr/local/Pig/bin
export PIG_CLASSPATH=/usr/local/hadoop/etc/hadoop
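If you prefer to add these lines from a script rather than an editor, it helps to avoid duplicating them on repeated runs. The following is a minimal sketch; append_once is a hypothetical helper that appends a line only when the exact line is not already in the file.

```shell
# append_once is a hypothetical helper: it appends a line to a file
# only if that exact line is not already present.
#   $1: the line to add
#   $2: the file to update
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once 'export PIG_HOME=/usr/local/Pig' ~/.bashrc adds the first variable. Use single quotes so that $PATH is written literally into the file. The new variables take effect in a new terminal or after running source ~/.bashrc.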
pig.properties file
In the conf folder of Pig, we have a file named pig.properties. In this file, you can set various parameters. To list the supported properties, run the following command:
$pig -h properties
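To see which value a property currently has in pig.properties, you can read the key=value line directly. The following is a sketch, not part of Pig itself: get_prop is a hypothetical helper, and any key name you pass to it should be taken from the output of the command above.

```shell
# get_prop is a hypothetical helper: it prints the value of a key from a
# Java-style properties file, ignoring commented-out lines.
#   $1: property name
#   $2: path to the properties file
get_prop() {
    key="$1"
    file="$2"
    # Escape dots so they match literally in the regular expression.
    esc="$(printf '%s' "$key" | sed 's/\./\\./g')"
    # Strip "key =" from matching lines and print the last value found.
    sed -n "s/^[[:space:]]*${esc}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

Assuming the installation folder used in this guide, a call would look like get_prop some.property /usr/local/Pig/conf/pig.properties, where some.property is a placeholder for a real property name.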
Step 4: Verification of Apache Pig Installation
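Verify the installation by running the version command. If the installation succeeded, it prints the Apache Pig version, as in the following output:
$pig -version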
Apache Pig version 0.16.0 (r1682971)
compiled Mar 08 2017, 11:44:35
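In a setup script, it can be useful to extract just the version number from this output. The following is a minimal sketch; parse_pig_version is a hypothetical helper, and it assumes the version line starts with "Apache Pig version" as in the output above.

```shell
# parse_pig_version is a hypothetical helper: it reads the output of
# "pig -version" on standard input and prints only the version number.
parse_pig_version() {
    sed -n 's/^Apache Pig version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

With the output shown above, pig -version | parse_pig_version prints 0.16.0. It prints nothing if no version line is found.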