Apache Pig Components
As shown in the figure, there are various components in the Apache Pig framework. Let us take a look at the major components.
Parser
- Pig scripts are first handled by the parser, which checks the syntax of the script, performs type checking, and runs other miscellaneous checks.
- The output of the parser will be a DAG (directed acyclic graph), which represents the Pig Latin statements and logical operators.
- In the DAG, the logical operators of the script are represented as the nodes and the data flows are represented as edges.
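The node-and-edge structure described above can be illustrated with a small sketch. This is not Pig's internal representation; the operator names and the `plan` dictionary are hypothetical, chosen only to show how logical operators (nodes) and data flows (edges) form a DAG that can be walked in dependency order.

```python
from collections import deque

# Hypothetical logical plan: each key is a logical operator (node) and each
# listed value is an operator that consumes its output (edge = data flow).
plan = {
    "LOAD users": ["FILTER age >= 18"],
    "LOAD orders": ["JOIN by user_id"],
    "FILTER age >= 18": ["JOIN by user_id"],
    "JOIN by user_id": ["STORE result"],
    "STORE result": [],
}

def topological_order(dag):
    """Return the operators so that every producer precedes its consumers."""
    # Count how many incoming data flows each operator has.
    indegree = {node: 0 for node in dag}
    for consumers in dag.values():
        for consumer in consumers:
            indegree[consumer] += 1
    # Operators with no inputs (the LOADs) can be processed first.
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for consumer in dag[node]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)
    if len(order) != len(dag):
        raise ValueError("plan contains a cycle, so it is not a DAG")
    return order

order = topological_order(plan)
print(order)
```

Because the graph is acyclic, such an ordering always exists, which is what allows later stages to process each operator only after its inputs are available.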
Optimizer
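- The logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection (dropping columns that are never used) and pushdown (moving operations such as filters closer to where the data is loaded).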
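To make the idea of pushdown concrete, here is a toy sketch. It is an assumption-laden illustration, not Pig's optimizer: the plan is simplified to a linear list of hypothetical `(operator, detail)` steps, and the single rule shown moves a FILTER ahead of an ORDER that directly precedes it, so that fewer rows need to be sorted.

```python
# Toy logical plan as an ordered list of (operator, detail) steps.
# The operators and details are hypothetical examples.
plan = [
    ("LOAD", "users"),
    ("ORDER", "name"),
    ("FILTER", "age >= 18"),
    ("STORE", "result"),
]

def push_filters_up(steps):
    """Move each FILTER ahead of an ORDER that directly precedes it.

    Sorting does not change which rows exist, so filtering first gives the
    same result while sorting fewer rows.
    """
    steps = list(steps)  # work on a copy; the input plan is left untouched
    changed = True
    while changed:
        changed = False
        for i in range(1, len(steps)):
            if steps[i][0] == "FILTER" and steps[i - 1][0] == "ORDER":
                steps[i - 1], steps[i] = steps[i], steps[i - 1]
                changed = True
    return steps

optimized = push_filters_up(plan)
for op, detail in optimized:
    print(op, detail)
```

A real optimizer applies many such rules and must check that each rewrite preserves the meaning of the script; this sketch shows only the shape of one rewrite.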
Compiler
- The compiler compiles the optimized logical plan into a series of MapReduce jobs.
Execution engine
- Finally, the MapReduce jobs are submitted to Hadoop in sorted order and executed there, producing the desired results.
Execution modes
Pig has two execution modes:
- Local mode: In this mode, Pig runs in a single JVM and uses the local file system. This mode is suitable only for analyzing small data sets.
- MapReduce mode: In this mode, queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster (the cluster may be pseudo-distributed or fully distributed). MapReduce mode on a fully distributed cluster is useful for running Pig on large data sets.
How to execute a Pig script in local mode?
Step 1: Create the input file sample.txt containing the two lines shown below.
$ nano sample.txt
Hi
This is my first pig program
Step 2: Create the Pig script smp.pig, which loads the file and prints its contents.
$ nano smp.pig
A = LOAD 'sample.txt';
DUMP A;
Step 3: Run the script in local mode.
$ pig -x local smp.pig
Output (log messages omitted):
(Hi)
(This is my first pig program)