How To Install Hortonworks HDP 2.1

1 Create VPC

> AWS > VPC Dashboard > 'Your VPCs' > Create VPC
    Name Tag: AWS-HDPCluster-VPC1
    CIDR Block: 10.0.0.0/16
    Tenancy: Default
> Create
> Summary > Edit > DNS Hostnames = yes > Save
> VPC Dashboard > Subnets > Create Subnet
    Name Tag: AWS-HDPCluster-Subnet1
    VPC: AWS-HDPCluster-VPC1
    Availability Zone: No Preference
    CIDR Block: 10.0.0.0/24
> VPC Dashboard > Internet Gateways > Create Internet Gateway
    Name Tag: AWS-HDPCluster-InternetGateway1
> Attach to VPC > AWS-HDPCluster-VPC1
> VPC Dashboard > Route Tables 
        > Select Route Table attached to VPC: AWS-HDPCluster-VPC1 > Routes Tab > Edit > add route:
        Destination: 0.0.0.0/0
        Target: igw-1b21cf7e (internet gateway sample, yours will pop-up when you click on the text box)
> Save
> Subnet Associations > Edit > Select Subnet > Save
> Tags > Edit
Name: AWS-HDPCluster-RouteTable1
> VPC Dashboard > Elastic IPs > Allocate New Address
Network Platform: EC2-VPC
> Create 4 elastic IP addresses (costs: $.005/hour/IP when an Elastic IP is NOT tied to a RUNNING instance. That works out to about $3.60 per month per Elastic IP, roughly $14 per month for our cluster's 4 if we never use them. When an Elastic IP is tied to a RUNNING instance, there is no Elastic IP cost)
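The console clicks above can also be scripted. Below is a minimal AWS CLI sketch of the same VPC setup; it assumes the aws CLI is installed and configured, and the vpc-/igw-/rtb- IDs are placeholders you replace with the IDs each command prints.
    # Minimal AWS CLI sketch of section 1 (resource IDs are placeholders)
    aws ec2 create-vpc --cidr-block 10.0.0.0/16
    aws ec2 modify-vpc-attribute --vpc-id vpc-xxxxxxxx --enable-dns-hostnames "{\"Value\":true}"
    aws ec2 create-subnet --vpc-id vpc-xxxxxxxx --cidr-block 10.0.0.0/24
    aws ec2 create-internet-gateway
    aws ec2 attach-internet-gateway --internet-gateway-id igw-xxxxxxxx --vpc-id vpc-xxxxxxxx
    aws ec2 create-route --route-table-id rtb-xxxxxxxx --destination-cidr-block 0.0.0.0/0 --gateway-id igw-xxxxxxxx
    aws ec2 allocate-address --domain vpc    # run 4 times, one Elastic IP per node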


2 Create Instances

> AWS > EC2 Dashboard > Launch Instance
Red Hat Enterprise Linux 6.4 (PV) - ami-a25415cb (64-bit)
Instance Type, General Purpose: m1.large (2 vCPU, 7.5GB Mem) (costs: $.18/hour per RUNNING m1.large; for our full 4-node cluster that works out to about $.76/hour, NOT BAD! With our method you can start/stop the cluster at will; storage still costs for non-running instances)
> Configure Details
Number of Instances: 1 (for this go-around we create the instance for the NameNode; we will add 3 more m1.medium instances in a moment)
Network: vpc, AWS-HDPCluster-VPC1
Subnet: subnet, AWS-HDPCluster-Subnet1
> Storage: Gateway=30GB , NameNode=20GB , DataNode1=100GB , DataNode2=100GB (costs: $.05/GB-month; total across all instances is 250GB, meaning about $12.50/month on storage unless you increase it)
> Tag Instance: 
Name: AWS-HDPCluster-EC2_4N
> Security Group > Create New Security Group
Name: AWS-HDPCluster-SecurityGroup1
> Remove SSH Rule
> Add Rule > All TCP > to My IP
> Review and Launch > Launch > Create new key pair
Key pair name: AWS-HDPCluster-Keypair1
> Download Keypair
> Launch Instances
> REPEAT step above again with 3 instances (General Purpose: m1.medium, which will be for the Hadoop DataNodes and the Ambari Server)
> AWS > EC2 Dashboard > Elastic IPs
> Select 1st Elastic IP > Associate Address > Instance 1
> Repeat for remaining 3 instances (example):
54.85.64.59
54.86.23.178
54.86.24.12
54.85.215.201
> AWS > EC2 Dashboard > Security Group
> Select AWS-HDPCluster-SecurityGroup1 > Inbound Tab > Edit > Add an 'ALL TCP' rule for each elastic IP > Add 'ALL TCP' for 10.0.0.0/24
> AWS > EC2 Dashboard > Instances
> Select the first instance > Tags > Add/Edit > Modify Name and replace the end with the node type: Gateway
> Repeat for next 3 nodes > NameNode , DataNode1 , DataNode2
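For reference, the launches, Elastic IP associations, and Name tags in this section can also be done from the AWS CLI. The sketch below assumes the CLI is configured; the subnet/security-group/instance/allocation IDs are placeholders for your own.
    # Launch 1 m1.large plus 3 m1.medium instances into the cluster subnet (IDs are placeholders)
    aws ec2 run-instances --image-id ami-a25415cb --instance-type m1.large --count 1 \
        --key-name AWS-HDPCluster-Keypair1 --subnet-id subnet-xxxxxxxx --security-group-ids sg-xxxxxxxx
    aws ec2 run-instances --image-id ami-a25415cb --instance-type m1.medium --count 3 \
        --key-name AWS-HDPCluster-Keypair1 --subnet-id subnet-xxxxxxxx --security-group-ids sg-xxxxxxxx
    # Attach one Elastic IP and a Name tag per instance (repeat for all 4 nodes)
    aws ec2 associate-address --instance-id i-xxxxxxxx --allocation-id eipalloc-xxxxxxxx
    aws ec2 create-tags --resources i-xxxxxxxx --tags Key=Name,Value=AWS-HDPCluster-EC2_4N-Gateway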


3 Connect to Instance/Gateway

> Puttygen Utility > Conversions > 
    Import Key: AWS-HDPCluster-Keypair1.pem > Save Private Key: AWS-HDPCluster-putty.ppk
> Putty Utility
> Sessions
Host Name: the elastic IP of the gateway instance, e.g. 54.85.215.201 (select 1 instance as the 'gateway'; always connect to this instance for admin/maintenance activities)
> Connections > Data
Auto-login username: ec2-user
> Connections > SSH > Auth
Private key file for authentication: C:\..\AWS-HDPCluster-putty.ppk
> Sessions
Saved Sessions: AWS-HDPCluster-Gateway1
> Save
> Open
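If you are not on Windows (or simply prefer OpenSSH over PuTTY), the same connection can be made with the .pem key directly; this assumes a stock OpenSSH client.
    chmod 600 AWS-HDPCluster-Keypair1.pem
    ssh -i AWS-HDPCluster-Keypair1.pem ec2-user@54.85.215.201    # elastic IP of the gateway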


4 Configure Instances

> Set up password-less SSH: transfer the keypair to the gateway server
> pscp -i AWS-HDPCluster-putty.ppk AWS-HDPCluster-Keypair1.pem ec2-user@elastic-ip-of-gateway:/home/ec2-user/.ssh/id_rsa
> chmod 700 /home/ec2-user/.ssh ; chmod 640 /home/ec2-user/.ssh/authorized_keys ; chmod 600 /home/ec2-user/.ssh/id_rsa ; chmod 600 /home/ec2-user/.ssh/config
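The step above sets permissions on /home/ec2-user/.ssh/config but never shows its contents. A hypothetical config like the sketch below (create it before running the chmod) keeps the later ssh/scp commands non-interactive once the host aliases are in /etc/hosts.
    # /home/ec2-user/.ssh/config on the gateway (hypothetical example)
    Host gateway namenode datanode1 datanode2
        User ec2-user
        IdentityFile ~/.ssh/id_rsa
        StrictHostKeyChecking no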
> Create hosts file
> EC2 Dashboard > Elastic IPs > note the Private IP Address of each instance
> Each line: private-ip , fqdn: ip-<private-ip-with-dashes>.ec2.internal , alias: gateway/namenode/datanode1/datanode2
> Hosts file should look like the following (with your own list of private IPs)
10.0.0.35       ip-10-0-0-35.ec2.internal       gateway
10.0.0.34       ip-10-0-0-34.ec2.internal       namenode
10.0.0.36       ip-10-0-0-36.ec2.internal       datanode1
10.0.0.37       ip-10-0-0-37.ec2.internal       datanode2
> #sudo vi /etc/hosts > append 4 lines of cluster information into hosts file > save and exit
> #scp /etc/hosts namenode:/home/ec2-user/hosts ; scp /etc/hosts datanode1:/home/ec2-user/hosts ; scp /etc/hosts datanode2:/home/ec2-user/hosts
> ssh namenode , datanode1 , datanode2 > #sudo mv hosts /etc/hosts
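The scp/mv shuffle above can also be done in one loop from the gateway; this is a sketch that assumes the host aliases resolve and that sudo needs a tty (hence ssh -t), as it does on a default RHEL 6 image.
    # Push the updated hosts file to the other 3 nodes and move it into place
    for h in namenode datanode1 datanode2; do
        scp /etc/hosts ${h}:/home/ec2-user/hosts
        ssh -t ${h} 'sudo mv /home/ec2-user/hosts /etc/hosts'
    done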
> Get repository files
  > #wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.5.0/ambari.repo 
> #sudo mv ambari.repo /etc/yum.repos.d/
> Turn services on/off: ntpd , iptables , selinux
> #sudo service ntpd start ; sudo chkconfig ntpd on ; sudo service iptables stop ; sudo chkconfig iptables off ; sudo service ip6tables stop ; sudo chkconfig ip6tables off
  > #sudo vi /etc/selinux/config > change to: SELINUX=disabled > save and exit
> repeat the above 2 steps on the remaining nodes (namenode, datanode1, datanode2) ; before exiting each node issue > sudo reboot
> #sudo reboot (on gateway node)
--wait a few minutes , reconnect
  > #sudo service ntpd status ; sudo service iptables status ; sudo service ip6tables status ; sudo sestatus
> repeat above 2 steps on remaining nodes (namenode, datanode1, datanode2) 
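A quick way to confirm all 4 nodes ended up in the same state after the reboots is to sweep them from the gateway; a rough sketch:
    # Expect ntpd running, iptables/ip6tables stopped, and SELinux disabled on every node
    for h in gateway namenode datanode1 datanode2; do
        echo "== ${h} =="
        ssh -t ${h} 'sudo service ntpd status; sudo service iptables status; sudo service ip6tables status; sudo sestatus'
    done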
> Download and install software
> #sudo yum install -y epel-release
> #sudo yum install -y ambari-server
> #rpm -qa | grep libgenders > if it exists, remove it > #sudo yum erase -y libgenders
> #sudo ambari-server setup > press (enter) to accept all defaults
> #sudo ambari-server start
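Before moving to the browser it is worth confirming the server actually came up; both checks below are standard Ambari commands/endpoints, run on the gateway.
    sudo ambari-server status                                   # should report a running server and its PID
    curl -s -u admin:admin http://localhost:8080/api/v1/hosts   # returns JSON once the REST API is listening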

5 Install Hadoop

> General Setup
> Navigate in a web browser to: http://elastic-ip-of-gateway:8080
> user: admin , password: admin > Login
> Name cluster: HDP2_4N_Cluster > Next
> Choose HDP 2.0 > Next
> Install Options
Target Hosts: the FQDNs of all 4 nodes (e.g. ip-10-0-0-35.ec2.internal)
SSH Private Key: AWS-HDPCluster-Keypair1.pem
SSH User: ec2-user
> Go through checks (success) > Choose all services (default)
> Assign Masters:
gateway: SNameNode, Ganglia Server, Nagios Server
namenode: NameNode, History Server, ResourceManager, HiveServer2, Hive Metastore, WebHCat Server, HBase Master, Oozie Server, ZooKeeper
datanode1: ZooKeeper
datanode2: ZooKeeper
> Assign Slaves > DataNode1 + DataNode2 , check all boxes: DataNode, NodeManager, RegionServer, Client
> Customize Services
> Advanced Drop-Down , Set Block Replication = 2 (this is because we have only 2 data nodes)
> input remaining passwords, emails > install/deploy (takes about 20-30 minutes)
> Services tab > Run some service checks (HDFS, MapReduce, HBase) to verify system
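In addition to the Ambari service checks, a command-line smoke test from one of the nodes with the HDFS client installed (datanode1 or datanode2 here) is a quick sanity check:
    sudo -u hdfs hdfs dfsadmin -report    # both DataNodes should be listed as live
    sudo -u hdfs hadoop fs -ls /          # lists the HDFS root directories created by the install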
> Shut down cluster
> Services > Actions > Stop All
> putty (gateway node) > #sudo ambari-server stop > #sudo ambari-agent stop
> AWS > EC2 Dashboard > Instances > Select, Stop all
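Stopping (not terminating) the instances can also be scripted so the whole cluster powers down in one shot; the instance IDs below are placeholders.
    aws ec2 stop-instances --instance-ids i-aaaaaaaa i-bbbbbbbb i-cccccccc i-dddddddd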

All DONE!

Totals: $12.50/month for 250GB of storage , about $14/month for the 4 Elastic IPs (billed only while the cluster is stopped) , then $.76/hour for the 4 nodes while RUNNING.
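For anyone re-checking the totals, here is the arithmetic behind them, using the per-unit prices quoted earlier and assuming roughly 720 hours in a month:
    awk 'BEGIN {
        printf "storage : $%.2f/month\n", 250 * 0.05          # 30+20+100+100 GB at $.05/GB-month
        printf "eips    : $%.2f/month\n", 4 * 720 * 0.005     # 4 Elastic IPs billed only while detached/stopped
        printf "compute : $%.2f/hour while running\n", 0.76
    }'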