While learning to install Hadoop, I ran into several obstacles that prevented me from completing the installation. It would have helped to have someone knowledgeable review my configuration and tell me whether the problem was really user error or something else. No matter how closely one follows the documentation, Linux environments still vary in hundreds (if not thousands) of ways: configurations, permissions, settings, and so on.

So, as an opportunity to master the material myself and to share what I learned, I have developed the following content after weeks of research and a few successful installations: a full-length video covering the installation of a 4-node Hadoop cluster on EC2 infrastructure. Hopefully the video leaves little room for interpretation about what the settings or the path to installation look like.

In addition to the in-depth video, you will find all of the supporting files, configurations, commands, reference materials, and links that were used to deploy the cluster. One of the most important factors is that the deployment takes place on EC2, which allows repeatable environments that can be deployed for cents on the dollar (the 4 nodes here cost around 32 cents per hour). Couple that with a few of the best open-source applications and operating systems (namely Hadoop through Hortonworks, and CentOS Linux) and you are left with a quickly deployed, low-cost solution that can serve as a test and development environment on demand, with little time, risk, or investment.
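To put that hourly figure in perspective, here is a minimal sketch of the cost arithmetic. The 32 cents per hour for the 4 nodes comes from the paragraph above; the function name, the per-node breakdown, and the assumption that partial hours are billed as whole hours are mine for illustration, and actual EC2 pricing depends on instance type, region, and the billing rules in effect.

```python
import math

# Figure quoted in the post: roughly 32 cents per hour for all 4 nodes.
CLUSTER_RATE_PER_HOUR = 0.32  # USD, whole cluster
NODE_COUNT = 4


def estimate_cost(hours, rate_per_hour=CLUSTER_RATE_PER_HOUR, round_up=True):
    """Estimate the compute cost of running the cluster for `hours`.

    round_up=True bills partial hours as whole hours. That is an
    assumption made for this sketch, not a statement of current AWS policy.
    """
    billed_hours = math.ceil(hours) if round_up else hours
    return round(billed_hours * rate_per_hour, 2)


def per_node_rate(rate_per_hour=CLUSTER_RATE_PER_HOUR, nodes=NODE_COUNT):
    """Derive the hourly rate per node from the cluster-wide rate."""
    return round(rate_per_hour / nodes, 2)


if __name__ == "__main__":
    print("Per node, per hour:", per_node_rate())
    for hours in (1, 2.5, 8):
        print(f"{hours} hour(s): ${estimate_cost(hours):.2f}")
```

At this rate, keeping the cluster up for a full 8-hour working day comes to about $2.56 in instance time. Storage and data-transfer charges are not included in this estimate.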

  • How To Install HDP 2.1 (Updated 4-16-2014)
    • Create a restartable 4 node Hadoop HDP 2 (YARN) cluster with network security on Amazon (AWS)

  • How To Install HDP 1.2.0
    • Create and install a 4 Node Hortonworks HDP (Apache Hadoop) cluster using Amazon EC2 in about an hour for about $1.
  • How To Install HDP 1.1
    • Create and install a 4 Node Hortonworks HDP (Apache Hadoop) cluster using Amazon EC2 in about an hour for about $1.
  • Enterprise Data Management
    • Analysis of the 2013 data management landscape. Covering solutions from relational, NoSQL, 'Big Data', and beyond.
