Post date: Sep 20, 2014 7:54:12 AM
Hadoop HDFS, a distributed file system, is the underpinning of many Big Data solutions.
Hadoop MapReduce/YARN provides a distributed data-processing engine and an API for implementing flow-based transform functions on data files stored in Hadoop HDFS.
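As a rough illustration of the map/shuffle/reduce flow described above, here is a minimal single-process sketch in plain Python (a hypothetical stand-in: real Hadoop runs these phases distributed across a cluster, reading HDFS blocks):

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs for each word in one input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts emitted for one word.
    return key, sum(values)

lines = ["big data big ideas", "big data tools"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```

The key point is that map and reduce are pure functions over key/value pairs, which is what lets the framework parallelize them freely.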
Many tools have been developed to make MapReduce/YARN easier to use, such as Pig (Pig Latin) and Hive (a SQL subset).
Other tools, such as HBase and Cassandra, bypass MapReduce/YARN altogether.
A new wave of tools addresses HDFS latency by moving data into memory.
Tachyon provides a caching layer on top of Hadoop HDFS, reducing access latency.
Spark provides an API to a data-processing engine that uses in-memory data caching to speed up processing.
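The caching idea can be sketched in plain Python (a hypothetical `CachedDataset` class, not Spark's actual API: in Spark the equivalent is marking an RDD as cached so later jobs reuse the in-memory copy instead of re-reading HDFS):

```python
class CachedDataset:
    """Materialize a dataset once, then serve later jobs from memory."""

    def __init__(self, load_fn):
        self._load_fn = load_fn   # simulates an expensive read from HDFS
        self._cache = None
        self.loads = 0            # counts how often the "disk" is hit

    def collect(self):
        if self._cache is None:   # first access: load and keep in memory
            self.loads += 1
            self._cache = self._load_fn()
        return self._cache

dataset = CachedDataset(lambda: list(range(5)))
total = sum(dataset.collect())        # first job: loads the data
maximum = max(dataset.collect())      # second job: served from memory
print(total, maximum, dataset.loads)  # 10 4 1
```

This reuse across successive computations is what makes iterative workloads (the case MapReduce handles poorly, since each job re-reads its input from disk) so much faster.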
Spark provides further extensions such as stream processing.
Shark sits on top of Spark and provides a SQL interface to the Spark API.
Storm is a tool similar to Spark but uses data-flow techniques.
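The data-flow contrast can be sketched as follows (a hypothetical single-process stand-in for Storm's model, where records flow one at a time through a chain of processing steps rather than being collected into a batch first):

```python
def split_step(sentence):
    # First processing step: split an arriving record into words.
    for word in sentence.split():
        yield word

def count_step(words, counts):
    # Second processing step: update running counts as words flow in.
    for word in words:
        counts[word] = counts.get(word, 0) + 1

counts = {}
stream = ["storm processes tuples", "tuples flow continuously"]  # stand-in source
for sentence in stream:              # each record is handled as it arrives
    count_step(split_step(sentence), counts)
print(counts)  # {'storm': 1, 'processes': 1, 'tuples': 2, 'flow': 1, 'continuously': 1}
```

State like `counts` is updated continuously as records arrive, rather than produced at the end of a batch job.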
http://lambda-architecture.net/