- The Google MapReduce framework is implemented in C++ with interfaces in Python and Java.
- The Hadoop project is a free open source Java MapReduce implementation.
- Twister is an open source Java MapReduce implementation that supports iterative MapReduce computations efficiently.
- Greenplum is a commercial MapReduce implementation, with support for Python, Perl, SQL and other languages.
- Aster Data Systems nCluster In-Database MapReduce supports Java, C, C++, Perl, and Python algorithms integrated into ANSI SQL.
- GridGain is a free open source Java MapReduce implementation.
- Phoenix is a shared-memory implementation of MapReduce implemented in C.
- FileMap is an open version of the framework that operates on files using existing file-processing tools rather than tuples.
- MapReduce has also been implemented in C for the Cell Broadband Engine.
- Mars: MapReduce has been implemented on NVIDIA GPUs (graphics processors) using CUDA.
- Qt Concurrent is a simplified version of the framework, implemented in C++, used for distributing a task between multiple processor cores.
- CouchDB uses a MapReduce framework for defining views over distributed documents and is implemented in Erlang.
- Skynet is an open source Ruby implementation of Google’s MapReduce framework.
- Disco is an open source MapReduce implementation by Nokia. Its core is written in Erlang and jobs are normally written in Python.
- Misco is an open source MapReduce implementation designed for mobile devices and is written in Python.
- Qizmt is an open source MapReduce framework from MySpace written in C#.
- Hive is an open-source framework from Facebook that provides an SQL-like language over files, layered on the open-source Hadoop MapReduce engine.
- The Holumbus Framework: distributed computing with MapReduce in Haskell (Holumbus-MapReduce)
- BashReduce: MapReduce implemented as a Bash script by Erik Frey of Last.fm
- MapReduce for Go
- MongoDB is a scalable, high-performance, open source, schema-free, document-oriented database written in C++ that features MapReduce.
- mapReduce provides an R-like implementation that demonstrates the simplicity of the MapReduce pattern in a functional programming language.
- RHIPE integrates the R statistics language environment with Hadoop and makes it possible to code map-reduce algorithms in R.
- Parallel::MapReduce is a CPAN module providing experimental MapReduce functionality for Perl.
- MapReduce on volunteer computing
- Secure MapReduce
- MapReduce implemented in MPI
- MapReduce with MPI implementation from Sandia: No fault tolerance or data redundancy
- MapReduce implementation using MPI from IU
- T. Tu et al. A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories. In SC ’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pages 1–12, Piscataway, NJ, USA, 2008.
- This paper describes how MapReduce can be implemented on top of the ubiquitous distributed-memory MPI, and how the intermediate data-shuffle operation is conceptually identical to the familiar MPI Alltoall operation.
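The correspondence between the shuffle and an all-to-all exchange can be illustrated with a small sketch. This is not code from any of the projects listed above: it simulates the ranks inside a single Python process, and the helper names (`partition`, `map_phase`, `alltoall`, `reduce_phase`, `word_count`) are hypothetical. The simulated `alltoall` only mimics the exchange pattern (what rank i addresses to rank j ends up on rank j); in a real MPI program the buckets have different sizes, so a variable-length variant such as `MPI_Alltoallv` would likely be needed.

```python
from collections import defaultdict
import zlib


def partition(key, num_ranks):
    """Deterministically pick the rank that will reduce this key."""
    return zlib.crc32(key.encode("utf-8")) % num_ranks


def map_phase(local_lines, num_ranks):
    """Emit (word, 1) pairs and bucket them by destination rank."""
    send = [[] for _ in range(num_ranks)]
    for line in local_lines:
        for word in line.split():
            send[partition(word, num_ranks)].append((word, 1))
    return send


def alltoall(send_buffers):
    """Simulated all-to-all (assumption: stands in for the MPI collective).

    send_buffers[i][j] is the bucket rank i addresses to rank j;
    the result recv[j][i] is that same bucket, now held by rank j.
    """
    n = len(send_buffers)
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]


def reduce_phase(received):
    """Sum the counts for every key this rank owns."""
    totals = defaultdict(int)
    for bucket in received:
        for word, count in bucket:
            totals[word] += count
    return dict(totals)


def word_count(inputs_per_rank):
    """Run map, shuffle (all-to-all), and reduce over the simulated ranks."""
    num_ranks = len(inputs_per_rank)
    send_buffers = [map_phase(lines, num_ranks) for lines in inputs_per_rank]
    recv_buffers = alltoall(send_buffers)  # the shuffle step
    return [reduce_phase(received) for received in recv_buffers]


if __name__ == "__main__":
    inputs = [["a b a"], ["b c"], ["a c c"]]
    for rank, counts in enumerate(word_count(inputs)):
        print(rank, counts)
```

Because every rank applies the same `partition` function, all pairs for a given key arrive at a single rank, which is the property the reduce step relies on.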
Microsoft data parallel programming
Mortar: Wide-Scale Stream Processing
PACT
Frenetic: a network programming language
PADS: processing ad hoc data sources
- Hadoop Common: The common utilities that support the other Hadoop subprojects.
- Avro: A data serialization system that provides dynamic integration with scripting languages.
- Chukwa: A data collection system for monitoring large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- HDFS: A distributed file system that provides high throughput access to application data.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- MapReduce: A software framework for distributed processing of large data sets on compute clusters.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: A high-performance coordination service for distributed applications.
- Hama: a distributed scientific package on Hadoop for massive matrix and graph data
- Oozie: Hadoop workflow
- Sqoop: import data from relational databases into Hadoop
- BigTable [Paper][Video]
- Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on an Apache Hadoop cluster.
- HAMAKE: data flow instructions: fold and foreach
- MRSim: a Hadoop simulator
- Mumak: Map-Reduce Simulator, ppt
- MRPerf: A Simulator for MapReduce
- myHadoop: Hadoop on HPC clusters
- Hadoop on Demand
- Hadoop Online Prototype, paper
- Hadoop workflow survey
Sawzall
A list of Key-Value stores
Data intensive workflow
- Pwrake and G-Farm