Big Data Frameworks

BDF Syllabus

Big Data Projects

Apache Hadoop Ecosystem slides-fr

eng. Apache Hadoop MapReduce Framework

+ MR Labs

+ Apache Pig Latin lab: tpch files, Q16 pig script

Stream Processing slides-eng

Apache Spark slides-eng

+ Spark/Java workflows

+ Analytics of the Chicago Crime dataset with Scala and a Zeppelin notebook

+ Analytics of the NYC Cabs dataset with PySpark and a Zeppelin notebook

+ Stream processing lab in Java

AWS Educate slides

AWS EMR Spark: aws-cli, s3, ec2Keypair, emr, emr-sparksql, emr-spark-workflow, emr-graphframes

AWS EMR Hadoop slides

Bulk Synchronous Parallelism

Pregel, Apache Giraph