Big Data

Big Data Characteristics:

    • Volume

    • Velocity

    • Variety

    • Variability

Use Cases

  • Audit Trail from an e-commerce site - 10 GB per day - 3-year archive for analysis

    • Audit DB (Oracle/SQL Server) -> Sqoop -> HDFS & Hive

  • Server Log Accumulation - 200 load-balanced web servers - 40 GB per day

    • Web Server Logs -> Flume Agents -> Avro -> Flume -> HDFS & Hive

  • IT Operations Analytics - compare user browsing activity with server performance metrics at 5-minute intervals

    • HDFS & Hive (Audit & Performance Logs) -> Pig/Spark (Summary/Joins) -> MySQL -> 3rd-party visualization tools

  • Customer 360 - Summary and Transactions (Interactions)

    • Sources (E-commerce / Order Processing / Support) -> Spark / Kafka -> Spark -> Cassandra (Summary) / Elasticsearch (Interactions)

  • Customer Analytics - Access for Exploratory Analysis - Visualization and Reports - Data Science - REST APIs

    • Customer 360 (Cassandra / Elasticsearch) -> Visualization Tools (Kibana) -> Custom REST API (Spring Framework (Java), Swagger, Django (Python))
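
Sizing sketch

The daily volumes quoted in the first two use cases can be turned into rough storage estimates. The sketch below is only back-of-the-envelope arithmetic: the function name `archive_size_gb`, the 365-day year, the decimal conversion of 1 GB = 1000 MB, and a replication factor of 3 (the usual HDFS default) are all assumptions made for illustration, and compression is ignored.

```python
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def archive_size_gb(gb_per_day, years, replication=1):
    """Rough archive size in GB for a constant daily ingest volume."""
    return gb_per_day * DAYS_PER_YEAR * years * replication


# Audit trail: 10 GB per day kept for 3 years.
audit_raw_gb = archive_size_gb(10, 3)
# Same archive with 3 HDFS replicas (assumed replication factor).
audit_hdfs_gb = archive_size_gb(10, 3, replication=3)

# Server logs: 40 GB per day spread over 200 web servers.
# Assumes decimal units (1 GB = 1000 MB) and an even spread.
per_server_mb = 40 * 1000 / 200

print(f"Audit archive, raw:        {audit_raw_gb} GB")
print(f"Audit archive, replicated: {audit_hdfs_gb} GB")
print(f"Log volume per server:     {per_server_mb} MB per day")
```

Under these assumptions the audit archive comes to roughly 11 TB of raw data (about 33 TB with three replicas), and each web server contributes about 200 MB of logs per day.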