Big Data
Big Data Characteristics:
Volume
Velocity
Variety
Variability
Use Cases:
Audit Trail from an e-commerce site - 10 GB per day - 3-year archive for analysis purposes
Audit DB (Oracle/SQL Server) -> Sqoop -> HDFS & Hive
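A rough sizing of the audit-trail archive, using the 10 GB per day and 3-year figures above. The 365-day year, the replication factor of 3 (the HDFS default), and the absence of compression are assumptions, not figures from these notes; the function name is hypothetical.

```python
# Storage estimate for the audit-trail archive.
# From the notes: 10 GB/day, 3-year retention.
# Assumptions: 365-day years, HDFS replication factor 3, no compression,
# decimal units (1 TB = 1000 GB).

def archive_size_gb(daily_gb: float, years: int, replication: int = 1) -> float:
    """Return the total stored volume in GB over the retention period."""
    return daily_gb * 365 * years * replication

raw_gb = archive_size_gb(10, 3)                        # logical data only
replicated_gb = archive_size_gb(10, 3, replication=3)  # on-disk footprint in HDFS

print(f"raw: {raw_gb / 1000:.2f} TB")                # raw: 10.95 TB
print(f"replicated: {replicated_gb / 1000:.2f} TB")  # replicated: 32.85 TB
```

Compressed Hive storage formats would likely shrink the on-disk number, so this is better read as an upper bound.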
Server Log Accumulation - 200 load-balanced web servers - 40 GB per day
Web Server Logs -> Flume Agents -> Avro -> Flume -> HDFS & Hive
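A quick look at what 40 GB per day across 200 servers means per Flume agent. It assumes the load is spread evenly across servers and across the day, which real traffic will not be, and it uses decimal units; the function names are hypothetical.

```python
# Per-server log volume for the 200-server, 40 GB/day case.
# Assumptions: even spread across servers and over 24 hours,
# decimal units (1 GB = 1000 MB, 1 MB = 1000 KB).

SECONDS_PER_DAY = 24 * 60 * 60

def per_server_mb_per_day(total_gb_per_day: float, servers: int) -> float:
    """Average daily log volume each server's agent has to ship, in MB."""
    return total_gb_per_day * 1000 / servers

def average_rate_kb_per_s(mb_per_day: float) -> float:
    """Average sustained rate in KB/s for a given daily volume in MB."""
    return mb_per_day * 1000 / SECONDS_PER_DAY

per_server = per_server_mb_per_day(40, 200)
print(f"{per_server:.0f} MB per server per day")            # 200 MB per server per day
print(f"{average_rate_kb_per_s(per_server):.2f} KB/s avg")  # 2.31 KB/s avg
```

The average rate per agent is small; sizing would more plausibly be driven by peak-hour bursts and by the aggregate rate at the collecting Flume tier.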
IT Operations Analytics - compare user browsing and server performance metrics at 5-minute intervals
HDFS & Hive (Audit & Performance Logs) -> Pig/Spark (Summary/Joins) -> MySQL, 3rd-party visualization tools
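The summary/join step can be illustrated in plain Python: floor each record to its 5-minute window, aggregate each source per window, then join on the window key. This is only a stand-in for the Pig/Spark job, not its actual code; the record shapes (timestamp plus user id, timestamp plus response time in ms) and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

def summarize(page_views, perf_samples):
    """Join browsing and performance data per 5-minute window.

    page_views:   iterable of (timestamp, user_id)      -- hypothetical shape
    perf_samples: iterable of (timestamp, response_ms)  -- hypothetical shape
    Returns a sorted list of (window_start, view_count, avg_response_ms).
    """
    views = defaultdict(int)
    latencies = defaultdict(list)
    for ts, _user in page_views:
        views[window_start(ts)] += 1
    for ts, ms in perf_samples:
        latencies[window_start(ts)].append(ms)

    rows = []
    # Inner join: keep only windows present in both sources.
    for w in sorted(views.keys() & latencies.keys()):
        samples = latencies[w]
        rows.append((w, views[w], sum(samples) / len(samples)))
    return rows

page_views = [
    (datetime(2024, 1, 1, 10, 1), "u1"),
    (datetime(2024, 1, 1, 10, 4), "u2"),
    (datetime(2024, 1, 1, 10, 7), "u1"),
]
perf_samples = [
    (datetime(2024, 1, 1, 10, 2), 120),
    (datetime(2024, 1, 1, 10, 3), 180),
    (datetime(2024, 1, 1, 10, 9), 300),
]
for row in summarize(page_views, perf_samples):
    print(row)
```

The resulting rows (window, view count, average response time) are the kind of compact summary that would fit in MySQL for the visualization tools to query.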
Customer 360 - Summary and Transactions (Interactions)
Sources (E-commerce / Order Processing / Support) -> Spark / Kafka -> Spark -> Cassandra (Summary) / Elasticsearch (Interactions)
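The split between a per-customer summary and a searchable list of interactions can be sketched as below. It is plain Python and calls no Spark, Kafka, Cassandra, or Elasticsearch APIs; the event fields (`customer_id`, `source`, `amount`, `timestamp`, `detail`) and the summary columns are assumptions made for illustration.

```python
from collections import defaultdict

def build_customer_360(events):
    """Split raw events into a per-customer summary and interaction documents.

    summary      -> one row per customer (the kind of data kept in Cassandra)
    interactions -> one document per event (the kind of data indexed for search)
    """
    summary = defaultdict(
        lambda: {"orders": 0, "total_spent": 0.0, "support_tickets": 0}
    )
    interactions = []
    for event in events:
        row = summary[event["customer_id"]]
        if event["source"] == "order":
            row["orders"] += 1
            row["total_spent"] += event["amount"]
        elif event["source"] == "support":
            row["support_tickets"] += 1
        # Every event, whatever its source, is kept as an interaction document.
        interactions.append({
            "customer_id": event["customer_id"],
            "source": event["source"],
            "timestamp": event["timestamp"],
            "detail": event.get("detail", ""),
        })
    return dict(summary), interactions

events = [
    {"customer_id": "c1", "source": "order", "amount": 20.0,
     "timestamp": "2024-01-01T10:00:00", "detail": "order placed"},
    {"customer_id": "c1", "source": "support",
     "timestamp": "2024-01-02T09:30:00", "detail": "delivery question"},
    {"customer_id": "c1", "source": "order", "amount": 5.5,
     "timestamp": "2024-01-03T14:15:00", "detail": "order placed"},
]
summary, interactions = build_customer_360(events)
print(summary["c1"])
print(len(interactions))
```

Keeping the summary keyed by customer suits fast key lookups, while the per-event documents suit free-text search and filtering, which matches the Cassandra/Elasticsearch split in the pipeline above.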
Customer Analytics - Access for Exploratory Analysis - Visualization and Reports - Data Science - REST APIs
Customer 360 (Cassandra / Elasticsearch) -> Visualization Tools (Kibana) -> Custom REST API (Spring Framework (Java), Swagger, Django (Python))