This project aims to develop a system that efficiently processes high-volume supply chain data, with a primary focus on streaming order data, by integrating big data technologies for both stream and batch processing. The system delivers real-time key performance indicators (KPIs), accurate demand forecasts, and comprehensive analytical insights, enabling informed decision-making and operational optimization.
Data Ingestion: Transactional CSV data is loaded into a PostgreSQL database. Kafka serves as the message broker, enabling real-time data streaming.
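The ingestion step can be sketched as a small producer that reads order rows from CSV and publishes them as JSON events. This is a minimal sketch, not the project's actual code: the field names (`order_id`, `product_id`, `quantity`), topic name, and the use of the `kafka-python` client are all illustrative assumptions.

```python
import csv
import json

def row_to_event(row):
    """Encode one CSV order row (dict of strings) as a JSON event.
    Field names are illustrative, not the project's exact schema."""
    return json.dumps({
        "order_id": row["order_id"],
        "product_id": row["product_id"],
        "quantity": int(row["quantity"]),
    }).encode("utf-8")

def stream_csv(path, send):
    """Read the CSV and hand each encoded event to `send`
    (in production, a Kafka produce call)."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            send(row_to_event(row))

if __name__ == "__main__":
    # kafka-python is assumed here; any Kafka client works the same way.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    stream_csv("orders.csv", lambda e: producer.send("orders", e))
    producer.flush()
```

Keeping the row-to-event encoding separate from the producer makes the serialization logic easy to unit-test without a running broker.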
Stream Processing: Apache Flink consumes events from Kafka, processes and enriches them (e.g., updating inventory, computing daily metrics), and writes results back to PostgreSQL.
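The per-event enrichment logic (inventory updates and daily metrics) can be expressed as plain keyed state, mirroring what a Flink keyed process function would hold. This is a sketch of the logic only, outside any Flink runtime; the field names and metric choices are assumptions.

```python
from collections import defaultdict

class DailyMetrics:
    """Per-key state mirroring what a Flink keyed process function maintains.
    Schema (product_id, quantity, date) is illustrative."""
    def __init__(self):
        self.inventory = defaultdict(int)    # product_id -> units on hand
        self.daily_units = defaultdict(int)  # (product_id, date) -> units sold

    def on_order(self, event):
        """Apply one order event: decrement inventory, bump the daily counter."""
        pid, qty, day = event["product_id"], event["quantity"], event["date"]
        self.inventory[pid] -= qty
        self.daily_units[(pid, day)] += qty
        # In the real job, the enriched result is written back to PostgreSQL.
        return {"product_id": pid,
                "inventory": self.inventory[pid],
                "units_today": self.daily_units[(pid, day)]}
```

In the actual pipeline this logic runs inside a Flink job consuming the Kafka topic; isolating it as a pure class makes the bookkeeping testable without a cluster.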
Batch Forecasting: Apache Spark retrieves historical data from PostgreSQL and uses Facebook Prophet for time-series demand forecasting, triggered via a REST API.
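The forecasting step can be sketched as a daily-aggregation helper feeding Prophet. This is a simplified sketch: the data is passed in directly rather than read from PostgreSQL via Spark, the `prophet` package name is assumed (older installs use `fbprophet`), and the 30-day horizon is an illustrative default.

```python
def daily_totals(rows):
    """Aggregate (date_str, quantity) rows into the ds/y records Prophet
    expects. Pure helper, so the heavy libraries stay out of this step."""
    totals = {}
    for day, qty in rows:
        totals[day] = totals.get(day, 0) + qty
    return [{"ds": day, "y": y} for day, y in sorted(totals.items())]

def run_forecast(rows, horizon_days=30):
    """Fit Prophet on daily demand and return the forecast tail.
    In the project this runs when the REST endpoint is hit."""
    import pandas as pd
    from prophet import Prophet  # assumed package name
    df = pd.DataFrame(daily_totals(rows))
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=horizon_days)
    return model.predict(future)[["ds", "yhat"]].tail(horizon_days)
```

The REST endpoint then only needs to pull the historical rows and call `run_forecast`, returning the `ds`/`yhat` pairs as JSON.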
Visualization Layer: A Streamlit dashboard presents real-time data streams, KPIs, and forecast insights, offering interactive control for analytics and model runs.
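A minimal version of the dashboard can be sketched as a KPI reducer plus a short Streamlit script. The KPI names, column names, and placeholder data are assumptions; in the project the app reads live rows from PostgreSQL.

```python
def compute_kpis(rows):
    """Reduce (product_id, quantity) order rows to headline KPIs.
    Column names and KPI choices are illustrative."""
    total = sum(q for _, q in rows)
    products = {p for p, _ in rows}
    return {"total_units": total,
            "orders": len(rows),
            "active_products": len(products)}

if __name__ == "__main__":
    # Run with: streamlit run dashboard.py
    import pandas as pd
    import streamlit as st
    rows = [("A", 3), ("B", 5), ("A", 2)]  # placeholder; real app queries PostgreSQL
    st.title("Supply Chain KPIs")
    for col, (name, value) in zip(st.columns(3), compute_kpis(rows).items()):
        col.metric(name.replace("_", " ").title(), value)
    st.bar_chart(pd.DataFrame(rows, columns=["product", "units"])
                   .groupby("product").sum())
```

Streamlit re-runs the script on each interaction, so polling the database on every run keeps the KPIs close to real time without extra plumbing.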
Deployment: All components are containerized with Docker, ensuring portability, scalability, and isolation across services.
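A docker-compose.yml for this stack might look like the fragment below. This is an illustrative sketch only: the image tags, ports, environment variables, and service layout are assumptions, not the project's actual file.

```yaml
# Sketch of a compose file for the stack; values are assumptions.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    ports: ["5432:5432"]
  kafka:
    image: bitnami/kafka:latest   # needs KRaft/listener env vars in practice
    ports: ["9092:9092"]
  dashboard:
    build: ./dashboard            # Streamlit app
    ports: ["8501:8501"]
    depends_on: [postgres, kafka]
```

Each service runs in its own container, so components can be restarted or scaled independently with `docker compose up -d`.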
Modular, microservice-based design for easy deployment and scaling
Data pipeline supports large and high-velocity event streams
REST API integration for forecasting on demand
Fully automated workflow: from raw data to high-level KPI dashboards
Real-time processing of supply chain data using Kafka and Flink
Predictive analytics with Apache Spark and Prophet
Scalable, containerized deployment via Docker Compose
Interactive dashboard built with Streamlit for real-time data visualization
Custom synthetic data generation for robust testing
Effective, real-time insights into inventory and demand for optimized supply chain decisions
Historical and live data visualization via an intuitive web dashboard
Accurate demand forecasting enabling proactive inventory management
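The synthetic data generation mentioned above can be sketched as a seeded generator that emits order rows matching the pipeline's input schema. The schema and value ranges here are illustrative assumptions.

```python
import csv
import random
from datetime import date, timedelta

def generate_orders(n, start=date(2024, 1, 1), days=90,
                    products=("P1", "P2", "P3"), seed=42):
    """Generate n synthetic order rows for testing the pipeline.
    Schema (order_id, date, product_id, quantity) is illustrative."""
    rng = random.Random(seed)  # seeded, so test data is reproducible
    return [{
        "order_id": i + 1,
        "date": (start + timedelta(days=rng.randrange(days))).isoformat(),
        "product_id": rng.choice(products),
        "quantity": rng.randint(1, 20),
    } for i in range(n)]

def write_orders_csv(path, rows):
    """Write the rows in the CSV shape the ingestion step consumes."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["order_id", "date", "product_id", "quantity"])
        writer.writeheader()
        writer.writerows(rows)
```

Seeding the generator keeps test runs reproducible, so a KPI or forecast regression can be traced to a code change rather than to different input data.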
Docker Compose: for container orchestration
Apache Kafka: for event streaming
Apache Flink: for stream processing
PostgreSQL: as the relational data store
Apache Spark & Prophet: for data preprocessing and ML forecasting
Streamlit: for building the user dashboard
Head of AI at Afliant
Co-founder at Truebees
Software Developer at Afliant and Co-founder of Truebees srl