Vrinda Store Analysis
Using PostgreSQL
In this project, we analyze the Vrinda store data to create an annual sales report for 2022. So that, Vrinda can understand their customers and grow more sales in 2023.
ABOUT DATA
The Product Category column contains 8 different categories of items sold by Vrinda Store, including Kurta, Set, Western Dress, Top, Saree, Blouse, Ethnic Dress, and Bottom. The Sales Channel column contains 7 different channels used by Vrinda Store to sell their products, including Myntra, Ajio, Amazon, Flipkart, Meesho, Nalli, and Others.
The data also includes shipping details such as City and State, with 1860 different cities and 36 different states listed. This information could be used to analyze Vrinda Store's sales patterns and customer demographics, as well as identify which products and sales channels are the most popular. It could also help in identifying areas where the store could potentially expand its operations or improve its shipping logistics.
DATA CLEANING
Using Excel
Find null and blank cell in each column
Check data consistency in column gender and quntity
Step by step data cleaning process - Watch the following Video.
DATA PROCESSING
Create a new column as Age_Group by using the Age column and nested If formula
Similarly, create new column months by using the date column and TEXT formula
Important Tips :
If we have a lot of data and we create new columns by using different formulas, then excel will work slowly so we can copy, and use paste special function and paste as only value in original column.
Step by step data pre-processing - Watch the following Video.
ER DIAGRAM CREATION
How to Create separate Data for each table using original Vrinda store data, please watch following Videos.
After Cleaning and Preprocessing, Our data is ready for analysis.
But to reduce redundancy and for easy analysis, we make separate tables from the original data
like Customers, Products, Orders, and Order Details that are given in the above ER diagram Image.
Then Next Load data into each table using Postgresql(pgadmin4).
DATA ANALYSIS
Using PostgreSQL
1) Total sales and Total orders for each date.
SELECT o.date, SUM(od.amount) AS total_sales, COUNT(od.order_id) AS total_orders: This part of the query selects the columns to be included in the final result set. It selects the "date" column from the "orders" table, and calculates the total sales by summing the "amount" column from the "order_details" table, and the total orders by counting the number of rows in the "order_details" table for each date.
FROM orders o JOIN order_details od ON o.order_id = od.order_id: This part of the query joins the "orders" table with the "order_details" table using the "order_id" column as the join condition.
GROUP BY o.date: This part of the query groups the results by the "date" column from the "orders" table.
ORDER BY total_sales DESC, total_orders DESC: This part of the query orders the results by the total sales and total orders columns in descending order.
Overall, the query will return a result set that shows the total sales and total orders for each date, sorted by the total sales and total orders in descending order.
This is a SQL query that is similar to the previous one, but it adds a "LIMIT" clause at the end to return only the first row of the result set, i.e., the date with the highest total sales and total orders.
2) Total sales and total number of orders for each gender.
SELECT c.gender AS Gender, SUM(od.amount) AS Amount, COUNT(od.order_id): This part of the query selects the columns to be included in the final result set. It selects the "gender" column from the "customers" table, and calculates the total sales by summing the "amount" column from the "order_details" table, and the total number of orders by counting the number of rows in the "order_details" table for each gender. It also aliases the "gender" column as "Gender".
FROM customers c JOIN order_details od ON c.cust_id = od.cust_id: This part of the query joins the "customers" table with the "order_details" table using the "cust_id" column as the join condition.
GROUP BY gender: This part of the query groups the results by the "gender" column from the "customers" table.
ORDER BY Amount DESC: This part of the query orders the results by the total sales column in descending order.
Overall, the query will return a result set that shows the total sales and total number of orders for each gender, sorted by the total sales in descending order.
3) Total number of orders for each order status
SELECT o.status, COUNT(od.order_id) AS Orders: This part of the query selects the columns to be included in the final result set. It selects the "status" column from the "orders" table, and counts the number of rows in the "order_details" table for each status. It also aliases the "COUNT(od.order_id)" column as "Orders".
FROM orders AS o JOIN order_details AS od ON o.order_id = od.order_id: This part of the query joins the "orders" table with the "order_details" table using the "order_id" column as the join condition. It also aliases the tables as "o" and "od", respectively.
GROUP BY status: This part of the query groups the results by the "status" column from the "orders" table.
Overall, the query will return a result set that shows the total number of orders for each order status.
4) Total number of orders and total sales for each state
SELECT o.state, COUNT(od.order_id) AS order_count, SUM(od.amount) AS Total_sales: This part of the query selects the columns to be included in the final result set. It selects the "state" column from the "orders" table, counts the number of rows in the "order_details" table for each state, and calculates the total sales by summing the "amount" column from the "order_details" table.
FROM orders o JOIN order_details od ON o.order_id = od.order_id: This part of the query joins the "orders" table with the "order_details" table using the "order_id" column as the join condition.
GROUP BY state: This part of the query groups the results by the "state" column from the "orders" table.
ORDER BY Total_sales DESC: This part of the query orders the results by the total sales column in descending order.
LIMIT 10: This part of the query limits the results to the top 10 states with the highest total sales.
Overall, the query will return a result set that shows the total number of orders and total sales for each state, sorted by the total sales in descending order, and limited to the top 10 states.
5) Total sales amount, and age for each age group
SELECT COUNT(gender) AS gender_count, age, SUM(od.amount) AS sales_amount: This part of the query selects the columns to be included in the final result set. It counts the number of rows for each gender in the "customers" table, selects the "age" column from the "customers" table, and calculates the total sales by summing the "amount" column from the "order_details" table.
FROM customers c JOIN order_details od ON c.cust_id = od.cust_id: This part of the query joins the "customers" table with the "order_details" table using the "cust_id" column as the join condition.
GROUP BY age: This part of the query groups the results by the "age" column from the "customers" table.
ORDER BY sales_amount DESC: This part of the query orders the results by the sales amount in descending order.
Overall, the query will return a result set that shows the total number of customers, total sales amount, and age for each age group, sorted by the sales amount in descending order.
6) Total sales for each channel
SELECT o.channel, SUM(od.amount) AS total_sales: This part of the query selects the columns to be included in the final result set. It selects the "channel" column from the "orders" table and calculates the total sales by summing the "amount" column from the "order_details" table.
FROM orders o JOIN order_details od ON o.order_id = od.order_id: This part of the query joins the "orders" table with the "order_details" table using the "order_id" column as the join condition.
GROUP BY channel: This part of the query groups the results by the "channel" column from the "orders" table.
ORDER BY total_sales DESC: This part of the query orders the results by the total sales in descending order.
Overall, the query will return a result set that shows the total sales for each channel, sorted by the total sales in descending order.
7) Total sales amount and order count for each product category
SELECT p.category, SUM(od.amount) AS amount, COUNT(od.order_id) AS order_count: This part of the query selects the columns to be included in the final result set. It selects the "category" column from the "products" table, calculates the total sales amount by summing the "amount" column from the "order_details" table, and calculates the order count by counting the "order_id" column from the "order_details" table.
FROM products p JOIN order_details od ON p.product_id = od.product_id: This part of the query joins the "products" table with the "order_details" table using the "product_id" column as the join condition.
GROUP BY category: This part of the query groups the results by the "category" column from the "products" table.
ORDER BY amount DESC: This part of the query orders the results by the sales amount in descending order.
Overall, the query will return a result set that shows the total sales amount and order count for each product category, sorted by the sales amount in descending order.
8) Total sales for the top-selling product in each state
SELECT state, product_id, SUM(amount) AS total_sales: This part of the query selects the columns to be included in the final result set. It selects the "state" and "product_id" columns from the subquery and calculates the total sales by summing the "amount" column from the subquery.
FROM (SELECT ...) sub: This part of the query creates a subquery that selects the state, product ID, and amount columns from the "orders" and "order_details" tables and ranks the sales of each product by state using the "ROW_NUMBER()" function. The subquery is then treated as a virtual table called "sub" that the outer query can reference.
WHERE rn = 1: This part of the query filters the subquery results to only include the top-selling product in each state, which is determined by the product with the highest sales amount (ranked as 1 by the "ROW_NUMBER()" function).
GROUP BY state, product_id: This part of the query groups the results by the "state" and "product_id" columns.
Overall, the query will return a result set that shows the total sales for the top-selling product in each state.
9) Average order amount for each age group
SELECT ...: This part of the query selects the columns to be included in the final result set. It uses a CASE statement to group customers into age ranges and assigns them to corresponding age groups.
AVG(od.amount) AS avg_amount: This part of the query calculates the average order amount for each age group and renames the result column as "avg_amount".
FROM orders o JOIN order_details od ON o.order_id = od.order_id JOIN customers c ON od.cust_id = c.cust_id: This part of the query joins the "orders", "order_details", and "customers" tables on the "order_id" and "cust_id" columns to retrieve the necessary data.
GROUP BY age_group: This part of the query groups the results by the "age_group" column.
Overall, the query will return a result set that shows the average order amount for each age group. The customers are categorized into age ranges using a CASE statement, and the average order amount is calculated and grouped by the corresponding age group.
10) Total number of orders for each combination of channel and whether the order is a B2B order
The SELECT statement selects two columns from the orders table (channel and b2b) and the count of all rows (i.e., the total number of orders) with the alias total_orders.
The FROM clause specifies that we want to join the orders table with the order_details table on the order_id column, so that we can get information about the order and its details.
The GROUP BY clause groups the results by the channel and b2b columns. This means that the results will be grouped into sets where each set has the same channel and b2b values.
Finally, the COUNT(*) function counts the number of rows in each group, giving us the total number of orders for each combination of channel and b2b.
11) Correlation between Age and orders
This query calculates the correlation coefficient between the "amount" column in the "order_details" table and the "age" column in the "customers" table. The correlation coefficient measures the strength and direction of the linear relationship between two variables, and takes a value between -1 and 1. A value of 0 indicates no linear correlation, while a value of 1 or -1 indicates a perfect positive or negative linear correlation, respectively.
The query achieves this by joining the two tables on the "cust_id" column, and then using the "corr" function to calculate the correlation coefficient between the "amount" and "age" columns.
12) Most Popular Product size
This query is selecting the "size" column from the "order_details" table, and then groups the rows by the distinct values of "size". It then counts the number of rows in each group using the COUNT function, and renames the count column as "size_count". Finally, it orders the result set in descending order based on the count of each group and limits the result set to the first row using the LIMIT keyword.
Essentially, the query is trying to find the most commonly ordered size by counting the number of orders for each size and returning the size with the highest count.
13) Most Popular Product size by age group
This query groups orders by the customer's age group and the size of the product ordered. The CASE statement is used to create the age groups based on the customer's age. The result includes three columns: age_group, size, and size_count. The age_group column represents the customer's age group, the size column represents the size of the product ordered, and the size_count column represents the number of products ordered with that size by customers in that age group. The results are ordered by size_count in descending order.
REPORT
Create a new sheet for the report section.
Copy and paste all pivot charts from other sheets to the report sheet.
Add slicers Months, Channel, Category, and Connect to all pivot charts.
Arrange all pivot charts in a suitable format.
Vrinda Store Workbook Here Click Please to view.
INSIGHTS
Top sales in march month.
The women's purchase rate is high.
The delivered Status is 92%.
The top 3 states are Maharashtra, Karnataka, and Uttar Pradesh.
The top purchased category is "Set".
Most sales in women, in each category.
Maximum sales by amazon are 35%.
CONCLUSION
Target women customers of age group (30-49 yrs) living in Maharashtra, Karnataka and Uttar Pradesh by showing ads/offers/coupons available on Amazon, Flipkart and Myntra.
Vrinda Store Analysis PPT
In this PPT, we analyze the Vrinda store data annual sales report for 2022.
Download Material
How to download and use material :
1) Go to the following link.
2) Download and Open Vrinda_Store_Tables_Create.sql file in Postgresql and Run it.
3) Open the data folder download each table data then Add data in each table by using Import/Export option in Postgresql.
4) Then we perform analysis. (for helping , use Vrinda Analysis Using Postgresql.sql file)