A Step-by-Step Guide on How to Practice e-Commerce SQL Queries
A Step-by-Step Guide on How to Practice e-Commerce SQL Queries
And How to Think Like an e_Commerce Analyst
Let's begin with the help of ChatGPT 4o to create datasets.
The beauty of having AI help create the datasets is…
1) you can develop creative data scenarios and not have to browse endlessly for a dataset that matches your needs.
2) each time you have AI create a dataset it will be different by default (new values). Plus, you can add parameters such as “2% of the data will be NULL or inconsistent” which forces you to do data cleaning.
AI PROMPT:
Please provide an SQL table creation script that creates and populates a table that has the following columns: session_id, marketing_channel, and purchase_value. Please populate the table with 200 rows where 20% of the purchase values are 0.00
-- Create the Play table
CREATE TABLE EComm_ATT_TAB (
session_id INT PRIMARY KEY,
marketing_channel VARCHAR(50),
purchase_value DECIMAL(10, 2)
);
-- Populate the Play table with 200 sample data where 20% of the purchase values are 0.00
INSERT INTO EComm_ATT_TAB (session_id, marketing_channel, purchase_value) VALUES
(1, 'Email', 120.50),
(2, 'Social Media', 0.00),
(3, 'Email', 78.75),
(4, 'Organic Search', 0.00),
(5, 'Paid Search', 150.25),
(6, 'Referral', 0.00),
(7, 'Email', 110.10),
(8, 'Social Media', 0.00),
(9, 'Paid Search', 175.00),
(10, 'Organic Search', 125.75),
(11, 'Email', 95.50),
This continues through 200 inserts…
AI PROMPT:
Please provide an SQL table creation script that creates and populates a table that has the following columns: session_id, ad_click_timestamp, and user_id. Please populate the table with 200 rows where the session_id values are 1 through 200 and there are 25 unique user_id values and the ad_click_timestamp column is populated with 200 unique datetime values spanning 1 24-hour period
-- Create the Play table
CREATE TABLE EComm_SESSIONS (
session_id INT PRIMARY KEY,
ad_click_timestamp DATETIME,
user_id INT
);
-- Populate the Play table with 200 rows
-- Generate unique user_id values and unique datetime values spanning a 24-hour period
-- Initialize variables
DECLARE @start_datetime DATETIME = '2023-01-01 00:00:00'; -- starting datetime
DECLARE @interval INT = 432; -- interval in seconds to ensure uniqueness within 24 hours
-- Loop to insert 200 rows
DECLARE @i INT = 1;
WHILE @i <= 200
BEGIN
INSERT INTO EComm_SESSIONS (session_id, ad_click_timestamp, user_id)
VALUES (
@i,
DATEADD(SECOND, (@i - 1) * @interval, @start_datetime), -- unique datetime
((@i - 1) % 25) + 1 -- 25 unique user_id values (1 through 25)
);
SET @i = @i + 1;
END;
-- Verify the data
SELECT * FROM EComm_SESSIONS;
Create/Import the tables into your SQL environment and let's start learning the data.
What is the average purchase value by marketing channel?
select a.marketing_channel, AVG(a.purchase_value)
from [dbo].[EComm_ATT_TAB] a
group by a.marketing_channel
order by AVG(a.purchase_value)DESC;
What % of link clicks convert to a purchase for each marketing channel?
SELECT
marketing_channel,
AVG(purchase_value) AS average_purchase_value,
AVG(CAST((CASE WHEN purchase_value > 0 THEN 1 ELSE 0 END) AS FLOAT)) AS conversion_rate
FROM
[dbo].[EComm_ATT_TAB]
GROUP BY
marketing_channel
ORDER BY
conversion_rate DESC;
What is our overall conversion rate?
SELECT
AVG(CAST((CASE WHEN purchase_value > 0 THEN 1 ELSE 0 END) AS FLOAT)) AS conversion_rate
FROM
[dbo].[EComm_ATT_TAB];
What is the customer lifetime value (CLV) for each user?
select a.[user_id], SUM(b.purchase_value) as CLV
from [dbo].[EComm_SESSIONS] a,
[dbo].[EComm_ATT_TAB] b
where a.[session_id] = b.[session_id]
--and b.purchase_value > 0
GROUP BY a.[user_id]
ORDER BY CLV DESC;
How can I check my results to make sure they are correct?
Just because an SQL runs smoothly does not mean it is producing the results you want. You don't want to be called out on the numbers because they don't make sense. Always go into a presentation expecting someone to ask that you explain how you obtained the information.
Let's see if we can validate the numbers.
I double check my sum of sales:
select
SUM(b.purchase_value) as CLV
from [dbo].[EComm_SESSIONS] a
INNER JOIN
[dbo].[EComm_ATT_TAB] b
ON a.[session_id] = b.[session_id]
;
The result was: 26942.50
I took the results from my previous query and put them in a spreadsheet to calculate the total sales and the result was an exact match.
The question we want to answer is, “How do we connect with our best customers? Therefore, into which marketing channel should we invest the most money and time?
Do we know the answers based on what we know so far?
What does “best customers” mean? Should we place more value upon those who loyally make small purchases or those who occasionally make large purchases? How can we simultaneously value and invest in customers who are valuable for various reasons and not find ourselves ignoring those who are our true bread-and-butter?
This is what we know so far:
Organic Search leads in average CLV and leads in the highest overall income.
Paid Search is strong in all categories but leads in none
Email is how we reach the most people for the first time and they are a close second for overall income but is 4th place for Average CLV
Referrals are our weakest first touch, but their spending is in 3rd place
Social Media first-contacts spend the least on average but bring in a 5th of our customers, equal to organic and paid.
This dataset is by chance, very equally distributed across the marketing channels, but by being that way, it reveals a problem:
It can be difficult to define your "best customer" and also difficult to determine where to invest.
Let's determine behavioral averages and see if we can build the following set of customer value categories:
High CLV + High visits = VERY_HIGH
High CLV only = HIGH_CLV
HIGH Visits only = HIGH_Clicks
You may also want to give attention to the “slow and steady” customers who are almost average in value and clicks = Moderate_Value
They may have the potential for higher engagement.
Everyone else is = LOW_Value