HackerRank SQL Problem
15 Days of Learning SQL
15 Days of Learning SQL
HackerRank provides competitive programming challenges that can be solved in various programming languages, including SQL. There are 58 challenges for SQL, and this '15 Days of Learning SQL' challenge is the last and the hardest one. Also, this problem required the broadest range of SQL skills I've learned so far, so it was a perfect task to showcase what I can perform now.
In my learning SQL, I mainly worked with Oracle, MySQL, and MS SQL Server, and I solved this problem with MS SQL Server.
Sample Output:
2016-03-01 4 20703 Angela
2016-03-02 2 79722 Michael
2016-03-03 2 20703 Angela
2016-03-04 2 20703 Angela
2016-03-05 1 36396 Frank
2016-03-06 1 20703 Angela
The following tables hold the contest data:
'Hackers' Table
'Submissions' Table
Let's remind ourselves of the information we need to extract from the database:
(1) The IDs and the names of the hackers who made the highest number of submissions each day
(2) The unique number of hackers for each day who never missed a submission to the day from day one
The good news is we don't deal with the 'Hackers' table right now. We can plug it in later with INNER JOIN using the common variable, 'hacker_id.'
I wish I could develop a single set of codes to get all the necessary information, but unfortunately, I can't see it right now. I will deal with (1) and (2) separately for now. So, let's break the problem into three parts: (1) and (2), and (3) combine these two.
The first thing I want is a table showing who made how many submissions on each day since we need to find the hacker_id and name of the hacker who made the highest number of submissions each day.
select submission_date, hacker_id, count(submission_id) submission_per_day
from Submissions
group by submission_date, hacker_id
order by submission_per_day;
The 'order by' clause was not necessary, but I added it because it would help make sense of the data. From these several lines of output, you can see that 84307 submitted four submissions on 2016-03-06, and 75635 submitted three on the same day. In this case, we only need 84307's record of four submissions for the date of 2016-03-06. If another four submissions were made on the same date by a different hacker whose ID is 80000, we would need this because "if more than one such hacker has a maximum number of submissions, print the lowest hacker_id."
Luckily, we have a way to find who recorded the highest number of submissions of a day and, at the same time, make sure to pick the lowest hacker_id if more than one hacker recorded the highest number of submissions for the day. It's the window function: rank()
select submission_date, hacker_id, submission_per_day,
rank() over(partition by submission_date order by submission_per_day desc, hacker_id) as daily_submission_rank
from (select submission_date, hacker_id, count(submission_id) as submission_per_day
from Submissions
group by submission_date, hacker_id) a;
Please look at the snapshots of the results below, in which I deliberately present only the uppermost parts of three-day records. You'll see hacker_ids who made the highest number of submissions get the rank number one (2016-03-01), and the rank number one goes to the lower ID if more than one hacker_ids made the same highest number of submissions (2016-03-02, 2016-03-03).
And this means I can filter those number ones all at once with where daily_submission_rank = 1 to get the result that is exactly required by the problem.
We need the IDs of hackers who submitted their work daily from day one to the last. This time, the information on an individual hacker's number of submissions is not essential. Instead, we need to know who made consecutive 15-day submissions. Let's first give each day a number, with the last day having the number of 15. We will need the dense_rank() function to ensure that '2016-03-02' has exactly the number '2' instead of '115' after 114 '2016-03-01's.
select submission_date, hacker_id,
dense_rank() over(order by submission_date) as dayCount
from Submissions;
How do we find the consecutive 15-day submitting hackers?
If I come up with a way to add up the number of consecutive submission days by a hacker from that day and the previous day based on a given date, and the number of submission days matches the number we just assigned to each day with dense_rank(), that'll be our list of hackers who made their submissions every day for this 15-day challenge.
select a.submission_date, a.hacker_id, a.dayCount,
case when a.submission_date = '2016-03-01' then 1
else 1+(select count(distinct b.submission_date)
from (select submission_date, hacker_id,
dense_rank() over(order by submission_date) as dayCount
from Submissions) b
where b.hacker_id = a.hacker_id and
b.submission_date < a.submission_date
)
end as subCount
from (select submission_date, hacker_id, dense_rank() over(order by submission_date) as dayCount
from Submissions) a
Let's take a look at the snapshot of the code's result.
As described in the code itself, the sequence of the columns is as follows:
(1) submission_date
(2) hacker_id
(3) The number given to each day (dayCount)
(4) The number of days each hacker made at least one submission to the corresponding submission date (subCount)
So, from the table above, if we find rows that have the same numbers of dayCount and subCount all the way to 2016-03-15, that's our hacker_ids who made their consecutive submissions from day one to fifteen (2016-03-01 ~ 2016-03-15).
We need the number of hackers who have been doing it daily at each point.
with DaysandSubs as(
select a.submission_date, a.hacker_id, a.dayCount,
case when a.submission_date = '2016-03-01' then 1
else 1+(select count(distinct b.submission_date)
from (select submission_date, hacker_id,
dense_rank() over(order by submission_date) as dayCount
from Submissions) b
where b.hacker_id = a.hacker_id and
b.submission_date < a.submission_date
)
end as subCount
from (select submission_date, hacker_id, dense_rank() over(order by submission_date) as dayCount
from Submissions) a
)
select submission_date, count(distinct hacker_id)
from DaysandSubs
where dayCount = consec_sub
group by submission_date;
We have found all of our answers, but they are in separate rooms. We have to combine them to get one final result. Unfortunately, I can't do it with a single set of codes. So, until I figure out a better way, I have no choice but to make the most of Common Table Expressions(CTEs) and combine them with JOINs to get our final result. Our CTEs are in bold.
with DailySubRank as (
select submission_date, hacker_id, submission_per_day,
rank() over(partition by submission_date order by submission_per_day desc, hacker_id) as daily_submission_rank
from (select submission_date, hacker_id, count(submission_id) as submission_per_day
from Submissions
group by submission_date, hacker_id) a
),
DaysandSubs as(
select a.submission_date, a.hacker_id, a.dayCount,
case when a.submission_date = '2016-03-01' then 1
else 1+(select count(distinct b.submission_date)
from (select submission_date, hacker_id,
dense_rank() over(order by submission_date) as dayCount
from Submissions) b
where b.hacker_id = a.hacker_id and
b.submission_date < a.submission_date
)
end as subCount
from (select submission_date, hacker_id, dense_rank() over(order by submission_date) as dayCount
from Submissions) a
),
EverydaySubs as(
select submission_date, count(distinct hacker_id) as everyday_hackers
from DaysandSubs
where dayCount = subCount
group by submission_date
)
select DailySubRank.submission_date, EverydaySubs.everyday_hackers, DailySubRank.hacker_id, Hackers.name
from DailySubRank
inner join EverydaySubs on DailySubRank.submission_date = EverydaySubs.submission_date
inner join Hackers on DailySubRank.hacker_id = Hackers.hacker_id
where DailySubRank.daily_submission_rank = 1;