There are several techniques that can be used to calculate median in SQL Server 2005 and higher. The most efficient that IPAR has used so far uses the ROW_NUMBER function.
Technique taken from Itzik Ben Gan's article here: http://www.sqlmag.com/article/tsql3/calculating-the-median-gets-simpler-in-sql-server-2005.aspx. We partition by StudentId and order by whatever value we want the median of, in this case TermAttemptedCredits. Notice the use of the "windowed" count(*) function, too, which gives count(*) per StudentId
;with cte_CreditLoad as
(
select StudentId, sh.TermAttemptedCredits,
row_number() over(partition by StudentId order by sh.TermAttemptedCredits) as row,
count(*) over(partition by sh.StudentId) as cnt
from dbo.StudentHistory sh
where sh.RecordType = 'T'
)
-- the key to the median calculation is "row in ((cnt+1)/2, (cnt+2)/2)"
-- which yields one row where cnt is odd, and the middle two rows where even
select StudentId, avg(cl.TermAttemptedCredits) as MedianAttemptedCredits
from cte_CreditLoad cl
where row in ((cnt+1)/2, (cnt+2)/2)
group by cl.StudentID
Another way to calculate median relies on the NTILE function and can also return QUARTILES. Because of the way that SQL Server assigns rows to NTILE groups and the fact that the code below always gives the average of two rows (even when the dataset has an odd number of rows), it can be less accurate than the ROW_NUMBER() code above, but for IPAR uses the difference will not usually be significant.
; with cte_Data as
(
select ClassCode Strata, CumulativeGPA DeptVar
from StudentHistory
where termid=20103 and recordtype='f' and CumulativeGPA is not null
)
, cte_Tiles as
(
select Strata, Tile, min(DeptVar) TileMin, max(DeptVar) TileMax
from
(
select Strata, DeptVar, ntile(4) over (partition by Strata order by DeptVar) Tile
from cte_Data
) a
group by Tile, Strata
)
select Strata, [Q1], [M], [Q3]
from
(
select m1.Strata, case m1.Tile when 1 then 'Q1' when 2 then 'M' when 3 then 'Q3' end Quartile, (m1.TileMax+m2.TileMin)/2 Cutoff
from cte_Tiles m1
join cte_Tiles m2
on m1.Strata=m2.Strata
and m1.Tile=m2.Tile-1
) a
pivot (max(Cutoff) for Quartile in ([Q1], [M], [Q3]) ) pvt
Finally, one last way of determining the median of a set is to use the TOP (50) PERCENT statement, as in the code below:
; with cte_Data as
(
select ClassCode Strata, CumulativeGPA DeptVar
from StudentHistory
where termid=20103 and recordtype='F' and CumulativeGPA is not null
)
select Strata,
((
select max(DeptVar)
from
(
select top (50) percent DeptVar
from cte_Data as d2
where d2.Strata = d.Strata
order by DeptVar
) as M1
) +
(
select min(DeptVar)
from
(
select top (50) percent DeptVar
from cte_Data as d3
where d3.Strata = d.Strata
order by DeptVar desc
) as M2
)) / cast(2 as float) as median
from
(
select distinct Strata
from cte_Data
) as d
order by Strata