Ranking Data - with Duplicates

Post date: 01-Sep-2010 07:49:35

Ranking Data - Using Transact-SQL

Ranking data consists of:

- Finding the top N values

- Listing data in a particular order

- Listing ordinal numbers (the ranking)

There are several ways to rank data, depending on how you want to handle

duplicate values:

1- Not applicable, when duplicates are not possible in the source data

2- Allow duplicate values

3- Allow duplicates with gaps in the ranking

1 2 3 No dup. Dup. in Dup. in data, in data,easy data create gaps =========== ========== ========= 1 197 1 197 1 197 2 163 2 185 2 185 3 155 2 185 2 185 4 132 3 155 4 155 <-- 5 120 4 153 5 153 6 119 5 151 6 151 7 117 6 145 7 145 8 114 7 133 8 133 9 110 8 121 9 121 10 108 9 118 10 118 10 116

/* Sybase 12.5.3 does have a "top N" feature, for getting top 10 lists (for example) Method #1: Built-in Top-N, no rank # */ -- show top 10 select top 10 gross_sales, ticker from sales_data order by mkt_cap desc go /* Method #2: Rowcount Top-N, no rank # */ set rowcount 20 go select gross_sales, ticker from sales_data order by mkt_cap desc go set rowcount 0 go /* Method #3: Rank all rows, display rank # Works best where there are no duplicate values If the data being ranked does not contain duplicates, the easiest method is to: Copy the data into a temporary table with an identity column Use order by to sort the rows Note: If there are duplicate values, the rank assigned will be somewhat random. */ select rank = identity(10), ticker, gross_sales into #top from sales_data order by gross_sales desc go select * from #top where rank <= 20 go drop table #top go /* Method #4: Rank all rows, display rank, ties share ranking Where there may be duplicate values If the data to be ranked contains duplicates, Copy the data into a temporary table Join the temporary table back to the base table Final select uses rank_id in the WHERE clause. */ select rank = identity(10), gross_sales into #top from sales_data group by gross_sales order by gross_sales desc go select t1.rank, t2.ticker, t2.gross_sales from #top t1, sales_data t2 where t1.gross_sales = t2.gross_sales and rank <= 20 order by t2.gross_sales desc go drop table #top go /* Method #5: Rank N rows, display rank, ties share ranking If you want to rank only the top N values, use rowcount on the select into command */ set rowcount 20 go select rank = identity(10), gross_sales into #top from sales_data group by gross_sales order by gross_sales desc go set rowcount 0 go select t1.rank, t2.ticker, t2.gross_sales from #top t1, sales_data t2 where t1.gross_sales = t2.gross_sales order by t2.gross_sales desc go drop table #top go /* Method #6: Alternate to #4/5 There is an alternate solution for ranking data that contains duplicates: This method uses a correlated subquery in the select clause The previous method will probably outperform this method */ set rowcount 30 go select (select count(distinct t2.gross_sales) from sales_data t2 where t2.gross_sales >= t1.gross_sales) as rank, t1.ticker, t1.gross_sales from sales_data t1 where t1.gross_sales is not null order by rank go /* Method #7: Rank all rows, display rank, ties share ranking and take up subsequent rankings "Ranking with Gaps" "Olympic Style" The previous methods assume that each ordinal number will be followed by the next number For example, 6th place would follow 5th place However, if there are two 5th place entries, you may prefer to follow them with 7th place: */ set rowcount 30 go select (select count(t2.gross_sales) + 1 from sales_data t2 where t2.gross_sales > t1.gross_sales) as rank, t1.ticker, t1.gross_sales from sales_data t1 where t1.gross_sales is not null order by rank go