Common Task: Ranking Data and Selecting Top n Records
A common task for database developers is to rank a set of data and return the top n records – the top 5 companies per year by total sales volume, for example.
Coming from a SQL Server background, this would typically be handled by a CTE or a nested table or other method. By doing so, it was a two step process:
- Select the base data with a CTE, temporary table, nested table, (insert other option here) and apply a window function such as RANK, DENSE_RANK, etc.
- Select the top n records from the data set in step 1.
Similar methods work in Snowflake as well. Using the Snowflake sample data to retrieve the top 5 company keys by year ranked by total sales with a CTE would look something like this:
--SELECT THE YEAR, CUSTOMER KEY, SUM OF TOTAL SALES and RANK BY SALES BY YEAR FOR 1992-1995
WITH SALES_RNK AS(
SELECT
YEAR(O_ORDERDATE) as ORDER_YEAR,
O_CUSTKEY as CUSTOMER_KEY,
SUM(O_TOTALPRICE) as TOTAL_SALES,
RANK() OVER(PARTITION BY YEAR(O_ORDERDATE) ORDER BY SUM(O_TOTALPRICE) DESC) as CUST_YR_SALES_RANK
FROM
TPCH_SF100.ORDERS
WHERE
YEAR(O_ORDERDATE) BETWEEN 1992
AND 1995
GROUP BY
YEAR(O_ORDERDATE),
O_CUSTKEY
)
--SELECT ONLY THOSE COMPANIES RANKED IN THE TOP 5
SELECT
*
FROM
SALES_RNK
WHERE
CUST_YR_SALES_RANK <= 5
ORDER BY
ORDER_YEAR,
CUST_YR_SALES_RANK;
Using QUALIFY to Simplify Query Syntax
Snowflake provides the QUALIFY operator to filter the results of a query based upon a window function (i.e. RANK). QUALIFY is applied in the query similar to the HAVING function used in the GROUP BY command.
The query syntax above can be simplified to a single query by using QUALIFY as:
SELECT
YEAR(O_ORDERDATE) as ORDER_YEAR,
O_CUSTKEY as CUSTOMER_KEY,
SUM(O_TOTALPRICE) as TOTAL_SALES,
RANK() OVER(PARTITION BY YEAR(O_ORDERDATE) ORDER BY SUM(O_TOTALPRICE) DESC) as CUST_YR_SALES_RANK
FROM
TPCH_SF100.ORDERS
WHERE
YEAR(O_ORDERDATE) BETWEEN 1992 AND 1995
GROUP BY
YEAR(O_ORDERDATE), O_CUSTKEY
QUALIFY
CUST_YR_SALES_RANK <= 5
ORDER BY
ORDER_YEAR,
CUST_YR_SALES_RANK;
In this example, output from both queries is identical, and the query plans are similar. Using QUALIFY doesn’t appear to have any real performance gain or loss in this example.

So why use QUALIFY?
Since ranking and picking “winning”, or the top n records is a task that occurs frequently with my clients, using QUALIFY helps simplify the code used for ranking or picking records. “Future Me” thanks “Present Me” when doing maintenance or enhancements as the code is more legible and requires less “digging” to see what’s being ranked in the CTE/derived table.
For additional information, you can read more about the QUALIFY clause in the Snowflake documentation.