Snowflake: Simplify Ranking Syntax with QUALIFY

Common Task: Ranking Data and Selecting Top n Records

A common task for database developers is to rank a set of data and return the top n records – the top 5 companies per year by total sales volume, for example.

Coming from a SQL Server background, this would typically be handled by a CTE or a nested table or other method. By doing so, it was a two step process:

  1. Select the base data with a CTE, temporary table, nested table, (insert other option here) and apply a window function such as RANK, DENSE_RANK, etc.
  2. Select the top n records from the data set in step 1.

Similar methods work in Snowflake as well. Using the Snowflake sample data to retrieve the top 5 company keys by year ranked by total sales with a CTE would look something like this:

--SELECT THE YEAR, CUSTOMER KEY, SUM OF TOTAL SALES and RANK BY SALES BY YEAR FOR 1992-1995
WITH SALES_RNK AS(
    SELECT
        YEAR(O_ORDERDATE) as ORDER_YEAR,
        O_CUSTKEY as CUSTOMER_KEY,
        SUM(O_TOTALPRICE) as TOTAL_SALES,
        RANK() OVER(PARTITION BY YEAR(O_ORDERDATE) ORDER BY SUM(O_TOTALPRICE) DESC) as CUST_YR_SALES_RANK
    FROM
        TPCH_SF100.ORDERS
    WHERE
        YEAR(O_ORDERDATE) BETWEEN 1992
        AND 1995
    GROUP BY
        YEAR(O_ORDERDATE),
        O_CUSTKEY
)

--SELECT ONLY THOSE COMPANIES RANKED IN THE TOP 5
SELECT
    *
FROM
    SALES_RNK
WHERE
    CUST_YR_SALES_RANK <= 5
ORDER BY
    ORDER_YEAR,
    CUST_YR_SALES_RANK;

Using QUALIFY to Simplify Query Syntax

Snowflake provides the QUALIFY operator to filter the results of a query based upon a window function (i.e. RANK). QUALIFY is applied in the query similar to the HAVING function used in the GROUP BY command.

The query syntax above can be simplified to a single query by using QUALIFY as:

SELECT
    YEAR(O_ORDERDATE) as ORDER_YEAR,
    O_CUSTKEY as CUSTOMER_KEY,
    SUM(O_TOTALPRICE) as TOTAL_SALES,
    RANK() OVER(PARTITION BY YEAR(O_ORDERDATE) ORDER BY SUM(O_TOTALPRICE) DESC) as CUST_YR_SALES_RANK
FROM
    TPCH_SF100.ORDERS
WHERE
    YEAR(O_ORDERDATE) BETWEEN 1992 AND 1995
GROUP BY
    YEAR(O_ORDERDATE), O_CUSTKEY
QUALIFY
    CUST_YR_SALES_RANK <= 5
ORDER BY
    ORDER_YEAR,
    CUST_YR_SALES_RANK;

In this example, output from both queries is identical, and the query plans are similar. Using QUALIFY doesn’t appear to have any real performance gain or loss in this example.

So why use QUALIFY?

Since ranking and picking “winning”, or the top n records is a task that occurs frequently with my clients, using QUALIFY helps simplify the code used for ranking or picking records. “Future Me” thanks “Present Me” when doing maintenance or enhancements as the code is more legible and requires less “digging” to see what’s being ranked in the CTE/derived table.

For additional information, you can read more about the QUALIFY clause in the Snowflake documentation.

Leave a comment