RATIO_TO_REPORT(quantity::decimal * unit_cost) is what gives us the value that we’re working with in terms of ratio; the PARTITION BY sale_date then gives us the window; these ratios need to be calculated for the day.
sale_date
salesperson
?column?
ratio_to_report
2018-05-02
Bob
26
1.0
2018-05-13
Sally
60
0.2777777777777778
2018-05-13
June
156
0.7222222222222222
2018-05-14
John
96
1.0
2018-05-25
Bob
192
0.5962732919254659
2018-05-25
Sally
130
0.40372670807453415
2018-05-26
John
156
1.0
2018-05-27
John
52
0.0962962962962963
2018-05-27
June
20
0.037037037037037035
2018-05-27
June
468
0.8666666666666667
2018-06-02
Sally
26
1.0
2018-06-03
John
60
0.2777777777777778
2018-06-03
John
156
0.7222222222222222
2018-06-12
John
96
1.0
2018-06-13
Bob
192
0.5962732919254659
2018-06-13
Sally
130
0.40372670807453415
2018-06-15
John
156
1.0
2018-06-24
Bob
52
0.7222222222222222
2018-06-24
Sally
20
0.2777777777777778
2018-06-29
John
468
1.0
ROW_NUMBER
ROW_NUMBER is a utility function that simply gives the row an ordinal value, counting up from 1; over the window.
We count the sales for the day, by applying ROW_NUMBER over the sale_date.
This article is an extension of the window function, to follow up on each of the window functions.
The LISTAGG function will aggregate values that are collected within the window defined by the partition expression.
In this example, we’ll list each sale in the database; adding a column at the end which will list all of the salespeople who made a sale on that particular day.
This article is an extension of the window function, to follow up on each of the window functions.
The LEAD function returns values in an offset row. The offset is given by the second argument to the function itself. The offset pushes the index below (or after) the selected row.
To demonstrate the effect of this function, we’ll take the sales_id value, and then we’ll also take a led sales_id. By adjusting the offset value, you’ll see how the leaded value ascends up the rows.
This article is an extension of the window function, to follow up on each of the window functions.
The LAG function returns values in an offset row. The offset is given by the second argument to the function itself. The offset pushes the index above (or before) the selected row.
To demonstrate the effect of this function, we’ll take the sales_id value, and then we’ll also take a lagged sales_id. By adjusting the offset value, you’ll see how the lagged value slips down the rows.
Window functions allow database developers to perform analysis over partitions of information, very quickly. Prior to window functions, developers would need to create sub-queries (or common table expressions) that would allow their windows to be created. Primary queries would then work over the sub-queries to propgate a result. This now goes away, using the OVER syntax.
In today’s article we’ll walk through the basic structure of how to use window functions.
function (expression) OVER (
[ PARTITION BY expr_list ]
[ ORDER BY order_list [ frame_clause ] ] )
where function is one of the functions described in this section and expr_list is:
expression | column_name [, expr_list ]
and order_list is:
expression | column_name [ ASC | DESC ]
[ NULLS FIRST | NULLS LAST ]
[, order_list ]
and frame_clause is:
ROWS
{ UNBOUNDED PRECEDING | unsigned_value PRECEDING | CURRENT ROW } |
{BETWEEN
{ UNBOUNDED PRECEDING | unsigned_value { PRECEDING | FOLLOWING } |
CURRENT ROW}
AND
{ UNBOUNDED FOLLOWING | unsigned_value { PRECEDING | FOLLOWING } |
CURRENT ROW }}
The function gets applied to an expression of columns that you’re extracting from your query. The amount of information that the function is applied to is controlled by PARTITION BY. How the information is ordered within the window is then controlled by the ORDER BY.
This is all text-booky kind of stuff, so let’s apply it to some actual data in the database.
Sample Data
For today’s article, lets consider a really simple sales breakdown. No relationships to other table; everything is locallised and contained in here (no that it needs to be - it just makes the example simpler).
This offers the same result, with a much cleaner query experience.
To look into this data a little further, and perform the same operation over the salesperson’s averages, we simply swap out the PARTITION BY expression:
This now tells us how each of the saleperson’s sales performed against their average.
Functions
With a basic understanding of the windowing function framework and how it can be applied to your queries, any of the supported functions can be use. The actual documentation of these functions can be found in the AWS documentation for Redshift, so I won’t reproduce that material here.
The list of functions are:
AVG
COUNT
CUME_DIST
DENSE_RANK
FIRST_VALUE
LAST_VALUE
LAG
LEAD
LISTAGG
MAX
MEDIAN
MIN
NTH_VALUE
NTILE
PERCENT_RANK
PERCENTILE_CONT
PERCENTILE_DISC
RANK
RATIO_TO_REPORT
ROW_NUMBER
STDDEV_SAMP
STDDEV_POP
SUM
VAR_SAMP
VAR_POP
In future articles I’ll cover these functions and how they can be used to perform analysis over your datasets.