Cogs and Levers A blog full of technical stuff

Window Function FIRST_VALUE and LAST_VALUE

This article is an extension of the window function, to follow up on each of the window functions.

The FIRST_VALUE and LAST_VALUE functions will allow you to retrieve the first and last values for a given window (respectively).

Taking the sales example for another run, we can compare every single sale for the day to see how it stacked up against the first sale of the day with the following query:

SELECT sale_date, salesperson, quantity::decimal * unit_cost,
	   FIRST_VALUE(quantity::decimal * unit_cost) OVER (
         PARTITION BY sale_date
         ORDER BY sale_id
         ROWS UNBOUNDED PRECEDING
       ) as sale_number
FROM public.sales
ORDER BY sale_date;

The result of which is as you’d expect:

sale_date salesperson ?column? sale_number
2018-05-02 Bob 26 26
2018-05-13 Sally 60 60
2018-05-13 June 156 60
2018-05-14 John 96 96
2018-05-25 Bob 192 192
2018-05-25 Sally 130 192
2018-05-26 John 156 156
2018-05-27 John 52 52
2018-05-27 June 20 52
2018-05-27 June 468 52
2018-06-02 Sally 26 26
2018-06-03 John 60 60
2018-06-03 John 156 60
2018-06-12 John 96 96
2018-06-13 Bob 192 192
2018-06-13 Sally 130 192
2018-06-15 John 156 156
2018-06-24 Bob 52 52
2018-06-24 Sally 20 52
2018-06-29 John 468 468

This demonstrates how the first value can be drawn out of a set, per window.

Inversely, you simply use the LAST_VALUE function to get the last value; if you wanted to see how each sale stacked up against the last sale of the day:

SELECT sale_date, salesperson, quantity::decimal * unit_cost,
	   LAST_VALUE(quantity::decimal * unit_cost) OVER (
         PARTITION BY sale_date
         ORDER BY sale_id
         ROWS UNBOUNDED PRECEDING
       ) as sale_number
FROM public.sales
ORDER BY sale_date;

Window Function DENSE_RANK

This article is an extension of the window function, to follow up on each of the window functions.

The DENSE_RANK function will use the ORDER BY expression to determine a rank of a value amongst groups. If a PARTITION BY is present, this is also included in the reset of groups.

The important difference between RANK and DENSE_RANK is that RANK will use to total number of rows to nominate it’s highest rank value; whereas DENSE_RANK will only use the number of ranks calculated, and the highest rank.

Take the following for example:

SELECT sale_id, quantity, unit_cost,
   DENSE_RANK() OVER (ORDER BY quantity DESC),
   RANK() OVER (ORDER BY quantity DESC)
FROM sales
ORDER BY quantity, sale_id;

The sale_id, quantity, and unit_cost are retrieved and ranked by the quantity value. Both RANK and DENSE_RANK are used in this example to illustrate the difference in value distribution (along the number line).

As there is a change in quantity, a new rank value is assigned:

sale_id quantity unit_cost dense_rank rank
1 1 26 4 11
3 1 156 4 11
4 1 96 4 11
7 1 156 4 11
9 1 20 4 11
11 1 26 4 11
13 1 156 4 11
14 1 96 4 11
17 1 156 4 11
19 1 20 4 11
5 2 96 3 7
8 2 26 3 7
15 2 96 3 7
18 2 26 3 7
2 3 20 2 3
10 3 156 2 3
12 3 20 2 3
20 3 156 2 3
6 5 26 1 1
16 5 26 1 1

Window Function CUME_DIST

This article is an extension of the window function, to follow up on each of the window functions.

The CUME_DIST function calculates the cumulative distribution of a value, for the data that’s seen.

An example of calculating the cumulative distribution of each salesperson, and the amount of value that they’ve sold would look like this:

SELECT salesperson, quantity::decimal * unit_cost,
	   CUME_DIST() OVER (
         PARTITION BY salesperson
         ORDER BY (quantity::decimal * unit_cost)
       ) as distr
FROM public.sales
ORDER BY salesperson;

As sales people have sold the same amount in different sales, you can see that the cumulative distribution value makes accumulative jumps.

salesperson value distr
Bob 26 0.25
Bob 52 0.5
Bob 192 1.0
Bob 192 1.0
John 52 0.125
John 60 0.25
John 96 0.5
John 96 0.5
John 156 0.875
John 156 0.875
John 156 0.875
John 468 1.0
June 20 0.3333333333333333
June 156 0.6666666666666666
June 468 1.0
Sally 20 0.2
Sally 26 0.4
Sally 60 0.6
Sally 130 1.0
Sally 130 1.0

Taking an interesting piece of this window, we focus in on John.

salesperson value distr
John 52 0.125
John 60 0.25
John 96 0.5
John 96 0.5
John 156 0.875
John 156 0.875
John 156 0.875
John 468 1.0

The repeated value amounts don’t seem to shift the distribution value (because of its cumulativity). By the time the next amount is present, there’s a significant jump. This is a property of this set because the data is ordered by the value column.

Window Function COUNT

This article is an extension of the window function, to follow up on each of the window functions.

The COUNT function will counts each row that contains non-NULL values for the given expression.

An example usage over the test data set would be:

SELECT sale_date, salesperson,
	   COUNT(*) OVER (
         PARTITION BY sale_date
         ORDER BY sale_id
         ROWS UNBOUNDED PRECEDING
       ) as sale_number
FROM public.sales
ORDER BY sale_date;

We list the sale_date and salesperson out for every sale on record. The COUNT partitions by the date of the sale made. This makes the sale_number tell us what number the sale was for that given date.

sale_date salesperson sales_count
2018-05-02 Bob 1
2018-05-13 Sally 1
2018-05-13 June 2
2018-05-14 John 1
2018-05-25 Bob 1
2018-05-25 Sally 2
2018-05-26 John 1
2018-05-27 John 1
2018-05-27 June 2
2018-05-27 June 3
2018-06-02 Sally 1
2018-06-03 John 1
2018-06-03 John 2
2018-06-12 John 1
2018-06-13 Bob 1
2018-06-13 Sally 2
2018-06-15 John 1
2018-06-24 Bob 1
2018-06-24 Sally 2
2018-06-29 John 1

Taking 2018-05-27, we can see that John got the first sale, June got the second and third.

Window Function AVG

This article is an extension of the window function, to follow up on each of the window functions.

The AVG function takes the arithmetic mean of the input that it sees.

An example usage over the test data set would be:

SELECT salesperson, sale_date, (quantity * unit_cost),
	   avg(quantity * unit_cost) OVER (
         PARTITION BY salesperson
       ) as month_avg
FROM public.sales
ORDER BY salesperson, sale_date DESC;

This takes the average cost value (quantity * unit_cost), but personalises the value by salesperson. Each salesperson is going to have their average calculated in isolation because of PARTITION BY salesperson.