This article is an extension of the window function, to follow up on each of the window functions.
The FIRST_VALUE and LAST_VALUE functions will allow you to retrieve the first and last values for a given window (respectively).
Taking the sales example for another run, we can compare every single sale for the day to see how it stacked up against the first sale of the day with the following query:
This demonstrates how the first value can be drawn out of a set, per window.
Inversely, you simply use the LAST_VALUE function to get the last value; if you wanted to see how each sale stacked up against the last sale of the day:
This article is an extension of the window function, to follow up on each of the window functions.
The DENSE_RANK function will use the ORDER BY expression to determine a rank of a value amongst groups. If a PARTITION BY is present, this is also included in the reset of groups.
The important difference between RANK and DENSE_RANK is that RANK will use to total number of rows to nominate it’s highest rank value; whereas DENSE_RANK will only use the number of ranks calculated, and the highest rank.
The sale_id, quantity, and unit_cost are retrieved and ranked by the quantity value. Both RANK and DENSE_RANK are used in this example to illustrate the difference in value distribution (along the number line).
As there is a change in quantity, a new rank value is assigned:
As sales people have sold the same amount in different sales, you can see that the cumulative distribution value makes accumulative jumps.
salesperson
value
distr
Bob
26
0.25
Bob
52
0.5
Bob
192
1.0
Bob
192
1.0
John
52
0.125
John
60
0.25
John
96
0.5
John
96
0.5
John
156
0.875
John
156
0.875
John
156
0.875
John
468
1.0
June
20
0.3333333333333333
June
156
0.6666666666666666
June
468
1.0
Sally
20
0.2
Sally
26
0.4
Sally
60
0.6
Sally
130
1.0
Sally
130
1.0
Taking an interesting piece of this window, we focus in on John.
salesperson
value
distr
John
52
0.125
John
60
0.25
John
96
0.5
John
96
0.5
John
156
0.875
John
156
0.875
John
156
0.875
John
468
1.0
The repeated value amounts don’t seem to shift the distribution value (because of its cumulativity). By the time the next amount is present, there’s a significant jump. This is a property of this set because the data is ordered by the value column.
We list the sale_date and salesperson out for every sale on record. The COUNT partitions by the date of the sale made. This makes the sale_number tell us what number the sale was for that given date.
sale_date
salesperson
sales_count
2018-05-02
Bob
1
2018-05-13
Sally
1
2018-05-13
June
2
2018-05-14
John
1
2018-05-25
Bob
1
2018-05-25
Sally
2
2018-05-26
John
1
2018-05-27
John
1
2018-05-27
June
2
2018-05-27
June
3
2018-06-02
Sally
1
2018-06-03
John
1
2018-06-03
John
2
2018-06-12
John
1
2018-06-13
Bob
1
2018-06-13
Sally
2
2018-06-15
John
1
2018-06-24
Bob
1
2018-06-24
Sally
2
2018-06-29
John
1
Taking 2018-05-27, we can see that John got the first sale, June got the second and third.
This takes the average cost value (quantity * unit_cost), but personalises the value by salesperson. Each salesperson is going to have their average calculated in isolation because of PARTITION BY salesperson.