Range generation in Redshift
05 Jul 2019A little while ago, I’d written an article on Range generation in PostgreSQL, and whilst Redshift and PostgreSQL are similar database beasts, they are quite different in terms of their query capabilities due to their different strengths.
In today’s article, we’ll walk through a couple of different solutions to range generation.
generate_series
The generate_series is a pretty serious tool in terms of achieving exactly what this article is talking about. Unfortunately, there is no analog for this function within the Redshift database management system. Who knows; maybe one day.
Until then though, we need to create our own.
The query
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
),
drange as (
SELECT (getdate()::date - seq.num)::date as start_date
FROM seq
LIMIT 20
)
SELECT start_date
FROM drange;
There’s a bit going on in here.
The process starts with the digit
CTE. Is does nothing more than counts from 0, up to 9.
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
The next CTE seq
uses the digits
CTE, 4 times. Each time its treated as an increasing power of 10. a
handles the 10^0
, b
handles the 10^1
, etc. This catersian effect quickly multiplies our 0-9 range up to 0-9999. If more numbers are required, we simple add another CROSS JOIN
, and treat the next power of 10.
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
That’s it as far a number generation. It’s pretty easy.
I’ve seen other developers materialise this information into a physical table, and simply join to that table also.
Extending into time
Applying this into some date ranges now, we use the seq
CTE and a little date arithmatic. My example here just uses the numbers within seq
to enumerate dates. Adding LIMIT
to this query also prevents us from utilising all 10000
values produced by seq
.
SELECT (getdate()::date - seq.num)::date as start_date
FROM seq
LIMIT 20
And that it.