Cogs and Levers A blog full of technical stuff

Range generation in Redshift

A little while ago, I’d written an article on Range generation in PostgreSQL, and whilst Redshift and PostgreSQL are similar database beasts, they are quite different in terms of their query capabilities due to their different strengths.

In today’s article, we’ll walk through a couple of different solutions to range generation.

generate_series

The generate_series is a pretty serious tool in terms of achieving exactly what this article is talking about. Unfortunately, there is no analog for this function within the Redshift database management system. Who knows; maybe one day.

Until then though, we need to create our own.

The query

with digit as (
    select 0 as d union all
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9
),
seq as (
    select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
    from digit a
        cross join
        digit b
        cross join
        digit c
        cross join
        digit d
    order by 1
),
drange as (
    SELECT (getdate()::date - seq.num)::date       as start_date
    FROM seq
    LIMIT 20
)
SELECT start_date
FROM drange;

There’s a bit going on in here.

The process starts with the digit CTE. Is does nothing more than counts from 0, up to 9.

select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9

The next CTE seq uses the digits CTE, 4 times. Each time its treated as an increasing power of 10. a handles the 10^0, b handles the 10^1, etc. This catersian effect quickly multiplies our 0-9 range up to 0-9999. If more numbers are required, we simple add another CROSS JOIN, and treat the next power of 10.

select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
    cross join
    digit b
    cross join
    digit c
    cross join
    digit d
order by 1

That’s it as far a number generation. It’s pretty easy.

I’ve seen other developers materialise this information into a physical table, and simply join to that table also.

Extending into time

Applying this into some date ranges now, we use the seq CTE and a little date arithmatic. My example here just uses the numbers within seq to enumerate dates. Adding LIMIT to this query also prevents us from utilising all 10000 values produced by seq.

SELECT (getdate()::date - seq.num)::date       as start_date
FROM seq
LIMIT 20

And that it.