This article is an extension of the window functions article, following up on each of the window functions in turn.
The COUNT function counts the rows that contain non-NULL values for the given expression.
An example usage over the test data set would be:
We list the sale_date and salesperson for every sale on record. The COUNT is partitioned by the date of the sale, which makes sales_count tell us what number each sale was for that given date.
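A query along these lines would produce the result set below. This is a sketch reconstructed from the description; the sales table name and its sale_id ordering column are assumptions:

```sql
SELECT sale_date,
       salesperson,
       COUNT(*) OVER (
           PARTITION BY sale_date   -- restart the count for each day
           ORDER BY sale_id         -- assumed ordering column
           ROWS UNBOUNDED PRECEDING -- running count within the day
       ) AS sales_count
FROM sales
ORDER BY sale_date, sales_count;
```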
| sale_date  | salesperson | sales_count |
|------------|-------------|-------------|
| 2018-05-02 | Bob         | 1           |
| 2018-05-13 | Sally       | 1           |
| 2018-05-13 | June        | 2           |
| 2018-05-14 | John        | 1           |
| 2018-05-25 | Bob         | 1           |
| 2018-05-25 | Sally       | 2           |
| 2018-05-26 | John        | 1           |
| 2018-05-27 | John        | 1           |
| 2018-05-27 | June        | 2           |
| 2018-05-27 | June        | 3           |
| 2018-06-02 | Sally       | 1           |
| 2018-06-03 | John        | 1           |
| 2018-06-03 | John        | 2           |
| 2018-06-12 | John        | 1           |
| 2018-06-13 | Bob         | 1           |
| 2018-06-13 | Sally       | 2           |
| 2018-06-15 | John        | 1           |
| 2018-06-24 | Bob         | 1           |
| 2018-06-24 | Sally       | 2           |
| 2018-06-29 | John        | 1           |
Taking 2018-05-27, we can see that John got the first sale, and June got the second and third.
This article is an extension of the window functions article, following up on each of the window functions in turn.
The AVG function takes the arithmetic mean of the input that it sees.
An example usage over the test data set would be:
This takes the average cost value (quantity * unit_cost), but partitions the calculation by salesperson. Each salesperson has their average calculated in isolation because of PARTITION BY salesperson.
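A sketch of such a query, assuming the same hypothetical sales table (with quantity and unit_cost columns):

```sql
SELECT sale_date,
       salesperson,
       AVG(quantity * unit_cost) OVER (
           PARTITION BY salesperson  -- each salesperson averaged in isolation
       ) AS average_sale_cost
FROM sales
ORDER BY salesperson, sale_date;
```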
A little while ago, I’d written an article on Range generation in PostgreSQL, and whilst Redshift and PostgreSQL are similar database beasts, they are quite different in terms of their query capabilities due to their different strengths.
In today’s article, we’ll walk through a couple of different solutions to range generation.
generate_series
generate_series is a pretty serious tool for achieving exactly what this article is talking about. Unfortunately, there is no analog for this function within the Redshift database management system. Who knows; maybe one day.
Until then though, we need to create our own.
The query
There’s a bit going on in here.
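A sketch of the query, reconstructed from the description that follows:

```sql
WITH digit AS (
    -- the ten digits, 0 through 9
    SELECT 0 AS num UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
    SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
    SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
seq AS (
    -- each cross join treats the digits as the next power of 10
    SELECT a.num
         + (b.num * 10)
         + (c.num * 100)
         + (d.num * 1000) AS num
    FROM digit a
    CROSS JOIN digit b
    CROSS JOIN digit c
    CROSS JOIN digit d
)
SELECT num
FROM seq
ORDER BY num;
```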
The process starts with the digit CTE. It does nothing more than count from 0 up to 9.
The next CTE, seq, uses the digit CTE four times. Each time, it's treated as an increasing power of 10: a handles 10^0, b handles 10^1, etc. This cartesian effect quickly multiplies our 0-9 range up to 0-9999. If more numbers are required, we simply add another CROSS JOIN and treat the next power of 10.
That's it as far as number generation goes. It's pretty easy.
I've seen other developers materialise this information into a physical table, and simply join to that table instead.
Extending into time
Applying this to some date ranges now, we use the seq CTE and a little date arithmetic. My example here just uses the numbers within seq to enumerate dates. Adding LIMIT to this query also prevents us from utilising all 10,000 values produced by seq.
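Putting that together, a sketch might look like the following (the start date of 2018-01-01 and the 31-day limit are purely illustrative; DATEADD is Redshift's date arithmetic function):

```sql
WITH digit AS (
    SELECT 0 AS num UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
    SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
    SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
seq AS (
    SELECT a.num
         + (b.num * 10)
         + (c.num * 100)
         + (d.num * 1000) AS num
    FROM digit a
    CROSS JOIN digit b
    CROSS JOIN digit c
    CROSS JOIN digit d
)
-- enumerate a month of dates from an illustrative start date
SELECT DATEADD(day, num, '2018-01-01'::date) AS generated_date
FROM seq
ORDER BY num
LIMIT 31;
```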
In today's article, I'll take you through some basic daemon development for POSIX based systems. This article should give you enough information to get started on your own project, to which you can add in the bits that your program does.
Getting started
The largest amount of effort in writing any daemon goes into validation. Validation of incoming data is one thing; but we're not even there yet. We're just talking about validating the current system state, environment, and any configuration values.
fork
The first job is to use the fork system call to duplicate the current process.
fork() creates a new process by duplicating the calling process. The new process is referred to as the child process. The calling process is referred to as the parent process.
Directly after forking, the parent process is terminated, leaving the child to own the running of the daemon. This allows the daemon to be independent of any other running process. Removing this dependency means that no one (and nothing) can kill your daemon by killing another (parent) process.
A secondary fork is then invoked; usually after a call to setsid. This starts a new session with no terminal.
setsid() creates a new session if the calling process is not a process group leader. The calling process is the leader of the new session (i.e., its session ID is made the same as its process ID). The calling process also becomes the process group leader of a new process group in the session (i.e., its process group ID is made the same as its process ID).
The side-effect that we're looking for is noted here:
Initially, the new session has no controlling terminal.
The child is then exited, removing any reliance on the terminal.
The grandchild is now the daemon.
Working directory
Now that we have a safe running process, it’s time to change to a location that we know is safe. We know that the root directory exists. We’ll use it for our example.
This puts us in a sane, known location. It doesn't need to be the root directory; the point is that we know where we are.
File mask
Next up, we'll set the umask for the process to 0. It is standard practice for a daemon to operate with a umask of 0. At the operating system level, this allows newly created file objects to carry permissions as open as 0666 (world-writable), and directories as 0777, should the creating call request them.
umask() sets the calling process’s file mode creation mask (umask) to mask & 0777 (i.e., only the file permission bits of mask are used), and returns the previous value of the mask.
In short, a umask of 0 means objects are created with no permissions initially revoked.
Standard File Descriptors
Daemons now close and re-open the standard file handles. STDIN, STDOUT, and STDERR all get closed. It’s assumed that /dev/null is provided on any POSIX system, and as a result it’s used to re-open these handles.
All together
Your daemonize process might now look something like this:
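The following is a minimal sketch tying together the fork, setsid, chdir, umask, and file descriptor steps described above. Error handling is kept deliberately simple, and the daemonize name is just a convention:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Detach from the controlling terminal and run in the background.
 * Returns 0 on success (in the daemon process), -1 on failure. */
int daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        exit(EXIT_SUCCESS);          /* parent terminates */

    if (setsid() < 0)                /* new session; no controlling terminal */
        return -1;

    pid = fork();                    /* second fork */
    if (pid < 0)
        return -1;
    if (pid > 0)
        exit(EXIT_SUCCESS);          /* first child exits; grandchild is the daemon */

    if (chdir("/") < 0)              /* a location we know exists */
        return -1;

    umask(0);                        /* no permissions initially revoked */

    /* Close and re-open the standard descriptors onto /dev/null. */
    close(STDIN_FILENO);
    close(STDOUT_FILENO);
    close(STDERR_FILENO);
    open("/dev/null", O_RDONLY);     /* becomes fd 0 */
    open("/dev/null", O_WRONLY);     /* becomes fd 1 */
    open("/dev/null", O_RDWR);       /* becomes fd 2 */

    return 0;
}
```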
Closing up
You’re now daemonised, and ready to start actually writing your daemon process.
One point to note: you'll likely want to install some signal handlers, so your daemon can shut down cleanly when asked.
Being a data warehouse, Redshift from AWS doesn't provide the same level of programming platform that you may have had the luxury of using elsewhere.
The concept of variables in Redshift is non-existent; however, with this clever hack you can fake your way into some variable-based information.
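As a sketch, a CTE at the top of the query can play the role of a variable block. The vars name, the date values, and the sales table here are all illustrative:

```sql
WITH vars AS (
    -- each column acts as a named "variable"
    SELECT '2018-05-01'::date AS start_date,
           '2018-06-30'::date AS end_date
)
SELECT s.*
FROM sales s
CROSS JOIN vars v
WHERE s.sale_date BETWEEN v.start_date AND v.end_date;
```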
The same trick can be achieved using CREATE TEMP TABLE as well. The temporary table version will allow you to use your variables outside of the context of that one query; with the CTE approach, the variables are bound to the CTE feeding the query.
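The temporary table variant might look like this (again, names and values are illustrative); the vars table then remains queryable for the rest of the session:

```sql
CREATE TEMP TABLE vars AS
SELECT '2018-05-01'::date AS start_date,
       '2018-06-30'::date AS end_date;

-- any later query in the session can join to vars
SELECT s.*
FROM sales s
CROSS JOIN vars v
WHERE s.sale_date BETWEEN v.start_date AND v.end_date;
```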