Generators are Python functions that act like iterators. This abstraction lets you simplify a lot of your for-loop code, implement lazy evaluation, and even create more intelligent value-producing iterators.
In today’s post, I’ll go through basic usage of the yield keyword, the generator that’s created as a result, and how you can interact with this function type.
Prime number example
To produce the generator, I’ve written a function that will filter through numbers picking out prime numbers. The algorithm isn’t highly optimised. It’s quite crude/brute-force in its approach, but it’ll be enough for us to understand the generator function.
We’re maintaining an internal list of primes that we’ve found. When we come across a potential candidate, we try to divide it by primes that we’ve already found. To cut down on the number of divides, we only try primes no greater than the square root of the candidate.
Note the use of yield. Each time execution reaches yield, the function hands the next value to whoever is iterating and pauses until another value is requested. You can see that this is an iterator that doesn’t end. Python integers are arbitrary precision, so there is no overflow to stop it; in practice we’re only limited by the amount of memory in the machine (the internal list of primes keeps growing) and by how long we’re prepared to wait.
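The original listing isn’t reproduced here, so what follows is only a minimal sketch of a generator along the lines described above, not necessarily the code this post was written against. It assumes Python 3, and the names found and candidate are my own choices for illustration.

```python
import itertools


def primes():
    """Yield prime numbers indefinitely using crude trial division."""
    found = []  # primes discovered so far (assumed name)
    for candidate in itertools.count(2):
        is_prime = True
        for p in found:
            # Stop once p exceeds the square root of the candidate.
            if p * p > candidate:
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            found.append(candidate)
            yield candidate  # hand the value out and pause here


ps = primes()
first_three = [next(ps), next(ps), next(ps)]  # [2, 3, 5]
```

Each call to next(ps) resumes the function just after the yield and runs it until the next prime is found.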
Iterating
So, we’ve created what appears to be an infinite list. Testing it out in the REPL:
ps is the generator, and we’re able to call the next function on it. As we do that, we progress through the iterator. We can start to work with ps now as if it were any other iterator.
Using a list comprehension, we can find the first 10 primes:
>>> ps = primes()
>>> [next(ps) for _ in range(10)]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
Using itertools we can get all of the prime numbers under 100:
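A minimal sketch of that itertools usage, assuming takewhile is the function intended (the original listing is missing). The snippet carries its own compact primes() so that it runs on its own; that helper is an assumption, not the post’s original code.

```python
from itertools import count, takewhile


def primes():
    """Compact stand-in for the prime generator described earlier."""
    found = []
    for candidate in count(2):
        # Only test divisors whose square does not exceed the candidate.
        small = takewhile(lambda p: p * p <= candidate, found)
        if all(candidate % p for p in small):
            found.append(candidate)
            yield candidate


# takewhile stops pulling from the infinite generator at the first
# value that fails the predicate, so this terminates.
under_100 = list(takewhile(lambda p: p < 100, primes()))
```

A plain filter would never finish here, because it has no way of knowing that no further primes under 100 are coming.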
yield lets you make generators, which represent the potential to create values rather than the values themselves. It’s not until you start to iterate over the generator that the values start to materialise.
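To see that laziness directly, here is a small sketch (the squares function and the log list are hypothetical names used only for illustration). It records when the generator body actually runs.

```python
log = []  # records what the generator body has done so far


def squares(limit):
    log.append("started")
    for n in range(limit):
        log.append("computing %d" % n)
        yield n * n


gen = squares(3)
log_after_create = list(log)  # still empty: the body has not run yet

first = next(gen)             # runs the body up to the first yield
log_after_first = list(log)

rest = list(gen)              # drains the remaining values
```

Calling squares(3) only builds the generator object; the first line of the body doesn’t execute until the first value is requested.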
Ruby has a very cool feature called code blocks. Sometimes referred to as closures, code blocks are custom pieces (or blocks) of Ruby code that you pass to a function; the function then runs your block each time it reaches the yield keyword.
In today’s post, I’m just going to present a simple example and usage.
Source data
The example we’re going to use will be a Manager to Employee style relationship. A Person class is going to manage an array of Person objects that we’ll classify as staff for that person. Here’s the class definition:
We can use each to look at Bill’s staff, which is just Mary at this stage. More interestingly, we could implement our own function on the Person class that shows all of that person’s descendants.
We’re going to call any code block specified to our descendants function for each of the staff that are managed by this Person object, but we’re also going to call each descendant’s descendants function so that we recurse down the tree.
We could augment this call slightly to also include the manager of the descendants:
This will supply a manager to the calling block, one level down from where we specify.
bill.descendants do |p, s|
  if s == nil
    puts "#{p} is managed by #{bill}"
  else
    puts "#{p} is managed by #{s}"
  end
end
This code here emits the following:
Mary (Technology Director) is managed by Bill (Managing Director)
Bob (Support Officer) is managed by Mary (Technology Director)
Joe (Support Officer) is managed by Mary (Technology Director)
Mary’s manager variable in the block comes through as nil as she’s a direct descendant of Bill, so we handle this case in the block as opposed to in descendants.
You can specify as many parameters as you want in your yield. It’s your block’s responsibility to do something useful with them!
Other options
Within your function, you can call the block_given? method, which returns true if the calling code specified a block and false if it didn’t.
You can also declare a &block parameter on your function, which captures the block as an object and can be handy should you need to pass the block around.
AWK is a programming language that deals with processing text in a sequence of pattern matching rules. It’s really handy for reducing massive amounts of text into just the information that you care about. The full user guide for AWK can be found here.
Rather than take you on a tour through the user guide, I thought today’s post might be better as a practical example. I’m going to present some useful functions with AWK using the Linux Kernel’s dmesg output as source data.
As a final note, a lot if not all of the information that I’ll present below can be transformed into a “one liner”. There are quite a few instances of crafty AWK hackers putting these together. I just want to present some of the language.
Source data
The dmesg data is in an easy-enough format to work with. Taking the first few lines as an example:
We see that there is an elapsed time figure surrounded by square brackets; the rest of the line is the log text. Further on through the text, we start to see the log lines prefixed with a driver name also:
For the purposes of today’s post, the following usage is going to be most useful to us:
dmesg | awk -f our-awk-script.awk
This supplies the dmesg output to our AWK script.
Print any line with the word “failed” in it
To accomplish this task, we’re going to use a regular expression to pick out each line with “failed” in it.
/failed/ { print $0 }
Immediately, you can see that AWK statements take the shape of:
condition { actions }
The action here, print $0, prints the whole matched line to the console. Other variables are available to be printed such as $1, $2, and so on. These numbered variables hold the fields of the matched line, which by default is split on runs of whitespace.
Exploring the variables
Just to take a look at those variables a little closer, we can augment our initial rule slightly to see what’s contained in those variables:
Run for one line of text matching the “failed” rule:
$0: [ 1.804314] iwlwifi 0000:03:00.0: Direct firmware load failed with error -2
$1: [
$2: 1.804314]
$3: iwlwifi
$4: 0000:03:00.0:
Listing out which drivers mentioned the word “failed”
AWK has a very flexible associative array type as well. We can basically reference any variable with any index we choose. For the next progression of this script, we’ll build an array of driver names with an instance count so we can give the user a report of which drivers were mentioned and how many times.
$3 is giving us the driver name, so we just increment a value in the array for that driver. END is something new. Its actions run once, after all of the input has been processed. We enumerate the array that we’ve built, printing the name of the driver and the count.
Running this, I get the following result:
nouveau: 1
nouveau:: 1
iwlwifi: 2
That’s annoying. nouveau appears in the report twice because it’s mentioned with and without a colon : character in the source text.
[ 1.687503] nouveau E[ DRM] failed to create 0x80000080, -22
[ 1.687631] nouveau: probe of 0000:01:00.0 failed with error -22
Adding a call to gsub to perform a simple string replacement does the trick. gsub is a part of AWK’s string functions.
Just as we have an ‘END’ section above, we are also given the ability to write code in a ‘BEGIN’ section that will kick off before any of our pattern rules are executed.
Using boolean logic in conditions
AWK conditions aren’t just regular expressions; they can also incorporate boolean logic. You can test any variable like a normal boolean condition. In the following example, I don’t want to count failures that come out of the iwlwifi driver.
If at any time your rule needs to bail out of the script entirely, call exit. If you just want to stop processing the current line of text and move on to the next, use next.
Getting a quick web server up and running is really simple (if you don’t need a full-blown application server). I find this technique really useful when prototyping web sites where I only need to serve static HTML, CSS & JavaScript.
In the folder that hosts your web application, issue the following Python 2 command (on Python 3 the module was renamed, so the equivalent is python3 -m http.server):
python -m SimpleHTTPServer
After you do this, you’ll get a confirmation message that your site is available:
Serving HTTP on 0.0.0.0 port 8000 ...
And that’s it. You can read up more on this really handy utility here.
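If you’d rather start the same kind of server from a script, the following is a rough sketch using Python 3’s http.server module. The helper names make_server and serve, and the defaults of port 8000 and 127.0.0.1, are assumptions made for illustration.

```python
import http.server


def make_server(port=8000, bind="127.0.0.1"):
    """Build a server that serves static files from the current directory."""
    handler = http.server.SimpleHTTPRequestHandler
    return http.server.HTTPServer((bind, port), handler)


def serve(port=8000):
    """Serve until interrupted (Ctrl+C). Not called automatically here."""
    httpd = make_server(port)
    try:
        print("Serving HTTP on port", httpd.server_address[1])
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        httpd.server_close()
```

Call serve() from the folder that holds your site. Like the command-line version, this is meant for prototyping rather than production use.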
The first parameter that is passed is the assembly code itself. It’ll be in AT&T syntax, but will also have some extra rules applied to it which allow the compiler to make some decisions for you. The outputs, inputs and clobbers are optional lists consisting of directives instructing the compiler how to handle inputs, outputs and what’s expected to be trashed (clobbered) in your assembly block.
A simple example usage, to add two integers and return the result might look like this:
edx and ecx were chosen as our general purpose registers for inputs, so they’re loaded first-up.
The addition occurs and then the result (as requested) is placed in the memory location of our output.
Back in the inline code, you can see that these registers have been symbolically referenced as %1, %2, etc.
Outputs are a mix of constraints and modifiers; inputs are just constraints; and clobbers list out what was modified (register-wise or other).
What about volatile?
The volatile keyword allows you to tell the compiler not to optimise away our code if it deems that it isn’t required (i.e. it has no effect on anything).
Constraints
| Constraint | Description |
| --- | --- |
| m | Any kind of memory address |
| o | Memory address if it’s offsettable |
| V | Memory address if it’s not offsettable |
| < | Memory with autodecrement addressing |
| > | Memory with autoincrement addressing |
| r | General purpose register |
| i | Immediate integer value |
| n | Immediate integer with a known value |
| I to P | Range based immediate integer values |
| E | Immediate format-dependent floating point number |
| F | Immediate floating point number |
| G, H | Range based immediate float values |
| s | Immediate integer that is not an explicit integer |
| g | Any general purpose register, memory or immediate integer value |
| X | Any operand is allowed |
| p | Any operand that is a valid memory address |
A full description of all of these constraints can be found here.
Modifiers
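| Modifier | Description |
| --- | --- |
| = | Operand is written to |
| + | Operand is read from and written to |
| & | Operand is written to (clobbered) before the instruction has finished using the input operands |
| % | Instruction is commutative for this operand and the one that follows it |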
A full description of all of these modifiers can be found here.
Clobbers
| Clobber | Description |
| --- | --- |
| cc | Flags are modified |
| memory | Memory outside of what is in the constraints is modified |