Cogs and Levers A blog full of technical stuff

Macro-ing yourself out of the boilerplate

While working on a “build your own clone” style tutorial and going over some older code, I thought it’d be interesting to note some funky work I’d done with macros. When you’re working with 2-d array style states you get very good at writing for-loops over and over and over and … So, thinking about this in a very generic sense, take a look at this macro:

#define FIELD_WIDTH  12
#define FIELD_HEIGHT 18

/* grabs an element from the field */
#define field_block(x, y) (_field_blocks[((y) * FIELD_WIDTH) + (x)])

/* iterates over the field */
#define enum_field_blocks(x, y, o, fn)     \
   o = 0;                                  \
   for (y = 0; y < FIELD_HEIGHT; y ++) {   \
      for (x = 0; x < FIELD_WIDTH; x ++) { \
         fn                                \
         o ++;                             \
      }                                    \
   }

Working your way through the enum_field_blocks macro here, the parameters are:

  • A horizontal axis iterator x
  • A vertical axis iterator y
  • A memory location (array index) iterator o
  • and fn

So, what’s fn? fn here allows the developer to specify what they want to execute inside the nested for loops:

enum_field_blocks(x, y, offset, {
   _field_blocks[offset] = NULL;
});

If you’re thinking “why didn’t he just use memset to set all of the items in this array to NULL”, you’ve missed the point. The example is quite a bad one, I agree, but it does demonstrate that you can write any code that you’d like and have it execute for every item in the array, cutting down on the boilerplate that you have to write (just get the C pre-processor to do it for you).
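As a slightly less contrived sketch, here is the same macro used to count the occupied cells. The storage declaration (an array of pointers where NULL means "empty") and the count_filled_blocks helper are assumptions made for illustration; they are not taken from the original game code.

```c
#include <stddef.h>

#define FIELD_WIDTH  12
#define FIELD_HEIGHT 18

/* assumed storage: one pointer per cell, NULL meaning "empty" */
static void *_field_blocks[FIELD_WIDTH * FIELD_HEIGHT];

/* iterates over the field */
#define enum_field_blocks(x, y, o, fn)        \
   o = 0;                                     \
   for (y = 0; y < FIELD_HEIGHT; y ++) {      \
      for (x = 0; x < FIELD_WIDTH; x ++) {    \
         fn                                   \
         o ++;                                \
      }                                       \
   }

/* hypothetical helper: counts the cells that currently hold a block */
int count_filled_blocks(void) {
   int x, y, offset, filled = 0;

   enum_field_blocks(x, y, offset, {
      if (_field_blocks[offset] != NULL) {
         filled ++;
      }
   });

   return filled;
}
```

One thing to watch: the body is passed as a single macro argument, so a comma that isn't wrapped in parentheses (for example `int a, b;`) would be read as an argument separator by the pre-processor.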

Super.

Simple FPU Operations

The CPU is an impressive piece of kit by itself; however, it does struggle to do complex floating point mathematics. Chip designers noticed this deficiency pretty quickly and bolted on a floating-point arithmetic unit. It should be noted at this point that many, many other tutorial and article writers have gone into great depth explaining how to program FPUs, and they have been great references to me in the past.

Anyway, this blog post is just going to show you a few nuggets that you can use straight away. It’s always fun seeing this stuff in action. This snippet will sum (accumulate) together an array of doubles.

; entry conditions
; edx points to an array of doubles
; ecx holds the number of items in the array

; set st0 to zero
fldz                            

next:

; add the next double to st0
fadd    qword [edx]             

; progress
add     edx, 8                  
dec     ecx                     
jnz     next                    

; at this point, st0 holds the sum
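
For comparison, here is roughly what that loop computes when written in C. The function name sum_doubles is made up for this sketch; data plays the role of edx and count the role of ecx.

```c
#include <stddef.h>

/* C rendering of the accumulate loop (sum_doubles is a hypothetical name) */
double sum_doubles(const double *data, size_t count) {
    double total = 0.0;                   /* fldz */

    for (size_t i = 0; i < count; i++) {
        total += data[i];                 /* fadd qword [edx], then add edx, 8 */
    }

    return total;                         /* the value left in st0 */
}
```

One difference worth knowing about: the C version copes with an empty array, whereas the assembly decrements ecx before testing it, so entering it with ecx set to 0 would wrap around rather than stop.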

The following snippet shows a simple addition, subtraction, multiplication and division. You’ll notice a pretty distinct pattern in what to do in these situations. It goes like this:

  • Load the first term (fld)
  • Apply the operator (fadd, fsub, fmul, fdiv) with the second term
  • Store the result into memory (fstp)

section .data

a:	dq	3.333333333	
b:	dq	4.444444444	
	
section .bss 		

c:	resq	1		

section .text

addb:				
	fld	qword [a] 	
	fadd	qword [b]	
	fstp	qword [c]	
	
subb:				
	fld	qword [a] 	
	fsub	qword [b]	
	fstp	qword [c]	
	
mulb:				
	fld	qword [a]	
	fmul	qword [b]	
	fstp	qword [c]	
	
diva:				
	fld	qword [b] 	
	fdiv	qword [a]	
	fstp	qword [c]	

With a bit of extra help from the C library, you can print out values that you’re using in the FPU. The following snippet prints PI to the console.

; compiled on linux 64 bit using the following
;
; nasm -f elf32 print.asm -o print.o
; gcc -m32 print.o -o print
;

[bits 32]

section .text

extern printf
global main

main:
   ; load pi into st(0)
   fldpi

   ; prepare some space on the stack
   sub   esp, 8
   ; to be able to push st(0) in there
   fstp  qword [esp]
   ; get the string format on the stack as well
   push  format
   ; print the string
   call  printf

   ; repair the stack
   ;   4 bytes memory address (for the format)
   ; + 8 bytes memory for the float
   ; =========
   ;  12 bytes
   add   esp, 12

   ; exit without error
   xor   eax, eax
   ret

section .data

format: db "%.20g",10,0

As I find other snippets in old pieces of code, I’ll be sure to add them to this page.

Snow flakes keep falling on my... screen?

A very simple effect this time around. It’s snow flakes. The operating premise for the effect is very simple and goes like this:

  • Generate 1 new snow flake at the top of the screen at every frame
  • A snow flake has an absolute floor of the last line in video memory
  • A snow flake should come to rest if it lands on top of another

Snowflakes

That’s it! So, immediately we need a way to get random numbers. We’re using a 320x200 screen here and my dodgy routine for getting random numbers only returns us 8 bit numbers (which gets us to 255). We need to add some more width to these numbers if we expect to be able to randomize across the whole 320 column positions. Reading the random port twice and masking the second number down to 6 bits should do it for us, such that:

An 8 bit number (0 to 255) plus a 6 bit number (0 to 63) gives us a sum between 0 and 318, which covers every one of our 320 columns except the very last. Near enough. Perfect.

Here’s the code!

get_random:
    ; start out with ax and bx = 0
	xor	ax, ax
	xor     bx, bx

    ; get the first random number and
    ; store it off in bl
	mov	dx, 40h
	in	al, dx
	mov	bl, al

    ; get the second random number and
    ; store it off in al, but we only 
    ; want 6 bits of this number
	mov	dx, 40h
	in	al, dx
	and     al, 63

    ; add the two numbers to produce a
    ; random digit in range
	add	ax, bx
	
	ret
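
The range that this routine can produce is easy to check in C. This is only a sketch of the arithmetic: combine_random is a made-up name, and its two arguments stand in for the two bytes read from port 40h.

```c
#include <stdint.h>

/* mirrors the arithmetic in get_random (hypothetical helper) */
uint16_t combine_random(uint8_t first, uint8_t second) {
    uint16_t low6 = second & 63;        /* and al, 63 */
    return (uint16_t)(first + low6);    /* add ax, bx */
}
```

The smallest result is 0 + 0 = 0 and the largest is 255 + 63 = 318. Adding two numbers like this also favours the middle of the range over the edges, which is fine for snow.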

Excellent. We can span the breadth of our screen with random flakes. Now it’s time to progress them down the screen. Here’s the main frame routine to do so.

no_kbhit:

    ; put a new snowflake at the top 
    ; of the screen
	call	get_random
	mov	di, ax
	mov	byte ptr es:[di], 15

decend:

    ; we can't move snowflakes any further
    ; than the bottom of the screen, so we
    ; start at the last pixel of the second
    ; to last line (63679) and work back
    ; through the remaining 63680 pixels
	mov	di, 63679
	mov	cx, 63680
	
next_pixel:

    ; test if there is a snowflake at the 
    ; current location
	mov	al, es:[di]
	cmp	al, 0
	je	no_flake
	
    ; test if there is a snowflake beneath
    ; us at the moment
	mov	al, es:[di+320]
	cmp	al, 0
	jne	no_flake
	
    ; move the snowflake from where we are 
    ; at the moment to one line below us
	xor	al, al
	mov	byte ptr es:[di], al
	mov	al, 15
	mov	byte ptr es:[di+320], al
	
no_flake:

    ; move our way through video memory
	dec	di
	dec	cx
	jnz	next_pixel

    ; check for a keypress
	mov	ah, 01h
	int	16h
	jz	no_kbhit

The code above is pretty well commented, so you shouldn’t need me to add much more here. There are a couple too many labels in the code, but they should help to add readability. I’ll leave it as an exercise to the reader to implement different speeds, colours and maybe even some horizontal movement (wind). Cool stuff.
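
If assembly isn't your thing, the per-frame descent can be sketched in C as well. The names here (descend_flakes, SCREEN_W, SCREEN_H, FLAKE) are invented for the sketch, and video is assumed to be a plain 320x200 byte buffer rather than real video memory.

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200
#define FLAKE    15

/* moves every flake that has empty space beneath it down by one line */
void descend_flakes(uint8_t *video) {
    /* start at the last pixel of the second to last line and walk
       backwards, so a flake is moved at most once per frame */
    for (int i = SCREEN_W * (SCREEN_H - 1) - 1; i >= 0; i--) {
        if (video[i] != 0 && video[i + SCREEN_W] == 0) {
            video[i] = 0;
            video[i + SCREEN_W] = FLAKE;
        }
    }
}
```

Walking from the bottom of the buffer upwards matters: going top-down would find each flake again in its new position and drop it straight to the floor in a single frame.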

Mandelbrot set

Another cool routine built off of some relatively simple mathematics is the Mandelbrot set. Wikipedia has a really good write up if you’re a little rusty on the ins and outs of a Mandelbrot set.

Mandelbrot

This tutorial assumes that you’ve already got a video display ready with a pointer to your buffer. We’ll just focus on the function that makes the doughnuts. Here’s the code, explanation to follow. This code has been lifted out of a file that I had in an old DOS program. It was written using Turbo C, but will port over to anything pretty easily.

Anyway, on to the code!

void mandelbrot_frame(double zoom, int max_iter, int xofs, int yofs) {
	double zx = 0, zy = 0, cx = 0, cy = 0;

    /* enumerate all of the rows */
	for (int y = 0; y < 200; y ++) {

        /* enumerate all of the columns */
		for (int x = 0; x < 320; x ++) {

            /* initialize step variables */
			zx = zy = 0;
			cx = (x - xofs) / zoom;
			cy = (y - yofs) / zoom;

			int iter = max_iter;

            /* calculate the iterations at this particular point */
			while (zx * zx + zy * zy < 4 && iter > 0) {
				double tmp = zx * zx - zy * zy + cx;
				zy = 2.0 * zx * zy + cy;
				zx = tmp;

				iter --;
			}

            /* plot the applicable pixel */
			video[x + (y * 320)] = (unsigned char) iter & 0xff;
		}
	}
}

So, you can see this code is very simple and to the point. We need to investigate all of the pixels (in this case 320x200, or 64,000) and we calculate the iteration count at each point. Note that iter counts down from max_iter, so the colour plotted is the number of iterations left over when the point escapes (0 if it never does). The variables passed into the function allow the caller to animate the Mandelbrot. zoom will take you further into the pattern, while xofs and yofs will translate your position by this (x,y) pair. max_iter just determines how many cycles the caller wants to spend working out the iteration count. A higher number means that the plot comes out more detailed, but it is slower to generate.
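
To play with the maths without a video buffer, the inner loop can be pulled out into a function of its own. This is a sketch only; mandelbrot_point is a name invented here, and the body is the same loop as in mandelbrot_frame.

```c
/* returns the iterations left over when the point (cx, cy) escapes,
   or 0 if it is still bounded after max_iter steps */
int mandelbrot_point(double cx, double cy, int max_iter) {
    double zx = 0.0, zy = 0.0;
    int iter = max_iter;

    /* iterate z = z*z + c until |z| reaches 2 or we run out of steps */
    while (zx * zx + zy * zy < 4 && iter > 0) {
        double tmp = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = tmp;

        iter --;
    }

    return iter;
}
```

As a starting point for the frame function, a call along the lines of mandelbrot_frame(80.0, 64, 160, 100) should put the origin in the middle of the screen and span roughly -2 to 2 horizontally, since cx works out to (x - 160) / 80.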

2D in OpenGL

Whilst OpenGL enjoys a lot of fame as a 3D graphics package, it’s also a fully featured 2D graphics library. In this short tutorial, I’ll walk you through the code required to put OpenGL into 2D mode. This tutorial does assume that you’re ready to run some OpenGL code. It won’t be a primer in how to use SDL or GLUT, etc. Here’s the code.

/* first of all, we need to read the dimensions of the 
   display viewport. */
int viewport[4];
glGetIntegerv(GL_VIEWPORT, viewport);

/* setup an orthographic matrix using the viewport 
   dimensions on the projection matrix */
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, viewport[2], viewport[3], 0, -1, 1);

/* clear out the modelview matrix */
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

/* disable depth testing (as we're not using z) */
glDisable(GL_DEPTH_TEST);

First off, we use glGetIntegerv to read the dimensions of the viewport. You will probably already have these values on hand anyway, but getting the viewport this way just generalises the code and it’s not expensive to do so. We generate an orthographic matrix next using glOrtho and the viewport information we retrieved in the first step. The model view matrix is then kept clean (out of habit, I’d say) and finally glDisable is used to turn off depth testing.

We’re in 2D. No Z-Axis, no depth testing! That’s it! You can draw what you need to as simply as this:

/* note that we're not clearing the depth buffer */
glClear(GL_COLOR_BUFFER_BIT);

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

/* draw a triangle */
glBegin(GL_TRIANGLES);

glColor3ub(255, 0, 0);
glVertex2d(0, 0);

glColor3ub(0, 255, 0);
glVertex2d(100,0);

glColor3ub(0, 0, 255);
glVertex2d(50, 50);

glEnd();

Quick update to this one - I switched the parameters around in the call to glOrtho so that it’s a much more natural drawing experience (putting 0,0 in the top left hand corner of the screen). It’s not perfect mathematically but it sure does help any of your existing code that assumes your screen goes positive to the right and positive to the bottom!
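
To see what the swapped parameters do, here is the mapping that glOrtho(0, width, height, 0, -1, 1) applies to a 2D point, written out as plain C with no OpenGL calls. The ndc_point type and ortho_top_left function are invented for this sketch, and it assumes the modelview matrix is the identity.

```c
/* normalized device coordinates for a 2D point (hypothetical type) */
typedef struct { double x, y; } ndc_point;

/* applies the x/y mapping of glOrtho(0, width, height, 0, -1, 1) */
ndc_point ortho_top_left(double px, double py, double width, double height) {
    ndc_point p;

    p.x = 2.0 * px / width - 1.0;     /* left edge -> -1, right edge -> +1 */
    p.y = 1.0 - 2.0 * py / height;    /* top edge -> +1, bottom edge -> -1 */

    return p;
}
```

With a 320x200 viewport, (0, 0) lands at (-1, +1), which is the top left hand corner in normalized device coordinates, and (320, 200) lands at (+1, -1) in the bottom right.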