Cogs and Levers A blog full of technical stuff

Simple FPU Operations

The CPU itself is an impressive piece of kit just by itself however it does struggle to do complex floating point mathematics. Chip designers noticed this deficiency pretty quickly and bolted on a floating-point arithmetic unit. It should be noted at this point that many, many other tutorial/article writers have gone into great depth explaining how to program FPU’s and they have been great references to me in previous times:

Anyway, this blog post is just going to show you a few nuggets that you can use straight away. It’s always fun seeing this stuff in action. This snippet will sum (accumulate) together an array of doubles.

; entry conditions
; edx points to an array of doubles
; ecx holds the number of items in the array

; set st0 to zero
fldz                            

next:

; add the next double to st0
fadd    qword [edx]             

; progress
add     edx, 8                  
dec     ecx                     
jnz     next                    

; at this point, st0 holds the sum

This following snippet will show you a simple addition, subtraction, multiplication and division. You’ll notice a pretty distinct pattern in what to do in these situations. It’ll follow:

  • Load the first term (fld)
  • Apply the operator (fadd,fsub,fmul,fdiv) with the second term
  • Store the result into memory
section .data

a:	dq	3.333333333	
b:	dq	4.444444444	
	
section .bss 		

c:	resq	1		

section .text

addb:				
	fld	qword [a] 	
	fadd	qword [b]	
	fstp	qword [c]	
	
subb:				
	fld	qword [a] 	
	fsub	qword [b]	
	fstp	qword [c]	
	
mulb:				
	fld	qword [a]	
	fmul	qword [b]	
	fstp	qword [c]	
	
diva:				
	fld	qword [b] 	
	fdiv	qword [a]	
	fstp	qword [c]	

With a bit of extra help from the C library, you can print out values that you’re using in the FPU. The following snippet prints PI to the console.

; compiled on linux 64 bit using the following
;
; nasm -f elf32 print.asm -o print.o
; gcc -m32 print.o -o print
;

[bits 32]

section .text

extern printf
global main

main:
   ; load pi into st(0)
   fldpi

   ; prepare some space on the stack
   sub   esp, 8
   ; to be able to push st(0) in there
   fstp  qword [esp]
   ; get the string format on the stack as well
   push  format
   ; print the string
   call  printf

   ; repair the stack
   ;   4 bytes memory address (for the format)
   ; + 8 bytes memory for the float
   ; =========
   ;  12 bytes
   add   esp, 12

   ; exit without error
   xor   eax, eax
   ret

section .data

format: db "%.20g",10,0

As I find other snippets in old pieces of code, I’ll be sure to add them to this page.

Snow flakes keep falling on my... screen?

A very simple effect this time around. It’s snow flakes. The operating premise for the effect is very simple and goes like this:

  • Generate 1 new snow flake at the top of the screen at every frame
  • A snow flake has an absolute floor of the last line in video memory
  • A snow flake should come to rest if it lands on top of another

Snowflakes

That’s it! So, immediately we need a way to get random numbers. We’re using a 320x200 screen here and my dodgy routine for getting random numbers only returns us 8 bit numbers (which gets us to 255). We need to add some more width to these numbers if we expect to be able to randomize across the whole 320 column positions. Calling the random port twice and adjusting the resolution of the second number should do it for us, such that:

8 bits (256) and 6 bits (64) will give us 320 - or the ability to add using 14 bits worth of numbers, which in this configuration takes us up to 320. Perfect.

Here’s the code!

get_random:
    ; start out with ax and bx = 0
	xor	ax, ax
	xor     bx, bx

    ; get the first random number and
    ; store it off in bl
	mov	dx, 40h
	in	al, dx
	mov	bl, al

    ; get the second random number and
    ; store it off in al, but we only 
    ; want 6 bits of this number
	mov	dx, 40h
	in	al, dx
	and     al, 63

    ; add the two numbers to produce a
    ; random digit in range
	add	ax, bx
	
	ret

Excellent. We can span the breadth of our screen with random flakes. Now it’s time to progress them down the screen. Here’s the main frame routine to do so.

no_kbhit:

    ; put a new snowflake at the top 
    ; of the screen
	call	get_random
	mov	di, ax
	mov	byte ptr es:[di], 15

decend:

    ; we can't move snowflakes any further
    ; than the bottom of the screen so we
    ; process all other lines
	mov	di, 63680
	mov	cx, 63680
	
next_pixel:

    ; test if there is a snowflake at the 
    ; current location
	mov	al, es:[di]
	cmp	al, 0
	je	no_flake
	
    ; test if there is a snowflake beneath
    ; us at the moment
	mov	al, es:[di+320]
	cmp	al, 0
	jne	no_flake
	
    ; move the snowflake from where we are 
    ; at the moment to one line below us
	xor	al, al
	mov	byte ptr es:[di], al
	mov	al, 15
	mov	byte ptr es:[di+320], 15
	
no_flake:

    ; move our way through video memory
	dec	di
	dec	cx
	jnz	next_pixel

    ; check for a keypress
	mov	ah, 01h
	int	16h
	jz	no_kbhit

The code itself above is pretty well commented, you shouldn’t need me to add much more here. There are a couple too many labels in the code, but they should help to add readability. I’ll leave it as an exercise to the reader to implement different speeds, colours and maybe even some horizontal movement (wind). Cool stuff.

Mandelbrot set

Another cool routine built off of some relatively simple mathematics is the mandelbrot set. Wikipedia has a really good write up if you’re a little rusty on the ins and outs of a mandelbrot set.

Mandelbrot

This tutorial assumes that you’ve already got a video display ready with a pointer to your buffer. We’ll just focus on the function that makes the doughnuts. Here’s the code, explanation to follow. This code has been lifted out of a file that I had in an old dos program. It was written using turbo C, but will port over to anything pretty easily.

Anyway, on to the code!

void mandelbrot_frame(double zoom, int max_iter, int xofs, int yofs) {
	double zx = 0, zy = 0, cx = 0, cy = 0;

    /* enumerate all of the rows */
	for (int y = 0; y < 200; y ++) {

        /* enumerate all of the columns */
		for (int x = 0; x < 320; x ++) {

            /* initialize step variables */
			zx = zy = 0;
			cx = (x - xofs) / zoom;
			cy = (y - yofs) / zoom;

			int iter = max_iter;

            /* calculate the iterations at this particular point */
			while (zx * zx + zy * zy < 4 && iter > 0) {
				double tmp = zx * zx - zy * zy + cx;
				zy = 2.0 * zx * zy + cy;
				zx = tmp;

				iter --;
			}

            /* plot the applicable pixel */
			video[x + (y * 320)] = (unsigned char) iter & 0xff;
		}
	}
}

So, you can see this code is very simple - to the point. We need to investigate all of the pixels (in this case 320x200 - or 64,000) and we calculate the iteration intersection at each point. The variables passed into the function allows the caller to animate the mandelbrot. zoom will take you further into the pattern, xofs and yofs will translate your position by this (x,y) pair. max_iter just determines how many cycles the caller wants to spend working out the iteration count. A higher number means that the plot comes out more detailed, but it slower to generate.

2D in OpenGL

Whilst OpenGL enjoys a lot of fame as a 3D graphics package, it’s also a fully featured 2D graphics library as well. In this short tutorial, I’ll walk you through the code required to put OpenGL into 2D mode. This tutorial does assume that you’re ready to run some OpenGL code. It won’t be a primer in how to use SDL or GLUT, etc. Here’s the code.

/* first of all, we need to read the dimensions of the 
   display viewport. */
int viewport[4];
glGetIntegerv(GL_VIEWPORT, viewport);

/* setup an orthographic matrix using the viewport 
   dimensions on the projection matrix */
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, viewport[2], viewport[3], 0, -1, 1);

/* clear out the modelview matrix */
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

/* disable depth testing (as we're not using z) */
glDisable(GL_DEPTH_TEST);

First off, we use glGetIntegerv to read the dimensions of the viewport. You will probably already have these values on hand anyway, but getting the viewport this way just generalises the code and it’s not expensive to do so. We generate an orthographic matrix next using glOrtho and the viewport information we retrieved in the first step. The model view matrix is then kept clean - out of habbit, i’d say and finally `glDisable is used to turn off depth testing.

We’re in 2D. No Z-Axis, no depth testing! That’s it! You can draw what you need to as simply as this:

/* note that we're not clearing the depth buffer */
glClear(GL_COLOR_BUFFER_BIT);

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();

/* draw a triangle */
glBegin(GL_TRIANGLES);

glColor3ub(255, 0, 0);
glVertex2d(0, 0);

glColor3ub(0, 255, 0);
glVertex2d(100,0);

glColor3ub(0, 0, 255);
glVertex2d(50, 50);

glEnd();

Quick update to this one - I switched the parameters around in the call to glOrtho so that it’s a much more natural drawing experience (putting 0,0 in the top left hand corner of the screen). It’s not perfect mathematically but it sure does help any of your existing code that assumes your screen goes positive to the right and positive to the bottom!

Plasma, plasma, plasma!

Kicking back into old, old, old school mode, I had found another cool effect laying around that seemed to work really well in dosbox. It’s a plasma (if you couldn’t tell from the title). Plasmas are the cool, blobby sort of shapeless eye-grabbers that are seriously cool and simple. The basic mathematical thory of the plasma is simple.

  • 4 state counters for the program track where you’re up to overall
  • 4 state counters per frame render track where you’re up to for that frame
  • Each pixel is the result of these 4 cosine wave intersections. You can include the x and y counting dimensions to add the 6 intersections.

That’s it really. There’s a little special sauce to make the plasma move and mutate but they really have no bearing over the generation of the effect itself.

Cosine? Easy, I’ll just call cosf()!

Well, not quite. This is a demo routine that will entirely written in assembly language (8086 assembly language to be exact) and as such we won’t have the luxury of a math library (or math-coprocessor) to do the work of finding out cosine values. So, we must pre-calculate. A small C application gives us all the table pre-calculation we’ll need for this application. It’s good to keep this application handy to re-pre-calculate this table to taste. If you like a little extra calculation to go into your cos table, that is. Me, I like nerdy numbers. So, according to this cos table, there are 256 degress in a circle (see what I did there) and the top of the cos curve (1.0) is 255 moving through the centre point (0.0) at 127 all the way down to the bottom point (-1.0) at 0.

Here’s the code to generate that table.

#include <stdio.h>
#include <math.h>

#define PI_BY_2		(3.14159f * 2)

int main(int argc, char *argv[]) {

	int theta = 0, count = 0, count2 = 0;
	unsigned char values[256];

	for (theta = 0; theta <= 255; theta ++) {
		float angle = ((float)theta / 256.0f) * PI_BY_2;
		values[theta] = (unsigned char)((cosf(angle) * 127.0f) + 127.0f);
	}

	printf("costab DB ");

	for (count = 0; count < 32; count ++) {
		for (count2 = 0; count2 < 8; count2 ++) {
			printf("%03xh, ", values[(count << 3) + count2]);
		}

		printf("\n       DB ");
	}

	return 0;
}

Here is the table that is generated when running this code.

costab DB 0feh, 0fdh, 0fdh, 0fdh, 0fdh, 0fdh, 0fch, 0fch
       DB 0fbh, 0fah, 0fah, 0f9h, 0f8h, 0f7h, 0f6h, 0f5h
       DB 0f4h, 0f3h, 0f1h, 0f0h, 0efh, 0edh, 0ebh, 0eah
       DB 0e8h, 0e6h, 0e5h, 0e3h, 0e1h, 0dfh, 0ddh, 0dah
       DB 0d8h, 0d6h, 0d4h, 0d1h, 0cfh, 0cdh, 0cah, 0c8h
       DB 0c5h, 0c2h, 0c0h, 0bdh, 0bah, 0b8h, 0b5h, 0b2h
       DB 0afh, 0ach, 0a9h, 0a6h, 0a3h, 0a0h, 09dh, 09ah
       DB 097h, 094h, 091h, 08eh, 08bh, 088h, 085h, 082h
       DB 07fh, 07bh, 078h, 075h, 072h, 06fh, 06ch, 069h
       DB 066h, 063h, 060h, 05dh, 05ah, 057h, 054h, 051h
       DB 04eh, 04bh, 048h, 045h, 043h, 040h, 03dh, 03bh
       DB 038h, 035h, 033h, 030h, 02eh, 02ch, 029h, 027h
       DB 025h, 023h, 020h, 01eh, 01ch, 01ah, 018h, 017h
       DB 015h, 013h, 012h, 010h, 00eh, 00dh, 00ch, 00ah
       DB 009h, 008h, 007h, 006h, 005h, 004h, 003h, 003h
       DB 002h, 001h, 001h, 000h, 000h, 000h, 000h, 000h
       DB 000h, 000h, 000h, 000h, 000h, 000h, 001h, 001h
       DB 002h, 003h, 003h, 004h, 005h, 006h, 007h, 008h
       DB 009h, 00ah, 00ch, 00dh, 00eh, 010h, 012h, 013h
       DB 015h, 017h, 018h, 01ah, 01ch, 01eh, 020h, 023h
       DB 025h, 027h, 029h, 02ch, 02eh, 030h, 033h, 035h
       DB 038h, 03bh, 03dh, 040h, 043h, 045h, 048h, 04bh
       DB 04eh, 051h, 054h, 057h, 05ah, 05dh, 060h, 063h
       DB 066h, 069h, 06ch, 06fh, 072h, 075h, 078h, 07bh
       DB 07eh, 082h, 085h, 088h, 08bh, 08eh, 091h, 094h
       DB 097h, 09ah, 09dh, 0a0h, 0a3h, 0a6h, 0a9h, 0ach
       DB 0afh, 0b2h, 0b5h, 0b8h, 0bah, 0bdh, 0c0h, 0c2h
       DB 0c5h, 0c8h, 0cah, 0cdh, 0cfh, 0d1h, 0d4h, 0d6h
       DB 0d8h, 0dah, 0ddh, 0dfh, 0e1h, 0e3h, 0e5h, 0e6h
       DB 0e8h, 0eah, 0ebh, 0edh, 0efh, 0f0h, 0f1h, 0f3h
       DB 0f4h, 0f5h, 0f6h, 0f7h, 0f8h, 0f9h, 0fah, 0fah
       DB 0fbh, 0fch, 0fch, 0fdh, 0fdh, 0fdh, 0fdh, 0fdh

Ooooh aaahhh, that’s a nerdy cosine table! Now that we’ve solved all of the world’s mathematical problems here, it’s on to the effect! Just getting 4 counters to run over this cosine table and intersect with each other can produce a mesmerising result. Without setting a palette (the standard vga palette is a bit: ewwwwww), here’s how the effect looks:

Plasma on standard VGA

Feel like you’re at Woodstock yet? So, the effect really spans across two smaller functions, which I’ve tried to comment as best I can below. Here’s drawing a single frame:

plasma_frame:

	; jump over the local variables
	jmp plasma_frame_code

	temp_phase_1 DB	0
	temp_phase_2 DB 0
	temp_phase_3 DB 0
	temp_phase_4 DB 0

	y_loc 		 DW 0

plasma_frame_code:	

	; setup where we'll draw to
	xor		di, di

	; setup a pointer to our cos table
	lea		si, costab

	; iterate over every pixel
	mov		cx, 64000

	; setup temp state into 3 and 4
	mov		al, phase_3
	mov		temp_phase_3, al
	
	mov		al, phase_4
	mov		temp_phase_4, al
	
reset_1_and_2:

	; re-setup temp state into 1 and 2
	mov		al, phase_1
	mov		temp_phase_1, al
	
	mov		al, phase_2
	mov		temp_phase_2, al
	
	; save our overall progress
	push	cx
	; process the next row of pixels
	mov		cx, 320

plasma_frame_pixel:

	; calculate the pixel value
	; col = costab[t1] + costab[t2] + costab[t3] + costab[t4] 

	xor		bx, bx
	
	mov		bl, temp_phase_1
	mov		al, ds:[si + bx]
	
	mov		bl, temp_phase_2
	add		al, ds:[si + bx]
	adc		ah, 0
	
	mov		bl, temp_phase_3
	add		al, ds:[si + bx]
	adc		ah, 0
	
	mov		bl, temp_phase_4
	add		al, ds:[si + bx]
	adc		ah, 0

	; draw the pixel
	mov		es:[di], al

	; adjust counter 1
	mov		al, temp_phase_1
	add		al, 2
	mov		temp_phase_1, al
	
	; adjust counter 2
	mov		al, temp_phase_2
	sub		al, 1
	mov		temp_phase_2, al

	; move onto the next pixel
	inc		di
	dec		cx
	jnz		plasma_frame_pixel

	; adjust the y location by 1
	inc		y_loc

	pop		cx
	
	; adjust counter 3
	mov		al, temp_phase_3
	add		al, 2
	mov		temp_phase_3, al
	
	; adjust counter 4
	mov		al, temp_phase_4
	sub		al, 1
	mov		temp_phase_4, al		
	
	sub		cx, 320
	jnz		reset_1_and_2
	
	ret

Drawing a single frame isn’t too difficult at all. It’s important to remember that es:[di] is pointing to the vga buffer to draw to where as ds:[si] is pointing at the cosine table. We’re using bx as a base pointer to offset si such that it acts as our array index. Neat-O!

Between frame draws, we need to make the plasma MOVE!!.. This is just some simple additions or subtractions. Using random values adds a sense of entropy to the process making the plasma move in an almost unpredictable way. It’s a little more organic this way. I haven’t done it this way though. The code you’ll see below moves the plasma by fixed amounts per frame. Still gives it some movement.

move_plasma:

	mov		al, phase_1
	add		al, 2	
	mov		phase_1, al
	
	mov		al, phase_2
	add		al, 1	
	mov		phase_2, al
	
	mov		al, phase_3
	sub		al, 1	
	mov		phase_3, al
	
	mov		al, phase_4
	add		al, 2	
	mov		phase_4, al

	ret

Wrapping those two calls in a loop that waits for a key to be pressed is all you should need to draw a plasma to the screen.

Things for you to do:

  • Change the cosine table generation to produce a more interesting cosine curve
  • Apply a palette to take the 60’s-ness out of the default palette
  • Apply a palette that you can cycle through (like 1024 or 2048 entries in size) so that the palette (and therefore the plasma) will morph colour as frames progress
  • Add randomness.

Have fun.