An interesting way of finding values that fall within a domain is to perform random sampling with the Monte Carlo method. The method relies on a large number of random values being measured against some form of deterministic calculation to determine whether each value falls within the source function's scope.
In today’s post, I’m going to illustrate how to use this method in some practical scenarios.
Approximating π
To approximate the value of π, we're going to throw a lot of random data at a square containing a quarter of a unit circle. We only treat a quarter of the circle (covering angles 0 through 90 degrees) because the other three quarters are simply mirror images of it. Mathematically, you can consider the ratio of the circle's area with respect to the square that contains it.
Circle area = πr^2
Square area = (2r)(2r)
= 4r^2
Ratio = πr^2 / 4r^2
= π/4
For every point that we randomly sample, the following must be true in order for us to consider the point as satisfying the circle:
rx^2 + ry^2 <= radius^2
This tells us that, measuring from the midpoint (0, 0), if the x and y values that we've randomly selected fall within the bounds of the radius, we'll consider the point as "in".
Once we've sampled enough data, we'll take the ratio of points that are "in" to the total number of points sampled. We are still only dealing with a quarter of a circle, so we'll multiply the result by 4, and we should get close to π if we've sampled enough data. Here's how it looks in Python:
from random import random

runs = 1000000
radius = 1

# get a batch of random points inside the radius x radius square
rand_points = [(random() * radius, random() * radius) for _ in range(runs)]

# keep only the points that satisfy our equation (i.e. fall inside the circle)
in_points = [(x, y) for (x, y) in rand_points if (x * x) + (y * y) <= (radius * radius)]

# calculate the ratio of points in the circle vs. points sampled overall
ratio = len(in_points) / runs

# multiply this figure by 4 to get all 4 quadrants considered
estimate = ratio * 4

print(estimate)
runs is the number of points that we're going to sample. radius is only defined for clarity. If you were to sample over an area other than the radius-by-radius square, the multiplier applied to your output ratio would need to be adjusted as well.
Running this code a few times, I get the following results:
3.140356
3.14274
3.14064
3.142
3.140664
Area under a curve
When it comes to finding the area under a curve, nothing really beats integration. In some cases though, your source function doesn't quite allow for it. In those cases, you can use a Monte Carlo simulation to work it out. For the purposes of this post though, I'll work with x^2.
Let’s integrate it to begin with and work out what the area is between the x-axis points 0 and 3.
f(x) = x^2
∫f(x) dx = x^3/3
area = (3^3)/3 - (0^3)/3
= 9
So we're looking for a value close to 9. It's also important to note the function's output at the start and end of the interval that we're taking the area over, as this sets up the bounds of our test:
f(x) = x^2
f(0) = 0
f(3) = 9
The rectangle that we'll be sampling spans from (0, 0) to (3, 9). The following code looks very similar to the π case; it has just been adjusted for this area and function:
from random import random

runs = 1000000
max_x = 3
max_y = 9

# get a batch of random points inside the bounding rectangle
rand_points = [(random() * max_x, random() * max_y) for _ in range(runs)]

# keep only the points that fall under the curve y = x^2
in_points = [(x, y) for (x, y) in rand_points if y <= (x * x)]

# calculate the ratio of points under the curve vs. points sampled overall
ratio = len(in_points) / runs

# the estimate is the ratio multiplied by the area of the rectangle
estimate = ratio * (max_x * max_y)

print(estimate)
Here are some example outputs. Remember, our answer is 9; we want something close to that:
In today's post, we're going to dissect the internals of the MS-DOS EXE ("MZ") format.
MZ
This particular format gets its name "MZ" from the first two bytes of the file, 0x4d and 0x5a. Translated to ASCII text, these two bytes form the characters "MZ". This is the opening signature (or magic number) for a file of this format.
The header
The first chunk of an EXE file is the header information. It stores relocation information important to the execution of the file. A few important notes when reading the header:
All values spanning more than one byte are stored LSB first
A block is 512 bytes in size
A paragraph is 16 bytes in size
Offset      Description
0x00-0x01   The values 0x4d and 0x5a, translating to the ASCII string "MZ". This is the magic number for the file
0x02-0x03   The number of bytes used in the last block of the EXE. A zero value indicates that the whole block is used
0x04-0x05   The number of blocks that form part of the EXE
0x06-0x07   The number of relocation entries. These are stored after the header
0x08-0x09   The number of paragraphs in the header
0x0A-0x0B   The number of paragraphs required for uninitialized data
0x0C-0x0D   The number of paragraphs of additional memory to constrain this EXE to
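As a rough illustration of reading these fields, here's a minimal Python sketch using the struct module; the file name "hello.exe" is just a placeholder for whatever EXE you're dissecting:

import struct

# Read the first 14 bytes of the EXE and unpack the fields from the table above.
# All multi-byte values are stored LSB first, hence the "<" (little-endian).
with open("hello.exe", "rb") as f:    # placeholder file name
    data = f.read(14)

(magic, last_block_bytes, block_count, reloc_count,
 header_paragraphs, uninit_paragraphs, max_paragraphs) = struct.unpack("<2s6H", data)

print(magic)                            # b'MZ'
print(last_block_bytes, block_count)    # bytes used in last block, block count
print(reloc_count, header_paragraphs)   # relocation entries, header size in paragraphs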
(0x00-0x01) 4d5a
The first two bytes are indeed "MZ", or 0x4d 0x5a. So we've got the correct signature.
(0x02-0x03) 2200
This is the number of bytes used in the last block of the EXE. Remember, we’ve got LSB first when we’re dealing with multi-byte values, so this is 0x22 bytes. If you take a look at the resulting code listing above, you’ll see that the code for the executable starts at address 0x200 and ends at 0x220. At the address of 0x220, 2 additional bytes are used.
This is our 0x22 bytes, as this is the last block that we have!
(0x04-0x05) 0200
This is the number of blocks (remember: 512-byte chunks) that comprise our EXE. We have 2. Our header is using the first block; our code and data are in the second.
(0x06-0x07) 0100
We have 1 relocation item. A relocation item is just a 16-bit value for the offset followed by a 16-bit value for the segment.
(0x08-0x09) 2000
This calculates out: 0x20 paragraphs at 16 bytes each gives us the 512 bytes in the header. We can see that the file offset starts at 0x00 and code doesn't appear until 0x200, which is 512 in decimal.
(0x0A-0x0B) 0000
Our program didn’t define any uninitialized data, only a pre-initialized string: “Hello, world”.
(0x0C-0x0D) ffff
This is the default mode of operation for memory constraints. It says, use everything (i.e. don’t place any constraint).
(0x0E-0x0F) 0000
No translation to the stack segment (SS) will go on here. This value gets added to the segment value that the program was loaded at, and that's how SS is initialized. The program that we've written didn't define a stack, so no translation is required.
(0x10-0x11) 0000
This is the initial value of the stack pointer (SP).
(0x12-0x13) 0000
This is the word checksum. It’s seldom used.
(0x14-0x15) 0000
The instruction pointer will start at 0x0000.
(0x16-0x17) 0000
This value would adjust the code segment (CS), just as 0x0E-0x0F adjusts SS.
(0x18-0x19) 3e00
This is the address of the first relocation item in the file. If we take a look back at the dump now, we can see the value sat at that address:
This takes the format of offset:segment here, so we’ve got 0000:0100. This will be used at execution time and will also influence the resulting stack segments and offsets.
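To see this concretely, here's a minimal Python sketch that pulls the relocation item straight out of the file (again, "hello.exe" is just a placeholder name):

import struct

# The header told us the relocation table sits at file offset 0x3e and holds a
# single item: a 16-bit offset followed by a 16-bit segment, stored LSB first.
with open("hello.exe", "rb") as f:    # placeholder file name
    f.seek(0x3e)
    offset, segment = struct.unpack("<HH", f.read(4))

print("%04x:%04x" % (offset, segment))    # expect 0000:0100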
(0x1A-0x1B) 0000
Overlay number. Zero indicates that this is the main program.
The rest
Everything from here looks pretty familiar. We can see our assembly code start off and our string defined at the end.
COM files are a plain binary executable format from the MS-DOS era (and before!) that provides a very simple execution model.
The execution environment is given a single 64KB segment to fit its code, stack and data into. This memory model is sometimes referred to as the "tiny" model.
In today's post, we're going to write a really simple program, compile it, disassemble it and dissect it. Here's our program, which very helpfully prints "Hello, world!" to the console and then exits.
ORG 100h
section .text
start:
mov dx, msg
mov ah, 09h
int 21h
ret
section .data
msg DB 'Hello, world!', 13, 10, '$'
Nothing of great interest here. The only thing worth a mention is the ORG directive. This tells the assembler that our program will be loaded at offset 100h, so all addresses are calculated relative to that point. There's some more information regarding 16-bit programs with nasm here.
nasm's default output format is plain binary, so assembly is very simple:
$ nasm hello.asm -o hello.com
Running our program in dosbox, we're given our message as promised. Taking a look at the binary on disk, it's seriously small: 24 bytes small. We won't have much to read when we disassemble it!
Because this is a plain binary file, we need to give objdump a little help in how to present the information.
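One invocation that should produce a listing like the one below (assuming GNU binutils on an x86 host) is to force plain binary input and 16-bit decoding:

$ objdump -D -b binary -m i8086 hello.com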
hello.com: file format binary
Disassembly of section .data:
00000000 <.data>:
0: ba 08 01 mov $0x108,%dx
3: b4 09 mov $0x9,%ah
5: cd 21 int $0x21
7: c3 ret
8: 48 dec %ax
9: 65 gs
a: 6c insb (%dx),%es:(%di)
b: 6c insb (%dx),%es:(%di)
c: 6f outsw %ds:(%si),(%dx)
d: 2c 20 sub $0x20,%al
f: 77 6f ja 0x80
11: 72 6c jb 0x7f
13: 64 21 0d and %cx,%fs:(%di)
16: 0a 24 or (%si),%ah
Instructions located from 0 through to 7 correspond directly to the assembly source code that we've written. After this point, the file is storing the string that we're going to print, which is why the assembly code looks a little chaotic.
Removing the gibberish assembly language, the bytes directly correspond to our string:
So, our string starts at address 8, but the first line of our assembly code (the line that loads dx with the address of our string msg) has disassembled to this:
0: ba 08 01 mov $0x108,%dx
The address of $0x108 is going to overshoot the address of our string by 0x100! This is where the ORG directive comes in. Because we've specified it, all of our addresses are adjusted to suit. When DOS loads our COM file, it'll be loaded in at offset 0x100 and our addresses will line up perfectly.
sysstat is a collection of utilities for Linux that provide performance and activity usage monitoring. In today’s post, I’ll go through a brief explanation of these utilities.
iostat
iostat(1) reports CPU statistics and input/output statistics for devices, partitions and network filesystems.
mpstat
mpstat(1) goes a little deeper into how the cpu time is divided up among its responsibilities. By specifying -P ALL on the command line, you can get a report per cpu.
pidstat
pidstat(1) gives you the utilisation breakdown by process running on your system.
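As a rough idea of usage for the tools above, something like the following (the intervals are just examples):

# extended device statistics, refreshed every 5 seconds
$ iostat -x 5

# per-cpu breakdown, refreshed every second
$ mpstat -P ALL 1

# per-process utilisation, refreshed every second
$ pidstat 1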
sar
sar(1) collects, reports and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables, etc.)
sar requires that data collection is turned on before it can be used. The settings defined in /etc/default/sysstat control this collection process. As sar is the collection mechanism, other applications use this data:
sadc(8) is the system activity data collector, used as a backend for sar.
sa1(8) collects and stores binary data in the system activity daily data file. It is a front end to sadc designed to be run from cron.
sa2(8) writes a summarized daily activity report. It is a front end to sar designed to be run from cron.
sadf(1) displays data collected by sar in multiple formats (CSV, XML, etc.) This is useful to load performance data into a database, or import them in a spreadsheet to make graphs.
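As an example of pulling the collected data back out (assuming collection has been enabled as described above), something like:

# report today's cpu utilisation from the collected data
$ sar -u

# export the collected data in a database-friendly format
$ sadf -d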
Docker is a platform that allows you to bundle up your applications and their dependencies into a distributable container easing the overhead in environment setup and deployment.
The Dockerfile reference in the docker documentation set goes through the important pieces of building an image.
In today’s post, I’m just going to run through some of the commands that I’ve found most useful.
Building a container
# build an image and assign it a tag
sudo docker build -t username/imagename:tag .
Controlling containers
# run a single command
sudo docker run ubuntu /bin/echo 'Hello world'

# run a container in a daemonized state
sudo docker run -d ubuntu /bin/sh -c "while true; do echo hello world; sleep 1; done"

# run a container interactively
sudo docker run -t -i ubuntu /bin/bash

# connect to a running container
sudo docker attach container_id

# stop a running container
sudo docker stop container_name

# remove a container
sudo docker rm container_name

# remove an image
sudo docker rmi image_name
When running a container, -p will allow you to control port mappings and -v will allow you to control volume locations.
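For example, something along these lines (the port numbers, paths and image name are only placeholders):

# map port 80 in the container to port 8080 on the host, and mount a
# host directory as a volume inside the container
sudo docker run -d -p 8080:80 -v /host/data:/container/data ubuntu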
Getting information from docker
# list images
sudo docker images

# list running containers
sudo docker ps

# list all containers
sudo docker ps -a

# inspect the settings of a container
sudo docker inspect container_name

# check existing port mappings
sudo docker port container_name

# retrieve stdout from a running container
sudo docker logs container_name

# follow the log output of a running container
sudo docker logs -f container_name