Cogs and Levers A blog full of technical stuff

Mounting remote filesystems with sshfs

Getting direct access to a remote machine’s files over ssh is simplified greatly by sshfs, a FUSE-based filesystem. To get started, install sshfs with your favourite package manager:

$ sudo pacman -S sshfs

To connect to a remote file system, you just use the following:

$ sshfs host: mountpoint

Much like ssh, the host argument can take the form user@host if your local username doesn’t match your account on the remote machine.
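For example (the user, host and paths here are made up for illustration, and the mount point needs to exist first), mounting a remote web root as a different user might look like this:

$ sshfs deploy@webserver:/var/www ~/remote-www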

When you’re done, unmounting the filesystem is done like so:

$ fusermount -u mountpoint

A GDT Primer

For Intel chips, the major housekeeping processes (like memory management and interrupt handling) are managed through a set of tables. As far as the CPU is concerned, each of these tables is as simple as a length and a linear address pointing to the actual table data.

The GDT or Global Descriptor Table is one of these tables and it’s what your CPU uses to describe its internal memory segmentation for the system.

In today’s post, I’ll take you through how the GDT is defined and how it is applied to your system.

What is it and how is it defined?

Like I said in the introduction, all the CPU needs in order to find the GDT is a length and a linear address pointing at the table data. Here’s an example below, defined in assembly language:

gdt_data:
   DQ 0x0000000000000000
   DQ 0x00CF9A000000FFFF
   DQ 0x00CF92000000FFFF

gdt_tab:
   DW 23
   DD gdt_data

In the above snippet, gdt_data defines the actual GDT entries. We’ll get into what the values mean shortly, but for now it’s important to understand that this block of data starts with a null entry (or all zeros) and then the entries begin. You’ll see that each entry is defined by DQ, so each entry is 8 bytes.

gdt_tab starts with the length of the table in bytes, minus 1. The whole “minus 1” part comes in because the length is stored as a word: it can only hold a maximum value of 65535, yet the table itself is allowed to be up to 65536 bytes long (8,192 entries of 8 bytes each), and a zero-length table is invalid anyway, so nothing is lost. In our example, three 8-byte entries come to 24 bytes, which is why the word holds 23. Next, gdt_tab defines a linear address to the table data itself, gdt_data.
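If it’s easier to picture in C, gdt_tab is nothing more than those two fields packed back to back. This is just an illustrative sketch; the struct name is my own:

#include <stdint.h>

/* A C view of gdt_tab: the length of the table in bytes (minus 1),
   followed by the linear address of gdt_data. The packed attribute
   stops the compiler inserting padding between the two fields. */
struct gdt_ptr {
    uint16_t limit;   /* 23 for our three 8-byte entries */
    uint32_t base;    /* linear address of gdt_data      */
} __attribute__((packed));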

How is a GDT entry assembled?

Each GDT entry conforms to the following format:

Start  End  Meaning               Size
63     56   Base (bits 24 - 31)   8 bits
55     52   Flags                 4 bits
51     48   Limit (bits 16 - 19)  4 bits
47     40   Access byte           8 bits
39     16   Base (bits 0 - 23)    24 bits
15     0    Limit (bits 0 - 15)   16 bits

From this table, you can see that it defines a 32 bit base, which is the linear address of where the segment begins, and a 20 bit limit, which sets the maximum addressable offset within the segment (in units controlled by the granularity flag below).

The access byte is 8 bits of flags that describe the segment’s type and access privileges. The byte breaks down like this:

Bit  Code   Description
7    Pr     Present bit. Must be 1 for all valid selectors.
6-5  Privl  Privilege bits. Defines the ring level this selector is allowed to be used from.
4    S      Descriptor type. 1 for code and data segments; 0 for system segments.
3    Ex     Executable bit. 1 for code, 0 for data.
2    DC     Direction bit for data selectors: when it is set to 0 the segment grows up, when 1 it grows down. Conforming bit for code selectors: when it is set to 1, the code may be executed from rings with equal or lower privilege; when 0, it may only be executed from the ring set in Privl.
1    RW     Readable bit for code selectors, writable bit for data selectors. Write access is never allowed for code segments, and read access is always allowed for data segments.
0    Ac     Accessed bit. Leave this as 0; the CPU sets it to 1 once the segment is accessed.

The flags nibble is 4 bits (numbered 3 to 0 within the nibble below) that control granularity and operand size:

Bit  Code  Description
3    Gr    Granularity. When set to 0 the limit is interpreted in bytes; when set to 1, in pages (4KiB blocks).
2    Sz    Size. 0 defines 16-bit protected mode selectors; 1 defines 32-bit protected mode selectors.
1    L     Long. When set to 1, this is a 64-bit mode selector; Sz must then be set to 0.
0          Unused. Set to 0.

How do the values breakdown?

Above, we had some example data that we were setting up for a GDT. Here’s how those values break down.

Original value
00CF9A000000FFFF

base  24-31 : 00
flags       : C  (1100b)
limit 16-19 : F
access      : 9A (10011010b)
base   0-23 : 000000
limit  0-15 : FFFF

This particular entry describes a segment with a base of 0x00000000 and a limit of 0xFFFFF. The access byte tells us that the segment:

  • Is present
  • Is privileged to Ring-0
  • Is executable
  • Can ONLY be executed from Ring-0
  • Is readable

The flags also tell us that:

  • The limit is expressed in 4KiB units (so the segment covers the full 4GiB address space)
  • The selector is for 32-bit protected mode
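To tie the whole format together, here’s a small C sketch (the function and file are my own, not part of the original post) that packs a base, limit, access byte and flags nibble together using the bit positions from the table above, and checks that it reproduces the 0x00CF9A000000FFFF entry we just pulled apart:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble a GDT entry from its parts, following the bit layout above. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFFu);               /* limit bits 0-15  -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFFu)        << 16; /* base bits 0-23   -> bits 16-39 */
    d |= (uint64_t)access                    << 40; /* access byte      -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0x0Fu)   << 48; /* limit bits 16-19 -> bits 48-51 */
    d |= (uint64_t)(flags & 0x0Fu)           << 52; /* flags nibble     -> bits 52-55 */
    d |= (uint64_t)((base >> 24) & 0xFFu)    << 56; /* base bits 24-31  -> bits 56-63 */

    return d;
}

int main(void)
{
    /* base 0, limit 0xFFFFF, access 0x9A (code), flags 0xC (4KiB, 32-bit) */
    uint64_t code = make_descriptor(0x00000000, 0xFFFFF, 0x9A, 0xC);

    assert(code == 0x00CF9A000000FFFFULL);
    printf("%016llX\n", (unsigned long long)code);
    return 0;
}

If the bit positions above are right, the assert holds and this prints 00CF9A000000FFFF.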

How is it set?

Actually defining the GDT entries is one thing, but you also need to tell the CPU where the table lives. This is quite an easy process, using the lgdt instruction.

mov   eax, gdt_tab   ; load in the address of the table
lgdt  [eax]          ; load the new GDT

After this has happened, we need to jump into our new code segment to continue executing code. In gdt_data, the code segment was defined 2nd (after the null entry), so it sits 0x08 (one 8-byte entry) into the table; that offset is the selector we use.

After jumping to our code segment, we need to refresh the rest of the segment registers so that they’re pointing at the right place as well. 16 bytes (0x10) into the table (the third entry) is where we’ve defined the data segment.

   jmp   0x08:refresh_segments   ; far jump to reload cs with the code selector

refresh_segments:

   mov   ax, 0x10                ; 0x10 is the data segment selector
   mov   ds, ax
   mov   es, ax
   mov   fs, ax
   mov   gs, ax
   mov   ss, ax
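If you’re bringing the system up from C with GCC (an assumption on my part, though plenty of hobby kernels do exactly this), the same steps are commonly wrapped in inline assembly. The names below are made up for illustration, and cs still needs to be reloaded with a far jump like the one above:

#include <stdint.h>

/* The pseudo-descriptor again: 16-bit length (size - 1), 32-bit address. */
struct gdt_ptr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

/* Hand the pseudo-descriptor to the CPU, just like "lgdt [eax]" above. */
static inline void load_gdt(const struct gdt_ptr *gp)
{
    __asm__ volatile ("lgdt %0" : : "m"(*gp));
}

/* Point the data segment registers at a selector (0x10 for our data entry). */
static inline void reload_data_segments(uint16_t selector)
{
    __asm__ volatile ("mov %0, %%ds\n\t"
                      "mov %0, %%es\n\t"
                      "mov %0, %%fs\n\t"
                      "mov %0, %%gs\n\t"
                      "mov %0, %%ss"
                      : : "r"(selector));
}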

Differences between 32 and 64 bit

Segmentation is very simple once you enter the 64 bit world. Four of the segment registers, CS, SS, DS and ES, are treated as having a base of 0x00 and their limits aren’t checked, so they effectively cover the whole 64-bit address space. Pretty simple. FS and GS are still capable of a non-zero base address.

An example table for 64-bit mode looks like this:

gdt_tab_64:
   DQ 0x0000000000000000
   DQ 0x00A09A0000000000
   DQ 0x00A0920000000000

You can see how the bases and limits have been zeroed out here. The flags nibble has also changed from C to A (1010b): Sz is now 0 and L is 1, which is what marks the code selector as 64-bit.

Conclusion

There’s quite a bit more you can learn in this field, and there are some excellent resources around the web to help out.

Encode or Decode base64 at the console

Today’s post is a quick tip on encoding and decoding base64 information with the base64 command at the Linux prompt.

In order to encode a piece of text, you can do the following:

$ echo 'I want to encode this text' | base64
SSB3YW50IHRvIGVuY29kZSB0aGlzIHRleHQK

To reverse the process, simply use the -d switch:

$ echo SSB3YW50IHRvIGVuY29kZSB0aGlzIHRleHQK | base64 -d
I want to encode this text
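base64 will also happily work on files, which makes it handy for shuttling small binaries over text-only channels. The file names here are just made up for illustration:

$ base64 some-image.png > some-image.txt
$ base64 -d some-image.txt > restored.png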

Simple.

Starting Linux module development

Introduction

Developing modules for the Linux kernel has historically been a difficult discipline to even get started in, but as time has passed it has become a much more approachable and accessible topic. In today’s post, I’m going to go through:

  • How to setup your build environment
  • Writing the code for a simple module
  • Building your module
  • Adding and removing your module from the Kernel

There are a lot of different sources on the internet that have this information, and the best reference that I can suggest is over at the LDP (the Linux Documentation Project).

We’ve got a bit to cover, so let’s get started. One last thing though: I’m running Arch Linux while writing this tutorial, but everything here should translate directly to your distribution of choice.

Setting up your build environment

You need to put your system into a state where you’re able to build kernel modules, and you’ll do this with the linux-headers package from your distribution’s package manager.

$ sudo pacman -S linux-headers

Once this has installed, you’ll find a build environment has been made under /lib/modules/$(uname -r)/build on your system, along with all of the development headers you’ll need to build your modules against.

The code

First up, just a couple of requirements. We’re going to print a kernel message using printk when the module is initialised and we’ll print another when the module has been unloaded. Pretty unimaginative but it’ll be great for the sake of demonstration.

Here’s the code:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michael Tuttle");
MODULE_DESCRIPTION("A useless module for demonstration purposes");

/** Initialises our module into the kernel */
static int __init message_init(void) {
   printk(KERN_INFO "Module has been loaded\n");
   return 0;
}

/** Unloads our module from the kernel */
static void __exit message_exit(void) {
   printk(KERN_INFO "Module has been unloaded\n");
}

module_init(message_init);
module_exit(message_exit);

That’s it for the code, and it’s pretty easy to follow. You can see that message_init is what will be called when our module is loaded and message_exit when it’s unloaded. Traditionally these functions had to be called init_module and cleanup_module respectively, but the module_init and module_exit macros let you register functions with whatever names you like, while the __init and __exit markers tell the kernel that this code is only needed at load and unload time.

printk is what we’ll use to send some text into the kernel messages. You retrieve these with the dmesg shell command.

Building your module

The Makefile for this module is actually quite simple, but requires a little explanation.

obj-m += msg.o

all:
   make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
   make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Quite simple in the end, but cryptic if you haven’t come across some of it before. The obj-m line tells the kernel build system (kbuild) that we want msg.c compiled as a loadable module; obj-y is what’s used for objects that get built into the kernel image itself.

The make targets are executed against the build environment we installed above. -C tells make to change into the given directory, and M=$(PWD) points kbuild back at our module’s source. As with any Makefile, the recipe lines need to be indented with a tab.

After compiling your module, you’re ready to see it in action.

Adding and Removing your module

To get your kernel to start using your module, you issue insmod against the .ko file that has been built for you.

$ sudo insmod msg.ko

The kernel should now have loaded your module, and you can confirm this by checking dmesg:

$ dmesg

. . .
. . .
. . .
[ 3320.686668] Module has been loaded

Of course there’s an easier way to check that it’s loaded. You can see a list of all the loaded modules in your kernel by issuing lsmod.

$ lsmod
Module                  Size  Used by
msg                      825  0
. . .
. . .
. . .

The first half has gone to plan. To unload this module, we now use rmmod.

$ sudo rmmod msg.ko

Now that the module has been removed, we should see the leaving message in the output of dmesg.

$ dmesg

. . .
. . .
. . .
[ 3641.668948] Module has been unloaded

Notes on working monadically

You can do little bits and pieces in your Haskell code to take it from an imperative-looking style to a more concise, monadic one. In today’s post, I’m going to run through a small sample of functions that should help this process greatly.

One thing to keep in mind is that it’s not so much the use of these functions that’s important; the true power lies in the change of thought process that puts you in a position where these functions become effective.

>>=

The >>= function (pronounced “bind”) sequences operations together by passing the result of what’s on the left hand side to the function on the right hand side. >>= looks like this:

(>>=) :: Monad m => m a -> (a -> m b) -> m b

>>= is asking for:

  • m a, which is a value wrapped in a monad
  • (a -> m b), which is a function accepting the unwrapped value and returning a new wrapped value

An example of this in action looks like this:

-- | Returns a string in context of a monad
ns :: Monad m => m [Char]
ns = return $ "Hello"

-- | Prints a string to the console
putStrLn :: String -> IO ()

-- Using bind
ns >>= putStrLn

What’s going on in the above snippet is that >>= has unwrapped the string returned by ns from its IO monad so that it’s just a string. >>= has then applied this raw value to putStrLn.

The =<< function performs the same role as >>= only it has its parameters flipped.

putStrLn =<< ns

liftM

liftM allows you to promote a function to a monad. Here’s what it looks like

liftM :: Monad m => (a1 -> r) -> m a1 -> m r

(a1 -> r) is what will get lifted into the monad, so we can keep our code that does work free of any monad awareness.

Where I have found that this is useful is when you have a set of functions that don’t operate on wrapped values, and you’d like to sequence them monadically.

My initial attempts to do this look like this:

-- | Collects all of the successfully read contents of
--   the read files into a list of targets
allContents :: [String] -> IO [Target]
allContents paths = do
   rs <- safeReadFiles
   let cs = map parseSentinelFile (catMaybes rs)
   return $ concat $ rights cs
 where safeReadFiles = mapM safeReadFile paths

This reads like a big blob of imperative glug! So, I thought that all of these pieces could be sequenced together and chained with >>=. Here’s what I got:

allContents paths = mapM safeReadFile paths
                >>= return . catMaybes
                >>= return . map parseSentinelFile
                >>= return . rights
                >>= return . concat

Well, at least the code is sequenced - but all of those returns sure are annoying. Here’s where liftM comes in. With liftM, we can compose all of those functions without needing to know anything about the monad that it’s executing in. Here’s what I’ve ended up with:

allContents paths = liftM p (mapM safeReadFile paths)
 where p = concat . rights . map parseSentinelFile . catMaybes

liftM has allowed us to express our function chain using . as function composition. liftM then handles all of the monadness for us!

>>

The >> function performs the same sequencing as >>= does, only the result of the first action is discarded rather than being passed along. >> looks like this:

(>>) :: Monad m => m a -> m b -> m b

This particular function comes in handy when you’re not interested in passing along the result from certain links in your sequencing chain, like this:

putStrLn "Hello. What is your name? " >>  getLine
                                      >>= putStr
                                      >>  putStrLn "! That's a great name"

From this particular sequence, you can see that:

  • The result of the first putStrLn is dropped
  • The line read by getLine is passed on to putStr
  • The result of putStr is dropped
  • The final putStrLn terminates the sequence

sequence

sequence will evaluate all of the actions passed to it from left to right and return the results of those actions. It’s defined like this

sequence :: Monad m => [m a] -> m [a]

What this allows you to do is something like this

sequence [putStr "What's your name? " >> getLine
         ,putStr "What's your age? " >> getLine
         ,putStr "What's your favourite colour? " >> getLine
         ]

This will then give you back a list of the results of each of those actions; in this case, the three answers that were typed in.

sequence_ will perform the same task as what sequence does, only it’ll throw away the result.

mapM

mapM will allow you to perform a monadic action over a list of normal (or unwrapped) values. It looks like this

mapM :: Monad m => (a -> m b) -> [a] -> m [b]

mapM wants:

  • A function that takes an unwrapped value a as its first parameter and returns a wrapped value m b
  • A list of unwrapped values [a]

It will then give you back a wrapped list of outputs, m [b]. So, this allows you to do something like this

-- our list of unwrapped values
let questions = ["What's your name? ", "What's your age? "]

-- print out each question and ask for a response
mapM (\q -> putStr q >> getLine) questions

Also notice that in the lambda above (\q -> putStr q >> getLine), we’ve used the >> function from earlier because we don’t care about the result of printing the question to the console.

mapM_ will perform the same task as what mapM does, only it’ll throw the result away.

filterM

filterM will filter a list based on a Bool wrapped in an action. Here’s how filterM is defined

filterM :: Monad m => (a -> m Bool) -> [a] -> m [a]

filterM runs an action (a -> m Bool) that wants an a as its input and returns a wrapped m Bool, which determines which a’s in the passed list [a] end up in the resulting wrapped list m [a]. A mouthful, I know. Here’s an example

filterM (\_ -> (randomIO :: IO Int) >>= return . even) [1..50]

The lambda here ignores its input parameter by using \_. It uses randomIO to grab a number out of the hat (annotated as an Int so the compiler knows which type to generate), which is then bound to (return . even), giving back a wrapped Bool indicating whether the number supplied by randomIO is even or not.

There’s heaps more that you can do. Just check out the Control.Monad module in the base library for some more. That’s it for today though!