
Getting istream to work off a byte array

Introduction

The C++ Standard Library provides extensive support for working with streams. These are abstract classes designed to work with data in a stream format. There are comprehensive concrete implementations for working with files and strings; however, I have yet to find one that will take a plain old C array and allow you to treat it as a stream.

In today’s post, I’ll present a small std::istream implementation that consumes these plain old C arrays, so the rest of your APIs can stay uniform and work against stream objects.

A brief explanation

We’ll actually be developing two classes here. We’ll need a class derived from std::istream, which is what we’ll pass around to other parts of our program; internally, this std::istream-derived object will manage a std::basic_streambuf<char> derivative.

Looking at the definition of a std::basic_streambuf we can see the following:

The class basic_streambuf controls input and output to a character sequence.

It would appear that most of the work here has been done for us. basic_streambuf will take care of the I/O from our character sequence, we just need to supply it (the character sequence, that is). I did say byte array in the title of this post, so the actual data type will be uint8_t* as opposed to char*.

Implementation

#include <cstddef>
#include <cstdint>
#include <istream>
#include <streambuf>

class membuf : public std::basic_streambuf<char> {
public:
  membuf(const uint8_t *p, size_t l) {
    // setg(eback, gptr, egptr): expose the whole buffer as the get area
    setg((char*)p, (char*)p, (char*)p + l);
  }
};

Our basic_streambuf implementation works in terms of char, which is as close as the standard character traits let us get to our byte type. You can see that the constructor does a little bit of cast work so that setg operates correctly on the uint8_t pointers.
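
If the C-style casts feel a little loose for your taste, the constructor body can be spelled with explicit casts instead; this is purely a stylistic choice and behaves identically:

char *base = const_cast<char*>(reinterpret_cast<const char*>(p));
setg(base, base, base + l);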

Finally, we just create an istream derivative that uses this membuf object under the covers:

class memstream : public std::istream {
public:
  memstream(const uint8_t *p, size_t l) :
    std::istream(&_buffer),
    _buffer(p, l) {
    // re-seat the stream on our buffer (rdbuf also clears the stream state)
    rdbuf(&_buffer);
  }

private:
  membuf _buffer;
};

We set the internal buffer that memstream will use by making a call to rdbuf. The constructor initialises the stream itself to use our membuf implementation.

In Use

You can now treat your plain old C arrays just like any other input stream. Something simple:

uint8_t buf[] = { 0x00, 0x01, 0x02, 0x03 };
memstream s(buf, sizeof(buf));

char b;

// read() returns the stream itself, so the loop stops cleanly at end-of-stream
while (s.read(&b, 1)) {
  std::cout << "read: " << (int)b << std::endl;
}

That’s all there is to it. From the snippet above, you can pass s around just like any other input stream, because, well, it is just any other input stream.
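
To show that uniformity paying off, here’s a quick (made-up) helper written purely against std::istream, building on the classes defined above; it neither knows nor cares that the bytes are coming from an in-memory array:

#include <cstdint>
#include <iostream>
#include <istream>

// a consumer that only knows about std::istream
static void dump_stream(std::istream &in) {
  char b;
  while (in.read(&b, 1)) {
    std::cout << "read: " << (int)b << std::endl;
  }
}

int main() {
  uint8_t buf[] = { 0x00, 0x01, 0x02, 0x03 };
  memstream s(buf, sizeof(buf));

  dump_stream(s);  // behaves like any other istream

  return 0;
}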

Basic Aeson Usage

Introduction

JSON is a common data interchange format used across the web these days. It’s so popular because it’s easy to work with, it maps directly onto JavaScript (really helping out the web folks), and its structure allows you to specify complex information in a simple, readable format.

In today’s post I’m going to walk through some basic usage of the Haskell library aeson which provides some tools for working with JSON.

Getting started

First of all, you’ll need the aeson library installed locally to work with it. You can install this with cabal:

$ cabal install aeson

While that’s installing, take a look at the examples up in the repo for aeson.

Defining your data structure

The first example, Simple.hs, starts by defining the data structure that we’re expecting to work with:

data Coord = Coord { x :: Double, y :: Double }
             deriving (Show)

This is pretty simple. Just a 2d co-ordinate. The example goes on to define instances of ToJSON and FromJSON to facilitate the serialization and deserialization of this data (respectively).

instance ToJSON Coord where
  toJSON (Coord xV yV) = object [ "x" .= xV,
                                  "y" .= yV ]

instance FromJSON Coord where
  parseJSON (Object v) = Coord <$>
                         v .: "x" <*>
                         v .: "y"
  parseJSON _ = empty

The only really curly bit about this is the use of the (.=) and (.:) operators. (.=) pairs a key with a value when building a JSON object, while (.:) pulls the value at a key back out when parsing one.
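
As a quick sanity check in GHCi (with OverloadedStrings enabled so the string literal can stand in for a ByteString), the two instances drive both directions; note that the key order in the encoded output can vary between aeson versions:

λ> decode "{\"x\":1.5,\"y\":2.5}" :: Maybe Coord
Just (Coord {x = 1.5, y = 2.5})
λ> encode (Coord 1.5 2.5)
"{\"x\":1.5,\"y\":2.5}"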

Simplify

With all of this said, now take a look at Generic.hs. This file makes use of the DeriveGeneric language extension so that GHC can write the ToJSON and FromJSON implementations above for us. Those type class instances now read as follows:

instance FromJSON Coord
instance ToJSON Coord

The Coord type only needs to be augmented slightly to derive Generic.

data Coord = Coord { x :: Double, y :: Double }
             deriving (Show, Generic)

Pretty easy.
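
Putting those pieces together, the generic route only needs the DeriveGeneric pragma and the GHC.Generics import; a minimal module (loosely mirroring Generic.hs) looks something like this:

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson
import GHC.Generics (Generic)

data Coord = Coord { x :: Double, y :: Double }
             deriving (Show, Generic)

-- both instances fall back to their Generic-derived defaults
instance FromJSON Coord
instance ToJSON Coord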

Reading and writing

Finally, we need to actually poke a string into this thing and pull built objects back out of it. The main function covers this (BL being a qualified import of the lazy ByteString module):

main :: IO ()
main = do
  let req = decode "{\"x\":3.0,\"y\":-1.0}" :: Maybe Coord
  print req
  let reply = Coord 123.4 20
  BL.putStrLn (encode reply)

You can see that we can turn a string into a Coord object with little effort now.

Extending this a little further to read values off disk, we can lean on readFile from Data.ByteString.Lazy (imported qualified as B here):

λ> x <- (eitherDecode <$> B.readFile "coord.json") :: IO (Either String Coord)
λ> x
Right (Coord {x = 3.5, y = -2.2})

eitherDecode gives us either an error message in the Left or the built object in the Right.

That’s it for today.

What is Weak Head Normal Form?

Introduction

Lazy languages provide a way for developers to define expressions without necessarily forcing them to evaluate immediately. Rather than provide an immediate value, these languages will generate a thunk instead.

Show me

If you load up GHCi and bind an expression to a name:

λ> let five = 2 + 3 :: Int

We can check if this expression has been evaluated or not by using sprint. If the expression hasn’t been evaluated, sprint will show us an underscore _. This is how GHCi tells us that an expression is unevaluated.

λ> :sprint five
five = _

five is currently a thunk.

If we do force the expression to evaluate and then re-run this test, sprint tells us a different story:

λ> five
5
λ> :sprint five
five = 5

five has now been evaluated and as such, sprint is telling us the value.

Weak Head Normal Form

With some knowledge of thunks under our belt, we can move onto Weak Head Normal Form or WHNF. If we take our five example back to being unevaluated, and use a mixture of take and cycle to generate a list of five copies of five, we’ll end up with another thunk:

λ> let five = 2 + 3 :: Int
λ> let fiveFives = take 5 $ cycle [five]
λ> :sprint fiveFives
fiveFives = _

If we use seq on this list, fiveFives, it gets evaluated just far enough to expose its first cons cell; the head and the tail of that cell remain thunks.

λ> seq fiveFives []
[]
λ> :sprint fiveFives
fiveFives = _ : _

seq has evaluated fiveFives here, but only as far as its outermost constructor. In fact, Hoogle says the following about seq:

Evaluates its first argument to head normal form, and then returns its second argument as the result.

seq has the following type:

seq :: a -> b -> b

So, seq forced the list to be evaluated but not the components that make up the list. This is known as weak head normal form.
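
For contrast, forcing a computation that needs every element, such as sum, evaluates the list all the way down to its values:

λ> sum fiveFives
25
λ> :sprint fiveFives
fiveFives = [5,5,5,5,5]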

In summary

From a Stack Overflow question on the topic:

An expression in weak head normal form has been evaluated to the outermost data constructor or lambda abstraction (the head).

Mounting your Android phone with mtp

This post is really just a shortcut bookmark to the Arch documentation on the topic. It walks through the steps required to mount your Android phone using FUSE.

Install the package mtpfs if you don’t already have it on your system. After that’s installed, you’ll need to uncomment the line user_allow_other in your /etc/fuse.conf file.

Plug your phone in, then mount your device with:

$ mtpfs -o allow_other /media/your_mountpoint_here

Once you’re done, you can unmount with:

$ fusermount -u /media/your_mountpoint_here

Getting started with OpenMP

Introduction

OpenMP is an API for performing shared memory multiprocessing tasks in a variety of different languages and platforms. OpenMP hides away the complexities of parallel programming so that the developer can focus on writing their applications.

In today’s post, I’ll run through building OpenMP applications, work through some examples, and explain what the various #pragma directives mean.

Building

Building applications against the OpenMP API is relatively simple. Using GCC:

$ gcc -o progname -fopenmp progname.c

Interestingly, the maximum number of threads that the framework will use (at runtime) can be controlled by setting the OMP_NUM_THREADS environment variable. This can be overridden in your programs with a call to omp_set_num_threads.
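
If you’d rather control this from code, a minimal sketch looks like the following; calling omp_set_num_threads before the first parallel region takes precedence over whatever OMP_NUM_THREADS was set to:

#include <omp.h>
#include <stdio.h>

int main(void) {
  omp_set_num_threads(2);  /* subsequent parallel regions use a team of 2 threads */

  #pragma omp parallel
  printf("hello from thread %d\n", omp_get_thread_num());

  return 0;
}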

Hello MP!

The standard “Hello, World” application that you’ll find around the web is as follows:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  int th_id, nthreads;

  #pragma omp parallel private(th_id)
  {
    th_id = omp_get_thread_num();
    printf("Hello world from thread %d\n", th_id);

    #pragma omp barrier
    if (th_id == 0) {
      nthreads = omp_get_num_threads();
      printf("There are %d threads\n", nthreads);
    }
  }

  return EXIT_SUCCESS;
}

Immediately, you’ll notice the use of #pragma directives throughout the source. These are used to control the behaviour of the OpenMP API within your application. I’ll go through a few of these below.

#pragma omp parallel private(th_id) starts a parallel region and gives each thread its own private copy of th_id. omp_get_thread_num (as the name suggests) gives you back the current thread’s number. #pragma omp barrier tells OpenMP to synchronize all threads at that point before execution continues.

Some example invocations of this program look as follows:

$ OMP_NUM_THREADS=2 ./hello
Hello world from thread 0
Hello world from thread 1
There are 2 threads

$ OMP_NUM_THREADS=4 ./hello
Hello world from thread 0
Hello world from thread 3
Hello world from thread 2
Hello world from thread 1
There are 4 threads

Moving on, we’ll take a look at a few of the #pragma directives you can use to control OpenMP.

Pragmas

All of the directives in the table below are applied in your source code with a #pragma omp prefix; a short example combining a couple of them follows the table.

Pragma           Description
atomic           Applied to a memory update, forcing OpenMP to perform it atomically
parallel         Parallelizes a segment of code
for              Distributes loop iterations over a group of threads
ordered          Forces a block of code to be executed in sequential order
parallel for     Combines the parallel and for directives
single           Forces a block of code to be run by a single thread
master           Forces a block of code to be run by the master thread
critical         Forces a block of code to be run by only one thread at a time
barrier          Forces execution to wait for all threads to reach this point
flush            Gives all threads a refreshed view of the specified objects in memory
threadprivate    Makes named file-scope, namespace-scope or static block-scope variables private to each thread
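
As a quick sketch of a couple of these directives working together (this isn’t from any official example, just an illustration), parallel for splits the loop iterations across the thread team while critical serializes the update to the shared total:

#include <omp.h>
#include <stdio.h>

int main(void) {
  int total = 0;

  /* distribute the loop iterations across the thread team */
  #pragma omp parallel for
  for (int i = 1; i <= 100; i++) {
    /* only one thread at a time may update the shared total */
    #pragma omp critical
    total += i;
  }

  printf("total = %d\n", total);  /* always 5050 */
  return 0;
}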

References

There is plenty of information around on this particular topic. Take a look at the following links to dig into these topics even further: