Cogs and Levers A blog full of technical stuff

File encodings and iconv

There are a handful of really useful tools for dealing with character encoding. In today’s post, I’ll take you through identifying a file’s encoding and changing it.

What is Character Encoding?

Wikipedia has the most comprehensive breakdown on the topic. The simplest way to look at it though is that a character encoding assigns a code to each character in an alphabet.

Code     Encoding  Character
65       ASCII     A
U+2776   UNICODE   ❶
0xd8     LATIN4    Ø

The UNICODE and LATIN4 characters don’t exist within the ASCII character set, therefore those characters simply don’t translate and can’t be encoded by ASCII.

Querying files

To determine what encoding is being used with a file, you can use the file unix utility.

$ echo "Here is some text" > a-test-file
$ file a-test-file
a-test-file: ASCII text

Using the -i switch, we can turn the ASCII text output into a MIME type string, which can yield some more information:

$ file -i a-test-file
a-test-file: text/plain; charset=us-ascii
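To see that charset value change, here’s a quick sketch (assuming a reasonably recent file utility) that writes a non-ASCII character and queries the result:

```shell
# Write a file containing a non-ASCII character (é, UTF-8 bytes 0xC3 0xA9)
printf 'caf\303\251\n' > utf8-test-file

# file now reports charset=utf-8 rather than us-ascii
file -i utf8-test-file
```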

To make this encoding representation a little clearer, below is the output of hexdump on the test file that we created earlier.

0000000 6548 6572 6920 2073 6f73 656d 7420 7865
0000010 0a74                                   
0000012

Remember, these bytes aren’t only shown in hex; hexdump’s default output groups them into 16-bit little-endian words, so each pair of bytes appears swapped relative to how the string is written. Let’s take the first word, 6548:

0x65 = e
0x48 = H

We’re using an 8-bit encoding, and our string has 17 characters plus a newline: 18 bytes in total, which matches the final offset of 0x12 in the hexdump. Easy.
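We can confirm that byte count directly from the shell with wc:

```shell
# Recreate the test file and count its bytes
printf 'Here is some text\n' > a-test-file

# 17 characters plus the newline
wc -c < a-test-file   # → 18
```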

Changing the encoding of a file

We can use iconv to transition our text file from one encoding to another. We specify its current encoding with the -f switch and the encoding that we want to convert it to using the -t switch.

$ iconv -f ascii -t unicode a-test-file > a-test-file.unicode

This is changing our test file into an encoding that uses more data-space per character. Taking a look at the type of file we’ve just created:

$ file a-test-file.unicode
a-test-file.unicode: Little-endian UTF-16 Unicode text, with no line terminators

If we take a look at the hexdump of this file, you can see that every character is now two bytes wide; for our ASCII characters, the extra byte is zeroed out.

0000000 feff 0048 0065 0072 0065 0020 0069 0073
0000010 0020 0073 006f 006d 0065 0020 0074 0065
0000020 0078 0074 000a                         
0000026

The file also starts with a byte order mark (BOM), displayed as feff (the bytes ff fe on disk, word-swapped in the dump), which was absent in the ASCII counterpart.
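You can reproduce the zero-padding on the command line as well. This sketch converts to UTF-16LE explicitly (which, unlike the plain unicode target, omits the BOM) and dumps the raw bytes with od:

```shell
# Convert two ASCII characters to UTF-16LE; each becomes two bytes,
# with the high byte zeroed out
printf 'He' | iconv -f ascii -t UTF-16LE | od -An -tx1
# → 48 00 65 00
```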

What encodings are supported

You can also list the coded character sets that iconv knows about with the --list switch. This will dump a massive list of encodings (and aliases) that you can use.
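The list is long, so it pairs well with grep; a small sketch (assuming a glibc-style iconv) to check whether a particular family of encodings is available:

```shell
# Search the supported encodings for UTF-16 variants
iconv --list | grep -i 'utf-16'
```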

More!

A really good article on this topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). It’s certainly a well-written article that’ll give you some schooling in encodings, quick smart.

Doing math with bash

bash can be used to perform simple arithmetic when needed, though it is limited to integer mathematics. You can see this by using expr.

An excerpt describing expr:

All-purpose expression evaluator: Concatenates and evaluates the arguments according to the operation given (arguments must be separated by spaces). Operations may be arithmetic, comparison, string, or logical.

Some trivial example usage shows that you can get some quick results if you only need integer math.

$ expr 1 + 1 
2
$ expr 1 - 7
-6
$ expr 6 \* 7
42
$ expr 6.5 \* 7
expr: non-integer argument
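As an aside, bash’s built-in arithmetic expansion $(( )) gives you the same integer-only behaviour without spawning an expr process:

```shell
echo $(( 1 + 1 ))    # 2
echo $(( 6 * 7 ))    # 42 (no need to escape the * here)
echo $(( 76 / 5 ))   # 15 (integer division truncates)
```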

At this point, you could probably go for the full-nuclear option and get perl or python to perform floating point calculations; but to keep things a little more shell oriented, you can go with a lighter-weight option, bc.

bc is an arbitrary precision calculator language. Much like every other good shell tool, you can invoke it so that it’ll take its input from STDIN and return its output to STDOUT. Here are some example invocations:

$ echo "1+1" | bc
2
$ echo "1.9+1" | bc
2.9
$ echo "76/5" | bc
15
$ echo "scale=2; 76/5" | bc
15.20

You can see that if you want precision on your answers from integer inputs, you’ll need to set the scale variable to suit. Only feeding in static values is a bit basic though. To put this to work, you just need some variable data at hand.

What’s the percentage battery left on this notebook?

$ echo "scale=2;" $(cat /sys/class/power_supply/BAT0/charge_now) / $(cat /sys/class/power_supply/BAT0/charge_full) | bc
.30

Not much! Note that this is a fraction of 1; to get an actual percentage, multiply by 100 inside the expression.

node.js module patterns

In today’s post, I’ll walk through some of the more common node.js module patterns that you can use when writing modules.

Exporting a function

Exporting a function from your module is a very procedural way to go about things. This allows you to treat your loaded module as a function itself.

You would define your function in your module like so:

module.exports = function (name) {
  console.log('Hello, ' + name);
};

You can then use your module as if it were a function:

var greeter = require('./greeter');
greeter('John');

Exporting an object

Next up, you can pre-assemble an object and export it as the module itself.

var Greeter = function () { };

Greeter.prototype.greet = function (name) {
  console.log('Hello, ' + name);
}

module.exports = new Greeter();

You can now start to interact with your module as if it were an object:

var greeter = require('./greeter');
greeter.greet('John');

Exporting a prototype

Finally, you can export an object definition (or prototype) as the module itself.

var Greeter = function () { };

Greeter.prototype.greet = function (name) {
  console.log('Hello, ' + name);
}

module.exports = Greeter;

You can now create instances from this module:

var Greeter = require('./greeter');
var greeter = new Greeter();
greeter.greet('John');

Listing open ports and who owns them

To list all of the network ports and users that own them you can use the lsof command.

sudo lsof -i

The netstat command is also available to provide the same sort of information.

sudo netstat -lptu

Working with Promises using Q in Node.js

A promise is an object that represents the result of a computation; whether it be a positive or negative result. What’s special about promises in concurrent programming is that they allow you to compose your code in such a way that is a little more natural than the callbacks-in-callbacks style.

In today’s post, I’m going to work with the Q library for Node.js to demonstrate how we can use promises to clean up our code into more concise blocks of logic.

From the npm page for the Q library, it even says:

On the first pass, promises can mitigate the “Pyramid of Doom”: the situation where code marches to the right faster than it marches forward.

Callbacks to Promises

In the following example, I’m going to simulate some work using setTimeout. This will also give us some asynchronous context. Here are the two function calls we’ll look to sequence:

var getUserByName = function (name, callback) {
  setTimeout(function () {

    try {
      callback(null, {
        id: 1,
        name: name
      });            
    } catch (e) {
      callback(e, null);
    }

  }, 1000);
};

var getCarsByUser = function (userId, callback) {
  setTimeout(function () {

    try {
      callback(null, ['Toyota', 'Mitsubishi', 'Mazda']);
    } catch (e) {
      callback(e, null);
    }

  }, 1000);
};

Even though the inputs and outputs of these functions are contrived, I just wanted to show that getCarsByUser is dependent on the output of getUserByName.

As with any good citizen of the node ecosystem, the last parameter of both of these functions is a callback function that takes the signature (err, data). Sequencing this code normally would look as follows:

getUserByName('joe', function (err, user) {
  getCarsByUser(user.id, function (err, cars) {
    // do something here
  });
});

The code starts to move to the right as you get deeper and deeper into the callback tree.

We can convert this into promises with the following code:

var pGetUserByName = Q.denodeify(getUserByName),
    pGetCarsByUser = Q.denodeify(getCarsByUser);

pGetUserByName('joe').then(pGetCarsByUser)
                     .done();

Because we’ve structured our callbacks “correctly”, we can use the denodeify function to directly convert our functions into promises. We can then sequence our work together using then. If we wanted to continue to build this promise, we could omit the done call for something else to complete work on.

Going pear-shaped

When error handling gets involved in the callback scenario, the if-trees start to muddy-up the functions a little more:

getUserByName('joe', function (err, user) {
  if (err != null) {
    console.error(err);
  } else {
    getCarsByUser(user.id, function (err, cars) {
      if (err != null) {
        console.error(err);
      } else {
        // work with the data here
      }
    });
  }
});

In the promise version, we can use the fail function to perform our error handling for us like so:

pGetUserByName('joe').then(pGetCarsByUser)
                     .fail(console.error)
                     .done();

Makes for a very concise set of instructions to work on.

Different ways to integrate

There are a couple of ways to get promises integrated into your existing code base. Of course, it’s always best to implement these things at the start so that you have this model of programming in the front of your mind, as opposed to an afterthought.

From synchronous code, you can just use the fcall function to start off a promise:

var getName = Q.fcall(function () {
  return 'John';
});

If the function expects parameters, you supply them as additional arguments:

var getGenderName = function (gender) {
  if (gender == 'F') {
    return 'Mary';
  }

  return 'John';
}

var getName = Q.fcall(getGenderName, 'F');

In asynchronous cases, you can use defer. This will require you to restructure your original code though to include its use.

var getGenderName = function (gender) {
  var deferred = Q.defer();
  var done = false;
  var v = 0;

  var prog = function () {
    setTimeout(function () {
      if (!done) {
        v ++;
        deferred.notify(v);
        prog();
      }
    }, 1000);

  };

  prog();

  setTimeout(function () {

    if (gender == 'F') {
      deferred.resolve('Mary');
    } else if (gender == 'M') {
      deferred.resolve('John');  
    } else {
      deferred.reject(new Error('Invalid gender code'));
    }

    done = true;

  }, 5000);

  return deferred.promise;
};

We’re able to send progress updates using this method as well. You can see that with the use of the notify function. Here’s the call for this function now:

getGenderName('F')
.then(function (name) {
  console.log('Gender name was: ' + name);
})
.progress(function (p) {
  console.log('Progress: ' + p);
})
.fail(function (err) {
  console.error(err);
})
.done();

resolve is our successful case, reject is our error case and notify is the progress updater.

This function can be restructured a little further with the use of Q.promise though:

var getGenderName = function (gender) {
  return Q.promise(function (resolve, reject, notify) {

    var done = false;
    var v = 0;

    var prog = function () {
      setTimeout(function () {
        if (!done) {
          v ++;
          notify(v);
          prog();
        }
      }, 1000);

    };

    prog();

    setTimeout(function () {

      if (gender == 'F') {
        resolve('Mary');
      } else if (gender == 'M') {
        resolve('John');  
      } else {
        reject(new Error('Invalid gender code'));
      }

      done = true;

    }, 5000);

  });
};

Our client code doesn’t change.

Finally, nfcall and nfapply can be used to ease the integration of promises into your code. These functions are set up deliberately to deal with the Node.js callback style.