Cogs and Levers A blog full of technical stuff

MongoDB Aggregation Framework

Introduction

Sometimes having the power of MapReduce at your fingertips and applying this technology to simpler aggregate queries can be more hassle than it needs to be. MongoDB provides a simpler solution (for a simpler class of problems) in the form of the Aggregation framework. This framework allows you to develop queries within the mongo environment that are analogous to GROUP BY, HAVING, COUNT, SUM, etc. that you would normally use in “relational land”.

Today’s post, I want to walk through a couple of simple queries on using this framework to maximise productivity when pivoting data.

Marrying the old with the new

As a bit of a cheat’s reference, the following table provides the some examples of aggregate queries in a relational database and how they transpose over to the Mongo aggregation environment.

The source of this table can be found here.

Technique Relational Aggregation Framework
Criteria matching WHERE $match
Grouping GROUP BY $group
Aggregate criteria filtering HAVING $match
Result projection SELECT $project
Record sorting ORDER BY $sort
Limiting result sets LIMIT or TOP $limit
Accumulation SUM $sum
Counting COUNT $sum
Dropping SKIP $skip

Counting records

Sql example:

	
SELECT COUNT(*) AS count
FROM items

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: null,
            count: { $sum: 1 } } }
] )

Accumulating values

Sql example:

	
SELECT SUM(price) AS total
FROM items

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: null,
            total: { $sum: "$price" } } }
] )

Aggregation with identifier

Sql example:

	
SELECT category_id, SUM(price) AS total
FROM items
GROUP BY category_id

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: "$category_id",
            total: { $sum: "$price" } } }
] )

Aggregation with identifier and sorting

Sql example:

	
SELECT category_id, SUM(price) AS total
FROM items
GROUP BY category_id
ORDER BY total

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: "$category_id",
            total: { $sum: "$price" } } },
{ $sort: { total: 1 } }
] )

Multiple Aggregation

Sql example:

	
SELECT category_id, when SUM(price) AS total
FROM items
GROUP BY category_id, when

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: { category_id: "$category_id",
                   when: "$when" },
            total: { $sum: "$price" } } }
] )

Aggregate Filtration

Sql example:

	
SELECT category_id, count(*)
FROM items
GROUP BY category_id
HAVING count(*) > 1

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: "$category_id",
            count: { $sum: 1 } } },
{ $match: { count: { $gt: 1 } } } 
] )

Multiple Aggregate Filtration

Sql example:

	
SELECT category_id, when, SUM(price) AS total
FROM items
GROUP BY category_id, when
HAVING total > 100

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: { category_id: "$category_id",
                   when: "$when" },
            total: { $sum: "$price" } } },
{ $match: { total: { $gt: 100 } } } 
] )

Aggregate with Source Filtration

Sql example:

	
SELECT category_id, SUM(price) AS total
FROM items
WHERE active = 1
GROUP BY category_id

Mongo example:

	
db.items.aggregate( [
{ $match: { active: 1 } },
{ $group: { _id: "$category_id",
            total: { $sum: "$price" } } }
] )

Aggregate Filtration with Source Filtration

Sql example:

	
SELECT category_id, SUM(price) AS total
FROM items
WHERE active = 1
GROUP BY category_id
HAVING total > 100

Mongo example:

	
db.items.aggregate( [
{ $match: { active: 1 } },
{ $group: { _id: "$category_id",
            total: { $sum: "$price" } } },
{ $match: { total: { $gt: 100 } } }
] )

Aggregated joins

Sql example:

	
SELECT category_id, SUM(co.weight) AS weight
FROM items i, components co
WHERE co.item_id = i.id
GROUP BY category_id

Mongo example:

	
db.items.aggregate( [
{ $unwind: "$components" },
{ $group: { _id: "$category_id",
            weight: { $sum: "$components.weight" } } }
] )

Sub-query Aggregation

Sql example:

	
SELECT COUNT(*)
FROM (SELECT category_id, when
      FROM items
      GROUP BY category_id, when) AS Table1

Mongo example:

	
db.items.aggregate( [
{ $group: { _id: { category_id: "$category_id",
                   when: "$when" } } },
{ $group: { _id: null, count: { $sum: 1 } } }
] )

Stored Javascripts in MongoDB

Introduction

Coming from a relational database background, technologies such as stored procedures and user defined functions have always helped out when building a database infrastructure. MongoDB provides the same sort of code storage in stored javascripts in the database.

Creating a stored javascript

Creating a stored javascript into a database is a straight forward process of adding an item to the system.js collection.

> db.collection.js.save({_id: "greaterThan10", 
...                      value: function (x) { return x > 10 }});

Ok, this isn’t the most useful of functions. We’re testing if the value passed in the greater than 10. We’re able to use this in queries of our own using $where syntax like so:

> db.people.find({$where: "greaterThan10(this.age)"})

This would get all of the “people” documents out of the database where they were over the age of 10. This is quite verbose of such a simple example, but you can see that by filling out the function in the saved javascript with more complex operations, you could achieve a lot with a little.

Removing a stored javascript

Working with the collection as usual, you can simply remove your stored javascript by id.

> db.collection.js.remove({_id: "greaterThan10"})

Testing

As a final note, once you’ve created your stored javascript you can test it using eval easy enough.

> db.eval("return greaterThan10(9);")
false
> db.eval("return greaterThan10(11);")
true

This is just a short introduction into the world of stored javascripts. The internal workings of MongoDB is all based on javascript, so it’s a good idea to have your skills nice and sharp before going in!

64bit ABI Cheatsheet

Introduction

Using a few different articles around the web, I thought it might be a good idea to aggregate all of the information around calling conventions in 64bit land. This could come in handy when wanting to write a cross OS compliant library at the assembly level. This particular article will directly target the assembly programming layer so that topics like C++ name mangling and caller clean-up are removed from its scope.

References

Microsoft

Windows will use RCX, RDX, R8 and R9 for the first four integer or pointer arguments. XMM0, XMM1, XMM2 and XMM3 are used for floating point arguments. Additional arguments are passed via the stack (right to left).

An integer or pointer return value will be returned in RAX. Floating point return will be in XMM0.

System V

System V operating systems will use RDI, RSI, RDX, RCX, R8 and R9. XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 will be used to pass floating point parameters. RAX will hold the syscall number. Additional arguments are passed via the stack (right to left).

Return values are sent back via RAX.

Syscall Numbers

It’s interesting to note the structure of the syscall number when it comes time to execute. Looking at syscall_sw.h, you’ll see that apple machines want a 2 in the higher-order double word such that the write syscall, normally passed as 0x04 would be passed as 0x2000004 in OSX.

That’s it for today. Just a cheat sheet really.

Entering Unicode Characters in VIM

Introduction

Just a quick tip sheet on how to enter unicode characters into a text page when using vim. From time to time I’ve needed characters listed up here just to give my applications that little extra touch.

Entering characters by code

Entry Code
Enter a character by its decimal value ^Vnnn
Enter a character by its octal value ^Vonnn
Enter a character by its hex value ^Vxnn
Enter a character by its hex value for BMP unicode codepoints ^Vunnnn
Enter a character by its hex value for any unicode codepoint ^VUnnnnnnnn

In all of these examples, the n’s are the code and ^V means Control-V.

That’s it!

Working with Audio in Haskell (Part 1)

Introduction

Digital signal processing, audio processing and the like are all rather complex topics of study. I have a personal interest in these fields as I try to create guitar processing effects from time to time. Today’s post is all about taking the first steps in getting our hands on some audio and associated information from within Haskell.

hsndfile

For today’s post, I’ll be using the library hsndfile to do all of the heavy lifting as far as opening audio files and interpreting information. The files that we’ll work with will need to be in wave format. The demonstration in this post will simply open an audio file, read some information about the file and then close the file.

Project setup

I’ve created a Haskell project using cabal so that I can manage the hsndfile dependency locally to this application. You may already have this installed globally on your system, but if you follow along here, you should have it installed to your project in not time.

Setup your project as usual:

$ mkdir sndtest
$ cd sndtest
$ touch LICENSE
$ cabal init

Just select all of the defaults when setting up your project (well, that’s what I did, anyway). We need to add hsndfile as a dependency to our project, so we’ll specify this in our sndtest.cabal file. Open it up and make sure that your build-depends reads as follows.

build-depends:         base ==4.5.*,
                       hsndfile ==0.5.3

Of course, you may have some different version of base, but here’s where I was at anyway. Create a new file in your project called Test.hs. We’ll now fill out this file with the code that will open a file, read its information, close the file and then display the information to screen.

module Main where

import Sound.File.Sndfile as SF

main :: IO ()
main = do
  -- open the file that we want to know about
  f <- SF.openFile "test.wav" SF.ReadMode SF.defaultInfo

  -- read the information about the file out
  let info = SF.hInfo f

  -- close the file
  SF.hClose f

  -- display information about the file
  putStrLn $ "format:      " ++ (show $ SF.format info)
  putStrLn $ "sample rate: " ++ (show $ SF.samplerate info)
  putStrLn $ "channels:    " ++ (show $ SF.channels info)
  putStrLn $ "frames:      " ++ (show $ SF.frames info)

This is pretty straight forward. First up, we import Sound.File.SndFile qualified as SF so we know when we’re using something from this import. Dissecting the main function, firstly we open the file using openFile. This function expects the path to the audio file (in this case we’re using “test.wav” which by the way you’ll have to find something), we’re only reading from the file at the moment so we specify ReadMode and finally we have the info parameter which is useful to us when we’re writing a new file (so we can tell it what format to write in, etc), but for reading we just use defaultInfo.

We now read the stream information about the file using hInfo, the result of which will give us back a value of type Info. This info packet tells us the number of frames in the file, the sample rate, number of channels, header and sample format, number of sections and if the file is seekable or not.

Now that we have the information from the stream, we can close it off. We do this with hClose. Now we can interrogate the Info value with a series of print statements. We’ve got a module ready to run, but we need to tell our project that it’s the entry point to run. In the sndtest.cabal file, make sure you set your main-is: attribute like so.

main-is:           Test.hs

Build and Run<

We’ve created our project and finished our code. Let’s build and run the application. The first build is going to take a bit more time as cabal-dev will need to resolve all of the dependencies that it doesn’t yet have. Get this process moving with the following command.

$ cabal-dev install

All going well, you should be able to launch your executable and check out the results:

$ dist/build/sndtest/sndtest
format:      Format {headerFormat = HeaderFormatWav, sampleFormat = SampleForm
atPcm16, endianFormat = EndianFile}
sample rate: 48000
channels:    2
frames:      302712

That’s it for today’s post. Hopefully I’ll get back to this topic soon so I can write some more about audio processing.