Cogs and Levers - A blog full of technical stuff

Project setup with Maven at the Command Line

Introduction

A few utilities exist to manage builds, dependencies and test runs for Java projects. One that I've found quite intuitive (once you wrap your head around the XML structure) is Maven. According to the website, Maven is a “software project management and comprehension tool”.

The main benefit I’ve seen already is how the developer’s work-cycle is managed using the “POM” (project object model). The POM is just an XML file that accompanies your project, describing to Maven how to build, test & package your software unit.
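
As a very rough sketch, a minimal POM looks something like this (the group and artifact names are just placeholders, matching the project we’ll generate below):

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.temp</groupId>
  <artifactId>hello</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
</project>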

An excellent, short post can be found on the Maven website called “Maven in 5 minutes”.

Today’s post will focus on Maven installation and getting a “Hello, world” project running.

Installation

I’m on a Debian-flavored Linux distribution, so you may need to translate slightly between package managers. To get Maven installed, issue the following command at the prompt:

sudo apt-get install maven

Check that everything has installed correctly with the following command:

mvn --version

You should see some output not unlike what I’ve got here:

Apache Maven 3.0.4
Maven home: /usr/share/maven
Java version: 1.7.0_25, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
Default locale: en_AU, platform encoding: UTF-8
OS name: "linux", version: "3.2.0-4-amd64", arch: "amd64", family: "unix"

If you’re seeing output like what I’ve got above - that’s it. You’re installed now.

First Project

Getting your first application together is pretty easy. The quickest approach is to use the “quickstart” archetype to generate a project structure, like so:

cd ~/Source
mvn archetype:generate -DgroupId=org.temp -DartifactId=hello -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Maven will then go out and grab everything it needs from the web to set your project up. Once it’s done, you’ll have a project structure (in a directory called “hello”) that looks like this:

.
├── pom.xml
└── src
    ├── main
    │   └── java
    │       └── org
    │           └── temp
    │               └── App.java
    └── test
        └── java
            └── org
                └── temp
                    └── AppTest.java

Editing the file that they put into the source folder for you (at src/main/java/org/temp/App.java), you can see that your job is already done:

package org.temp;

/**
 * Hello world!
 *
 */
public class App {
    public static void main( String[] args ) {
        System.out.println( "Hello World!" );
    }
}

Build it and give it a run!

mvn compile
mvn exec:java -Dexec.mainClass="org.temp.App"

You should see some output like this:

[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building hello 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) @ hello >>>
[INFO]
[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) @ hello <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ hello ---
Hello World!
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.331s
[INFO] Finished at: Thu Jan 30 12:46:28 EST 2014
[INFO] Final Memory: 7M/30M
[INFO] ------------------------------------------------------------------------

The most important line there is “Hello World!”.
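
From here, you can also ask Maven to package the application into a JAR and run it straight out of the target directory. A quick sketch (the JAR name follows from the artifactId and version we generated with):

mvn package
java -cp target/hello-1.0-SNAPSHOT.jar org.temp.App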

There is so much more that you can do with Maven for your projects. Check out the documentation - that’s it for today.

Create a MapReduce Job using Java and Maven

Introduction

In a previous post, I walked through the very basic operations of getting a Maven project up and running so that you can start writing Java applications using this managed environment.

In today’s post, I’ll walk through the modifications required to your POM to get a MapReduce job running on Hadoop 2.2.0.

If you don’t have Maven installed yet, do that . . . maybe even have a bit of a read up on what it is, how it helps and how you can use it. Of course you’ll also need your Hadoop environment up and running! Project Setup First thing you’ll need to do, is to create a project structure using Maven in your workspace/source folder. I do this with the following command:

$ mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.test.wordcount -DartifactId=wordcount

As it runs, this command will ask you a few questions about the details of your project. For all of the questions, I found that the default values were sufficient. So . . . enter, enter, enter!

Once the process is complete, you’ll have a project folder created for you. In this example, my project folder is “wordcount” (you can probably see where this tutorial is now headed). Changing into this folder and having a look at the directory tree, you should see the following:

~/src/wordcount$ tree
.
├── pom.xml
└── src
    ├── main
    │   └── java
    │       └── com
    │           └── test
    │               └── wordcount
    │                   └── App.java
    └── test
        └── java
            └── com
                └── test
                    └── wordcount
                        └── AppTest.java

11 directories, 3 files

Now it’s time to change the project environment so that it’ll suit our Hadoop application target.

Adjusting the POM for Hadoop

There’s only a few minor alterations that are required here. The first one is, referencing the Hadoop libraries so that they are available to you to program against. We also specify the type of packaging for the application. Lastly, changing the language version (to something higher than what’s specified as default).

Open up “pom.xml” in your editor of choice and add the following lines into the “dependencies” node.

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
</dependency>

This tells the project that we need the “hadoop-client” library (version 2.2.0).
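
If you’d like to sanity check what this pulls in, Maven can print the resolved dependency tree for you:

$ mvn dependency:tree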

We’re now going to tell Maven to make us an executable JAR. Unfortunately, here’s where the post is slightly pre-emptive upon itself. In order to tell Maven that we want an executable JAR, we need to tell it what class is holding our “main” function. . . we haven’t written any code yet - but we will!

Create a “build” node and within that node create a “plugins” node and add the following to it:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <addClasspath>true</addClasspath>
        <mainClass>com.test.wordcount.WordCount</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>

More on maven-jar-plugin can be found on the Maven website; this block is what makes the JAR executable for us.

Add this next plugin to use Java 1.7 for compilation:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.7</source>
    <target>1.7</target>
  </configuration>
</plugin>

That’s all that should be needed now to perform compilation and packaging of our Hadoop application.

The Job

I’ll leave writing Hadoop Jobs to another post, but we still need some code to make sure our project is working (for today).

All I’ve done for today is taken the WordCount code from the Hadoop wiki (http://wiki.apache.org/hadoop/WordCount), changed the package name to line up with my project (com.test.wordcount) and saved it into src/main/java/com/test/wordcount/WordCount.java.

I removed the template-provided App.java that was in this folder, and I made one minor patch to the wiki code as well. Here’s the full listing I used, for reference:

package com.test.wordcount;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

   // the mapper: split each input line and emit a (word, 1) pair per token
   public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
         String line = value.toString();
         StringTokenizer tokenizer = new StringTokenizer(line);
         while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
         }
      }
   }

   // the reducer: sum all of the counts collected for each word
   public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

      public void reduce(Text key, Iterable<IntWritable> values, Context context)
         throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable val : values) {
            sum += val.get();
         }
         context.write(key, new IntWritable(sum));
      }
   }

   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();

      Job job = new Job(conf, "wordcount");

      job.setJarByClass(WordCount.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);

      job.setMapperClass(Map.class);
      job.setReducerClass(Reduce.class);

      job.setInputFormatClass(TextInputFormat.class);
      job.setOutputFormatClass(TextOutputFormat.class);

      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));

      job.waitForCompletion(true);
   }

}

Compile, Package & Run!

Our project is set up and our code is in place; it’s now time to compile and package it.

$ mvn clean install

Maven will download a bunch of dependencies and do a bit of compilation . . . if all has gone to plan, you’ll now have a package to run. As usual, you’ll need a text file of words to count. I’ve popped one up on HDFS called “input.txt”.
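
If you need to do the same, something along these lines will copy a local file up to your home directory on HDFS:

$ hadoop fs -put input.txt input.txt

With the input file in place, we can kick the job off: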

$ hadoop jar target/wordcount-1.0-SNAPSHOT.jar input.txt wcout

You should now have a map reduce job running!
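
The counts end up in the “wcout” directory we passed as the second argument; you can peek at them with something like this (the part file name may vary with your reducer setup):

$ hadoop fs -cat wcout/part-r-00000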

Simple forkIO Usage

Creating new threads in Haskell is quite easy (once you know how). Here’s a simple snippet for using forkIO and myThreadId to get you started.

module Main where

import Control.Concurrent

main :: IO ()
main = do
    -- grab the parent thread id and print it
    parentId <- myThreadId
    putStrLn (show parentId)
    
    -- create a new thread (ignore the returned ThreadId)
    _ <- forkIO $ do
      -- grab the child thread id and print it
      childId <- myThreadId
      putStrLn (show childId)

    -- give the child thread a chance to run before main exits;
    -- when main finishes, every other thread is killed with it
    threadDelay 1000000
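
If you’d rather have the parent block until the child has actually finished (instead of just sleeping), an MVar makes a simple one-shot signal. A quick sketch:

module Main where

import Control.Concurrent

main :: IO ()
main = do
    -- an empty MVar that the child will fill when it's done
    done <- newEmptyMVar

    -- create a new thread (ignore the returned ThreadId)
    _ <- forkIO $ do
      childId <- myThreadId
      putStrLn (show childId)
      -- signal the parent that we're finished
      putMVar done ()

    -- block here until the child fills the MVar
    takeMVar done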

Installing Software from Testing or Unstable within your Debian Stable Environment

Introduction

The great thing about using the current stable version of Debian is the assurance that a lot of testing has gone into every package you’re looking at. Sometimes this works against us, though: packages take so long to reach stable that the stable repository can be quite stale with its versions.

In today’s post, I’ll show you how you can install a package from a different repository (other than stable) within your stable Debian environment.

At the time of this writing, I’m currently using “Wheezy” (codename for Stable). This makes “Jessie” the codename for Testing and “Sid” the codename for Unstable.

Adding Software Sources

In order to install software from another repository, you need to tell “apt” where to get the software from. Before making any changes, my /etc/apt/sources.list looks like this:

deb http://ftp.au.debian.org/debian/ wheezy main
deb-src http://ftp.au.debian.org/debian/ wheezy main

deb http://security.debian.org/ wheezy/updates main
deb-src http://security.debian.org/ wheezy/updates main

deb http://ftp.au.debian.org/debian/ wheezy-updates main
deb-src http://ftp.au.debian.org/debian/ wheezy-updates main

The man page for “sources.list” will fill you in on the structure of these lines. For the purposes of this post, just take note that each line mentions “wheezy”.

Without any modification, we can use “apt-cache policy” to find out what versions of a particular package are available to us. For the purposes of this post, I’ll use “haskell-platform”. Taking a look at the cache policy for this package:

$ apt-cache policy haskell-platform
haskell-platform:
  Installed: (none)
  Candidate: 2012.2.0.0
  Version table:
     2012.2.0.0 0
        500 http://ftp.au.debian.org/debian/ wheezy/main amd64 Packages

We’ve got version “2012.2.0.0” available to us in the stable repository. With “2013.2.0.0” as the current version, we can see that stable is a little behind. Let’s try and fix that.

We’re going to add some software from the testing repository, so we’re going to link up with the binary source pointed to “jessie”. To do this, we’ll add one extra line to /etc/apt/sources.list, like so:

deb http://ftp.au.debian.org/debian/ wheezy main
deb-src http://ftp.au.debian.org/debian/ wheezy main
deb http://ftp.au.debian.org/debian/ jessie main

deb http://security.debian.org/ wheezy/updates main
deb-src http://security.debian.org/ wheezy/updates main

deb http://ftp.au.debian.org/debian/ wheezy-updates main
deb-src http://ftp.au.debian.org/debian/ wheezy-updates main

Note the third line (new) that mentions “jessie”.

Setting Priorities

Now that we’ve confused apt, by mixing software sources - we need to set some priorities where the stable repository will take precedence over the testing repository.

To do this, we open (or create) the file /etc/apt/preferences. In this file, we can list out the repositories we’d like to use and assign a priority to each. Here’s a sample putting a higher priority on stable:

Package: *
Pin: release a=stable
Pin-Priority: 700

Package: *
Pin: release a=testing
Pin-Priority: 600

The instructions here define which packages the rules apply to, which release each rule targets and the priority to be applied. Now that we’ve put these priorities in place, we’ll update our local software cache:

$ sudo apt-get update

Then, we can take a look at the policy again:

$ apt-cache policy haskell-platform
haskell-platform:
  Installed: (none)
  Candidate: 2012.2.0.0
  Version table:
     2013.2.0.0.debian3 0
        600 http://ftp.au.debian.org/debian/ jessie/main amd64 Packages
     2012.2.0.0 0
        700 http://ftp.au.debian.org/debian/ wheezy/main amd64 Packages

We now have the ability to install the newer package!

Installing

With all of these rules in place now, installing software from a particular repository is as simple as:

$ sudo apt-get -t testing install haskell-platform

We’ve passed the repository the -t option. This will take haskell-platform and associated dependencies from the testing repository.

RAII for C++

Introduction

In programming, RAII stands for “Resource Acquisition is Initialization” and it’s an idiom or technique established by Bjarne Stroustrup to ease resource allocation and deallocation in C++.

A common problem has been that, when an exception is thrown during initialization, the memory (or underlying resources) acquired during construction isn’t released, creating leaks in applications.

The Idea

The basic premise is that resource allocation is performed in the constructor of your class, and release of the resources occurs in your destructor. The example given on the Wikipedia page deals with holding a lock/mutex while writing to a file. When execution leaves the scope of the code (whether through premature termination caused by an exception, or by the code naturally exiting), the destructors run to release the file handle and lock.
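
As a small sketch of that idea using the standard library (std::lock_guard and std::ofstream are both RAII types; the mutex and file name here are just placeholders):

#include <fstream>
#include <mutex>
#include <stdexcept>
#include <string>

std::mutex file_mutex;

void write_to_file(const std::string &message) {
  // the lock is acquired here and released when "guard" is destroyed
  std::lock_guard<std::mutex> guard(file_mutex);

  // the file handle is closed when "file" is destroyed
  std::ofstream file("example.txt");
  if (!file.is_open())
    throw std::runtime_error("unable to open file");

  // even if this write throws, both the lock and the file are released
  file << message << std::endl;
}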

The concept is a great way not only to clean up your code (all of the “if not null” cleanup checks become redundant) but also to give yourself a safeguard that you can be almost absent-minded about.

It’s important to note that this idiom doesn’t allow you to ignore good exception handling practice. You’re still expected to use exception handling in your code, this will just ensure that your cleanup/release code is executed as expected.

An Implementation

Implementing this idea into your own code is really quite simple. If you have a resource (handle) that you’re managing manually, wrap it in a class.

  • Ensure the constructor takes the handle in
  • Release the handle in the destructor

When working with OpenGL textures, I use a very small class to handle the resource cleanup; it just manages the generated texture id. When an instance falls out of scope, or there’s a failure during initialization, the texture is cleaned up.

class texture {
  public:
    // manage the generated texture id
    texture(const GLuint t) : _reference(t) { }

    // copying would end in a double-delete, so forbid it
    texture(const texture &) = delete;
    texture &operator=(const texture &) = delete;

    // cleanup of the allocated resource
    virtual ~texture(void);

    // provide access to the reference
    GLuint reference() const { return _reference; }

  private:
    GLuint _reference;
};

texture::~texture(void) {
  // only run if we have something to clean up
  if (this->_reference != 0) {
    // clear out the texture  
    glDeleteTextures(1, &this->_reference);
    this->_reference = 0;
  }
}

Strictly speaking, the constructor should probably do the generation of the texture itself; in my case, the texture loading happens in another object that is itself managed. Most importantly, if an exception is thrown during initialization, this class will remove anything allocated to it (if it did allocate).
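
Usage then looks something like this (make_texture and render_with are hypothetical helpers standing in for real texture loading and drawing code):

void draw_scene() {
  // the texture id is owned for the duration of this scope
  texture t(make_texture());
  render_with(t.reference());

  // when "t" goes out of scope - normally or via an exception -
  // ~texture runs and the GL texture is deleted
}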

It should be mentioned that there are lots of extra attributes we can pile into RAII style classes. There’s a really good write up (in depth) here.

Conclusion

RAII is a great idea to implement in your own classes. The more of this you can practice, the more exception-safe your code will become . . . it won’t save you from car accidents, though.