Compojure

In a previous post, we set up a really simple route and server executing some Clojure code for us. In today’s post, we’re going to use a library called Compojure to fancy up that route definition a little.

This should make defining our web application a bit more fun, anyway.

Getting started

Again, we’ll use Leiningen to kick our project off:

lein new webapp-1

We’re going to add some dependencies to the project.clj file for compojure and http-kit. http-kit is the server that we’ll be using today.

(defproject webapp-1 "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [compojure "1.1.8"]
                 [http-kit "2.1.16"]])

And then, installation.

lein deps

Hello!

To get started, we’ll define a root route to greet us.

(ns webapp-1.core
  (:require [compojure.core :refer :all]
            [org.httpkit.server :refer [run-server]]))

(defroutes greeter-app
  (GET "/" [] "Hello!"))

(defn -main []
  (run-server greeter-app {:port 3000}))

A quick hit through curl lets us know that we’re up and running:

curl --verbose localhost:3000
* Rebuilt URL to: localhost:3000/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET / HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< Content-Length: 6
< Server: http-kit
< Date: Thu, 17 Nov 2016 02:42:48 GMT
< 
* Connection #0 to host localhost left intact
Hello!

Good fun.

Simple web server with Clojure

In today’s post, we’re going to use the Clojure HTTP server abstraction called ring to stand a web server up and attach some routes. This allows us to expose our Clojure functions over the web in a relatively simple fashion.

Getting started

This blog post is mainly centered around the getting started guide from the ring documentation pages, found here.

We’re going to get started by creating a project using lein.

lein new jetty-test

After this process finishes, you’ll end up with a directory called jetty-test that has a project structure something like this:

.
├── CHANGELOG.md
├── doc
│   └── intro.md
├── LICENSE
├── project.clj
├── README.md
├── resources
├── src
│   └── jetty_test
│       └── core.clj
└── test
    └── jetty_test
        └── core_test.clj

Dependencies

Now we need to make our newly created project depend on ring. We need to add references to ring-core and ring-jetty-adapter in the project.clj file. So it should read something like this:

(defproject jetty-test "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [ring/ring-core "1.5.0"]
                 [ring/ring-jetty-adapter "1.5.0"]])

We can now install these dependencies into the project.

lein deps

Server code

We can now start writing the route code that the server will respond to. We’ll define a function that simply returns the current date and time:

(defn now [] (java.util.Date.))

We’ll also create a route that will use this function, and send back the text each time the route is requested:

(defn current-time [request]
  {:status 200
   :headers {"Content-Type" "text/plain"}
   :body (str (now))})

That’s it for the server code. We still need to fire up Jetty and attach the handler to it. We need to import ring.adapter.jetty as it contains run-jetty for us:

(use 'ring.adapter.jetty)

(run-jetty current-time {:port 3000})

Running

We run our project using lein:

lein run 

Our output now looks something like this:

. . .
. . .
. . .

Retrieving clj-time/clj-time/0.11.0/clj-time-0.11.0.jar from clojars
Retrieving ring/ring-core/1.5.0/ring-core-1.5.0.jar from clojars
Retrieving ring/ring-servlet/1.5.0/ring-servlet-1.5.0.jar from clojars
Retrieving clojure-complete/clojure-complete/0.2.4/clojure-complete-0.2.4.jar from clojars
2016-11-15 22:34:11.551:INFO::main: Logging initialized @877ms
2016-11-15 22:34:11.620:INFO:oejs.Server:main: jetty-9.2.10.v20150310
2016-11-15 22:34:11.646:INFO:oejs.ServerConnector:main: Started ServerConnector@795f253{HTTP/1.1}{0.0.0.0:3000}
2016-11-15 22:34:11.647:INFO:oejs.Server:main: Started @973ms

. . . suggesting that our server is ready to take requests. We can use curl to test it out for us:

$ curl -v http://localhost:3000/

*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET / HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Tue, 15 Nov 2016 22:36:34 GMT
< Content-Type: text/plain; charset=ISO-8859-1
< Content-Length: 28
< Server: Jetty(9.2.10.v20150310)
< 
* Connection #0 to host localhost left intact
Tue Nov 15 22:36:34 UTC 2016

That’s it. Pretty simple.

Working with HBase

Apache HBase is a data storage technology that allows random, realtime read/write access to your big data stores. It’s modelled on Google’s Bigtable paper and is available for use with Apache Hadoop. In today’s article, I’ll walk through some very simple usage of this technology.

Installation

First up, we’ll need to get some software installed. From the downloads page, you can grab a release. Once this is downloaded, get it unpacked onto your machine. In this instance, we’ll be using HBase in standalone mode.

This is the default mode. Standalone mode is what is described in the Section 1.2, “Quick Start - Standalone HBase” section. In standalone mode, HBase does not use HDFS – it uses the local filesystem instead – and it runs all HBase daemons and a local ZooKeeper all up in the same JVM. Zookeeper binds to a well known port so clients may talk to HBase.

If you need to perform any further configuration, the conf/ folder holds the XML files required. To put your root folders into more sane places, you can change the values in conf/hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///DIRECTORY/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/DIRECTORY/zookeeper</value>
  </property>
</configuration>

Start up your server:

$ ./bin/start-hbase.sh 
starting master, logging to /opt/hbase-1.2.3/bin/../logs/hbase--master-0f0ebda04483.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-1.2.3/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Shell!

Now that HBase is running, we can shell into it and have a poke around.

$ ./bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-1.2.3/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.3, rbd63744624a26dc3350137b564fe746df7a721a4, Mon Aug 29 15:13:42 PDT 2016

hbase(main):001:0> 

First up, we’ll create a table called person with a column family of 'name':

hbase(main):002:0> create 'person', 'name'
0 row(s) in 1.5290 seconds

=> Hbase::Table - person

Now we can insert some rows into our table:

hbase(main):004:0> put 'person', 'row1', 'name:first', 'John'
0 row(s) in 0.1430 seconds

hbase(main):005:0> put 'person', 'row2', 'name:first', 'Mary'
0 row(s) in 0.0150 seconds

hbase(main):006:0> put 'person', 'row3', 'name:first', 'Bob'
0 row(s) in 0.0080 seconds

hbase(main):007:0> scan 'person'
ROW                   COLUMN+CELL                                               
 row1                 column=name:first, timestamp=1475030956731, value=John    
 row2                 column=name:first, timestamp=1475030975840, value=Mary    
 row3                 column=name:first, timestamp=1475030988587, value=Bob

Values can also be read out of our table:

hbase(main):009:0> get 'person', 'row1'
COLUMN                CELL                                                      
 name:first           timestamp=1475030956731, value=John                       
1 row(s) in 0.0250 seconds

Now, we can clean up our test:

hbase(main):011:0> disable 'person'
0 row(s) in 2.3000 seconds

hbase(main):012:0> drop 'person'
0 row(s) in 1.2670 seconds
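
The shell is great for poking around, but the same operations are also available through HBase’s Java client API. As a rough sketch, assuming the hbase-client dependency (matching your server version, 1.2.x here) is on the classpath and the person table still exists, the put and get from above look something like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PersonExample {
  public static void main(String[] args) throws IOException {
    // Reads hbase-site.xml from the classpath; the defaults are fine for a local standalone instance
    Configuration config = HBaseConfiguration.create();

    try (Connection connection = ConnectionFactory.createConnection(config);
         Table table = connection.getTable(TableName.valueOf("person"))) {

      // Equivalent of: put 'person', 'row1', 'name:first', 'John'
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("name"), Bytes.toBytes("first"), Bytes.toBytes("John"));
      table.put(put);

      // Equivalent of: get 'person', 'row1'
      Get get = new Get(Bytes.toBytes("row1"));
      Result result = table.get(get);
      System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("name"), Bytes.toBytes("first"))));
    }
  }
}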

Following up

Now that we can start to work with HBase, further posts will focus on designing schemas and processing data into the store.

Creating a Servlet

Servlets are Java applications that run on the server, responding to different requests made by clients. They are most commonly set up to respond to web-based requests, although they are not limited to this scope.

From the tutorial:

A servlet is a Java programming language class used to extend the capabilities of servers that host applications accessed by means of a request-response programming model. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers. For such applications, Java Servlet technology defines HTTP-specific servlet classes.

The javax.servlet and javax.servlet.http packages provide interfaces and classes for writing servlets. All servlets must implement the Servlet interface, which defines lifecycle methods. When implementing a generic service, you can use or extend the GenericServlet class provided with the Java Servlet API. The HttpServlet class provides methods, such as doGet and doPost, for handling HTTP-specific services.

In today’s post, I’ll walk through the creation of a servlet.

Setup

First up, we’re going to use Maven to generate the project infrastructure required to support our servlet.

mvn archetype:generate \
    -DgroupId=org.test \
    -DartifactId=hello \
    -Dversion=1.0-SNAPSHOT \
    -DarchetypeArtifactId=maven-archetype-webapp \
    -DinteractiveMode=false

This will generate our hello project for us, and will create a directory structure that looks like this:

.
├── pom.xml
└── src
    └── main
        ├── resources
        └── webapp
            ├── index.jsp
            └── WEB-INF
                └── web.xml

Get started

We can package our application and test it out pretty quickly:

mvn package

After processing, the package instruction will leave our project directory looking like this:

└── target
    ├── classes
    ├── hello
    │   ├── index.jsp
    │   ├── META-INF
    │   └── WEB-INF
    │       ├── classes
    │       └── web.xml
    ├── hello.war
    └── maven-archiver
        └── pom.properties

The hello.war file can now be deployed to our application server of choice for testing. In my example here, I’m using Jetty inside of a docker container.

docker run -ti --rm -p 8080:8080 \
           -v $(pwd)/target/hello.war:/var/lib/jetty/webapps/hello.war \
           jetty

Navigate to http://localhost:8080/hello/, and you’ll see your JSP running.
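
The archetype only gives us a JSP to start with. To round this out with an actual servlet, a minimal HttpServlet might look something like the sketch below. This assumes you add the javax.servlet-api dependency (scope provided) to pom.xml, drop the class under src/main/java/org/test, and register it in WEB-INF/web.xml with matching servlet and servlet-mapping entries; the HelloServlet name and its URL pattern are just illustrative.

package org.test;

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal servlet; map it to a URL (e.g. /greet) in WEB-INF/web.xml
public class HelloServlet extends HttpServlet {

  @Override
  protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    response.setContentType("text/plain");

    PrintWriter out = response.getWriter();
    out.println("Hello from a servlet!");
  }
}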

Serializing data with Avro

In order to marshal data between systems in a language and technology agnostic fashion, you’ll need to lean on a serialization system that affords you flexibility as well as a strong contract definition. Avro is such a system. You declare a set of requirements that your data object must exhibit; run a code generator over that schema file and Avro will generate objects for you to work with.

In today’s post, I’ll run through the creation of a schema, the generation of a class from it, and some basic usage.

Setup

First up, we’ll need the avro-tools jar in order to compile schemas into Java class files for us. On top of that, the project that will perform the serialization will require the Avro libraries. Add the following dependency to your pom.xml to enable them.

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.1</version>
</dependency>

Schema

The schema definition is the key part of this entire process. The schema is the strong contract that two systems agree on in order to send information back and forth. Without it, it’d be chaos; no one would know what the true shape of a data packet was. The following has been taken from the Avro page about schemas:

Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.

Let’s define a car as an Avro schema, which is just JSON anyway:

{
  "namespace": "autoshop.avro",
  "type": "record",
  "name": "Car",
  "fields": [
    { "name": "make", "type": "string" },
    { "name": "model", "type": "string" },
    { "name": "year", "type": "int" }
  ]
}

Now that we’ve defined a schema, we can use the avro-tools jar (which is available on Avro’s releases page). Armed with our Avro schema file named car.avsc, we can now generate our Java classes:

java -jar tools/avro-tools-1.8.1.jar compile schema car.avsc .

Take a look at Car.java now. Avro has filled this class out for you, ready to use. The generation process also honors the namespace asked for:

├── autoshop
│   └── avro
│       └── Car.java

Instance construction

Now that we’ve generated a class, we can start constructing instances of it. Out of the box, we’re given three different construction flavors:

Default

Car car1 = new Car();
car1.setMake("Ferrari");
car1.setModel("F40");
car1.setYear(1992);

Parameterized

Car car2 = new Car("Porsche", "911", 1965);

Builder

Car car3 = Car.newBuilder()
              .setMake("McLaren")
              .setModel("650s")
              .setYear(2014)
              .build();

Serialization and Deserialization

Now that we’ve created our three cars, we can write them into a data file:

/* up above */
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import java.io.*;

/* . . . */

DataFileWriter<Car> carFileWriter = new DataFileWriter<Car>(
  new SpecificDatumWriter<Car>(Car.class)
);

carFileWriter.create(car1.getSchema(), new File("cars.avro"));
carFileWriter.append(car1);
carFileWriter.append(car2);
carFileWriter.append(car3);
carFileWriter.close();

All of the cars subscribe to the same schema, so it’s ok for all of the instances to use the schema from car1. cars.avro now holds a serialized representation of our cars. We can read them out again using the following:

/* up above */
import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

/* . . . */

DataFileReader<Car> carFileReader = new DataFileReader<Car>(
  new File("cars.avro"), new SpecificDatumReader<Car>(Car.class)
);
Car car = null;

while (carFileReader.hasNext()) {
  car = carFileReader.next(car);
  System.out.println(car);
}

The output of which will look something like this:

{"make": "Ferarri", "model": "F40", "year": 1992}
{"make": "Porsche", "model": "911", "year": 1965}
{"make": "McLaren", "model": "650s", "year": 2014}

Further Reading

In situations where code generation isn’t going to be an option, or when you are generally late-binding, you’ll still be able to use a class like GenericRecord to satisfy records. You’ll need to use methods like put to set the internals of the objects, but you won’t need to strongly bind to generated code.
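
As a rough sketch of that approach, and assuming the same car.avsc file from above sits next to the program, a GenericRecord version of our car looks something like this; fields are set with put against a schema parsed at runtime rather than through generated setters:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class GenericCarExample {
  public static void main(String[] args) throws IOException {
    // Parse the schema at runtime instead of generating a Car class from it
    Schema schema = new Schema.Parser().parse(new File("car.avsc"));

    // Build a record generically; put sets fields by name
    GenericRecord car = new GenericData.Record(schema);
    car.put("make", "Lotus");
    car.put("model", "Elise");
    car.put("year", 1996);

    // Serialization works the same way, just with the generic writer instead of the specific one
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
      writer.create(schema, new File("cars-generic.avro"));
      writer.append(car);
    }
  }
}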