Servlets are Java programs that run on the server, responding to requests made by clients. They are most commonly set up to respond to web-based requests, although they are not limited to that scope.
A servlet is a Java programming language class used to extend the capabilities of servers that host applications accessed by means of a request-response programming model. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers. For such applications, Java Servlet technology defines HTTP-specific servlet classes.
The javax.servlet and javax.servlet.http packages provide interfaces and classes for writing servlets. All servlets must implement the Servlet interface, which defines lifecycle methods. When implementing a generic service, you can use or extend the GenericServlet class provided with the Java Servlet API. The HttpServlet class provides methods, such as doGet and doPost, for handling HTTP-specific services.
In today’s post, I’ll walk through the creation of a servlet.
Setup
First up, we’re going to use Maven to generate the project infrastructure required to support our servlet.
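Something along the lines of the following, using the stock maven-archetype-webapp archetype, will stand up a deployable web project (the group and artifact ids here are just placeholders):

mvn archetype:generate -DgroupId=org.example -DartifactId=hello \
    -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

A minimal servlet for the project might then look like the following sketch. It assumes the javax.servlet-api dependency is on the classpath in provided scope and a Servlet 3.x container (such as the Jetty image used below), so the @WebServlet annotation can stand in for a web.xml mapping:

package org.example;

import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Responds to GET requests with a plain-text greeting.
@WebServlet("/hello")
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from a servlet!");
    }
}

Packaging the project with mvn package produces target/hello.war.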
The hello.war file can now be deployed to our application server of choice for testing. In my example here, I’m using Jetty inside a Docker container.
docker run -ti --rm -p 8080:8080 \
  -v $(pwd)/target/hello.war:/var/lib/jetty/webapps/hello.war \
  jetty
In order to marshal data between systems in a language- and technology-agnostic fashion, you’ll need to lean on a serialization system that affords you flexibility as well as a strong contract definition. Avro is such a system. You declare a set of requirements that your data object must exhibit; run a code generator over that schema file, and Avro will generate objects for you to work with.
In today’s post, I’ll run through the creation of a schema, the generation of a code class, and basic usage.
Setup
First up, we’ll need the avro-tools jar in order to compile schemas into Java class files for us. The project that performs the serialization will also require the Avro libraries. Add the following dependency to your pom.xml to enable them.
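A dependency entry along these lines (the version shown is only illustrative; use a current release) pulls in the core Avro library:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.2</version>
</dependency>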
The schema definition is the key part of this entire process. The schema is the strong contract that two systems agree on in order to send information back and forth. Without it, it’d be chaos; no one would know what the true shape of a data packet was. The following has been taken from the Avro page about schemas:
Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.
Let’s define a car in an Avro schema, which is just JSON anyway:
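A simple car record might look like this (the fields here are just illustrative); save it as car.avsc:

{
  "namespace": "example.avro",
  "type": "record",
  "name": "Car",
  "fields": [
    { "name": "make",  "type": "string" },
    { "name": "model", "type": "string" },
    { "name": "year",  "type": "int" }
  ]
}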
Now that we’ve defined a schema, we can use the avro-tools jar (which is available on Avro’s releases page). Armed with our Avro schema file named car.avsc, we can now generate our Java classes:
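With the avro-tools jar downloaded (again, the version is only illustrative), the schema compiler is invoked like so:

java -jar avro-tools-1.8.2.jar compile schema car.avsc .

This drops a generated Car class, under the schema’s namespace, into the current directory. Creating three cars with the illustrative fields above might then look like this, using the generated builder:

/* up above */
import example.avro.Car;

/* . . . */

Car car1 = Car.newBuilder()
    .setMake("Toyota").setModel("Corolla").setYear(2014)
    .build();
Car car2 = Car.newBuilder()
    .setMake("Ford").setModel("Falcon").setYear(2010)
    .build();
Car car3 = Car.newBuilder()
    .setMake("Mazda").setModel("3").setYear(2016)
    .build();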
Now that we’ve created our three cars, we can write them into a data file:
/* up above */
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import java.io.*;

/* . . . */

DataFileWriter<Car> carFileWriter = new DataFileWriter<Car>(
    new SpecificDatumWriter<Car>(Car.class)
);
carFileWriter.create(car1.getSchema(), new File("cars.avro"));
carFileWriter.append(car1);
carFileWriter.append(car2);
carFileWriter.append(car3);
carFileWriter.close();
All of the cars subscribe to the same schema, so it’s ok for all of the instances to use the schema from car1. cars.avro now holds a serialized representation of our cars. We can read them out again using the following:
/* up above */
import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

/* . . . */

DataFileReader<Car> carFileReader = new DataFileReader<Car>(
    new File("cars.avro"),
    new SpecificDatumReader<Car>(Car.class)
);
Car car = null;

while (carFileReader.hasNext()) {
    car = carFileReader.next(car);
    System.out.println(car);
}
The output of which will look something like this:
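Assuming the illustrative schema and values above, each record’s toString renders its fields as JSON:

{"make": "Toyota", "model": "Corolla", "year": 2014}
{"make": "Ford", "model": "Falcon", "year": 2010}
{"make": "Mazda", "model": "3", "year": 2016}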
In situations where code generation isn’t going to be an option, or when you’re generally late-binding, you’ll still be able to use a class like GenericRecord to work with records. You’ll need to use methods like put to set the fields of these objects, but you won’t need to bind strongly to generated code.
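As a sketch of that late-bound style, assuming the same illustrative car.avsc as above, you can parse the schema at runtime and write GenericRecord instances:

/* up above */
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import java.io.File;

/* . . . */

// parse the schema at runtime rather than compiling it into a class
Schema schema = new Schema.Parser().parse(new File("car.avsc"));

GenericRecord car = new GenericData.Record(schema);
car.put("make", "Holden");
car.put("model", "Commodore");
car.put("year", 2012);

DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
    new GenericDatumWriter<GenericRecord>(schema)
);
writer.create(schema, new File("cars-generic.avro"));
writer.append(car);
writer.close();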
Defining how a sequence of bytes sits in a memory buffer or on disk can be challenging from time to time. Since everything that you’ll work with is a byte, it makes sense to have an intuitive way to work with this information, independent of the type restrictions that the language would otherwise enforce on us.
In today’s post, I’m going to run through Python’s byte string packing and unpacking using the struct package.
Basics
From the Python documentation:
This module performs conversions between Python values and C structs represented as Python bytes objects. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values.
When working with a byte string in Python, you prefix your literals with b.
>>> b'Hello'
b'Hello'
The ord function call is used to convert a single-character byte string into its integer character code.
>>> ord(b'H')
72
>>> ord(b'e')
101
>>> ord(b'l')
108
We can use list to convert a whole string of byte literals into an array.
>>> list(b'Hello')
[72, 101, 108, 108, 111]
The complement to the ord call is chr, which converts the byte value back into a character.
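For example, converting 72 back again:

>>> chr(72)
'H'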
Packing
Using the struct module, we’re offered the pack function call. This function takes a format string and then the data itself. The first parameter defines how the data supplied in the remaining parameters should be laid out. Let’s get started:
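>>> import struct
>>> # pack the character codes of 'Hello' back into a byte string
>>> struct.pack(b'BBBBB', 72, 101, 108, 108, 111)
b'Hello'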
The format string b'BBBBB' tells pack to pack the supplied values into five unsigned bytes. If we were to use a lowercase b in our format string, pack would expect the values to be signed.
In today’s post, I’m going to run through how consistent hashing differs from standard modulus distribution and how it can help with distributed key searches.
Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.
Using a simple modulus
A simple way to balance information between a set of nodes is to take a simple modulus of a hash. The hash of an object is taken; a modulus is calculated with respect to the number of nodes. The information is then assigned to that node:
Number of Nodes = 5
"Joe" = 348561223
Node = 348561223 mod 5
= 3
So, the string "Joe" is then destined to live on node 3.
This sort of balancing gets the information distributed, but it really starts to show problems when nodes are added to or removed from the set. Under these operations, large shuffles of information between nodes are required.
How it Works
The aim here is to make a piece of information’s placement less sensitive to changes in the set of nodes. This way, we still reap the benefits of being in a distributed system, but we don’t incur such a loss at the time of adding or removing nodes. Minimising this disturbance is what consistent hashing aims to achieve.
To achieve consistent hashing, not only is the key by which the information is retrieved hashed; so are the nodes managing the information. Both of these hashes are placed onto a ring. When this system is exercised and a client is looking for information, the key that they’re looking for lands somewhere on the ring; we then move clockwise around the ring until we hit a node, and that’s where the information lives.
Adding and removing nodes from the ring is now a matter of redistributing keys with a neighbour, rather than across the whole set.
One problem is the distribution of nodes along the ring. If the nodes clump together, a large portion of the hash space sits empty and a lot of queries could land there. Adding replicas of each node to the ring saturates it sufficiently to mitigate this problem.
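A minimal sketch of such a ring in Java (the class and method names are my own, and String.hashCode stands in for a real hash function) might look like this:

import java.util.SortedMap;
import java.util.TreeMap;

// A toy consistent-hash ring: node positions and keys share one hash space.
class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<Integer, String>();
    private final int replicas;

    HashRing(int replicas) {
        this.replicas = replicas;
    }

    // Each node is placed on the ring several times to even out the distribution.
    void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's position until a node is found.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        Integer position = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(position);
    }

    private int hash(String value) {
        return value.hashCode(); // illustration only; use a stronger hash in practice
    }
}

Calling nodeFor("Joe") before and after an addNode or removeNode shows that only the keys nearest the affected node change hands.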
Implementations
There are many implementations of consistent hashing available. It’s a good exercise to implement one yourself by hand, but these problems have already been solved for you. Some of the better known uses can be found in projects like OpenStack, Amazon’s Dynamo and Apache Cassandra.
Paxos is a set of protocols designed to solve consensus problems when implementing fault-tolerant distributed systems. When many actors in a system can propose values for the system to use, a controlling process (or protocol) needs to do the job of selecting the value and propagating that value’s selection to other actors in the system.
What are we trying to solve?
As mentioned in the introduction, distributed systems have many participants, or actors, that can propose values. Consensus needs to be established amongst these actors on which of the proposed values to actually use. This seems quite trivial in a local sense, but once the problem is distributed amongst separate nodes, the system becomes far more complex. The complexity arises in a distributed setting because messages carrying these values can be lost, delayed or destroyed.
The common issues that you always see in distributed systems really boil down to reliability and consistency. Network communications and computers in general fail, making for a less-than-reliable model. When these things fail, vectors open up that allow inconsistencies to creep into our system’s view of the world.
The whole idea of the consensus algorithm is to select a value from those on offer. Paxos will only select from values that have been offered; it will select only one of those values, and the selection isn’t announced to other actors unless it’s actually been made.
Assumptions
There are a few assumptions about the system that need to be made clear; this is important as the protocols are designed around abstract concepts that are modelled on reality. The following lists have been reproduced from the Paxos Wikipedia article.
Processors
Processors operate at arbitrary speed.
Processors may experience failures.
Processors with stable storage may re-join the protocol after failures.
Processors do not collude, lie, or otherwise attempt to subvert the protocol.
Network
Processors can send messages to any other processor.
Messages are sent asynchronously and may take arbitrarily long to deliver.
Messages may be lost, reordered, or duplicated.
Messages are delivered without corruption.
Roles
With those assumptions in place, we can introduce the main roles in the Paxos system. It’s important to note at this point that physical computing entities can perform one or many of these roles.
Client
The client is the creation and termination point for the system. It’s a request from a client that sets the Paxos system in motion, and the protocol finishes its work with a response being sent back to the client. The client isn’t a part of the system itself; it’s just the protocol’s main protagonist.
Proposer
The proposer’s job is to take on requests from the client. It’s the responsibility of the proposers to get a value agreed upon.
Voter
It’s the job of the voter (also referred to as an acceptor) to accept values from a proposer. It’s not until a quorum (a majority of voters) agrees on a value that a state change happens.
Learner
The learner propagates its information back to the client once it’s notified by the quorum (through a primary acceptor) that a proposal has been accepted.
Workflow
There’s a particular workflow that the Paxos system can follow. There are simple and complex flavors of the workflow, each focusing on different parts of the consensus flow. The following steps outline the most basic flavor of the protocol:
Prepare & Promise
A proposer will send a prepare request, carrying a proposal number, to as many acceptors as it can. If an acceptor has not yet seen a higher-numbered proposal, it sends back a prepare response which promises not to accept any proposal with a lower number. Any prepare request now received by that acceptor is ignored if its proposal number is lower than the one already seen.
This is key, because if an acceptor now sees a message with a higher number, it’ll process that prepare request.
Accept
When a quorum of acceptors sends prepare responses back to the proposer (and this ends up being the majority, by definition of quorum), the proposer can now start to issue accept requests. If an acceptor is sent an accept request whose number is equal to or above the highest proposal it has seen, it will accept that request.
This information is now propagated down to all of the learners for them to see that a particular proposal has been accepted.
Accepted
The system will now choose a value once learners discover that a majority of nodes have accepted the same value.
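As a rough sketch of the acceptor’s side of this exchange (a single-decree, in-memory toy; the class and method names are my own):

// Toy acceptor for single-decree Paxos: tracks the highest promise and any accepted value.
class Acceptor {
    private long promisedNumber = -1; // highest proposal number promised so far
    private long acceptedNumber = -1; // proposal number of the value accepted, if any
    private String acceptedValue = null;

    // Phase 1: prepare(n) succeeds only if n is higher than any promise already made.
    synchronized boolean prepare(long n) {
        if (n > promisedNumber) {
            promisedNumber = n;
            return true; // the promise; acceptedNumber/acceptedValue would travel back with it
        }
        return false;
    }

    // Phase 2: accept(n, v) succeeds only if no higher-numbered promise has been made since.
    synchronized boolean accept(long n, String value) {
        if (n >= promisedNumber) {
            promisedNumber = n;
            acceptedNumber = n;
            acceptedValue = value;
            return true; // learners would be notified of (n, value) here
        }
        return false;
    }
}

In the full protocol, the promise message carries back any previously accepted (number, value) pair, and a proposer that sees one is obliged to propose that value rather than its own; that detail is what stops two competing proposers from choosing different values.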
Implementation
A simple implementation of this protocol can be found in this GitHub repository. It walks through a few of the different message flows in nicely laid out Python code.