Servlets are Java programs that run on the server, responding to requests made by clients. They are most commonly set up to respond to web-based requests, although they are not limited to that scope.
A servlet is a Java programming language class used to extend the capabilities of servers that host applications accessed by means of a request-response programming model. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers. For such applications, Java Servlet technology defines HTTP-specific servlet classes.
The javax.servlet and javax.servlet.http packages provide interfaces and classes for writing servlets. All servlets must implement the Servlet interface, which defines lifecycle methods. When implementing a generic service, you can use or extend the GenericServlet class provided with the Java Servlet API. The HttpServlet class provides methods, such as doGet and doPost, for handling HTTP-specific services.
In today’s post, I’ll walk through the creation of a servlet.
Setup
First up, we’re going to use Maven to generate the project infrastructure required to support our servlet.
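Something along the lines of the following, using the stock maven-archetype-webapp archetype, will stand up a deployable web project (the group and artifact ids here are just placeholders):

mvn archetype:generate -DgroupId=org.example -DartifactId=hello \
    -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

A minimal servlet for the project might then look like the following sketch. It assumes the javax.servlet-api dependency is on the classpath in provided scope and a Servlet 3.x container (such as the Jetty image used below), so the @WebServlet annotation can stand in for a web.xml mapping:

package org.example;

import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Responds to GET requests with a plain-text greeting.
@WebServlet("/hello")
public class HelloServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from a servlet!");
    }
}

Packaging the project with mvn package produces target/hello.war.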
The hello.war file can now be deployed to our application server of choice for testing. In my example here, I’m using Jetty inside a Docker container.
docker run -ti --rm -p 8080:8080 \
  -v $(pwd)/target/hello.war:/var/lib/jetty/webapps/hello.war \
  jetty
In order to marshal data between systems in a language- and technology-agnostic fashion, you’ll need to lean on a serialization system that affords you flexibility as well as a strong contract definition. Avro is such a system. You declare a set of requirements that your data object must exhibit; run a code generator over that schema file, and Avro will generate objects for you to work with.
In today’s post, I’ll run through the creation of a schema, the generation of a code class, and basic usage.
Setup
First up, we’ll need the avro-tools jar in order to compile schemas into Java class files for us. The project that performs the serialization will also require the Avro libraries. Add the following dependency to your pom.xml to enable them.
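A dependency entry along these lines (the version shown is only illustrative; use a current release) pulls in the core Avro library:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.2</version>
</dependency>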
The schema definition is the key part of this entire process. The schema is the strong contract that two systems agree on in order to send information back and forth. Without it, it’d be chaos; no one would know what the true shape of a data packet was. The following has been taken from the Avro page about schemas:
Avro relies on schemas. When Avro data is read, the schema used when writing it is always present. This permits each datum to be written with no per-value overheads, making serialization both fast and small. This also facilitates use with dynamic, scripting languages, since data, together with its schema, is fully self-describing.
Let’s define a car in an Avro schema, which is just JSON anyway:
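A simple car record might look like this (the fields here are just illustrative); save it as car.avsc:

{
  "namespace": "example.avro",
  "type": "record",
  "name": "Car",
  "fields": [
    { "name": "make",  "type": "string" },
    { "name": "model", "type": "string" },
    { "name": "year",  "type": "int" }
  ]
}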
Now that we’ve defined a schema, we can use the avro-tools jar (which is available on Avro’s releases page). Armed with our Avro schema file named car.avsc, we can now generate our Java classes:
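With the avro-tools jar downloaded (again, the version is only illustrative), the schema compiler is invoked like so:

java -jar avro-tools-1.8.2.jar compile schema car.avsc .

This drops a generated Car class, under the schema’s namespace, into the current directory. Creating three cars with the illustrative fields above might then look like this, using the generated builder:

/* up above */
import example.avro.Car;

/* . . . */

Car car1 = Car.newBuilder()
    .setMake("Toyota").setModel("Corolla").setYear(2014)
    .build();
Car car2 = Car.newBuilder()
    .setMake("Ford").setModel("Falcon").setYear(2010)
    .build();
Car car3 = Car.newBuilder()
    .setMake("Mazda").setModel("3").setYear(2016)
    .build();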
Now that we’ve created our three cars, we can write them into a data file:
/* up above */
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import java.io.*;

/* . . . */

DataFileWriter<Car> carFileWriter = new DataFileWriter<Car>(
    new SpecificDatumWriter<Car>(Car.class)
);
carFileWriter.create(car1.getSchema(), new File("cars.avro"));
carFileWriter.append(car1);
carFileWriter.append(car2);
carFileWriter.append(car3);
carFileWriter.close();
All of the cars subscribe to the same schema, so it’s ok for all of the instances to use the schema from car1. cars.avro now holds a serialized representation of our cars. We can read them out again using the following:
/* up above */
import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

/* . . . */

DataFileReader<Car> carFileReader = new DataFileReader<Car>(
    new File("cars.avro"),
    new SpecificDatumReader<Car>(Car.class)
);
Car car = null;

while (carFileReader.hasNext()) {
    car = carFileReader.next(car);
    System.out.println(car);
}
The output of which will look something like this:
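Assuming the illustrative schema and values above, each record’s toString renders its fields as JSON:

{"make": "Toyota", "model": "Corolla", "year": 2014}
{"make": "Ford", "model": "Falcon", "year": 2010}
{"make": "Mazda", "model": "3", "year": 2016}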
In situations where code generation isn’t going to be an option, or when you’re generally late-binding, you’ll still be able to use a class like GenericRecord to work with records. You’ll need to use methods like put to set the fields of these objects, but you won’t need to bind strongly to generated code.
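As a sketch of that late-bound style, assuming the same illustrative car.avsc as above, you can parse the schema at runtime and write GenericRecord instances:

/* up above */
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import java.io.File;

/* . . . */

// parse the schema at runtime rather than compiling it into a class
Schema schema = new Schema.Parser().parse(new File("car.avsc"));

GenericRecord car = new GenericData.Record(schema);
car.put("make", "Holden");
car.put("model", "Commodore");
car.put("year", 2012);

DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
    new GenericDatumWriter<GenericRecord>(schema)
);
writer.create(schema, new File("cars-generic.avro"));
writer.append(car);
writer.close();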
Defining how a sequence of bytes sits in a memory buffer or on disk can be challenging from time to time. Since everything that you’ll work with is a byte, it makes sense to have an intuitive way to work with this information, independent of the type restrictions that the language would otherwise enforce on us.
In today’s post, I’m going to run through Python’s byte string packing and unpacking using the struct package.
Basics
From the Python documentation:
This module performs conversions between Python values and C structs represented as Python bytes objects. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values.
When working with a byte string in Python, you prefix your literals with b.
>>> b'Hello'
b'Hello'
The ord function call is used to convert a single-character byte string into its integer character code.
>>> ord(b'H')
72
>>> ord(b'e')
101
>>> ord(b'l')
108
We can use list to convert a whole string of byte literals into an array.
>>> list(b'Hello')
[72, 101, 108, 108, 111]
The complement to the ord call is chr, which converts the byte value back into a character.
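For example, converting 72 back again:

>>> chr(72)
'H'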
Packing
Using the struct module, we’re offered the pack function call. This function takes a format string and then the data itself. The first parameter defines how the data supplied in the remaining parameters should be laid out. Let’s get started:
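>>> import struct
>>> # pack the character codes of 'Hello' back into a byte string
>>> struct.pack(b'BBBBB', 72, 101, 108, 108, 111)
b'Hello'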
The format string b'BBBBB' tells pack to pack the supplied values into five unsigned bytes. If we were to use a lowercase b in our format string, pack would expect the values to be signed.
In today’s post, I’m going to run through how consistent hashing differs from standard modulus distribution and how it can help with distributed key searches.
Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.
Using a simple modulus
A simple way to balance information between a set of nodes is to take a simple modulus of a hash. The hash of an object is taken; a modulus is calculated with respect to the number of nodes. The information is then assigned to that node:
Number of Nodes = 5
"Joe" = 348561223
Node = 348561223 mod 5
= 3
So, the string "Joe" is then destined to live on node 3.
This sort of balancing gets the information distributed, but it really starts to show problems when nodes are added to or removed from the set. Under these operations, large shuffles of information between nodes are required.
How it Works
The aim here is to make a piece of information’s placement less sensitive to changes in the set of nodes. This way, we still reap the benefits of being in a distributed system, but we don’t incur such a loss at the time of adding or removing nodes. Minimising this disturbance is what consistent hashing aims to achieve.
To achieve consistent hashing, not only is the key by which the information is retrieved hashed; so are the nodes managing the information. Both of these hashes are placed onto a ring. When this system is exercised and a client is looking for information, the key that they’re looking for lands somewhere on the ring; we then move clockwise around the ring until we hit a node, and that’s where the information lives.
Adding and removing nodes from the ring is now a matter of redistributing keys with a neighbour, rather than across the whole set.
One problem is the distribution of nodes along the ring. If the nodes clump together, a large portion of the hash space sits empty and a lot of queries could land there. Adding replicas of each node to the ring saturates it sufficiently to mitigate this problem.
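A minimal sketch of such a ring in Java (the class and method names are my own, and String.hashCode stands in for a real hash function) might look like this:

import java.util.SortedMap;
import java.util.TreeMap;

// A toy consistent-hash ring: node positions and keys share one hash space.
class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<Integer, String>();
    private final int replicas;

    HashRing(int replicas) {
        this.replicas = replicas;
    }

    // Each node is placed on the ring several times to even out the distribution.
    void addNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < replicas; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's position until a node is found.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        Integer position = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(position);
    }

    private int hash(String value) {
        return value.hashCode(); // illustration only; use a stronger hash in practice
    }
}

Calling nodeFor("Joe") before and after an addNode or removeNode shows that only the keys nearest the affected node change hands.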
Implementations
There are many implementations of consistent hashing available. It’s a good exercise to implement one yourself by hand, but these problems have already been solved for you. Some of the better known uses can be found in projects like OpenStack, Amazon’s Dynamo and Apache Cassandra.
Paxos is a set of protocols designed to solve consensus problems when implementing fault-tolerant distributed systems. When many actors in a system can propose values for the system to use, a controlling process (or protocol) needs to do the job of selecting the value and propagating that value’s selection to other actors in the system.
What are we trying to solve?
As mentioned in the introduction, distributed systems have many participants, or actors, that can propose values. Consensus needs to be established amongst these actors on which of the proposed values to actually use. This seems quite trivial in a local sense, but once the problem is distributed amongst separate nodes, the system becomes far more complex. The complexity arises in a distributed setting because messages carrying these values can be lost, delayed or destroyed.
The common issues that you always see in distributed systems really boil down to reliability and consistency. Network communications and computers in general fail, making for a less-than-reliable model. When these things fail, vectors open up that allow inconsistencies to creep into our system’s view of the world.
The whole idea of the consensus algorithm is to select a value from those on offer. Paxos will only select from values that have been offered; it will select only one of those values, and the selection isn’t announced to other actors unless it’s actually been made.
Assumptions
There are a few assumptions about the system that need to be made clear; this is important as the protocols are designed around abstract concepts that are modelled on reality. The following lists have been reproduced from the Paxos Wikipedia article.
Processors
Processors operate at arbitrary speed.
Processors may experience failures.
Processors with stable storage may re-join the protocol after failures.
Processors do not collude, lie, or otherwise attempt to subvert the protocol.
Network
Processors can send messages to any other processor.
Messages are sent asynchronously and may take arbitrarily long to deliver.
Messages may be lost, reordered, or duplicated.
Messages are delivered without corruption.
Roles
With those assumptions in place, we can introduce the main roles in the Paxos system. It’s important to note at this point that physical computing entities can perform one or many of these roles.
Client
The client is the creation and termination point for the system. It’s a request from a client that sets the Paxos system in motion, and the protocol finishes its work with a response being sent back to the client. The client isn’t a part of the system itself; it’s just the protocol’s main protagonist.
Proposer
The proposer’s job is to take on requests from the client. It’s the responsibility of the proposers to get a value agreed upon.
Voter
It’s the job of the voter (also referred to as an acceptor) to accept values from a proposer. It’s not until a quorum (a majority of voters) agrees on a value that a state change happens.
Learner
The learner propagates its information back to the client once it’s notified by the quorum (through a primary acceptor) that a proposal has been accepted.
Workflow
There’s a particular workflow that the Paxos system can follow. There are simple and complex flavors of the workflow, each focusing on different parts of the consensus flow. The following steps outline the most basic flavor of the protocol:
Prepare & Promise
A proposer will send a prepare request, carrying a proposal number, to as many acceptors as it can. If an acceptor has not yet seen a higher-numbered proposal, it sends back a prepare response which promises not to accept any proposal with a lower number. Any prepare request now received by that acceptor is ignored if its proposal number is lower than the one already seen.
This is key, because if an acceptor now sees a message with a higher number, it’ll process that prepare request.
Accept
When a quorum of acceptors sends prepare responses back to the proposer (and this ends up being the majority, by definition of quorum), the proposer can now start to issue accept requests. If an acceptor is sent an accept request whose number is equal to or above the highest proposal it has seen, it will accept that request.
This information is now propagated down to all of the learners for them to see that a particular proposal has been accepted.
Accepted
The system will now choose a value once learners discover that a majority of nodes have accepted the same value.
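As a rough sketch of the acceptor’s side of this exchange (a single-decree, in-memory toy; the class and method names are my own):

// Toy acceptor for single-decree Paxos: tracks the highest promise and any accepted value.
class Acceptor {
    private long promisedNumber = -1; // highest proposal number promised so far
    private long acceptedNumber = -1; // proposal number of the value accepted, if any
    private String acceptedValue = null;

    // Phase 1: prepare(n) succeeds only if n is higher than any promise already made.
    synchronized boolean prepare(long n) {
        if (n > promisedNumber) {
            promisedNumber = n;
            return true; // the promise; acceptedNumber/acceptedValue would travel back with it
        }
        return false;
    }

    // Phase 2: accept(n, v) succeeds only if no higher-numbered promise has been made since.
    synchronized boolean accept(long n, String value) {
        if (n >= promisedNumber) {
            promisedNumber = n;
            acceptedNumber = n;
            acceptedValue = value;
            return true; // learners would be notified of (n, value) here
        }
        return false;
    }
}

In the full protocol, the promise message carries back any previously accepted (number, value) pair, and a proposer that sees one is obliged to propose that value rather than its own; that detail is what stops two competing proposers from choosing different values.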
Implementation
A simple implementation of this protocol can be found in this GitHub repository. It walks through a few of the different message flows in nicely laid out Python code.