Paxos consensus algorithm

22 Mar 2016

Paxos is a set of protocols designed to solve consensus problems when implementing fault tolerant distributed systems. When many actors in a system can propose values for a system to use, a controlling process (or protocol) needs to do the job of selecting the value and propagating that value’s selection to other actors in the system.

What are we trying to solve?

As in the introduction, distributed systems have many participants or actors that can propose values. A controlling node or consensus needs to be established on which of these values to actually use. This seems quite trivial in a local sense, but once this problem is distributed amongst separate nodes; the system becomes exponentially more complex. The complexity arises in a distributed setting as messages carrying these values can be lost, delayed or destroyed.

The common issues that you always see in distributed systems really boil down to reliability and consistency. Network communications and computers in general fail making for a less-than-reliable model. When these things fail, vectors open up allowing inconsistencies to creep into our system’s view of the world.

The whole idea of the consensus algorithm is to select a value when they’re on offer. Paxos will only select from values that have been offered; it will only select on of those values and the selection isn’t announced to other actors unless it’s actually made.

Assumptions

There are a few assumptions of all systems that need to be made clear; this is important as the protocols are designed around abstract concepts that are modeled to reality. The following lists have re-produced from the Paxos Wikipedia Article

Processors
- Processors operate at arbitrary speed.
- Processors may experience failures.
- Processors with stable storage may re-join the protocol after failures
- Processors do not collude, lie, or otherwise attempt to subvert the protocol.
Network
- Processors can send messages to any other processor.
- Messages are sent asynchronously and may take arbitrarily long to deliver.
- Messages may be lost, reordered, or duplicated.
- Messages are delivered without corruption.

Roles

With those assumptions in place, we can introduce the main roles in the Paxos system. It’s important to note at this point that physical computing entities can perform one or many of these roles.

Client

The client is the creation and termination point for the system. It’s a request from a client that start the Paxos system in motion. The protocol finished its work with a response being sent back to the client. The client isn’t a part of the system itself, it’s just be protocol’s main protagonist.

Proposer

The proposers job takes on requests from the client. It’s the responsibility of the proposers to get a valued agreed upon.

Voter

It’s the job of the voter to accept values from a proposer. It’s not until the quorum (or majority of voters) agree on a value that a state change happens.

Learner

The learner propagates its information back to the client once it’s notified by the quorum (through a primary acceptor) that a proposal has been accepted.

Workflow

There’s a particular workflow that the Paxos system can follow. There are simple and complex flavors of the workflow which all aim at focusing different parts of the consensus flow. The following steps outline the most basic flavor of the protocol:

Prepare & Promise

A proposer will send a prepare request to as many or every acceptor that it can. If the acceptor has not yet seen another proposal, a prepare response is sent back which promises to not accept another proposal. Any prepare request now received by an acceptor is ignored if the proposal number is lower than the previous request seen.

This is key, because if an acceptor now sees a message with a higher number, it’ll process that prepare request.

Accept

When the quorum send prepare resonses back to the proposer (and this ends up as the majority: by definition of quorum), the proposer can now start to issue accept requests. If an acceptor is sent an accept request that has a number equal or above to the highest proposal that it has seen, it will accept that request.

This information is now propagated down to all of the learners for them to see that a particular proposal has been accepted.

Accepted

The system will now choose a value once learners discover that a majority of nodes have accepted the same value.

Implementation

A simple implementation of this protocol can be found in this GitHub repository. It walks through a few of the different message flows in nicely laid out Python code.

PostgreSQL programming environment

05 Mar 2016

Some databases rely heavily on their programming environment to deliver an imperative programming environment to where a developer can use high-level programming concepts such as control selection, iteration and flow.

Oracle supports PL/SQL, Sql Server supports T-SQL. PostgreSQL supports PL/pgSQL.

Prior to running through some top-level topics on the programming environment, some very important links:

Anonymous Code Blocks

Executing an anonymous code block is achieved with the DO keyword. It emulates setting up any other function, with no parameters and no return value. Because you have a function body, you are afforded the ability to use a DECLARE section to house variables.

DO $$
BEGIN

END $$;

Adding variables to this block:

DO $$
DECLARE age INT;
BEGIN

  age := 24;

  IF age < 18 THEN
    -- handle the below 18 case
  END IF;
END$$;

Because of the parameter and return value constraints on anonymous code blocks, getting information out can be a little tricky. There are some ways that we can get some information out though.

RAISE NOTICE allows us to present information to the console. This is considerably handy when we’re testing a block of code.

DO $$
DECLARE age INT;
BEGIN

  age := 17;

  IF age < 18 THEN
    RAISE NOTICE 'Underage!';
  END IF;
END$$;

Of course, if your code is a re-usable block you should be creating a full function for it. This takes care of any input/output parameter issues as you take full control.

Looking elsewhere

If this isn’t to your liking, you can try any of the other operating environments/languages available:

Monitoring PostgreSQL Databases

28 Feb 2016

Understanding what your database is doing and when is essential to runtime administration, maintenance, monitoring and reporting. Gaining insight into how your system responds to different workloads can also tell you how your current deployment is or isn’t serving your purpose.

There are many great articles on this particular topic already. In today’s post, I’m going to walk through a couple of the simple things you can do to check your system’s runtime status.

Unix Tools

When your database is being hosted in a unix-like environment, you’re given the greatest tools at your disposal to understand what’s happening.

ps can show you running processes on your system. Paired with grep, you can focus ps to look at postgres processes only:

ps -ef | grep postgres

postgres     1     0  0 11:05 ?        00:00:00 postgres
postgres    17     1  0 11:05 ?        00:00:00 postgres: checkpointer process  
postgres    18     1  0 11:05 ?        00:00:00 postgres: writer process  
postgres    19     1  0 11:05 ?        00:00:00 postgres: wal writer process  
postgres    20     1  0 11:05 ?        00:00:00 postgres: autovacuum launcher process  
postgres    21     1  0 11:05 ?        00:00:00 postgres: stats collector process

iostat and vmstat will also give you some operating system level insight to how your database application is performing.

Statistics Collector

An important, integrated piece of the Postgres architecture is the statistics collector. Using this, you can query to a very low level many pieces of information surrounding your system’s performance.

The following except is just a small sample of all of the views offered by the statistics collector; which are made available to the developer.

View Name	Description
`pg_stat_activity`	One row per server process, showing information related to the current activity of that process, such as state and current query. See `pg_stat_activity` for details.
`pg_stat_bgwriter`	One row only, showing statistics about the background writer process’s activity. See `pg_stat_bgwriter` for details.
`pg_stat_database`	One row per database, showing database-wide statistics. See `pg_stat_database` for details.
`pg_stat_all_tables`	One row for each table in the current database, showing statistics about accesses to that specific table. See `pg_stat_all_tables` for details.
`pg_stat_sys_tables`	Same as `pg_stat_all_tables`, except that only system tables are shown.
`pg_stat_user_tables`	Same as `pg_stat_all_tables`, except that only user tables are shown.

The full list can be found here.

Scala type construction

24 Feb 2016

Scala gives the developer flexibility when reasoning about and designing type systems for applications. By using classes and traits, a developer can quickly build a complex hierarchy that can assist in describing constraint and relationship information.

In today’s post, I’m going to walk through a useless but demonstrative example of a type hierarchy and some of the constraint features available to the developer.

Vehicles

We’re going to model some different vehicles. Cars, planes, trucks, skateboards, whatever.

abstract class Vehicle

We could start case-classing this base out or directly adding derivatives that specialise down to the exact vehicle types that we want, but we’re going to reason about some attributes that these vehicles might have. Wheels and Jets.

trait HasWheels extends Vehicle {
  def numberOfWheels: Int
}

trait HasJets extends Vehicle {
  def numberOfJets: Int
}

When a vehicle HasWheels, the type is going to require us to specify numberOfWheels. Likewise numberOfJets for HasJets. These traits are extending our abstract Vehicle class.

When we have wheels, we should be able to set how fast they’re spinning.

trait WheelPropulsion {
  this: HasWheels =>

  def setWheelSpin(velocity: Double) {
    println("Wheel spin: " + velocity)
  }
}

Our WheelPropulsion trait says that this needs to be HasWheels. Makes sense. We can’t spin wheels if we don’t have wheels.

Likewise, we’d want to set the turbine intensity if we have jets.

trait JetPropoulsion {
  this: HasJets =>

  def setTurbine(strength: Double) {
    println("Setting turbine strength to " + strength)
  }
}

Even with this very basic level of type description we can start to make some basic vehicles.

class Motorcycle extends Vehicle with HasWheels
                                 with WheelPropulsion {
  def numberOfWheels = 2                                  
}

class MotorVehicle extends Vehicle with HasWheels
                                   with WheelPropulsion {
  def numberOfWheels = 4
}

class TwinJetPlane extends Vehicle with HasJets
                                   with JetPropoulsion {
  def numberOfJets = 2                                  
}

Mixing in

When you’re assembling a variable of your own, there’s no reason you can’t mix in when creating your own types:

val toyota = new Vehicle() with HasWheels 
                           with WheelPropulsion { 
  def numberOfWheels = 4
}

Of course, we could have just constructed toyota as a MotorVehicle for the same effect. This just demonstrates the instance construction flexibility.

Constraints

Finally, when you’re writing functions that work with your types you can specify rich constraint rules so that you can target functionality with as much precision as you require:

// everything can be painted
def paint(v: Vehicle) = { }

// only a vehicle with wheels can burnout
def doBurnout(v: Vehicle with HasWheels) = { }

As you can see, you not only use the with keyword to define your types; this keyword is also used for variable construction and function signature definition.

Covariance and contravariance in your types

23 Feb 2016

When creating parameterised types, you have control on how those types can be passed. These nuances are referred to as variance and scala allows you to explicitly nominate how this works in your own classes.

An excellent explanation on these terms can be found here. I’ve reproduced the three main points for this article though:

That is, if A and B are types, f is a type transformation, and ≤ the subtype relation (i.e. A ≤ B means that A is a subtype of B), we have:

f is covariant if A ≤ B implies that f(A) ≤ f(B)

f is contravariant if A ≤ B implies that f(B) ≤ f(A)

f is invariant if neither of the above holds

Invariant

Invariant parameter types are what ensures that you can only pass MyContainer[Int] to def fn(x: MyContainer[Int]). The guarantee is that the type that you’re containing (when it’s being accessed) is done so as the correct type.

class MyInvariant[T](var value: T)

This guarantees the type of T when we go to work on it.

def double(a: MyInvariant[Int]) = { 
  a.value *= 2
}

You can see here that a good case for invariant is for mutable data.

To show the error case here, we define a show function specialising to MyInvariant[Any]

def show(a: MyInvariant[Any]) = { 
  println("Here is: " + a.value) 
}

Trying to use this function:

scala> show(new MyInvariant[Int](5))
<console>:13: error: type mismatch;
 found   : MyInvariant[Int]
 required: MyInvariant[Any]
Note: Int <: Any, but class MyInvariant is invariant in type T.
You may wish to define T as +T instead. (SLS 4.5)
       show(new MyInvariant[Int](5))
            ^

Covariant

Covariant parameter type is specific. You pass these sorts of types to functions that generalise their inner type access. You need to decorate the type parameter with a +.

class CovariantContainer[+T](var value: T)

Then your function to generalise over this type:

def show(a: CovariantContainer[Any]) = { 
  println("The value is " + a.value)
}

Covariance is a good case for read-only scenarios.

Contravariant

Contravariance is defined by decorating the type parameter with a -. It’s useful in write-only situations.

class ContravariantContainer[-T](var value: T)

We write specialised functions for the type, but that are write-only cases:

def write(a: ContravariantContainer[String]) = {
  println("Writing " + a)
}

Rules

When designing types, the following rules are very important when dealing with parameterization of types.

Mutable containers should be invariant
Immutable containers should be covariant
Transformation inputs should be contravariant
Transformation outputs should be covariant

Modeling a function call

Armed with this information, we can generalise function execution into the following type:

trait Fn[-In, +Out] {
  def apply(i: In): Out
}

Defining this trait, allows us to generalise the computation of an input to an output like the following:

val anyToInt = new Fn[Any, Int] {
  def apply(i: Any) = i.toString.toInt
}

Older Newer

Cogs and Levers A blog full of technical stuff