Cogs and Levers: a blog full of technical stuff

MYO Language with ANTLR

Introduction

ANTLR is a code generation tool for building language parsers. Using a grammar file, you can get ANTLR to generate the code needed to read, interpret, and execute your very own language.

In today’s article I’ll walk through the basic setup to create a Calculator language that can execute simple equations, hosted in a Go project of our own.

Before you begin

You’ll need a JRE.

Before we start, there are some software prerequisites. You will need ANTLR itself, which is a simple JAR file that we can invoke locally.

$ wget http://www.antlr.org/download/antlr-4.7-complete.jar
$ alias antlr='java -jar $PWD/antlr-4.7-complete.jar'

Code generation

Now that we’ve got ANTLR installed, it’s time to generate some code. We do this using a grammar file. A very comprehensive calculator can be found in the examples of the ANTLR grammars repository.

For today’s example, we’ll just focus on addition, subtraction, multiplication, and division with the following grammar file:

// Calc.g4
grammar Calc;

// Tokens
MUL: '*';
DIV: '/';
ADD: '+';
SUB: '-';
NUMBER: [0-9]+;
WHITESPACE: [ \r\n\t]+ -> skip;

// Rules
start : expression EOF;

expression
   : expression op=('*'|'/') expression # MulDiv
   | expression op=('+'|'-') expression # AddSub
   | NUMBER                             # Number
   ;

Even without fully understanding the grammar language, you can see that there are some basic token definitions, rules, and expression definitions.

MUL, DIV, ADD, SUB, NUMBER, and WHITESPACE are all tokens that are significant to the language that we’re defining.

The expression definition not only defines the operations for us, but is also key in establishing operator precedence: alternatives listed earlier bind more tightly, so the MulDiv rule takes precedence over the AddSub rule, which in turn falls back to Number. An input like 1 + 2 * 3 therefore parses as 1 + (2 * 3).

We can turn this grammar file into some Go code with the following invocation:

$ antlr -Dlanguage=Go -o parser Calc.g4

This creates a parser folder for us, containing a few different pieces of Go code.

Parsers, Lexers, and Listeners

If you look in the parser folder at the code that was created, you should see something similar to this:

└── parser
    ├── calc_base_listener.go
    ├── calc_lexer.go
    ├── CalcLexer.tokens
    ├── calc_listener.go
    ├── calc_parser.go
    └── Calc.tokens

The Lexer’s job is to perform lexical analysis on arbitrary pieces of text, tokenizing that text into a stream of symbols. For example, the input 1 + 2 might get tokenized to NUMBER 1, ADD, NUMBER 2. These tokens are then fed into the parser.

The Parser’s job is to take these tokens and make sure they conform to the rules of the language. You can imagine that a LISP-style language would expect ADD, NUMBER 1, NUMBER 2, whereas a C-style language expects the operator in between the number tokens.

After the string has passed through the lexer and the parser, the resulting tree runs through the listener, where we can write some code to respond to these rules in order.

Implementation

The internal implementation of this calculator is stack-based. The calculator gets represented as a struct:

type calculatorListener struct {
	*parser.BaseCalcListener
	stack []int
}

The internal state of the calculator is the int values on that stack. As operations execute, the program takes the top of the stack (TOS) as well as the second-to-top (STOS), performs the arithmetic, and leaves the result on the top of the stack.

func (l *calculatorListener) push(i int) {
	l.stack = append(l.stack, i)
}

func (l *calculatorListener) pop() int {
	if len(l.stack) < 1 {
		panic("TOS invalid")
	}

	result := l.stack[len(l.stack)-1]
	l.stack = l.stack[:len(l.stack)-1]

	return result
}

The BaseCalcListener type that was generated for us has all of the hooks we need to latch onto to complete the implementation. The Number, AddSub, and MulDiv rules all get their own exit hook for us to respond to.

func (l *calculatorListener) ExitMulDiv(c *parser.MulDivContext) {
  // get TOS and STOS
	rhs, lhs := l.pop(), l.pop()

  // perform the required operation, pushing the result back
  // up as the new TOS
	switch c.GetOp().GetTokenType() {
	case parser.CalcParserMUL:
		l.push(lhs * rhs)
	case parser.CalcParserDIV:
		l.push(lhs / rhs)
	default:
		panic(fmt.Sprintf("not yet implemented: %s", c.GetOp().GetText()))
	}
}

func (l *calculatorListener) ExitAddSub(c *parser.AddSubContext) {
  // get TOS and STOS
	rhs, lhs := l.pop(), l.pop()

  // perform the required operation, pushing the result back
  // up as the new TOS
	switch c.GetOp().GetTokenType() {
	case parser.CalcParserADD:
		l.push(lhs + rhs)
	case parser.CalcParserSUB:
		l.push(lhs - rhs)
	default:
		panic(fmt.Sprintf("not yet implemented: %s", c.GetOp().GetText()))
	}
}

func (l *calculatorListener) ExitNumber(c *parser.NumberContext) {
  // coerce the string into an integer
	i, err := strconv.Atoi(c.GetText())
	if err != nil {
		panic(err.Error())
	}

  // push onto the stack
	l.push(i)
}

Execution

Now we go from text input to execution. In the following snippet, the input stream feeds the text into the lexer. The lexer then gets set up as a token stream, ready to tokenize our input.

Finally, all of those tokens get parsed to make sure they represent valid expressions for our language.

equation := "1 + 5 - 2 * 20"
is := antlr.NewInputStream(equation)
lexer := parser.NewCalcLexer(is)
stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

p := parser.NewCalcParser(stream)

We can now walk the parse tree with a listener attached. The listener will fire off the hooks that we defined earlier, and our stack-based calculator should leave us with the result at the TOS.

var listener calculatorListener
antlr.ParseTreeWalkerDefault.Walk(&listener, p.Start())
answer := listener.pop()

fmt.Printf("%s = %d", equation, answer)

We should be left with something like this on screen:

1 + 5 - 2 * 20 = -34
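
For reference, here is the whole thing assembled into a single main.go, reproducing the snippets above. The two import paths are assumptions: the ANTLR Go runtime path varies with the ANTLR version, and example.com/calc stands in for your own module name, so substitute accordingly.

// main.go -- a sketch assembling the snippets from this article.
// The runtime import path varies with the ANTLR version, and
// "example.com/calc/parser" stands in for the generated parser
// package within your own module.
package main

import (
	"fmt"
	"strconv"

	"github.com/antlr/antlr4/runtime/Go/antlr"

	"example.com/calc/parser"
)

type calculatorListener struct {
	*parser.BaseCalcListener
	stack []int
}

func (l *calculatorListener) push(i int) {
	l.stack = append(l.stack, i)
}

func (l *calculatorListener) pop() int {
	if len(l.stack) < 1 {
		panic("TOS invalid")
	}
	result := l.stack[len(l.stack)-1]
	l.stack = l.stack[:len(l.stack)-1]
	return result
}

func (l *calculatorListener) ExitMulDiv(c *parser.MulDivContext) {
	rhs, lhs := l.pop(), l.pop()
	switch c.GetOp().GetTokenType() {
	case parser.CalcParserMUL:
		l.push(lhs * rhs)
	case parser.CalcParserDIV:
		l.push(lhs / rhs)
	default:
		panic(fmt.Sprintf("not yet implemented: %s", c.GetOp().GetText()))
	}
}

func (l *calculatorListener) ExitAddSub(c *parser.AddSubContext) {
	rhs, lhs := l.pop(), l.pop()
	switch c.GetOp().GetTokenType() {
	case parser.CalcParserADD:
		l.push(lhs + rhs)
	case parser.CalcParserSUB:
		l.push(lhs - rhs)
	default:
		panic(fmt.Sprintf("not yet implemented: %s", c.GetOp().GetText()))
	}
}

func (l *calculatorListener) ExitNumber(c *parser.NumberContext) {
	i, err := strconv.Atoi(c.GetText())
	if err != nil {
		panic(err.Error())
	}
	l.push(i)
}

func main() {
	equation := "1 + 5 - 2 * 20"

	// lex and parse the input
	is := antlr.NewInputStream(equation)
	lexer := parser.NewCalcLexer(is)
	stream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
	p := parser.NewCalcParser(stream)

	// walk the tree; the result is left as the TOS
	var listener calculatorListener
	antlr.ParseTreeWalkerDefault.Walk(&listener, p.Start())
	fmt.Printf("%s = %d\n", equation, listener.pop())
}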

Conclusion

As you can see, ANTLR is a very powerful tool that writes a lot of the pieces of a compiler (or, in this case, an interpreter) for you, getting you kick-started very quickly.

You’d almost be insane to ever do this stuff yourself!

Shell Tricks

Sometimes you can be just as productive using your shell as you are in any programming environment; you just need to know a couple of tricks. In this article, I’ll walk through some basic tips that I’ve come across.

Reading input

You can make your scripts immediately interactive by using the read builtin.

#!/bin/bash

echo -n "What is your name? "
read NAME

echo "Hi there ${NAME}!"

String length

You can get the length of any string that you’ve stored in a variable by prefixing the variable name with # inside the expansion: ${#NAME}.

#!/bin/bash

echo -n "What is your name? "
read NAME

echo "Your name has ${#NAME} characters in it"

Quick arithmetic

You can perform some basic arithmetic within your scripts as well. The value emitted by the ${#NAME} expansion is an integer that we can perform tests against.

#!/bin/bash

echo -n "What is your name? "
read NAME

if (( ${#NAME} > 10 )) 
then
  echo "You have a very long name, ${NAME}"
fi
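
The same (( )) construct, and its $(( )) expansion form, handle general integer arithmetic too, not just comparisons. A quick sketch:

#!/bin/bash

A=7
B=3

# $(( )) expands to the result of the expression
echo "sum:       $(( A + B ))"
echo "product:   $(( A * B ))"
echo "quotient:  $(( A / B ))"   # integer division
echo "remainder: $(( A % B ))"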

Substrings

Parameter expansion also allows you to take a substring directly. The format takes the form of ${VAR:offset:length}.

Passing positive integers for offset and length makes the substring operate from the leftmost side of the string. A negative offset indexes from the right instead; note the space before the negative number in the examples below, which stops the shell from confusing it with the ${VAR:-default} expansion.

STR="Scripting for the win"

echo ${STR:10:3}
# for

echo ${STR: -3}
# win

echo ${STR: -7: 3}
# the

Replacement

It’s commonplace to reach for substitutions when reworking strings, and they’re available to you at the shell as well (using glob-style patterns rather than full regular expressions).

#!/bin/bash

STR="Scripting for the win"

echo ${STR/win/WIN}
# Scripting for the WIN
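
A single / only replaces the first match. Doubling it up replaces every occurrence:

STR="win some, win more"

echo ${STR/win/WIN}
# WIN some, win more

echo ${STR//win/WIN}
# WIN some, WIN more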

Finishing up

There’s lots more that you can do just from the shell, without needing to reach for other tools. These are only a few tips and tricks.

Mutual TLS (mTLS)

Introduction

TLS has long played a very large part in securing internet communications. Secure Sockets Layer (SSL) filled this space prior to TLS coming to the fore.

In today’s article, I’m going to walk through an exercise in mutual TLS (mTLS), an extension of TLS in which the client also presents a certificate and is authenticated by the server.

CA

First of all, we need a certificate authority (CA) that both the client and the server will trust. We generate these using openssl.

openssl req -new -x509 -nodes -days 365 -subj '/CN=my-ca' -keyout ca.key -out ca.crt

This now puts a private key in ca.key and a certificate in ca.crt on our filesystem. We can inspect these a little further with the following.

openssl x509 -in ca.crt -text -noout

Looking at the output, we see some interesting things about our CA certificate. Most importantly, the X509v3 Basic Constraints value is set to CA:TRUE, telling us that this certificate can be used to sign other certificates.

Server

The server now needs a key and certificate. Key generation is simple, as usual:

openssl genrsa -out server.key 2048

We need to create a certificate that has been signed by our CA. This means we need to generate a certificate signing request, which is then used to produce the signed certificate.

openssl req -new -key server.key -subj '/CN=localhost' -out server.csr

This gives us a signing request for the domain of localhost as mentioned in the -subj parameter. This signing request now gets used by the CA to generate the certificate.

openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out server.crt

Inspecting the server certificate, you can see that it’s quite a bit simpler than the CA certificate. We’re only able to use this certificate for the subject that we nominated: localhost.
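
To take a look for yourself, the same x509 inspection works on server.crt, and openssl verify will confirm that it chains back to our CA:

# inspect the server certificate
openssl x509 -in server.crt -text -noout

# check that it was signed by our CA
openssl verify -CAfile ca.crt server.crt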

Client

The generation of the client certificates is very much the same as the server.

# create a key
openssl genrsa -out client.key 2048

# generate a signing certificate
openssl req -new -key client.key -subj '/CN=my-client' -out client.csr

# create a certificate signed by the CA
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out client.crt

The subject in this case is my-client.

The -CAcreateserial flag also ensures that we have unique serial numbers between the server and client certificates, by maintaining a serial file alongside the CA. Again, this can be verified when you inspect the certificates.

# Server serial number
        Serial Number:
            5c:2c:47:44:2c:13:3b:c9:56:56:99:37:3f:c9:1e:62:c4:c7:df:20

# Client serial number
        Serial Number:
            5c:2c:47:44:2c:13:3b:c9:56:56:99:37:3f:c9:1e:62:c4:c7:df:21

Only the last byte was incremented here, but you get the idea. Unique.

Application

Now we set up a basic Node.js server that requires mTLS.

const https = require('https');
const fs = require('fs');

const hostname = 'localhost';
const port = 3000;

const options = { 
    ca: fs.readFileSync('ca.crt'), 
    cert: fs.readFileSync('server.crt'), 
    key: fs.readFileSync('server.key'), 
    rejectUnauthorized: true,
    requestCert: true, 
}; 

const server = https.createServer(options, (req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end('Hello World');
});

server.listen(port, hostname, () => {
  console.log(`Server running at https://${hostname}:${port}/`);
});

Most important here is that the server’s options specify rejectUnauthorized as well as requestCert. Together these make the server ask the client for a certificate, and reject any connection that doesn’t present one signed by our trusted CA.
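
If you want to see exactly who connected, the verified client certificate is exposed on the TLS socket inside the request handler. A small sketch of a variation on the handler above (with the certificates generated earlier, the subject CN should come back as my-client):

const server = https.createServer(options, (req, res) => {
  // the verified client certificate is available on the TLS socket
  const cert = req.socket.getPeerCertificate();

  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(`Hello ${cert.subject.CN}`);
});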

A curl request now verifies that the solution is secured by this system of certificates.

curl --cacert ca.crt --key client.key --cert client.crt https://localhost:3000

The client’s key, certificate, and the CA certificate accompany a successful request. A request missing any of these simply fails, as the authentication requirements have not been met.
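
For example, dropping the client key and certificate should see the handshake rejected before any HTTP response is returned:

# no client certificate supplied; the TLS handshake fails
curl --cacert ca.crt https://localhost:3000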

Find the listening process for a port

Introduction

In networking, a port is a logical endpoint that a socket is established on. These sockets are owned by processes in your operating system. From time to time, it can be unclear which process owns which socket (or who is hogging which port).

In today’s article, I’ll take you through a few techniques on finding out who is hanging onto particular ports.

netstat

netstat is a general purpose network utility that will tell you about activity within your network interfaces.

netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships

If you cannot find netstat installed on your system, you can normally get it from the net-tools package.

The following command will give you a breakdown of processes listening on port 8080, as an example:

➜  netstat -ltnp | grep -w ':8080'
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp6       0      0 :::8080                 :::*                    LISTEN      -                   

An important message appears here. “Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.”. There will be processes invisible to you unless you run this command as root.

Breaking down the netstat invocation:

  • -l will only show listening sockets
  • -t will only show TCP connections
  • -n will show numerical addresses
  • -p will show the PID and program name

You can see above that no process is shown. Re-running this command as root:

➜  sudo netstat -ltnp | grep -w ':8080'
tcp6       0      0 :::8080                 :::*                    LISTEN      2765/docker-proxy

lsof

lsof will give you a list of open files on the system. Remember, sockets are just files. By using -i we can filter the list down to those that match on an internet address.

➜  sudo lsof -i :8080
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 2765 root    4u  IPv6  36404      0t0  TCP *:http-alt (LISTEN)

fuser

fuser is a program that has overlapping responsibilities with the likes of lsof.

fuser — list process IDs of all processes that have one or more files open

You can filter the list down directly with the command:

➜  sudo fuser 8080/tcp
8080/tcp:             2765

This gives us a PID to work with. Again, note this is run as root. Now all we need to do is transform this PID into a process name. We can use ps to finish the job.

➜  ps -p 2765 -o comm=
docker-proxy
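
Since fuser writes the PIDs to stdout and the 8080/tcp: label to stderr, the two steps combine into a one-liner; a small sketch:

# name the process listening on tcp/8080 in one step
sudo fuser 8080/tcp 2>/dev/null | xargs ps -o comm= -p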

PostgreSQL Data Access with Haskell

Introduction

PostgreSQL is a very popular relational database, and quite a few different data access libraries are available for it in the Haskell programming language.

Today’s article aims to get you up and running, executing queries against PostgreSQL from your Haskell environment with the least amount of hassle.

postgresql-simple

The first library that we’ll go through is postgresql-simple. This library has a very basic interface, and is really simple to get up and running with.

A mid-level client library for the PostgreSQL database, aimed at ease of use and high performance.

Prerequisites

Before you get started though, you’ll need libpq installed. On Arch Linux, for example:

pacman -S postgresql-libs

Now you’re ready to develop.

You’ll need to add a dependency on the postgresql-simple library to your application. The following code will then allow you to connect to your PostgreSQL database, and run a simple command.

Hello, Postgres!

{-# LANGUAGE OverloadedStrings #-}
module Main where

import Database.PostgreSQL.Simple

localPG :: ConnectInfo
localPG = defaultConnectInfo
        { connectHost = "172.17.0.1"
        , connectDatabase = "clients"
        , connectUser = "app_user"
        , connectPassword = "app_password"
        }

main :: IO ()
main = do
  conn <- connect localPG
  mapM_ print =<< (query_ conn "SELECT 1 + 1" :: IO [Only Int])

When your application successfully builds and executes, you should be met with the following output:

Only {fromOnly = 2}

Walking through this code quickly, we first enable OverloadedStrings so that we can specify our Query values as literal strings.

localPG :: ConnectInfo
localPG = defaultConnectInfo
        { connectHost = "172.17.0.1"
        , connectDatabase = "clients"
        , connectUser = "app_user"
        , connectPassword = "app_password"
        }

In order to connect to Postgres, we use a ConnectInfo value, which is filled out for us via defaultConnectInfo. We just override the values we need for our examples. I’m running PostgreSQL in a Docker container, which is why the host is my Docker network address.

conn <- connect localPG

The localPG value is now used to connect to the Postgres database. After a successful connection, the conn value is what we refer to when sending instructions to the database.

mapM_ print =<< (query_ conn "SELECT 1 + 1" :: IO [Only Int])

Finally, we run our query SELECT 1 + 1 using the query_ function, passing conn to nominate the connection to execute the query on.

With this basic code, we can start to build on some examples.

Retrieve a specific record

In the Hello, World example above, we added two static values to return another value. As examples get more complex, we need to give the library more information about the data that we’re working with. Int is already well known to the library, which has mechanisms to deal with it (along with the other basic data types).

In the client database table we have a list of names and ids. We can create a function to retrieve the name of a client, given an id:

retrieveClient :: Connection -> Int -> IO [Only String]
retrieveClient conn cid = query conn "SELECT name FROM client WHERE id = ?" $ (Only cid)

The Query template passed in makes use of the ? character to mark where substitutions will be made. Note the use of query rather than query_: query also takes the values for substitution, either as a tuple or wrapped in Only when there’s just one.
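
For example, a hypothetical lookup with two placeholders (assuming the same client table) would pass its substitution values as a tuple:

findClient :: Connection -> Int -> String -> IO [Only String]
findClient conn cid nm =
  query conn "SELECT name FROM client WHERE id = ? OR name = ?" (cid, nm)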

Using the FromRow type class, our code can define a much stronger API. We can actually retrieve client rows from the database and convert them into Client values.

We need FromRow first:

import Database.PostgreSQL.Simple.FromRow

Next, the Client data type needs a definition. It’s how we’ll refer to a client within our Haskell program:

data Client = Client { id :: Int, name :: String }
  deriving (Show)

The Client data type now gets a FromRow instance, which allows postgresql-simple to use it.

instance FromRow Client where
  fromRow = Client <$> field <*> field

The fromRow definition consumes one field per record field, in the order that Client declares them. The retrieveClient function then only needs to broaden its query and change its return type!

retrieveClient :: Connection -> Int -> IO [Client]
retrieveClient conn cid = query conn "SELECT id, name FROM client WHERE id = ?" $ (Only cid)

Create a new record

When creating data, you can use the execute function. execute runs a statement that returns no rows; all you get back is the number of rows affected.

execute conn "INSERT INTO client (name) VALUES (?)" (Only "Sam")

Extending our API, we can make a createClient function; but with a twist. By adding a RETURNING id clause and switching back to query, we’ll also get the generated identifier back.

createClient :: Connection -> String -> IO [Only Int64]
createClient conn name =
  query conn "INSERT INTO client (name) VALUES (?) RETURNING id" $ (Only name)

We need an import to bring Int64 into scope; it’s the type we decode the value of the underlying SERIAL column into inside of our Haskell application.

import Data.Int

We can now use createClient to set up an interface of sorts for users to enter information.

main :: IO ()
main = do
  conn <- connect localPG
  putStrLn "Name of your client? "
  clientName <- getLine
  cid <- createClient conn clientName
  putStrLn $ "New Client: " ++ (show cid)

We’ve now got a small data creation interface.

Name of your client?
Ringo
New Client: [Only {fromOnly = 4}]

Update an existing record

When it comes to updating data, we don’t expect much back in return aside from the number of records affected by the instruction. The execute function does exactly this. By inspecting the return value, we can convert the row count into a success/fail style result. I’ve simply encoded this as a boolean here.

updateClient :: Connection -> Int -> String -> IO Bool
updateClient conn cid name = do
  n <- execute conn "UPDATE client SET name = ? WHERE id = ?" (name, cid)
  return $ n > 0

Destroying records

Finally, deleting information from the database looks a lot like the update.

deleteClient :: Connection -> Int -> IO Bool
deleteClient conn cid = do
  n <- execute conn "DELETE FROM client WHERE id = ?" $ (Only cid)
  return $ n > 0

Because execute provides the affected count, we can perform the same post-execution validation again.
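
A quick sketch of both in action, reporting the success flag back to the user (client id 4 here is just the record we created earlier):

main :: IO ()
main = do
  conn <- connect localPG
  renamed <- updateClient conn 4 "Richard"
  putStrLn $ "update succeeded: " ++ show renamed
  removed <- deleteClient conn 4
  putStrLn $ "delete succeeded: " ++ show removed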

Summary

Those are the basic operations to get up and running with postgresql-simple. It looks like you could take it from prototyping software all the way through to writing fully-blown applications.

Really simple to use.