MYO Language with Antlr
30 Aug 2022Introduction
ANTLR is a code generation tool for making language parsers. Using a grammer file, you can get ANTLR to generate code to read, interpret, and execute your very own code.
In today’s article I’ll walk through the basic setup to create a Calculator language that can execute simple equations in a golang project of our own.
Before you begin
You’ll need a JRE.
Before we start, there are some software pre-requisites. You will need to install ANTLR. This is a simple JAR File that we can invoke locally.
Code generation
Now that we’ve got ANTLR installed, it’s time to generate some code. We do this using a grammer file. A very comprehensive calculator can be found in the examples of the antlr grammers repository here.
For today’s example, we’ll just focus on addition, subtraction, multiplication, and division with the following grammer file:
Even without fully understanding the grammer language, you can see that there is some basic token definitions, rules, and expression definitions.
MUL
, DIV
, ADD
, SUB
, NUMBER
, and WHITESPACE
all being significant to
the language that we’re definting.
The expression
definition not only defines operations for us, but will also be
key in defining operator precedence, with the MulDiv
rule occuring before the AddSub
rule, finally dealing with Number
.
We can turn this grammer file into some go code with the following invocation:
This creates a parser
folder for us now with a few different pieces of go code.
Parsers, Lexers, and Listener
If you look in the parser
folder at the code that was created, you shoul see something
similar to this:
The Lexer’s job is to perform Lexical Analysis on
arbitrary pieces of text, and tokenizes that text into a set of symbols. For example, the input
of 1 + 2
might get tokenized to NUMBER 1, ADD, NUMBER 2
. These tokens are now
fed into the parser.
The Parser’s job is to take these
tokens, and make sure they conform to the rules of the language. You can imagine that
a LISP style language would expect ADD, NUMBER 1, NUMBER 2
rather than a c-style
language that would expect the operator in between the number tokens.
After the string has passed through the lexer and the parser, it now runs through the listener where we can write some code to respond to these symbols in order.
Implementation
The internal implementation of this calculator is a stack-based calculator. This gets
represented as struct
:
The internal state of the calculator are int
values on that stack. As operations
execute, the program will take the top of the stack as well that second-to-the-top
and perform arithmetic, leaving the result on the top of the stack.
The BaseCalcListner
type that was generated for us has all of the hooks we need
to latch onto the complete the implementation. The NUMBER
, ADDSUB
, and MULDIV
rules
all get their own listener for us to respond to.
Execution
Now we go from text input to execution. In the following snippet, the input stream feeds the text into the lexer. The lexer then gets setup as a stream ready to tokenize our input.
Finally, all of those tokens get parsed to make sure they represent valid expressions for our language.
We can now walk the parser tree with a listener attached. The listener will fire off our hooks that we defined earlier; and our stack-based calculator should leave us with the result at the TOS.
We should be left with something like this on screen:
Conclusion
As you can see, ANTLR is a very powerful tool for writing all of the pieces of a compiler (or in this case, an interpreter) to get you kick started very quickly.
You’d almost be insane to ever do this stuff yourself!