Intro
This post is about a Brainfuck-to-Shakespeare transpiler that I wrote one warm Sunday evening called Brain2Speare. To fully appreciate this post, you will need to know a bit about the Brainfuck and Shakespeare esoteric programming languages, but don’t worry! I’ll talk a little bit about each of those before diving into the meaty bits.
First, what’s a transpiler? It’s like a compiler, except it doesn’t compile to an executable file. Instead, source code written in one programming language run through a transpiler will be ‘transpiled’ to source code in another language. The produced code is theoretically equivalent to the original code: If both programs are compiled with their respective compilers, we expect that running them will produce the same outputs given the same inputs. It’s not a new concept. According to Wikipedia (the most trustworthy source available), many implementations of programming languages we use today started life as transpilers. C++ originally transpiled to C, Coffeescript still transpiles to Javascript… And now most importantly of all, Brainfuck transpiles to Shakespeare. Even though Brainfuck can be run directly via countless interpreters or compilers, I thought it would be a fun project to write a transpiler from one joke language to another. I was right!
And how about an esoteric programming language? The clue is in the name: They’re programming languages that are esoteric. Unusual and typically designed as a joke, esoteric programming languages thrive on weird or minimal syntax, confusing behaviour, and general unwieldy-ness. For example, one might look to Malbolge as the quintessential ‘screw you’ language. I won’t even attempt to explain it, check out the link if you want to see what true malice looks like.
Brainfuck
Let’s dive in with a brief explanation of Brainfuck. One of the oldest esoteric programming languages, Brainfuck was designed by Urban Müller in 1993. He’s quite into sky-diving.
It consists of eight instructions, an infinite-length tape, and hatred. Doing anything useful in this language isn’t recommended, but not impossible. To give you an idea about how difficult working in Brainfuck is, imagine a C program with a single char*
pointer called p
pointing to the beginning of an infinite char
buffer. Now, each of the eight instructions can be represented by C code.
Instruction | C equivalent |
---|---|
> |
++p; |
< |
--p; |
+ |
++*p; |
- |
--*p; |
. |
putchar(p); |
, |
*p=getchar(); |
[ |
while(*p) { |
] |
} |
You could almost transpile Brainfuck to C immediately by applying the above transformations to a Brainfuck program. Here’s what Hello World looks like in Brainfuck:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
Looks kind of compact, right? Until you apply those above transformations and discover the horrific C program lurking beneath the surface:
++p;+p;++p;++p;++p;while(*p) {++p;++p;++p;++p;++p;while(*p) {++p;++p;++p;++p;++p;++p;++p;++p;++p;++p;++p;++p;++p;--p;--p;--p;--p;--*p;}++p;++p;++p;++p;++p;--*p;++p;++p;++p;while(*p) {--p;}--p;--*p;}++p;++p;putchar(p);++p;--*p;--*p;--*p;putchar(p);++p;++p;++p;++p;++p;++p;++p;putchar(p);putchar(p);++p;++p;++p;putchar(p);++p;++p;putchar(p);--p;--*p;putchar(p);--p;putchar(p);++p;++p;++p;putchar(p);--*p;--*p;--*p;--*p;--*p;--*p;putchar(p);--*p;--*p;--*p;--*p;--*p;--*p;--*p;--*p;putchar(p);++p;++p;++p;putchar(p);++p;++p;++p;putchar(p);c
Yikes. That’s your crash course in Brainfuck. Check out the link at the top of this post for more info.
Shakespeare
Shakespeare was written by this guy:
Just kidding! Shakespeare was designed by Kalle Hasselström and Jon Åslund, and it’s quite incredible. Rather than tell you the instruction set (we’ll go into it in more detail when talking about Brain2Speare anyway), here’s a snippet from the Hello World program:
The Infamous Hello World Program. Romeo, a young man with a remarkable patience. Juliet, a likewise young woman of remarkable grace. Ophelia, a remarkable woman much in dispute with Hamlet. Hamlet, the flatterer of Andersen Insulting A/S. Act I: Hamlet's insults and flattery. Scene I: The insulting of Romeo. [Enter Hamlet and Romeo] Hamlet: You lying stupid fatherless big smelly half-witted coward! You are as stupid as the difference between a handsome rich brave hero and thyself! Speak your mind! You are as brave as the sum of your fat little stuffed misused dusty old rotten codpiece and a beautiful fair warm peaceful sunny summer's day. You are as healthy as the difference between the sum of the sweetest reddest rose and my father and yourself! Speak your mind! You are as cowardly as the sum of yourself and the difference between a big mighty proud kingdom and a horse. Speak your mind. Speak your mind! [Exit Romeo]
You’d never believe that was code, right? It definitely has the trappings of describing a program: The repeated references to differences and sums, ‘speak your mind’ probably meaning ‘print your value’… Well, there’s a transpiler for the above that will convert Shakespeare to C code. Said C code will compile with the help of the libspl (lib-Shakespeare Language) library. Crazy huh? The transpiler was written with the help of GNU Bison and flex, parser and lexical analyser generators.
Characters in Shakespeare represent memory addresses, and through the medium of dialogue their values are manipulated. All characters in Shakespeare programs MUST be from a Shakespeare play, the transpiler will not accept any non-Shakespeare characters. Turns out roses aren’t as sweet if called by any other name.
In the above, Hamlet tells Romeo that he is as brave as the sum of his fat little stuffed misused dusty old rotten codpiece and a beautiful fair warm peaceful sunny summer’s day. This transpiles to a C statement that assigns the value of -96 to Romeo. Wait, what? How does that make sense? In Shakespeare, we have the notion of ‘positive’ and ‘negative’ adjectives, nouns, comparatives, etc. A noun is worth either 1 or -1 if it is positive or negative respectively. A codpiece is a negative noun, and as such has value -1. Each adjective has the effect of doubling the value of a noun. In the case of the codpiece, it has seven such adjectives in front of it. 2^7 is 128, but the value is negative so we get -128. We apply a similar process to the positive noun ‘summer’s day’ and find that we’re actually calculating the sum of -128 and 32 and assigning the result to Romeo. Phew. The rest of the program continues in a similar fashion, where occasionally Romeo is asked to print the ASCII character corresponding to his stored value (“Speak your mind!”). I recommend checking out the Shakespeare homepage for more information, but be warned if you want to download the transpiler and run it for yourself: The official parser has a bug when reading Roman numerals. It has been fixed in an unofficial build, which I recommend using instead.
Brain2Speare
Now for the main event! I have written an abomination. Brain2Speare takes a Brainfuck program as input and outputs a Shakespeare program. The methods by which it performs this vile transformation are witchcraft. A simple Python program written in an afternoon, Brain2Speare abuses a feature of Shakespeare in order to simulate the never-ending buffer that a Brainfuck program operates on. But before I talk about that, let’s talk about the initial failed attempt.
At first, I naively believed that by using the full cast of every Shakespeare play that we would have enough memory spaces to run virtually any Brainfuck program. Using the Brainfuck-as-C translation as an example, Romeo in the transpiled code could represent p[0]
, Juilet could represent p[1]
, and so on up until p[120]
or thereabouts. Turns out, this is garbage for a couple of reasons. One: Only using 120 spaces on the tape? Never gonna happen except for the most basic Brainfuck programs. Two: I thought I’d be able to tell which characters were ‘on stage’ by reasoning about the supposed location of the data pointer at each point in the source program. This would have been fine, if it weren’t for a little thing called a ‘while loop’. For instance, suppose I’ve decided that Romeo, Juliet, Hamlet, and Lady Macbeth are to represent p[0]
, p[1]
, p[2]
, and p[3]
. If a Brainfuck program starts at p[0]
, then the program >>>
can be reasoned about at each instruction to say that before the code is run, Romeo should be on stage, after the first instruction is run, Juliet should be on stage, after the third instruction Hamlet should be on stage, and so on. However, for a program like this: >[>-]
We’re immediately in trouble if we don’t know what the values stored in the buffer are after the second instruction, because we don’t know how many times the loop will run. We’d have to effectively run the program in order to know who should be on stage… And if we’re going to do that, we might as well go the whole hog and just write a program that prints the eventual output without performing any of the computations that the source program does. Lame.
So for an hour or so, I was stumped. If we didn’t know who was on stage at each point in the program, or who had to have their values modified, we wouldn’t be able to write a Shakespeare file. After poring over the Shakespeare documentation for some time, some hope emerged. Each character, in addition to storing a single value, has a stack associated with it. A stack! That means we can store ‘infinite’ values in a single character, thereby removing our initial problem of only having about a hundred characters to choose from. It also solved the problem of knowing which characters had to be on stage, because now we would only need two characters to represent the entire memory space used in a Brainfuck program. A quick explanation of how the stack works in Shakespeare: As mentioned before, a character holds a value and also a stack. If the other character on stage instructs the character to ‘Remember X’, where X is an expression such as a sum or value, X is pushed onto the top of that character’s stack. If the character is asked to ‘recall’, the character’s held value is replaced with the value on the top of the stack, and the stack is popped.
Using stacks and a bit of thinking, I was able to construct a mapping between each Brainfuck instruction and a set of Shakespeare instructions. Going back to the infinite buffer that forms the Brainfuck data array, I would have one character hold all values up to the current location of the pointer in his stack, while a second character would use her stack to hold all the values in the buffer after the current position of the pointer. A picture would be helpful:
Now we can do some work! If we are given the instruction >
(as a reminder, this will move the pointer forward by one), we will have Romeo push his current value onto his stack, have Romeo take Juliet’s value, and then have Juliet pop her stack.
At this point, it is probably clear that Romeo represents our p
pointer, now pointing to p[6]
after the >
instruction. <
works similarly, but in reverse. In Shakespeare, the >
command transpiles to:
Juliet: Remember yourself! Romeo: Recall your past mistakes. Juliet: Thou art as vile as me.
Note that the above code is slightly different from what I described: In the actual program, Juliet’s held value doesn’t matter. We only care about what’s on her stack. This code should work just as well by swapping the second and third statements and making adjustments elsewhere in the transpiler, and the generated code would match our above description.
The +
and -
instructions are simple too. As you may have noticed, whenever you add or subtract in Shakespeare you have to do it in a power of two. To make a non-power of two number, you have to sum up several powers of two. That’s right, you’re effectively setting bits with each Shakespeare instruction, and that’s the only way to store an arbitrary value in a character. My transpiler is clever: Given code like +++++++
(add 7 to *p
), it will not generate the following:
Juliet: You are as brave as the sum of a fox and yourself! You are as handsome as the sum of a warm summer's day and thyself. You are as good as the sum of a handsome rich brave hero and yourself!
That’s three instructions that add one, two, and four to Romeo’s value. Instead, by counting the number of bits in the difference between the next highest power of two (eight) and the value you want to add (seven), it becomes clear that it would be fewer instructions to add eight to Romeo’s value and subtract one:
Juliet: You are as brave as the sum of a cunning clever brave rich fox and yourself! You are as dull as the sum of a codpiece and thyself.
Same result, less code generated!
The .
and ,
instructions are easy too, transpiled into their direct Shakespeare equivalents.
Juliet: Speak your mind! Juliet: Open your mind.
The remaining challenges are the [
and ]
instructions. They form the only control path-affecting instructions in Brainfuck, and represent while
loops in C. Shakespeare has two ways to control program flow, and they’re often spotted together:
Lady Macbeth: Are you as rich as me? Romeo: If so, let us proceed to scene III.
The first statement is clearly a comparison, and the following ‘if so’ statement forms the conditional to go along with it. In this case, if the condition is true we will jump to scene III. This is a GOTO
, but don’t worry, it won’t bite. Lady Macbeth has made her first appearance in our transpiler: Her only role is to have a value of zero and to be compared against Romeo whenever we encounter a [
or ]
. If she is equal to Romeo at the [
instruction, we skip to the corresponding ]
, else we keep running the program as normal. If she is not equal to Romeo at a ]
, we do the jump back to the [
. In order to translate this into Shakespeare, we do a pass on the Brainfuck code to pull out all the parentheses and pair them up. We assign each of them a scene number that we will use when we generate the code associated with each pair.
The code generated for a [
looks like:
Scene III: A love lost. [Exeunt] [Enter Lady Macbeth and Romeo] Lady Macbeth: Are thou as handsome as me? Romeo: If so, let us proceed to scene IV. [Exit Lady Macbeth] [Enter Juliet]
First we plop down a GOTO
marker in the form of the scene statement. Then we clear the stage and bring on Lady Macbeth and Romeo. Lady Macbeth states her question and then the GOTO
is either performed or ignored. If not, Lady Macbeth will leave the stage and Juliet will return.
For ]
, we have a similar block:
[Exit Juliet] [Enter Lady Macbeth] Lady Macbeth: Are thou as ugly as me? Romeo: If not, let us proceed to scene III. [Exeunt] Scene IV: Big Trouble in Little Denmark. [Exeunt] Enter Romeo and Juliet]
Once again, we get rid of Juliet and perform our comparison. This time we make sure Romeo is not equal to zero. If we don’t do the jump, we restore the Romeo-Juliet stage and continue with the program. We’ve also put down the scene marker for the [
instruction to jump to if that comparison passes.
Hang on, is that it? That’s every instruction covered! That’s just about the whole process in transpiling Brainfuck to Shakespeare. There’s some boilerplate stuff you need to do as well: A full Shakespeare program needs a Title and a Dramatis Personæ listing the characters you’ll be using. There’s also a higher-level jump marker called an ‘Act’, of which scenes are a part of. And in Brain2Speare, there’s a total of three passes done on the code: One pass on the Brainfuck code to grab the parentheses and pair them up correctly, another pass on the Brainfuck code to generate the initial, very boring Shakespeare file, and a single pass on the generated Shakespeare to make it more entertaining and spruce up the text a little.
To finish off, here’s what the whole thing looks like in action. brain2speare.py
is our transpiler, hello_world.b
contains our Brainfuck program, test.spl
is our generated Shakespeare file, spl2c
is the Shakespeare-to-C transpiler, and test
is our final executable. I hope you enjoyed this, and remember to check out Brain2Speare on Github!
azrael@Phoenix>cat examples/hello_world.b ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++. azrael@Phoenix> ./brain2speare.py examples/hello_world.b > test.spl azrael@Phoenix> cat test.spl The Interpreted Brainfuck. Hector, a stack that represents the present and past. Mistress Ford, a stack that represents the future. Cassandra, a zero that is only good for comparison. Act I: An egregious abuse. Scene I: The stackening of Mistress Ford. [Enter Hector and Mistress Ford] Hector: Remember thyself! [Many, many, many times to build the stack, snip] Act II: Our main performance. Scene I: It begins here. [Enter Hector and Mistress Ford] Mistress Ford: Thou are as amazing as the sum of yourself and a cunning trustworthy bold pony! Scene VI: A cowardly plum. [Exeunt] [Enter Cassandra and Hector] Cassandra: Are you as fine as me? Hector: If so, let us proceed to scene VII! [Exit Cassandra] [Enter Mistress Ford] Mistress Ford: Remember thyself! Hector: Recall the prompt Hell in the mind's eye. Mistress Ford: Thou are as proud as me. You are as rich as the sum of yourself and a prompt golden hero. Scene II: An embroidered joy. [Exeunt] [Enter Cassandra and Hector] Cassandra: Are thou as noble as me? Hector: If so, let us proceed to scene III. [It continues ...] azrael@Phoenix> ./spl2c < test.spl > test.c azrael@Phoenix> gcc test.c -I . -L . -lspl -lm -o test azrael@Phoenix> ./test Hello World!
Oh baby.