Speare2Brain

The predecessor project of Speare2Brain, Brain2Speare, took all of a day to implement. It took another day to spruce up, but the fact was: After a single day of development, nearly every Brainfuck program could be successfully transpiled to Shakespeare^[1]. The same cannot said of Speare2Brain. Also, spoilers, it only partially works.

What is hell is a parser generator?

The result of the original Shakespeare project was a Shakespeare-to-C transpiler called spl2c. This is impressive in itself, and is mostly powered by GNU Bison and flex (a parser generator and lexical analyser, respectively). Flex is responsible for taking an spl file and tokenising it for consumption by a parser generated by Bison. While parsing, the program will construct C code that corresponds directly to a set of spl tokens. There’s no need for multiple passes when building the C code, because the generated C is simply a translation of the Shakespeare. For example, the command ‘Romeo: Remember thyself’ is tokenised by flex as ‘REMEMBER SECOND_PERSON_REFLEXIVE‘, which a parser generated by Bison will consume. The parser will then output code like ‘push(second_person, value(second_person))‘. The functions and constants used by the generated code comprise the remainder of the Shakespeare project: In addition to the code generated by spl2c, one must link against libspl and include spl.h when compiling to a binary.

You transpile me right round baby, right round

Of course, when I started the project I didn’t know any of this. I took a course on compilers back at university that I must have mostly sleepwalked through (although I remember doing okay?). Was there a way to take advantage of the work already done by the Shakespeare project? Of course! The hard part of lexical analysis was already done, and the parser generator would make a a great jumping off point. If I modified the parser generator, I could possibly directly output the Brainfuck instead of C! Except… Bison’s parser generation code is heavily C based. Have you ever tried working in C? I have no doubt that it would have been possible to bend Bison to my will and perform the complete transpilation as part of the parsing, but it was beyond my power and patience. If I was going to finish this project in a reasonable time-frame, I would have to work in a language I was more comfortable in.

So why not get all meta about it? I eventually decided to modify the parser generator to output something that wasn’t C. Instead, it would output a brand new language! That language came to be, and was called ‘Not-Shakespeare Programming Language’, or NSPL. By tearing out most of the good stuff from the C generation bit of spl2c and replacing it with my own horrific code, I was able to create a new program that I imaginatively named spl2nspl. This part of the project took an evening in itself to figure out and implement. Let’s compare the some of the output from spl2c and spl2nspl real quick.

C	NSPL
`activate_character(30, hamlet); assign(16, second_person, 222222(-1)); assign(18, second_person, int_sub(17, 222*1, value_of(18, second_person))); char_output(18, second_person);`	`activate,hamlet, assign,const,-64,end_assign, assign,sub,const,8,value_of,second_person,end_sub,end_assign, output`

Yep. It’s the same thing, but written out such that it can be more easily parsed by the real workhorse of the project: A Python script named… Wait for it… nspl2bf. This is the main part of Speare2Brain that I’d like to talk about.

To be or not to be? If so, let us proceed to scene III.

Brain2Speare was a relatively simple project for many reasons. The biggest reason is as follows: It is very easy to rewrite a low-level instruction as a higher-level instruction. After all, you can translate Brainfuck symbol-for-symbol into C code, and Brain2Speare proved that it was also possible to find a similar mapping between Shakespeare and Brainfuck. What about the the reverse? That is to say, translating a higher-level language into a lower-level language. This is what real compilers and linkers do all the time: Binary files are as low-level as you can get.

Shakespeare-to-Brainfuck (or more accurately, Not-SPL to Brainfuck) was significantly more difficult to design and implement. Brain2Speare was based entirely on the idea that we could use two Shakespeare characters to represent the Brainfuck memory array. Let’s dive in to how the memory array is laid out in Speare2Brain. Once again, p is a pointer into the Brainfuck array. Shamelessly ripped from the explanation in the Github repository, and then modified for accuracy:
p[0] - Copy Register (Copy) p[1] - Result Register (Result) p[2] - Loop Termination Register (Loop) p[3] - Character Value Retrieval Register (Retrieve) p[4] - Temporary Register (Temp) p[5] - Right Register (Right) p[6] - On Stage 1 Register (OS1) p[7] - On Stage 2 Register (OS2) p[8] - Active Character Register (Active) p[9] - Inactive Character Register (Second) p[10] - Left Register (Left) p[11] - First Character Register (First character's register) p[12] - Second Character Register (Second character's register) ...

The registers are named after what they are used for the most. In reality, all the registers below p[6] are temporary and cannot be relied upon to be clean at any time. Internally, nspl2spl keeps track of the offset of each register and the ‘current’ location of the Brainfuck pointer in a class called MemoryLayout. In addition to the normal registers, we also have a number of character registers. These represent the character from the original Shakespeare, and are supposed to hold the same values and perform much of the same function as the character in the C equivalent.

I’ll go through what each register is used for, but be warned: You’ll need to know a bit about Brainfuck for this, because this is more about the hard implementation and less about the transpilation process.

Memory Registered

Time for some more shameless copying from the documentation:

Copy – In BF, the simplest way of moving an unknown-until-runtime value from one cell to another involves looping on the source cell; decrementing the value of that cell, moving to the destination cell, incrementing that cell, then moving the pointer back to the source. In this fashion, the source cell will be emptied and the destination cell will gain the value that the source cell originally held. However, this destroys the source cell. The Copy register allows for copies between cells by not only incrementing the value of the destination cell inside the loop, but also that of the Copy register. Once the source cell is emptied, we move the value in the Copy register back into the source cell.
Result – Multiple uses, but one notable use inside nspl2bf is as an indicator for if-else statements. We set the result register to 1 before moving the pointer to another register and attempting to enter one of two blocks, only one of which we want to execute. If we enter the first block (the ‘if’ block) successfully, we immediately decrement the Result register. After we leave the if block, we test Result and only enter the else block if it is still non-zero; that is to say, we did not enter the first block.
Loop – In nspl2bf, it is very important that the pointer is not moved manually. You use functions in the MemoryLayout class to return movement commands, do your BF magic, then reset the pointer. However, this puts a constraint on where the pointer must be at the end of a [] block: It has to be at the same place as it would have been if the block wasn’t entered at all, otherwise we won’t know how many < commands are needed to reset the pointer. This means we have a problem if we need to terminate the loop but we know that the value in the register we used to enter the loop won’t necessarily be zero. To get around this, we use the Loop register: Move the value of the ‘entering’ register into Loop, move the pointer to said ‘entering’ register, allow the loop to terminate, and then move Loop back into the original register.
Retrieve – This is horrible to explain. In nspl2spl we store the offsets of character memory cells in other registers named Active and Second. Retrieve is used to hold these offsets when forcing the memory pointer to the character’s cell via some really twisted construction. It’s only named Retrieve because it was created out of necessity while writing the assign transpilation code. It serves as any other temporary register otherwise.
Temp – It does whatever it needs to do. It’s not the register we deserve, but it’s the one we need.
Right – Holds the value for the right and unary argument in binary and unary expressions respectively.
On Stage (OS1, OS2) – The On Stage registers contain the memory offsets of the characters currently on stage. They’re necessary because when we ‘activate’ an On Stage character, we need to put the inactive character into the Second register. To do this, we need to know which characters are on stage, hence the OS registers.
Active – This holds the offset of the active on stage character. More on this below.
Second – Holds the offset of the character that is on stage but NOT active. This is very useful, because many commands like assign operate exclusively on the inactive character. Having their offset stored simplifies things, but actually accessing the value at the offset is something of a pain. However, because we can construct some really twisted Brainfuck to let as get that memory pointer in the right place.
Left – Holds the value for the left argument in binary expressions. It occupies this part of memory because it is secretly the first character, which means it gets its own stack. This is required because nspl2bf evaluates binary expression trees from left-to-right, meaning the value in the left register can get clobbered when calculating the right value. To avoid this, Left has its own pseudo-stack for storing evaluated expressions.
Character registers + stacks – These hold the value that each character currently, well, holds. Each character has a stack counter cell n cells after their initial offset, where n is the number of characters in the original Shakespeare. The next cell n spaces after marks the bottom of the character’s stack, and every nth space after that is another possible value of their stack.

Do you bite your thumb at me, sir?

A couple of high level examples for what the transpiler does: Each token read by nspl2bf from an nspl file generates Brainfuck code for manipulating the above registers. For instance, the activate token will cause the parser to grab the next token (the name of a character), then output the Brainfuck for copying that character’s offset into the Active register. Under the assumption that this character is already on stage, the other character who is on stage will have their offset copied into the Second register. Why? Because that’s what the original C would have done: If a character is activated, the other character on stage is inactive. Certain other tokens use the active or inactive character for their functions, so it’s important that we know which characters are both on stage and active.

The assign token (a true monster to implement) first clears the Result register before parsing all the tokens between itself and end_assign. The tokens between these two are expression tokens that represent one of the following: Constant values, unary functions, and binary functions. After each of these are applied, we expect a value to be stored in the Result register that is then copied into the memory offset pointed to by the Second register.

At a high level, it’s quite easy to describe what each function does. However, writing the Brainfuck to actually do all that? It’s horrible. It’s really, really hard (not for a good programmer, but yeah). Especially for things like the GOTO statements and the character stacks that we loved so much in Brain2Speare. In fact, those are so hard, they’re not even implemented. That’s right, Speare2Brain isn’t finished. However, it has passed a major development milestone: It successfully transpiles the Hello World Shakespeare program into Brainfuck. Even better, the Brainfuck itself actually runs and prints ‘Hello World!’. There’s one tiny little caveat, but I don’t really care:

azrael@Phoenix> wc -l examples/hello.spl
89 examples/hello.spl
azrael@Phoenix> wc -m examples/hello.bf
22739 examples/hello.bf

Yeah, the end result is huge, but we’re not writing gcc here. But nevertheless:

Oh baby.

Speare2Brain deserves way more explanation, especially more details on the transpilation itself. I’ll write more about it in a follow-up post: We’ll dive into the real crunch that is the mapping between nspl and Brainfuck. In the meantime, check out the Github repository for the project! If you want to see a bunch of example nspl, just clone the repository and run make. It should generate all the example nspl files along with some non-functioning Brainfuck for every file except hello.spl.

Footnotes

[1] An exception until recently included programs that relied on a wrapping cell implementation of Brainfuck. Until I wrote Speare2Brain, I didn’t even realise such implementations existed, but they end up making many, many otherwise difficult tasks much simpler.