Random Blog Post #2

Unfortunately, I am posting this rather late because I've had a particularly stressful week. Last night, I worked on a tax research project until 4am, and now I have to program a custom compiler for a C++ project before I leave for Coachella. I decided to share my misery with you guys, so in this post I will be talking about how to build a custom compiler.

For those of you that don't know, a compiler is essentially a software that reads a high-level programming language (such as C++ or Java) and converts it into machine language that can be processed by the computer. There are five main steps to building a compiler: lexical analysis, syntax analysis, generating the AST, emitting code, and allocating registers.

Lexical Analysis:
A scanner takes input and serializes it into tokens that our compiler can recognize. It ensures that the lexical components of the input are correct. For example, given the sentence "I hate my life", we can tokenize it into four recognizable words. However, if the sentence was "I h8 my life", our scanner will give us a lexical error since "h8" is not a recognizable word.

Syntax analysis:
Once we know our tokens are valid, we have to make sure the syntax, or grammar, of the code is valid as well. To do this, we write a parser that establishes grammar rules for the input. For example, the sentence "hate my I life" would pass the lexical analysis, but does not make any sense grammatically. If we create a grammar rule that says a sentence must follow the format subject-verb-object, then we will catch this error. Of course, we would also have to create rules for the other thousands of sentence structures if we want our parser to work properly.

Generating the AST:
AST stands for Abstract Syntax Tree. This one's pretty difficult to explain so I'll just show you a picture.

Emitting code:
Once we have our scanner, parser, and tree, we can begin to write code for each node in the tree. In our tree above, we are trying to replicate an if-statement. You may recognize if-statements if you have experience in Excel. We can trace through our tree to emit our code. However, each variable must be stored somewhere in memory in order to be computed. The memory address can be referred to as a register.

Allocating registers:
Since we want to use memory efficiently, we need to track when registers are being used and when they become available. There are many optimization techniques to do this, but I am using one called "linear scanning". This is what I'm trying to figure out in my code right now.

I'm sure I made this all sound super easy but I am really struggling. Thanks for reading, now back to the grind.

Comments

  1. This looks complicated and something I would not be able to do myself. However this was an easy read that I enjoyed. Thanks!

    ReplyDelete

Post a Comment

Popular posts from this blog

Reading Post 1

Part V: Chapter 16 - Bringing Wealth Home

Random Post 1