Writing An Parser In Nim

  • 2019/2/6

Table of Contents

  1. Overview
  2. What is Parser?
  3. Demo
  4. What is Pratt Parser?
  5. Where it was difficult to make this Nim's parser ?
  6. Introduction to Generator
  7. Impressions

Overview

  • 「Goで作るインタプリタ」
    • Chapter 2
  • Continuation of Lexer

What is Parser?

  • Make AST from Token
    • Abstract Syntax Tree of program
    • Leave out whitespace and parentheses and end of line semicolon

Example

  • let <identifier> = <expression>;

"expression" and "statement" and ..

Expression

  • generate a value
  • ex.
    • 42
    • hoge

Statement

  • do not generate a value
  • ex.
    • let hoge = 42
    • return 42
    • if..else
    • for..

Expression Statement

  • statement consisting of only one expression
  • ex.
    • x + 42

Token

  • output of lexer
[Type = "LET", Literal = "let"]
[Type = "IDENT", Literal = "hoge"]
[Type = "=", Literal = "="]
[Type = "INT", Literal = "1"]
[Type = "+", Literal = "+"]
[Type = "INT", Literal = "2"]
[Type = "*", Literal = "*"]
[Type = "INT", Literal = "3"]
[Type = "/", Literal = "/"]
[Type = "INT", Literal = "4"]
[Type = "+", Literal = "+"]
[Type = "INT", Literal = "5"]
[Type = "-", Literal = "-"]
[Type = "INT", Literal = "6"]
[Type = ";", Literal = ";"]

AST Object

  • output of Parser
Token: {
    Type: 'LET',
    Literal: 'let'
},
Let: {
    Token: {
        Type: 'IDENT',
        Literal: 'hoge'
    },
    IdentValue: 'hoge'
},
LetValue: {
    Left: {
        Left: {
            Token: {
                Type: 'INT',
                Literal: '1'
            },
            IntValue: 1
        },
        Operator: '+',
        Right: {
            Left: {
                Left: {
                    Token: {
                        Type: 'INT',
                        Literal: '2'
                    },
                    IntValue: 2
                },
                Operator: '*',
                Right: {
                    Token: {
                        Type: 'INT',
                        Literal: '3'
                    },
                    IntValue: 3
                }
            },
            Operator: '/',
            Right: {
                Token: {
                    Type: 'INT',
                    Literal: '4'
                },
                IntValue: 4
            }
        }
    },
    Operator: '+',
    InRight: {
        Left: {
            Token: {
                Type: 'INT',
                Literal: '5'
            },
            IntValue: 5
        },
        Operator: '*',
        Right: {
            Token: {
                Type: 'INT',
                Literal: '6'
            },
            IntValue: 6
        }
    }
}

AST

  • AST image

Demo

$ nim c -r src/repl/repl.nim

What is Pratt Parser ?

Pratt Parser

  • Vaughan Pratt, 1973
  • he said in this thesis about this parser
    • very simple to understand
    • trivial to implement
    • easy to use
    • extremely efficient in practice if not in theory
  • use when make parser without parser generator

paper: link
explanation with js: link←he made JSLint

Types of parsers

  • Top-down parsing
    • recurcive-descent parser ← this!
    • LL parser
    • packrat parser
  • Bottom-up parsing
    • Earley parser
    • CYK parser
    • LR parser
      • Simple LR parser
      • LALR parser
      • CLR parser
      • GLR parser

ref: Parsing - Wikipedia

How It Works

Example

  • input: 2 * 3 / 4
  • output: AST

AST

  • a part of the AST just before
  • the red part of the figure below

output object

make "2 * 3"

1. Whether "Statement" or "Expression Statement"

  • determine the type of expression

2. save "2" in the object

3. Enter the while loop

  • Enter "while"
    • the conditional formulas are described later

4. save "*" in the object

  • Call parseExpression recursively!

5. parseExpression again

  • not enter "while"

|

6. return to the first parseExpression

  • Since the value of parseInfixExpression is returned, it returns to the top of while

Determine precedence of Tokens

  • priority gets larger as going down

Sum < Product # true

while

  • conditions

    • precedence < self.peekPrecedence()
    • not self.peekTokenIs(SEMICOLON)
  • ex.

    • "1 + 2 * 3"
    • Sum < Product
    • → enter while loop..

make "/ 4"

7. in while

8. parseInfixExpression again again

  • save "4" to object
  • don't enter the while loop

9. completion

Where it was difficult to make this Nim's parser ?

  • don't know about Nim
    • posted a question on the forum
  • comparison with Golang
    • don't have 'Interface'
    • reffered to the code of Rust
  • read the Nim's parser code
    • use enum well

Impression

  • complexity
    • easy to write using a functional languages
    • too hard to make this slide
  • next, make a evaluator
    • the book is still harf