#!/usr/bin/env python # coding: utf-8 # # Writing An Parser In Nim # # - 2019/2/6 # # Table of Contents # # 1. Overview # 2. What is Parser? # 4. Demo # 3. What is Pratt Parser? # 5. Where it was difficult to make this Nim's parser ? # 6. Introduction to Generator # 7. Impressions # # Overview # - 「Goで作るインタプリタ」 # - Chapter 2 # - Continuation of Lexer # ![](https://www.oreilly.co.jp/books/images/picture_large978-4-87311-822-2.jpeg) # # What is Parser? # # - Make AST from Token # - Abstract Syntax Tree of program # - Leave out whitespace and parentheses and end of line semicolon # ## Example # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/inputLetStatementCode.png?raw=true) # # - `let = ;` # ## "expression" and "statement" and .. # # ### Expression # # - generate a value # - ex. # - `42` # - `hoge` # # # ### Statement # # - do not generate a value # - ex. # - `let hoge = 42` # - `return 42` # - `if..else` # - `for..` # # ### Expression Statement # # - statement consisting of only one expression # - ex. # - `x + 42` # ## Token # # - output of lexer # # ``` # [Type = "LET", Literal = "let"] # [Type = "IDENT", Literal = "hoge"] # [Type = "=", Literal = "="] # [Type = "INT", Literal = "1"] # [Type = "+", Literal = "+"] # [Type = "INT", Literal = "2"] # [Type = "*", Literal = "*"] # [Type = "INT", Literal = "3"] # [Type = "/", Literal = "/"] # [Type = "INT", Literal = "4"] # [Type = "+", Literal = "+"] # [Type = "INT", Literal = "5"] # [Type = "-", Literal = "-"] # [Type = "INT", Literal = "6"] # [Type = ";", Literal = ";"] # ``` # ## AST Object # # - output of Parser # # # ``` # Token: { # Type: 'LET', # Literal: 'let' # }, # Let: { # Token: { # Type: 'IDENT', # Literal: 'hoge' # }, # IdentValue: 'hoge' # }, # LetValue: { # Left: { # Left: { # Token: { # Type: 'INT', # Literal: '1' # }, # IntValue: 1 # }, # Operator: '+', # Right: { # Left: { # Left: { # Token: { # Type: 'INT', # Literal: '2' # }, # IntValue: 2 # }, # Operator: '*', # Right: { # Token: { # Type: 'INT', # Literal: '3' # }, # IntValue: 3 # } # }, # Operator: '/', # Right: { # Token: { # Type: 'INT', # Literal: '4' # }, # IntValue: 4 # } # } # }, # Operator: '+', # InRight: { # Left: { # Token: { # Type: 'INT', # Literal: '5' # }, # IntValue: 5 # }, # Operator: '*', # Right: { # Token: { # Type: 'INT', # Literal: '6' # }, # IntValue: 6 # } # } # } # # ``` # ## AST # # - AST image # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/ast.png?raw=true) # # Demo # # `$ nim c -r src/repl/repl.nim` # # What is Pratt Parser ? # ## Pratt Parser # # - Vaughan Pratt, 1973 # - he said in this thesis about this parser # - very simple to understand # - trivial to implement # - easy to use # - extremely efficient in practice if not in theory # - use when make parser without parser generator # # # paper: [link](https://tdop.github.io/) # explanation with js: [link](http://crockford.com/javascript/tdop/tdop.html)←he made JSLint # ## Types of parsers # # - Top-down parsing # - recurcive-descent parser ← this! # - LL parser # - packrat parser # - Bottom-up parsing # - Earley parser # - CYK parser # - LR parser # - Simple LR parser # - LALR parser # - CLR parser # - GLR parser # # # # ref: [Parsing - Wikipedia](https://en.wikipedia.org/wiki/Parsing) # ## How It Works # ### Example # # - input: `2 * 3 / 4` # - output: AST # # # #### AST # # - a part of the AST just before # - the red part of the figure below # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/ast2.png?raw=true) # # # # #### output object # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/struct.png?raw=true) # ### make "2 * 3" # #### 1. Whether "Statement" or "Expression Statement" # # - determine the type of expression # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/1.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseStatementCode-2.png?raw=true) # # # #### 2. save "2" in the object # # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/2.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-INT.png?raw=true) # # #### 3. Enter the while loop # # - Enter "while" # - the conditional formulas are described later # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/3.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-ASTERISC.png?raw=true) # # #### 4. save "*" in the object # # - Call parseExpression recursively! # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/4.png?raw=true) # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-2.png?raw=true) # #### 5. parseExpression again # # - not enter "while" # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/5.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-INT-2.png?raw=true) # | # #### 6. return to the first parseExpression # # - Since the value of parseInfixExpression is returned, it returns to the top of while # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/6.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-while.png?raw=true) # # ## Determine precedence of Tokens # # - priority gets larger as going down # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/PrecedenceCode-2.png?raw=true) # # # ``` # Sum < Product # true # ``` # # # ### while # # - conditions # - `precedence < self.peekPrecedence()` # - `not self.peekTokenIs(SEMICOLON)` # # - ex. # - "1 + 2 * 3" # - Sum < Product # - → enter while loop.. # # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/tree.png?raw=true) # ### make "/ 4" # #### 7. in while # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-SLASH.png?raw=true) # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-Operator.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/7.png?raw=true) # # # # #### 8. parseInfixExpression again again # # - save "4" to object # - don't enter the while loop # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-right.png?raw=true) # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/8.png?raw=true) # # #### 9. completion # # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/9.png?raw=true) # # ![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-last.png?raw=true) # # Where it was difficult to make this Nim's parser ? # - don't know about Nim # - posted a question on the forum # - comparison with Golang # - don't have 'Interface' # - reffered to the code of Rust # - read the Nim's parser code # - use enum well # # Impression # # - complexity # - easy to write using a functional languages # - too hard to make this slide # - next, make a evaluator # - the book is still harf #