Notebook

Peter Norvig, Feb 2017

BASIC Interpreter¶

Years ago, I showed how to write an Interpreter for a dialect of Lisp. Some readers appreciated it, and some asked about an interpreter for a language that isn't just a bunch of parentheses. In 2014 I saw a celebration of the 50th anniversary of the 1964 Dartmouth BASIC interpreter, and I thought that I could show how to implement such an interpreter. I never quite finished in 2014, but now it is 2017, I rediscovered this unfinished file, and completed it. For those of you unfamiliar with BASIC, here is a sample program:

In [1]:

program = '''
REM POWER TABLE
DATA 8, 4
READ N0, P0
PRINT "N",
FOR P = 2 to P0
 PRINT "N^" P,
NEXT P
PRINT "SUM"
LET S = 0
FOR N = 2 TO N0
 PRINT N,
 FOR P = 2 TO P0
   LET S = S + N ^ P
   PRINT N ^ P,
 NEXT P
 PRINT S
NEXT N
END
'''

Of course I don't have to build everything from scratch in assembly language, and I don't have to worry about every byte of storage, like Kemeny, Gates, Allison and Woz did, so my job is much easier. The interpreter consists of three phases:

Tokenization: breaking a text into a list of tokens, for example: "10 READ N" becomes ['10', 'READ', 'N'].
Parsing: building an internal representation from the tokens: Stmt(num=10, typ='READ', args=['N']).
Execution: do what each statement says; in this case the READ statement

assigns the next DATA element to N.

Before covering the three phases, some imports:

In [2]:

import re 
from typing import *
from collections import namedtuple, defaultdict, deque

Tokenization¶

One way to break a line of text into tokens is to define a regular expression that matches every legal token, and use the findall method to return a list of tokens. I'll use the re.VERBOSE option so that I can add whitespace and comments to make the regular expression more readble. You can refer to regular expression syntax if needed.

In [3]:

tokenize = re.compile(r'''
      \d* \.? \d+ (?: E -? \d+)?                      # number 
    | SIN|COS|TAN|ATN|EXP|ABS|LOG|SQR|RND|INT|FN[A-Z] # functions
    | LET|READ|DATA|PRINT|GOTO|IF|FOR|NEXT|END        # statement types
    | DEF|GOSUB|RETURN|DIM|REM|STOP                   # statement types
    | TO|THEN|STEP                                    # keywords
    | [A-Z]\d?                                        # variable names (letter + optional digit)
    | ".*?"                                           # label strings (in double quotes)
    | <> | >= | <=                                    # multi-character relational operators
    | \S                                              # non-space single character (operators and variables) ''',
                      re.VERBOSE).findall

In this version of BASIC, variable names must be either a single letter, or a letter followed by a digit, and user-defined function names must be "FN" followed by a single letter. The only complicated part is the syntax for numbers: optional digits followed by an optional decimal point, some digits, and optionally a power of 10 marked by "E" and followed by an (optional) minus sign and some digits. (Note: The regular expression syntax "(?: ... )?" means that all of "..." is optional, but is not the group that the findall method captures.)

Example usage of tokenize:

In [4]:

tokenize('10 READ N')

Out[4]:

['10', 'READ', 'N']

In [5]:

tokenize('100 PRINT "SIN(X)^2 = ", SIN(X) ^ 2')

Out[5]:

['100', 'PRINT', '"SIN(X)^2 = "', ',', 'SIN', '(', 'X', ')', '^', '2']

That looks good. Note that my tokens are just strings; it will be the parser's job, not the tokenizer's, to recognize that '2' is a number and 'X' is the name of a variable. (In some interpreters, the tokenizer makes distinctions like these.)

There's one important complication: the manual says that "spaces have no significance in BASIC (except in messages to be printed...)" Thus, the following should all be equivalent:

10 GOTO 99
10 GO TO 99
10GOTO99
1 0 GOT O9 9

The problem is that tokenize gets some of these wrong:

In [6]:

tokenize('10 GO TO 99')

Out[6]:

['10', 'G', 'O', 'TO', '99']

In [7]:

tokenize('1 0 GOT O9 9')

Out[7]:

['1', '0', 'G', 'O', 'T', 'O9', '9']

My first thought was to remove all white space from the input. But that would change the label string "HELLO WORLD" to "HELLOWORLD", which is wrong. To remove spaces everywhere except between double quotes, I can tokenize the line, then join the tokens back together without spaces, then re-tokenize to get the final list of tokens. I do that in my new function below called tokenizer.

In [8]:

def tokenizer(line: str) -> List[str]:
    "Return a list of the tokens on this line, handling spaces properly, and upper-casing."
    line = ''.join(tokenize(line)) # Remove whitespace
    return tokenize(line.upper())

Peeking at and Popping Tokens¶

I will store my list of tokens in the global variable TOKENS and access them through this interface:

peek(): return the next token in TOKENS (without changing TOKENS), or None if there are no more tokens.
pop(): remove and return the next token in TOKENS.
pop("string"): remove and return the next token if it is equal to the string; else return None and leave TOKENS unchanged.
pop(predicate): remove and return the next token if predicate(token) is true; else return None and leave TOKENS unchanged.

In [9]:

Token = str # Type: a Token is a string

TOKENS = [] # Global variable to hold a list of TOKENS

def peek() -> Optional[Token]: 
    "Return the first token in the global `TOKENS`, or None if we are at the end of the line."
    return (TOKENS[0] if TOKENS else None)

def pop(constraint=None) -> Optional[Token]:
    """Remove and return the first token in `TOKENS`, or return None if token fails constraint.
    constraint can be None, a literal (e.g. pop('=')), or a predicate (e.g. pop(is_varname))."""
    token = peek()
    if constraint is None or (token == constraint) or (callable(constraint) and constraint(token)):
        return TOKENS.pop(0)

(Note: if I expected program lines to contain many tokens, I would use a deque instead of a list of tokens.)

Testing the Tokenizer¶

We can test tokenizer and the related functions:

In [10]:

def lines(text: str) -> List[str]: 
    "A list of the non-empty lines in a text."
    return [line for line in text.splitlines() if line]
    
def test_tokenizer() -> bool:
    global TOKENS
    assert tokenizer('X-1') == ['X', '-', '1'] 
    assert tokenizer('-2') == ['-', '2']
    assert tokenizer('PRINT "HELLO WORLD"') == ['PRINT', '"HELLO WORLD"']
    assert (tokenizer('10 GOTO 99') == tokenizer('10 GO TO 99') == tokenizer('10GOTO99')  == 
            tokenizer('1 0 GOT O9 9') == ['10', 'GOTO', '99'])
    assert (tokenizer('100 PRINT "HELLO WORLD", SIN(X) ^ 2') == 
            ['100', 'PRINT', '"HELLO WORLD"', ',', 'SIN', '(', 'X', ')', '^', '2'])
    assert (tokenizer('100IFX1+123.4+E1-12.3E4 <> 1.2E-34*-12E34+1+"HI" THEN99') ==
            ['100', 'IF', 'X1', '+', '123.4', '+', 'E1', '-', '12.3E4', '<>', 
             '1.2E-34', '*', '-', '12E34', '+', '1', '+', '"HI"', 'THEN', '99'])
    assert lines('one line') == ['one line']
    assert lines('''two
    lines''') == ['two', '    lines']
    assert lines(program) == [
     '10 REM POWER TABLE',
     '11 DATA 8, 4',
     '15 READ N0, P0',
     '20 PRINT "N",',
     '25 FOR P = 2 to P0',
     '30   PRINT "N^" P,',
     '35 NEXT P',
     '40 PRINT "SUM"',
     '45 LET S = 0',
     '50 FOR N = 2 TO N0',
     '55   PRINT N,',
     '60   FOR P = 2 TO P0',
     '65     LET S = S + N ^ P',
     '70     PRINT N ^ P,',
     '75   NEXT P',
     '80   PRINT S',
     '85 NEXT N',
     '99 END']
    assert [tokenizer(line) for line in lines(program)] == [
     ['10', 'REM', 'P', 'O', 'W', 'E', 'R', 'T', 'A', 'B', 'L', 'E'],
     ['11', 'DATA', '8', ',', '4'],
     ['15', 'READ', 'N0', ',', 'P0'],
     ['20', 'PRINT', '"N"', ','],
     ['25', 'FOR', 'P', '=', '2', 'TO', 'P0'],
     ['30', 'PRINT', '"N^"', 'P', ','],
     ['35', 'NEXT', 'P'],
     ['40', 'PRINT', '"SUM"'],
     ['45', 'LET', 'S', '=', '0'],
     ['50', 'FOR', 'N', '=', '2', 'TO', 'N0'],
     ['55', 'PRINT', 'N', ','],
     ['60', 'FOR', 'P', '=', '2', 'TO', 'P0'],
     ['65', 'LET', 'S', '=', 'S', '+', 'N', '^', 'P'],
     ['70', 'PRINT', 'N', '^', 'P', ','],
     ['75', 'NEXT', 'P'],
     ['80', 'PRINT', 'S'],
     ['85', 'NEXT', 'N'],
     ['99', 'END']]

    TOKENS = tokenizer('10 GO TO 99') 
    assert peek() == '10'
    assert pop()  == '10'
    assert peek() == 'GOTO'
    assert pop()  == 'GOTO'
    assert peek() == '99'
    assert pop(str.isalpha) == None    # '99' is not alphabetic
    assert pop('98.6') == None         # '99' is not '98.6'
    assert peek() == '99'
    assert pop(str.isnumeric)  == '99' # '99' is numeric
    assert peek() is None and not TOKENS 
    
    return True
    
test_tokenizer()

Out[10]:

True

Parsing: Grammar of Statements¶

Parsing is the process of transforming the text of a program into an internal representation, which can then be executed. For BASIC, the representation will be an ordered list of statements, and we'll need various data types to represent the parts of the statements. I'll start by showing the grammar of BASIC statements, as seen in the appendix on pages 56-57 of the manual (see also pages 26-30 for a simpler introduction). A statement starts with a line number, and then can be one of the following 15 types of statements, each type introduced by a distinct keyword and followes by components that are specific to the type:

LET <variable> = <expression>
READ <list of variable>
DATA <list of number>
PRINT <sequence of label and expression>
GOTO <linenumber>
IF <expression> <relational> <expression> THEN <linenumber>
FOR <varname> = <expression> TO <expression> [STEP <expression> ]
NEXT <varname>
END
STOP
DEF <funcname> ( <varname> ) = <expression>
GOSUB <linenumber>
RETURN
DIM <list of variable>
REM <any string of characters whatsoever>

The notation <variable> means any variable and <list of variable> means zero or more variables, separated by commas. [STEP <expression> ] means that the literal string "STEP" followed by an expression is optional. Bold characters must be matched exactly.

Rather than use one of the many language parsing frameworks, I will show how to build a parser from scratch. First I'll translate the grammar above into Python. Not character-for-character (because it would take a lot of work to get Python to understand how to handle those characters), but almost word-for-word (because I can envision a straightforward way to get Python to handle the following format):

In [11]:

def Grammar() -> Dict[Token, list]: 
    return {
    'LET':    [variable, '=', expression],
    'READ':   [list_of(variable)],
    'DATA':   [list_of(number)],
    'PRINT':  [labels_and_expressions],
    'GOTO':   [linenumber],
    'IF':     [expression, relational, expression, 'THEN', linenumber],
    'FOR':    [varname, '=', expression, 'TO', expression, step],
    'NEXT':   [varname],
    'END':    [],
    'STOP':   [],
    'DEF':    [funcname, '(', varname, ')', '=', expression],
    'GOSUB':  [linenumber],
    'RETURN': [],
    'DIM':    [list_of(variable)], 
    'REM':    [anycharacters],
    }

Parsing Strategy¶

A program is parsed into a list of statements, each statement an object of type Stmt. A statement contains various components: a line number, a statement type, and various parts such as numbers, variable names, subscripted variable references, user-defined functions, function calls, operation calls, and labels.

The grammar of BASIC is designed so that at every point, the next token tells us unambiguously how to parse. For example, the first token after the line number defines the type of statement; also, in an expression we know that all three-letter names are functions while all 1-letter names are variables. So the next token will always determine what to parse next, and we will never have to back up and undo a pop(). Here is my strategy for parsing statements:

The grammatical categories, like variable and expression (and also statement), will be defined as functions

(with no argument) that pop tokens from the global variable TOKENS, and return a data object. For example, calling linenumber() will pop a token, convert it to an int, and return that.

As an example, consider parsing the statement "20 LET X = X + 1".
First tokenize to get: TOKENS = ['20', 'LET', 'X', '=', 'X', '+', '1'].
Then call statement() (defined below).
- statement first calls linenumber(), getting back the integer 20 (and removing '20' from TOKENS).
- Then it calls pop() to get 'LET' (removing 'LET' from TOKENS).
- Then it indexes into the grammar with 'LET', retrieving the grammar rule [variable, '=', expression].
- Then it processes the 3 constituents of the grammar rule:
  - First, call variable(), which removes and returns 'X'.
  - Second, call pop('='), which removes '=' from TOKENS, and discard it.
  - Third, call expression(), which returns a representation of X + 1; we write that as Opcall('X', '+', 1.0).
- Finally, statement assembles the pieces and returns Stmt(num=20, typ='LET', args=['X', Opcall('X', '+', 1.0)]).
If anything goes wrong, call fail, which raises an error.

Here is the definition of the Stmt type and the statement grammatical category function:

In [12]:

Stmt = namedtuple('Stmt', 'num, typ, args') # E.g. '10 GOTO 999' => Stmt(10, 'GOTO', 999)

def statement() -> Stmt:
    "Parse a BASIC statement from the global variable `TOKENS`."
    num  = linenumber()
    typ  = pop(is_stmt_type) or fail('unknown statement type')
    args = []
    for p in grammar[typ]: # For each part of rule, call if callable or match if literal string
        if callable(p):
            args.append(p())
        else:
            pop(p) or fail('expected ' + repr(p))
    return Stmt(num, typ, args)

Many of the grammatical categories are easy one-liners:

In [13]:

def number():        return (-number() if pop('-') else float(pop())) # Optional minus sign
def step():          return (expression() if pop('STEP') else 1)     # 1 is the default step
def linenumber():    return int(pop(is_number) or fail('missing line number'))
def relational():    return pop(is_relational) or fail('expected a relational operator')
def varname():       return pop(is_varname)    or fail('expected a variable name')
def funcname():      return pop(is_funcname)   or fail('expected a function name')
def anycharacters(): TOKENS.clear() # Ignore tokens in a REM statement

Here are the predicates that distinguish different types of tokens:

In [14]:

def is_stmt_type(x)  -> bool: return isa(x, Token) and x in grammar # LET, READ, ...
def is_funcname(x)   -> bool: return isa(x, Token) and len(x) == 3 and x.isalpha()  # SIN, COS, FNA, FNB, ...
def is_varname(x)    -> bool: return isa(x, Token) and len(x) in (1, 2) and x[0].isalpha() # A, A1, A2, B, ...
def is_label(x)      -> bool: return isa(x, Token) and x.startswith('"') # "HELLO WORLD", ...
def is_relational(x) -> bool: return isa(x, Token) and x in ('<', '=', '>', '<=', '<>', '>=')
def is_number(x)     -> bool: return isa(x, Token) and x and x[0] in '.0123456789'

isa = isinstance

Note that varname is what the manual calls "unsubscripted variable name" (a letter by itself, like X, or followed by a digit, like X3), and that variable is a varname optionally followed by one or two subscript expressions in parentheses, like A(I) or M(2*I, J):

In [15]:

Varname   = str # Type for a variable name such as `V` or `X3`
Subscript = namedtuple('Subscript', 'var, indexes') # E.g. 'A(2, 3)' => Subscript('A', [2, 3])
Variable  = Union[Varname, Subscript]

def variable() -> Variable: 
    "Parse a possibly subscripted variable e.g. 'X3' or 'A(I)' or 'M(I, J)'."
    V = varname()
    if pop('('):
        indexes = list_of(expression)()
        pop(')') or fail('expected ")" to close subscript')
        return Subscript(V, indexes) # E.g. 'A(I)' => Subscript('A', ['I'])
    else: 
        return V                     # E.g. 'X3' 

list_of is tricky because it works at two different times. When I write list_of(number) in the grammar, this returns an object of class list_of. When this object is called (just as other grammatical categories like variable are called), the effect is that it will parse a list of number. That list can be empty (if there are no more tokens on the line), or can be a single number, or can be several numbers separated by comma tokens. I could have defined list_of as a function that returns a function, but I thought that it would be clearer to define it as a class, so I can clearly separate what happens at the two different times: first the __init__ method determines what category to parse, and later the __call__ method does the actual parsing.

In [16]:

class list_of:
    "list_of(category) is a callable that parses a comma-separated list of <category>"
    def __init__(self, category): self.category = category
    def __call__(self) -> list:
        result = ([self.category()] if TOKENS else [])
        while pop(','):
            result.append(self.category())
        return result

Parsing: Top Level `parse`, and Handling Errors¶

Most of the parsing action happens inside the function statement(), but at the very top level, parse(program) takes a program text (that is, a string), and parses each line by calling parse_line, sorting the resulting list of lines by line number:

In [17]:

def parse(program: str) -> List[Stmt]: return sorted(map(parse_line, lines(program)))

To handle syntactic errors, I add to parse_line a try/except that catches exceptions raised by calls to fail. I acknowledge the interpreter isn't very thorough about handling all errors, and the error messages could be more helpful.

In [18]:

def parse_line(line: str) -> Stmt:
    "Return a Stmt(linenumber, statement_type, arguments)."
    global TOKENS
    TOKENS = tokenizer(line)
    try:
        stmt = statement()
        if TOKENS: 
            fail('extra TOKENS at end of line')
        return stmt
    except SyntaxError as err:
        error = f"Error in line '{line}' at '{' '.join(TOKENS)}': {err}"
        print(error)
        return Stmt(0, 'REM', [error]) # Return dummy statement
        
def fail(message): raise SyntaxError(message)

Parsing: Grammar of `PRINT` Statements¶

On page 26 of the manual, it appears that the grammar rule for PRINT should be [list_of(expression)]. But in section 3.1, More about PRINT, some complications are introduced:

Labels (strings enclosed in double quotes) are allowed, as well as expressions.
The "," is not a separator. A line can end with ",".
Optionally, ";" can be used instead of ",".
Optionally, the "," or ";" after a label can be omitted—we can have a label immediately followed by an expression.

The effect of a comma is to advance the output to the next column that is a multiple of 15 (and to a new line if this goes past column 100). The effect of a semicolon is similar, but works in multiples of 3, not 15. (Note that column numbering starts at 0, not 1.) Normally, at the end of a PRINT statement we advance to a new line, but this is not done if the statement ends in "," or ";". Here are some examples:

10 PRINT X, Y

Prints the value of X in column 0 and Y in column 15. Advances to new line.

20 PRINT "X =", X

Prints the string "X =" in column 0 and the value of X in column 15. Advances to new line.

30 PRINT "X = " X

Prints the string "X =" in column 0 and the value of X immediately after. Advances to new line.

40 PRINT X; Y,

Prints the value of X in column 0, and the value of Y in the column that is the first available multiple of 3 after that. For example, if X is '0', then print Y in column 3, but if X is '12345', then we've gone past column 3, so print Y in column 6. Then, because the statement ends in a comma, advance to the next column that is a multiple of 15, but not to a new line.

That explanation was long, but the implementation is short (at least for the parsing part; later we will see the execution part):

In [19]:

def labels_and_expressions() -> list:
    "Parse a sequence of label | comma | semicolon | expression (for PRINT statement)."
    result = []
    while TOKENS:
        result.append(pop(is_label) or pop(',') or pop(';') or expression())
    return result

Parsing: Grammar of Expressions¶

Now for the single most complicated part of the grammar: the expression. The biggest complication is operator precedence: the string "A + B * X + C" should be parsed as if it were "A + (B * X) + C", and not as "(A + B) * (X + C)," because multiplication binds more tightly than addition. There are many schemes for parsing expressions, I'll use an approach attributed to Keith Clarke called Compact Recursive-Descent Parsing.

Like all our grammatical categories, calling expression() pops off some tokens and returns a data object. The first thing it does is parse one of five types of simple "primary" expressions: a number like 1.23; a possibly-subscripted variable like X or A(I); a function call like SIN(X); a unary negation like -X; or a parenthesied expression like (X + 1).

Next, expression looks for infix operators. To parse 'X + 1' as an expression, first primary() would parse 'X', then it would pop off the '+' operator, then a recursive call to expression() would parse 1, and the results can then be combined into an Opcall('X', '+', 1). If there are multiple infix operators, they can all be handled, as in 'X+1+Y+2', which gets parsed as Opcall(Opcall(Opcall('X', '+', 1), '+', 'Y'), '+', 2).

When there are multiple infix operators of different precedence, as in "A + B * X + C", the trick is to know which operators are parsed by the top-level call to expression, and which by recursive calls. When we first call expression(), the optional parameter prec gets the default value, 1, which is the precedence of addition and subtraction. When expression makes a recursive call, it passes the precedence of the current operator, and we only parse off operator/expression pairs at an equal or higher precedence. So, in parsing "A + B * X + C", when we pop off the '*' operator (which has precedence 2), we then recursively call expression(2), which parses off an expression containing operators of precedence 2 or higher, which means the recursive call will parse X, but it won't pop off the '+', because that is at a lower precedence. So we correctly get the structure "(A + ((B * X) + C)".

The function associativity ensures that the operator '^' is right associative, meaning 10^2^3 = (10^(2^3)), whereas the other operators are left associative, so 10-2-3 = ((10-2)-3).

Here is the implementation of expression:

In [20]:

Opcall  = namedtuple('Opcall', 'x, op, y') # E.g., 'X + 1' => Opcall('X', '+', 1)

def expression(prec=1) -> object: 
    "Parse an expression: a primary and any [op expression]* pairs with precedence(op) >= prec."
    exp = primary()                         # 'A' => 'A'
    while precedence(peek()) >= prec:
        op = pop()
        rhs = expression(precedence(op) + associativity(op))
        exp = Opcall(exp, op, rhs)          # 'A + B' => Opcall('A', '+', 'B')
    return exp

def primary() -> object:
    "Parse a primary expression (no infix op except maybe within parens)."
    if is_number(peek()):                   # '1.23' => 1.23 
        return number()
    elif is_varname(peek()):                # X or A(I) or M(I+1, J)
        return variable()
    elif is_funcname(peek()):               # SIN(X) => Funcall('SIN', 'X')
        return Funcall(pop(), primary())
    elif pop('-'):                          # '-X' => Funcall('NEG', 'X')
        return Funcall('NEG', primary())
    elif pop('('):                          # '(X)' => 'X'
        exp = expression()
        pop(')') or fail('expected ")" to end expression')
        return exp
    else:
        return fail('unknown expression')

def precedence(op) -> int: 
    return (3 if op == '^' else 2 if op in ('*', '/', '%') else 1 if op in ('+', '-') else 0)

def associativity(op) -> int: 
    return (0 if op == '^' else 1)

Parsing: Function Definitions and Calls¶

The class Function is used to represent the definition of a user defined function. When the user writes "DEF FNC(X) = X ^ 3", we specify the function object with Function(parm='X', body=Opcall('X', '^', 3)), and whenever the program calls, say, FNC(2) in an expression, the call assigns 2 to the global variable X and computes 8 as the value to return. (In modern languages, X would be a new local variable, but BASIC has no local variables.) Note the technique of making Function be a subclass of a namedtuple; we are then free to add the __call__ method to the subclass.

Note the distinction that Function is used to represent a user-defined function such as FNC, but Funcall is used to represent a function call such as FNC(2).

In [21]:

Funcall = namedtuple('Funcall', 'function, arg') # E.g., 'SIN(X)' => Funcall('SIN', 'X')

class Function(namedtuple('_', 'parm, body')):
    "User-defined function; 'DEF FNC(X) = X ^ 3' => Function('X', Opcall('X', '^', 3))"
    def __call__(self, value):                           
        variables[self.parm] = value # Global assignment to the parameter
        return evaluate(self.body)

Parsing: The Complete Parser in Action¶

I've now written all the grammatical categories, so I can now safely instantiate the global variable grammar by calling Grammar(), and parse a program:

In [22]:

grammar = Grammar()

parse(program)

Out[22]:

[Stmt(num=10, typ='REM', args=[None]),
 Stmt(num=11, typ='DATA', args=[[8.0, 4.0]]),
 Stmt(num=15, typ='READ', args=[['N0', 'P0']]),
 Stmt(num=20, typ='PRINT', args=[['"N"', ',']]),
 Stmt(num=25, typ='FOR', args=['P', 2.0, 'P0', 1]),
 Stmt(num=30, typ='PRINT', args=[['"N^"', 'P', ',']]),
 Stmt(num=35, typ='NEXT', args=['P']),
 Stmt(num=40, typ='PRINT', args=[['"SUM"']]),
 Stmt(num=45, typ='LET', args=['S', 0.0]),
 Stmt(num=50, typ='FOR', args=['N', 2.0, 'N0', 1]),
 Stmt(num=55, typ='PRINT', args=[['N', ',']]),
 Stmt(num=60, typ='FOR', args=['P', 2.0, 'P0', 1]),
 Stmt(num=65, typ='LET', args=['S', Opcall(x='S', op='+', y=Opcall(x='N', op='^', y='P'))]),
 Stmt(num=70, typ='PRINT', args=[[Opcall(x='N', op='^', y='P'), ',']]),
 Stmt(num=75, typ='NEXT', args=['P']),
 Stmt(num=80, typ='PRINT', args=[['S']]),
 Stmt(num=85, typ='NEXT', args=['N']),
 Stmt(num=99, typ='END', args=[])]

You might think that the statement "11 DATA 8, 4" would be parsed into a Stmt with args=[8, 4]. It has two arguments, right? But according to the grammar, a DATA statement actually has only one argument, which is a list of numbers. That's why we get args=[[8.0, 4.0]].

Here are some more tests:

In [23]:

def test_exp(text: str, repr) -> bool:
    "Test that text can be parsed as an expression to yield repr, with no TOKENS left over."
    global TOKENS
    TOKENS = tokenizer(text)
    return (expression() == repr) and not TOKENS
    
def test_parser() -> bool:
    "Tests of `expression` and other category functions."
    assert is_funcname('SIN') and is_funcname('FNZ') # Function names are three letters
    assert not is_funcname('X') and not is_funcname('') and not is_funcname('FN9')
    assert is_varname('X') and is_varname('A2') # Variables names are one letter and an optional digit
    assert not is_varname('FNZ') and not is_varname('A10') and not is_varname('')
    assert is_relational('>') and is_relational('>=') and not is_relational('+')
    
    assert test_exp('A + B * X + C', Opcall(Opcall('A', '+', Opcall('B', '*', 'X')), '+', 'C'))
    assert test_exp('A + B + X + C', Opcall(Opcall(Opcall('A', '+', 'B'), '+', 'X'), '+', 'C'))
    assert test_exp('SIN(X)^2',      Opcall(Funcall('SIN', 'X'), '^', 2))
    assert test_exp('10 ^ 2 ^ 3',    Opcall(10, '^', Opcall(2, '^', 3))) # right associative
    assert test_exp('10 - 2 - 3',    Opcall(Opcall(10, '-', 2), '-', 3)) # left associative
    assert test_exp('A(I)+M(I, J)',  Opcall(Subscript(var='A', indexes=['I']), '+', 
                                            Subscript(var='M', indexes=['I', 'J'])))
    assert test_exp('X * -1',        Opcall('X', '*', Funcall('NEG', 1.0)))
    assert test_exp('X--Y--Z',       Opcall(Opcall('X', '-', Funcall('NEG', 'Y')), 
                                            '-', Funcall('NEG', 'Z')))
    assert test_exp('((((X))))',     'X')
    return True

test_parser()

Out[23]:

True

Execution¶

Now that we can parse programs, we're ready to execute them. I'll first define run to parse and execute a program:

In [24]:

def run(program: str) -> None: 
    """Parse and execute the BASIC program."""
    execute(parse(program))

The function execute(stmts) first defines the runtime state of the program, which is partially held in three global variables (they are global because they are needed within basic_print or evaluate, and because the very nature of BASIC is to have global variables):

column: The column that PRINT will print in next.
variables: A mapping of the values of all BASIC variables (both subscripted and unsubscripted).
For example, {'P1': 3.14, ('M', (1, 1)): 42.0} says that the value of P1 is 3.14 and M(1, 1) is 42.0.
functions: A mapping of the values of all BASIC functions (both built-in and user-defined).
For example, {'FNC': Function('X', Opcall('X', '^', 3)), 'SIN': math.sin}

Program state is also held in these variables that are local to execute:

data: a queue of all the numbers in the program's DATA statements.
pc: program counter; the index into the list of statements. Initially 0, the index of the first statement.
ret: the index where a RETURN statement will return to.
fors: a dict of {varname: ForState(...)} which gives the state of each FOR loop variable.
goto: a dict of {linenumber: index}, for example {10: 0, 20: 1} for a program with two line numbers, 10 and 20.

Running the program means executing the statement that the program counter (pc) is currently pointing at, repeatedly, until we either hit an END or STOP statement, attempt a READ when there is no more data, or fall off the end of the program.

The variable pc is initialized to 0 and is then incremented by 1 each cycle to go to the following statement; but a branching statement (GOTO, IF, GOSUB, or RETURN) can change the pc to something other than the following statement. Note that branching statements refer to line numbers, but the pc refers to the index number within the list of statements (as does the ForSTaate.continu attribute). The variable goto maps from line numbers to index numbers.

In BASIC there is no notion of a stack, neither for variables nor return addresses. If I do a GOSUB to a subroutine that itself does a GOSUB, then the original return address is lost, because BASIC has only one return address register (which we call ret).

The main body of execute checks the statement type, and takes appropriate action. All the statement types are straightforward, except for FOR and NEXT, which are explained in the next section, and PRINT which is complicated by the need to keep track of columns.

In [25]:

ForState  = namedtuple('ForState',  'continu, end, step') # Data for FOR loop 

def execute(stmts: List[Stmt]) -> None: 
    "Execute the statements in the BASIC program."
    global column, variables, functions, column
    column    = 0                  # column to PRINT in next
    variables = defaultdict(float) # mapping of {variable: value}, default 0.0
    functions = {**builtins,
                 **{name: Function(x, body) 
                    for  (_, _, (name, x, body)) in statements_of_type('DEF', stmts)}}
        
    data      = deque(append(s.args[0] for s in statements_of_type('DATA', stmts)))
    pc        = 0                  # program counter
    ret       = 0                  # index (pc) that a GOSUB returns to
    fors      = {}                 # runtime map of {varname: ForState(...)}
    goto      = {stmt.num: i for (i, stmt) in enumerate(stmts)} # {line_number: index}
    while pc < len(stmts):
        (_, typ, args) = stmts[pc] # Fetch and decode the current statement
        pc += 1                    # Increment the program counter
        if typ in ('END', 'STOP') or (typ == 'READ' and not data): 
            return
        elif typ == 'LET':
            V, exp = args
            let(V, evaluate(exp))
        elif typ == 'READ':
            for V in args[0]:
                let(V, data.popleft())
        elif typ == 'PRINT':
            basic_print(args[0])
        elif typ == 'GOTO':
            pc = goto[args[0]]
        elif typ == 'IF':
            lhs, relational, rhs, dest = args
            if functions[relational](evaluate(lhs), evaluate(rhs)):
                pc = goto[dest]
        elif typ == 'GOSUB':
            ret = pc
            pc  = goto[args[0]]
        elif typ == 'RETURN':
            pc = ret
        elif typ == 'FOR':
            V, start, end, step = args
            variables[V] = evaluate(start)
            fors[V] = ForState(pc, evaluate(end), evaluate(step))
        elif typ == 'NEXT':
            V = args[0]
            continu, end, step = fors[V]
            if ((step >= 0 and variables[V] + step <= end) or
                (step <  0 and variables[V] + step >= end)):
                variables[V] += step
                pc = continu
        else:
            assert typ in ('REM', 'DIM', 'DATA', 'DEF') # Ignore these at runtime

Here are the functions referenced by execute:

In [26]:

import math
import random
import operator as op

builtins = {  # A mapping of {name: function}
    'SIN': math.sin, 'COS': math.cos, 'TAN': math.tan, 'ATN': math.atan, 
    'ABS': abs, 'EXP': math.exp, 'LOG': math.log, 'SQR': math.sqrt, 'INT': int,
    '>': op.gt, '<': op.lt, '=': op.eq, '>=': op.ge, '<=': op.le, '<>': op.ne, 
    '^': pow, '+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv, '%': op.mod,
    'RND': lambda _: random.random(), 'NEG': op.neg}

def statements_of_type(typ, statements: List[Stmt]) -> Iterable[Stmt]:
    """All statements of the given type."""
    return (stmt for stmt in statements if stmt.typ == typ)

def append(list_of_lists) -> list:
    """Append together a list of lists."""
    return sum(list_of_lists, [])

def evaluate(exp) -> float:
    "Evaluate an expression, returning a number."
    if isa(exp, Opcall):
        return functions[exp.op](evaluate(exp.x), evaluate(exp.y))
    elif isa(exp, Funcall):
        return functions[exp.function](evaluate(exp.arg))
    elif isa(exp, Subscript):
        return variables[(exp.var, *(evaluate(x) for x in  exp.indexes))]
    elif isa(exp, Varname):
        return variables[exp]
    else: # Must be a number, which evaluates to itself
        return exp
        
def let(V: Union[Variable, Subscript], value: float):
    "Assign value to the variable name or Subscripted variable."
    if isa(V, Subscript): # A subsscripted variable
        variables[(V.var, *(evaluate(x) for x in V.indexes))] = value 
    else:                        # An unsubscripted variable
        variables[V] = value

Execution: `FOR/NEXT` Statements¶

Some aspects of FOR loops are unclear. I have questions!

Are the END and STEP expressions evaluated once when we first enter the FOR loop, or each time through the loop?
- I choose: only once. (Like Python, but unlike C.)
After executing "FOR V = 1 TO 10", is the value of V equal to 10 or 11?
- Upon further review, the manual says 10.
Does "FOR V = 2 TO 1" execute zero times? Or do all loops execute at least once, with the termination test done by the NEXT?
- All loops execute at least once. (Online BASIC interpreters agree.)
What if there are two different FOR V loops, and a GOTO branches from the middle of one to the middle of the other?
- BASIC doesn't really have a notion of a scope for a loop, there is just a sequence of statements, all in the same scope, and all variables are global to the program. Any NEXT V statement always branches to the most recently executed FOR V, regardless of the layout of the code. (Online BASIC interpreters agree.)
What if we execute a NEXT V statement before ever executing the FOR V statement?
- This raises an error. (I asked Dennis Allison, the developer of Tiny Basic, and he said this case is undefined, so whatever I choose is okay.)

Consider this program:

 0 PRINT "TABLE OF SQUARES"
10 LET N = 10
20 FOR V = 1 to N STEP N/5
30   PRINT V, V * V
40 NEXT V
50 END

When control hits the "FOR V" statement in line 20, I make these assignments in the interpreter:

 variables['V'] = 1
 fors['V'] = ForState(continu=3, end=10, step=2)

where 3 is the index of the line right after the FOR statement; 10 is the value of N; and 2 is the value of N/5. A ForState is an object that is used during the dynamic execution of the program (one per FOR loop variable), but does not occur in the static representation of the program.

When control hits the "NEXT V" statement, do the following:

Retrieve fors['V']
Check if V incremented by the step value, 2, is within the bounds defined by the end, 10.
If it is, increment V and assign the program counter pc to be 3, the continu line value.
If not, continue on to the following statement, line 50.

My implementation might not be officially correct in every detail, but it correctly runs all the examples in the manual.

Execution: Printing¶

We showed how to parse a PRINT statement with labels_and_expressions(); now it is time to execute a PRINT statement, printing each of the labels and expressions, and keeping track of what column to print at next, using the global variable column.

In [27]:

def basic_print(items) -> None: 
    """Print the items (',' | ';' | "label" | expression) in appropriate columns."""
    for item in items:
        if item == ',':      pad(15)
        elif item == ';':    pad(3)
        elif is_label(item): print_string(item.replace('"', ''))
        else:                print_string(f"{evaluate(item):g} ")
    if (not items) or items[-1] not in (',', ';'):
        newline()
        
def print_string(s) -> None: 
    "Print a string, keeping track of column, and advancing to newline if at or beyond column 100."
    global column
    print(s, end='')
    column += len(s)
    if column >= 100: 
        newline()
        
def pad(width) -> None: 
    "Pad out to the column that is the next multiple of width."
    while column % width != 0: 
        print_string(' ')

def newline(): global column; print(); column = 0

Running the First Sample Program¶

Let's re-examine, parse, and run our first sample program:

In [28]:

print(program)

10 REM POWER TABLE
11 DATA 8, 4
15 READ N0, P0
20 PRINT "N",
25 FOR P = 2 to P0
30   PRINT "N^" P,
35 NEXT P
40 PRINT "SUM"
45 LET S = 0
50 FOR N = 2 TO N0
55   PRINT N,
60   FOR P = 2 TO P0
65     LET S = S + N ^ P
70     PRINT N ^ P,
75   NEXT P
80   PRINT S
85 NEXT N
99 END

In [29]:

parse(program)

Out[29]:

[Stmt(num=10, typ='REM', args=[None]),
 Stmt(num=11, typ='DATA', args=[[8.0, 4.0]]),
 Stmt(num=15, typ='READ', args=[['N0', 'P0']]),
 Stmt(num=20, typ='PRINT', args=[['"N"', ',']]),
 Stmt(num=25, typ='FOR', args=['P', 2.0, 'P0', 1]),
 Stmt(num=30, typ='PRINT', args=[['"N^"', 'P', ',']]),
 Stmt(num=35, typ='NEXT', args=['P']),
 Stmt(num=40, typ='PRINT', args=[['"SUM"']]),
 Stmt(num=45, typ='LET', args=['S', 0.0]),
 Stmt(num=50, typ='FOR', args=['N', 2.0, 'N0', 1]),
 Stmt(num=55, typ='PRINT', args=[['N', ',']]),
 Stmt(num=60, typ='FOR', args=['P', 2.0, 'P0', 1]),
 Stmt(num=65, typ='LET', args=['S', Opcall(x='S', op='+', y=Opcall(x='N', op='^', y='P'))]),
 Stmt(num=70, typ='PRINT', args=[[Opcall(x='N', op='^', y='P'), ',']]),
 Stmt(num=75, typ='NEXT', args=['P']),
 Stmt(num=80, typ='PRINT', args=[['S']]),
 Stmt(num=85, typ='NEXT', args=['N']),
 Stmt(num=99, typ='END', args=[])]

In [30]:

run(program)

N              N^2            N^3            N^4            SUM
2              4              8              16             28 
3              9              27             81             145 
4              16             64             256            481 
5              25             125            625            1256 
6              36             216            1296           2804 
7              49             343            2401           5597 
8              64             512            4096           10269

Running More Programs¶

Rather than put together a suite of unit tests for execute, I'll run integration tests—additional whole programs. I've also added a few assertions.

In [31]:

# Linear equation solver (page 3 and 19 of the manual)

run('''
READ A1, A2, A3, A4
LET D = A1 * A4 - A3 * A2
IF D = 0 THEN 65
READ B1, B2
 LET X1 = (B1*A4 - B2 * A2) / D
 LET X2 = ( A1 * B2 - A3 * B1)/D
 PRINT X1, X2
GOTO 30
PRINT "NO UNIQUE SOLUTION"
DATA 1, 2, 4
DATA 2, -7, 5
DATA 1, 3, 4, -7
END
''')

assert variables['D'] != 0
assert variables['X1'] == -11/3

4              -5.5 
0.666667       0.166667 
-3.66667       3.83333

In [32]:

# Find max(sin(x)) for 0 <= x <= 3 (page 25)

run('''
PRINT "X VALUE", "SINE", "RESOLUTION"
READ D
 LET M = -1
 FOR X = 0 TO 3 STEP D
 IF SIN(X) <= M THEN 80
   LET X0 = X
   LET M = SIN(X)
 NEXT X
 PRINT X0, M, D
GO TO 10
DATA .1, .01, .001, .0001
END
''')

assert abs(variables['X0'] - math.pi / 2) < 0.00001

X VALUE        SINE           RESOLUTION
1.6            0.999574       0.1 
1.57           1              0.01 
1.571          1              0.001 
1.5708         1              0.0001

In [33]:

# Printing (page 32)

run('''
10 FOR I = 1 TO 12
20   PRINT I,
30 NEXT I
40 END''')

assert variables['I'] == 12

1              2              3              4              5              6              7         
8              9              10             11             12

In [34]:

# Powers (page 33)

run('''
PRINT "THIS PROGRAM COMPUTES AND PRINTS THE NTH POWERS"
PRINT "OF THE NUMBERS LESS THAN OR EQUAL TO N FOR VARIOUS"
PRINT "N FROM 1 TO 7"
PRINT
FOR N = 1 TO 7
 PRINT "N = "; N; "I^N:"
 FOR I = 1 TO N
   PRINT I^N,
 NEXT I
 PRINT
 PRINT
NEXT N
END''')

assert variables['N'] ** variables['I'] == 7 ** 7

THIS PROGRAM COMPUTES AND PRINTS THE NTH POWERS
OF THE NUMBERS LESS THAN OR EQUAL TO N FOR VARIOUS
N FROM 1 TO 7

N =   1  I^N:
1              

N =   2  I^N:
1              4              

N =   3  I^N:
1              8              27             

N =   4  I^N:
1              16             81             256            

N =   5  I^N:
1              32             243            1024           3125           

N =   6  I^N:
1              64             729            4096           15625          46656          

N =   7  I^N:
1              128            2187           16384          78125          279936         823543

In [35]:

# Cubes (page 35)

run('''
10 FOR I = 1 TO 100
20   PRINT I*I*I;
30 NEXT I
40 END
''')

1  8  27 64 125   216   343   512   729   1000  1331  1728  2197  2744  3375  4096  4913  5832  6859 
8000  9261  10648 12167 13824 15625 17576 19683 21952 24389 27000 29791 32768 35937 39304 42875 46656 
50653 54872 59319 64000 68921 74088 79507 85184 91125 97336 103823   110592   117649   125000   132651 
140608   148877   157464   166375   175616   185193   195112   205379   216000   226981   238328   250047 
262144   274625   287496   300763   314432   328509   343000   357911   373248   389017   405224   421875 
438976   456533   474552   493039   512000   531441   551368   571787   592704   614125   636056   658503 
681472   704969   729000   753571   778688   804357   830584   857375   884736   912673   941192   970299 
1e+06

In [36]:

# Squares (variant of page 35 program, designed to test negative STEP)

run('''
10 FOR I = 100 TO 0 STEP -2
20   PRINT I*I;
30 NEXT I
''')

assert variables['I'] == 0

10000 9604  9216  8836  8464  8100  7744  7396  7056  6724  6400  6084  5776  5476  5184  4900  4624 
4356  4096  3844  3600  3364  3136  2916  2704  2500  2304  2116  1936  1764  1600  1444  1296  1156 
1024  900   784   676   576   484   400   324   256   196   144   100   64 36 16 4  0

In [37]:

# Sales ledger (page 37; cleaned up a bit)

run('''
FOR I = 1 TO 3
  READ P(I)
NEXT I
FOR I = 1 TO 3
  FOR J = 1 TO 5
    READ S(I, J)
  NEXT J
NEXT I
FOR J = 1 TO 5
 LET S = 0
 FOR I = 1 TO 3
   LET S = S + P(I) * S(I, J)
 NEXT I
 PRINT "TOTAL SALES FOR SALESMAN"J, "$"S
NEXT J
DIM S(3, 5)
DATA 1.25, 4.30, 2.50
DATA 40, 20, 37, 29, 42
DATA 10, 16, 3, 21, 8
DATA 35, 47, 29, 16, 33
END
''')

TOTAL SALES FOR SALESMAN1     $180.5 
TOTAL SALES FOR SALESMAN2     $211.3 
TOTAL SALES FOR SALESMAN3     $131.65 
TOTAL SALES FOR SALESMAN4     $166.55 
TOTAL SALES FOR SALESMAN5     $169.4

We can look at the variables that have been stored for this program:

In [38]:

variables

Out[38]:

defaultdict(float,
            {'I': 3.0,
             ('P', 1.0): 1.25,
             ('P', 2.0): 4.3,
             ('P', 3.0): 2.5,
             'J': 5.0,
             ('S', 1.0, 1.0): 40.0,
             ('S', 1.0, 2.0): 20.0,
             ('S', 1.0, 3.0): 37.0,
             ('S', 1.0, 4.0): 29.0,
             ('S', 1.0, 5.0): 42.0,
             ('S', 2.0, 1.0): 10.0,
             ('S', 2.0, 2.0): 16.0,
             ('S', 2.0, 3.0): 3.0,
             ('S', 2.0, 4.0): 21.0,
             ('S', 2.0, 5.0): 8.0,
             ('S', 3.0, 1.0): 35.0,
             ('S', 3.0, 2.0): 47.0,
             ('S', 3.0, 3.0): 29.0,
             ('S', 3.0, 4.0): 16.0,
             ('S', 3.0, 5.0): 33.0,
             'S': 169.4})

In [39]:

# Random number generator (page 40)

run('''
10 FOR I = 1 TO 100
20   PRINT INT(10 * RND(X));
30 NEXT I
40 END
''')

8  5  6  5  2  4  8  4  6  0  8  3  5  8  8  4  7  7  0  3  3  7  9  0  5  5  2  8  5  7  2  3  9  1 
9  3  1  7  7  6  1  4  7  7  5  7  0  9  9  3  0  8  0  4  2  4  9  4  1  7  8  8  6  9  0  7  7  5 
4  7  7  6  0  7  0  9  8  0  8  7  3  5  9  7  0  1  3  2  5  5  8  7  0  0  3  0  2  1  8  8

In [40]:

# DEF example: table of SIN(X) and COS(X) in degrees (page 41, expanded some)

run('''
PRINT "D"; "SIN(D)", "COS(D)", "SIN(D)^2 + COS(D)^2"
LET P1 = 3.14159265358979 / 180
FOR X = 0 TO 90 STEP 15
 PRINT X; FNS(X), FNC(X), FNS(X)^2 + FNC(X)^2
NEXT X
DEF FNS(D) = SIN(D * P1)
DEF FNC(D) = COS(D * P1)
END
''')

D  SIN(D)      COS(D)         SIN(D)^2 + COS(D)^2
0  0           1              1 
15 0.258819    0.965926       1 
30 0.5         0.866025       1 
45 0.707107    0.707107       1 
60 0.866025    0.5            1 
75 0.965926    0.258819       1 
90 1           1.61554e-15    1

In [41]:

# GOSUB example (page 43)

run('''
LET X = 3
GOSUB 400
PRINT U, V, W
LET X = 5
GOSUB 400
PRINT U, V, W
LET Z = U + 2*V + 3*W
PRINT "Z = " Z
STOP
LET U = X*X
LET V = X*X*X
LET W = X*X*X*X + X*X*X + X*X + X
RETURN
END
''')

9              27             120 
25             125            780 
Z = 2615

In [42]:

# Sum of non-negative multiples of 0.1 less than or equal to 2, two ways (page 47)

run('''
LET S = 0
LET N = 0
LET S = S + N/10
 IF N >= 20 THEN 60
 LET N = N + 1
GOTO 20
PRINT "SUM = " S
END
''')

SUM = 21

In [43]:

run('''
20 FOR N = 1 TO 20
40   LET S = S + N/10
50 NEXT N
60 PRINT "SUM = "  S
70 END
''')

assert variables['S'] == sum(N/10 for N in range(1, 21))

SUM = 21

Checking for Syntax Errors¶

Here we show a collection of syntax errors, and the messages they generate:

In [44]:

run('''
X = 1
GO TO JAIL
FOR I = 1 
IF X > 0 & X < 10 GOTO 999
LET Z = (Z + 1
PRINT "OH CANADA", EH?
LET Z = +3
LET X = Y ** 2
LET A(I = 1
IF A = 0 THEN 900 + 99
NEXT A(I)
DEF F(X) = X ^ 2 + 1
IF X != 0 THEN 999
DEF FNS(X + 2*P1) = SIN(X)
DEF FNY(M, B) = M * X + B
LET 3 = X
LET SIN = 7 * DEADLY
LET X = A-1(I)
FOR SCORE + 7
STOP IN NAME(LOVE)
REMARKABLY, THE INTERPRETER
REMEDIES THE ERRORS, AND THE PROPGRAM
REMAINS AN EXECUTABLE ENTITY, UN-
REMITTENTLY RUNNING, WITH NO
REMORSE OR REGRETS, AND WITH GREAT
ENDURANCE.
PRINT "No error in line " 3 ^ 3 "despite the errors above"
''')

Error in line '1 X = 1' at 'X = 1': unknown statement type
Error in line '2 GO TO JAIL' at 'J A I L': missing line number
Error in line '3 FOR I = 1 ' at '': expected 'TO'
Error in line '4 IF X > 0 & X < 10 GOTO 999' at '& X < 10 GOTO 999': expected 'THEN'
Error in line '5 LET Z = (Z + 1' at '': expected ")" to end expression
Error in line '6 PRINT "OH CANADA", EH?' at '?': unknown expression
Error in line '7 LET Z = +3' at '+ 3': unknown expression
Error in line '8 LET X = Y ** 2' at '* 2': unknown expression
Error in line '9 LET A(I = 1' at '= 1': expected ")" to close subscript
Error in line '10 IF A = 0 THEN 900 + 99' at '+ 99': extra TOKENS at end of line
Error in line '11 NEXT A(I)' at '( I )': extra TOKENS at end of line
Error in line '12 DEF F(X) = X ^ 2 + 1' at 'F ( X ) = X ^ 2 + 1': expected a function name
Error in line '13 IF X != 0 THEN 999' at '! = 0 THEN 999': expected a relational operator
Error in line '14 DEF FNS(X + 2*P1) = SIN(X)' at '+ 2 * P1 ) = SIN ( X )': expected ')'
Error in line '15 DEF FNY(M, B) = M * X + B' at ', B ) = M * X + B': expected ')'
Error in line '16 LET 3 = X' at '3 = X': expected a variable name
Error in line '17 LET SIN = 7 * DEADLY' at 'SIN = 7 * D E A D L Y': expected a variable name
Error in line '18 LET X = A-1(I)' at '( I )': extra TOKENS at end of line
Error in line '19 FOR SCORE + 7' at 'C O R E + 7': expected '='
Error in line '20 STOP IN NAME(LOVE)' at 'I N N A M E ( L O V E )': extra TOKENS at end of line
Error in line '26 ENDURANCE.' at 'U R A N C E .': extra TOKENS at end of line
NO ERROR IN LINE 27 DESPITE THE ERRORS ABOVE

Longer Program: Life¶

Now for a final, slightly more complicated example: Conway's Game of Life.

In [45]:

run('''
REM CONWAY'S GAME OF LIFE

REM G IS NUMBER OF GENERATIONS, 
REM M IS MATRIX SIZE (M X M)
REM L(SELF, NEIGHBORS_ALIVE) IS 1 IFF CELL WITH THOSE COUNTS LIVES
REM A(X, Y) IS 1 IFF CELL AT (X, Y) IS LIVE
REM B(X, Y) GETS THE NEXT GENERATION

READ G,      M,      L(0,3), L(1,3), L(1,2)
DATA 10,     10,     1,      1,      1
READ A(3,4), A(3,5), A(3,6), A(6,5), A(6,6), A(7,5), A(7,6)
DATA 1,      1,      1,      1,      1,      1,      1

REM MAIN LOOP: PRINT, THEN REPEAT G TIMES: UPDATE / COPY / PRINT
LET I = 0
GOSUB 700
FOR I = 1 TO G
 GOSUB 300
 GOSUB 500
 GOSUB 700
NEXT I
STOP

REM ========== UPDATE B = NEXT_GEN(A)
FOR Y = 1 TO M
 FOR X = 1 TO M
   LET N = A(X-1,Y)+A(X+1,Y)+A(X,Y-1)+A(X,Y+1)+A(X-1,Y-1)+A(X+1,Y+1)+A(X-1,Y+1)+A(X+1,Y-1)
   LET B(X, Y) = L(A(X, Y), N)
 NEXT X
NEXT Y
RETURN

REM ========== COPY A = B
FOR Y = 1 TO M
 FOR X = 1 TO M
   LET A(X, Y) = B(X, Y)
 NEXT X
NEXT Y
RETURN

REM ========== PRINT A
PRINT "GEN " I
FOR Y = 1 TO M
 FOR X = 1 TO M
   IF A(X, Y) = 1 THEN 760
     PRINT ".";
     GOTO 770
   PRINT "O";
 NEXT X
 PRINT
NEXT Y
RETURN

END 
''')

GEN 0 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  O  .  .  .  .  .  .  .  
.  .  O  .  .  O  O  .  .  .  
.  .  O  .  .  O  O  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 1 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  O  O  O  .  O  O  .  .  .  
.  .  .  .  .  O  O  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 2 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  O  .  .  .  .  .  .  .  
.  .  O  .  O  O  O  .  .  .  
.  .  O  .  O  O  O  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 3 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  O  .  O  .  .  .  .  
.  O  O  .  O  .  O  .  .  .  
.  .  .  .  O  .  O  .  .  .  
.  .  .  .  .  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 4 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  O  O  O  O  .  .  .  .  
.  .  O  .  O  .  O  .  .  .  
.  .  .  O  O  .  O  .  .  .  
.  .  .  .  .  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 5 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  O  O  .  .  .  .  .  
.  .  O  .  O  O  .  .  .  .  
.  .  O  .  .  .  O  .  .  .  
.  .  .  O  O  .  O  .  .  .  
.  .  .  .  O  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 6 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  O  .  O  O  .  .  .  .  
.  .  O  .  .  .  O  .  .  .  
.  .  .  O  O  .  O  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 7 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  O  .  .  .  .  .  
.  .  .  O  .  O  .  .  .  .  
.  .  O  .  .  .  O  .  .  .  
.  .  O  .  .  .  O  .  .  .  
.  .  O  .  .  .  O  .  .  .  
.  .  .  O  .  O  .  .  .  .  
.  .  .  .  O  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 8 
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  O  .  .  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  O  O  .  O  O  .  .  .  
.  O  O  O  .  O  O  O  .  .  
.  .  O  O  .  O  O  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  .  .  O  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 9 
.  .  .  .  .  .  .  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  O  .  .  .  O  .  .  .  
.  O  .  .  .  .  .  O  .  .  
.  O  .  .  .  .  .  O  .  .  
.  O  .  .  .  .  .  O  .  .  
.  .  O  .  .  .  O  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  
GEN 10 
.  .  .  .  O  .  .  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  O  O  O  O  O  .  .  .  
.  O  O  .  .  .  O  O  .  .  
O  O  O  .  .  .  O  O  O  .  
.  O  O  .  .  .  O  O  .  .  
.  .  O  O  O  O  O  .  .  .  
.  .  .  O  O  O  .  .  .  .  
.  .  .  .  O  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .

Actually, this was an assignment in my high school BASIC class. (We used a slightly different version of BASIC.) Back then, output was on rolls of paper, and I thought it was wasteful to print only one generation per line. So I arranged to print multiple generations on the same line, storing them until it was time to print them out. But BASIC doesn't have three-dimensional arrays, so I needed to store several generations worth of data in one A(X, Y) value. Today, I know that that could be done by allocating one bit for each generation, but back then I don't think I knew about binary representation, so I stored one generation in each decimal digit. That means I no longer need two matrixes, A and B; instead, the current generation will always be the value in the one's place, the previous generation in the ten's place, and the one before that in the hundred's place.

In [46]:

run('''
REM CONWAY'S GAME OF LIFE

REM G IS NUMBER OF GENERATIONS, 
REM M IS MATRIX SIZE (M X M)
REM L(SELF, NEIGHBORS_ALIVE) IS 1 IFF CELL WITH THOSE COUNTS LIVES
REM A(X, Y) STORES THE HISTORY OF THE CELL AT (X, Y); 1 MEANS LIVE,
REM BUT WE STORE SEVERAL GENERATIONS: A(X, Y) = 100 MEANS THE CELL
REM IS DEAD IN THE CURRENT AND PREVIOUS GENERATION (00), BUT LIVE IN THE
REM GENERATION BEFORE THAT (1). WE STORE MULTIPLE GENERATIONS SO THAT
REM WE CAN PRINT THEM OUT ON ONE LINE, SAVING SPACE/PAPER.

READ G,      M,      L(0,3), L(1,3), L(1,2)
DATA 11,     10,     1,      1,      1
READ A(3,4), A(3,5), A(3,6), A(6,5), A(6,6), A(7,5), A(7,6)
DATA 1,      1,      1,      1,      1,      1,      1

REM FNA(N) = THE PREVIOUS GENERATION'S VALUE
DEF FNA(N) = INT(N  / 10) % 10

REM FNC(N) = THE GENERATION IN COLUMN C; FNC(123) = C FOR EACH C IN 1..3
DEF FNC(N) = FNA(N / (10 ^ (2 - C)))

REM MAIN LOOP: DO 3 UPDATES (2 FIRST TIME), THEN PRINT AND SHIFT
FOR I = 1 TO G
 GOSUB 300
 IF I % 3 <> 2 THEN 200
   GOSUB 700
   GOSUB 800
NEXT I
STOP

REM ========== UPDATE A: SHIFT OLD GENS LEFT; ADD IN NEW GEN
FOR Y = 1 TO M
 FOR X = 1 TO M 
   LET A(X, Y) = 10 * A(X, Y)
 NEXT X
NEXT Y
FOR Y = 1 TO M
 FOR X = 1 TO M 
   LET N1 = FNA(A(X+1,Y-1)) + FNA(A(X+1,Y)) + FNA(A(X+1,Y+1)) + FNA(A(X,Y-1))
   LET N2 = FNA(A(X-1,Y-1)) + FNA(A(X-1,Y)) + FNA(A(X-1,Y+1)) + FNA(A(X,Y+1))
   LET S  = FNA(A(X, Y))
   LET A(X, Y) = A(X, Y) + L(S, N1 + N2)
 NEXT X
NEXT Y
RETURN

REM ========== PRINT A (3 GENERATIONS ACROSS THE PAGE)
PRINT "GEN " I-2, " ", "   GEN " I-1, " ", "      GEN " I
FOR Y = 1 TO M
 FOR C = 1 TO 3
   FOR X = 1 TO M
     IF FNC(A(X, Y)) = 1 THEN 760
       PRINT ".";
       GOTO 770
     PRINT "O";
   NEXT X
   IF C = 3 THEN 777
     PRINT "|";
 NEXT C
 PRINT
NEXT Y
RETURN

REM ========== FORGET ALL BUT THE MOST RECENT GENERATION IN A
FOR Y = 1 TO M
 FOR X = 1 TO M
   LET A(X, Y) = A(X, Y) % 10
 NEXT X
NEXT Y
RETURN

END 
''')

GEN 0                            GEN 1                            GEN 2 
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  O  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  O  .  .  .  .  .  .  .  
.  .  O  .  .  O  O  .  .  .  |  .  O  O  O  .  O  O  .  .  .  |  .  .  O  .  O  O  O  .  .  .  
.  .  O  .  .  O  O  .  .  .  |  .  .  .  .  .  O  O  .  .  .  |  .  .  O  .  O  O  O  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
GEN 3                            GEN 4                            GEN 5 
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  O  O  .  .  .  .  .  
.  .  .  O  .  O  .  .  .  .  |  .  .  O  O  O  O  .  .  .  .  |  .  .  O  .  O  O  .  .  .  .  
.  O  O  .  O  .  O  .  .  .  |  .  .  O  .  O  .  O  .  .  .  |  .  .  O  .  .  .  O  .  .  .  
.  .  .  .  O  .  O  .  .  .  |  .  .  .  O  O  .  O  .  .  .  |  .  .  .  O  O  .  O  .  .  .  
.  .  .  .  .  O  .  .  .  .  |  .  .  .  .  .  O  .  .  .  .  |  .  .  .  .  O  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
GEN 6                            GEN 7                            GEN 8 
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  O  .  .  .  .  .  |  .  .  .  .  O  .  .  .  .  .  
.  .  .  O  O  O  .  .  .  .  |  .  .  .  O  .  O  .  .  .  .  |  .  .  .  O  O  O  .  .  .  .  
.  .  O  .  O  O  .  .  .  .  |  .  .  O  .  .  .  O  .  .  .  |  .  .  O  O  .  O  O  .  .  .  
.  .  O  .  .  .  O  .  .  .  |  .  .  O  .  .  .  O  .  .  .  |  .  O  O  O  .  O  O  O  .  .  
.  .  .  O  O  .  O  .  .  .  |  .  .  O  .  .  .  O  .  .  .  |  .  .  O  O  .  O  O  .  .  .  
.  .  .  O  O  O  .  .  .  .  |  .  .  .  O  .  O  .  .  .  .  |  .  .  .  O  O  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  O  .  .  .  .  .  |  .  .  .  .  O  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  
GEN 9                            GEN 10                           GEN 11 
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  O  .  .  .  .  .  |  .  .  .  O  O  O  .  .  .  .  
.  .  .  O  O  O  .  .  .  .  |  .  .  .  O  O  O  .  .  .  .  |  .  .  O  .  .  .  O  .  .  .  
.  .  O  .  .  .  O  .  .  .  |  .  .  O  O  O  O  O  .  .  .  |  .  O  .  .  .  .  .  O  .  .  
.  O  .  .  .  .  .  O  .  .  |  .  O  O  .  .  .  O  O  .  .  |  O  .  .  .  O  .  .  .  O  .  
.  O  .  .  .  .  .  O  .  .  |  O  O  O  .  .  .  O  O  O  .  |  O  .  .  O  .  O  .  .  O  .  
.  O  .  .  .  .  .  O  .  .  |  .  O  O  .  .  .  O  O  .  .  |  O  .  .  .  O  .  .  .  O  .  
.  .  O  .  .  .  O  .  .  .  |  .  .  O  O  O  O  O  .  .  .  |  .  O  .  .  .  .  .  O  .  .  
.  .  .  O  O  O  .  .  .  .  |  .  .  .  O  O  O  .  .  .  .  |  .  .  O  .  .  .  O  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  O  .  .  .  .  .  |  .  .  .  O  O  O  .  .  .  .  
.  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .  |  .  .  .  .  .  .  .  .  .  .