My Brainfuck CPU - A simple Processor in Python via MyHDL (part 1)

Abstract

Brainfuck is a very simple but Turing-complete programming language (hereafter referred to as 'BF' for the sake of brevity and propriety), consisting of only eight commands. These posts are a record of my attempt to build a simple processor capable of correctly executing BF programs directly/natively (ie: in hardware, ultimately running on an FPGA).

Why the hell would you do such a thing?

You've never met me or worked with me, have you? ;)

  • Knowledge - this little mini-project ties together a whole bunch of different things that pique my interest, including:

    • hardware, at the diy-eeng level
    • digital circuit design (something I knew, and still know, very little about)
    • FPGAs / ASICs
    • MyHDL (python library for hardware simulation/design, converting software to FPGA/ASIC designs, etc)
    • iPython Notebook (ie: this format you're reading/in which all the software work was done)
    • etc
  • Fun - yes, my mind is warped enough by long years of programming, and I have more than enough OCD/other neuroses, that this kind of thing drops lots of dopamine into my brain :)

Who are you?

I'm sandbender on github and SandbenderCa on Twitter... Rudy X. Desjardins for those still lost in meatspace ;)

Enough preamble, where's the meat?

Right. Let's get the links out of the way first, do a quick overview of the BF language/machine, then we'll dive into design and code for our BF processor...

BF instruction set:

>     Increment data pointer by one (move to the "right")
<     Decrement data pointer by one (move to the "left")
+     Increment byte at data pointer
-     Decrement byte at data pointer
,     Read one byte of input into byte at data pointer
.     Output byte at data pointer
[     if byte at data pointer is zero, jump to instruction beyond matching ]   *
]     if byte at data pointers is nonzero, jump to instruction after matching [   *

* Brackets can be nested, and it is illegal for an open- or close-bracket to be used
  without it's match.

Note that any non-command character in the program is simply ignored (a noop).

Also note that we are not concerned (yet) with verifying that the input program is sane - this is assumed to be the case.

The BF Machine

The BF spec/description only calls for a minimal set of resources:

  • memory of some form to hold the program
  • an instruction pointer into the program memory
  • an array of at least (only?) 30,000 data cells (bytes)
  • a data pointer into the array of data memory
  • two byte streams, one each for input and output

Program execution begins with the first ("leftmost") instruction in the program (instruction pointer is initialized to zero) and proceeds sequentially to the right until the last instruction has been executed (except for cases where the instruction being executed results in a different movement of the instruction pointer, ie: the looping instructions).

Our Proposed Design

Our hardware design will consist of a slightly more complex setup compared to the theoretical BF machine model, out of necessity (we're trying to design for actual hardware here, not just a theoretical soft model). Since MyHDL allows us to mix and match regular/high-level Python with lower level mechanisms which would mirror actual hardware components... we could just pipe the BF input through a function that maps it's instructions directly to normal/high-level python code and be done with it.

But that would defeat the whole purpose of this excercise, which is to simulate actual hardware that will be implement-able on a real world FPGA.

As such, we're going to stick to something called Register Transfer Level mechanisms in our simulation - MyHDL constructs which map more or less directly to low-level hardware abstractions at the level of basic logic and signals that will be present in a real digital circuit.

Sequential vs. Combinatorial Logic

Quick primer on the difference: combinatorial logic elements have output which depends solely on their inputs. Sequential logic elements, in contrast, have outputs which depend on their inputs and prior state. Ie: sequential logic elements have/use memory/storage.

Example Sequential Operations:

  • Incrementing or Decrementing a value (depends on prior value)
  • Looping (depends on location of begin/end of loop, nesting, etc)
  • Writing data to memory (depends on value of a data pointer (address) already being set/stable)

Example Combinatorial Operations:

  • Reading instructions/data onto a bus (output to bus is linked directly to input address)

MyHDL provides both - the @always_comb decorator is used to produce combinatorial elements, and @always_seq is used for sequential (usage/etc for both will be explained later).

Timing

We're going to model a very simple system where both data and instruction buses are loaded immediately (ie: in a combinatorial fashion) whenever their respective pointers change - if you increment the data pointer, the data byte on the bus is automatically updated, etc.

We'll have a simple symmetrical clock signal to drive the sequential elements, which will be: the writing of any data to memory (if necessary) at the end of a clock cycle, the execution of the current instruction at the beginning of a clock cycle, and the advancement of the instruction pointer at the end of a cycle.

Registers:

We'll need a set of registers to hold various state inside the processor. In the interest of keeping most of the background detail in one place, I've listed them all here with brief descriptions; each will be further explained later as we come across them.

The following registers are all boolean in nature (flags):

CLK   Our clock signal.
SE    "Scan Enable" - signifies that we're performing a search through the instructions to
      find matching brackets for the purpose of looping.
SD    "Scan Direction" - indicates whether we're moving forwards or backwards when
      advancing the instruction pointer (IP).
WE    "Write Enable" - indicates when the current byte on the data bus (DBUS) should be
      written to memory at DP.

We will also need the following single-byte registers:

IP   Instruction pointer
DP   Data pointer
SA   Scan accumulator - keeps track of how many open/close brackets we've seen while scanning for a match.

And these single-byte buses:

IBUS   Holds the current instruction.
DBUS   Holds the current data byte.

Let's start building!

First we need to import all the necessary pieces from the myhdl package:

In [1]:
from myhdl import Signal, Simulation, StopSimulation, always, always_comb, always_seq, delay, enum, intbv, traceSignals

These are all the bits of hardware simulation we'll need from MyHDL... I won't bother describing them all here - you're encouraged to browse the MyHDL documentation, or keep reading for explanations below as each is introduced.

Next we'll setup some boilerplate:

In [2]:
# self-explanatory constants  
HIGH = 1
LOW = 0


ops_map = {
    '+': 'INC',
    '-': 'DEC',
    '>': 'INC_DP',
    '<': 'DEC_DP',
    ',': 'READ',
    '.': 'WRITE',
    '[': 'OPEN_LOOP',
    ']': 'CLOSE_LOOP'
}
valid_ops = ops_map.keys()
ops = enum(*(ops_map.values()))

def bf_to_native(program):
    '''
    'program' is a string.
    
    Any non-BF characters are pruned.
    '''
    return [ops.__getattribute__(ops_map[x]) for x in filter(lambda x: x in valid_ops, program)]

The enum(...) type is provided by MyHDL. We can use this as a convenient way of passing around labelled members of a set with the underlying implementation corresponding to integers. Here, we're setting up the set of valid instructions supported by the BF language.

We've also just made a few concessions for the sake of convenience: The ops_map dict maps string representations of the actual BF instructions to the names of the corresponding members of our enum. This in turn enables the bf_to_native function, which converts a string-based BF program to a sequence of 'native' instructions for our fledgling CPU.

Next, let's setup the python-based arrays to be used as the BF data array (memory), program memory, input and output buffers...

Note that for this phase of the BF CPU madness, we're only concerned with the core processor itself, and are basically ignoring memory (program & data chunks) as well as "real" I/O. This is to keep this phase of the hardware design simple - once (if?) I get to the point where I'm implementing something physical, I'll worry about adding in sub-designs to handle things like I/O and actual storage implementations. For now, it's just the processor we're modelling, so the rest will be left as simple, high-level Python stuff without any directly corresponding hardware modelling.

In [3]:
# BF spec requires 30000 bytes of data memory
bf_data = [0 for i in xrange(30000)]

# Our program - "Hello World!\n" to output
bf_program = "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."

bf_program = bf_to_native(bf_program)

bf_input, bf_output = [], []

The "Hello World!" program above is courtesy of the BF Wikipedia page (see the links section above).

Now we get to the real meat...

The Clock

In [4]:
def clock_driver(clk, we):
    '''
    Symmetrical clock signal, high then low then repeat, update/change every iteration of the sim.
    
    We also set we (write enable) low when the clock goes low, regardless of whether it's already
    high or not.
    '''
    
    @always(delay(1))
    def _drive_clock():
        if clk:
            we.next = LOW
            
        clk.next = not clk
        
    return _drive_clock

At a basic level, the clock driver component simply toggles a clock signal on each iteration of the simulation.

However there's a bunch of MyHDL-isms introduced here which warrant some explanation...

Signals

All "paths" between hardware components are modelled using MyHDL's Signal objects. In the case of our clock driver above, the clk and we arguments are both Signal objects. These objects are passed to any hardware component which requires read or write access to them at the time that component is instantiated. Any component driven by the main clock will be passed the same Signal object for it's clk argument, etc... this 'connects' the components via the Signal.

MyHDL decorators, and the delay function

All hardware components in a MyHDL simulation are modelled using specialized generator functions. You can manually craft a generator function and mark it as a hardware component with the @instance decorator. More often though, you'll use one of the other decorators which accept classic (non-generator) functions and automatically add the appropriate generator behaviour.

The value yield'ed from a MyHDL generator controls when the component will be resumed. Typically this will either be a length of time like we do here with @always(delay(1)) (which says "run this function every time 1 iteration of the sim has elapsed... ie: every iteration), or a Signal edge as we'll see shortly with sequential decorators.

Hardware component models are built using top-level functions which return generators when called. Any parameters to the component (Signals and any other arguments used for conditional instantiation, etc) are passed to the top-level function. The inner generator function which is returned takes no arguments.

Signal.next

This is the mechanism for specifying what the value of a Signal should be during the next iteration of a sim. Ie: to change a Signal's value, you assign the new value to <Signal>.next.

It's worth noting here that digital circuits are massively concurrent by nature (which is kinda the point from the context of a programmer interested in ASIC/FPGA circuit design... performance right?).

When writing software, regardless of the language and programming style, at the end of the day you have an abstraction of some kind in your head roughly corresponding to some incarnation of 'first this happens, then this...'. With a digital circuit, all the pieces are potentially active all the time. Every component whose output is updated on the rising edge of a clock signal will 'run' every time the clock goes high, etc. The sequential aspect is still present in the form of sequential logic elements whose output depends on prior state, but that's as far as it goes at a low level. Every time the state/system updates, everything updates all together. The mental model is very different from how you may picture a piece of software.

The data-reading component

In [5]:
def read_data(dp, dbus):
    @always_comb
    def _read_data():
        dbus.next = bf_data[dp]
        
    return _read_data

This component simply places the appropriate byte on the data bus whenever the data pointer (address) changes.

The @always_comb decorator

This is the MyHDL mechanism for specifying a combinatorial element; there is no clock required here, the element does not depend on any prior state - the output (dbus.next) simply changes immediately as soon as the input (dp) changes. You can think of the output being a function of the input which is constantly being evaluated.

MyHDL automatically infers which arguments to the top-level function are inputs, you don't need to specify which are which.

The data-writing component

In [6]:
def write_data(dp, dbus, we):
    '''
    Write data byte to memory at DP whenever the write enable signal goes from high to low.
    '''
    
    @always_seq(we.negedge, reset=None)
    def _write_data():
        bf_data[dp] = int(dbus.val)   # Need to get a COPY of the data, not a ref! (int() usage)
        
    return _write_data

This is an example of a sequential logic component - it's sensitive to the falling edge of the we signal, meaning that whenever the we signal goes low, the output will update (not necessarily change - depends on input).

The reset argument to the decorator can be used to specify a different signal which will reset the sequential component - MyHDL apparently figures out what to reset/how automatically, although there's an alternate method of accomplishing the same thing manually. Our simple processor model doesn't use any resets in this first iteration, so I'll say no more about that feature as I haven't used or fully grokked it yet.

The instruction pointer advancer

In [7]:
def instruction_advance(ip, sd, clk):
    '''
    Update the instruction pointer on every falling clock edge.
    '''
    
    @always_seq(clk.negedge, reset=None)
    def _advance_instruction():
        if sd:
            ip.next = ip + 1
        else:
            ip.next = ip - 1
            
    return _advance_instruction

Pretty obvious... advances ip appropriately for either forward or backward processing/scanning. (Backwards scanning is used when returning to the beginning of a loop "block".)

The instruction reader

In [8]:
def instruction_read(ip, ibus):
    '''
    Read instruction from program at instruction pointer onto ibus.
    
    Stop ("exit") if we've reached the end of the program.
    '''
    
    @always_comb
    def _read_instruction():
        if ip == len(bf_program):
            raise StopSimulation
            
        ibus.next = bf_program[ip]
        
    return _read_instruction

If ip points beyond the end of the program, exit the simulation (we're done). The rest of that should be self-explanatory at this point. Moving on...

The instruction executor

In [9]:
def process(ip, ibus, dp, dbus, we, se, sd, sa, clk):
    '''
    Meat.
    
    Execute the instruction on ibus.
    '''
    
    @always_seq(clk.posedge, reset=None)
    def _process():
        if ibus == ops.OPEN_LOOP:
            if not se and 0 == dbus:   # New forward scan
                se.next = HIGH
                sd.next = HIGH
                sa.next = 1
            elif se:   # Existing scan
                if sd:   # Forwards
                    sa.next = sa + 1
                else:
                    sa.next = sa - 1
                    
                    if 1 == sa:   # This is the droid you are looking for
                        se.next = LOW
                        
                        # For finalizing backward scans, we need to reset SD
                        # so that regular instruction advancing moves *forwards*
                        sd.next = HIGH
        elif ibus == ops.CLOSE_LOOP:
            if not se and 0 != dbus:   # New backward scan
                se.next = HIGH
                sd.next = LOW
                sa.next = 1
            elif se:   # Existing scan
                if not sd:   # Backwards
                    sa.next = sa + 1
                else:
                    sa.next = sa - 1
                    
                    if 1 == sa:   # This is the droid you are looking for
                        se.next = LOW
        elif not se:
            if ibus == ops.INC:
                dbus.next = dbus + 1
                we.next = HIGH
            elif ibus == ops.DEC:
                dbus.next = dbus - 1
                we.next = HIGH
            elif ibus == ops.INC_DP:
                dp.next = dp + 1
            elif ibus == ops.DEC_DP:
                dp.next = dp - 1
            elif ibus == ops.READ:
                dbus.next = ord(bf_input.pop(0))   # consume head of input array
                we.next = HIGH
            elif ibus == ops.WRITE:
                bf_output.append(int(dbus.val))
        
    return _process

The core of our processor, this is the module which performs the core logic, ie: executes instructions.

Looping

We start off by checking whether the instruction is either a start- or end-loop command. If not, the rest of the instruction types will be executed only if there isn't a loop-scan in progress; if there is a loop-scan in progress, no commands are executed, the instruction pointer simply moves until it reaches it's destination (start/end of the loop).

Open-loop commands are ignored if the byte on the data bus is nonzero, and close-loop commands are ignored if the data byte is zero - in both these cases, it's basically a noop.

If we do need to scan (to jump to the end of the loop, or the beginning), we basically step through in the appropriate direction, counting open/close braces appropriately until we find our match. This is facilitated via se ("scan enable" - indicates no non-loop processing should happen), sa ("scan accumulator" - used to count brackets) and sd ("scan direction" - used to control which way the ip advances while scanning).

The OPEN_LOOP logic is basically a mirror of CLOSE_LOOP, with one small exception: since the sd flag is used for loop scanning and regular ip advancement, at the end of a backwards scan we need to reset it to HIGH so that the next instruction advancement (normal non-looping program progression) will proceed in the right direction.

Writing the byte on the data bus back to memory

For the operations which modify the byte on the data bus (INC, DEC and READ), we perform the appropriate operation by setting dbus.next, and then set we.next ("write enable") to HIGH so that it'll be stored in memory at the appropriate spot (dp) on the falling edge of the we signal, which is set LOW by the clock at the same time that the main clock goes LOW.

Putting it all together...

In [10]:
def cpu():
    # Boolean signals
    clk, se, we = [Signal(bool(0)) for i in range(3)]
    sd = Signal(bool(1))
    
    # Byte signals
    ip, dp, sa = [Signal(0) for i in range(3)]
    
    # Buses
    ibus = Signal(bf_program[0])
    dbus = Signal(intbv(bf_data[0], min=0, max=255))

    # Instantiate all our hardware components...
    clock = clock_driver(clk, we)
    data_reader = read_data(dp, dbus)
    data_writer = write_data(dp, dbus, we)
    op_advancer = instruction_advance(ip, sd, clk)
    op_reader = instruction_read(ip, ibus)
    op_process = process(ip, ibus, dp, dbus, we, se, sd, sa, clk)
    
    return clock, data_reader, data_writer, op_advancer, op_reader, op_process

The system function instantiates all our Signals, instantiates all the hardware components and passes each the appropriate Signal set, and then returns all the component instances. This is how you build a compound component in MyHDL - a component is either a list of instances or a function which will return an instance, so we build the cpu by building all the sub-components and returning them as a set.

Last but not least, the following little block demonstrates how to instantiate the cpu, wrapping it with traceSignals so that a simulation run will output a .vcd file that we can import in gtkwave to view the waveform, and how to execute the simulation (which returns when a StopSimulation exception is raised from a component within):

In [11]:
cpu_instance = traceSignals(cpu)

Simulation(cpu_instance).run()

# This is pure high-level python so we get/see human-readable output (ASCII bytes)
print(''.join(chr(x) for x in bf_output[:12]))
Hello World!

gtkwave screenshot sample

The following is a partial screenshot of the view of the signals in gtkwave, loaded from the .vcd file generated by traceSignals(...)

gtkwave sample screenshot

And that's it!

Hope you enjoyed part 1 of my BF cpu build... stay tuned for further installments!