%load_ext watermark
%watermark -a 'Sebastian Raschka' -u -d -v
Sebastian Raschka last updated: 2017-08-06 CPython 3.6.1 IPython 6.1.0
Breadt-first search (BFS) is algorithm that can find the closest members in a graph that match a certain search criterion.
BFS requires that we model our problem as a graph (nodes connected through edges). BFS can be applied to directed and undirected graph, where it can be applied to answer to types of question:
To answer these questions, BFS starts by checking all direct neighbors of a given node -- neighbors are nodes that have a direct connection to a particular node. Then, if none of those neighbors satisfies the criterion that we are looking for, the search is expanded to the neighbors of the nodes we just checked, and so on, until a match is found or all nodes in the graph were checked.
To keep track of the nodes that we have already checked and that we are going to check, we need two additional data structures:
A hash table to keep track of nodes we have already checked. If we don't check for previously checked nodes, we may end up in cycles depending on the structure of the graph.
A queue that stores the items to be checked.
To represent the graph, its nodes and edges, we can simply use a hash table such as Python's built-in dictionaries. Imagine we have an undirected, social network graph that lists our direct friends (Elijah, Marissa, Nikolai) and friends of friends:
Say we are going to move to a new apartment next weekend, and we want to ask our friends if they have a pick-up truck that can be helpful in this endeavor. First, we would reach out to our directed friends (or 1st degree connections). If none of these have a pick-up truck, we ask them to ask their 1st degree connections (which are our 2nd degree connections), and so forth:
We can represent such a graph using a simple hash table (here: Python dictionary) as follows:
graph = {}
graph['You'] = ['Elijah', 'Marissa', 'Nikolai', 'Cassidy']
graph['Elijah'] = ['You']
graph['Marissa'] = ['You']
graph['Nikolai'] = ['John', 'Thomas', 'You']
graph['Cassidy'] = ['John', 'You']
graph['John'] = ['Cassidy', 'Nikolai']
graph['Thomas'] = ['Nikolai', 'Mario']
graph['Mario'] = ['Thomas']
Next, let's setup a simple queue data structure. Of course, we can also use a regular Python list like a queue (using .insert(0, x)
and .pop()
, but this way, our breadth-first search implementation is maybe more illustrative. For more information about queues, please see the Queues and Deques notebook.
class QueueItem():
def __init__(self, value, pointer=None):
self.value = value
self.pointer = pointer
class Queue():
def __init__(self):
self.last = None
self.first = None
self.length = 0
def enqueue(self, value):
item = QueueItem(value, None)
if self.last:
self.last.pointer = item
if not self.first:
self.first = item
self.last = item
self.length += 1
def dequeue(self):
if self.first is not None:
value = self.first.value
self.first = self.first.pointer
self.length -= 1
else:
value = None
return value
qe = Queue()
qe.enqueue('a')
print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)
First element: a Last element: a Queue length: 1
qe.enqueue('b')
print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)
First element: a Last element: b Queue length: 2
qe.enqueue('c')
print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)
First element: a Last element: c Queue length: 3
val = qe.dequeue()
print('Dequeued value:', val)
print('Queue length:', qe.length)
Dequeued value: a Queue length: 2
val = qe.dequeue()
print('Dequeued value:', val)
print('Queue length:', qe.length)
Dequeued value: b Queue length: 1
val = qe.dequeue()
print('Dequeued value:', val)
print('Queue length:', qe.length)
Dequeued value: c Queue length: 0
val = qe.dequeue()
print('Dequeued value:', val)
print('Queue length:', qe.length)
Dequeued value: None Queue length: 0
qe.enqueue('c')
print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)
First element: c Last element: c Queue length: 1
Now, back to the graph, where we want to identify the closest connection that owns a truck, which can be helpful for moving (if we are allowed to borrow it, that is):
graph = {}
graph['You'] = ['Elijah', 'Marissa', 'Nikolai', 'Cassidy']
graph['Elijah'] = ['You']
graph['Marissa'] = ['You']
graph['Nikolai'] = ['John', 'Thomas', 'You']
graph['Cassidy'] = ['John', 'You']
graph['John'] = ['Cassidy', 'Nikolai']
graph['Thomas'] = ['Nikolai', 'Mario']
graph['Mario'] = ['Thomas']
For simplicity, let's assume we have function that checks if a person ows a pick-up truck. (Say, Mario owns a pick-up truck, the check function knows it but we don't know it.)
def has_truck(person):
if person == 'Mario':
return True
else:
return False
Now, the breadth_first_search
implementation below will check our closest neighbors first, then, it will check the neighnors of our neighbors and so forth. We will make use both of the graph we constructed and the Queue
data structure that we implemented. Also, note that we are keeping track of people we already checked to prevent cycles in our search:
def breadth_first_search(graph):
# initialize queue
queue = Queue()
for person in graph['You']:
queue.enqueue(person)
people_checked = set()
degree = 0
while queue.length:
person = queue.dequeue()
if has_truck(person):
return person
else:
degree += 1
people_checked.add(person)
for next_person in graph[person]:
# check to prevent endless cycles
if next_person not in people_checked:
queue.enqueue(next_person)
breadth_first_search(graph)
'Mario'