Notebook

Lab 4 - Codes for Efficient Transmission of Data¶

Authors:¶

v1.0 (2014 Fall) Kangwook Lee **, Kannan Ramchandran **

v1.1 (2015 Fall) Kabir Chandrasekher *, Max Kanwal *, Kangwook Lee **, Kannan Ramchandran **

v1.2 (2016 Spring) Kabir Chandrasekher, Tony Duan, David Marn, Ashvin Nair, Kangwook Lee, Kannan Ramchandran

In last week's lab, we learned about source coding and then briefly touched upon the channel coding problem. We learned about the simplest and most intuitive code: repetition codes. This week you will explore implementing your own code for transmission of data!

To see why efficient coding techniques are important, consider an erasure channel in which each packet sent gets dropped with some probability $p$. If we wanted to convey a message, we could consider a feedback channel in which the receiver tells the sender which messages were received and the sender re-sends the dropped packets. This process can be repeated until the receiver gets all of the intended message. While this procedure is indeed optimal in all senses of the word, feedback is simply not possible in many circumstances. What can we do in this situation?

In this lab, we will be looking at a specific type of code that will apply your knowledge of random bipartite graphs (the balls and bins model).

Introduction¶

The coding scheme we will work with in this lab will be a type of erasure code. Under the binary erasure channel (BEC) model, bits that are sent through a noisy channel either make it through unmodified or are tagged as "corrupt", in which case the recieved information is dropped in all further information processing steps.

A code is considered optimal generally when all $n$ source symbols that need transmitting can be recovered from *any* $n$ encoded symbols. Think of it this way: You want to stream the new season of House of Cards from Netflix at midnight. But so do 1 million other people! Obviously the server loads are going to be high, so how can Netflix ensure that everyone watches without intermittent lags (which are frustrating and will lose them customers)? Well, one way is to increase the number of servers so that the load on any one server is decreased, but this costs a lot of money. Another important thing for them to consider is in the transmission process. When users are downloading the movie, Netflix servers don't want to have too much overhead to figure out which users need which bits of the video to get smooth playback. If they use near-optimal codes to encode and constantly send out the same random chunks of the video's data to all users, then they can be sure that users get what they need in only a little more than $n$ transmissions *no matter what parts of the show each individual user lost through their specific channel*!

So what's the secret to this magic? It's a two step process of clever encoding and decoding:

Encoding¶

For $n$ packets you want to transmit in total, pick $d$ such that $1\leq d \leq n$ according to some distribution.
With $d$ picked, now select $d$ random packets of the $n$ and combine the bits together using the XOR operator.
Transmit this combined packet, along with the metadata telling which actual packet indices were XOR'd.

Decoding¶

Reconstruct the list of packet indices which were XOR'd (in our case these are just explicitly sent with the data)
For each source packet, check if the packet is a singleton, in which case the encoded packet is exactly equal to the actual packet. If not, we can check if any of the packets that have been XOR'd are already known, in which case XOR that packet with the encoded packet and remove it from the list of packet indices that make up the encoded chunk.
If there are two or more indices in the list for the encoded packet we cannot figure out any more information! Put it on the side for looking at later.
With any newly decoded information, we may be able to decode previously undecodable packets that we had put on the side. Go through all unsolved packets and try to decode more packets until nothing more can be done.
Wait for the next encoded packet to come and repeat!

Now what's left for you to do? Well, remember that number $d$? It needs to be picked according to some distribution, and which distribution is the million dollar question!

Example¶

Consider the above bipartite graph. Here, $x_i$ represents the encoded incoming packet, $y_i$ represents the actual packet, and the edges represent the actual packets that make up each encoded packet. Consider each $x_i$ coming in chronologically:

For incoming packets $x_1, x_2, x_3$ we will not be able to decode anything.
As soon as $x_4$ comes, we see that it is a singleton. This means that $y_4$ can be fully recovered from $x_4$.
Finally, when $x_5$ comes in, we can suddenly decode everything: $x_5$ is made up of $y_4$ and $y_5$, but we already know $y_4$, so we can then fully decode $y_5$. Similarly, we can then decode $y_3$, then $y_1$, and lastly $y_2$ (in that order specifically).

As you might be able to tell, by choosing a good degree distribution for $d$, even when random incoming packets were lost (not shown), you were still able to recover all $5$ symbols only from $5$ received packets, despite the sender not knowing what packets you lost through the BEC.

Code¶

1) Packet Class & Utility functions¶

A packet consists of...

['chunk_indices', 'data' ]¶

chunk_indices: Which packets are chosen¶

data: The 'XOR'ed data (we will just add the bits in this lab for simplicity)¶

In [ ]:

%matplotlib inline  
import matplotlib.pyplot as plt
import numpy as np
import json
import random

class Packet:
    size_of_packet = 256
    def __init__(self, chunks, chunk_indices):
        self.data = self.add(chunks)
        self.chunk_indices = chunk_indices

    def add(self, chunks):
        tmp = np.zeros(Packet.size_of_packet, 'uint8')
        for each_chunk in chunks:
            tmp += each_chunk
        return tmp
            
    def num_of_chunks(self):
        return len(self.chunk_indices)

2) Transmitter & Encoder Class¶

You can initiate an encoder with a string! Then, generate_packet() will return a randomly encoded packet.

In [ ]:

class Transmitter:
    def __init__(self, chunks, channel, degree_distribution):
        self.chunks = chunks
        self.num_chunks = len(chunks)
        self.channel = channel
        self.degree_distribution = degree_distribution
        
    def generate_new_packet(self):
        if self.degree_distribution == 'single':
            #Always give a degree of 1
            n_of_chunks = 1
        elif self.degree_distribution == 'double':
            #Always give a degree of 2
            n_of_chunks = 2
        elif self.degree_distribution == 'mixed':
            #Give a degree of 1 half the time, 2 the other half
            if random.random() < 0.5:
                n_of_chunks = 1
            else:
                n_of_chunks = 2
        elif self.degree_distribution == 'baseline':
            """
            Randomly assign a degree from between 1 and 5.
            If num_chunks < 5, randomly assign a degree from 
            between 1 and num_chunks
            """
            n_of_chunks = random.randint(1,min(5, self.num_chunks))
        chunk_indices = random.sample(range(self.num_chunks), n_of_chunks)
        chunks = [ self.chunks[x] for x in chunk_indices ]
        return Packet( chunks, chunk_indices )
        
    def transmit_one_packet(self):
        packet = self.generate_new_packet()
        self.channel.enqueue( packet )

3) Channel Class¶

Channel class takes a packet and erase it with probability eps.

In [1]:

class Channel:
    def __init__(self, eps):
        self.eps = eps
        self.current_packet = None
        
    def enqueue(self, packet):
        if random.random() < self.eps:
            self.current_packet = None
        else:
            self.current_packet = packet
            
    def dequeue(self):
        return self.current_packet

4) Receiver & Decoder Class¶

You can initiate a decoder with the total number of chunks. Then, add_packet() will add a received packet to the decoder.

In [ ]:

class Receiver:
    def __init__(self, num_chunks, channel):
        self.num_chunks = num_chunks
        self.received_packets = []
        self.chunks = [ ]
        for i in range(self.num_chunks):
            self.chunks.append(  np.zeros(Packet.size_of_packet, 'uint8') )
        self.found = [ False for x in range(self.num_chunks) ]
        self.channel = channel
        
    def receive_packet(self):
        packet = self.channel.dequeue()
        if not packet == None:
            self.received_packets.append( packet )
            if packet.num_of_chunks() == 1:
                self.peeling()
        
    def peeling(self):
        flag = True
        while flag:
            flag = False
            for each_packet in self.received_packets:
                if each_packet.num_of_chunks() == 1: # Found a singleton
                    flag = True
                    idx = each_packet.chunk_indices[0]
                    break
            
            # First, declare the identified chunk
            if self.found[ idx ] == False:
                self.chunks[ idx ] = np.array(each_packet.data, 'uint8')
                self.found[ idx ] = True
            # Second, peel it off from others
            for each_packet in self.received_packets:
                if idx in each_packet.chunk_indices:
                    each_packet.chunk_indices.remove( idx )
                    each_packet.data -= self.chunks[ idx ]
            
    def isDone(self):
        return self.chunksDone() == self.num_chunks

    def chunksDone(self):
        return sum(self.found)

$\mathcal{Q}$uestion 1. Sending the raccoon

a. Break up the image shown below into $1024$ packets of size $256$ each.

In [ ]:

from scipy import misc
import matplotlib.cm as cm
import Image
import numpy as np
l = np.asarray(plt.imread("raccoon.jpg"))
#converts the image to grayscale
x = np.zeros((512,512))
for i in xrange(512):
    for j in xrange(512):
        x[i][j] = l[i][j][0]*0.299+l[i][j][1]*0.587+l[i][j][2]*0.113

plt.imshow(x, cmap = cm.Greys_r)

In [ ]:

tt= x.reshape(1,512*512)[0]
size_of_packet = 256 
num_of_packets = 1024
assert len(tt) == size_of_packet * num_of_packets

# Your code here

b. Using the 'single' degree distribution defined in the Transmitter class, send the raccoon over a channel with erasure probability 0.2. How many packets did you need to send? Display the data you receive every $100$ packets in addition to the data you receive at the end.

i. Plot the number of packets decoded as a function of the number of packets you receive. (The current_sent array should be helpful here)

You may find the following function useful:

In [ ]:

def visualize( chunks ):
    plt.imshow(chunks, cmap = cm.Greys_r)

In [ ]:

eps = 0.2
ch = Channel(eps)
tx = Transmitter( chunks, ch, 'single' )
rx = Receiver( len(chunks), ch )

ct = 0
intermediate_data = []
current_sent = []
while not rx.isDone():
    tx.transmit_one_packet()
    rx.receive_packet()
    ct += 1
    if ct % 100 == 0:   
        intermediate_data.append( np.array(rx.chunks, 'uint8').reshape(512,512) )
        current_sent.append(sum(rx.found))

received_data = np.array(rx.chunks, 'uint8').reshape(512,512)
print "The number of packets received: {}".format(ct)

### Incrementally show the data received
n_of_figures = int(ct/100)
fig = plt.figure( figsize=(8, 3*n_of_figures) )

for i in range(n_of_figures-1):
    fig.add_subplot(n_of_figures,1,i+1)
    visualize(intermediate_data[i])

fig.add_subplot(n_of_figures,1,n_of_figures)
visualize(received_data)

In [ ]:

#Your plotting code here

b ii. Looking at the graph, we see that it gets harder and harder to find the rest as we decode more and more chunks. Does this remind you of a well known theoretical problem?

Hint: Try out some small examples!

Your answer here.¶

c. Using the 'double' degree distribution defined in the Transmitter class, send the raccoon over a channel with erasure probability 0.2. What happens?

In [ ]:

eps = 0.2
ch = Channel(eps)
tx = Transmitter( chunks, ch, 'double' )
rx = Receiver( len(chunks), ch )

ct = 0
while not rx.isDone():
    tx.transmit_one_packet()
    rx.receive_packet()
    ct += 1

received_data = np.array(rx.chunks, 'uint8').reshape(512,512)
print "The number of packets received: {}".format(ct)

d. You have seen two degree distributions so far. Both of these have been deterministic, and one worked better than the other. Let's try one final degree distribution. Using the 'baseline' degree distribution, send the raccoon over a channel with erasure probability 0.2. Plot the number of packets decoded against the number of packets transmitted.

In [ ]:

eps = 0.2
num_trials = 1
ch = Channel(eps)
tx_baseline = Transmitter( chunks, ch, 'baseline' )
baseline_packets = []
current_baseline = []

for _ in xrange(num_trials):
    rx = Receiver( len(chunks), ch )
    ct = 0
    intermediate_data = []
    while not rx.isDone():
        tx_baseline.transmit_one_packet()
        rx.receive_packet()
        ct += 1
        if ct % 100 == 0:   
            current_baseline.append(sum(rx.found))
    baseline_packets.append(ct)

In [ ]:

#Plotting code here

$\mathcal{C}$ompetition

Alice has just finished eating dinner, and with her EE 126 homework completed early for once, she plans to sit down for a movie night (she wants to make use of the 30-day free trial of Netflix!). While Alice is surfing Netflix she decides she wants to stream Interstellar. Alice's laptop drops packets with $p=0.25$. You, the Chief Technology Officer of Netflix, know that given the heavy workload of EE 126, this may be your only chance to convert this freeloading customer into a permanent one, but to do so you're going to have to make sure her viewing experience is perfect.

Concrete specs:¶

You are given an erasure channel with drop probability $p=0.25$.
You must define a degree distribution (which can vary as a function of the # of transmissions already sent) to minimize the number of total packets needed to be sent to Alice for her to decode their images. Run your code for 10 trials to get a good estimate of the true number of transmissions needed per image while they watch their movies. Each trial, your score is $\frac{\text{# of packets successfully decoded from the first 512 packets}}{512}+\frac{\text{# of packets successfully decoded from the first 1024 packets}}{1024}+\lfloor\frac{\text{# of packets successfully decoded from the first 2048 packets}}{1024}\rfloor+\lfloor\frac{\text{# of packets successfully decoded from the first 4096 packets}}{1024}\rfloor+\lfloor\frac{\text{# of packets successfully decoded from the first 6144 packets}}{1024}\rfloor$.
Note the floor function in the later stages - you can only get the point if you fully decode the file with the alloted number of packets
You may work in teams of up to three.

Good luck!

If you place in the top 3 in the class you will be awarded bonus points and full credit for the homework, as well as get to present your strategy to the entire course staff!

Besides the top 3 submissions:

Everyone who scores above 3 points will receive bonus credit that is proportional to their score!¶

In [ ]:

#Your code here

$\mathcal{R}$esults

Report the average score (averaged over 10 trials):

$$\text{Average Score: } \boxed{<INSERT\ HERE>}$$

$\mathcal{S}$ummary

Answer the following in 1-2 paragraphs (this should be answered individually):

Who were your teammates?
What did you learn?
What is the basic inuition behind your final strategy?
How did your strategy evolve from your first attempt (what worked and what failed)?
How would your strategy change if the value of $p$ of the BEC was not known?

Your Response Here