Lab 4 - Codes for Efficient Transmission of Data

Authors:

v1.0 (2014 Fall) Kangwook Lee **, Kannan Ramchandran **
v1.1 (2015 Fall) Kabir Chandrasekher *, Max Kanwal *, Kangwook Lee **, Kannan Ramchandran **

In last week's lab, we learned about source coding and then briefly touched upon the channel coding problem. We learned about the simplest and most intuitive code: repetition codes. This week you will explore implementing your own code for transmission of data!

To see why efficient coding techniques are important, consider an erasure channel in which each packet sent gets dropped with some probability $p$. If we wanted to convey a message, we could consider a feedback channel in which the receiver tells the sender which messages were received and the sender re-sends the dropped packets. This process can be repeated until the receiver gets all of the intended message. While this procedure is indeed optimal in all senses of the word, feedback is simply not possible in many circumstances. What can we do in this situation?

In this lab, we will be looking at a specific type of code that will apply your knowledge of random bipartite graphs (the balls and bins model).

Introduction

The coding scheme we will work with in this lab will be a type of erasure code. Under the binary erasure channel (BEC) model, bits that are sent through a noisy channel either make it through unmodified or are tagged as "corrupt", in which case the recieved information is dropped in all further information processing steps.

A code is considered optimal generally when all $n$ source symbols that need transmitting can be recovered from any $n$ encoded symbols. Think of it this way: You want to stream the new season of House of Cards from Netflix at midnight. But so do 1 million other people! Obviously the server loads are going to be high, so how can Netflix ensure that everyone watches without intermittent lags (which are frustrating and will lose them customers)? Well, one way is to increase the number of servers so that the load on any one server is decreased, but this costs a lot of money. Another important thing for them to consider is in the transmission process. When users are downloading the movie, Netflix servers don't want to have too much overhead to figure out which users need which bits of the video to get smooth playback. If they use near-optimal codes to encode and constantly send out the same random chunks of the video's data to all users, then they can be sure that users get what they need in only a little more than $n$ transmissions no matter what parts of the show each individual user lost through their specific channel!

So what's the secret to this magic? It's a two step process of clever encoding and decoding:

Encoding

  1. For $n$ packets you want to transmit in total, pick $d$ such that $1\leq d \leq n$ according to some distribution.
  2. With $d$ picked, now select $d$ random packets of the $n$ and combine the bits together using the XOR operator.
  3. Transmit this combined packet, along with the metadata telling which actual packet indices were XOR'd.

Decoding

  1. Reconstruct the list of packet indices which were XOR'd (in our case these are just explicitly sent with the data)
  2. For each source packet, check if the packet is a singleton, in which case the encoded packet is exactly equal to the actual packet. If not, we can check if any of the packets that have been XOR'd are already known, in which case XOR that packet with the encoded packet and remove it from the list of packet indices that make up the encoded chunk.
  3. If there are two or more indices in the list for the encoded packet we cannot figure out any more information! Put it on the side for looking at later.
  4. With any newly decoded information, we may be able to decode previously undecodable packets that we had put on the side. Go through all unsolved packets and try to decode more packets until nothing more can be done.
  5. Wait for the next encoded packet to come and repeat!

Now what's left for you to do? Well, remember that number $d$? It needs to be picked according to some distribution, and which distribution is the million dollar question!

Example

Consider the above bipartite graph. Here, $x_i$ represents the encoded incoming packet, $y_i$ represents the actual packet, and the edges represent the actual packets that make up each encoded packet. Consider each $x_i$ coming in chronologically:

  1. For incoming packets $x_1, x_2, x_3$ we will not be able to decode anything.
  2. As soon as $x_4$ comes, we see that it is a singleton. This means that $y_4$ can be fully recovered from $x_4$.
  3. Finally, when $x_5$ comes in, we can suddenly decode everything: $x_5$ is made up of $y_4$ and $y_5$, but we already know $y_4$, so we can then fully decode $y_5$. Similarly, we can then decode $y_3$, then $y_1$, and lastly $y_2$ (in that order specifically).

As you might be able to tell, by choosing a good degree distribution for $d$, even when random incoming packets were lost (not shown), you were still able to recover all $5$ symbols only from $5$ received packets, despite the sender not knowning what packets you lost through the BEC.

Code

1) Packet Class & Utility functions

A packet consists of...

['chunk_indices', 'data' ]

chunk_indices: Which packets are chosen
data: The 'XOR'ed data (we will just add the bits in this lab for simplicity)
In [3]:
%matplotlib inline  
import matplotlib.pyplot as plt
import numpy as np
import json
import random

class Packet:
    size_of_packet = 256
    def __init__(self, chunks, chunk_indices):
        self.data = self.add(chunks)
        self.chunk_indices = chunk_indices

    def add(self, chunks):
        tmp = np.zeros(Packet.size_of_packet, 'uint8')
        for each_chunk in chunks:
            tmp += each_chunk
        return tmp
            
    def num_of_chunks(self):
        return len(self.chunk_indices)

2) Transmitter & Encoder Class

You can initiate an encoder with a string! Then, generate_packet() will return a randomly encoded packet.

In [1]:
class Transmitter:
    def __init__(self, chunks, channel, degree_distribution):
        self.chunks = chunks
        self.num_chunks = len(chunks)
        self.channel = channel
        self.degree_distribution = degree_distribution
        
    def generate_new_packet(self):
        if self.degree_distribution == 'single':
            #Always give a degree of 1
            n_of_chunks = 1
        elif self.degree_distribution == 'double':
            #Always give a degree of 2
            n_of_chunks = 2
        elif self.degree_distribution == 'mixed':
            #Give a degree of 1 half the time, 2 the other half
            if random.random() < 0.5:
                n_of_chunks = 1
            else:
                n_of_chunks = 2
        elif self.degree_distribution == 'baseline':
            """
            Randomly assign a degree from between 1 and 5.
            If num_chunks < 5, randomly assign a degree from 
            between 1 and num_chunks
            """
            n_of_chunks = random.randint(1,min(5, self.num_chunks))
        chunk_indices = random.sample(range(self.num_chunks), n_of_chunks)
        chunks = [ self.chunks[x] for x in chunk_indices ]
        return Packet( chunks, chunk_indices )
        
    def transmit_one_packet(self):
        packet = self.generate_new_packet()
        self.channel.enqueue( packet )

3) Channel Class

Channel class takes a packet and erase it with probability eps.

In [5]:
class Channel:
    def __init__(self, eps):
        self.eps = eps
        self.current_packet = None
        
    def enqueue(self, packet):
        if random.random() < eps:
            self.current_packet = None
        else:
            self.current_packet = packet
            
    def dequeue(self):
        return self.current_packet 

4) Receiver & Decoder Class

You can initiate a decoder with the total number of chunks. Then, add_packet() will add a received packet to the decoder.

In [6]:
class Receiver:
    def __init__(self, num_chunks, channel):
        self.num_chunks = num_chunks
        self.received_packets = []
        self.chunks = [ ]
        for i in range(self.num_chunks):
            self.chunks.append(  np.zeros(Packet.size_of_packet, 'uint8') )
        self.found = [ False for x in range(self.num_chunks) ]
        self.channel = channel
        
    def receive_packet(self):
        packet = self.channel.dequeue()
        if not packet == None:
            self.received_packets.append( packet )
            if packet.num_of_chunks() == 1:
                self.peeling()
        
    def peeling(self):
        flag = True
        while flag:
            flag = False
            for each_packet in self.received_packets:
                if each_packet.num_of_chunks() == 1: # Found a singleton
                    flag = True
                    idx = each_packet.chunk_indices[0]
                    break
            
            # First, declare the identified chunk
            if self.found[ idx ] == False:
                self.chunks[ idx ] = np.array(each_packet.data, 'uint8')
                self.found[ idx ] = True
            # Second, peel it off from others
            for each_packet in self.received_packets:
                if idx in each_packet.chunk_indices:
                    each_packet.chunk_indices.remove( idx )
                    each_packet.data -= self.chunks[ idx ]
            
    def isDone(self):
        return self.chunksDone() == self.num_chunks

    def chunksDone(self):
        return sum(self.found)

$\mathcal{Q}$uestion 1. Sending Lena

a. Break up the lena image (shown below) into $1024$ packets of size $256$ each.

In [7]:
from scipy import misc
import matplotlib.cm as cm
plt.imshow(misc.lena(), cmap = cm.Greys_r)
Out[7]:
<matplotlib.image.AxesImage at 0x107dbf6d0>
In [10]:
import numpy as np

l = misc.lena()
tt= l.reshape(1,512*512)[0]
size_of_packet = 256 
num_of_packets = 1024
assert len(tt) == size_of_packet * num_of_packets

# Your code here

b. Using the 'single' degree distribution defined in the Transmitter class, send Lena over a channel with erasure probability 0.2. How many packets did you need to send? Display the data you receive every $100$ packets in addition to the data you receive at the end.

i. Plot the number of packets decoded as a function of the number of packets you receive. (The current_sent array should be helpful here)

ii. Does this remind you of something you've seen before?

You may find the following function useful:

In [8]:
def visualize( chunks ):
    plt.imshow(chunks, cmap = cm.Greys_r)
In [13]:
eps = 0.2
ch = Channel(eps)
tx = Transmitter( chunks, ch, 'single' )
rx = Receiver( len(chunks), ch )

ct = 0
intermediate_data = []
current_sent = []
while not rx.isDone():
    tx.transmit_one_packet()
    rx.receive_packet()
    ct += 1
    if ct % 100 == 0:   
        intermediate_data.append( np.array(rx.chunks, 'uint8').reshape(512,512) )
        current_sent.append(sum(rx.found))

received_data = np.array(rx.chunks, 'uint8').reshape(512,512)
print "The number of packets received: {}".format(ct)

### Incrementally show the data received
n_of_figures = int(ct/100)
fig = plt.figure( figsize=(8, 3*n_of_figures) )

for i in range(n_of_figures-1):
    fig.add_subplot(n_of_figures,1,i+1)
    visualize(intermediate_data[i])

fig.add_subplot(n_of_figures,1,n_of_figures)
visualize(received_data)
In [ ]:
#Your plotting code here

c. Using the 'double' degree distribution defined in the Transmitter class, send Lena over a channel with erasure probability 0.2. What happens?

In [12]:
eps = 0.2
ch = Channel(eps)
tx = Transmitter( chunks, ch, 'double' )
rx = Receiver( len(chunks), ch )

ct = 0
while not rx.isDone():
    tx.transmit_one_packet()
    rx.receive_packet()
    ct += 1

received_data = np.array(rx.chunks, 'uint8').reshape(512,512)
print "The number of packets received: {}".format(ct)

d. You have seen two degree distributions so far. Both of these have been deterministic, and one worked better than the other. Let's try one final degree distribution. Using the 'baseline' degree distribution, send Lena over a channel with erasure probability 0.2. Plot the number of packets decoded against the number of packets transmitted.

In [ ]:
eps = 0.2
num_trials = 1
ch = Channel(eps)
tx_baseline = Transmitter( chunks, ch, 'baseline' )
tx_soliton = Transmitter( chunks, ch, 'sd')
baseline_packets, soliton_packets = [], []
current_baseline, current_soliton = [], []

for _ in xrange(num_trials):
    rx = Receiver( len(chunks), ch )
    ct = 0
    intermediate_data = []
    while not rx.isDone():
        tx_baseline.transmit_one_packet()
        rx.receive_packet()
        ct += 1
        if ct % 100 == 0:   
            current_baseline.append(sum(rx.found))
    baseline_packets.append(ct)

for _ in xrange(num_trials):
    rx = Receiver( len(chunks), ch )
    ct = 0
    intermediate_data = []
    while not rx.isDone():
        tx_soliton.transmit_one_packet()
        rx.receive_packet()
        ct += 1  
        current_soliton.append(sum(rx.found))
    soliton_packets.append(ct)
In [ ]:
#Plotting code here

$\mathcal{C}$ompetition

Alice and Bob have both just finished eating dinner, and with their EE 126 homework completed early for once, they plan to sit down for a movie night (they want to make use of the 30-day free trial of Netflix they just signed up for!). While Alice is surfing Netflix she decides she wants to stream Interstellar. Bob is not happy — he just saw that last weekend while he was secretly procrastinating on 126 homework. He wanted to watch Inception instead. As a result, Bob sat down with Alice, but pulled out his phone sneakily to stream Inception with subtitles on. Given the bad service in the area, Bob's phone drops packets with $p=30\%$, while Alice's laptop only drops packets with $p=20\%$. You, the Chief Technology Officer of Netflix, know that given the heavy workload of EE 126, this may be your only chance to convert these freeloading customers into permanent ones, but to do so you're going to have to make sure their viewing experience is perfect.

Concrete specs:

  • You are given two erasure channels with drop probability $p=0.20$ and $p=0.30$.
  • You must define a degree distribution (which can vary as a function of the # of transmissions already sent) to minimize the number of total packets needed to be sent to Alice and Bob for them to decode their images. Run your code for 10 trials to get a good estimate of the true number of transmissions needed per image while they watch their movies. Each trial, your score is $\frac{\text{# of Transmissions to Alice + # of Transmissions to Bob}}{2}$.
  • You may work in teams of up to three.

Good luck!

If you place in the top 3 in the class you will be awarded bonus points and full credit for the homework, as well as get to present your strategy to the entire course staff!

For the rest of you, the grading scheme will be as follows: $$\text{Transmissions} \Rightarrow \text{Grade}$$ $$[1,3500] \Rightarrow A\ (10)$$ $$(3500, 4500] \Rightarrow B\ (8)$$ $$(4500, 6000] \Rightarrow C\ (5)$$ $$(6000, 10000] \Rightarrow D\ (2)$$ $$(10,000,\infty) \Rightarrow F\ (0)$$

In [1]:
#Your code here

$\mathcal{R}$esults

Report the average # of transmissions needed to give Alice and Bob the complete file (averaged over 10 trials):

$$\text{Average Transmissions Needed: } \boxed{<INSERT\ HERE>}$$

$\mathcal{S}$ummary

Answer the following in 1-2 paragraphs (this should be answered individually):

  • Who were your teammates?
  • What did you learn?
  • What is the basic inuition behind your final strategy?
  • How did your strategy evolve from your first attempt (what worked and what failed)?
  • How would your strategy change if the value of $p$ of the BEC was not known?

Your Response Here

This website does not host notebooks, it only renders notebooks available on other websites.

Delivered by Fastly, Rendered by OVHcloud

nbviewer GitHub repository.

nbviewer version: cf693bd

nbconvert version: 5.6.1

Rendered (Tue, 01 Dec 2020 08:52:31 UTC)