In [1]:
%matplotlib inline
import pandas as pd

import numpy as np
from __future__ import division
import itertools

import matplotlib.pyplot as plt
import seaborn as sns

import logging
logger = logging.getLogger()

16 Greedy Algorithms

intent: it makes a locally optimal choice in the hope that the choice will lead to a globally optimal solution.

16.1 An activity-selection problem

an activity $a_i$ has a start and finish time $[s_i, f_i)$.
activities set $S = \{a_1, a_2, \dotsc, a_n\}$, ordered by $f_i$.

$a_i$ and $a_j$ are compatible if their intervals don't overlap.

We wish to select a maximum-size subsets of mutually compatible activities.

In [2]:
# eg
raw_data = [
    [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12],
    [4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]

df = pd.DataFrame(raw_data, index=['s', 'f'], columns=xrange(1,12,1))
1 2 3 4 5 6 7 8 9 10 11
s 1 3 0 5 3 5 6 8 8 2 12
f 4 5 6 7 9 9 10 11 12 14 16

Dynamic Programming

denote: $S_{ij} = \{a_{i+1}, \dotsc, a_{j-1}\}$.

Suppose $a_k$ in an optimal solution, then the left two subproblems: $S_{ik}$ and $S_{kj}$, whose optimal solution should be within the optimal solution of $S_{ij}$.

\begin{equation} c[i,j] = \begin{cases} 0 & \quad \text{ if } S_{ij} = \emptyset \\ max_{a_k \in S_{ij}} c[i,k] + c[k,j] + 1 & \quad \text{ otherwise} \end{cases} \end{equation}

In [3]:

Making the greedy choice

we should choose an activity that leaves the resource available for as many other activities as possible.
Hence, the first one should be the activity in $S$ with the earliest finish time.

Is the greedy choice always part of some optimal solution?

Theorem 16.1:
Let $S_k = \{a_i \in S: s_i \geq f_k\}$.
We have $a_m$ s.t. $f_m = min(f_i : a_i \in S)$
Then $a_m$ must be included in some maximum-size subset of mutually compatible activities of $S_k$.

Proof: (构造法)

Let $A_k$ be a maximum-size subset, and $a_j$ s.t. $f_j = min(f_i: a_i in A_k)$.
Because $f_m \leq f_j$,
hence we could replace $a_j$ with $a_m$, namely, $A'_k = (A_k - a_j) \cup a_m$.
then we get a new maximum-size subset $A'_k$ which includs $a_m$.

An recursive greedy algorithm

tail recursive

An iterative greedy algorithm
In [4]:
1 2 3 4 5 6 7 8 9 10 11
s 1 3 0 5 3 5 6 8 8 2 12
f 4 5 6 7 9 9 10 11 12 14 16
In [5]:
def greedy_activity_selector(df):
    n = df.shape[1] + 1
    k = 1
    A = [k] 

    for m in xrange(2, n):
        if df.loc['s', m] >= df.loc['f', k]:
            k = m
    return A

[1, 4, 8, 11]

16.2 Elements of the greedy strategy

  1. Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve.

  2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe.

  3. Demonstrate optimal substructure, and combine solutions.

Beneath every greedy algorithm, there is almost always a more cumbersome dynamic-programming solution.

Two key ingredients: greedy-choice property and optimal substructure.

Greedy -choice property

idea: we can assemble a globally optimal solution by making locally optimal (greedy) choices.

Dynamic programming: solves the subproblems before making the first choice. $\to$ bottom-up, iteration
Greedy algorithm: makes its first choice before solving any subproblems. $\to$ top-down, recursive

  • usually more efficient without considering a wider set of choices.
Optimal substruture

A problem exhibits optimal substruture if an optimal solution to the problem contains whithin it optimal solutions to subproblems.

namely, an optimal solution to the subproblem, combined with the greedy choice already made, yields an optimal solution to the original problem.

Greedy versus Dynamic programming
  1. 0-1 knapscak problem:
    Given: A thief robbing a store finds $n$ items. The $i$th item is worth $v_i$ dollars and weighs $w_i$ pounds.
    The thife can carry at most $W$ pounds in his knapsack.
    For a item, the thife must either take it or leave it behind.

    Problem: Which combination of itmes the thief choose to take as valuable a load as possible?

  2. fractional knapsack problem:
    diff: For a item, the thife can take fraction of items.

16.3 Huffman codes

Prefix codes

prefix codes: codes in which no codeword is also a prefix of some other codeword.
$implies$ unambiguous.

  • A prefix code can always achieve the optimal data compression among any character code.

  • An optimal code is always represented by a full binary tree.
    The number of bits required to encode a file is thus: $$B(T) = \displaystyle\sum_{c \in C} c.\text{freq} \cdot d_T(c)$$ which we define as the cose the cost of the tree $T$.

Constructing a Hunffman code

The algorithm uses a min-priority queue $Q$, keyed on the freq attribute, to identify the two least-frequent object to merge together.

running time: $O(n \lg n)$

Correctness of Huffman's algorithm
1. greedy-choice property


2. optimal-substructure property


详见书和 penultimate 笔记