Skim Section 4.1 but take a good look at Figure 4.10.
Read the beginning of Section 4.2, up to but not including "The Diagonalization Method."
Section 4.1 is long. It shows:
Our focus for today is the universal TM $U$ defined at the beginning of the proof of Theorem 4.11. It is a TM that can read in the description of another TM and simulate it. For some reason, the book deemphasizes this idea. It's significant not only because of its relevance to Theorem 4.11, but also because it gave rise to (or at least anticipated) the idea of a stored program computer.
First, we have to show how to encode a TM as a string. This is easy to do and the details are not important, but it's worth at least mentioning one way to do it for concreteness. Turing's original encoding used an alphabet $\{\mathtt{A}, \mathtt{C}, \mathtt{D}, \mathtt{L}, \mathtt{R}, \mathtt{N}\}$. If the states in $Q$ are numbered $q_1, q_2, \ldots$, then state $q_i$ is encoded as $\mathtt{DA}^i$. If the symbols in $\Gamma$ are numbered $a_0, a_1, \ldots$, where $a_0$ is the blank symbol, then symbol $a_i$ is encoded as $\mathtt{DC}^i$. Then, the transition $\delta(q_i, a_j) = (q_k, a_\ell, \textrm{L})$ is encoded as $\mathtt{DA}^i \mathtt{DC}^j \mathtt{DC}^\ell \mathtt{L} \mathtt{DA}^k$, and similarly if the move is R or N. We also need some convention for indicating the start, accept, and reject states: perhaps $q_1$ is always the start state, $q_2$ the accept state, and $q_3$ the reject state.
Second, we have to show the universal TM itself. It is often constructed as a TM with three tapes:
An implementation-level description would be: On input $\langle M, w\rangle$, where $M$ is a TM and $w$ is a string:
There's a cottage industry of seeing who can make the smallest universal TM. The current record holder, due to Rogozhin, has only a couple dozen instructions.
Read Section 4.2 and, for fun, the poem "Scooping the Loop Snooper: A proof that the Halting Problem is undecidable."
The book has a long digression about countable and uncountable infinities. If you are not familiar with these concepts and Cantor's diagonalization argument, pay careful attention to this part of the section, even though it may not seem relevant at first. The point of this digression is that the vast majority of languages are not Turing-recognizable (or decidable).
The next subsection gives an actual example of an undecidable language. In the 2nd edition of the book, this subsection (beginning on page 179) is misleadingly called "The Halting Problem." But the language $A_{\mathsf{TM}}$ is not the halting problem, and this subsection was retitled "An Undecidable Language" in the 3rd edition.
What would it mean for $A_{\mathsf{TM}}$ to be decidable? If $A_{\mathsf{TM}}$ were decidable, then that would mean we could simulate any non-decider TM in such a way that looping could be avoided (as is the case for PDAs).
The proof that $A_{\mathsf{TM}}$ is undecidable is a mind-bender. Perhaps this version of the proof is conceptually easier (similar to the proof in Hopcroft and Ullman).
Suppose that there exists a decider $H$ that decides $A_{\mathsf{TM}}$. Then we can imagine a big table of all possible outcomes of $H$ (assuming an alphabet $\{\mathtt{0}, \mathtt{1}\}$ for illustration's sake):
$\varepsilon$ | $\mathtt{0}$ | $\mathtt{1}$ | $\mathtt{00}$ | $\cdots$ | |
---|---|---|---|---|---|
$M_1$ | reject | accept | reject | accept | |
$M_2$ | reject | accept | accept | reject | |
$M_3$ | accept | reject | accept | reject | |
$M_4$ | reject | reject | reject | accept | |
$\vdots$ |
We should be able to compute the value of any cell of this table since $H$ is supposedly a decider. Therefore, we should be able to compute a function $D$ that is the opposite of every diagonal entry of this table. But this is a contradiction, because $D$ must itself have a row in the table, yet it differs in at least one column from every row in the table.
More precisely, let $M_i$ be the $i$th machine in some ordering of Turing machines, and let $w_i$ be the $i$th string in some ordering of strings in $\Sigma^\ast$ (page 14). Then $D$ is defined as follows. On input $w$,
This new TM $D$ must be different from every single row of the table, which is a contradiction, because we said that the table contains every possible TM. We conclude that $H$ does not exist, and $A_{\mathsf{TM}}$ is undecidable.
To modify this version of the proof into the version in the book, observe that we can label the columns of the table with whatever strings we want as long as there's an infinite number of them. So label $i$th column with the encoding of the $i$th machine, that is, $w_i = \langle M_i \rangle$. The fact that this labeling leaves out some strings is not a problem. Then $D$ is defined as follows. On input $w$,
Again, this new TM $D$ must be different from every single row of the table, which is a contradiction.
We can also demonstrate the argument in any programming language, here Python. Assume that we have a function called accepts
that takes two arguments, function
and input
, and supposedly can decide whether function
accepts input
(even if function
does not halt on input
).
def accepts(M, w):
"""Supposedly returns True if M(w) is True
False if M(w) is False or does not halt."""
# Note: this code is nonsense. You can replace it with whatever you want.
return (hash(M.__code__.co_code) + hash(w)) % 2 == 0
Then we can design a function and an input that "breaks" accepts
.
def D(w):
if callable(w): # if w is a function,
return not accepts(w, w)
else:
return False # arbitrary return value for non-callables
w_bad = D
Let's test it out! When we use accepts
to test whether D
accepts w_bad
, we get:
accepts(D, w_bad)
But the correct answer is:
D(w_bad)
The last subsection exhibits a language that is not even Turing-recognizable. It is, frankly, not that interesting. The more important thing we learn here is Theorem 4.22,
\begin{align*} \text{decidable} &= \text{Turing-recognizable} \cap \text{co-Turing-recognizable}. \end{align*}