Week 1: Introduction and background

About these notes

These notes are not a replacement for the (excellent) textbook. They are partly my commentary on the textbook and partly supplementary material that isn't found in the book. Please use them in conjunction with the textbook.

Tuesday reading

Read 0.1, which is an overview of the whole course.

Tuesday class

Welcome to Theory of Computing! I'd like to introduce the course with a game called Poco. Play a few levels to familiarize yourself with how the game works. How would you write a computer program to play this game automatically?

Computable and uncomputable

In order to answer this question, we need to ask a more basic question: What is a computer? Historically, the answer to this question came from a totally different direction. Although people had built calculating machines, like Leibniz's stepped reckoner or Babbage's difference engine, the question was not "what is a computer," but "what does it mean to compute?" For in 1936, everyone knew what a computer was. Just as a printer is a person who prints, and a dishwasher is a person who washes dishes, a computer in 1936 was a person who computes.

For example, when you do arithmetic or solve a system of linear equations or find a derivative, you're computing -- you're following an algorithm that you learned in school. But some math requires more ingenuity, like writing proofs, and in the early 20th century, mathematicians were starting to wonder whether there was an "effective procedure" (we would say, an algorithm) for writing proofs too. The theory of computing developed as the answer to these questions; computing machines were only somewhat incidental.

In 1936, Turing published the paper that many consider to be the founding document of computer science. His answer to the question about what was meant by an "effective procedure" was not the first, but it was the most convincing. He imagined a computing person in an idealized scenario:

  • He has an infinitely long paper tape, divided into squares.
  • He can write one symbol in each square, and the number of possible symbols is finite (e.g., 0 to 9).
  • He can only look at a finite number of squares at a time.
  • He can only move a finite distance at a time.
  • He has only a finite number of “states of mind.”

He then imagined building machines (which he called a-machines, but which we now call Turing machines) that would do the same thing automatically.

Turing machines have been constructed out of wood or Legos, but they were really invented just as a mathematical construct, a way of formalizing what it means to compute. Nevertheless, Turing machines are important for computer science because any current digital computer (under some reasonable assumptions) can be emulated by a Turing machine. For example, it's possible to compile C programs into Turing machines. We will further argue that even computers of the future can be emulated by Turing machines. If you accept these arguments, then Turing machines serve as our definition of what a computer is: if something can be done on a computer, it can be done on a Turing machine; if something cannot be done on a Turing machine, it cannot be done on any other computer.

Under this definition, we'll be able to prove that there are problems that no Turing machine can solve, and therefore that no program written for any current digital computer can solve. Automatically solving "Poco" (more commonly known as the Post Correspondence Problem) is one of these.

Restricted computation

Even though they're so simple, Turing machines are very powerful -- too powerful, it was felt by some. One annoying property they have is that they can go into an infinite loop, something that you've no doubt experienced both as a computer user and a computer programmer. So theoretical computer scientists became interested in restricted machines which are in some sense better behaved. In the first half of the course, we'll look at two important ones: finite automata and pushdown automata.

Tractable and intractable

Our investigation of Turing machines and models equivalent to it will lead to a reasonable definition of what “tractable” means, namely, those that can be solved in polynomial time. In Unit IV, we’ll see that there is a large class of problems, called NP-complete, that are interesting and useful but don’t seem to be tractably solvable. One of them is the bounded version of Post's Correspondence Problem: If I give you a set of tiles and tell you that the solution uses at most $n$ (copies of) tiles, can you find the solution? Probably the most well-known NP-complete problem is the Traveling Salesman problem: If you have to visit $n$ cities, what is the shortest way to visit all of them exactly once? I say that these problems don't seem to be tractably solvable, because no one knows if they are. The amazing thing is that if you were to discover a tractable solution for one of them, then you would have a solution for all of them!

Wednesday reading

Skim Sections 0.2-4, which cover mathematical preliminaries that you should have gotten in Discrete Math. If anything seems unfamiliar to you, study it a little more carefully. The subsection "Strings and Languages" is especially important (and surprisingly short); we'll be focusing on it in class.

Thursday class

In this course we want to study all the things that a computer can do. But there are so many kinds of things that computers can do, and it would seem that we have to study each kind separately. But maybe we can reduce them all to one kind of problem, and focus our study on that one kind of problem. Namely, we want to treat every kind of object as a string of symbols (e.g., bits), and we want to treat every kind of computational problem as a yes/no question about strings.

People are fairly comfortable these days with the idea that any kind of object we might want to compute about can be represented as a string. We deal every day with messages, music, pictures, movies, books, etc., in digital form, with the awareness that these are just strings of 0’s and 1’s.

Less obvious, perhaps, is that it makes sense to reduce all kinds of computations down to yes/no questions about strings. For one, our experience with computers today is highly interactive. But user-computer interaction can be thought of as a sequence of inputs and outputs.

Second, the output from a computer is much more than just a "yes" or "no" answer. But if you have any function $f(x)$ that takes an input $x$ and outputs another string, we can think of it as the related function $f'(x,y)$ that takes an input $x$ and a possible output $y$, and returns "yes" iff $f(x) = y$. Then you would use $f'$ by calling it on all possible outputs $y$ until you found one such that $f'(x, y)$ is "yes".

Caveat: Using $f'$ in this way might be much slower than calling the original $f$. This will matter when we get to Unit IV, and we'll deal with this issue when the time comes.

Strings

An alphabet is a nonempty finite set of symbols. We use $\Sigma$ (uppercase sigma) or sometimes $\Gamma$ (uppercase gamma) to stand for an alphabet.

A string is a finite sequence of symbols. We usually use $w$ for a variable that stands for a string and $u,v$ or $x,y,z$ when we need more variables.

Alphabet symbols are analogous to characters in most programming languages, and strings are analogous to strings. But the notation is different:

  • In (say) Python, the constant 'a' and the variable a are two different things. In Theory, this distinction is just as important, but not as clearly reflected in the notation. Sipser writes actual symbols in typewriter font, like $\texttt{a}, \texttt{b},$ etc., but variables standing for symbols in italics, like $a, b$, etc. We will always follow this convention, but not all authors do.

  • We don't distinguish between symbols and length-1 strings. (This is the same as Python, but unlike most other programming languages.)

  • Instead of "", we write the empty string as $\varepsilon$ (epsilon, also written as $\epsilon$). This may be confusing at first because it looks like a symbol, but it is not.

Question. (This question and all questions below are not for credit; they will be discussed in class.)

  • Is $\varepsilon$ a substring of $\texttt{abc}$?
  • What is $|\varepsilon\varepsilon\varepsilon|$?

Languages

A set of strings is called a language. We write $\Sigma^\ast$ ("sigma star") for the language of all strings over $\Sigma$. We write $\emptyset$ (empty set) for the empty language.

Question. True or false? \begin{align} \varepsilon &\in \{\mathtt{abc}\} \\ \varepsilon &\in \Sigma^\ast \\ \{\texttt{abc}\} \cup \emptyset &= \{\texttt{abc}\} \\ \{\texttt{abc}\} \cup \{\varepsilon\} &= \{\texttt{abc}\} \end{align}

Question. Is there such a thing (according to the definitions above) as

  • an infinite alphabet?
  • an infinite string?
  • an infinite language?

Question. Prove that the length of strings in a infinite language is unbounded. That is, if $L$ is infinite, there does not exist an $N$ such that for all $w \in L$, $|w| \leq N$.

Operations on strings often induce analogous operations on languages. For example, if $L$ and $L'$ are languages, \begin{align} L^R &= \{ w^R \mid w \in L \} \\ L \circ L' = L L' &= \{ w w' \mid w \in L, w' \in L'\}. \end{align}

For example, if $L = \{\texttt{a}, \texttt{b}\}$ and $L' = \{\texttt{c}, \texttt{d}\}$, then $L \circ L' = \{\texttt{ac}, \texttt{ad}, \texttt{bc}, \texttt{bd}\}$.

Question. According to the above definition, what is

  • $\emptyset \circ \{\texttt{a}\}$
  • $\{\varepsilon\} \circ \{\texttt{a}\}$

The $\ast$ operator that we saw above, known as the Kleene star, can be applied to any language. That is, $L^\ast$ is defined as the smallest language such that:

  • $\varepsilon \in L^\ast$.
  • If $v \in L$ and $w \in L^\ast$, then $vw \in L^\ast$.

For example, if $L = \{\texttt{aa}, \texttt{bb}\}$, then $L^\ast = \{\varepsilon, \texttt{aa}, \texttt{bb}, \texttt{aaaa}, \texttt{aabb}, \texttt{bbaa}, \texttt{bbbb}, \texttt{aaaaaa}, \ldots\}$.

Question. According to the above definition, what is

  • $\emptyset^\ast$?
  • $\{\varepsilon\}^\ast$?

Above, we said that we treat every computational problem as a yes/no question about strings. Another way of saying this is that we treat every computational problem as a language (does string $w$ belong to language $L$?). For example, the problem of testing whether a number is prime could be treated as the language $\{w \in \{\texttt{0}, \ldots, \texttt{9}\}^\ast \mid \text{the number with decimal representation $w$ is prime}\}$.

Language classes

Since some problems are harder than others, it follows that some languages are harder to recognize than others. Some languages can be recognized by a Turing machine and some can't. Some can be recognized by a Turing machine in polynomial time ($O(n^k)$ time for some $k$) and some can't. So we can talk about sets of languages, commonly called language classes. Be careful not to confuse languages and language classes! (It's easy to do.)

Question. Is the class of all finite languages finite? Is the language class $\{\Sigma^\ast\}$ finite or infinite?

There are only a handful of language classes we will learn this semester. They are: Overview of language classes Each of the six boxes is a language class, with the name of the class written above; inside each box are kinds of machines that accept that language class. Each class is a strict superset of the previous one, except NP is not known to be a strict superset of P.