Weeks 11: Countable And Uncountable Infinities
Chris Tralie
We're going to take a brief diversion and to talk about infinities, but this will help us to understand some of the limits of computation related to decidability. As it turns out, some infinities are bigger than others. To get at this, let's first think about a tool we can design to measure how big sets are. We formally define a bijection or a 1-1 correspondence between two sets A and B as a subset of the Cartesian product A x B so that every element of A shows up in exactly one tuple and every element of B shows up in exactly one tuple. Visually, a bijection is a bipartite graph between two sets in which exactly one edge is incident to every element. Below are two examples of bijections between the sets {a, b, c, d, e} and the sets {1, 2, 3, 4, 5}, as well as a non bijection between these two sets and a non bijection between {a, b, c, d, e} and {1, 2, 3, 4}:
{(a, 1), (b, 2), (c, 3), (d, 4), (e, 5)}This is a bijection |
{(a, 3), (b, 2), (c, 5), (d, 4), (e, 1)}This is a bijection |
{(a, 3), (b, 2), (c, 1), (c, 5), (d, 4), (e, 5)}This is not a bijection because c and 5 each show up in two tuples |
{(a, 3), (b, 2), (c, 4), (e, 1)}This is not a bijection because d does not show up in any tuples. |
Now we're ready to state a theorem that will help us measure the size of sets
Lemma: Relative Size of Sets
It is possible to devise a bijection between two sets A and B if and only if A and B are the same size.
This is pretty intuitive for finite sets, such as {a, b, c, d, e} and {1, 2, 3, 4, 5}. And it shows us that no matter how hard we try, we can't come up with a bijection between {a, b, c, d, e} and {1, 2, 3, 4} because there aren't enough elements in the second set to match with the first set. What's interesting is that this notion of size correspondence applies to infinite sets as well, and it allows us to draw much less obvious conclusions:
Countably Infinite Sets
Now let's talk about infinities of a very particular size, known as countable infinities
Example 1: Even Natural Numbers
Claim: The set of even natural numbers is the same size as the set of natural numbers
Proof: We can come up with the 1-1 correspondence (x, 2x), where x varies over all natural numbers. Every natural number shows up exactly once on the left side of the tuple, and every even natural number shows up exactly once on the right side. The table below shows the first few tuples in the 1-1 correspondence:
1 | 2 |
2 | 4 |
3 | 6 |
... | ... |
This is a bit surprising because it seems like there are fewer even natural numbers than natural numbers. But this is not the case when we think about all infinitely many of them. In fact, the even natural numbers are an instance of what are called countably infinite sets
Def. Countably Infinite Sets
A set is countably infinite if it can be put into a 1-1 correspondence with the natural numbers. In this case, we say it is the same size as the natural numbers.
Let's look at some more examples of countably infinite sets.
Example 2: Even Integers
Claim: The even integers are countable
Proof: We can use the 1-1 correspondence
\[ \left( x, (-1)^{x} (2 \lfloor \frac{x}{2} \rfloor) \right) \]
where ⌊ u ⌋ is the "floor function" of u, which rounds u down to the nearest integer. This expression is a bit nasty, so let's look at the first few entries in the table
1 | 0 |
2 | 2 |
3 | -2 |
4 | 4 |
5 | -4 |
6 | 6 |
... | ... |
From this pattern, we can see more easily that we pair all even natural numbers with even positive integers, and we pair the odd numbers x with the even negative integers -x-1. We'll use this sort of "zig zagging" in a slightly more complicated example next:
Example 3: Positive Rational Numbers
Claim: The positive rational numbers a/b, where a and b are natural numbers, are countable.
Proof: We can think of the positive rational numbers as a subset of the cartesian product of the natural numbers with themselves. Each element (a, b) in this set is identified with the rational number a/b. Some rational numbers are counted more than once this way, such as 1/4 and 2/8. But all rational numbers are covered.
To construct a bijection with the natural numbers, we could create the bijection (x, 1/x). But this will only cover the first row of the table. Instead, we should zig zag through the table starting at the upper left, skipping numbers whose reduced fractions we've already matched with a natural number. The animation below shows the first 31 such numbers in the bijection. Notice how we skip numbers whose reduced fraction have already been matched
This also seems like a surprising result since all of the natural numbers are in the first row of the rational numbers, but we were able to get creative enough to find a different bijection.
Example 4: Strings in An Alphabet / Strings in An Infinite Language
Claim: The set of the strings Σ^{*} in an alphabet Σ are countable.
Proof: We simply order the strings by lexographic order and put them into a correspondence with the natural numbers in this order.
As an example, suppose we have the alphabet Σ = {a, b, c}
1 | a |
2 | b |
3 | c |
4 | aa |
5 | ab |
6 | ac |
7 | ba |
8 | bb |
9 | bc |
10 | ca |
.. | .. |
From this it follow that any infinite language has countably many strings, since a language is a subset of all possible strings over an alphabet.
Example 5: Turing Machines
Finally we get to something related to theory of computation! Consider the set of all Turing machines as string encodings <M> (click here to review how Turing machine string encodings work). If we fix an input alphabet Σ and a tape alphabet Γ, and we number each state as q_{x}, where x is (for instance) a number in base 10, then we can describe all possible Turing machines that use this alphabet as strings in the following alphabet:
{q,0,1,2,3,4,5,6,7,8,9,L,S,R,,(,),{,},▢} ∪ Σ ∪ Γ
These Turing machine encodings form an infinite subset of all possible strings in this alphabet, which we just showed in the prior example are countable. We could enumerate a 1-1 correspondence for Turing machines by looking at all strings in lexographic order and skipping those that do not form valid encodings, much like we skipped the rational numbers that we would be double counting while zig zagging.
So far, it seems like we're able to put all infinite sets into bijection with the natural numbers. However, as we will see momentarily, this is not always possible. This means that there are sets that have more elements than all possible Turing machines
Example 6: Regular Languages
Let's look at one last example of a countable infinity: the set of all regular languages. The easiest way to prove that the set of all regular languages is countable is by noting that every regular language can be described by a regular expression, which is a string over some alphabet. So this is actually even easier to see than the above proof. However, as we will see, this argument does not work for languages in general.
Uncountably Infinite Sets
Anything infinite that's too big to be put into bijection with the natural numbers is known as uncountable. Let's consider such an infinite set
Example 1: The Set of Infinite Binary Strings
Claim: The set of infinite length binary strings is uncountably infinite
Proof: We proceed via contradiction by assuming that this set is countable. In the spirit of being overly generous, we'll allow ourselves to construct any bijection with the rational numbers. Take the first 10 elements of a random bijection, for example
j = 1 | j = 2 | j = 3 | j = 4 | j = 5 | j = 6 | j = 7 | j = 8 | j = 9 | j = 10 | ||
i = 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | ... |
i = 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ... |
i = 3 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | ... |
i = 4 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | ... |
i = 5 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | ... |
i = 6 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | ... |
i = 7 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... |
i = 8 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | ... |
i = 9 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | ... |
i = 10 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Let's focus on the elements along the diagonal of this table rendering of the bijection. In particular, let's consider the sequence of the i^{th} digit of infinite binary string associate to the i^{th} natural number, as shown in red in the table. We can now construct a binary string that never could have possibly been enumerated by taking the complement of this string. In the above example, the first 10 digits
0010100010...
would turn into
1101011101...
Since this string always disagrees with at least one character of every row, it could not possibly have existed in any rows. Hence, we've shown a way to construct a string that is missing in the bijection, regardless of the bijection, which is a contradiction. Hence, no possible bijection exists which puts the set of infinite binary strings into correspondence with the natural numbers. Hence, this set is uncountable
Intuitively, there's something about the fact that these binary strings are infinite in length that makes this set bigger than the set of all strings in an alphabet (which are finite).
Example 2: The Set of Languages
Now we arrive at some bad news for Turing machines
Claim: The set of all languages over an alphabet Σ is uncountable
Proof: Each language consists of a subset of all possible strings in the alphabet. We can describe each such subset as a unique infinite binary string. For example, consider the alphabet Σ = {0, 1} and the language of all binary strings with an even number of 0's. We let each digit of the binary string correspond to the i^{th} binary string in lexographic order. If this string is in our language, we'll put a 1 at that digit. Otherwise, we'll put a 0. Below is the beginning of this string
λ | 0 | 1 | 00 | 01 | 10 | 11 | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | ... |
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | ... |
We just showed that the set of all infinite binary strings is uncountable. Therefore, the set of all possible languages is uncountable.
Since the set of all Turing machines, on the other hand, is countable, this means that there are tons of languages that cannot even be recognized, let alone decided, by a Turing machine.
This is an instance of what's known as Cantor's diagonal argument. We will use a similar argument soon as the backbone behind proofs that certain languages are undecidable