Set Theory Jottings 3. The Paradoxes

Prev TOC Next

Frege added an appendix to volume II of his 1903 magnum opus Grundgesetze der Arithmetik (Foundations of Arithmetic). It began:

A scientist can hardly meet with anything more undesirable than to have the foundations give way just as the work is finished. I was put in this position by a letter from Mr. Bertrand Russell when the work was nearly through the press.

In a letter to his wife, Russell wrote:

I have heard from Frege, a most candid letter; he says that my conundrum makes not only his Arithmetic, but all possible Arithmetics, totter.

Frege regarded Cantor’s “acts of abstraction” as irredeemably vague. He based his development on the “extension of a concept”, a basic and unproblematical logical notion—or so he thought.

The extension of a concept is the class of everything that falls under that concept. If we have the concept “trio” for a group of three things, then the extension of this concept is the class of all trios. In other words, the set of all sets with exactly three elements. That’s the Frege definition of the number three: the class of all trios. The circularity is only apparent. You can define the concept “trio” by saying, “M is a trio if it contains distinct elements a, b, and c, and every element of M is either a or b or c.” More generally, Frege defined the cardinality of M as the set of all sets equivalent to M.

Russell had the same idea. But then he came up with the famous Russell “set”:

R is the set of all sets that are not elements of themselves, i.e., R={X:XX}. So RR if and only if RR.

It seems “extension of a concept” is problematical after all.

Russell came up with his set by examining Cantor’s diagonal argument, as applied to an arbitrary set X. This shows that any map f:XP(X) is not onto. (P(X) is the power set of X.) We let D={t∈X: tf(t)}. D cannot be in the image of f: if we had f(d)=D, then dD if and only if df(d)=D. (If you let X=ℕ, and treat subsets of ℕ as bitstrings, and write down a matrix of bits with the n-th row being f(n), you’ll see that this really is the diagonal argument.)

Now let X be the set of all sets. Cantor’s proof shows us that \overline{\overline{X}}<\overline{\overline{P(X)}}. But P(X) is a subset of X! If we let f be the identity map f(t)=t, then D becomes the Russell set.

This is known as the Paradox of the Largest Cardinal. As a variation, we could look at the sum of all cardinals—Cantor defined the sum of an arbitrary set of cardinals. If the sum of all cardinals is 𝔪 then clearly 𝔪 is the largest cardinal, contradicting 𝔪<2𝔪.

Zermelo had also discovered the Russell paradox slightly before Russell (although he did not publish it). But strangely enough, Cantor already discovered the Paradox of the Largest Cardinal. And it didn’t bother him that much! To quote Dauben, “Cantor … regarded the antimonies as positive results which fully complemented the advance of his research.” He seems to have unconsciously anticipated the adage, “Never let a crisis go to waste!”1

Cantor also discovered the Paradox of the Largest Ordinal, nowadays named after Burali-Forti because it can be inferred from an article he wrote, published in 1897. The collection of all ordinals (Cantor called this Ω) is well-ordered. Thus its order-type (\overline{\Omega} in Cantor’s notation) is the largest ordinal. But then what about \overline{\Omega}+1?

In a letter to Dedekind in 1899, Cantor expressed the paradox as a theorem:

Theorem A: The system Ω of all (ordinal) numbers is an absolutely infinite, inconsistent collection.

Shades of the “proper classes” of axiomatic set theory!

Cantor offered various elucidations for the term set (collection, multitude) over the years, none fully satisfactory:

By an “aggregate” or “set” I mean generally any multitude which can be thought of as a whole, i.e., any collection of definite elements which can be united by a law into a whole. [Foundations of a General Theory of Sets, 1883]

A set is indeed completely restricted in that everything belonging to it is determined in itself and is fully distinguished from all those not belonging to it. [Communications on the Theory of the Transfinite, 1887]

By a “set” we mean any collection M into a whole of definite, distinct objects m (which are called the “elements” of M) of our perception or of our thought. [Contributions to the Founding of Transfinite Set Theory, 1895]

As logically precise definitions, these don’t cut it. In the axiomatic approach, “set” is an undefined term. Munkres’ quote earlier (basically we all know what it means) shares the same outlook.

But one aspect of Cantor’s descriptions does stand out: the collection “can be thought of as a whole”. The set is united into a single object. For Cantor, the collection of all sets, or of all cardinal or all ordinal numbers, is too vast for the human mind to grasp. In a second letter to Dedekind, Cantor wrote:

A collection can be so constituted that the assumption of a “unification” of all its elements into a whole leads to a contradiction, so that it is impossible to conceive of the collection as a unity, as a “completed object”. Such collections I call absolutely infinite or inconsistent collections.

Cantor also associated the Absolutely Infinite with God. He corresponded with Catholic theologians about this. Dauben’s biography has much more on the topic.

Cantor’s “Theorem A” provided a proof that every set M can be well-ordered, at least in his mind. We sketched the argument above: just keep transfinitely picking new elements from M to pair with ordinals. If mβ has been assigned for all β<α, pick an unassigned element if there is one, and call it mα. If this process went on without ever exhausing M, then M would be an inconsistent collection, like Ω. So for any set, the process eventually well-orders it.

Cantor wasn’t completely satisfied with this argument; he asked Dedekind if he could offer a better one. Dedekind never did.

Let’s compare Cantor’s treatment of Theorem A with his introduction of ℵ1. Cantor defined the second number class (which he denoted Z(ℵ0)) as the set of all countably infinite ordinals. He showed that Z(ℵ0) is well-ordered, and therefore its order-type \overline{Z(\aleph_0)} is an uncountable ordinal—but every ordinal less than \overline{Z(\aleph_0)} is countable. From this it quickly follows that the cardinality \aleph_1=\overline{\overline{Z(\aleph_0)}} is the next cardinality after ℵ0.

If we remove the word “countable” from all this, we get Ω instead Z(ℵ0), and a paradox instead of ℵ1. Somehow, Cantor’s second principle of generation legitimizes Z(ℵ0) as a “consistent set”, but not Ω. Cantor never explains why Z(ℵ0) is hunky-dory but Ω remains beyond the pale.

While Cantor saw a silver lining in the paradoxes, and maybe even God, other mathematicians were less welcoming. Famously the intuitionists proposed a complete overhaul, banning non-constructive existence proofs and throwing the law of the excluded middle into the trash.

Burali-Forti did not call his result a paradox; Russell was the first to do so  (Moore & Garciadiego). Initially Russell decided that the class of all ordinals was not well-ordered, even though he granted that every initial segment was. Later he came to other conclusions.

Bernstein’s resolution: if W is the set of all ordinals, and b is an element not in W, then W∪{b} is not well-ordered. Although Berstein was okay with W∪{b} as a set, he denied that you could say b follows all elements of W. This bizarre solution, and Russell’s first attempt, shows how confusing the issues appeared at the time.

So the dawn of the 20th century bedeviled set theory with three paradoxes: the Largest Cardinal, the Largest Ordinal, and Russell’s.

Russell and others focused their attention on the implicit circularity. This led to Russell’s Theory of Types, expounded at three volume length in the joint work (with Alfred Whitehead) Principia Mathematica. In simple type theory you have a base level of objects of type 0. Sets of type 0 objects are of type 1. Sets of type 1 objects are of type 2, etc.

Bluntly put, Principia Mathematica is a clumsy and clunky mess. Technical reasons partly bear the blame. For example, “mixing levels” is forbidden. As a result, each type has its own copy of the positive integers and other standard mathematical objects.

But Principia Mathematica goes overboard in its phobia of even a hint of circularity. Russell (and also Poincaré) campaigned against so-called impredicative definitions. In 1908, Russell asserted the Vicious Circle Principle:

Whatever involves all of a collection must not be one of the collection.

That would rule out Russell’s “set”. But it would also rule out an innocuous definition like:

The smallest positive integer that is not the sum of two squares.

Impredicative definitions occur frequently in mathematics. For example, the least upper bound of a set is defined in terms of the collection of all upper bounds. The subfield of a field F genererated by a subset XF may be defined as the intersection of all subfields containing X.

To avoid the Vicious Circle Principle, Russell and Whitehead changed simple type theory into ramified type theory. This proved so unworkable that they had to add an Axiom of Reducibility to Principia Mathematica. Essentially this says, “Ignore everything we’ve said about impredicative definitions, they’re OK!” Nowadays the market for Principia Mathematica mainly consists of historians, philosophers, and students looking for a thesis topic.

[1] To the best of my knowledge, this quote comes from the politician Rahm Emanuel, but is often misattributed to Winston Churchill.

Prev TOC Next

8 Comments

Filed under History, Set Theory

8 responses to “Set Theory Jottings 3. The Paradoxes

  1. “The smallest positive integer that is not the sum of two squares.” This strikes me as involving all of the collection (integers) to figure out which number it refers to, but that doesn’t mean that number itself (say, defined from its numeral, not by this definition) involves all of the collection.

    That is distinct from something like “the set of all sets”, unless (by analogy with how we interpret the first definition above) you just say “sorry, there is no set like that”.

    So I don’t see how anyone can object to “The smallest positive integer that is not the sum of two squares” as a way of pointing to some integer once the whole collection of them is already present. In particular it does not sound to me like it violates that Vicious Circle Principle, as stated.

    • You raise a couple of interesting points.

      The quote I gave is part of a longer passage (see below). Russell and Whitehead intended to wield the principle against definitions, not entities. Even if one accepts the application in the “two square” case, it would outlaw only the definition I mentioned, not deep-six the number 3.

      Russell had in mind things like the Berry Paradox. This definition

      the least natural number not definable in the English language by a phrase using fewer than thirty-three syllables

      apparently defines a natural number using only thirty-two syllables. It implicitly refers to the totality of English language definitions—but it’s a member of that totality.

      In Principia Mathematica, Russell and Whitehead write:

      An analysis of the paradoxes to be avoided shows that they all result from a kind of vicious circle. The vicious circles in question arise from supposing that a collection of objects may contain members which can only be defined by means of the collection as a whole. Thus, for example, the collection of propositions will be supposed to contain a proposition stating that “all propositions are either true or false.” It would seem, however, that such a statement could not be legitimate unless “all propositions” referred to some already definite collection, which it cannot do if new propositions are created by statements about “all propositions”. We shall, therefore, have to say that statements about “all propositions” are meaningless. … The principle which enables us to avoid illegitimate totalities may be stated as follows: “Whatever involves all of a collection must not be one of the collection”; or, conversely: “If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total.” We shall call this the “vicious-circle principle”, because it enables us to avoid the vicious circles involved in the assumption of illegitimate totalities.

      This brings us to the second point: definition vs. creation. In the not-so-vicious cases of such definitions, the totality, with all its elements, already exists. The definition merely identifies the element; it doesn’t create it. I think that’s what you were getting at.

      So does the “totality of all sets” already exist? A fraught question. Good place to stop.

      • Yes, that clarifies it — the key word is “only” in “members which can only be defined by means of the collection as a whole”. Once the collection is already there, using all of it to pick out one of its members is no problem. Trying to “construct” one of its members using the whole, is a different matter.

        “Definition vs. creation” — probably we are running up against two distinct meanings of “to define something” — “create” or “construct”, vs “find” or “point out” or “refer to”. (Though as a Platonist, it might be more accurate to say our creating-definition “describes how God could have created something” rather than saying it “creates something”.)

      • Yes. A small nit. I wouldn’t say that “only” is the key word, unless one implicitly accepts the equation “definition=creation”. Say we have a definition of a real number in terms of the set ℝ of all real numbers, and every definition for this real number refers to ℝ. But the definition still identifies a unique real number. (Maybe it’s the least upper bound of some well-defined set.) If one believes that ℝ is “already there” in its entirety, then the definition is okay, although it violates the vicious circle principle, even when expressed with the word “only”.

      • That’s a really good point! I guess what makes it ok in your example is there is already some sort of proof (or axiom) which says “anything defined (picked out, not created) in this particular way, exists”. But yes, that means it’s harder to figure out how to state exactly what is not ok, than I thought.

      • Probably the most popular formal answer to what’s ok and what’s not, is ZFC. The most popular answer overall is, “don’t worry, be happy (doing math)”. “Shut up and calculate”, as David Mermin said in another context. Most math doesn’t get close to the guardrails, at least if you reject the vicious circle principle.

        I’ll be saying a lot more about ZFC as this series continues.

        People have proposed other solutions, of course. Quine had something called NF (for New Foundations), that tries avoid circularity in a smoother way. Russell had what he called his “no class theory”—just propositions, no classes. I don’t know much about it.

        For those into category theory, Tom Leinster just finished posting a course using Lawvere’s ETCS. It’s on my To-Read list.

        Oh yes, about Platonism: “fake it or make it”. That is, one can reject it philosophically and still adopt it as a manner of speaking. (What I’ve called the Platonic style.) But you know that already from our old conversation with John.

      • I remembered that Feferman wrote a paper called, simply Predicativity. It traces the history of the concept, starting with Russell, Poincaré, and Weyl. He also gives a fuller discussion of the Axiom of Reducibility.

      • Thanks, I’m reading that now and it’s interesting. I got to the impredicative Axiom of Separation in ZF on page 12; if I understand correctly, it would fall to Russell’s Paradox if there was a “truth predicate”, but it doesn’t mind there being only a “provability predicate”.

Leave a reply to Michael Weiss Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.