Ernst Zermelo is remembered today chiefly for two results. His 1904 paper “Proof that every set can be well-ordered” introduced the Axiom of Choice. His 1908 paper “Investigations in the foundations of set theory” led to the most popular axiomatization of set theory. He thus claims credit for two of the letters of ZFC: Zermelo-Fraenkel with Choice.
He had more arrows in his quiver. His dispute with Boltzmann clarified the foundations of statistical mechanics. The editors of his Collected Works divided them into two volumes, with the second devoted entirely to physics and to the calculus of variations. Volume I, mainly about set theory and logic, also contains a paper on transcendental numbers and another on primes—and also a translation of a passage from Homer’s Odyssey!
This section looks at the well-ordering theorem. We’ll look at the Zermelo Axioms later.
In 1904, at the Third International Congress of Mathematicians, Julius König gave a “proof” that the real numbers cannot be well-ordered. By one account, Zermelo ferreted out the error within a day. (See Moore, pp.86–87, for the full story.) Less than two months after that, he had his proof of the well-ordering theorem (with assistance from Erhard Schmidt). (I will change some of Zermelo’s notations, and add some of my own.)
The axiom of choice is used in the following form: if M is a nonempty set, then there is a choice function γ defined for all nonempty subsets M′⊆M with γ(M′) being an element of M′:
γ:𝒫(M)∖{∅}→M; γ(M′)∈M′
Say (M′,≺) is a well-ordered set with M′⊆M (≺ is irreflexive). Let L≺(m)={x∈M′: x≺m}, the segment determined by m. We say M′ is a γ-set if:
For all m∈M′, γ(M∖L≺(m))=m.
Note that m is the smallest element of M′∖L≺(m). It is crucial that γ is applied to M∖L≺(m), and not to M′∖L≺(m), in this equation.
Zermelo notes that the singleton {γ(M)} is a γ-set, so γ-sets exist. Next he shows that if (M′,≺′) and (M″,≺″) are two γ-sets, then they are identical or one is a segment of the other, with the same ordering. Transfinite induction provides the juice for this. (Remember, we can use transfinite induction on any well-ordered set.)
The intuition for this resembles Cantor’s argument for well-orderability: choose an element mβ of M for each ordinal β<α. Having done that, if this has exhausted M, we’re done. If not, chose an element mα of M∖{mβ:β<α}. Cantor’s notion of “inconsistent” multitudes insures that we eventually exhaust M. In Zermelo’s proof, γ(M) chooses the first element of any γ-set. Call it m0. Thus {m0} is the only singleton γ-set. Next, γ(M∖{m0}) (=m1, say) chooses the next element of any other γ-set, so {m0,m1} is the only two-element γ-set. Continuing, we set γ(M∖{m0,m1})=m2. Etc., where “etc.” means “Use transfinite induction on (M′,≺′)”. The function γ makes all the choices in advance, all at once.
Zermelo appeals to one of Cantor’s results, that given any two well-ordered sets, one is order-equivalent to a segment of the other, or they are order-equivalent. (Proof at the end of this section.) Suppose that M′ is the set with an order-preserving mapping (call it f) into M″. We claim that f is the identity mapping. Suppose m′ is an element of M′, and for all x≺′m′, f(x)=x. Let f(m′)=m″; if we can show that m′=m″, then transfinite induction finishes the job. The segments of (M′,≺′) and (M″,≺″) determined by m′ and m″ are identical: L≺′(m′)=L≺″(m″). Since M′ and M″ are γ-sets, γ(M∖L≺′(m′))=m′ and γ(M∖L≺″(m″))=m″. As just noted L≺′(m)=L≺″(m″), and therefore m′=m″. (This is why we had γ(M∖L≺(m)) above, rather than γ(M′∖L≺(m)).)
To finish the proof, Zermelo takes the union of all the γ-sets; he calls this Lγ. Any two orderings ≺′ and ≺″ are compatible; in fact, one is a restriction of the other. Using this he now easily defines a relation ≺ on all of Lγ, and shows that it’s a well-ordering. Finally, if Lγ is a proper subset of M, then tacking γ(M∖Lγ) onto the end of Lγ gives a γ-set properly containing Lγ. Contradiction. So Lγ=M, and M is well-ordered. QED
Again we have a striking contrast with Cantor’s argument. Zermelo’s proof unfolds entirely inside 𝒫(M). No mention of “inconsistent, absolutely infinite” multitudes. We keep the Largest Ordinal Paradox firmly off-stage.
Zermelo’s paper ignited a firestorm of controversy. The objections fell into three categories: (i) fear of the Largest Ordinal (Burali-Forti) Paradox; (ii) complaints about impredicative definitions; (iii) rejection of the Axiom of Choice.
(i) Fear of the Largest Ordinal Paradox. First, a comment on notation. The participants in the debate used W for the “set” of all ordinals, so I will do the same.
Recall the paradox: W is well-ordered, so its order type W would be the largest ordinal, and would have to belong to W. Already this is a contradiction if one examines it closely, but it becomes more obvious when we look at W+1. Felix Bernstein and Arthur Schoenflies both noted the similarity between W and the Lγ of Zermelo’s proof. The paradox arises from tacking one more element onto the end of W; the final step of Zermelo’s proof does the same to Lγ.
I noted that Zermelo’s argument plays out inside the (presumably) safe arena of 𝒫(M). But Zermelo did not emphasize this point, and his axioms for set theory lay four years in the future. In 1904, things remained murky. Recall Bernstein’s solution to the paradox: W is a set, and so is W∪{b}, but you can’t say that b comes after all the elements of W. Bernstein also argued that the power set of W could not be well-ordered, appealing to Cantor’s result its cardinality would be greater than W’s. Given the unsettled state of set theory, it seemed legitimate to worry that the Largest Ordinal Paradox might infect Zermelo’s proof.
(ii) Complaints about impredicative definitions. Poincaré objected to Zermelo’s Lγ for subtler reasons: it was defined impredicatively. Lγ is the union of elements of a class (the γ-sets), of which it is a member. We encountered Russell’s Vicious Circle Principle in post 3; Poincaré held similar views.
Some context illuminates Poincaré’s position. A bit before Poincaré’s article appeared, Jules Richard published a paradox about definable real numbers. List all definitions of real numbers in some language. There are clearly only countably many. Now diagonalize to get an undefinable real. But you can elaborate this paragraph into a definition of the number!
Richard noted that his definition of the real number depends on the notion of “definition of a real number”. This implicit circularity makes the very notion of “valid definition” ambiguous. Jumping ahead several decades, I note that in a formal system like first-order logic, you can give a precise definition of “definition”. But (if the system is consistent), you can’t carry out the diagonalization within the system.
Poincaré recognized that outlawing impredicative definitions would resolve not only Richard’s paradox, but Burali-Forti and others as well. So he found Zermelo’s proof fatally flawed.
Curiously, Russell did not object to Zermelo’s proof because of the Vicious Circle Principle, but because of the Axiom of Choice.
(iii) Rejection of the Axiom of Choice. In his 1904 paper, Zermelo did not call his assumption the Axiom of Choice. He introduced it this way:
Imagine that with every subset M′ there is associated an arbitrary element m1′ that occurs in M′ itself; let m1′ be called the “distinguished” element of M′… The number of these [functions from 𝒫(M) to M] is equal to the product ∏ 𝔪′ taken over all subsets M′ and is therefore certainly different from 0.
[I have modernized terminology in the part in brackets.]
Several mathematicians disagreed. This proved the most durable objection to Zermelo’s argument.
Borel fired the first salvo. In a brief solicited article in Mathematische Annalen, he wrote that Zermelo had shown only the equivalence of two problems: well-ordering a set, and making all those choices. But for the latter,
one needs a means, at least a theoretical one, for determining a distinguished element m′ from an arbitrary subset M′; and this problem appears to be one of the most difficult, if one supposes, for the sake of definiteness, that M coincides with the continuum.
And making uncountably many arbitrary choices was, he declared, “outside mathematics”.
Hadamard wrote a letter to Borel, defending Zermelo on all points. Further exchanges of letters brought Baire and Lebesgue into the debate, on the anti-Zermelo side. In 1905, the letters appeared in the Bulletin de la Société Mathématique de France.
Lebesgue concisely summed up the issue:
The question comes down to this, which is hardly new: Can one prove the existence of a mathematical object without defining it?
And he invoked the ghost of Kronecker.
Russell independently came up with the Axiom of Choice, which he called the Multiplicative Axiom. His concern was not well-ordering, but cardinal arithmetic. Suppose card(Mk)=𝔪k for all k in some index set K. The definition of the product ∏k∈K 𝔪k is the cardinality of the cross-product ∏k∈K Mk. If none of the 𝔪k=0, is ∏k∈K 𝔪k necessarily non-zero? In a letter to Jourdain, Russell wrote
At first I thought probably a proof [that the product is non-zero] could easily be found; but gradually I saw that, if there is a proof, it must be very recondite.
Russell soon became more skeptical, without outright rejecting the axiom. To the philosopher Couturat he wrote that the assumption “is questionable. I have been trying to prove it for a long time, without the least success.” And later: “I do not know if it is true or false, and I find it too complicated for a [postulate]… one should hardly admit an axiom that is so complicated and so dubious.” Russell wrote to G.H. Hardy that the assumption “may be true, but I know of no reason for believing it. The point is, that there is not necessarily any definition of such a class as we want. The results are very awkward.” Hardy had a more favorable take:
I should not like to say that anything so abstract and general as Zermelo’s axiom strikes me as obvious, exactly; but … the more one thinks of it the more paradoxical the contrary seems, so that unless it appeared to lead to contradictions I should … be disposed to assume it and hope for the best.
Russell became better at detecting implicit uses of the axiom. In 1901, Russell had proved this result, fundamental for the addition of cardinals: If we have two collections of disjoint sets {Ak:k∈K} and {Bk:k∈K}, and every Ak is equivalent to Bk, then ⋃k Ak is equivalent to ⋃k Bk. Now he recognized that one has to choose a bijection between Ak and Bk for each k.
Russell came up with an evocative example. Given an infinite collection of pairs of shoes, we have the choice function “always pick the left shoe”. But given an infinite collection of pairs of socks, we can give no rule for the choice function.
In 1908. Zermelo published “A new proof of the possibility of a well-ordering”, much longer than the 1904 paper. While standing behind his first proof, he hoped to win over some of his critics with the new proof. This didn’t happen. I won’t delve into the technical details, but the paper merits a few paragraphs.
§1 presents the new proof. It begins with an explicit statement of two assumptions: the Separation Axiom and the Power Set Axiom. These appear under those names in his axiom system, published the same year.
Next comes the argument that if one has a choice function, then one can use it to well-order the set. He then writes:
Now in order to apply our theorem to arbitrary sets, we require only the additional assumption that a simultaneous choice of distinguished elements is in principle always possible for an arbitrary set of sets, or, to be more precise, that the same consequences always hold as if such a choice were possible. In this formulation, to be sure, the principle taken as fundamental still appears to be somewhat tainted with subjectivity and liable to misinterpretation.
… the “general principle of choice” can be reduced to the following axiom, whose purely objective character is immediately evident.
IV. Axiom. A set S that can be decomposed into a set of disjoint parts A,B,C,… each containing at least one element, possesses at least one subset S1 having exactly one element in common with each of the parts A,B,C,… considered.
A few more remarks finish this section, totalling five pages. §2, 17 pages long, is titled “Discussion of the objections to the earlier proof”. He divides this into five parts:
- Objections to the principle of choice
- Objection concerning nonpredicative definition
- Objections based upon the set W
- Objection concerning particular generating principles and the set Z
- Summary
In (a), he concedes that his critics “are to some extent justified, since I cannot prove this postulate, as I expressly emphasized at the end of my note…” But he offers this rebuttal:
Now even in mathematics unprovability, as is well known, is in no way equivalent to nonvalidity, since, after all, not everything can be proved, but every proof in turn presupposes unproved principles. Thus, in order to reject such a fundamental principle, one would have had to ascertain that in some particular case it did not hold or to derive contradictory consequences from it; but none of my opponents has made any attempt to do this.
Zermelo’s position is implicitly Platonist: the choice axiom is self-evident, if not provable. “No matter if this self-evidence is to a certain degree subjective—it is surely a necessary source of mathematical principles…” He contends that the axiom is “necessary for science” because of “a number of elementary and fundamental theorems and problems that, in my opinion, could not be dealt with at all without the principle of choice.” Zermelo uncovers implicit uses of Choice in a bunch of theorems, up till then not controversial. Modern logic has generally confirmed Zermelo’s intuition that these results cannot be proven without some form of Choice.
Part (b) is devoted to refuting Poincaré. Zermelo does not seem to fully grasp Poincaré’s viewpoint. For Poincaré, the totality of γ-sets is ill-defined, since Lγ is both defined in terms of it and belongs to it. Definitions in some sense create their objects. As Zermelo cannot give an independent definition of Lγ, his proof is fatally flawed. But for Zermelo, all the ordered subsets of M already exist. Definitions just select elements from pre-existing totalities.
Zermelo observes that Poincaré’s criticism seems to exclude referring to the maximum and minimum of a closed set. He concludes the section with a bit of over-the-top rhetoric: “the strict observance of Poincaré’s demand would make every definition, hence all of science, impossible.”
For (c), Zermelo says that fears of the Burali-Forti paradox are misplaced. Russell’s paradox tells us that
the solution of these difficulties is not to be sought in the surrender of well-ordering but only in a suitable restriction of the notion of set. Already in my 1904 proof, having such reservations in mind, I avoided not only all notions that were in any way dubious but also the use of “ordinals” in general; I clearly restricted myself to principles and devices that have not yet by themselves given rise to any “antinomy” [i.e., paradox].
Part (d) displays Zermelo’s polemical gifts, eviscerating an article by Schoenflies. Zermelo says up front that the claims in the article are “taken care of by the preceding discussion”, but then goes on for pages poking holes in them.
Zermelo’s summary is uncompromising. Poincaré’s critique “would threaten the existence of all of mathematics…”. He continues
… all [other] opponents can be divided into two classes. Those who have no objection at all to my deductions protest the use of an unprovable general principle, without reflecting that such axioms constitute the basis of every mathematical theory and that precisely the one I adduced is indispensable for the extension of the science in other respects, too. The other critics, however, who have been able to convince themselves of this indispensability by a deeper involvement with set theory, base their objections upon the “Burali-Forti antinomy”, which in fact is without significance for my point of view, since the principles I employed exclude the existence of a set W.
Zermelo concludes with the hope that “in time all of this resistance can be overcome through adequate clarification.”
Resistance did fade over time, but largely because of the results that flowed from the well-ordering theorem and transfinite induction. Axiomatizations of set-theory dispelled much of the fog. Moore’s book traces both these developments. Gödel’s proof that if ZF is consistent, so is ZF+AC, put to bed any lingering qualms for most mathematicians.
For the participants in the debate, truth was at stake. No one proposed that axiom systems are arbitrary, and Choice is just one option on the menu. Nowadays this seems like a very natural perspective.
Some mathematicians are strong Platonists, and accept Choice because they believe it’s true. But you can justify adopting AC on purely pragmatic grounds. Picking an alternative means foregoing a huge portion of modern mathematics. Even knowing which results depend on Choice takes some study. If you’re not doing research in axiomatic set theory, assuming Choice (at least implicitly) is the path of least resistance.
Finally we turn to Cantor’s result that given well-ordered sets (X,<) and (Y,≺), exactly one of three things holds: either X is order-equivalent to a segment of Y, or X and Y are order-equivalent, or Y is order-equivalent to a segment of X. Moreover the order-equivalence is unique.
We let L<(x) denote the segment determined by x∈X, {a∈X: a<x}. L≤(x) denotes the reflexive version, {a∈X: a≤x}. (As it happens, we will be using these segment notations only for segments of (X,<).)
We say A⊆X is downward closed when a<x∈A implies a∈A. Then A is either X or the segment L<(m), where m is the smallest element of X∖A.
We start with uniqueness. Say f, g are order-preserving mappings from X to Y that map downward closed sets to downward closed sets. Then f=g. Proof: We use transfinite induction. Suppose that for all a<x∈X, f(a)=g(a). Let S be the common image of x’s segment: S=f(L<(x))=g(L<(x)). S is downward closed, so S equals either Y or a segment of it. Since f and g are order-preserving, f(x) and g(x) must follow S; thus S=Y is impossible. Suppose S is the segment determined by some y∈Y, S={b:b≺y}. We must have f(x)=y, for if f(x)>y, then the image f(L≤(x)) would not be downward closed—there would be a “gap” somewhere below f(x) and above S. Likewise g(x)=y. So f=g over all of X.
Next we show existence. If there is an order-preserving map from Y to a segment of X, then we are done. Assume not; we show one of the other two possibilities holds. Let X′⊆X be the set of all x’s such that there is an order-preserving map from L≤(x) to Y mapping downward closed sets to downward closed sets. Inductively assume all a<x belong to X′. If a<b<x with maps f:L≤(a)→Y and g:L≤(b)→Y, then g restricted to L≤(a) equals f by the uniqueness just proved. So we can take the union of all the maps, obtaining a map h:⋃a<x L≤(a) → Y, order-preserving and mapping downward closed sets to downward closed sets. But ⋃a<x L≤(a)=L<(x). The image of h cannot be all of Y, otherwise its inverse is an order-preserving map from Y to a segment of X, contrary to assumption. Let y be the smallest element of Y minus this image. Extend h to L≤(x) by setting h(x)=y. This map is order-preserving and maps downward closed to downward closed. So the transfinite induction is complete, X′=X, and we are done.
This is very interesting, but some unclear notation makes it hard for me to follow some details.
What is the symbol Mγ — a “new variable name” like M’, or something that is somehow related to the M and γ that have just been discussed? (To me the choice function γ looks like the same symbol as that subscript γ, though I can’t be sure that is true.)
So the definition of L≺ depends on Mγ, but Mγ is not used in the notation L≺; on the other hand “We say Mγ is a γ-set if …” and then we talk about more than one γ-set, which means the notation L≺ now depends on an object in its implicit context (Mγ) which is not going to be fixed as our discussion continues.
All this makes it pretty impossible for me to carefully follow the subsequent details. (I tried, but got stuck at various different points which depended on my notation-assumptions.)
(I still finished the article and got the gist of almost all of it, but it would be nice to be able to follow every detail too!)
Well, that’s what I get for mixing Zermelo’s notation with my own.
Mγ is Zermelo’s notation, and is just another variable name. Zermelo doesn’t explain why he uses a subscript. Perhaps I should drop it.
Zermelo has no notation for segments; L≺(m) is mine. When I refer to L≺′(m′) and L≺″(m″), the idea is that the subscripts ≺′ and ≺″ indicate we are talking about segments of (M′γ,≺′) and (M″γ,≺″) respectively.
At the end, when dealing with the well-ordered sets (X,<) and (Y,≺), it turns out that I need the notation only for segments of (X,<).
I will edit the post, hopefully clarifying these issues.
Thanks. In the meantime, there is another objection someone could have had… the set of all the choice-sets exists and is nonempty, but it is too large and unstructured for it to make sense to “pick an arbitrary element (specific choice function) from, and give it a name, to use in a subsequent construction”. Would you consider this the same as one of the other objections, or just a different one that didn’t happen to be raised?
(Some of the objection-wording you quoted does remind me of this one, eg the one that implied this would be ok if you could specify a definition for how exactly to do the choosing from each subset.)
In fact, I could imagine arguing that being able to do the choosing is sort of equivalent to having a well-ordering on the large set. If you have that, you can choose the lowest element; or if you have a way to choose an element, you can arrange to designate that the lowest element (after some work to make the choices consistent, as done in this proof). Ie “you are assuming what you want to prove, but describing it differently”.
Is it by any chance true that AC is *equivalent* to “every set can be well-ordered”? Then this proof is just one direction of the proof of equivalence of two forms of one new independent axiom.
This sounds a bit like a proposal of Bernstein.
This notion died on the vine. Those who objected to the lack of definition for the choice function, usually held that you can’t claim an object exists unless you can define it uniquely. (See the Lebesgue quote.) The mindset seems to be, the definition creates the object. Although I don’t think anyone said it quite that baldly. But it’s the only way the Vicious Circle Principle makes any sense for me.
The Partition Principle says that the cardinality of a disjoint union of a family of sets is greater than or equal to the cardinality of the original family. In other words, you can’t partition a set into more parts than the number of elements of the set.
The Well-Ordering Principle is indeed equivalent to the Axiom of Choice, and this was realized quite early. See Borel’s reaction.