Zermelo’s 1904 proof of the well-ordering theorem got a lot of blowback, as we’ve seen. On the other hand, the very next year Hamel used it to prove the existence of a so-called Hamel basis. In 1910, Steinitz made numerous applications in the theory of fields. He wrote:
Many mathematicians still stand opposed to the Axiom of Choice. With the increasing recognition that there are questions in mathematics which cannot be decided without this axiom, the resistance to it must increasingly disappear.
In 1916, Dénes Kőnig applied it to graph theory. The 1920s and 1930s saw more and more applications, across much of the mathematical landscape. Steinitz foretold correctly: the desire to do fun math broke down the resistance as the philosophical wranglings never did.
After the 1930s, proofs usually appealed to a maximal principle like Zorn’s lemma instead of explicit use of well-ordering. As one wit put it, “Mathematicians would rather Zornicate than transfinitely induct.”
Zorn’s Lemma: If (S,≤) is a partial ordering in which every chain (linearly ordered subset) has an upper bound (that is not necessarily an element of the chain), then S has a maximal element.
[This form first appeared in Bourbaki; Zorn stated the special case where ≤ is set-inclusion, and the upper bound is the union of the chain.]
Moore (§4.4) remarks, “The history of maximal principles, such as Zorn’s Lemma, is strewn with multiple independent discoveries of fundamentally similar propositions.” Hausdorff gets priority with papers in 1909 and 1914, but these were largely overlooked; his 1909 formulation closely resembles Zorn’s lemma.
Zorn’s lemma and its close relatives are all equivalent to the well-ordering theorem, by routine proofs. For example, let’s use Zorn to prove that a set M can be well-ordered. Look at all binary relations on M that well-order subsets of M. This is partially ordered by ⊆ (treating the relations as subsets of M×M). The union of a chain satisfies the upper bound condition, and a maximal element must well-order all of M.
Let’s look at a few examples comparing transfinite induction with Zorn’s lemma, We’ll see that transfinite inductive proofs extend older inductive arguments in a natural way. With Zorn’s lemma, choices are still there, though they may be implicit.
The simplest standard algebraic example: every vector space has a basis, i.e., a linearly independent subset such that every element is a finite linear combination of elements from the basis. (Note that this is different from the typical Hilbert space basis, which allows convergent series instead of just finite linear combinations.) Hamel proved the special case where the vector space is ℝ over the field ℚ, so in this case we speak of a Hamel basis. Using this, he showed the existence of discontinuous functions from ℝ to ℝ satisfying f(x+y)=f(x)+f(y).
The finite version: every finite-dimensional vector space V has a basis. Proof: choose a vector v1≠0. If {v1} doesn’t generate V, then there is a v2 such that {v1,v2} is independent. Inductively, if {v1,…,vk} is independent, then either it generates V or there exists a vk+1 such that {v1,…,vk+1} is independent. The finite-dimensionality assumption means this can’t go on forever, so eventually we get a basis.
Now drop the finite-dimensionality assumption. Modify the proof as follows: let {vα:α<κ} be a well-ordering of V. We define a set Bβ of independent vectors for each β≤κ by transfinite induction, with Bβ chosen from the set {vα:α<β}. For a successor ordinal β+1, look at Bβ∪{vβ}. If this is independent, set Bβ+1=Bβ∪{vβ}. Otherwise set Bβ+1=Bβ. For a limit ordinal λ, set Bλ=⋃α<λBα. Obviously Bβ is independent for all β. For any vector vβ∈V, either vβ was added to Bβ to get Bβ+1, or else Bβ already generated vβ, which is why vβ wasn’t added. So Bβ generates V.
Here’s the Zorn’s lemma proof. Let ℬ be the collection of all independent subsets of V. (ℬ,⊆) is a partial order. If 𝒞⊆ℬ is a chain, then it’s easy to show that ⋃C∈𝒞 C is independent, so (ℬ,⊆) satisfies the hypotheses of Zorn’s lemma. Let B be a maximal element of ℬ. Then B generates V, for if v∈V was not generated by B, then B∪{v} would be a larger independent set.
Note the modest part played by algebra in both proofs: essentially just the fact that for an independent B and v∉B, B∪{v} is independent if and only if B does not generate v.
It’s clear how the well-ordering proof grows out of the finite-dimensional proof. Next, consider that bases exist in abundance. Most trivial example: we can include e1 or 2e1 in a basis, not both. In the well-ordering proof, we make the choice when we pick the well-ordering. In the Zorn’s lemma proof, we choose a particular maximal element of ℬ.
Next, a graph theory example. We need some preliminary definitions. An undirected graph is a set of vertices V plus a subset E of the set of all unordered pairs of vertices, pictured as edges naturally; we say the edge touches its vertices, its ends. A subgraph of (V,E) is a graph (V′,E′) with V′⊆V and E′⊆E; we say the graph contains the subgraph. A cycle is a sequence of different edges e1,…,ek where ei and ei+1 both touch the same vertex for all i<k, as do ek and e1. A forest is a graph containing no cycles. A spanning forest is a subgraph that is a forest, and touches every vertex, i.e., the union of all its unordered pairs equals the set of all vertices of the graph. A vertex is isolated if it is not touched by any edge. The graph theory result: every graph without isolated vertices contains a spanning forest (see the firgure below). (Using this, you can derive one of Kőnig’s 1916 results: a graph with no odd-length cycles is bipartite, i.e., can be partitioned into subsets V1⊔V2 such that every edge has one end in V1 and the other in V2.)
Structurally, the proof here looks much like the vector basis proof, despite the different mathematical fabric. Finite result: every finite graph without isolated points has a spanning forest. Proof by induction: order the edges {e1,…,en}. Inductively define a set of edges Fk by setting F0=∅, and at step k add in edge ek if and only if this does not result in a cycle: Fk+1=Fk∪{ek} if no cycle, otherwise Fk+1=Fk. So Fk remains a forest at each step. Fn+1 must span the graph because for any vertex v and edge ek touching v, if ek isn’t in Fk+1, it’s because ek was part of a cycle all of whose other edges belong to Fk. In other words, v is touched by some edge belonging to Fk.
Extend to an arbitrary graph without isolated vertices: well-order the edges {eα:α<κ}. Define forests Fβ for each β≤κ as follows. Set Fβ+1=Fβ∪{eβ} if this does not produce a cycle, otherwise Fβ+1=Fβ. For a limit ordinal λ, Fλ=⋃α<λFα. Fκ spans the graph by exactly the same argument as in the finite case.
For the Zorn’s lemma argument, we let ℱ be the collection of all forests contained in the graph. It is partially ordered under ⊆, the union of a chain of forests is a forest, and a maximal element spans the graph, of course.
There are typically many spanning forests; for example, in the figure above, we could have added edge ab instead of ac or bc. The choice of well-ordering, or the choice of maximal element of ℱ, determines which spanning forest we end up with.
Contrast this with another result in graph theory: every vertex belongs to a maximum connected set of vertices, called its component. Proof: Let C0(v)={v}; let Cn+1 be Cn plus all vertices joined to Cn by an edge; let C(v)=⋃n<ωCn. No choices involved—the proof doesn’t need AC. Characteristically, C(v) is a maximum and not just maximal.
Zorn-type proofs usually seem slicker than well-ordering proofs, but there are exceptions. One example: Every subgroup of a free abelian group is free, i.e., has a basis. (A basis of an abelian group G is a subset of G such that every element of G has a unique expression as a linear combination of basis elements with integer coefficients.) Let G be a free abelian group with basis {vα:α<κ}. If g≠0, suppose g=c1vα1+…+ckvαk with α1>…>αk, with each ci≠ 0. (Different g’s can have different k’s.) We say α1 is the index of g; c1 is its leading coefficient. We arbitrarily assign 0 the index 0 and leading coefficient 0.
Suppose H is a subgroup of G. We let Zβ be the set of all leading coefficients of elements of H with index β. It’s easy to show that Zβ is a subgroup of ℤ, so in fact it consists of all multiples of (say) rβ. (We allow rβ=0 as a possibility.) For each β, let wβ be an element of H with leading coefficient rβ; if rβ=0 we let wβ=0. We claim that {wα:α<κ and wα≠0} is a basis for H. If not, there are elements of H not in the subgroup spanned by the wα’s; let h be such an element with minimal index, and suppose β is that minimal index. Say h=c1vβ+…. Now c1 belongs to Zβ, so it’s a multiple of rβ, say prβ. Thus h and pwβ have the same leading coefficient, and so h−pwβ is either zero or else is of lower index than h. But h had minimal index, so h=pwβ, and h is in the subgroup spanned by the wα’s after all.
Unlike many transfinite induction proofs, it’s far from clear how to Zornify this. Without the well-ordered basis, what do we use instead of the index and the Zβ’s? To quote Kaplansky (p.124): “[Lefschetz asserts] that for this theorem well-ordering gives a shorter, more intuitive proof than Zorn’s lemma. I agree, though on page 44 of my Infinite Abelian Groups … I have stubbornly given a Zorn style proof.”
One more example. In 1962, John Wetzel asked the following question. Say a family F of analytic functions on some common domain D is pointwise countable if for each z∈D, the set of values {f(z):f∈F} is countable. Does it follow that F is countable?
Erdős answered the question very soon afterwards: pointwise countability implies countability if and only if the Continuum Hypothesis is false.
Before plunging into the proof, let’s look an easier question. Say F (as before, analytic functions on domain D) is pointwise finite if for each z∈D, the set of values {f(z):f∈F} is finite. Does it follow that F is finite?
Yes it does. Let F be infinite. A countable subset of F will still be pointwise finite if F is, so assume F countable, say F={fn:n∈ℕ}, with fi≠fj whenever i≠j. Pick a pair i,j and look at all the z’s for which fi(z)=fj(z). By the identity theorem for analytic functions, we can have only finitely many such z’s on any bounded subset of D; hence only countably many such z’s on all of D. As there are only countably many pairs i,j, there are only countably many z’s for which the values {f(z):f∈F} are not all distinct. With uncountably many z’s in D, we can find one for which {f(z):f∈F} is infinite. So if F is infinite, it’s not pointwise finite.
If c>ℵ1, this proof adapts to a pointwise countable family F. This time we assume, without loss of generality, that F has ℵ1 elements, say F={fα:α<ℵ1} and fα≠fβ whenever α≠β. Again there can be only countably many z’s with fα(z)=fβ(z) for any pair α,β, and thus at most ℵ1 z’s for which the values {f(z):f∈F} are not all distinct. If c>ℵ1, then we have a z for which all the values are distinct. We conclude: if F is uncountable, then it is not pointwise countable.
This argument does not use well-ordering in any essential way—instead of the ordinals less than ℵ1, we could have indexed F with an unordered set of cardinality ℵ1. The other direction is a different story.
Suppose c=ℵ1. The goal is to construct an uncountable family F that is pointwise countable. We now have just as many z∈D as f∈ F. To capitalize on this, we well-order both D and F: D={zα:α<ω1}, F={fβ:β<ω1} (where ω1 is the first uncountable ordinal). F is, naturally, constructed by transfinite induction up to ω1. Here a crucial feature of ω1 proves invaluable: all smaller ordinals are countable. When we are trying to define fβ, we have only countably many fγ with γ<β to worry about. Also, look at the set of values {fβ(zα):α,β<ω1}. If we fix α and restrict β to β<α, we get a countable set; likewise if we fix β and restrict α to α<β. It is hard to imagine how a Zorny argument could exploit these facts.
With these ingredients, we can (a) insure that {fβ(zα):β<ω1} is countable for every zα; (b) define fβ by a convergent series involving the zα with α<β (countability helps a lot with convergent series); and (c) guarantee that all the fβ are distinct. Here’s how that works. Let S be a countable everywhere dense set of complex numbers. We will arrange that fβ(zα)∈S whenever α<β. So looking at the totality of values fβ(zα)∈S for a given α, we have countably many in S, and countably many with β<α.
We construct fβ by transfinite induction. Suppose fγ has already been defined for all γ<β. Since {fγ:γ<β} is countable, let it be the same as {gn:n∈ℕ}. We also let {zα:α<β}={wn:n∈ℕ}. We will insure that for all n,
fβ(wn)∈S and fβ(wn)≠gn(wn) (†)
This guarantees that fβ(zα) is in S for all α<β, and that fβ≠fγ for all γ<β. So we will have an uncountable, pointwise countable family.
Write
fβ(z) = ε0+∑k=1∞ εk∏i=0k−1 (z−wi)
If εn tends to zero fast enough, then this expression defines an entire function. How fast is fast enough? That depends only on the wn’s. When z=wn, the product is zero for all k>n, so we have a finite sum on the right. We choose the εi’s successively, in a manner that makes the conditions (†) hold. If that’s been done for all i<n, then only the final term εn∏i=0n−1 (z−wi) is under our control. But that’s all we need: since S is everywhere dense, we can choose εn as small as we wish, while also insuring that fβ(wn) belongs to S and that fβ(wn)≠gn(wn). QED
