Set Theory Jottings 6. Zorn’s Lemma

Prev TOC Next

Zermelo’s 1904 proof of the well-ordering theorem got a lot of blowback, as we’ve seen. On the other hand, the very next year Hamel used it to prove the existence of a so-called Hamel basis. In 1910, Steinitz made numerous applications in the theory of fields. He wrote:

Many mathematicians still stand opposed to the Axiom of Choice. With the increasing recognition that there are questions in mathematics which cannot be decided without this axiom, the resistance to it must increasingly disappear.

In 1916, Dénes Kőnig applied it to graph theory. The 1920s and 1930s saw more and more applications, across much of the mathematical landscape. Steinitz foretold correctly: the desire to do fun math broke down the resistance as the philosophical wranglings never did.

After the 1930s, proofs usually appealed to a maximal principle like Zorn’s lemma instead of explicit use of well-ordering. As one wit put it, “Mathematicians would rather Zornicate than transfinitely induct.”

Zorn’s Lemma: If (S,≤) is a partial ordering in which every chain (linearly ordered subset) has an upper bound (that is not necessarily an element of the chain), then S has a maximal element.

[This form first appeared in Bourbaki; Zorn stated the special case where ≤ is set-inclusion, and the upper bound is the union of the chain.]

Moore (§4.4) remarks, “The history of maximal principles, such as Zorn’s Lemma, is strewn with multiple independent discoveries of fundamentally similar propositions.” Hausdorff gets priority with papers in 1909 and 1914, but these were largely overlooked; his 1909 formulation closely resembles Zorn’s lemma.

Zorn’s lemma and its close relatives are all equivalent to the well-ordering theorem, by routine proofs. For example, let’s use Zorn to prove that a set M can be well-ordered. Look at all binary relations on M that well-order subsets of M. This is partially ordered by ⊆ (treating the relations as subsets of M×M). The union of a chain satisfies the upper bound condition, and a maximal element must well-order all of M.

Let’s look at a few examples comparing transfinite induction with Zorn’s lemma, We’ll see that transfinite inductive proofs extend older inductive arguments in a natural way. With Zorn’s lemma, choices are still there, though they may be implicit.

The simplest standard algebraic example: every vector space has a basis, i.e., a linearly independent subset such that every element is a finite linear combination of elements from the basis. (Note that this is different from the typical Hilbert space basis, which allows convergent series instead of just finite linear combinations.) Hamel proved the special case where the vector space is ℝ over the field ℚ, so in this case we speak of a Hamel basis. Using this, he showed the existence of discontinuous functions from ℝ to ℝ satisfying f(x+y)=f(x)+f(y).

The finite version: every finite-dimensional vector space V has a basis. Proof: choose a vector v₁≠0. If {v₁} doesn’t generate V, then there is a v₂ such that {v₁,v₂} is independent. Inductively, if {v₁,…,v_k} is independent, then either it generates V or there exists a v_k+1 such that {v₁,…,v_k+1} is independent. The finite-dimensionality assumption means this can’t go on forever, so eventually we get a basis.

Now drop the finite-dimensionality assumption. Modify the proof as follows: let {v_α:α<κ} be a well-ordering of V. We define a set B_β of independent vectors for each β≤κ by transfinite induction, with B_β chosen from the set {v_α:α<β}. For a successor ordinal β+1, look at B_β∪{v_β}. If this is independent, set B_β+1=B_β∪{v_β}. Otherwise set B_β+1=B_β. For a limit ordinal λ, set B_λ=⋃_α<λB_α. Obviously B_β is independent for all β. For any vector v_β∈V, either v_β was added to B_β to get B_β+1, or else B_β already generated v_β, which is why v_β wasn’t added. So B_β generates V.

Here’s the Zorn’s lemma proof. Let ℬ be the collection of all independent subsets of V. (ℬ,⊆) is a partial order. If 𝒞⊆ℬ is a chain, then it’s easy to show that ⋃_C∈𝒞 C is independent, so (ℬ,⊆) satisfies the hypotheses of Zorn’s lemma. Let B be a maximal element of ℬ. Then B generates V, for if v∈V was not generated by B, then B∪{v} would be a larger independent set.

Note the modest part played by algebra in both proofs: essentially just the fact that for an independent B and v∉B, B∪{v} is independent if and only if B does not generate v.

It’s clear how the well-ordering proof grows out of the finite-dimensional proof. Next, consider that bases exist in abundance. Most trivial example: we can include e₁ or 2e₁ in a basis, not both. In the well-ordering proof, we make the choice when we pick the well-ordering. In the Zorn’s lemma proof, we choose a particular maximal element of ℬ.

Next, a graph theory example. We need some preliminary definitions. An undirected graph is a set of vertices V plus a subset E of the set of all unordered pairs of vertices, pictured as edges naturally; we say the edge touches its vertices, its ends. A subgraph of (V,E) is a graph (V′,E′) with V′⊆V and E′⊆E; we say the graph contains the subgraph. A cycle is a sequence of different edges e₁,…,e_k where e_i and e_i+1 both touch the same vertex for all i<k, as do e_k and e₁. A forest is a graph containing no cycles. A spanning forest is a subgraph that is a forest, and touches every vertex, i.e., the union of all its unordered pairs equals the set of all vertices of the graph. A vertex is isolated if it is not touched by any edge. The graph theory result: every graph without isolated vertices contains a spanning forest (see the firgure below). (Using this, you can derive one of Kőnig’s 1916 results: a graph with no odd-length cycles is bipartite, i.e., can be partitioned into subsets V₁⊔V₂ such that every edge has one end in V₁ and the other in V₂.)

Spanning Forest

Structurally, the proof here looks much like the vector basis proof, despite the different mathematical fabric. Finite result: every finite graph without isolated points has a spanning forest. Proof by induction: order the edges {e₁,…,e_n}. Inductively define a set of edges F_k by setting F₀=∅, and at step k add in edge e_k if and only if this does not result in a cycle: F_k+1=F_k∪{e_k} if no cycle, otherwise F_k+1=F_k. So F_k remains a forest at each step. F_n+1 must span the graph because for any vertex v and edge e_k touching v, if e_k isn’t in F_k+1, it’s because e_k was part of a cycle all of whose other edges belong to F_k. In other words, v is touched by some edge belonging to F_k.

Extend to an arbitrary graph without isolated vertices: well-order the edges {e_α:α<κ}. Define forests F_β for each β≤κ as follows. Set F_β+1=F_β∪{e_β} if this does not produce a cycle, otherwise F_β+1=F_β. For a limit ordinal λ, F_λ=⋃_α<λF_α. F_κ spans the graph by exactly the same argument as in the finite case.

For the Zorn’s lemma argument, we let ℱ be the collection of all forests contained in the graph. It is partially ordered under ⊆, the union of a chain of forests is a forest, and a maximal element spans the graph, of course.

There are typically many spanning forests; for example, in the figure above, we could have added edge ab instead of ac or bc. The choice of well-ordering, or the choice of maximal element of ℱ, determines which spanning forest we end up with.

Contrast this with another result in graph theory: every vertex belongs to a maximum connected set of vertices, called its component. Proof: Let C₀(v)={v}; let C_n+1 be C_n plus all vertices joined to C_n by an edge; let C(v)=⋃_n<ωC_n. No choices involved—the proof doesn’t need AC. Characteristically, C(v) is a maximum and not just maximal.

Zorn-type proofs usually seem slicker than well-ordering proofs, but there are exceptions. One example: Every subgroup of a free abelian group is free, i.e., has a basis. (A basis of an abelian group G is a subset of G such that every element of G has a unique expression as a linear combination of basis elements with integer coefficients.) Let G be a free abelian group with basis {v_α:α<κ}. If g≠0, suppose g=c₁v_α1+…+c_kv_αk with α₁>…>α_k, with each c_i≠ 0. (Different g’s can have different k’s.) We say α₁ is the index of g; c₁ is its leading coefficient. We arbitrarily assign 0 the index 0 and leading coefficient 0.

Suppose H is a subgroup of G. We let Z_β be the set of all leading coefficients of elements of H with index β. It’s easy to show that Z_β is a subgroup of ℤ, so in fact it consists of all multiples of (say) r_β. (We allow r_β=0 as a possibility.) For each β, let w_β be an element of H with leading coefficient r_β; if r_β=0 we let w_β=0. We claim that {w_α:α<κ and w_α≠0} is a basis for H. If not, there are elements of H not in the subgroup spanned by the w_α’s; let h be such an element with minimal index, and suppose β is that minimal index. Say h=c₁v_β+…. Now c₁ belongs to Z_β, so it’s a multiple of r_β, say pr_β. Thus h and pw_β have the same leading coefficient, and so h−pw_β is either zero or else is of lower index than h. But h had minimal index, so h=pw_β, and h is in the subgroup spanned by the w_α’s after all.

Unlike many transfinite induction proofs, it’s far from clear how to Zornify this. Without the well-ordered basis, what do we use instead of the index and the Z_β’s? To quote Kaplansky (p.124): “[Lefschetz asserts] that for this theorem well-ordering gives a shorter, more intuitive proof than Zorn’s lemma. I agree, though on page 44 of my Infinite Abelian Groups … I have stubbornly given a Zorn style proof.”

One more example. In 1962, John Wetzel asked the following question. Say a family F of analytic functions on some common domain D is pointwise countable if for each z∈D, the set of values {f(z):f∈F} is countable. Does it follow that F is countable?

Erdős answered the question very soon afterwards: pointwise countability implies countability if and only if the Continuum Hypothesis is false.

Before plunging into the proof, let’s look an easier question. Say F (as before, analytic functions on domain D) is pointwise finite if for each z∈D, the set of values {f(z):f∈F} is finite. Does it follow that F is finite?

Yes it does. Let F be infinite. A countable subset of F will still be pointwise finite if F is, so assume F countable, say F={f_n:n∈ℕ}, with f_i≠f_j whenever i≠j. Pick a pair i,j and look at all the z’s for which f_i(z)=f_j(z). By the identity theorem for analytic functions, we can have only finitely many such z’s on any bounded subset of D; hence only countably many such z’s on all of D. As there are only countably many pairs i,j, there are only countably many z’s for which the values {f(z):f∈F} are not all distinct. With uncountably many z’s in D, we can find one for which {f(z):f∈F} is infinite. So if F is infinite, it’s not pointwise finite.

If c>ℵ₁, this proof adapts to a pointwise countable family F. This time we assume, without loss of generality, that F has ℵ₁ elements, say F={f_α:α<ℵ₁} and f_α≠f_β whenever α≠β. Again there can be only countably many z’s with f_α(z)=f_β(z) for any pair α,β, and thus at most ℵ₁ z’s for which the values {f(z):f∈F} are not all distinct. If c>ℵ₁, then we have a z for which all the values are distinct. We conclude: if F is uncountable, then it is not pointwise countable.

This argument does not use well-ordering in any essential way—instead of the ordinals less than ℵ₁, we could have indexed F with an unordered set of cardinality ℵ₁. The other direction is a different story.

Suppose c=ℵ₁. The goal is to construct an uncountable family F that is pointwise countable. We now have just as many z∈D as f∈ F. To capitalize on this, we well-order both D and F: D={z_α:α<ω₁}, F={f_β:β<ω₁} (where ω₁ is the first uncountable ordinal). F is, naturally, constructed by transfinite induction up to ω₁. Here a crucial feature of ω₁ proves invaluable: all smaller ordinals are countable. When we are trying to define f_β, we have only countably many f_γ with γ<β to worry about. Also, look at the set of values {f_β(z_α):α,β<ω₁}. If we fix α and restrict β to β<α, we get a countable set; likewise if we fix β and restrict α to α<β. It is hard to imagine how a Zorny argument could exploit these facts.

With these ingredients, we can (a) insure that {f_β(z_α):β<ω₁} is countable for every z_α; (b) define f_β by a convergent series involving the z_α with α<β (countability helps a lot with convergent series); and (c) guarantee that all the f_β are distinct. Here’s how that works. Let S be a countable everywhere dense set of complex numbers. We will arrange that f_β(z_α)∈S whenever α<β. So looking at the totality of values f_β(z_α)∈S for a given α, we have countably many in S, and countably many with β<α.

We construct f_β by transfinite induction. Suppose f_γ has already been defined for all γ<β. Since {f_γ:γ<β} is countable, let it be the same as {g_n:n∈ℕ}. We also let {z_α:α<β}={w_n:n∈ℕ}. We will insure that for all n,

f_β(w_n)∈S and f_β(w_n)≠g_n(w_n) (†)

This guarantees that f_β(z_α) is in S for all α<β, and that f_β≠f_γ for all γ<β. So we will have an uncountable, pointwise countable family.

Write

f_β(z) = ε₀+∑_k=1^∞ ε_k∏_i=0^k−1 (z−w_i)

If ε_n tends to zero fast enough, then this expression defines an entire function. How fast is fast enough? That depends only on the w_n’s. When z=w_n, the product is zero for all k>n, so we have a finite sum on the right. We choose the ε_i’s successively, in a manner that makes the conditions (†) hold. If that’s been done for all i<n, then only the final term ε_n∏_i=0ⁿ⁻¹ (z−w_i) is under our control. But that’s all we need: since S is everywhere dense, we can choose ε_n as small as we wish, while also insuring that f_β(w_n) belongs to S and that f_β(w_n)≠g_n(w_n). QED

Prev TOC Next

	Michael Weiss on Set Theory Jottings 16. Axioms…
	Bruce Smith on Set Theory Jottings 16. Axioms…
	Michael Weiss on Set Theory Jottings 12. Zermel…
	nicks808 on Set Theory Jottings 12. Zermel…
	Michael Weiss on From Kepler to Ptolemy 20

Set Theory Jottings 6. Zorn’s Lemma

Leave a comment Cancel reply

Recent Posts

Recent Comments

Archives

Categories

Meta

Set Theory Jottings 6. Zorn’s Lemma

Share this:

Related

Leave a comment Cancel reply

Recent Posts

Recent Comments

Archives

Categories

Meta