Monthly Archives: August 2025

Set Theory Jottings 17. Ordinals Revisited


In post 13 I sketched proof by transfinite induction, and definition by transfinite recursion. Let’s take a closer look at that. By definition, < means ∈ for ordinals—treat that also as vernacular. Sometimes one symbol or the other makes the meaning clearer. We must keep in mind though that we have not shown that < is a well-ordering on Ω, or even a simple ordering.

Let Ω(x) be the formula for “x is an ordinal”, thus:

(∀u,v)(u∈v∈x → u∈x)
∧(∀u,v∈x)(u<v ∨ v<u ∨ u=v)
∧(∀u∈x)(¬u<u)
∧(∀u,v∈x)(¬(u<v ∧ v<u))
∧(∀u,v,w∈x)(u<v<w → u<w)
∧(∀y⊆x)(y≠∅ → ∃u(u∈y ∧ ∀v(¬(v<u ∧ v∈y))))

The first line says x is transitive, the next four lines say that < simply-orders x, and the last line says that < is a well-ordering on x. (Quite a bit of vernacular, but by now you should know how to deal with “u∈v∈x’’, for example.)

The clauses are not as economical as possible: for example, Foundation implies the third and last line, and a transitive irreflexive relation is automatically asymmetric, so even without Foundation we don’t need the fourth line. For the moment, we proceed without using Foundation.
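For small cases, the six clauses can be checked mechanically. Here is a minimal sketch in Python, assuming we model hereditarily finite sets as nested `frozenset`s and read u<v as `u in v`. The helper names (`subsets`, `is_ordinal`) and the sample sets are illustrative only; nothing here goes beyond finite sets.

```python
from itertools import combinations

def subsets(s):
    """All subsets of the frozenset s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_ordinal(x):
    """Check the six clauses of Ω(x) literally, reading u < v as u ∈ v."""
    transitive = all(u in x for v in x for u in v)
    trichotomy = all(u in v or v in u or u == v for u in x for v in x)
    irreflexive = all(u not in u for u in x)
    asymmetric = all(not (u in v and v in u) for u in x for v in x)
    order_transitive = all(u in w for u in x for v in x for w in x
                           if u in v and v in w)
    # Every nonempty subset y has an element u with no v in y below it.
    well_ordered = all(
        any(all(v not in u for v in y) for u in y)
        for y in subsets(x) if y
    )
    return (transitive and trichotomy and irreflexive and asymmetric
            and order_transitive and well_ordered)

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})

not_transitive = frozenset({one})                        # {1}: 0 ∈ 1 ∈ x, but 0 ∉ x
not_ordered = frozenset({zero, one, frozenset({one})})   # {0, 1, {1}}: 0 and {1} are incomparable
```

The set `three` passes all six clauses; `not_transitive` fails the first, and `not_ordered` is transitive but fails trichotomy.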

Within an ordinal, ∈ well-orders; we want to bootstrap this to all of Ω. First an easy observation. Given a formula φ(x), if any β∈α satisfy φ, then there is a least such β. We need only invoke Separation to provide us with the set of all β∈α satisfying φ, and then appeal to the last clause in the definition of an ordinal.

How about smallest elements over all of Ω? That is,

∃αφ(α)→∃α(φ(α) ∧ ∀β(¬(β<α∧φ(β))))

(By vernacular convention, α and β are ordinals. Formally one begins “∃x(Ω(x)∧φ(x))…’’.)

Suppose ∃γφ(γ). If none of γ’s predecessors satisfy φ, then we’re done. Otherwise φ(α) holds for some α∈γ. Because γ is well-ordered, there is an element of γ (let’s still call it α) that is the smallest element of the set {β∈γ:φ(β)}. We need to know that no ordinals, period, satisfy φ and precede α. But if β were such an ordinal, we’d have β∈α∈γ. Since γ is transitive, we’d have β∈γ, and the minimality of α in {β∈γ:φ(β)} has already ruled that out.

The contrapositive is transfinite induction. Letting ψ be ¬φ,

∀α((∀β<α)ψ(β)→ψ(α)) → ∀αψ(α)

With transfinite induction, we can show that Ω is simply-ordered. Only trichotomy presents any wrinkles. We induct on the property, “α is comparable with every ordinal.” Suppose this is true for all β<α. Let γ be an arbitrary ordinal. If γ<β or γ=β for some β∈α, then γ<α, since α is transitive (and < means ∈). Suppose then that β∈γ for all β∈α. That is, α⊆γ. If α=γ, we’re done. Suppose α⊂γ. Let δ be the smallest element of γ∖α. We will now show that α=δ, proving α∈γ, i.e., α<γ.

The key point: both α and δ are subsets of γ, and within γ, ∈ is a simple ordering. Extensionality kicks in: we have to prove x∈α↔x∈δ, or turned around, x∉δ↔x∉α. Now, α is an initial segment of γ because α is transitive. Therefore δ>x for all x∈α because δ∉α. So x∉δ implies x≥δ implies x∉α. On the other hand, x∉α implies x>y for all y∈α, again because α is an initial segment. Since δ is the least element of γ∖α, it follows that x≥δ, i.e., ¬x<δ, i.e., x∉δ. We’re done.

So < is a well-ordering on Ω, and we can trust our intuition about it (mostly).

Next consider transfinite recursion. The discussion in post 13 applies, with a few remarks. As before, we have a “rule” ρ(α,f,s) that assigns a value s to an ordinal α given as input a function f:α→···. Our rule is a formula, and may even include parameters. Thus: ρ(α,f,s,t⃗).

Suppose that for the parameter values t⃗=c⃗ this formula is “function-like”; then we’re in business. One proves in ZF that the following condition

∀α,f ∃s ρ(α,f,s,c⃗)
∧ ∀α,f,s,t (ρ(α,f,s,c⃗)∧ρ(α,f,t,c⃗)→s=t)

implies, for all α, the existence of a unique function gα:α+1→··· “obeying the recursion”. The proof uses transfinite induction, of course; also the set-existence axioms, especially Replacement. I won’t comb out all the hair, but essentially: if we have an “obedient” gβ for all β<α, Replacement plus Union plus the uniqueness allow you to combine all these gβ’s into one fα with domain α. Then you use ρ to assign a value to α, extending fα to gα.

The conclusion, even leavened with vernacular, is more than I’m prepared to write. But you get a theorem, or technically a different theorem for each formula ρ. (So it’s a theorem schema.) It has the form ∀c⃗(A→B), where A is the above condition and B asserts the existence of g.

We don’t need to prove that the condition holds; rather, the theorem says that whenever it holds (perhaps only for some values of the parameters), then the conclusion holds. The proof does not require Choice: the ordinals are already well-ordered. As before, we also have a formula G(α,c⃗,s); if the condition holds for some c⃗, then G(α,c⃗,s) assigns a unique s to every ordinal: a “function-like” thingy with “domain” Ω.

As a bonus, we can rehabilitate Cantor’s “proof” of the Well-Ordering Theorem. Let M be a set, and let c be a choice function for M. Our rule ρ says that the s assigned to α is c(M∖{f(β):β<α}), provided that the difference set is not empty. Otherwise we assign M to α. (We choose M simply because it is not an element of M.) We define G by transfinite recursion; informally, we can write Ĝ(α) for the value assigned to α. (I have left the parameters M and c implicit.) If we never have Ĝ(α)=M, that means that G defines an injection of Ω into M, and inverting this gives us a map from a subset of M onto Ω. Replacement then says that Ω is a set. The Largest Ordinal paradox however proves this is not so. (The formal statement: ¬∃S∀x(Ω(x)↔x∈S).) Therefore Ĝ(α)=M for some α. For the least such α, the values Ĝ(β) with β<α exhaust M, so Ĝ maps α 1–1 onto M, and we’ve well-ordered M.

I’ve made a full meal of this argument. Typically, one would condense this discussion into a couple of sentences: “Given a choice function c, use it to define (by transfinite recursion) an injective function from an initial segment of Ω into M. As this function cannot be defined for all ordinals, it must map the ordinals less than some α onto M, and so M is well-orderable.”
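The recursion is easy to watch in miniature. The sketch below assumes M is finite, so the “ordinals” are just list positions 0, 1, 2, … and no transfinite stages arise. The names `rho` and `well_order` are illustrative, and `min` stands in for an arbitrary choice function.

```python
def rho(M, choice, f):
    """The rule: pick from what is left of M, or return M itself when nothing is left."""
    remaining = M - set(f)
    return choice(remaining) if remaining else M

def well_order(M, choice):
    """Run the recursion until the sentinel value M shows up.

    values[β] plays the role of Ĝ(β); the returned list enumerates M
    without repetition, which is a well-ordering of the finite set M.
    """
    M = frozenset(M)
    values = []
    while True:
        s = rho(M, choice, values)
        if s == M:          # M is never an element of M, so this marks exhaustion
            return values
        values.append(s)

# Assumption for illustration: `min` serves as the choice function on nonempty subsets.
order = well_order({"pear", "apple", "fig"}, min)
```

Each stage consults only the values already assigned, which is exactly what the rule ρ is allowed to see. In the finite case the sentinel must appear after at most #M steps; in the general case that job is done by the Largest Ordinal paradox.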

While this proof resembles Cantor’s demonstration, it borrows essential features from Zermelo’s. Gone are the successive choices with their psychological aspect, replaced with a choice function. And the “union of partial well-orderings” in Zermelo’s 1904 proof lies buried inside the transfinite recursion.


Filed under History, Set Theory

Set Theory Jottings 16. Axioms of ZFC


The Axioms of ZFC

The language of ZF (ℒ(ZF)) consists of basic first-order syntax, with a single binary predicate symbol ∈. Here is a list of the axioms, with a tag-line (an imprecise description) for each.

Extensionality:
Every set is determined by its elements.
Foundation:
All sets are built up in levels by starting with the empty set.
Pairs:
There is a set whose elements are any two given sets. (Or any one set.)
Union:
For every set, there is a set consisting of the union of its elements.
Power Set:
For every set, there is a set consisting of all its subsets.
Infinity:
There is an infinite set.
Replacement:
Given a set and a rule for replacing its elements, there is a set consisting of all these replacements.
Choice:
Given a set of pairwise disjoint nonempty sets, there is a set containing exactly one element from each of them.

Two more axioms:

Null Set:
There is a set with no elements.
Separation:
Given a set and a property, there is a set consisting of all the elements satisfying the property.

These are redundant. The version of Replacement we adopt implies Separation; Null Set follows from Pairs plus Separation. But they are historically important, and help understand ZFC.

Now for the formal, or at least more formal, statements. (I discussed “vernacular” in post 15. Recall that t⃗ stands for a list (t1,…,tn).)

Extensionality:

∀z(z∈x↔z∈y)→x=y

Using the defined symbol ⊆ (x⊆y is defined as ∀z(z∈x→z∈y)), we could also write this as “x⊆y∧y⊆x→x=y’’.

Foundation:

∀x[x≠∅ → (∃y∈x)(y∩x=∅)]

Of course, ∩ and ∅ are defined terms, and (∃y∈x) is vernacular. Here is Foundation without using them:

∀x[∃y(y∈x) → ∃y(y∈x∧∀u(¬(u∈x∧u∈y)))]

People sometimes call this Regularity. Foundation says there is an ∈-minimal element of x, that is, a y∈x none of whose elements belong to x. If that were false, we’d have an infinite descending chain x∋y1∋y2∋…. (Possibly it could end in a loop, i.e., ym=yn for some m<n.) But such an x is not “built up from ∅’’.

Foundation has an equivalent statement: There are no infinite descending ∈-chains. To prove this equivalence, you need Choice. The version above (due to von Neumann) avoids this issue, and is simpler to express.

Pairs:

∀x∀y∃z(z={x,y})

or without the defined term {x,y}:

∀x∀y∃z∀u(u∈z ↔(u=x∨u=y))

Since x=y is not excluded, this also covers singletons.

Null Set:

∃x∀y(y∉x)

Union:

∀x∃u(u=⋃_{y∈x} y)

or without the defined term ⋃

∀x∃u∀z(z∈u ↔∃y(z∈y∧y∈x))

In other words, the elements of the union are the elements of the elements of the original set.

Power Set:

∀x∃p∀s(s∈p↔s⊆x)

Or in other words, p={s : s⊆x}, just the kind of set-builder definition that was uncritically accepted before the paradoxes. The power set of x is denoted 𝒫(x). Without the defined term s⊆x:

∀x∃p∀s(s∈p↔∀y(y∈s→y∈x))

Infinity:

∃w(∅∈w∧∀x(x∈w→x∪{x}∈w))

We mentioned earlier the von Neumann representation for natural numbers: 0=∅, 1={0}, 2={0,1}, etc. This axiom ensures the existence of a set containing all the von Neumann natural numbers. I leave the vernacular-free expression of Infinity as an exercise for the fastidious.
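The closure condition x∈w → x∪{x}∈w is what generates the von Neumann naturals, starting from 0=∅. A small sketch, assuming nested `frozenset`s as a stand-in for hereditarily finite sets; `successor` and `von_neumann` are illustrative names.

```python
def successor(x):
    """x ∪ {x}, the step that the Infinity axiom closes w under."""
    return x | frozenset({x})

def von_neumann(n):
    """The von Neumann natural number n, built from ∅ by n successor steps."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x

three = von_neumann(3)   # the set {0, 1, 2}
```

Of course no finite computation produces the set w itself; that is why Infinity has to be an axiom.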

Separation (sometimes called Subset) isn’t a single axiom, but an axiom schema. We have an instance of the schema for each formula φ(y,t⃗). For any set x and any choice of values c⃗ for t⃗, the instance says there is a set of all elements y of x satisfying φ(y,c⃗).

Separation:

∀t⃗∀x∃s∀y(y∈s↔y∈x∧φ(y,t⃗))

Separation justifies the use of the set-builder notation s={y∈x:φ(y,t⃗)}.

Replacement (sometimes called Substitution) is also an axiom schema, with an instance for each formula φ(y,z,t⃗). Suppose for some particular choice t⃗=c⃗, the formula φ(y,z,c⃗) defines z as a partial function φ° of y: given y, there is at most one z making φ(y,z,c⃗) true. Then the range of φ° on any set x is a set.

Replacement:

∀t⃗∀x(∀y∃≤1z φ(y,z,t⃗) →∃w(w={φ°(y):y∈x}))

But the set-builder term is not justifiable by Separation. Eliminating it gives:

∀t⃗∀x[∀y∃≤1zφ(y,z,t⃗)
→∃w∀z(z∈w↔(∃y∈x)φ(y,z,t⃗))]

And we can eliminate ∃≤1z and (∃y∈x). We replace ∃≤1zφ(y,z,t⃗) with this:

∀u∀v(φ(y,u,t⃗)∧φ(y,v,t⃗)→u=v)

and (∃y∈x)φ(y,z,t⃗) with

∃y(y∈x ∧φ(y,z,t⃗))

This version of Replacement demands only that φ° be a partial function, not necessarily total. You can derive Separation from it. If φ(y,t⃗) is the separating property, then let ψ(y,z,t⃗) (for a particular t⃗=c⃗) define the partial function ψ°(y)=y when φ(y,c⃗) is true, and undefined when φ(y,c⃗) is false. Then applying ψ° to x gives the subset we want. Some authors use the “total” version of Replacement, and include Separation in their list of axioms.
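The derivation of Separation from partial-function Replacement can be sketched for finite sets. The assumptions are mine: a partial function is modeled as a Python callable that returns a one-element tuple where it is defined and `None` where it is not, and the names `replacement` and `separation` are illustrative.

```python
def replacement(x, partial_fn):
    """The range of a partial function on the set x.

    partial_fn(y) returns (z,) when the function is defined at y,
    and None when it is undefined there.
    """
    out = set()
    for y in x:
        result = partial_fn(y)
        if result is not None:
            out.add(result[0])
    return frozenset(out)

def separation(x, phi):
    """{y ∈ x : φ(y)}, obtained as the range of ψ°(y) = y restricted to where φ holds."""
    def psi(y):
        return (y,) if phi(y) else None
    return replacement(x, psi)

evens = separation(frozenset(range(10)), lambda n: n % 2 == 0)
```

With the “total” version of Replacement this trick breaks down when no element of x satisfies φ, since there is then no value to send the rejected elements to; that is one reason authors using the total version list Separation separately.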

Choice (AC) was the scene for major philosophical combat early in the 20th century, as we’ve seen. It’s easier to express than Replacement.

Choice:

∀x[(∀y∈x)y≠∅∧ (∀y∈x)(∀z∈x)(y≠z→y∩z=∅)
→∃c(∀y∈x)#(c∩y)=1]

Eliminating some vernacular should be routine by now: y≠∅, y∩z=∅. As for #(c∩y)=1, this becomes

∃u(u∈c∩y∧∀v(v∈c∩y→v=u))

where of course u∈c∩y is formally u∈c∧u∈y, likewise for v.

Another version of Choice does not require disjointness. It’s easy to express formally once you have the machinery of ordered pairs, and thus functions as sets of ordered pairs. It says: for every set x whose elements are all nonempty sets, there is a so-called choice function c such that c(y)∈y for all y∈x.

Unlike every other “set existence” axiom of ZFC, we can’t define c with set-builder notation, or indeed give any other explicit description of the choice function. We’ve seen how this led to much distrust of AC. In 1938, Gödel showed that if ZF is consistent, then ZFC is too. That calmed things down somewhat. I’ve cited Moore’s book before on the history of the Axiom of Choice.


Filed under History, Set Theory

Set Theory Jottings 15. From Zermelo to ZFC: Formal Logic


The Role of Formal Logic

In Zermelo’s original system, the Separation Axiom refers to a “definite property”. The Replacement Axiom refers to a “rule”. In 1922, Skolem proposed interpreting “definite” as “first-order definable”. So properties and rules are just formulas in the language of ZF, ℒ(ZF). With this clarification, ZFC assumes its modern form as a first-order theory.

ZF boasts a spartan vocabulary: just ∈ plus the basic symbols of first-order logic. Writing things out formally, with no abbreviations or short-cuts, rapidly snows us under unreadable expressions. We handle this (like everyone else) with semi-formal expressions; I like to call these “vernacular”. Copious hand-waving suggests how a masochist could write these out in the formal language ℒ(ZF).

Example: here’s how we say p=〈x,y〉={{x},{x,y}}, partially expanded:

∃a,b(p={a,b}∧a={x}∧b={x,y})

“∃a,b’’ is vernacular for “∃a∃b’’. Next we expand p={a,b} into

∀u(u∈p↔(u=a∨u=b))

and likewise for a={x} and b={x,y}.

A relation r is a set of ordered pairs, so more formally

(∀p∈r)∃a,b(p=(a,b))

which still has some vernacular. (∀p∈r)… more formally is ∀p(p∈r→…). Likewise, (∃x∈z)… in formal dress is ∃x(x∈z∧…).

To say f is a function with domain D, we start with the vernacular

f is a relation
∧ (∀(a,b)∈f)a∈D
∧ (∀a∈D) ∃!b((a,b)∈f)

“∀(a,b)∈f’’ expands to “(∀p∈f)∀a,b(p=(a,b)→…)’’. ∃!bφ(b), “exists a unique b satisfying φ’’, expands to ∃bφ(b)∧∀b,c(φ(b)∧φ(c)→b=c).

The vernacular f(x)=y becomes 〈x,y〉∈f. These should be enough to give you the flavor.
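To get a feel for how these expansions behave, here is a small sketch that builds Kuratowski pairs out of `frozenset`s and checks the three clauses for “f is a function with domain D”. All names (`pair`, `unpair`, `is_function`) are illustrative, and the search in `unpair` is a deliberately naive reading of ∃a,b(p=〈a,b〉).

```python
def pair(x, y):
    """The Kuratowski ordered pair {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def unpair(p):
    """Return (a, b) if p = <a, b> for some a, b; otherwise None."""
    if not (isinstance(p, frozenset)
            and all(isinstance(s, frozenset) for s in p)):
        return None
    candidates = frozenset().union(*p)   # a and b must occur inside members of p
    for a in candidates:
        for b in candidates:
            if pair(a, b) == p:
                return (a, b)
    return None

def is_function(f, D):
    """f is a relation, every first coordinate lies in D, each a in D has a unique b."""
    decoded = [unpair(p) for p in f]
    if any(d is None for d in decoded):
        return False                     # some element of f is not an ordered pair
    if any(a not in D for a, _ in decoded):
        return False
    return all(len({b for a2, b in decoded if a2 == a}) == 1 for a in D)

f = frozenset({pair(0, "a"), pair(1, "b")})
g = f | {pair(0, "c")}                   # two values at 0, so not a function
```

Note that `pair(3, 3)` collapses to {{3}}, a one-element set, which is why `unpair` cannot simply assume its argument has two members.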

Often we have lists of variables, like x1,…,xn. We write x⃗ to reduce clutter; ∀x⃗ and ∃x⃗ have the obvious meanings.

When we get to the formal versions of Separation and Replacement, we’ll see how “property” and “rule” are made precise.

Coda

We’ve seen the informal use of “class” in ZF. This proved so convenient that NBG, a theory developed in succession by von Neumann, Bernays, and Gödel, gave a home to it. It turns out that NBG is a so-called conservative extension of ZFC: any formula of NBG that “talks only about sets” is provable in NBG iff it is provable in ZFC.

In NBG, we still have only the symbol ∈ plus the basic logical symbols. However, certain members of the “universe” have the left-hand side of ∈ barred to them. If x∈y for some y, then we say x is a set; anything that’s not a set is a proper class. So proper classes can have sets as elements, but cannot themselves be elements. The term class encompasses both sets and proper classes; in NBG, the variables range over classes.

Here’s how NBG skirts around Russell’s paradox. We can still write the formal expression R={x:x∉x}. This defines the class R, which is the class of all sets that are not elements of themselves. Is R∈R? No, because if it were, it would have to be a set that was not an element of itself. OK, if R∉R, doesn’t that mean that R satisfies the condition to be an element of R? No, not if R is a proper class: R contains only sets, no proper classes allowed!

Zermelo’s original system did not include Replacement and Foundation, although it did include Choice. Somewhat ahistorically, people use Z to refer to ZF minus Replacement, but including Separation. ZC is Z plus Choice.

ZF is a theory of “pure sets”. ZFA is “ZF with atoms”. An atom is an object that is not the null set but has no elements. The axioms of ZF can be modified to allow for this. Before the invention of forcing, Mostowski used ZFA to investigate theories without Choice.


Filed under History, Set Theory

Set Theory Jottings 14. From Zermelo to ZFC: Replacement and Foundation


Replacement and Foundation

Thirteen years after Zermelo published his axioms, Fraenkel pointed out that they weren’t quite strong enough. For example, you couldn’t prove the existence of the set ⋃_{n∈ω} 𝒫^n(0).

Fraenkel suggested the Axiom of Replacement (aka Substitution): If we have a way of “associating” with every element m of a set M a set X_m, then {X_m : m∈M} is a set. In other words, if you go through a set, replacing each element with another set, you get a set. The intuition: {X_m : m∈M} is “no bigger” than M, and so ought to be a set also.

Combined with the Union Axiom, this is the gateway to big sets. Replacement gives us {𝒫^n(0) : n∈ω}, and then Union gives us Fraenkel’s set.

Zermelo agreed with Fraenkel on this addition to the axioms. Skolem also pointed out the need.

Using transfinite recursion, we now define the Cumulative Hierarchy:

V_0 = ∅
V_{α+1} = V_α ∪ 𝒫(V_α)
V_λ = ⋃_{α<λ} V_α

Fraenkel’s set is V_ω.

(Two variations: it turns out that “V_α∪’’ isn’t needed in the second line; you get the same sets with “V_{α+1}=𝒫(V_α)’’. Some authors modify the definition so that their V_λ is the same as our V_{λ+1}.)

Power Set plus Replacement plus Union shows that all the V_α are sets. The rank of a set is where it first appears in the cumulative hierarchy, or more precisely: rk(X) is the least α with X∈V_{α+1}. The “+1’’ is so that α has rank α.
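The first few finite stages are small enough to compute outright. The sketch below assumes nested `frozenset`s for hereditarily finite sets and stops at small n, since the stages grow as iterated exponentials (0, 1, 2, 4, 16, 65536, …). The names `powerset`, `V`, and `rank`, and the cutoff `max_stage`, are illustrative.

```python
from itertools import combinations

def powerset(s):
    """𝒫(s) as a frozenset of frozensets."""
    items = list(s)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def V(n):
    """V_n for a natural number n, using V_{k+1} = V_k ∪ 𝒫(V_k)."""
    level = frozenset()
    for _ in range(n):
        level = level | powerset(level)
    return level

def rank(x, max_stage=5):
    """Least n with x ∈ V_{n+1}, searching only the first few stages."""
    for n in range(max_stage):
        if x in V(n + 1):
            return n
    raise ValueError("rank exceeds the stages searched")

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})
sizes = [len(V(n)) for n in range(5)]
```

Here `rank(three)` comes out as 3, matching the remark that the “+1” makes α have rank α, while the non-ordinal {1} has rank 2.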

How do we know that every set appears somewhere in the cumulative hierarchy? That brings us to our second additional axiom: Foundation.

Skolem noticed the need to exclude “pathological” sets, such as A={A} or infinite descending chains A1∋A2∋…. Such sets will never appear in any of the V_α’s. The Foundation Axiom does the trick.

Von Neumann gave the most elegant version of Foundation: Every nonempty set contains an element disjoint from it. That is, for any nonempty set A, there is an a∈A with a∩A=∅. This a is minimal in the ∈ ordering of the elements of A. That’s another way to phrase it: Every nonempty set A has a minimal element in its ∈-ordering.

Foundation is equivalent to every set having a rank. Writing this the classy way, V=⋃_{α∈Ω} V_α. The proof of this takes a few steps; I’ll save it for the end of this post.

For so-called “ordinary mathematics”, you usually don’t need to go any higher than rank ω2. Example: the complex Hilbert space L²(ℝ³), beloved by analysts and physicists alike. We build up to this by constructing all the natural numbers, then the integers, rational numbers, and reals, ordered pairs and ordered triples of reals, then functions from the ordered triples to ordered pairs, and finally equivalence classes of these functions. (Recall that two L² functions are identified if they agree except for a set of measure zero; hence, the actual elements of L²(ℝ³) are equivalence classes.) We have all the natural numbers by the time we get to V_ω. If x,y∈V_α, then the ordered pair 〈x,y〉∈V_{α+2}. So if A⊆V_α, then any subset of A×A is in V_{α+3}. Integers are often defined as sets of ordered pairs of natural numbers, rational numbers as sets of ordered pairs of integers, and real numbers as sets of rational numbers (Dedekind cuts). If I counted right, that puts all the elements of ℝ in V_{ω+7} and any subset of ℝ in V_{ω+8}. The Hilbert space in question should belong to V_{ω+15}, again if I haven’t miscounted.

Although sets of high rank aren’t “needed” by most mathematicians, it would be quite strange to impose a “rank ceiling” limitation on the power set axiom. Just maybe if V_{ω+1} were adequate, but it isn’t.

Without Choice, we cannot use von Neumann’s definition of card(M) as the least ordinal equivalent to M. An alternative is Scott’s trick: card(M) is the set of all sets of least rank equivalent to M. In other words, writing S∼M for “S is equivalent to M”, S∈card(M) iff S∼M and for all T∼M, rk(T)≥rk(S). This lacks the simplicity of von Neumann’s definition, but it’s the best we’ve got. Scott’s trick can be used for other purposes, e.g., to define the order type of a simply-ordered set.

Okay, let’s look at the equivalence of Foundation with V=⋃_{α∈Ω} V_α. We’ll need some machinery: transitive closures, ∈-induction, and the supremum of a set of ordinals.

Recall that a set A is transitive if b∈a∈A implies b∈A, i.e., a∈A implies a⊆A. Every set A is contained in a transitive set called its transitive closure, tc(A). First we set f(A)=⋃_{a∈A} a. Then set tc(A)=⋃_{n∈ω} f^n(A), where f^0(A)=A. The transitive closure is the smallest transitive set containing A: A⊆tc(A), and if A⊆B with B transitive, then tc(A)⊆B.
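For hereditarily finite sets the construction of tc(A) terminates after finitely many steps, because the iterates f^n(A) eventually become empty. A sketch, again assuming nested `frozenset`s; the function names are illustrative.

```python
def big_union(A):
    """f(A): the union of the elements of A."""
    return frozenset().union(*A)

def tc(A):
    """Transitive closure: A ∪ f(A) ∪ f(f(A)) ∪ …, stopping once an iterate is empty."""
    closure, layer = A, A
    while layer:
        layer = big_union(layer)
        closure = closure | layer
    return closure

def is_transitive(A):
    """Every element of A is a subset of A."""
    return all(a <= A for a in A)

zero = frozenset()
one = frozenset({zero})
A = frozenset({frozenset({one})})        # A = {{1}}, which is not transitive
closure = tc(A)                          # {{1}, 1, 0}
```

The loop relies on well-foundedness: for a set such as A={A}, which Foundation excludes and which `frozenset` cannot represent anyway, the iterates would never become empty.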

Suppose φ is a property where the following implication holds: If every element of A satisfies φ, then so does A. Then ∈-induction says that φ holds for all sets. Foundation justifies this. Suppose on the contrary that A is a non-φ set. Then some element of A is non-φ. Consider the subset S of tc(A) consisting of all non-φ elements. This is nonempty by hypothesis, so let s∈S be ∈-minimal in S. All the elements of s belong to tc(A) because tc(A) is transitive, so all of s’s elements must satisfy φ (otherwise s wouldn’t be ∈-minimal). But then s must also satisfy φ, contradiction.

Lemma: If S is a set of ordinals, then ⋃_{α∈S} α is an ordinal. This is a set by the Union Axiom. That it is simply-ordered under ∈ is easy as pie. Foundation makes the proof of the well-ordering property trivial, though it’s not hard to prove without it. For ordinals, α⊆β is equivalent to α≤β (we’ll prove this in a later post), so ⋃S is an upper bound for all the elements of S. As usual, we call the least upper bound of S its supremum, denoted sup S.

Now suppose X is a set, and we associate an ordinal α_x with each x∈X. Replacement says that {α_x : x∈X} is a set; we denote its supremum by sup_{x∈X} α_x.

Time to prove that every set has a rank. We use ∈-induction. Suppose all the elements of A have ranks. Set α=sup_{a∈A}(rk(a)+1). As a∈A belongs to V_{rk(a)+1} and V_β⊆V_α whenever β≤α, all the elements of A belong to V_α. Therefore A belongs to V_{α+1}.

The reverse implication is straightforward, given this fact about rank: a∈b implies rk(a)<rk(b). (Proof: transfinite induction.) It follows that an element of A of lowest rank is ∈-minimal in A.


Filed under History, Set Theory

From Kepler to Ptolemy 20


The Whirlpool Force: Early Thoughts

In the Astronomia nova, Kepler introduced the whirlpool force this way:

… since there are (of course) no solid orbs, as Brahe has demonstrated from the paths of comets, the body of the sun is the source of the power that drives all the planets around. Moreover, I have specified the manner [in which this occurs] as follows: that the sun, although it stays in one place, rotates as if on a lathe, and out of itself sends into the space of the world an immaterial species of its body, analogous to the immaterial species of its light. This species itself, as a consequence of the rotation of the solar body, also rotates like a very rapid whirlpool throughout the whole breadth of the world, and carries the bodies of the planets along with itself in a gyre, its grasp stronger or weaker according to the greater density or rarity it acquires through the law governing its diffusion.

Voelkel calls this the motive force hypothesis. As we’ve seen, Kepler devised it quite early, inspired by the steady decrease in planetary speeds with increasing orbital radii.

In the Mysterium cosmographicum, in a chapter titled “Why a planet moves uniformly about the center of the equant”, he adduced a new argument: the changing speed in a single orbit. The speed at perihelion is faster than at aphelion; in fact, the speed ratio is the inverse of the distance ratio:

v_peri/v_ap = r_ap/r_peri

We saw in post 4 that all three laws (the equant, the inverse distance, and the area law) give this relation.

So we have two similar phenomena. Jupiter (for example) moves slower than Mars, and Mars at aphelion moves slower than Mars at perihelion. The distance from the Sun seemed to be the common factor.

Kepler delighted at finding a physical cause to replace the equant. He was out of step with most astronomers of the time. True, they despised the equant. But Copernicus had replaced its non-uniform motion with the uniform motion of a small epicycle (often called an epicyclet). This they admired, while rejecting heliocentrism.

Kepler’s old teacher Maestlin discovered a geometrical demonstration for the near-equivalence of the Copernican epicyclet with Ptolemy’s equant; he communicated this to Kepler in a letter in 1595. (See Voelkel (p.19) or Evans (p.1013) for the proof.) The next year, in the Mysterium cosmographicum, Kepler wrote:

The path of the planet is eccentric, and it is slower when it is further out, and swifter when it is further in. For it was to explain this that Copernicus postulated epicycles, Ptolemy equants… Therefore at the middle part of the eccentric path … the planet will be slower, because it moves further away from the Sun and is moved by a weaker power; and in the remaining part it will be faster, because it is closer to the Sun and subject to a stronger power…

Nowadays we know that these two phenomena stem from different physics. Kepler’s second law reflects the conservation of angular momentum; it would hold with any central force. Kepler’s third law comes from the inverse square law for gravity plus the formula for centripetal force: F ∝ 1/r² and F ∝ v²/r. For orbits with small eccentricities, we have approximately

v ∝ 1/r (Kepler’s 2nd)
v ∝ 1/√r (Kepler’s 3rd)
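For readers who want the two proportionalities spelled out, here is a short worked version. It assumes a circular orbit for the third-law line, and for the second-law line it uses the fact that at perihelion and aphelion the velocity is perpendicular to the radius, so that the conserved angular momentum per unit mass is simply rv.

```latex
\begin{align*}
% Third law: equate the two expressions for the force
\frac{v^2}{r} \propto \frac{1}{r^2}
  &\;\Longrightarrow\; v^2 \propto \frac{1}{r}
   \;\Longrightarrow\; v \propto \frac{1}{\sqrt{r}} \\
% Second law: angular momentum at the two apsides
r_{\mathrm{peri}}\, v_{\mathrm{peri}} = r_{\mathrm{ap}}\, v_{\mathrm{ap}}
  &\;\Longrightarrow\; \frac{v_{\mathrm{peri}}}{v_{\mathrm{ap}}}
   = \frac{r_{\mathrm{ap}}}{r_{\mathrm{peri}}}
\end{align*}
```

The first line compares different planets; the second compares two points on a single orbit.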

From the start, Kepler favored an inverse distance law for the whirlpool force. In a letter to Maestlin in 1595, he suggested a way to derive the orbital radii from the much more accurately known periods. (He needed the radii to test his polyhedral hypothesis.) He noted that two factors contributed to the longer periods of the more distant planets. First, they have to traverse a longer orbit. Second, they do so at a slower speed. Now, the motive force originates in the Sun and spreads out evenly over the orbits, so it should diminish in inverse proportion. Here is the passage, quoted in Voelkel (p.39). (Kepler uses ‘motion’ to mean motive force (proportional to speed), ‘orbs’ to mean orbits, and ‘circles’ to mean circumferences.)

There is, as I said, a moving spirit [motrix anima] in the Sun. If equal motion and the same strength came from the Sun into all orbs, one would still circulate more slowly than another on account of the inequality of the orbs. The periodic times would be as the circles. For quantity measures motion. However, circles [go] as the radius, namely as the distance. Thus from the certainly-known mean motions we could easily construct also the mean distances. But another cause enters which makes the more remote slower. Let us take the experience [experimentum] of light. For as both light and motion are connected in their origin so also [are they connected] in their actions, and perhaps light itself is the vehicle of motion. Therefore, in a small orb and also in a small circle near the Sun, there is as much light as there is in a large and more remote sphere. Therefore the light is thinner in the large, and denser and stronger in the narrow. And this strength is in inverse proportion to the circles, or the distances.

Note the last sentence. In the Mysterium cosmographicum Kepler repeated the argument:

Let us suppose, then, as is highly probable, that motion is dispensed by the Sun in the same proportion as light. Now the ratio in which light spreading out from a center is weakened is stated by the opticians. For the amount of light in a small circle is the same as the amount of light or of the solar rays in the great one. Hence, as it is more concentrated in the small circle, and more thinly spread in the great one, the measure of this thinning out must be sought in the actual ratio of the circles, both for light and for the moving power. Therefore in proportion as Venus is wider than Mercury, so Mercury’s motion is stronger, or swifter, or brisker, or more vigorous than that of Venus, or whatever word is chosen to express the fact. But in proportion as one orbit is wider than another, it also requires more time to go round it, although the force of the motion is equal in both cases. Hence it follows that one excess in the distance of a planet from the Sun acts twice over in increasing the period; and conversely, the increase in the period is double the difference in the distances.

Perhaps you already see two problems with this. First, the analogy with light indicates an inverse square dependence, not inverse linear. Second, neither of these is the right law. Let T be the period and r the orbital radius. An inverse linear dependence for the whirlpool force dictates that T is proportional to r²; an inverse square, to r³. But T is proportional to r^(3/2), as Kepler would eventually discover.
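The step from a force law to a period law is short. Assuming, as Kepler did, that speed is proportional to the motive force, and treating the orbit as a circle of circumference 2πr:

```latex
\begin{align*}
T = \frac{2\pi r}{v}, \qquad
v \propto \frac{1}{r}   &\;\Longrightarrow\; T \propto r^{2}, \\
v \propto \frac{1}{r^2} &\;\Longrightarrow\; T \propto r^{3}, \\
v \propto \frac{1}{\sqrt{r}} &\;\Longrightarrow\; T \propto r^{3/2}.
\end{align*}
```

The last line is the modern relation, included for comparison; it corresponds to no simple dilution argument of the kind Kepler was considering.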

Kepler convinced himself that the inverse distance law fit the available data. Note the end of the passage above: “the increase in the period is double the difference in the distances”. For example (with made-up numbers): Say Planets 1 and 2 have T₁=100, T₂=400, and r₁=1000. T₂−T₁=300, or 3 times T₁. The increase in the radii, i.e., r₂−r₁, should then be 1.5 times r₁, or 1500. So r₂ should be 2500. Compare this with the correct result of assuming T proportional to r²: r₂=2000. (Kepler carries out a similar computation with Mercury and Venus.)
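The arithmetic is easy to reproduce. The sketch below uses the same made-up numbers; the variable names are illustrative, and the third computation (the true T ∝ r^(3/2) relation) is added only for comparison and is not part of Kepler’s argument.

```python
T1, T2, r1 = 100.0, 400.0, 1000.0

# Kepler's procedure: the fractional increase in r is half the fractional increase in T.
r2_kepler = r1 * (1 + 0.5 * (T2 - T1) / T1)

# What the inverse-distance argument should have given: T proportional to r squared.
r2_square = r1 * (T2 / T1) ** 0.5

# For comparison only: the third law, T proportional to r^(3/2).
r2_third_law = r1 * (T2 / T1) ** (2 / 3)

print(r2_kepler, r2_square, round(r2_third_law, 1))
```

With these particular numbers the flawed halving procedure lands much nearer the third-law value (about 2520) than the “legitimate” square law does.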

Kepler applied this procedure to adjacent pairs of planets, using the Copernican periods to calculate the ratios of the distances. The results agreed (sort of [1]) with the Copernican ratios for the distances. Still, the procedure makes no sense. In the second edition of the Mysterium cosmographicum (twenty-five years later) Kepler added this footnote:

Here the mistake begins… Now what I ought to have inferred … is that the ratio of the periods is the square of the ratio of the distances, not because I hold it to be true, for it is only the 3/2th power, as we shall hear, but because it was the legitimate conclusion from this line of argument. You see how at this point the arithmetic mean was taken, by halving the difference, when the geometrical mean should have been taken.

Next: inverse square or inverse linear? Voelkel remarks

whereas light propagates spherically, Kepler confined his attention to the plane of the orbit… Only much later did he reconsider the spherical propagation of the motive virtue and address the problem of whether the strength ought, as light, to decrease as the square of the distance. [p.40]

And in a footnote Voelkel adds, “In his first thoughts about the propagation of motor virtue, he appears to have thought only about the plane of the orbit.”

Kepler’s Pars Optica (1604) clearly stated the inverse square law for light. By the time of the Astronomia nova, Kepler realized he had a problem. In his letter to Maestlin, he had said “perhaps light itself is the vehicle of motion”. In Chapter 33 of the Astronomia nova, he asserts

…although this light of the sun cannot be the moving power itself, I leave it to others to see whether light may perhaps be so constituted as to be, as it were, a kind of instrument or vehicle, of which the moving power makes use.

This seems gainsaid by the following: first, light is hindered by the opaque, and therefore if the moving power had light as a vehicle, darkness would result in the movable bodies being at rest; again, light spreads spherically in straight lines, while the moving power, though spreading in straight lines, does so circularly; that is, it is exerted in but one region of the world, from east to west, and not the opposite, not at the poles, and so on. But we shall be able to reply plausibly to these objections in the chapters immediately following.

In Chapter 36 he amplifies the second objection before resolving it:

This objection wearied me for a long time without offering any prospect of a solution.

It was demonstrated in Chapter 32 that the intension and remission of a planet’s motion [i.e., the time taken to traverse a given length] is in simple proportion to the distances. It appears, however, that the power emanating from the Sun should be intensified and remitted in the duplicate or triplicate ratio of the distances or lines of efflux. [I.e., as the square or cube of the distances.]

As this post is long enough, I’ll resume the story next time.

[1] But as Stephenson notes, the Copernican distances were not that accurate; Kepler’s third law would not have fit very well either.


Filed under Astronomy, History