Author Archives: Michael Weiss

Set Theory Jottings 19. GCH implies AC.

Prev TOC Next

Sierpiński’s Theorem: GCH implies AC

I had a look at the version of the proof in Cohen (§IV.12). Sierpiński was a clever fellow, and he came up with a few tricks that would be hard to motivate.

Here I will try to imagine how Sierpiński could have devised his proof. Cohen does offer one bit of intuition:

The GCH is a rather strong assertion about the existence of various maps since if we are ever given that ABP(A) then there must be a 1–1 map either from B onto A or from B onto P(A). Essentially this means that there are so many maps available that we can well-order every set.

Let A be the set we wish to well-order. Let’s write AB to mean there is an injection from A into B. GCH tells us that for any U,

AUP(A) implies UA or U≡𝒫(A)

If U is well-ordered, then UA and U≡𝒫(A) both imply that A can be well-ordered, the latter because A is naturally imbedded in 𝒫(A). But this is too simple an approach: AU already makes A well-ordered for a well-ordered U, so if we could show the antecedent we’d be done—we wouldn’t need GCH to finish the job.

Let’s not assume U is well-ordered, but instead suppose it contains a well-ordered set. Say we could show that

AW+A≤𝒫(A)

for a well-ordered set W, where ‘+’ stands for disjoint union. (That is, W×{0}∪A×{1}, or some similar trick to insure disjointness.) Then we’d have

W+A≡𝒫(A) or W+AA

Now, if W+A≡𝒫(A), then we ought to have W≡𝒫(A), just because A is “smaller” than 𝒫(A) (in some sense)—W should just absorb A, if W is “big enough”. Also, if W is “big enough” then that should exclude the other arm of the choice, where W+AA. And if W≡𝒫(A), then 𝒫(A) and so also A can be well-ordered, as we have seen.

At this point Hartog’s theorem shows up at the door. This gives us a well-ordered set W with

W≤𝒫4(A) and WA

So we have

AW+A ≤𝒫4(A)+A ?≡? 𝒫4(A)

(where ?≡? means that the equivalence needs to be proven). WA excludes W+AA, good. Let’s postpone the issue of the ‘?’. Deal first with the problem that the bounds are not tight enough for GCH to apply. We fix that by looking at:

𝒫3(A)≤W+𝒫3(A)≤𝒫4(A)+𝒫3(A) ?≡? 𝒫4(A)

So if W+𝒫3(A)≡𝒫4(A), we ought to have W≡𝒫4(A) and hence a well-ordering of A. What about the other case, W+𝒫3(A)≡𝒫3(A)? Ah, then we have

𝒫2(A)≤W+𝒫2(A)≤W+𝒫3(A)≡𝒫3(A)

and so we can repeat the argument: either W+𝒫2(A)≡𝒫3(A), which ought to make 𝒫3(A) well-ordered and hence also A well-ordered; or W+𝒫2(A)≡𝒫2(A), in which case we repeat the argument yet again. Eventually we work our way down to

AW+AW+𝒫(A)≡𝒫(A)

and W+AA is excluded since WA, and we are done.

All this relies on the intuition that if W+M≡𝒫(M), then we should have W≡𝒫(M): we used this with M=𝒫n(A) for n=0,…,3. Well, we can prove something a little weaker.

Lemma: If W+M≡𝒫(M)×𝒫(M), then W≥𝒫(M).

Proof: Suppose h:W+M→𝒫(M)×𝒫(M) is a bijection. Restrict h to M and compose with the projection to the second factor: π2⚬(hM):M→𝒫(M). Cantor’s diagonal argument shows that this map cannot be onto. (The fact that π2⚬(hM) might not be 1–1 doesn’t affect the argument.) So for some s0∈𝒫(M), we know that h(x) never takes the form (−,s0) for xM. In other words, the image of hW must include all of 𝒫(M)×{s0}. Therefore 𝒫(M)×{s0} can be mapped 1–1 to a subset of W. qed.

The missing pieces of the proof now all take the form of absorption equations. We know that 𝒫(M)×𝒫(M)≡𝒫(M+M)—as an equation for cardinals, 2𝔪2𝔪=22𝔪. If we had 2𝔪=𝔪, that would take care of that problem. The ?≡? above also takes the form 2𝔪+𝔪 ?=? 2𝔪, for 𝔪 the cardinality of 𝒫3(A).

The general absorption laws for addition depend on AC. But we do have these suggestive equations even without AC:

𝔞+ω+1 = 𝔞+ω, 2𝔪+1 = 2·2𝔪

and so if 𝔞+ω=𝔪 and 2𝔪=𝔟, then 2𝔟=𝔟. So let’s say we set B=𝒫(A+ω). Then we have 2B≡𝒫(A+ω+1)≡B (where 2B, of course, is the disjoint union of B with itself). Also BB+1≤2BB, so BB+1 and so 𝒫(B)≡2𝒫(B). So if we replace A with B, then all gaps in the argument are filled and we conclude that B can be well-ordered. But obviously A can be imbedded in B, so A also can be well-ordered. QED.

Ernst Specker proved a “local” version of Sierpiński’s Theorem: if 𝔪 and 2𝔪 both satisfy CH, then 2𝔪=ℵ(𝔪).

Prev TOC Next

Leave a comment

Filed under Set Theory

Set Theory Jottings 18. The Axiom of Determinacy

Prev TOC Next

Just denying the axiom of choice doesn’t buy you much. If you’re going to throw away AC, you should add some powerful incompatible axiom in its place. The Axiom of Determinacy (AD) has been studied in this light.

Here’s one formulation. Let S be ℕ, i.e., the set of all infinite strings of natural numbers. Let GS. Alice and Bob play a game where at step 2n, Alice chooses a number s2n, and at step 2n+1, Bob chooses a number s2n+1. If s0s1s2…∈G, Alice wins, otherwise Bob wins. We say elements of G are assigned to Alice, and elements not in G are assigned to Bob. We’ll call the infinite strings results (of the game). Rather than think of G as a set of results, think of it as a function G:S→{Alice,Bob}.

A strategy for Alice tells her how to play each move. Formally, it’s a function from the set of all number strings of finite even length to ℕ. Likewise, a strategy for Bob maps number strings of finite odd length to numbers. A game is determined if Alice or Bob has a winning strategy, i.e., if the player follows the strategy then that player will win. The Axiom of Determinacy says that each game is determined.

Interesting thing about the proof that AC → ¬AD: it’s much easier using the well-ordering theorem instead of Zorn’s lemma.

First note that there are c=ℵ00 strategies (lumping together both Alice and Bob strategies), likewise c results. Assuming AC, well-order the strategies {Sα:α<ωc}. Here ωc is the least ordinal with cardinality c, so the set {α:α<κ} has cardinality less than c for each κ<ωc.

We construct a game G by inducting transfinitely through all the strategies, at step κ considering Sκ. Our goal is to assign some result to Alice or Bob that prevents Sκ from being a winning strategy. Say Sκ is an Alice strategy. Since we assign only one result at each step, fewer than c results have been assigned before step κ. However, there are c possible results if Alice follows Sκ, since Bob can play his numbers however he wants. So there exists a result where Alice follows Sκ but this result has not yet been assigned to either player. Assign it to Bob; this thwarts Sκ. If Sκ is a Bob strategy, just switch everything around. QED

The cardinality argument at the heart of this proof is harder to pull off with Zorn’s lemma (though possible, of course). (The exact same argument works with bit strings instead of strings of natural numbers, but for some reason AD is generally stated using ℕ instead of 𝒫(ℕ).)

Prev TOC Next

Leave a comment

Filed under Set Theory

From Kepler to Ptolemy 22

Prev TOC Next

Libration Force

The Libration Force

Kepler coined the term “libration” for the oscillation of a planet’s distance from the Sun, approaching and receding.

He analyzed the libration for the eccentric-equant model, and found it unexpectedly complicated. Stephenson (p.78):

Many absurdities were involved in supposing that a planet could move, … non-uniformly, about the vacant center of the eccentric, with no guide except the apparent magnitude of the solar disk. Such complicated hypotheses, although designed to yield a perfectly simple eccentric circular path, were not physically credible…

Notice the remarkable thing that Kepler was doing here. He was analyzing motion on an eccentric circle, a model that had been in general use for nearly two millenia, apparently the simplest possible model with any empirical accuracy. He took apart this beautifully simple model and showed that as a physical process (and in the absence of solid spheres) it was really quite complicated, so complicated as to raise doubt about whether it could be real. He had performed so radical a reassessment by interpreting astronomy, for the first time, as a physical science.

Eventually Kepler achieved the elliptical orbit. Seeking a physical explanation, he hit on a magnetic force to produce the libration:

What if all the bodies of the planets are enormous round magnets? Of the earth (one of the planets, for Copernicus) there is no doubt. William Gilbert has proved it.

But to describe this power more plainly, the planet’s globe has two poles, of which one seeks out the sun, and the other flees the sun. So let us imagine an axis of this sort, using a magnetic strip, and let its point seek the sun. But despite its sun-seeking magnetic nature, let it remain ever parallel to itself in the translational motion of the globe…

Astronomia nova, Chapter 57.

The figure at the top of this post (taken from the Epitome of Copernican Astronomy) shows how it works. (The figure in the Astronomia nova has extra clutter.) Kepler explains:

[When] the strip is at A and E, there is no reason why the planet should approach or recede, since it holds its ends at equal distance from the sun, and would undoubtedly turn its point towards the sun if it were allowed to do so by the force that holds its axis straight and parallel. When the planet moves [counterclockwise] away from A, the point approaches the sun perceptibly, and the tail end recedes. Therefore, the globe begins perceptibly to navigate towards the sun. After E, the tail end perceptibly approaches and the head end recedes from the sun. Therefore, by a natural aversion, the whole globe perceptibly flees the sun…

Astronomia nova, Chapter 57. [I have changed the letters from C and F to A and E to match the diagram from the Epitome.]

Implicit: the magnetic force weakens with distance, so when the head
is closer to the Sun than the tail, the net force is attractive. And vice versa.

Kepler argued that this scheme gave the force a sinusoidal dependence on the longitude, and showed that this agreed with the libration for an elliptical orbit. Some aspects of this demonstration needed special pleading. Stephenson details the strong and the weak points of the reasoning (pp.110–117).

But: “The theory had one glaring flaw, however. The magnetic axis of the planet had to maintain a constant direction, perpendicular to the apsidal line.” (Stephenson, p.117.) The Earth’s rotational axis doesn’t come close to meeting this requirement. So why should we believe it holds for Mars? Kepler acknowledged the problem:

I will be satisfied if this magnetic example demonstrates the general possibility of the proposed mechanism. Concerning the details, however, I have my doubts. For when the earth is in question, it is certain that its axis, whose constant and parallel direction brings about the year’s seasons at the cardinal points, is not well suited to bringing about this reciprocation… And if this axis is unsuitable, it seems there is none suitable in the earth’s entire body, since there is no part of it that rests in one position while the whole body of the globe revolves in a ceaseless daily whirl about that axis.

As one possible out, Kepler appealed to a planetary mind.

Besides the radial libration, planets have a libration in latitude. This enmeshed the theory in further difficulties. Ever inventive, Kepler devised ad hockery around all these rough spots. But we have a contrast: we can trace a direct path from the whirlpool force to the area law. This cannot be said for Kepler’s libration theory. Kepler’s whirlpool speculations came years before the area law. The libration force came after the elliptical orbit.

There is a reason for this. You can justify the whirlpool force (more or less) using the conservation of angular momentum. Kepler’s libration force has no counterpart in Newtonian physics.

Prev TOC Next

Leave a comment

Filed under Astronomy, History

From Kepler to Ptolemy 21

Prev TOC Next

Nature of the Whirlpool Force

Kepler was unsure of the nature of the whirlpool force. Sometimes he compares it to light, sometimes to magnetism. Three successive marginal notes in Chapter 33 lay out the analogy with light: “The kinship of the solar motive power with light”; “Whether light is the vehicle of the motive power”; “The motive power is an immaterial species of the of the solar body”.

The Latin term species calls for a short discussion. Donahue (in his glossary) says:

This word, related to the verb “specio” (see, observe) has an extraordinarily wide range of meaning. Its root meaning is is “something presented to view”, but it can also mean “appearance”, “surface”, “form”, “semblance”, “mental image”, “sort”, “nature”, or “archetype” … I have therefore thrown up my hands, admitted defeat, and declined to translate it at all.

while Stephenson (p.68,footnote) writes:

“Image” is our rendering of Kepler’s species, which has for the most part been left untranslated in other accounts. As Kepler used it the word seems to mean the appearance or visible manifestation of the sun…

Gilbert’s De magnete furnished Kepler with another analogy, and a clue. Chapter 34 is titled “The body of the sun is magnetic, and rotates in its space.” A few quotes from it:

The magnet … has filaments (so to speak) or straight fibers (seat of the motor power) extended throughout its length. …it is credible that [the sun] … has circular fibers all set up in the same direction, which are indicated by the zodiac circle.

… It is therefore plausible, since the earth moves the moon through its species and is a magnetic body, while the sun moves the planets similarly through an emitted species, that the sun is likewise a magnetic body.

Kepler postulates filaments or fibers in the Sun as the source of the whirlpool force. These encircle the Sun along the circles of latitude. The Sun rotates, and the image of the moving fibers acts upon the planet to move it in the same direction.

Inverse Square vs. Inverse Linear

Taking a modern perspective, we have an inverse square law whenever we have a conservation law plus spherical symmetry. But with cylindrical symmetry (like the whirlpool force), we can have an inverse linear dependence.

A nice modern analogy: dipole radiation. Feynman discusses this in his Lectures, Chapter I-28:

The gradually discovered properties of electricity and magnetism … showed that these forces … fell off inversely as the square of the distance… As a consequence, for sufficiently great distances there is very little influence of one system of charges on another.

… Maxwell [to obtain a consistent system] … had to add another term to his equations. With this new term there came an amazing prediction, which was that a part of the electric and magnetic fields would fall off much more slowly with the distance than the inverse square, namely, inversely as the first power of the distance!

It seems a miracle that someone talking in Europe can, with mere electrical influences, be heard thousands of miles away in Los Angeles. How is it possible? It is because the fields do not vary as the inverse square, but only inversely as the first power of the distance.

He goes on to treat the dipole radiator. That is, an antenna. The key point: We have charges moving up and down the antenna. What matters is how that motion looks to a distant observer:

… all we have to do is project the motion on a plane at unit distance. Therefore we find the following rule: Imagine that we look at the moving charge … like a painter trying to paint a scene on a screen at a unit distance … We want to see what his picture would look like. So we see a dot, representing the charge, moving about in the picture. The acceleration of that dot is proportional to the electric field.

At root we have a simple matter of geometry. Substituting an eye for the painter, and the eye’s retina for the screen, we have the diagram below. We have a vertical antenna of height H at distance r from the eye’s lens. The image of the antenna on the retina has height I. The antenna has unit distance 1 from the lens. Thus:

From similar triangles, I/1=H/r. In other words, the length of the image is inversely proportional to the first power of the distance.

Note also that the symmetry is cylindrical and not spherical. If the line of sight is not perpendicular to the antenna, the image will be smaller, vanishing completely when the line of sight passes through the antenna.

Returning from Feynman’s Lectures to Kepler’s Astronomia nova, we can resolve the inverse-square problem. Instead of an antenna, we have the Sun’s circular fibers. Instead of the retina, we have the image (species) of those fibers moving the planet. The whirlpool force results from the sum total of those fiber images. Each image’s contribution diminishes inversely with distance, so the sum does too. For all the difference in the physics, the basic geometry remains the same for Feynman’s dipole and Kepler’s whirlpool.

Stephenson (p.75) puts it this way:

The composite motion was directed along a circle parallel to the sun’s equator (the resultant of the images of all the filaments) and it was therefore weakened—at any latitude—only as this circle expanded, in the simple proportion of distance.

This one sentence states the matter more clearly than Kepler does in Chapter 36. Nonetheless, the ingredients are all there, just scattered through the chapter.

One more thing: is the the whirlpool force is confined to the ecliptic? Some authors say that Kepler claimed this. Stephenson shows, with quotes, that this is not true. But something like it holds effectively. A planet above the solar pole would see the fibers moving in all directions, and the net effect would be complete cancellation. At intermediate latitudes, you get partial cancellation. Only at the ecliptic does the planet get the full effect. Figure 14 (p.75) of Stephenson illustrates this:

A final footnote. In 1645 Ismael Boulliau published his Astronomia philolaica. In it he noted, as Kepler had, that the whirlpool force should exhibit an inverse square dependence on distance. But he ignored Kepler’s solution to this problem. On the strength of this, the noted historian Thony Christie credits him with being The man who inverted and squared gravity. I am far less inclined to award him this accolade.

Prev TOC Next

Leave a comment

Filed under Astronomy, History

Set Theory Jottings 17. Ordinals Revisited

Prev TOC Next

In post 13 I sketched proof by transfinite induction, and definition by transfinite recursion. Let’s take a closer look at that. By definition, < means ∈ for ordinals—treat that also as vernacular. Sometimes one symbol or the other makes the meaning clearer. We must keep in mind though that we have not shown that < is a well-ordering on Ω, or even a simple ordering.

Let Ω(x) be the formula for “x is an ordinal”, thus:

(∀u,v)(uvxux)
∧(∀u,vx)(u<v v<u u=v)
∧(∀ux)(¬u<u)
∧(∀u,vx)(¬(u<v v<u))
∧(∀u,v,wx)(u<v<w u<w)
∧(∀yx)(y≠∅ → ∃u(uy ∧ ∀v(¬(v<u vy))))

The first line says x is transitive, the next four lines say that < simply-orders x, and the last line says that < is a well-ordering on x. (Quite a bit of vernacular, but by now you should know how to deal with “uvx’’, for example.)

The clauses are not as economical as possible: for example, Foundation implies the third and last line, and a transitive irreflexive relation is automatically asymmetric, so even without Foundation we don’t need the fourth line. For the moment, we proceed without using Foundation.

Within an ordinal, ∈ well-orders; we want to bootstrap this to all of Ω. First an easy observation. Given a formula φ(x), if any β∈α satisfy φ, then there is a least such β. We need only invoke Separation to provide us with the set of all β∈α satisfying φ, and then appeal to the last clause in the definition of an ordinal.

How about smallest elements over all of Ω? That is,

∃αφ(α)→∃α(φ(α) ∧ ∀β(¬(β<α∧φ(β))))

(By vernacular convention, α and β are ordinals. Formally one begins “∃x(Ω(x)∧φ(x))…’’.)

Suppose ∃γφ(γ). If none of γ’s predecessors satisfy φ, then we’re done. Otherwise suppose φ(α) with α∈γ. Because γ is well-ordered, there is is an element of γ—let’s still call it α—such that α is the smallest element of the set {β∈γ:φ(β)}. We need to know that there are no ordinals, period, that satisfy φ and precede α. But if β was such an ordinal, we’d have β∈α∈γ. Since γ is transitive, we’d have β∈γ, and we’ve already ruled that out.

The contrapositive is transfinite induction. Letting ψ be ¬φ,

∀α((∀β<α)ψ(β)→ψ(α)) → ∀αψ(α)

With transfinite induction, we can show that Ω is simply-ordered. Only trichotomy presents any wrinkles. We induct on the property, “α is comparable with every ordinal.” Suppose this is true for all β<α. Let γ be an arbitrary ordinal. If γ<β or γ=β for some β∈α, then γ<α, since α is transitive (and < means ∈). Suppose then that β∈γ for all β∈α. That is, α⊆γ. If α=γ, we’re done. Suppose α⊂γ. Let δ be the smallest element of γ∖α. We will now show that α=δ, proving α∈γ, i.e., α<γ.

The key point: both α and δ are subsets of γ, and within γ, ∈ is a simple ordering. Extensionality kicks in: we have to prove x∈α↔x∈δ, or turned around, x∉δ↔x∉α. Now, α is an initial segment of γ because α is transitive. Therefore δ>x for all x∈α because δ∉α. So x∉δ implies x≥δ implies x∉α. On the other hand, x∉α implies x>y for all y∈α, again because α is an initial segment. Since δ is the least element of γ∖α, it follows that x≥δ, i.e., ¬x<δ, i.e., x∉δ. We’re done.

So < is a well-ordering on Ω, and we can trust our intuition about it (mostly).

Next consider transfinite recursion. The discussion in post 13 applies, with a few remarks. As before, we have a “rule” ρ(α,f,s) that assigns a value s to an ordinal α given as input a function f:α→···. Our rule is a formula, and may even include parameters. Thus: ρ(α,f,s,).

Suppose that for the parameter values = this formula is “function-like”; then we’re in business. One proves in ZF that the following condition

∀α,ρ(α,f,s,)
∧ ∀α,f,s,t (ρ(α,f,s,)∧ρ(α,f,t,)→s=t)

implies, for all α, the existence of a unique function gα:α+1→··· “obeying the recursion”. The proof uses transfinite induction, of course; also the set-existence axioms, especially Replacement. I won’t comb out all the hair, but essentially: if we have an “obedient” gβ for all β<α, Replacement plus Union plus the uniqueness allow you to combine all these gβ’s into one fα with domain α. Then you use ρ to assign a value to α, extending fα to gα.

The conclusion, even leavened with vernacular, is more than I’m prepared to write. But you get a theorem—technically, a different theorem for each formula ρ. (So it’s a theorem schema.) It has the form ∀(AB), where A is the above condition and B asserts the existence of g.

We don’t need to prove that the condition holds; rather, the theorem says that whenever it holds (perhaps only for some values of the parameters), then the conclusion holds. The proof does not require Choice: the ordinals are already well-ordered. As before, we also have a formula G(α,,s); if the condition holds for some c, then G(α,,s) assigns a unique s to every ordinal—a “function-like” thingy with “domain” Ω.

As a bonus, we can rehabilitate Cantor’s “proof” of the Well-Ordering Theorem. Let M be a set, and let c be a choice function for M. Our rule ρ says that the s assigned to α is c(M∖{f(β):β<α}), provided that the difference set is not empty. Otherwise we assign M to α. (We choose M simply because it is not an element of M.) We define G by transfinite recursion; informally, we can write Ĝ(α) for the value assigned to α. (I have left the parameters M and implicit.) If we never have Ĝ(α)=M, that means that G defines an injection of Ω into M, and inverting this gives us a map from M onto Ω. Replacement then says that Ω is a set. The Largest Ordinal paradox however proves this is not so. (The formal statement: ¬∃S x(Ω(x)↔xS).) Therefore for some α, gα maps α+1 1–1 onto M, and we’ve well-ordered M.

I’ve made a full meal of this argument. Typically, one would condense this discussion into a couple of sentences: “Given a choice function c, use it to define (by transfinite recursion) an injective function from an initial segment of Ω into M. As this function cannot be defined for all ordinals, it must map the ordinals less than some α onto M, and so M is well-orderable.”

While this proof resembles Cantor’s demonstration, it borrows essential features from Zermelo’s. Gone are the successive choices with their psychological aspect, replaced with a choice function. And the “union of partial well-orderings” in Zermelo’s 1904 proof lies buried inside the transfinite recursion.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

Set Theory Jottings 16. Axioms of ZFC

Prev TOC Next

The Axioms of ZFC

The language of ZF (ℒ(ZF)) consists of basic first-order syntax, with a single binary predicate symbol ∈. Here is a list of the axioms, with a tag-line (an imprecise description) for each.

Extensionality:
Every set is determined by its elements.
Foundation:
All sets are built up in levels by starting with the empty set.
Pairs:
There is a set whose elements are any two given sets. (Or any one set.)
Union:
For every set, there is a set consisting of the union of its elements.
Power Set:
For every set, there is a set consisting of all its subsets.
Infinity:
There is an infinite set.
Replacement:
Given a set and a rule for replacing its elements, there is a set consisting of all these replacements.
Choice:
Given a set of pairwise disjoint nonempty sets, there is a set containing exactly one element from each of them.

Two more axioms:

Null Set:
There is a set with no elements.
Separation:
Given a set and a property, there is a set consisting of all the elements satisfying the property.

These are redundant. The version of Replacement we adopt implies Separation; Null Set follows from Pairs plus Separation. But they are historically important, and help understand ZFC.

Now for the formal, or at least more formal, statements. (I discussed “vernacular” in post 15. Recall that stands for stands for a list (t1,…,tn).)

Extensionality:

z(zxzy)→x=y

Using the defined symbol ⊆ (defined as ∀z(zxzy)), we could also write this as “xyyxx=y’’.

Foundation:

x(∃yx)(yx=∅)

Of course, ∩ and ∅ are defined terms, and (∃yx) is vernacular. Here is Foundation without using them:

xy(yx∧∀u(¬(uxuy)))

People sometimes call this Regularity. Foundation says there is an ∈-minimal element of x, that is, a yx none of whose elements belong to x. If that were false, we’d have an infinite descending chain xy1y2∋…. (Possibly it could end in a loop, i.e., ym=yn for some m<n.) But such an x is not “built up from ∅’’.

Foundation has an equivalent statement: There are no infinite descending ∈-chains. To prove this equivalence, you need Choice. The version above (due to von Neumann) avoids this issue, and is simpler to express.

Pairs:

xyz(z={x,y})

or without the defined term {x,y}:

xyzu(uz ↔(u=xu=y))

Since x=y is not excluded, this also covers singletons.

Null Set:

xyyx)

Union:

xu(u=⋃yxy)

or without the defined term ⋃

xuz(zu ↔∃y(zyyx))

In other words, the elements of the union are the elements of the elements of the original set.

Power Set:

xps(spsx)

Or in other words, p={s : sx}, just the kind of set-builder definition that was uncritically accepted before the paradoxes. The power set of x is denoted 𝒫(x). Without the defined term sx:

xps(sp↔∀y(ysyx))

Infinity:

w(∅∈w∧∀x(xwx∪{x}∈w))

We mentioned earlier the von Neumann representation for natural numbers: 0={∅}, 1={0}, 2={0,1}, etc. This axiom insures the existence of a set containing all the von Neumann natural numbers. I leave the vernacular-free expression of Infinity as an exercise for the fastidious.

Separation (sometimes called Subset) isn’t a single axiom, but an axiom schema. We have an instance of the schema for each formula φ(y,). For any set x and any choice of values  for , the instance says there is a set of all elements y of x satisfying φ(y,).

Separation:

 ∀xsy(ysyx∧φ(y,))

Separation justifies the use of the set-builder notation s={yx:φ(y,)}.

Replacement (sometimes called Substitution) is also an axiom schema, with an instance for each formula φ(y,z,). Suppose for some particular choice =, the formula φ(y,z,) defines z as a partial function φ° of y: Given y, there is at most one z making φ(y,z,) true. Then the range of φ° on any set x is a set.
Replacement:

x(∀y≤1z φ(y,z,) →∃w(w={φ°(y):yx}))

But the set-builder term is not justifiable by Separation. Eliminating it gives:

x[∀y≤1zφ(y,z,)
→∃wz(zw↔(∃yx)φ(y,z,))]

And we can eliminate ∃≤1z and ∃yx. We replace ∃≤1z with this:

uv(φ(y,u,)∧φ(y,v,)→u=v)

and (∃yx)φ(y,z,) with

y(yx ∧φ(y,z,))

This version of Replacement demands only that φ° be a partial function, not necessarily total. You can derive Separation from it. If φ(y,) is the separating property, then let ψ(y,z,) (for a particular =) define the partial function ψ°(y)=y when φ(y,) is true, and undefined when φ(y,) is false. Then applying ψ° to x gives the subset we want. Some authors use the “total” version of Replacement, and include Separation in their list of axioms.

Choice (AC) was the scene for major philosophical combat early in the 20th century, as we’ve seen. It’s easier to express than Replacement.

Choice:

x[(∀yx)y≠∅∧ (∀yx)(∀zx)(yzy∩z=∅)
→∃c(∀y∈x)#(c∩y)=1]

Eliminating some vernacular should be routine by now: y≠∅, yz=∅. As for #(cy)=1, this becomes

u(ucy∧∀v(vcyv=u))

where of course ucy is formally ucuy, likewise for v.

Another version of Choice does not require disjointness. It’s easy to express formally once you have the machinery of ordered pairs, and thus functions as sets of ordered pairs. It says: for every set x whose elements are all nonempty sets, there is a so-called choice function c such that c(y)∈y for all yx.

Unlike every other “set existence” axiom of ZFC, we can’t define c with set-builder notation, or indeed give any other explicit description of the choice function. We’ve seen how this lead to much distrust of AC. In 1938, Gödel showed if ZF is consistent, then ZFC is too. That calmed things down somewhat. I’ve cited Moore’s book before on the history of the Axiom of Choice.

Prev TOC Next

2 Comments

Filed under History, Set Theory

Set Theory Jottings 15. From Zermelo to ZFC: Formal Logic

Prev TOC Next

The Role of Formal Logic

In Zermelo’s original system, the Separation Axiom refers to a “definite property”. The Replacement Axiom refers to a “rule”. In 1922, Skolem proposed interpreting “definite” as “first-order definable”. So properties and rules are just formulas in the language of ZF, ℒ(ZF). With this clarification, ZFC assumes its modern form as a first-order theory.

ZF boasts a spartan vocabulary: just ∈ plus the basic symbols of first-order logic. Writing things out formally, with no abbreviations or short-cuts, rapidly snows us under unreadable expressions. We handle this (like everyone else) with semi-formal expressions; I like to call these “vernacular”. Copious hand-waving suggests how a masochist could write these out in the formal language ℒ(ZF).

Example: here’s how we say p=〈x,y〉={{x},{x,y}}, partially expanded:

a,b(p={a,b}∧a={x}∧b={x,y})

“∃a,b’’ is vernacular for “∃ab’’. Next we expand p={a,b} into

u(up↔(u=au=b))

and likewise for a={x} and b={x,y}.

A relation r is a set of ordered pairs, so more formally

(∀pr)∃a,b(p=(a,b))

which still has some vernacular. (∀pr)… more formally is ∀p(pr→…). Likewise, (∃xz)… in formal dress is ∃x(xz∧…).

To say f is a function with domain D, we start with the vernacular

f is a relation
∧ (∀(a,b)∈f)aD
∧ (∀aD) ∃!b((a,b)∈f)

“∀(a,b)∈f’’ expands to “(∀pf)∀a,b(p=(a,b)→…)’’. ∃!bφ(b), “exists a unique b satisfying φ’’, expands to ∃bφ(b)∧∀b,c(φ(b)∧φ(c)→b=c).

The vernacular f(x)=y becomes 〈x,y〉∈f. These should be enough to give you the flavor.

Often we have lists of variables, like x1,…xn. We write to reduce clutter; ∀ and ∃ have the obvious meanings.

When we get to the formal versions of Separation and Replacement, we’ll see how “property” and “rule” are made precise.

Coda

We’ve seen the informal use of “class” in ZF. This proved so convenient that NBG, a theory developed in succession by von Neumann, Bernays, and Gödel, gave a home to it. It turns out that NBG is a so-called conservative extension of ZFC: any formula of NBG that “talks only about sets” is provable in NBG iff it is provable in ZF.

In NBG, we still have only the symbol ∈ plus the basic logical symbols. However, certain members of the “universe” have the left-hand side of ∈ barred to them. If xy, then we say x is a set; anything that’s not a set is a proper class. So proper classes can have sets as elements, but cannot themselves be elements. The term class encompasses both sets and proper classes; in NBG, the variables range over classes.

Here’s how NBG skirts around Russell’s paradox. We can still write the formal expression R={x:xx}. This defines the class R, which is the class of all sets that are not elements of themselves. Is RR? No, because if it were, it would have to be a set that was not an element of itself. OK, if RR, doesn’t that mean that R satisfies the condition to be an element of R? No, not if R is a proper class—R contains only sets, no proper classes allowed!

Zermelo’s original system did not include Replacement and Foundation, although it did include Choice. Somewhat ahistorically, people use Z to refer to ZF minus Replacement, but including Separation. ZC is Z plus Choice.

ZF is a theory of “pure sets”. ZFA is “ZF with atoms”. An atom is an object that is not the null set but has no elements. The axioms of ZF can be modified to allow for this. Before the invention of forcing, Mostowski used ZFA to investigate theories without Choice.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

Set Theory Jottings 14. From Zermelo to ZFC: Replacement and Foundation

Prev TOC Next

Replacement and Foundation

Thirteen years after Zermelo published his axioms, Fraenkel pointed out that they weren’t quite strong enough. For example, you couldn’t prove the existence of the set ⋃n∈ω𝒫n(0).

Fraenkel suggest the Axiom of Replacement (aka Substitution): If we have a way of “associating” with every element m of a set M a set Xm, then {Xm:mM} is a set. In other words, if you go through a set, replacing each element with another set, you get a set. The intuition: {Xm:mM} is “no bigger” than M, and so ought to be a set also.

Combined with Axiom of Unions, this is the gateway to big sets. Replacement gives us {𝒫n(0):n∈ω}, and then Union gives us Fraenkel’s set.

Zermelo agreed with Fraenkel for this addition to the axioms. Skolem also pointed out the need.

Using transfinite recursion, we now define the Cumulative Hierarchy:

V0 = ∅
Vα+1 = Vα∪𝒫(Vα)
Vλ = ⋃α<λ Vα

Fraenkel’s set is Vω.

(Two variations: it turns out that “Vα∪’’ isn’t neeeded in the second line, you get the same sets with “Vα+1=𝒫(Vα)’’. Some authors modify the definition so that their Vλ is the same as our Vλ+1.)

Power Set plus Replacement plus Union shows that all the Vα are sets. The rank of a set is where it first appears in the cumulative hierarchy, or more precisely: rk(X) is the least α with XVα+1. The “+1’’ is so that α has rank α.

How do we know that every set appears somewhere in the cumulative hierarchy? That brings us to our second additional axiom: Foundation.

Skolem noticed the need to exclude “pathological” sets, such as A={A} or infinite descending chains A1A2∋…. Such sets will never appear in any of the Vα’s. The Foundation Axiom does the trick.

von Neumann gave the most elegant version of Foundation: Every nonempty set contains an element disjoint from it. That is, for any set A, there is an aA with aA=∅. This a is minimal in the ∈ ordering of the elements of M. That’s another way to phrase it: Every nonempty set A has a minimal element in its ∈-ordering.

Foundation is equivalent to every set having a rank. Writing this the classy way, V=⋃α∈ΩVα. The proof of this takes a few steps; I’ll save it for the end of this post.

For so-called “ordinary mathematics”, you usually don’t need to go any higher than rank ω2. Example: the complex Hilbert space L2(ℝ3), beloved by analysts and physicists alike. We build up to this by contructing all the natural numbers, then the integers, rational numbers, and reals, ordered pairs and ordered triples of reals, then functions from the ordered triples to ordered pairs, and finally equivalence classes of these functions. (Recall that two L2 functions are identified if they agree except for a set of measure zero; hence, the actual elements of L2(ℝ3) are equivalence classes.) We have all the natural numbers by the time we get to Vω. If x,yVα, then the ordered pair 〈x,y〉∈Vα+2. So if AVα, then any subset of A×A is in Vα+3. Integers are often defined as sets of ordered pairs of natural numbers, rational numbers as sets of ordered pairs of integers, and real numbers as sets of rational numbers (Dedekind cuts). If I counted right, that puts all the elements of ℝ in Vω+7 and any subset of ℝ in Vω+8. The Hilbert space in question should belong to Vω+15, again if I haven’t miscounted.

Although sets of high rank aren’t “needed” by most mathematicians, it would be quite strange to impose a “rank ceiling” limitation on the power set axiom. Just maybe if Vω+1 were adequate—but it isn’t.

Without Choice, we cannot use von Neumann’s definition of card(M) as the least ordinal equivalent to M. An alternative is Scott’s trick: card(M) is the set of all sets of least rank equivalent to M. In other words, S∈card(M) iff SM and for all TM, rk(T)≥rk(S). This lacks the simplicity of von Neumann’s definition, but it’s the best we’ve got. Scott’s trick can be used for other purposes, e.g., to define the order type of a simply-ordered set.

Okay, let’s look at the equivalence of Foundation with V=⋃α∈ΩVα. We’ll need some machinery: transitive closures, ∈-induction, and the suprenum of a set of ordinals.

Recall that a set A is transitive if baA implies bA, i.e., aA implies aA. Every set A is contained in a transitive set called its transitive closure, tc(A). First we set f(A)=⋃aAa. Then set tc(A)=⋃n∈ωfn(A), where f0(A)=A. The transitive closure is the smallest transitive set containing A: A⊆tc(A), and if AB with B transitive, then tc(A)⊆B.

Suppose φ is a property where the following implication holds: If every element of A satisfies φ, then so does A. Then ∈-induction says that φ holds for all sets. Foundation justifies this. Suppose on the contrary that A is a non-φ set. Then some element of A is non-φ. Consider the subset S of tc(A) consisting of all non-φ elements. This is nonempty by hypothesis, so let sS be ∈-minimal in S. All the elements of s belong to tc(A) because tc(A) is transitive, so therefore all of s’s elements must satisfy φ (otherwise s wouldn’t be ∈-minimal). But then s must also satisfy φ, contradiction.

Lemma: If S is a set of ordinals, then ⋃α∈Sα is an ordinal. This is a set by the Union Axiom. That S is simply-ordered under ∈ is easy as pie. Foundation makes the proof of the well-ordering property trivial, though it’s not hard to prove without it. For ordinals, α⊆β is equivalent to α≤β (we’ll prove this in a later post), so S is an upper bound to all its elements. As usual, we call the least uppper bound to S its suprenum, denoted sup S.

Now suppose X is a set, and we associate an ordinal αx with each xX. Replacement says that {αx:xX} is a set; we denote its suprenum by supxXαx.

Time to prove that every set has a rank. We use ∈-induction. Suppose all the elements of A have ranks. Set α=supaA(rk(a)+1). As aA belongs to Vrk(a)+1 and VβVα whenever β≤α, all the elements of A belong to Vα. Therefore A belongs to Vα+1.

The reverse implication is straightforward, given this fact about rank: ab implies rk(a)<rk(b). (Proof: transfinite induction.) It follows that an element of A of lowest rank is ∈-minimal in A.

Prev TOC Next

Leave a comment

Filed under History, Set Theory

From Kepler to Ptolemy 20

Prev TOC Next

The Whirlpool Force: Early Thoughts

In the Astronomia nova, Kepler introduced the whirlpool force this way:

… since there are (of course) no solid orbs, as Brahe has demonstrated from the paths of comets, the body of the sun is the source of the power that drives all the planets around. Moreover, I have specified the manner [in which this occurs] as follows: that the sun, although it stays in one place, rotates as if on a lathe, and out of itself sends into the space of the world an immaterial species of its body, analogous to the immaterial species of its light. This species itself, as a consequence of the rotation of the solar body, also rotates like a very rapid whirlpool throughout the whole breadth of the world, and carries the bodies of the planets along with itself in a gyre, its grasp stronger or weaker according to the greater density or rarity it acquires through the law governing its diffusion.

Voelkel calls this the motive force hypothesis. As we’ve seen, Kepler devised it quite early, inspired by the steady decrease in planetary speeds with increasing orbital radii.

In the Mysterium cosmographicum, in a chapter titled “Why a planet moves uniformly about the center of the equant”, he adduced a new argument: the changing speed in a single orbit. The speed at perihelion is faster than at aphelion; in fact, the speed ratio is the inverse of the distance ratio:

vperi/vap = rap/rperi

We saw in post 4 that all three laws (the equant, the inverse distance, and the area law) give this relation.

So we have two similar phenomena. Jupiter (for example) moves slower than Mars, and Mars at aphelion moves slower than Mars at perihelion. The distance from the Sun seemed to be the common factor.

Kepler delighted at finding a physical cause to replace the equant. He was out of step with most astronomers of the time. True, they despised the equant. But Copernicus had replaced its non-uniform motion with the uniform motion of a small epicycle (often called an epicyclet). This they admired, while rejecting heliocentrism.

Kepler’s old teacher Maestlin discovered a geometrical demonstration for the near-equivalence of the Copernican epicyclet with Ptolemy’s equant; he communicated this to Kepler in a letter in 1595. (See Voelkel (p.19) or Evans (p.1013) for the proof.) The next year, in the Mysterium cosmographicum, Kepler wrote:

The path of the planet is eccentric, and it is slower when it is further out, and swifter when it is further in. For it was to explain this that Copernicus postulated epicycles, Ptolemy equants… Therefore at the middle part of the eccentric path … the planet will be slower, because it moves further away from the Sun and is moved by a weaker power; and in the remaining part it will be faster, because it is closer to the Sun and subject to a stronger power…

Nowadays we know that these two phenomena stem from different physics. Kepler’s second law reflects the conservation of angular momentum; it would hold with any central force. Kepler’s third law comes from the inverse square law for gravity plus the formula for centripetal force: F ∝ 1/r2 and F v2/r. For orbits with small eccentricities, we have approximately

v ∝ 1/r Kepler’s 2nd
v ∝ 1/√r Kepler’s 3rd

From the start, Kepler favored an inverse distance law for the whirlpool force. In a letter to Maestlin in 1595, he suggested a way to derive the orbital radii from the much more accurately known periods. (He needed the radii to test his polyhedral hypothesis.) He noted that two factors contributed to the longer periods of the more distant planets. First, they have to traverse a longer orbit. Second, they do so at a slower speed. Now, the motive force originates in the Sun and spreads out evenly over the orbits, so it should diminish in inverse proportion. Here is the passage, quoted in Voelkel (p.39). (Kepler uses ‘motion’ to mean motive force (proportional to speed), ‘orbs’ to mean orbits, and ‘circles’ to mean circumferences.)

There is, as I said, a moving spirit [motrix anima] in the Sun. If equal motion and the same strength came from the Sun into all orbs, one would still circulate more slowly than another on account of the inequality of the orbs. The periodic times would be as the circles. For quantity measures motion. However, circles [go] as the radius, namely as the distance. Thus from the certainly-known mean motions we could easily construct also the mean distances. But another cause enters which makes the more remote slower. Let us take the experience [experimentum] of light. For as both light and motion are connected in their origin so also [are they connected] in their actions, and perhaps light itself is the vehicle of motion. Therefore, in a small orb and also in a small circle near the Sun, there is as much light as there is in a large and more remote sphere. Therefore the light is thinner in the large, and denser and stronger in the narrow. And this strength is in inverse proportion to the circles, or the distances.

Note the last sentence. In the Mysterium cosmographicum Kepler repeated the argument:

Let us suppose, then, as is highly probable, that motion is dispensed by the Sun in the same proportion as light. Now the ratio in which light spreading out from a center is weakened is stated by the opticians. For the amount of light in a small circle is the same as the amount of light or of the solar rays in the great one. Hence, as it is more concentrated in the small circle, and more thinly spread in the great one, the measure of this thinning out must be sought in the actual ratio of the circles, both for light and for the moving power. Therefore in proportion as Venus is wider than Mercury, so Mercury’s motion is stronger, or swifter, or brisker, or more vigorous than that of Venus, or whatever word is chosen to express the fact. But in proportion as one orbit is wider than another, it also requires more time to go round it, although the force of the motion is equal in both cases. Hence it follows that one excess in the distance of a planet from the Sun acts twice over in increasing the period; and conversely, the increase in the period is double the difference in the distances.

Perhaps you already see two problems with this. First, the analogy with light indicates an inverse square dependence, not inverse linear. Second, neither of these are the right law. Let T be the period and r the orbital radius. An inverse linear dependence for the whirlpool force dictates that T is proportional to r2; an inverse square, to r3. But T is proportional to r3/2, as Kepler would eventually discover.

Kepler convinced himself that the inverse distance law fit the available data. Note the end of the passage above: “the increase in the period is double the difference in the distances”. For example (with made-up numbers): Say Planets 1 and 2 have T1=100, T2=400, and r1=1000. T2T1=300, or 3 times T1. The increase in the radii, i.e., r2r1, should then be 1.5 times r1, or 1500. So r2 should be 2500. Compare this with the correct result of assuming T proportional to r2: r2=2000. (Kepler carries out a similar computation with Mercury and Venus.)

Kepler applied this procedure to adjacent pairs of planets, using the Copernican periods to calculate the ratios of the distances. The results agreed (sort of1) with the Copernican ratios for the distances. Still, the procedure makes no sense. In the second edition of the Mysterium cosmographicum (twenty five years later) Kepler added this footnote:

Here the mistake begins… Now what I ought to have inferred … is that the ratio of the periods is the square of the ratio of the distances, not because I hold it to be true, for it is only the 3/2th power, as we shall hear, but because it was the legitimate conclusion from this line of argument. You see how at this point the arithmetic mean was taken, by halving the difference, when the geometrical mean should have been taken.

Next: inverse square or inverse linear? Voelkel remarks

whereas light propagates spherically, Kepler confined his attention to the plane of the orbit… Only much later did he reconsider the spherical propagation of the motive virtue and address the problem of whether the strength ought, as light, to decrease as the square of the distance. [p.40]

And in a footnote Voelkel adds, “In his first thoughts about the propagation of motor virtue, he appears to have thought only about the plane of the orbit.”

Kepler’s Pars Optica (1604) clearly stated the inverse square law for light. By the time of the Astronomia nova, Kepler realized he had a problem. In his letter to Maestlin, he had said “perhaps light itself is the vehicle of motion”. In Chapter 33 of the Astronomia nova, he asserts

…although this light of the sun cannot be the moving power itself, I leave it to others to see whether light may perhaps be so constituted as to be, as it were, a kind of instrument or vehicle, of which the moving power makes use.

This seems gainsaid by the following: first, light is hindered by the opaque, and therefore if the moving power had light as a vehicle, darkness would result in the movable bodies being at rest; again, light spreads spherically in straight lines, while the moving power, though spreading in straight lines, does so circularly; that is, it is exerted in but one region of the world, from east to west, and not the opposite, not at the poles, and so on. But we shall be able to reply plausibly to these objections in the chapters immediately following.

In Chapter 36 he amplifies the second objection before resolving it:

This objection wearied me for a long time without offering any prospect of a solution.

It was demonstrated in Chapter 32 that the intension and remission of a planet’s motion [i.e., the time taken to traverse a given length] is in simple proportion to the distances. It appears, however, that the power emanating from the Sun should be intensified and remitted in the duplicate or triplicate ratio of the distances or lines of efflux. [I.e., as the square or cube of the distances.]

As this post is long enough, I’ll resume the story next time.

[1] But as Stephenson notes, the Copernican distances were not that accurate; Kepler’s third law would not have fit very well either.

Prev TOC Next

2 Comments

Filed under Astronomy, History

From Kepler to Ptolemy 19

Prev TOC Next

Kepler’s Physics

Let me repeat the motto, already implicit in the Mysterium cosmographicum, but come into full bloom in the Astronomia nova:

Forces emanating from the Sun guide or drive the planets in their orbits.

A word about the word ‘force’. F=ma this is not. Kepler’s forces are Aristotelian, satifying Fv (force is proportional to speed). Not only that: we must be careful (as Stephenson writes (p.187)) not to “impute to Kepler a conceptual exactness which his physics—in contrast with his astronomy—did not possess …The word force, vis, did not yet have the clarity of an accepted technical term.”1

I wrote “forces” above. Kepler ends up with three forces in the Astronomia nova: (a) Gravity, which plays no role in determining the orbits. (b) A whirlpool-like force that sweeps the planets around the Sun. Kepler calls this the Sun’s species; I will call it the whirlpool force. (c) A force responsible for the elliptical shape of the orbit. Kepler calls this the magnetic virtue; I will call it the libration force, for reasons that will become apparent2.

Souls and Minds

Ptolemy in the Planetary Hypotheses endowed the planets with souls, the source of their motion. This seems odd to us. But remember that for the ancient Greeks, the planets were celestial beings, perhaps even gods. On Earth, what moves under its own power? Animals and people, of course. Add to this the authority of Aristotle. Here’s how Kepler recaps Aristotle’s Metaphysics (Book 12, part 8) in the Astronomia nova, Ch.2:

…having thus accumulated 49 orbs in all (or 53 or 55, following Callipus), he attributed to each its own mover … In this way, he introduced us to separate minds which, it turned out, were gods, as the perpetual administrators of the heavens’ motions. They also bestowed a moving soul [anima motrix], more closely attached to the orbs and giving them form, so that the mind would only have to give assistance … They therefore transferred this potentiality for creating motion to a soul …

Now this coupling of mind and soul is indeed quite in agreement with the detailed considerations of the astronomers, even though the philosophers’ mode of argument is chiefly metaphysical. For it is the same in humans: the moving faculty is one thing; that which makes use of the moving faculty according to the indications of the senses—the Will—is another … Similarly, if we should propose these Aristotelian orbs as objects of contemplation, two things will present themselves to us: (1) the motive force, from whose activity and constant strength the time of revolution arises; (2) the direction in which it acts. The former is more correctly ascribed to the animate faculty, and the latter to its intelligent or remembering nature.

More succinctly, Stephenson writes:

In these early chapters of the Astronomia nova, souls furnished the motion and minds the guidance.

Stephenson (p.30)

From the first, Kepler preferred a different explanation for the planets’ motion. The decisive clue: the farther a planet is from the Sun, the slower it moves. In Kepler’s earliest extant astronomical writing (a disputation he wrote while still at Tübingen, six years before the Mysterium cosmographicum), he proposed that the motive cause resided in the Sun, “whence as if from the center and the certain heart of the world it would extend itself through its effect most equally to all surrounding orbs” (quoted in Voelkel, p.29). Five years later, in a letter to his old teacher Maestlin, Kepler repeated this idea:

…there is in the Sun a moving spirit [anima movens] and an infinite motion … that strength of motion (as light in optics), which is further from the source is that much weaker.

—quoted in Voelkel (p.36)

The next year, in the Mysterium cosmographicum, he wrote

… one of two conclusions must be reached: either the moving souls are weaker the further they are from the Sun; or, there is a single moving soul in the center of all the spheres, that is, in the Sun, and it impels each body more strongly in proportion to how near it is. In the more distant ones on account of their remoteness and the weakening of its power, it becomes faint, so to speak.

—quoted in Voelkel (p.54)

He immediately opts for the second conclusion. Notice that Kepler has not dispensed with souls, just relocated and united them all in the Sun. This line of thought led to the whirlpool force, culminating in the Second Law—as we will see.

Twenty-five years later Kepler published a second edition of the Mysterium cosmographicum with new notes. Here is the note for the phrase “there is a single moving soul”:

If for the word “soul” you substitute the word “force”, you have the very same principle on which the Celestial Physics is established in the Commentaries on Mars [i.e., Astronomia nova], and elaborated in Book IV of the Epitome of Astronomy. For once I believed that the cause which moves the planets was precisely a soul, as I was of course imbued with the doctrines of J.C. Scaliger on moving intelligences. But when I pondered that this moving cause grows weaker with distance, and that the Sun’s light also grows thinner with distance from the Sun, from that I concluded, that this force is something corporeal, that is, an emanation which a body emits, but an immaterial one.

The planets do not revolve in circular orbits concentric with the Sun. Here Kepler thought that planetary minds might come into play. Even with an eccentric circular orbit, the direction is sometimes partly towards the Sun, sometime away from it. As Stephenson puts it,

[W]hen Kepler spoke of a mind in the heavens he was considering the problem of controlling motion; more specifically, the problem of obtaining sufficient information to constrain motion into the regular path that was observed.

Stephenson (p.30)

What computations would the mind have to make, what data would it need, and how would it obtain it? Even for a circular orbit, this perplexed him. The notion of the libration force, along with the possibility of an oval orbit, gradually took shape in Kepler’s thought. Although references to souls and minds occur throughout the Astronomia nova, physics began to displace planetary “psychology”. Stephenson again:

Kepler thus proposed two quite different theories in Chapter 57 to account for the planet’s [oval orbit]. One relied upon natural forces, magnetic or quasi-magnetic, while the other supposed some kind of primitive planetary mind.

Stephenson (p.120)

By this time, in fact, Kepler was decidedly less enthusiastic than he had once been about planetary minds. Years of physical speculation had brought an increasingly confident belief that natural means alone could explain the patterns of motion for which he had, hesitantly, once invoked mental control. … The change in Kepler’s attitude was a gradual one, occurring over a period of years and never really completed. Celestial minds remained a part of Kepler’s universe; only from his astronomy can one see them discreetly withdrawing.

Stephenson (p.132)

Souls, too, did not entirely disappear:

Kepler believed that a motive soul was necessary to account for the persistence of the solar rotation… He offered several reasons why it should be thought plausible that there was a soul in the body of the sun … Generally he thought that a soul was necessary to sustain the rotation, but that no mind or intelligence was needed, because the motion was constant in direction and speed.

Stephenson (p.141)

[1] Kepler also employed the Latin virtus and species, sometimes translated ‘power’ and ‘image’, not with a clear distinction from vis. I will mostly steer clear of these subtleties. (For his translation of the Astronomia nova, Donahue (pp.107–108) wrote about species: “The translator, in the end, has thrown up his hands, admitted defeat, and has declined to translate it at all.”)

[2] In a previous version of the post, I called this the quasi-magnetic force, but I now feel that libration force is better. Stephenson calls it the libratory force.

Prev TOC Next

Leave a comment

Filed under Astronomy, History