The Albert Small report on methods of breaking Tunny shows clearly how people were using what we would now call principles of communication theory, for which Gallager's highly engineering-math-oriented course-notes 'book' is an excellent companion. Gallager writes today as though he were a 1930s mathematician addressing the advanced research topics of the day. Though Small is not of the same intellectual class as Turing and Newman (etc.) on whose methods he was reporting, he had that rather American knack of writing clearly. In particular, he focuses our attention on the core nature of xor: that one is counting when pairs of bits are the same or different.
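Small's core point can be sketched in a few lines. This is a minimal illustration (names are mine, not GCCS's): xor two bit streams, and the result marks exactly where the pairs differ, so counting reduces to summing.

```python
# Sketch: XOR of two bit streams is 1 exactly where the pair differs,
# so summing the xor stream counts "different" pairs and the remainder
# counts "same" pairs (dots, in Tunny speak).
def same_different(a, b):
    """Count matching and differing positions of two bit lists."""
    xor = [x ^ y for x, y in zip(a, b)]
    different = sum(xor)           # xor = 1 where the pair differs
    same = len(xor) - different    # xor = 0 where the pair agrees
    return same, different

same, diff = same_different([0, 1, 1, 0, 1], [0, 1, 0, 0, 0])
# same = 3, diff = 2
```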

In his report, Small is at pains to teach the relevance of the binary symmetric channel to analyzing sequences of bits. Given a history of the bits in the current sequence, what is the next bit likely to be? Is it the same as the previous one, or different? Small shows how the breakers modeled that question, producing an answer in terms of the crossover probabilities of the source channel: whether a motor bit stream is +/-, and/or whether some ciphertext stream has the same value, and whether this means two bits in the sequence should then match or differ.
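The binary-symmetric-channel view composes nicely. A hedged sketch (mine, not GCCS's actual bookkeeping): treat each noisy influence on the "same or different?" question as a BSC stage with some crossover probability; two stages in series flip the bit when exactly one of them flips.

```python
# Sketch (illustrative, not the historical calculation): chaining two
# independent BSC stages -- e.g. a motor effect and a ciphertext effect --
# gives a net crossover when exactly one stage flips the bit.
def compose_crossover(p1, p2):
    """Net crossover probability of two BSC stages in series."""
    return p1 * (1 - p2) + (1 - p1) * p2

# A biased stage (p = 0.1) followed by a fair stage (p = 0.5)
# washes out to 0.5: any fair stage destroys the bias.
print(compose_crossover(0.1, 0.5))  # 0.5
```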

Going beyond setting, one then focused on tuning up the techniques for breaking. Here the motivation for counting same and different pairings is the bigger-picture XOR property. That is:

http://www.alanturing.net/turing_archive/archive/t/t07/TR07.php

Now, built into that overly concise, logic-centric statement is Small's better-expressed mental model: that two "parametric" equations are at work. Having noted the bias to dots over crosses, we know that xor enables us to count when same pairs are encountered, giving dots. The evidence for dots from delta-D holds as (noisy) evidence for the same property in delta-Z.

In GCCS terminology, we have the number of pips (an integer count) as the magnitude of this evidence. The magnitude is the "score" in pips; it is the number of positions where xor = dot (against some breaking condition). In general the condition comprises the antecedents pertaining to one bit in the cam pattern of the wheel being attacked (and then the next…), which aligns with a single column in the Turingismus cage.
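A score in pips can be sketched directly. This is a minimal illustration under my own naming (the condition predicate stands in for whatever antecedents the attack imposes):

```python
# Sketch: a "score in pips" as the count of positions where a breaking
# condition holds and the xor of the two streams is a dot (0).
def pip_score(stream_a, stream_b, condition):
    """Count dots (xor == 0) among the positions the condition selects."""
    return sum(
        1
        for i, (x, y) in enumerate(zip(stream_a, stream_b))
        if condition(i) and (x ^ y) == 0
    )

# e.g. score only the even positions of two short streams
score = pip_score([0, 1, 1, 0], [0, 0, 1, 1], lambda i: i % 2 == 0)
# positions 0 and 2 both give xor = 0, so score = 2
```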

Now, we look at the column of the Turingismus cage as the state of the "signal space" at time t (where t's unit is chi wheel-cam turnovers). For a model of the signal space, use the PAM model: a subset of the possible set of M points in a signal constellation. For Tunny, the PAM model of a point in the space, the matrix M, is not square (though recall how Small noted it was made so, 23×23, for certain counts of same and different pairings).

At this point we get to quote Gallager’s engineering model for inner products of wave functions under a correlation transform:

where (from an earlier chapter) one may project onto a "subspace":

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

and then focus on the next point, which adds in noise, giving a "parametric" world view:

— Gallager, 6.450 course materials, MIT OpenCourseWare (http://ocw.mit.edu/)

We can think of the abstract pulse function p as generating a vector subspace (treating it as irrelevant which concrete function is assigned, provided it is a cyclical shift). The subspace is formed from each shift of the pulse through its cycle, in some unit time, and one can project onto it. Then we note Gallager's admonition: that the noise integral Zk is, for wave functions, the inner product of (i) the noise *sequence* with (ii) the subspace of pulses. In short, the noise sequence is projected onto the pulses as they shift through the cycle offsets (t/T) forming their subspace.
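Gallager's point can be sketched in discrete form. A minimal illustration (my own discretization, not Gallager's continuous-time integral): each Z_k is the inner product of the noise sequence with the k-th cyclic shift of the pulse.

```python
# Sketch of Gallager's observation, discretized: each noise coefficient
# Z_k is the inner product of the noise sequence with the pulse shifted
# (cyclically here, matching the wheel-cycle picture) by k positions.
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def noise_projections(noise, pulse):
    """Project the noise sequence onto every cyclic shift of the pulse."""
    n = len(pulse)
    return [
        inner(noise, pulse[n - k:] + pulse[:n - k])  # shift pulse by k
        for k in range(n)
    ]

pulse = [1, 0, 0, 0]                    # a unit pulse
noise = [0.5, -0.2, 0.1, 0.3]
print(noise_projections(noise, pulse))  # [0.5, -0.2, 0.1, 0.3]
```

With a unit pulse the projections just read out the noise samples, which is the point: the pulse shifts form an orthonormal basis, and Z_k is the noise seen through that basis.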

With that in mind we can get back to Tunny breaking: scores of evidence (in integer pips) and their interaction with the cycles due to turnovers of the cams on the Tunny wheels. Note that the time unit t/T in the Tunny break is 1/1271 for the composite wheel X1/X2. But what is the "measure" per pip? For this we look for a theory, and for that, first, a rationale:
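The 1271 is worth pinning down: it is just the composite period of the two chi wheels, whose lengths 41 and 31 are coprime.

```python
# Sketch: the composite period of the chi-1/chi-2 rectangle is the lcm
# of the wheel lengths 41 and 31; since they are coprime, that is their
# product, 1271.
from math import gcd

chi1, chi2 = 41, 31
period = chi1 * chi2 // gcd(chi1, chi2)  # lcm(41, 31)
print(period)  # 1271
```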

— Gallager, 6.450 course materials, MIT OpenCourseWare (http://ocw.mit.edu/)

To make sure we do not confuse a random error with the signal, we want the uncertainty of that particular event to be properly modeled, noting that we can drive the uncertainty down to the minimum quantum of energy by filtering the data to consider only what lies in the tails of the distribution (n standard deviations from the mean). Of course, government-endorsed cipher design may add to a signal not Gaussian noise but its own, ahem, "characteristic" noise… But shush!

Getting back to the modern text, let's rewrite it in Tunny speak:

— Gallager, 6.450 course materials, MIT OpenCourseWare (http://ocw.mit.edu/)

To be more specific about the relationship between M (the 1271 points of the Chi 1+2 rectangle), d (the dottage of the motor controlling the stop-and-go of the aperiodic "noisy PSI"), and the variance σ² of the Gaussian noise Z_k: suppose a function of d is selected to be ασ (a certain number of standard deviations), where α is chosen high enough to make detection sufficiently reliable.
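The "high enough α" criterion can be made concrete. A hedged sketch, assuming nearest points separated by d = ασ and Gaussian noise of standard deviation σ: a detection error needs the noise to push a point past the midpoint d/2, which happens with the Gaussian tail probability Q(α/2).

```python
# Sketch, assuming d = alpha * sigma between nearest signal points:
# the chance a Gaussian noise sample of std sigma carries a point past
# the halfway mark d/2 is Q(alpha/2), the standard Gaussian tail.
from math import erfc, sqrt

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2))

# Larger alpha -> rapidly smaller error probability.
for alpha in (2, 4, 6):
    print(alpha, q_function(alpha / 2))
```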

This, of course, is a summary of chi-setting on Colossus, where the Tunny d is the distance between "nearest neighbors" in the PSI noise stream's signal constellation, the neighbors being inherently separated out, as the extended PSI, by the motor stream's pseudo-Gaussian process (controlled by its dottage).

In Tunny speak, the breakers converted flagged rectangles with such scores into a set of measures (in deciban units) in order then to perform the BP process of inference. But what was the scale of measure to be?
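The deciban scale itself is simple to state. An illustrative sketch (the exact GCCS bookkeeping of score-to-deciban tables is not reproduced here): a deciban is ten times the base-10 logarithm of a likelihood ratio, so independent pieces of evidence add.

```python
# Sketch: a deciban is 10 * log10 of a likelihood (Bayes-factor) ratio.
# Logarithms turn multiplication of independent evidence into addition,
# which is what makes the scale usable for tallying.
from math import log10

def decibans(likelihood_ratio):
    return 10 * log10(likelihood_ratio)

print(decibans(10))        # a Bayes factor of 10 is 10.0 db
print(decibans(10 * 10))   # two such independent pieces: 20.0 db
```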

Gallager helps us understand the Tunny report's own reporting, imposing his signal-energy model:

— Gallager, 6.450 course materials, MIT OpenCourseWare (http://ocw.mit.edu/)

We see the minimum energy, due to just the right spacing and thus mixing of energy sources, as Es, derived from mean-square arguments (where we can simply add and subtract, once in the squared world). Rearrangement then allows one to figure the bits per point (which is our Tunny measuring stick, per cipher stream Z):
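The rearrangement can be sketched with the standard PAM energy formula (as in Gallager's notes; the framing as a "Tunny measuring stick" is this essay's, not his): for M equally spaced points with spacing d, the mean-square energy is Es = d²(M² − 1)/12, and inverting it gives the points, hence bits, that an energy budget buys.

```python
# Sketch of the standard PAM mean-square energy bookkeeping:
#   Es = d^2 * (M^2 - 1) / 12   for M equally spaced points, spacing d.
# Rearranging recovers M, and so b = log2(M) bits per point.
from math import log2, sqrt

def pam_energy(M, d):
    return d * d * (M * M - 1) / 12

def bits_per_point(Es, d):
    M = sqrt(1 + 12 * Es / (d * d))
    return log2(M)

Es = pam_energy(4, 2.0)          # 4-PAM, spacing 2 -> Es = 5.0
print(bits_per_point(Es, 2.0))   # 2.0 bits per point
```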

— Gallager, 6.450 course materials, MIT OpenCourseWare (http://ocw.mit.edu/)

Gallager's next paragraph clears up my personal confusion (held for two years now) on how to see Tunny's use of SNR alongside the Shannon capacity theorem's use of SNR. (How come 1948 gets named as the year of the "US" communication breakout, when clearly Friedman/GCCS folks were looking at the very same "information theory" terms deeply in 1939?) Gallager subtly distinguishes his expression from that theorem's formula:

— Gallager, 6.450 course materials, MIT OpenCourseWare (http://ocw.mit.edu/)

Turing didn't care how small b was, and simply focused on the detection theory (upon which BP then builds). Shannon, for his part, happened to remark upon the bound. b is just some scaling unit, and some long decimal (which perhaps explains why the ACE and Manchester Baby designs had floating point, and counts of the 1 bits in a word array, built into the very hardware as early as 1946!).
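That "count the 1 bits in a word" primitive is today's popcount; a trivial sketch for the record:

```python
# Sketch: the "count the 1 bits in a word" hardware primitive alluded to
# above is the modern popcount.
def popcount(word):
    # Equivalently word.bit_count() on Python 3.10+.
    return bin(word).count("1")

print(popcount(0b10110110))  # 5
```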

This is what is said in the Tunny report, albeit very obliquely:

http://www.alanturing.net/turing_archive/archive/t/t07/TR07.php

One can reduce the process to a "normalized" measure theory, i.e. b, and for that one would use the "accurate convergence" process (rather than the estimated-evidence convergence process actually used, which is merely a limiting case).

In Tunny terms, the ratio of "energy" to sigma (the standard deviation), in units of "the bulge", is the extra power required (in measure units of beta/b) due to the excess of dots over crosses.

Now, how do the Tunny disclosures handle α²? And what were folks in the Tunny era really doing when they said "find the X1/X2 cams that make beta a maximum" (apart from getting the functional output: the embryonic wheels)?

We can look at the answer to the latter question as geometrically "skewing" the underlying Euclidean plane, so that the distance between the signs, when integrated under the mean-square rule, yields now-non-linear placements of signs that maximize the measure beta. The operational meaning is that the wheel bits (the vectors which code up the needed skews) best reflect the evidence. We have taken the top vertical and left horizontal "vectors" on the originally rectangular plane and progressively lengthened and shortened them (on scales of 43b and 31b) to create a custom angle between them: that is beta. We have formed a custom inner-product rule; that is, we are working in phase space. Thinking of points on the Bloch sphere, beta is then just the (cosine of the) angle between the skew vectors expressed in that coordinate system: cos(theta) = beta, necessarily less than 1, reflecting the invariant.
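The cos(theta) reading can be sketched directly. This is an illustration of the general fact being leaned on (a normalized inner product is bounded by 1), not the actual Tunny computation:

```python
# Sketch: beta read as the cosine of the angle between two "skew"
# vectors -- the normalized inner product, necessarily bounded by 1
# (the invariant the text appeals to).
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

beta = cosine([1, 1, 0], [1, 0, 0])
# cos(theta) = 1/sqrt(2) ~ 0.707, and |beta| <= 1 always
```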

The Tunny Report does a good job, in contrast to its explanation of beta maximization, of then explaining convergence using "inference measure logic" (needed later as part of BP, recall).

It's the statistics of delta-D that drive the whole process, of course, so fiddling with optimal skew fields is, at the end of the day, about fitting chi onto delta-D, as summarized by the conditional expression for delta-D_12 = dot in the power regime.
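The delta-D_12 = dot statistic can be sketched concretely. A minimal illustration (stream values are made up): delta each impulse stream by xoring adjacent bits, xor the two deltas together, and count the dots.

```python
# Sketch of the statistic the text summarizes: delta each impulse stream
# (xor of adjacent bits), xor the two deltas, and count how often the
# result is dot (0) -- the excess over half the length is the evidence.
def delta(stream):
    return [a ^ b for a, b in zip(stream, stream[1:])]

def count_delta12_dots(d1, d2):
    dd1, dd2 = delta(d1), delta(d2)
    return sum(1 for a, b in zip(dd1, dd2) if (a ^ b) == 0)

dots = count_delta12_dots([0, 1, 1, 0, 0], [1, 1, 0, 0, 1])
# delta-D1 = [1,0,1,0], delta-D2 = [0,1,0,1], xor = [1,1,1,1] -> 0 dots
```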

The Tunny report goes on to indicate nicely how flagging is a reduced set of accurate-convergence processes, applied not to measure the vector similarity of two wheel patterns (in this custom measure metric with its unique "rod", recall), but of two rows, when looking for starts.