## Turing and Tunny wheel breaking cryptanalysis (via norms and averaging)

http://turingarchive.org/viewer/?id=133&title=28

its tempting to talk about spectral gaps, expander graphs and modern notions as if Turing had a secret understanding of these topics. But we would be deluding ourselves. What Turing had was relatively more simple understanding, that happens to show that the fancy modern stuff can also be cast in terms of the simpler models. When “thinking like a cryptanalysis” and “using the data-mining toolbox of the GCHQ cryptanalysis” available on their high-end workstations with custom accelerator boards, we have to remember just what their predecessors were doing, when attacking enigma and tunny with less mechanical aids than folks use today.

When we look at the above, imagine that g is a tunny wheel, with unknown pattern. Wheel breaking is about finding that pattern! It’s summarizes a data set – as a sequential set of bits, each of which has a relative weight – much like any histogram. We are counting – by approximation – how many times the value of that bit of the wheel concurs with the values found in data set, for the associated position on the wheel-derived impulse stream.

in the lemma we are given, we see turing considering two or three wheel bits – in a “local function” with “local effects” – and their mutual relationships. These are analyzed using operators norms, where for a mean=1 world (normalized to unitary, that is) the “compression” of the information expressed by norm2(g) can be contrasted with the compression of the information by the Rf operator, norm2(R.f). Of course, this is just a ratio with the “data” of g() being the quotient of numerator. Or, once one gets back to using group theory, a factor group of H/H1.

Now Turing knows, from standard math, that the average of the sum of the next few “local terms” from the data stands in a particular relation to the sum of averages. And so he generalizes this. He knows that “rules of wheel construction” mean that there are local relationship between pairs of wheel bits – being correlated therefore. A “decoder-detector” doing operator-averaging can thus be testing for these correlates.

When we see him modeling v and c, he is just looking at a “local horizon” of 3 wheel bits (or two changes, that is). And  note his lemma, expressed in conditional entropy or “mutual information” terms. The “operator norm” is indistinguishable from the norm of the data being averaged, a case that would occcur if the markov conditions were to ever hold (there are no correlations between the wheel bits). Of course, that’s not reality. But its just a modeling technique.

Turing can later create a sequence of Rf terms, noting that the larger the “local sampler” gets, the smaller the correlations become – as reflected by considering the “measure” for the whole wheels bit under averaging to the measure of all of them but 1, say. The correlation is then 1 – meaning they are not correlated, since there is no (fractional) proportion of compression.

While the above may be his point in the paper (since he was not in cryptanalysis mode, but keystream generating mode where one WANTS no correlations…), the thinking always takes us back to Tunny, doing the reverse process of wanting correlations. When applied to turingismus, one considers the averaging of the next few bits, given the sampled data, knowing that local effects should show up and act as predictors/detectors. Thus one can guess bit 3, assuming bits 1 and 2, and look to the data set to see what its value should be (by guessing).

With this, we can “capture” the general mental model of wheel breaking processes, circa WWII – be they tunny or otherwise.

Of course the General Report on Tunny goes into far more detail, when considering pairwise mutual information. And, rather than being a turingismus predictor based on branch analysis, its more of a classical power analysis, driving errors to the margins of distributions so that this “custom norm” (a point distinguisher, at the end of the day) can find the needle (the key) that shows up the not-random “signature”, without deluding oneself to much that its not the false positive case due to nature’s inherent randomness. That is one has an improbability averaging system.

When you see Turing talk pure math and use terms like “condensation points” … just think of it as a tunny wheel bit, being condensed out from the information from all the depths. Then its easier! If you want you can even think of H1 (later in the analysis) as having the role of the Tunny motor, the “factor” that induces periodicity into the Chi stream – under appropriate averaging. Of course, the designers intended it to have the exact opposite analysis (induce aperiodicity) – not realizing the mutual information effect.

Of course, the same is true in DES, when one considers what the particular DES-designed subkey schedule is doing to that graph evolution! Its inducing dependencies too, presumably detectable if one has the right “averaging operator”.

Anyways:

what he could be saying above is that while its true that the distinguishing power becomes less-resolving as one tends to n, recall that k (the entropy-defining distribution for the data set) is what is is. if one takes the uniform distribution and averages it, putting in in ratio with the uniform distribution, of course one has a “infinite” impulse response, or a continuous function (compared to the discrete cases considered while not [quite] “at uniformity”). The “dirac-delta” fiction can then be called upon, allow onto to indicate that the operator-averaging applied to the k-data distribution, correlated with k distribution itself is 1 – meaning we have a one way function.

Strange to see how he thought about modern concepts! And his conceptions are somewhat more useful that the dumbed down American crypto theory given in academia (i.e. MIT).