Sometimes things click, as they did for me tonight. Something I read in the Tunny report (on how Colossus was used in the attack on Tunny ciphering machine) never made much sense – in that I could not find modern theory to support its class of thinking. But that changed.

We had been struggling with the concept of a norm – no so much in learning how to calculate it but how to think about it, design with it, specify with it, or reason with it. So why….is it a useful crypto engineering principle?

The material above makes it clear.

First, folk contrast classical thinking (about probability distributions) with quantum algorithm thinking (about probability densities). Then, we get to see the “classful” equivalences between the concepts. Its like a plain pudding vs a rich pudding (both of which are still puddings, though).

Then for each, three types of entropy calculations are noted. In much the same way that three popular norms are labeled as L1, L2 or Linf, so one sees three means of calculating entropies: H1, H2 and H.Inf.

Click (or Aha!)…. Finally, I’ve found out why the H2() convention exists – as seen in a million elementary textbooks (without motivation). H2 entropy is to H1 entropy what L2 norm is to L1 norm. And similarly, for Linf and Hinf. We even see now why Linf (Hinf) is pertinent (a second Aha).

Its particularly interesting to see how “just well written” is that paragraph. For when discussing the contrast between the world of probability densities (rich pudding) versus (plain pudding world of) probability distribution folks can still note that densities happen to have a ‘distribution character’ of their own – not that the latter should be confused with the class of ‘distributions’ we are contrasting densities against. For the second meaning of distribution happens to refer to the array of eigenvalues of the density operator “as-if” it (too) were a distribution. We could call these classical-distribution and eigenvalue-distribution (a special case of quantum-density), to be clear.

In a theoretical part of the Tunny report, folks discussed “accurate convergence” – noting how in practice simple summing of pairs was used (versus accurate scoring of the type being discussed). This was a part of the wheel breaking process in which a rectangle of numbers were formed by scanning a series of tunny ciphertext or intermediate-cipher messages assumed to be in depth. Considering just two columns of such “evidence”, one remembers that any one column of values could be thought of the set of components of a single vector … which “summed” to the value of wheel bit at the column head. But more than this, one could be interested next in comparing two side-by-side columns (of evidence) in order to get information on whether two neighboring wheel bits (two neighboring column heads) were the same or different.

In looking a the classical definition of H2 (thinking of this symbol as a special-class-norm – involving, like more typical pure-math L2 norms, power squaring ideas leading to energy difference ideas) we see Col() – a function for counting up collisions in the samples. Or, put more simply, when two neighboring bits are the same – both dots, or both crosses – in the Colossus world. Of course, this is 2 of the 4 cases distinguished by xor – which detects sameness (in either duality).

When it comes to analogously seeing H2 defined over quantum-densities, we first see how it turns out that one can drop looking at the density of the quantum-operator itself and one can look at the operator’s eigenvalue spectrum instead, treating it as a distribution and thus applying the same H2 formula (to this “projection” of densities back onto the simpler world of distribution). No, one is not calculating H2.classical. one is calculating H2.quantum – but happenstance one gets to use the same calculator due to the projection theorem.

It just so happens that in the quantum case we see that H2.quantum is about the Trace (of the density vector squared) as we “look at things diagonally”. And this happens to be a sum of the amplitudes (squared), which just happens to the definition of a pure-math norm…

Folks polish off my understanding by then showing the Hinf. < H2 < H1 relationship, which explains why Turing was so focused on showing (custom) norms, in similar relationships. For him, evidently, his norms were entropies all along. We can now read his theorems as engineering statements about entropy flows. We can re-read his manuscript, now as indirectly making statements about expander graphs, in which the entropy flows induced by the norms are a “calculating mechanism”. This of course gets us to NSA tech (but shush!)

it looks a little funny to see the classical “L2 norm symbol”, squared (as applied to a density vector). Until one remembers that the L2 norm (vs L2 inner product) is a classical square root – when not being show in its symbolic form. In the H2 entropy case, we happen to be squaring (the hidden square root) over what is , recall, the (xor) sum of various xor’ed pairs (filtered for the cases where there is “same-ness” (or “collisions”)).

Which is of course the core of the colossus mechanism- whether looking at a columns in the rectangle during wheel breaking or similar delta-ing of two streams during setting. It makes perfect sense now to read folks 1945 writing as they were THINKING at the time – that rectangling on colossus was just a variant of a setting run. It was indeed a long run, since serial tapes bearing the evidence in sequential-access form (being a tape…) had to be scanned over and over, to do non-sequential access to the particular bits to be delta’ed and then xor-summed.

Of course, wheel breaking was an express of H2 – I now see- since they scored the (summed/delta’ed) evidence from wheel breaking by “decibanning” – which was application of the lg() function, as discussed in the modern theory. On the right scale, this allowed fast mental arithmeatic, once the values were then to be applied to turingismus – or a backtracking-powered search through the cayley graph of possible evolutions of the wheel patterns (viewed as codeword to be decoded by a viterbi decoder), leveraging log-likelihood decoding (aka tunny-era cryptanalysis).

It now makes a lot more sense why, when fedya was talking about the L2 gradient-norm, his calculation was about summing over the even-numbered vertices (only). Since gradient-norms are really all about differential norms, it makes some intuitive sense now- since we are interested in flows – to be looking at pairwise calculations like this, considering much as with the legrange operator, how one differences in-bound edges and outband-edges, in order to capture +flow-differences”+. And this channels nicely into chu;s teaching, about the L0 infinitesimal-differential, all of which motives why cryptanalysis is so focussed on the understanding the dynamics seen the rest of the dimensions and the very first eigenspace (upon which they are “projected”)

I see lots of potential for the UK “codes” school (BP) as a “superb” education resource for math-centric teaching, if one thinks like the above. I wonder if it really is teaching “higher” math and coding theory, or if its just a bunch of old men types re-telling WWII stories and potted histories, while doing the usual English thing of “failing to obtain the engineering advantage” since old-men-itis sets in, which deems it more important to say dumb platitudes (like cameron does) and thus not actually consolidate the knowhow advantage – which means the Americans get the advantage (instead), being less tonto about the platitudes and “secrecy” for its own sake