Comparing des sbox design theory with Colossus-era language about proportional bulges, reliability metrics, and walsh transforms of bulge functions.

It’s fun to compare modern language of cryptanalysis with that used in the Colossus era. Perhaps the best cipher-engineering paper I’ve ever read uses that language aiming to justify how to resist DES-era cryptanalysis in a special case.  This elaboration of the design has the side effect of teaching well how one design a cipher such as conventional DES. In particular, it gives us the terminology with which to better apply the analytical methods used in the Colossus-era attack on Tunny. The combination of precise language and applied method allows us to see both sides of the coin: designing a cipher to resist the kind of linear approximations used in the colossus era class of cryptanalysis.

But first, we actually start by focusing on DES-era design of countermeasure to differential cryptanalysis. We get to understand how the original expansion process contributes to the countermeasure and how it continues in that role when defining a world in which there are now 8 copies of the same (stronger) sbox – rather than 8 DES-sboxes with a differential trail (presumably injected by NSA with the (still secret) help of IBM to help machine processing detect DES ciphertext).

This all gets us in the mood for thinking about inputs and outputs (of the now stronger sbox). Its fun to now see quantum computing math notation being applied – which we not only understand now but we see just why it is so relevant when analyzing linear approximations between the (6) input and the (4) outputs of the sbox.

so

Note, at a), the inner product of b and S(x). b is a function – choosing some of the outputs bits to consider as a linear combination.

At b), note the focus on the fact that there are many b values. The statement are to be about what they do as a collection (when individually applied to output bits, notated as S applied to x (the input bits).

At c), we see the Walsh coefficient being applied – this being a measure on a spectral scale. One should think of the scale itself as a series of final-row buckets from –XYZ to +XYZ, on the quincux board. There is one value/coefficient for the set of all b values – where each b is chooser of which several output bits to combine.

The right way to think of this is that the value of a is a fraction, expressed in binary. One represents this fraction as a square wave (oscillating between 1 and –1). But, note that the square wave varies by choice of b. The walsh coefficient says: for any time period, compute for a given b whether the a comes out 1, or –1; and start accumulating. a run of zeroes in a might may there be several –1 values, which one “accumulates” (making –2… for a run of 2). If the run type switches from zeroes to, next, a 1 bit in a then the accumulator would head back to –1, before continuing on to 0, 1, 2… as the 1 bits “add one”. Clearly we get a range of the most negative swing point to the most positive value point, as the bits of the fraction are tree-walked in sequence. One should now be thinking of the harmonic oscillator!

Now, the next mental model is to recharacterize b in terms of something you know better: the enigma stecker. Consider that the wheels of the enigma are the s-box, and b is the (output) stecker – a rather simple linear combiner function. Similarly, the function a is the linear combiner for inputs. In each case, as in 1940, one is interested in the impact the 2 linear combiner have on each other (and the sbox within). Since the steckering a and b represents an Enigma alphabet (in base 26) – ie just two rows of the roman alphabet, with the second row jumbled – and this steckering modifes the security of the sbox, one has to ask: what is the impact of a particular  enigma-era steckering (on the fixpoints of the plaintext and ciphertext)? And more vitally, how does that thought train then apply to modern sbox design?

Compare all this conception in terms of walsh functions, forming up an orthogonal basis, to how Susskind modeled his inner product, using p-adic numerals. Susskind also performed tree walking. In his case, however, any branch would be analyzed for its color property, which might change from X to Y (or Y to X!) as one branches. In Susskind’s case, his gamma terms were about rates of color changes on walks through branching space, rather than counting the spread of branching choices per se. But, also recall how he defined a distance function between two nodes, that implicitly captures the notions of “walsh coefficients” used in the cryptanalysis context.

Now, for what its worth, it can be noted that the Susskind calculation about branching is rather like Colossus-era rectangling and convergence processes, designed to work up a hidden markov model leading to guesses about wheel (key) bits. There, one is interested in constraining a walk to meet external coordinates (the constraints about legal wheels), which impose rules on the color changes. This all drives generation of soft reliability numbers, usable much as in Viterbi decoding. While fascinating, this gets us off track, slightly. We need to keep in mind the notions of “maximum distance”, in p-adic numbering systems – as a nice analytical tool for sbox design.

At d) we see how to calculate the Walsh coefficient. It looks hard; but it is not since the right mental model is just the “quincux” board. You drop the marble in at the top and see into which bucket it lands at the bottom as the ball has to follow either a left (-1) or right (+1) path, as it hits each pin at each level of the board. Left at any one level/pin gives a –1 to the accumulator (of branching choices), and right gives a +1. Note how, getting back to point d) that there are 2 influences on the branching at any given level: that due to lienar combiners acting on the 4 output bits (i.e. some b function), and that due to a similar function acting on the 6 input bits (i.e. a). Note how all possible walks are considered, when enumerating over x. Its not so hard now, no?

In e) note how the b applies to S(x) to  form an “output linear combinations” centric inner product, since S applied to x is just the output. The a gives an inner product focused on the input bits, and possible linear combinations – expressed as bit-chooser function a.

At f) note how one forms a more classical statement of proportion – of the count due to the above which select some slices of the whole cake and a count of all the pieces of that same cake. Note how the subset of cake pieces selected for counted are defined : only count those which align with the notion that the projection of the 4 output bits onto linear combiner b (output stecker b, think!) is is equal to the projection of the 6 input bits measure on to linear combiner a (input stecker a, think!). Then count how many cake slices there are that meet this rule.

(doesn’t this make REALLY obvious the math theory behind enigma attack, and justify why folks spent so much time looking for fixpoints in the plaintext, depths, etc – to help break the steckering by forming up a linear constraint system – that then acted as search guide for the bombe!)

The way to think of an inner product, such as <a, x>, in the language of c# and computer science is that the data vector x is be subject to  the delegate/lambda a – an anonymous block of code that implements some bit selection function, denoted a. When, per classical  geometric teaching about inner products, the input vector’s shadow is projected onto the “selecting” vector denoted by a (a function, recall), one gets as result the “application” of the c# lambda. This reduces the length of the input vector (a certain sum of unit length vectors) in accordance with how many of the now-selected units/bits (of x) are to be allowed to contribute to the projected sum.

at g), we see our first Colossus-era vs enigma moment, in which the concepts of the above are turned inside out so that the counts and sums give a proportional bulge (or probability, recalling how these two things related in 1942). The probability is a half (h), plus or minus a bit (defined by the counting rules above). Note how the max accumulation is over 2**7, so it walks in a space of 2**6 can never contribute more than a half…. (i.e. see point i) )

at h) we see a scaling rule applied, since from the range of the buckets on the quincux only certain paths can contribute. One wants a “concentration measure” – for security engineering purposes – which is a limited range of buckets into which one wants a  ball on average to fall when walking a s-box-constrained quinux.

Here one should be thinking about the first wheel cage in the ECM mark II, and the wiring plan for the wheels. An Early sbox, the goal, using wire ganging at the final output, is to ensure that a particular density is calculated, concentrating paths through the graph by wiring up the pins according to the desired relationships between the 4 fixed inputs and the ganged outputs – that induces the sbox to produce probability in the desired ranges.

Later in the paper, we see lots more Tunny-era formula – in which two streams can be enabled to give an quantum-entanglement style “inter-stream correlation” that requires fourier domain calculation (just as in Tunny theory, which Chi and Psi). Here, one considers the plaintext and the key (where Psi played the role of key, in Tunny era). And, one gets to now look inside-out at the dependency not just on input or output bit (combinations) but one sees dependency on keybits, as one moves in thinking in terms of the characteristic functions.