## power limited tunny, with points on a sphere.

The Albert Small report on the methods of breaking Tunny showed clearly how folks were using what we would now call principles of communication theory, for which Gallager’s highly engineering-math-oriented course-notes ‘book’ is excellent. Gallager writes today as though a 1930s math type writing about the advanced math research topics of the day. Though Small is not of the same intellectual class as Turing and Newman (et al.), on whose methods he was reporting, he also had that rather American knack of writing clearly. In particular, he focuses our attention on the core nature of xor: that one is counting when pairs of bits are the same or different.

In his report, Small is at pains to teach the relevance of the binary symmetric channel to analyzing sequences of bits. Given a history of the bits in the current sequence, what is the next bit likely to be? Is it the same as the previous one, or different? Small shows how folks modeled that question, producing an answer in terms of the cross-over probabilities of the source channel: whether a motor bit stream is +/-, and/or whether some ciphertext stream has the same value, and whether this means two bits in the sequence should then match or differ.
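To make that binary-symmetric-channel view concrete, here is a small sketch (my own illustration, not Small's notation): a maximally repetitive source is pushed through a BSC with cross-over probability p, and the probability that two adjacent observed bits agree is (1-p)^2 + p^2, since agreement survives when neither or both were flipped.

```python
import random

def bsc(bits, p, rng):
    # Flip each bit independently with cross-over probability p.
    return [b ^ (1 if rng.random() < p else 0) for b in bits]

def odds_same(p):
    # If the source repeats its bit, the observed pair agrees unless
    # exactly one of the two observations was flipped.
    agree = (1 - p) ** 2 + p ** 2
    return agree / (1 - agree)

rng = random.Random(0)
source = [1] * 10000                 # a maximally repetitive source
observed = bsc(source, 0.1, rng)
pairs = sum(1 for a, b in zip(observed, observed[1:]) if a == b)
# empirical agreement rate should sit near (1-p)^2 + p^2 = 0.82
```

The `odds_same` ratio is the kind of quantity the breakers turned into evidence scores.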

Going beyond setting, one focused on tuning up the techniques for breaking. In this case the motivation for counting same and different pairings is the bigger-picture XOR property. That is:

http://www.alanturing.net/turing_archive/archive/t/t07/TR07.php

Now, built into that overly-concise, logic-centric statement is Small’s better-expressed mental model: that two “parametric” equations are at work. Having noted the bias to dots over crosses, we know that xor enables us to count when same pairs are encountered, giving dots. The evidence for dots from delta-D holds as (noisy) evidence for the same property in delta-Z.
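As a minimal illustration of that counting (the two streams below are made up, not Small's data): xor the bit pairs and count the zeros, since xor of equal bits is 0, a dot in GCCS terms.

```python
def xor_dot_count(a, b):
    # A "dot" is 0; xor of equal bits is 0, so counting zeros in
    # a XOR b counts the positions where the two streams agree.
    return sum(1 for x, y in zip(a, b) if x ^ y == 0)

dD = [0, 0, 1, 0, 1, 0, 0, 0]   # delta-D, biased toward dots
dZ = [0, 1, 1, 0, 1, 0, 0, 1]   # delta-Z, a noisy view of the same bias
score = xor_dot_count(dD, dZ)   # pips of evidence
```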

In GCCS terminology, we have the number of pips (an integer count) as the magnitude of this evidence. The magnitude is the “score” in pips, note. It’s the number of pips, which is the number of cases where xor = dot (against some breaking condition). In general the condition is the antecedents pertaining to one bit in the cam patterns of the wheel being attacked (and then the next…), which obviously aligns with a single column in the Turingismus cage.

Now, we look at the column of the Turingismus cage as the state of the “signal space” at time t (where t’s unit is chi wheel-cam turnovers). For a model of the signal space, use the PAM model: a subset of the possible set of M points in a signal constellation. For Tunny, the PAM model of a point in the space, the matrix M, is not square (though recall how Small noted it was made so, 23*23, for certain counts of same and different pairings!).

At this point we get to quote Gallager’s engineering model for inner products of wave functions under a correlation transform:

where (from earlier chapter) one may project onto a “subspace”

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

and focus on the next point which adds in noise, giving a “parametric” world view:


We can think of the abstract pulse function p as a vector subspace (treating it as irrelevant which concrete function is assigned, provided it is a cyclical shift). The subspace is formed from each shift of the pulse in its cycle, in some unit time, and upon it one can project. Then, we note Gallager’s admonition: that the noise integral Zk is, for wave functions, the inner product of (i) the noise sequence with (ii) the subspace of pulses. In short, the noise sequence is projected onto the pulses as they shift through the cycle-offsets (t/T) forming up their subspace.
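A toy numeric sketch of that projection (my own construction, not Gallager's code): form the cyclic shifts of a pulse and take the inner product of a noise sequence against each shift. With a unit-impulse pulse the shifts are orthonormal, so each coefficient Zk simply reads off one noise sample.

```python
def cyclic_shift(p, k):
    # Rotate the pulse right by k positions.
    return p[-k:] + p[:-k] if k else p[:]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

pulse = [1, 0, 0, 0]              # unit pulse; its shifts are orthonormal
noise = [0.3, -0.1, 0.05, 0.2]
Z = [inner(noise, cyclic_shift(pulse, k)) for k in range(len(pulse))]
# with an impulse pulse, each Z_k equals one noise sample
```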

With that in mind we can get back to Tunny breaking, concerning scores of evidence (in integer pips) and their interaction with the cycles due to turnovers of the cams on the Tunny wheels. Note that the time unit t/T in the Tunny break is 1/1271, for a composite wheel X1/X2. But what is the “measure” per pip? For this we look for a theory, and for that, first, a rationale:


To make sure we are not confusing a random error with the signal, we want the uncertainty of that particular event to be properly modeled, noting that we can drive the uncertainty down to a minimum quantum of energy by filtering the data to consider only that which is in the tails of the distribution (n standard deviations from the mean). Of course, govt-endorsed cipher design may add to a signal not Gaussian noise but their own, ahem, “characteristic” noise… But shush!

Getting back to the modern text, let’s rewrite it in Tunny speak:


To be more specific about the relationship between M (the 1271 points of the Chi 1+2 rectangle), d (the dottage of the motor controlling stop-and-go of the aperiodic “noisy PSI”), and the variance σ2 of the Gaussian noise Zk, suppose a function of d is selected to be ασ (a certain number of standard deviations), where α is “noted” to be high enough to make detection sufficiently reliable.
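In the standard detection-theory reading of that setup, nearest neighbours at distance d = ασ are confused when the noise exceeds d/2, giving a pairwise error probability of Q(α/2). A sketch (the Q-function via the complementary error function; the α values below are only illustrative):

```python
import math

def Q(x):
    # Upper tail of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2))

def pairwise_error(alpha):
    # Neighbours at distance alpha * sigma are confused when the noise
    # exceeds half that distance: probability Q(alpha / 2).
    return Q(alpha / 2)
```

Pushing α up a little drives the error probability down fast, which is why a modestly "high enough" α suffices.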

This of course is a summary of chi-setting on Colossus, where the Tunny d is the d-as-distance between a signal constellation’s “nearest neighbor” points in the PSI noise stream, where neighbors are inherently separated out, as the extended PSI, by the motor stream’s pseudo-Gaussian process (controlled by its dottage).

In Tunny speak, folks converted flagged rectangles with such scores into a set of measures (in decibans/db units) in order then to perform the BP process of inference. But what was the scale of measure to be?
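The deciban scale itself is easy to pin down (this much is standard Good/Turing usage; the odds-per-pip value below is a made-up illustration): a ban is a factor of ten in the odds, a deciban a tenth of that, and evidence adds in the log domain.

```python
import math

def decibans(odds):
    # One ban = a factor of 10 in odds; a deciban is a tenth of a ban.
    return 10 * math.log10(odds)

def score_db(pips, odds_per_pip):
    # Evidence adds in the log domain: each pip contributes the fixed
    # number of decibans set by the odds factor it carries.
    return pips * decibans(odds_per_pip)
```

A factor of 2 in the odds is about 3 db, which is the rough conversion the hand computers leaned on.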

Gallager helps us understand the Tunny report’s own reporting, imposing the steam room model of energy:


We see the minimum energy, due to just the right distancing and thus mixing of energy sources, as Es, derived from mean-square error arguments (where we can just add/subtract easily, once in the squared world). Of course, rearrangement then allows one to figure the bits per point (which is our Tunny measuring stick, per cipher stream Z):

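A sketch of the standard M-PAM bookkeeping behind those two quantities (my own code, with illustrative values): for points spaced d apart and centred on zero, the mean energy is Es = d²(M²−1)/12, and each point carries b = log2 M bits.

```python
import math

def pam_points(M, d):
    # Standard M-PAM constellation: M points spaced d apart, centred on 0.
    return [d * (2 * i - (M - 1)) / 2 for i in range(M)]

def mean_energy(points):
    # Average squared amplitude across the constellation.
    return sum(a * a for a in points) / len(points)

def bits_per_point(M):
    return math.log2(M)

pts = pam_points(4, 2.0)   # [-3.0, -1.0, 1.0, 3.0]
Es = mean_energy(pts)      # d^2 (M^2 - 1) / 12 = 5.0 here
```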

Gallager’s next paragraph clears up my personal confusion (held for 2 years now) on how to see Tunny’s use of SNR alongside the Shannon capacity theorem’s use of SNR. (How come 1948 alone gets billed as the year of the “US” communication breakout, when clearly Friedman/GCCS folks were looking at the very same “information theory” terms deeply in 1939!) Gallager subtly distinguishes this FROM that theorem’s formula:


For Turing, he didn’t care how small b was, and simply focused on the detection theory (upon which BP then builds). Shannon, for his part, happened to remark upon the bound. b is just some scaling unit, and some long decimal (which perhaps explains why the ACE and Manchester Baby designs had floating point, and counts of 1 bits in a word array, built into the very hardware, as early as 1946!).

This is what is said in the Tunny report, albeit very obliquely:

http://www.alanturing.net/turing_archive/archive/t/t07/TR07.php

One can reduce the process to “normalized” measure theory, i.e. b, and for that one would use the “accurate convergence” process (rather than merely the actually-used estimated evidence convergence process, a limiting case).

In Tunny terms, the ratio of “energy” to sigma (SD), in units of “the bulge”, is the extra power required (in measure units of beta/b) due to the excess of dots over crosses.

Now how do Tunny disclosures handle α2? And what were folks in the Tunny era really doing, when they say “find the X1/X2 cams” that make beta a maximum (apart from getting the functional output… the embryonic wheels!)?

We can look at the answer to the latter question as one of geometrically “skewing” the underlying Euclidean plane, such that the distance between the signs, when integrated under the mean-square rule, gives a now-non-linearized placement of signs that maximizes the measure beta; the operational meaning being that the wheel bits (vectors which code up the needed skews) best reflect the evidence. We have taken the top vertical and left horizontal “vectors” on the originally rectangular plane and progressively lengthened and shortened them (on scales of 43b and 31b) to create a custom angle between them: that is beta. We have formed a custom inner product rule; that is, working in phase space (when now thinking of points on the Bloch sphere), beta is just the difference between the skew-vectors, expressed in that coordinate system as (the cosine of) their angle, cos(theta) = beta, necessarily less than 1 to reflect the invariant.
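The cos(theta) = beta reading can be sketched numerically (the wheel pattern and evidence vector below are hypothetical, coded +1 for dot and -1 for cross): beta is just the normalized inner product of the two sign vectors, so it is automatically bounded by 1.

```python
import math

def cosine(u, v):
    # cos(theta) between two vectors: inner product over the norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# +1 for dot, -1 for cross: beta measures agreement between a candidate
# wheel pattern and the evidence vector.
wheel = [1, -1, 1, 1, -1]
evidence = [1, -1, 1, -1, -1]
beta = cosine(wheel, evidence)   # 4 agreements, 1 disagreement -> 3/5
```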

The Tunny Report does a good job, in contrast to its explanation of beta maximization, of then explaining convergence using “inference measure logic” (needed later as part of BP, recall).

It’s the stats of delta-D that drive the whole process, of course, so fiddling with optimal skew fields is, at the end of the day, about fitting chi onto delta-D, as summarized by the conditional expression for delta-D12 being dot in the power regime.

The Tunny report goes on to indicate nicely how flagging is a reduced set of accurate convergence processes, applied not to measure the vector similarity of two wheel patterns (in this custom measure metric with its unique “rod”, recall), but of two rows, when looking for starts.

Posted in colossus

## distance and count of terms that differ

I like the way this guy writes, as we gear up for a big one!

http://www.scribd.com/doc/43395812/Coding-Theory

Having seen that the BP algorithm is about converging a channel’s conditional probability map for all symbols, influencing a series of weights on the interacting nodes of a Tanner graph, we are getting a profound respect for the notion of distance, be it Tunny’s dots in D, the distance between points in a PAM signal constellation, or the d in (n,k,d).
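For the d in (n,k,d), a brute-force sketch (my own code): enumerate the codewords of a small linear code from its generator matrix and take the minimum nonzero weight, which for a linear code equals the minimum distance. The generator below is the standard systematic (7,4) Hamming one.

```python
from itertools import product

def hamming(u, v):
    # Count of positions where two words differ.
    return sum(1 for a, b in zip(u, v) if a != b)

def codewords(G):
    # All GF(2) linear combinations of the generator rows.
    k, n = len(G), len(G[0])
    return [tuple(sum(m[i] * G[i][j] for i in range(k)) % 2
                  for j in range(n))
            for m in product([0, 1], repeat=k)]

# (7,4) Hamming code generator, systematic form
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
d = min(sum(w) for w in codewords(G) if any(w))   # minimum weight
```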

Posted in coding theory, colossus

## Small and wheel breaking

See http://wp.me/p1fcz8-3qe It’s interesting to see the presumed bits, derived from a guessed / in delta-K, for each wheel period then be counted as if on a period of 23 (psi 5) and then be used in a chi1/chi2 rectangle. Flagging the underlying cages for all pairs, and counts of pairs that are the same, is analogous to formulating a standard rectangle.

Aside | Posted on

## Laurent Series, Complex Fourier, automorphisms, sigma-cycles

Rauch’s paper broaches communication theory, linking series to transforms. It feels a little like the theory presentation made by Gallager, who similarly explained engineering principles of communication. One starts with continuous Fourier series, addresses discreteness, figures in Fourier transforms, worries about L2 convergence, and ends up with a series of numbers (that project onto a sinc(t) cyclic shifter)!

and then

When looking at DES, we were contemplating pseudo-random means of engineering t-resilient functions. In the field of convolutional code encoders, based on relatively simple finite state machine ideas not unknown to Turing (!), we see how the design of the state machine can so bias the sequences that minimum distances can be generated. In the world of Tunny cryptanalysis and the scoring of repeat bits (see Small), we saw how folks modeled scored probabilities in custom (analytic) cryptographic channels. Let’s see how folks say all this properly!
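A minimal finite-state-machine encoder of the kind meant here, sketched as the textbook rate-1/2, constraint-length-3 convolutional code with the (7,5)-octal taps (nothing Tunny-specific; just the smallest standard example):

```python
def conv_encode(bits, taps1=(1, 1, 1), taps2=(1, 0, 1)):
    # Rate-1/2, constraint length 3 encoder (the classic (7,5) octal pair).
    # The state is the last two input bits held in a shift register.
    state = [0, 0]
    out = []
    for b in bits:
        window = [b] + state
        out.append(sum(t * x for t, x in zip(taps1, window)) % 2)
        out.append(sum(t * x for t, x in zip(taps2, window)) % 2)
        state = [b, state[0]]
    return out
```

The tap polynomials are exactly what biases the output sequences so that distinct inputs diverge by a guaranteed minimum distance.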

also citation for following, unless otherwise noted

First, our intuition on leveraging both cyclic codes and non-commutativity seems right (since both featured in Turing’s On Permutations manuscript, with its use of the Octonions, the Fano plane, parity constraints, and automorphisms of the Fano plane).

The notion of ideals needs some background, particularly concerning the relationship with generators for normal and dual spaces:

wikipedia

Further (really well written) tutorial information on ideals and coding generators can be found at Cyclic Error-Correcting Codes.

But, getting back to convolutional (or correlational) codes, we see a variant of what we saw in Turing’s code-centric On Permutations manuscript (remembering that we also read about 1930s sigma-algebras, in general).

Of course, in Turing’s infamous enigma-work, shifts of cycles (between the compartments of boxes in a steckering relation) were of paramount importance, requiring custom measures (in decibans, of course).

Now what is that mapping? It says that a polynomial (with n terms, and a throwaway z placeholder) maps to an automorphism “function”, parameterized by z and sigma. z is the specification(s) of the delay circuit (a multi-tap shift register acting as a custom multiplier for polynomial fields). sigma is the possible relabelings of the nodes in, say, the 7-point Fano plane, where the set of all possible relabelings is limited to the potential set by the mutual-dependency relation built into each “curve” (line) of the “design”. For rationale and intuition concerning the relevance of such constraint “networks”, just think of Turing bombe menus, or think of a Colossus cryptographer evaluating the 32 count to see if the sub-patterns are consistent with the dual space constraint set.
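That sigma idea, relabelings of the Fano plane constrained to preserve its lines, can be checked mechanically. A sketch (my own construction: lines generated by shifting the {1,2,4} difference set mod 7; the two candidate permutations are illustrative):

```python
# The 7 lines of the Fano plane, as shifts of the {1,2,4} difference set mod 7.
FANO_LINES = [frozenset(s) for s in
              ({1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7},
               {5, 6, 1}, {6, 7, 2}, {7, 1, 3})]

def is_automorphism(sigma):
    # sigma relabels points 1..7; it is an automorphism iff the set of
    # lines is mapped onto itself.
    mapped = {frozenset(sigma[p] for p in line) for line in FANO_LINES}
    return mapped == set(FANO_LINES)

shift = {i: i % 7 + 1 for i in range(1, 8)}          # x -> x+1 (mod 7)
swap = {1: 2, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}    # bare transposition
```

The cyclic shift respects every line (that is why the difference-set construction works); the bare transposition sends {2,3,5} to {1,3,5}, which is not a line, so it fails.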

In webby terms, the automorphism keeps changing the names of the references, whose identity is stable despite re-naming. An ontology instance (the fano plane) constrains the graph-rewriting.

Note now how, when renaming graphs, the world of constrained renaming of nodes can itself have an algebra, and said algebra itself have properties; that is what folks are saying with sigma-cyclicity. The renaming (changing the address map of the bits in a vector) is a bit like the pulse shifts in PAM!

Posted in coding theory, crypto

## turingery calculations of significance in pairwise counting

Continuing the Small analysis:

Having made columns of depths in cages (which Small calls rectangles), folks count pairwise matches. Now one wishes to test the assumption that the start that generated the assumed bits was indeed a delta-PSI’ position aligned with an actual repeat on a motor-stopped position.

It’s not a total surprise that folks are forming a ratio of the random case to the “combined flag” count case.

Presumably, a similar process could be used for the relationship between motor stops and psi wheel patterns.

The rationale and basis for applying a threshold for ignoring weights is given, nicely:

The relationship between weight classes and scoring is given here MUCH better than in any other material I’ve seen.

Small summarizes weight classes in terms of odds, in a matching paradigm, assuming a given beta and the usual factor of 2 to deciban mapping rule.


Posted in colossus

## Small and PSI breaking

Continuing the last memo, we can focus on the method of setting wheels much as it must have been used in the days before machines, for single wheel-set subtractor ciphers, anyway.

We see our weight table come back into vogue, where it is essentially a distribution.

This develops into a set of heuristics, used when teasing out two ciphertexts (dechis, remember) that are in depth.

Good discussion of the skills required in breaking, versus setting, versus anagramming, versus motor pattern determination. You really get a feel for what methods folks were using.

Also, some turingery context, rather more operational than the theory. Note how it overlaps with rectangling…

Posted in coding theory

## Small and wheel breaking

Continuing the last memo:

Small goes on to outline wheel breaking, and the math processes in particular. He assumes the use of (well-known to the US) means of guessing wheel bits (probably from cryptanalysis techniques applied against Hagelin or other types of bit-wise subtractors), leveraging “crude convergence”.

What is interesting is then the further acts of convergence of sign-counting rectangles made on the fly, assuming one wheel’s bits and treating the other wheel as a unit vector.

We recall, in the accurate convergence discussion in the General Report, the derivation in which the bulge concept (1 + delta) becomes (1 + delta)/(1 - delta), and why we have the power concept. Small shows clearly how to go from pairwise counts to calculating the entropy per pip: the numeric “basis” for the set of impulses applied by setting, in turn, each other-wheel bit pattern to some bit being 1 while all others are 0.

It’s for this reason I always think of wheel breaking as calculating the same value as log(1 + SNR), giving the baseline entropy rate for the specific density of the cipher itself. In American terms, Small goes on to show the estimation process (and rationale for the estimating method) for the spectral density. This is not too interesting (at my level).

If we look at the test applied to determine whether the guessed wheel pips have indeed helped get out a good estimate of the other wheel, they seem to be doing what we see in the BP method at the end, after all the rounds of inference. And they apply a threshold.

We can look at the scoring of each “component” impulse as similar in BP to gathering the beliefs from each of the parent nodes (before passing on some accumulation onto children). The factor graph is such that all variable nodes connect to all impulse nodes, taken one inference calculation at a time of course, in a serial evaluation model.
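That belief-gathering step can be sketched generically (standard BP bookkeeping, not the historical procedure; the priors and messages below are made up): a variable node multiplies its prior by each incoming message and renormalizes.

```python
def gather_beliefs(prior, messages):
    # A variable node's belief: prior times the product of incoming
    # messages, renormalized. Each vector is [P(dot), P(cross)].
    belief = list(prior)
    for m in messages:
        belief = [b * x for b, x in zip(belief, m)]
    z = sum(belief)
    return [b / z for b in belief]

prior = [0.5, 0.5]
msgs = [[0.7, 0.3], [0.6, 0.4]]     # two impulse nodes, both favouring dot
belief = gather_beliefs(prior, msgs)
```

Serial evaluation, one inference at a time, is then just iterating this over the nodes of the factor graph.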

But Small conveys, in his tutorial style, numerous interesting things: that wheel breaking and forming up the factor graph is a very interactive process, involving multiple rectangles and multiple messages, done to build up the weighted space. And it’s clear the US had not been doing anything like the same thing, which begs the question: what DID they do, beyond crude convergence of rectangles to then leverage them? Did the bits just help them cut down brute-force searching a bit?!

Posted in colossus

## Small report on Colossus

In the General Report on Tunny, until now I always struggled to comprehend the disclosures on how folks modeled the machine as a cross-over channel (one thinks BSC, etc.). The Small report makes it intuitively very obvious!

First, even though it’s implied in the Tunny report, the stats on D were collected via the Testery process. Note also how Small gives D a rather simple operational interpretation (compared to the dechi conception): it is “pseudo-ciphertext”. Having read the Tunny report, I always thought of it in its key-centric conception: that key from which they had removed the multi-periodic Chi. Small focused on later GCCS conceptions, after the era of using key depths.

Now, we must remember our weight classes that extend the model, above. For in the later period, where one really is filtering cipher (rather than key), the channel model also considered the cross-overs for each weight class (allowing Colossus then to search and count up for those filtering criteria, specifically). This extended model reminds us of a q-ary symmetric channel, where q is the number of weight classes in the energy spectrum rather than q=26 or q=2. Whereas a basic q-ary channel is like a group whose normal subgroup allows one to formulate coset partitions (think weight classes, now), consider now the quotient group formed when the cosets are themselves the group elements, giving one a q=6 (or so) cosets/weights Cayley table/graph, and a custom cross-over model.
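The q-ary symmetric channel mentioned here has a simple transition matrix, easy to sketch (q = 6 weight classes and the cross-over probability below are illustrative, not figures from the report): stay in your class with probability 1-p, otherwise move to one of the other q-1 classes uniformly.

```python
def q_ary_symmetric(q, p):
    # Transition matrix: keep the symbol with probability 1-p, else
    # move to one of the q-1 other symbols uniformly at random.
    off = p / (q - 1)
    return [[1 - p if i == j else off for j in range(q)] for i in range(q)]

T = q_ary_symmetric(6, 0.3)   # q = number of weight classes, say 6
```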

Now, despite having done a great job of motivating 2 of the lines in the table, Small doesn’t focus on the remaining lines (though he prints them):

Now, what Small’s particular phrasing does do is point out the “pairwise” counting process, which reminds me for the first time of the counting theory used in Turingismus (and the repeat rate for pairings, in the Enigma hand methods’ search for depths). One thinks of delta’ing in terms of its counter-measure rationale (since it was built into various of the machine’s countermeasures’ security-strength concepts, recall). But here, the concept is a “simple observation” phenomenon, usable as in Banburismus and Turingismus decoding or sieving strategies for max-likelihood ordering.

With weight classes from JUST the upper set denoted, (partially) matching the weight table above.

While the Colossus room might have been focused on leveraging “fits with distributions” for the filtering problem, it’s interesting to see the comment concerning hand methods:

Newmanry-centric folks don’t seem to have captured well, in their post-war reporting, just how much stats tables (and scoring, presumably) were driving “anagramming”. The associated “method” material seems to be still largely withheld. But we can think of it as the means of choosing the univariables ( g(ki) ) in the BP method, so as to project the de-chied cipher (into its two components).

Small conveys the relevance of the conditional expressions nicely, building the rationale on top of the stats of de-chi’ed cipher (which he calls D). These expressions do not have to align in any way with the nature of the weight classes, note; they are simple stats.

The next table, and his phrasing, is also instructive. Rather than think that using a pair of conditions is about applying 2 filters SEQUENTIALLY when focused on the 1 and 2 streams, say (a fact which the factor/product graph would have to account for), he is simply saying that the / means one can do 2 runs at once (usefully, too).

The particular case he points out is super-interesting, as one makes settings on the Colossus panels to represent the condition(s). One sees 3 conditions applied: count/filter for 4 being dot or cross; in parallel, count 5 xor 4 and count 3 xor 4, filtering the latter two for having value dot and cross respectively.
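A sketch of such a combined run (my own encoding of the conditions, on a toy stream; dot = 0, characters as 5-tuples (b1..b5)): filter on impulse 4 being dot, and in parallel count 5 xor 4 = dot and 3 xor 4 = cross among the surviving characters.

```python
def run_counts(stream):
    # Hypothetical combined run: keep only characters whose impulse 4 is
    # dot, and among them count 5 xor 4 = dot and 3 xor 4 = cross.
    c54 = c34 = 0
    for ch in stream:
        b3, b4, b5 = ch[2], ch[3], ch[4]
        if b4 == 0:                  # filter: impulse 4 is dot
            if b5 ^ b4 == 0:         # 5 xor 4 = dot
                c54 += 1
            if b3 ^ b4 == 1:         # 3 xor 4 = cross
                c34 += 1
    return c54, c34

counts = run_counts([(0, 0, 1, 0, 0), (0, 0, 0, 1, 1), (0, 0, 1, 0, 1)])
```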

Now, what Small does not talk about is the notion of sigmage in terms of its being the ratio of the bulge beta against the SD for the random-noise case. If I’ve got it right, beta varies by conditional expression, and of course the Gaussian SD is obtained from counting a random tape (enabling the table to be specified). So how did they get the betas per expression (just count the excess of dots, perhaps)?

Now, obviously, the table of expressions is sorted by “expected sigmage”, about which Small talks in plain language:

But it’s the next few phrases that are especially interesting, given he is delivering a “tutorial” on the “cryptographic art” to his colleagues back at Arlington Hall. We should think of variance here as an “uncertainty” measure, to give it an operational meaning, one that itself reflects the impact of that tiny (phase-space) oscillator due to the bulge. Under such an interpretation, the variance of the run is the (linear) sum of the uncertainties from the characters specified by the conditional expression; these are NOT weight classes, but unique little non-linear truth tables/functions:

Under such an interpretation, the non-linear truth table (of a run) really is a graph, with weighted arcs from the weight classes to the target filter, the weighting assigned to the arcs reflecting the degree to which the particular weight class impacts the stats (through its representatives referenced by the expression). Of course, a class might be just that which is known to be significant:

I’m not sure where he then gets 16, when he then talks about 16 messages:

But it seems clear that he is essentially referring to each line in the truth table that substantiates the estimated sigmage of the run class. Perhaps 16 = 2**4, since there may be 4 impulse streams involved?

When we follow his tutorial, looking, post X1/X2 setting, at X3/X4/X5 setting mid-process, how does Colossus know, for the red-noted material, say, to ONLY print the setting of K3?!

The ultimate smell test of all assumptions and their consequences is human intuition, applied to the 32 count; the smelliness factors tested against are nicely expressed by Small:


As Good said, it’s the gray code ordering that enables an inference from the error pattern, predicting which settings of the 5 were not correct (so go and look at the next best scores…).
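Good's point is easy to demonstrate with the standard reflected Gray code (nothing Tunny-specific): consecutive entries differ in exactly one bit, so a near-miss setting shows up as a single-bit error pattern, pointing at which of the 5 to revisit.

```python
def gray(n):
    # Standard reflected binary Gray code of index n.
    return n ^ (n >> 1)

order = [gray(i) for i in range(8)]
# consecutive entries differ in exactly one bit, so a near-miss setting
# leaves a single-bit error pattern in the ordered counts
```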

It would be fascinating to know how folks were trained in all this…or rather, how their experience and abilities to apply smell tests was “pre-honed”. That sounds like a Turing trick!

But it all just reminds me of today’s home fiasco. Someone wanted to rush out and buy a new water heater, since there has been so little hot water for days now (at 15F outside!). Having whined earlier and gone to turn up the dial to max, it turns out someone had turned it all the way down (not all the way up), on the assumption that you turn a dial clockwise… to raise temp. Hmm. Given that the probability of exactly such a binary error, when humans are involved, is high, how did such human frailties affect operator training and abilities?!

Be interested to see the original now, with its original notation.

Posted in colossus

## BP 1941 and Lincoln 1953

Image | Posted on

updating an old post, from 1953 American decoding on national-security projects

Originally posted on Peter's ruminations:

1953 was an interesting year, evidently.

citation covers other quotes, below

Above, we steal figure 1, remembering that this is 1953 and folks think in terms of radio, differential equations for calculations, and signal responses. Everything is analogue.

We are interested in the encoder as a code generator, tuning up the code so that the output symbols have a particular probability, and where each generated symbol has its own unique waveform when modulated onto a carrier. Then, since the channel may have noise (i.e. is “perturbed”), what does it take for a receiver to guess the degraded symbols, given knowledge of how the transmitter fashioned its custom probability distribution?

The probability-computing receiver is defined next, requiring the computation of matrices. Note how matrices are used IN THE CONTEXT OF a probability-inference model, meaning they are more like Markov chains seeking to converge to a stationary…
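That converging-matrix picture can be sketched with a toy two-state chain (illustrative numbers, not anything from the 1953 paper): repeated application of the transition matrix drives any starting distribution to the stationary one.

```python
def step(dist, T):
    # One application of the transition matrix to a row distribution.
    n = len(dist)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

T = [[0.9, 0.1],
     [0.2, 0.8]]
dist = [1.0, 0.0]
for _ in range(200):       # power iteration toward the stationary law
    dist = step(dist, T)
# this chain's stationary distribution is [2/3, 1/3]
```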

View original 478 more words

Posted in coding theory

## Turing’s mind cased in terms of a 1930s steam room

GCHQ apparently posted a Turing paper on repeat rates (which doesn’t appear that interesting, to be honest, from the public view we have). Without having seen what was billed as a theoretical case for the method of repeats, we can yet see the clear lineage between elements of Turingismus and Banburismus. From the Prof’s book (Ellsbury edition):

Prof’s Book, transcribed at http://www.ellsbury.com/profsbk/profsbk-134.htm

Having guessed that the operators producing several Enigma-ciphered messages have chosen the same “position” (of rotors), or that the position chosen on one happens to overlap later with the other’s start once things have stepped a bit, Turing advocates building a matrix with n rows, and columns whose number would have been 100-200, given the length of typical messages. For a given column, count the number of identical pairs and the number of all pairs. Averaging many such counts, form the ratio of the two to act as a discriminant of Enigma traffic (of a certain era, contingent on the value).
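The repeat-rate discriminant sketched in code (my own rendering of that counting rule, run on a toy depth):

```python
def repeat_rate(columns):
    # For each column of the depth matrix, count identical pairs over
    # all pairs of entries; average the ratios across columns.
    ratios = []
    for col in columns:
        n = len(col)
        all_pairs = n * (n - 1) // 2
        same = sum(1 for i in range(n) for j in range(i + 1, n)
                   if col[i] == col[j])
        ratios.append(same / all_pairs)
    return sum(ratios) / len(ratios)

rate = repeat_rate([["A", "A", "B"], ["C", "D", "E"]])   # (1/3 + 0) / 2
```

Traffic genuinely in depth pushes this ratio above the flat-random value, which is what makes it usable as a discriminant.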

James Coughlan does a great job of leading us back in time to understand how folks thought about cryptanalysis, in 1939 – while actually discussing application of techniques to stereoscopic vision problems. Presumably the term BP was originally a joke…(Bletchley Park, Belief Propagation)

citations for all quotes, below

http://www.ski.org/Rehab/Coughlan_lab/General/TutorialsandReference.html

You have to start out in outer space. No, not the NASA canned type with pretty pictures, but Newton’s outer space, where the nutty professor spent his mental time. You are in a gassy “fluid”, perhaps similar to a steamy steam room at a spa. While you, the machine, may move, your impact on the fluid is pretty minimal, but at least you define an inner space. At most, you move some molecules around outer space, and the thermal energy redistributes in accord with your use of it as you change the boundaries of your inner space. Using a thermal camera, one might imagine being able to see multi-colored “flux lines” connecting regions of the fluidic gas to your motion, reflecting the change in energies in the density. If you are a nutty math type, you can think that fluxion particles are released and undergo motion, to account for the energy change (in the “vortex”). The imaginary fluxion is the infamous x-dot (of calculus fame). It strikes you with a certain momentum that reflects the very energy redistribution induced by you and the steam cloud interacting, thus describing what just happened. It also has math properties that “keep the invariants” of fluxions in a particular energy band, per the master plan of Newton’s god, now proven to have a fluidic hand in (or on) everything.

It’s not clear to me whether Turing invented BP in the early days of his BP tenure, brought it with him (from Cambridge or Princeton), or it was already at BP in germinal form. The latter is quite plausible, note. But what is it?

Well, to Turing, trained in real analysis, he is well used to an idea: the idea of the L2 space (of finite-energy functions) and integration using horizontal bands across the area under the curve (rather than the vertical bands known to any 14-year-old math student). The change of perspective means you are used to counting up multiple components, sliced out by the band, to form the composite area, as the curve interacts with the band at multiple points. He is also used to using math for engineering modeling, including considering the molecules in the super-misty steam room each to be a meandering function, or a tiny histogram representing one tiny probability function of said molecule as it wanders “in a density”. The very idea of totting up the various contributions, “area under a curve for a given band”, can be generalized to adding up the contributing functions that are located in a given multi-dimensional band, criss-crossing the multi-dimensional fluidic space.

Depending whether we want to model from the standpoint of the inner space (probability domain) or outer space (energy domain) we get:

f g lower case

F G upper case

Think of f, g and h as places in the integration systems “horizontal band” to be accumulated. Think of F and G as Newton’s fluxions – that “describe the changes” (as does the spec of any “differential” gearbox).

Now compress the two worlds into one, removing the distinction between information and meta-information. Its all the same now, so that (formally imaginary) fluxions actually interact (imparting “causal momentum” that induces changes in probabilities on molecules and other fluxions, alike).

Now my intuition tells me that Turing thought in terms of tree-like “boolean networks”, rather than factor graphs associated with the edges in a network. But it’s also clear that he thought of group theory as a connected graph, whose edges (collectively) expressed the group properties. Those groups and quotient groups that naturally express linear coding calculations must not have strayed too far from his mind, as he focused on “expression trees”. For the group-graph and the expression tree allow us to see, specifically for cryptanalysis, two sides of the same coin, and see them at the same time.

Anyways, let’s reduce fluxions and gassy fluids to the steam room model:

all 4 particles are modeled as a combinatoric-product (in some inner space)

red particles have mutual group properties with each other (and their implications too)

blue particles express their innate causality relations with one or more red particles, being thus “conditional probabilities”

Now at this point we can go back to the 1938/1939 crypto world, a time when folks are able to use the méthode des bâtons to cryptanalyse certain Enigma machine wheels using rod squares; but for the rest, featuring additional countermeasures, BP staff are focused on inventing such things as depth metrics (i.e. pairwise notions) and stecker theories (i.e. more pairwise notions). The permutation group theory and its cycle-notation conventions are very much in mind, given their role in forming a catalogue of “box shapes” (conjugate classes) used when cryptanalysing the indicator function of a given Enigma graph-puzzle.

So when we see

one should look at (w, x, y) as 1 cycle (i.e. a cycle expression) or “compartment” in the “boxing” of a given (13*2 bigram) alphabet for the wheel. Think of the f not as a function of 3 arguments (w, x, y) producing a value, but as the 1 fluxion which constrains the 3 molecules in the denoted cycle. It’s the generator matrix or its parity-matrix constraint, if you will. We have taken the side of the fluxions, as it were, looking at the steam room from the perspective of the power shifts occurring along flux lines.

Our goal in cryptanalysis is to model the flux lines, denoting each by a weighting or probability. To this end we need to introduce the first calculation model, which wants to sum up the individual flux lines and express them as a final probability of the overall state space formed of an entire system of fluxions driving causality. Of course the fluxion is the bit on the Tunny wheel (or wire in the Enigma wheel). And the cycles are the n ciphertexts in depth, whose first 6 characters are bound up in a “parity expression” relationship – that is of course the infamous Polish discovery, not about parity constraints (in fact) but conjugate-class constraints, in a world of algebras defined by rod squares bound up with the symmetric (and reversible) permutation group.

The first element gave us the model of conditionality. The second version of the same thing is more group-like. It says there is archetypical structure involved, with a top row (wheel bits, recall) that is a sequence (left to right, say), a bottom row (a similar structure, modeling the sequences in cipher/text), and the fluxions’ natures (in the connectives). Or, we might say there is history (in the wheel), present (in the flux), and future (in the ciphertext sequencing). Since the purple wheel “setting” is hidden, we can think of the history (as regards the particular green line) as being hidden and as that which we seek to discover.

In this form, the diagram gives us one more thing. Looking at the edges, we can give them labels. Pairwise labels. Though our archetypes have imposed structure on the stereotypes of the ciphertext, we can always reduce the model to one of simple connectivity between nodes, all playing some steam room role as peons in a thermodynamic dance. And, in a local sense, we can always look at any one node and consider its particular neighbourhood of connectivity. Call it a clique, or call it a cliché (from the 1915 world of older cipher cryptanalysis based on pattern matching and edge detection); it doesn’t matter. We have, per Turing’s later mental model of computers as giant brains, the brain neuron, its connectivity to other neurons, and the role of DNA that pre-forms certain connectivity (a starter network whose design “primes the pump” beyond the threshold point in differential signaling that induces all further fluxes to converge).

we saw the i<j constraint in Banburismus and Turingismus… focusing on “pairwise” metrics

Having seen how the whole is a sum of its parts (where cause-inducing fluxions and causal-relating molecules have equal bit parts on the stage), enter parallelism (and tensor products)

which leads into a debate about evaluation order and its impacts on convergence:

Once all is said and done and the system has settled down, you can use your “magnetic resonance device” to slice the inner space, either looking out from a subspace or onto a particular subspace – giving you, the cryptanalyst, a video-game “controller” for the space that can give either first-person or third-person interaction with the objects in the space. Of course, in crypto, we go beyond 3d and our visualizations are somewhat different.

ok, so without ado, the overall inference world of Banburismus and Turingismus gets reduced to some plain words (and steam room metaphors, perhaps befitting Turing’s inner life).

But on this stage we never saw the weight of evidence, the ban, etc. So how does this play in?

It’s really quite easy. First, it’s 1940 and time is of the essence. So let’s reduce the calculation to only what we MOST need (when sieving and sifting “wheel breaking stories” derived from the field of received data).

Since we have finally got to scoring, we get to go one final step:

It’s fun to see how little this particular explanation focuses on conditionality, hypothesis, and odds. Of course, this is a vital element of the story. For that, the Judea Pearl material seems to do a better job of showing how to calculate with those conditional expressions that, even in binary fields, are really focused on conditional probabilities (not merely the if-then sequential logic of American computing). And, one sees how to calculate in correlation space (a cousin of convolution space), which like a modern turbocode decoder accommodates the idea that you may actually be wrong when you think you are right (or vice versa). So plan for a pairwise world, i.e. a 4-state group table (a bit like coset addition tables for quotient groups!)
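The “4-state group table” for pairwise bits can be written out directly – componentwise XOR on pairs, which has the same shape as a coset addition table for a quotient group. A small illustrative sketch:

```python
# The 4-state group table: pairs of bits under componentwise XOR
# (the Klein four-group) -- the same shape as a coset addition table.

states = [(0, 0), (0, 1), (1, 0), (1, 1)]

def add(a, b):
    """Componentwise XOR: the group operation on bit pairs."""
    return (a[0] ^ b[0], a[1] ^ b[1])

# Print the full 4x4 addition table.
for a in states:
    print(" ".join(str(add(a, b)) for b in states))

# Every element is its own inverse: a + a = (0, 0) -- being "wrong twice"
# brings you back to "right", the point made above.
assert all(add(a, a) == (0, 0) for a in states)
```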


Posted in crypto, enigma

## gene sequencing and stochastic processes

Seems useful to look at other applications, beyond comms and crypto, of sequencing theory – particularly where one is looking for underlying theories for natural sequences.

Stochastic Models – Markov Chains

limiting distributions get an airing at

http://www.scribd.com/doc/51523621/Stochastic
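A limiting distribution can be found by brute-force power iteration on a small transition matrix – the matrix here is an invented illustrative example, not one from the linked notes:

```python
# Limiting (stationary) distribution of a 2-state Markov chain,
# found by repeatedly applying the transition matrix.

def step(dist, P):
    """One chain step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Arbitrary example: state 0 is "sticky", state 1 less so.
P = [[0.9, 0.1],
     [0.5, 0.5]]

dist = [1.0, 0.0]          # start wholly in state 0
for _ in range(100):
    dist = step(dist, P)

print(dist)  # converges to the stationary distribution [5/6, 1/6]
```

Whatever the starting distribution, iteration converges to the same limit – which is what “limiting distribution” means for such a chain.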

Posted in crypto

## DES Design Theory (#14)

The notion of L2 convergence comes with a necessary element: that there is a threshold of energy distribution such that convergence is unstoppable (once you get past the threshold).

L2 is also a measure space, and in particular (for use with quantum mechanics) a space of functions. In the communication space, we see folks mapping [-T/2, T/2] –> C (or C^n), specifically. One type of function is the probability function, a representation of a non-linear function perhaps.

In the world of decoding capacity-approaching codes, we saw how the encoder interleaves the information bits with 2 streams of parity bits, with the intention of creating a particular type of state space based on a finite state machine which generates transitions in an (implied) graph fashioned by the outputs of the encoder – one in which the resulting decision regions (for a decoder) maximise, in Euclidean space, the distance of those coded points in the typical set from the decision boundary(s).

The nature of encoding is to bias the probability within the transition matrix for the state space so as to minimise the number of paths that correspond to low-weight codewords (considering the trajectories). Let’s guess that the better-stated point is to get the underlying indicator function of the graph so biased that one is over the L2 convergence threshold.
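The “few low-weight paths” point can be checked by brute force on a textbook rate-1/2 convolutional encoder (generators 7,5 octal – a standard example, nothing to do with Tunny): every nonzero input produces an output of weight at least 5, the code’s free distance.

```python
from itertools import product

# Minimal rate-1/2 convolutional encoder, generators g1 = 1+D+D^2 and
# g2 = 1+D^2 (octal 7,5), with a brute-force scan of output weights.

def encode(bits):
    """Encode the bit sequence, flushing the register with two zeros."""
    s1 = s2 = 0                    # two-stage shift register
    out = []
    for b in list(bits) + [0, 0]:  # append flush bits
        out.append(b ^ s1 ^ s2)    # g1 output
        out.append(b ^ s2)         # g2 output
        s1, s2 = b, s1
    return out

# Minimum output Hamming weight over all nonzero inputs of length <= 6:
dmin = min(sum(encode(u)) for u in product([0, 1], repeat=6) if any(u))
print(dmin)  # 5 -- the free distance of this code
```

The encoder’s state machine is exactly the biased transition structure described above: low-weight trajectories simply do not exist below the free distance.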

In order to bias the graph’s weights, one considers the design of the one-way function in DES.

Let’s assume that DES is an SPN – a substitution/permutation network. Let’s assume that the hamiltonian cycles “encoded into” the particular sbox values and the permutation matrix (leveraging the expander) are spreaders. They do the very opposite of what happens in decoding, where probability weights of, say, 2 outer-neighboring columns are coalesced into the middle term to infer the conditional probability mass function that models the non-linear field. As IBM operational theory implies, concentrations are diffused outwards, instead, until the distribution has gone beyond a probabilistic threshold that impacts L2 convergence.

One measure of the success of the process is that, indirectly, one has generated t-resiliency – which means that only a nested series of subspaces, and their nearest co-linear vector, over the codewords of weight less than d, d-1, d-2… can, by ANF constructions, act as a basis. Thus we get a series of factor products, as in Tunny theory.

Assume that the process has the “effect” that one generates the pseudo-bent and balanced features. It’s not that there is some magic constructive formula for the latter, but the generated metric space can in effect have its properties.

Let’s postulate that the expansion and dilation properties of the state space generated by the convolutional coder’s finite state machine relate to the role of the hamiltonians, which in turn play the role of the interleaver’s parity expressions and exist to generate the required trajectory weightings given a sampling of the plaintext and the one-way function of the plaintext at each round. This all relates to the criteria for perfect diffusion.

Let’s also postulate that the 2 different hamiltonians also align with the history-generator and the future-generator (and the present, which spans the two).

Well… it’s not “inconsistent” with the previous #13 theories, at least. And it holds with the “history and timeline” of the generation of knowhow… from Tunny rectangling, rotor machine generators, and rotor machine SPNs. I seem to remember an old theory contemplating the 2 main hamiltonians as probability spaces on the surface of a special kind of multi-dimensional ball, with connections between the two surface regions specified by the present connector graph.

May be worth looking at the non-linear transistor network of the Russian Fialka rotor-era machine again, to see how it may align with the notion of generating state spaces with the right form, supporting the right stochastic noise generator.

Interesting that one can thus look at the DES spreader as a noise GENERATOR, where one requires a MINIMUM amount of noise for the complexity requirements underpinning the encryption function to work. Rather counter-intuitive to how one thinks of substitution-based ciphering, if you are 12. If one then considers the shannon limit, which is built on SNR – a signal to noise RATIO – then obviously we don’t want more noise entropy than that which takes log(1 + SNR) over the shannon limit. We want just a bit less than that (1 bit’s worth of entropy, aping Tunny’s (1 + beta64)).
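The Shannon-limit quantity referred to is just the Shannon–Hartley law, C = W·log2(1 + SNR); a minimal sketch with invented example numbers:

```python
import math

# Shannon-Hartley channel capacity: C = W * log2(1 + SNR).
# Example figures are illustrative only.

def capacity(bandwidth_hz, snr_db):
    """Capacity in bits/s for a bandwidth W (Hz) and an SNR given in dB."""
    snr = 10 ** (snr_db / 10)          # dB -> linear power ratio
    return bandwidth_hz * math.log2(1 + snr)

print(capacity(1.0, 0.0))    # SNR = 1 (0 dB) -> exactly 1 bit/s per Hz
print(capacity(3000, 30.0))  # a telephone-line style estimate
```

The dimensional point stands out: capacity scales linearly in bandwidth but only logarithmically in SNR, which is why “one bit’s worth” of extra noise entropy matters so much near the limit.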

Posted in DES

## computational-mapping

Turing was as interested in biology as physics – a rather remarkable fact given his putrid English public-school education, focused on fashioning the next upper-middle class for oppressing the lowly folk of a 19th-century empire.

Yesterday, we saw a picture from physics. It was a vector space – or rather a “manifold for hosting such vector spaces” – that enables one’s ‘algorithmic thinking’ to contemplate both future and past timelines of events at the same time. This is a critical (and long-suppressed) element of cryptanalysis.

Grossly put, if a pointer into a list of real numbers inbound from a channel is addressing the “middle number”, then the ‘history’ is the sequence to the left and the ‘future’ is the sequence to the right. Of course such a perspective is relative (since we have time-translated and are able to see both the future and the history, and furthermore to use both at the same time to re-define the present).

from Wicker and wikipedia

Discussing Pearl, Metzler gives us lambda (lookup) FROM parents and pi (permutation) FROM CHILDREN (which reminds one of SPNs!)

Concerning the light cones, remember that nothing in Minkowski space is actually limited to merely 3 spatial dimensions! Concerning the “intuition” of the Pearl work, alpha is the “tunny bulge” for a Chi12 rectangle (the non-linear bias between signs, to be cast as a non-linear boolean function in an (L2) probability metric space). Then we have a product of the history (a priori information about the wrong case) and the future (the updated information about the right hypothesis). There may be several possible futures (meaning we have a permutation). Concerning history, we have a lookup from tables.

Today, we get to see another of Turing’s interests from biology. It concerns “tree branching” – which comes to cryptanalysis as search trees, with particular branching factors, that enable one to map spaces in terms of nearest neighbors.

http://www.cc.gatech.edu/~phlosoft/voronoi/

Interesting to see Sergey Brin (also of Google search business fame) having worked on local search algorithms so closely related to cryptanalysis. A very “privileged” group were even allowed into that club (which you don’t get to leave).

We got to hear Gallager talk at length on voronoi regions (leading up to differential quantization); and Forney too motivated soft decoding based on retaining the “extra-quantization-level” information available from using such masks as decision regions (that minimize the probability of error).

Posted in crypto, early computing

## Tunny 2back, Convolutional 2back

In some ways it is possible to look at the WWII German Tunny stepping machine as an (n,k,d)-style code generator for 5 different values of n. One can look at (n,k,d)’s k as the K described in the General Report. And, one can look at (n,k,d)’s d much as D is used, similarly. The analogy is not meant to be literal (stop-and-go analysis might be closer) – but to be illustrative: the purpose of stepper limiters is to create “distance”.

Since the dottage was so important to getting a bulge sufficient for optimal decoding, it’s also interesting to look at the other limitation (on stepping), X2. One sees how the analysts treated it: essentially as an additional impulse stream (with known character).

http://www.ellsbury.com/tunny/tunny-000.htm

One sees “limitations” in effect in the get-to-capacity world of convolutional encoders, including the example stream from Tunny!

Now, it’s interesting that coding practice was “held back” to linear codes, away from capacity, for years – probably as a means to prevent folks from using commodity chips to perform the same kind of cryptanalysis as folks with close-to-capacity algorithms and knowhow were using (albeit for cryptanalysis uses of coding theory, vs communications uses).

Anyways, a Wicker presentation gives some more background:

Belief Propagation

Posted in crypto

## minkowski to tunny

There is little evidence that Turing, Newman and co in 1942 thought in terms of linear coding. However, all the apparatus we use today was available to them. And, furthermore, the advanced worlds using those tools were known to them (being perhaps more interesting to the intellectual than something merely practical!). In particular, folks were interested in Riemannian manifolds, and the light cone in particular – given its value in making special relativity so amenable to mathematical treatment.

Since there are lots of fancy words above, let’s dumb it all down so we can still remember the principles, after tomorrow!

http://en.wikipedia.org/wiki/Minkowski_space

What is interesting to us is not the application shown in the wikipedia introduction. We are more interested in the pseudo-vector space that has pseudo-norms, pseudo-math properties etc. They are in the same class as the mathematically rigorous definitions, but “need to deviate a bit” to hold. Nonetheless, they tell the same story as that of the math profs who required full rigour.

We see below 4 concepts: bilinearity (which we think of as splitting a bit vector into two parts, with mutual relationships), a signature (of the splitting criteria), the n (of (n, k, d) fame), and similarly the k giving n-k (the number of non-space-like dimension(s) of the manifold).

http://en.wikipedia.org/wiki/Minkowski_space

From the norm definition (a particular projection theorem, recall, onto some spanning set), we have the interesting notion that the projection can be plus or minus 1.

http://en.wikipedia.org/wiki/Minkowski_space

Something we have not seen before (I think) is the idea that once we have an orthonormal basis, derived from the orthogonal vectors in a projection geometry for some subspace (e.g. 3 space vectors), then we have “Sylvester’s law of inertia” – giving a fixed “signature” in terms of the number of positive and then the number of negative (obviously now signed) “unit vectors”.

Now where have we heard a version of THAT before!

Posted in colossus

## union bound estimate & pairwise probability of error in Tunny

In the 1980s, the likes of Forney were basking either in the work flowing from the US NRO (satellites for spying), requiring bandwidth-limited-regime coding design, or the work flowing from NASA (probes for advanced surveillance R&D), requiring power-limited-regime coding design. After Reagan, folks became more academic – but with a strong engineering excellence. Can we interpret the “codification” into short courses of that “institutional knowhow” that took decades to condense? In particular, can we see any linkage between the 1943 Turingismus methods of totting up “signs-that-agree and signs-that-disagree” and the union bound estimate?

First some background on the world BEYOND Turingismus (and more akin to Colossus-supported breaks):

note this citation for all following quotes

See http://www.ima.umn.edu/talks/workshops/PISG6.8-26.04/forney/NDSlides.pdf

In his MIT video lecture from the same period as this 2-day course, Forney teaches engineering – how to measure the performance of a channel leveraging modulation and coding. From the world of signal constellations his teaching focuses on plotting the probability of (decoding) error against Eb/No – that 1940s measure of a “wave function” channel in terms of signal constellations, squared energy differences, etc. Plotting the log of the probability of error against the log of Eb/No (which is just SNR normalized by the spectral efficiency ρ, i.e. SNR/ρ), he teaches why one uses the log scales.

Moving from physics to pure coding theory, his notes state

Of course, here we see k/n from the signature of (n,k,d) codes – the ratio of the size in bits (k) of the information content of the packet to the length of the packet to which one attaches the checksum (n-k bits wide) – obviously combined to make (n). As the “Rate” of the code, his statement says k/n is bounded by the maximum channel capacity (for ‘reliable’ transmission only, noting that cryptanalysis need not respect such a bound). Moreover, much as we saw log2(1 + SNR) earlier when thinking in terms of the physics, here in the coding-theory world we see 1 + (x) + (y) – where x and y are the classical terms from 1890s-1930s Fisher/stats/entropy theory (applied to a symmetric crossover channel) and where x and y involve log().
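That “1 + (x) + (y)” shape is the capacity of the binary symmetric channel with crossover probability p, i.e. C = 1 + p·log2(p) + (1-p)·log2(1-p) = 1 - H(p). A minimal sketch:

```python
import math

# Capacity of the binary symmetric channel: C = 1 - H(p), written in the
# "1 + x + y" form where x and y are the two entropy terms.

def bsc_capacity(p):
    """Capacity in bits/use for crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0  # deterministic channel: no uncertainty
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(bsc_capacity(0.0))   # perfect channel: 1 bit per use
print(bsc_capacity(0.5))   # pure noise: capacity 0
print(bsc_capacity(0.11))  # roughly 0.5 -- rate-1/2 codes live near here
```

The rate bound on k/n discussed above is exactly this number: a rate-1/2 code cannot be reliable once p creeps past about 0.11.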

But we must not think of k/n in purely math terms. It’s also intimately tied to physics – or at least to harmonic topology theory. It’s the unit of bits/Hz, meaning that applying W hertz gives the (multiplied) unit rate (of information-bit transfer) in terms of the bandwidth multiplier applied to the channel.

Now, note just so far how a “repeat” rate (or sampling rate) is built into the concept set. In the American tradition, this harkens back to Nyquist sampling (etc etc) and the minimum distance (between neighboring points) required to ensure one can use Fourier estimation and L2 convergence for engineering purposes. But Forney builds a little more into his model, getting to SNRnorm and his infamous gap-to-capacity idea:

His video focuses on teaching the history of performance gains originating from the modulation world

where one sees the value of both perspectives in reducing the shannon postulate about channel capacity to a numeric quantity – based on the properties of a raw math convergence.

Independently of (1) the ‘log’ properties concerned with k, n and the probabilities of the cross-over channel, (2) the ‘log’ properties of ‘ln 2’, and (3) the log properties used in the log scales applied to each of the axes of the baseline performance graph, one has the independent “human-factors” use of a log scale formulating “decibels” (or decibans, from the GCCS era and their crypto-channels).

Now, in some throwaway remarks in his video talks, Forney notes how the performance curve and its log scales are “designed for” a world in which the “pre-constructed” math allows a couple of parameters to “shift up” or “translate left” the Pb(e) vs SNR/ρ curve – parameters that align with characteristics of the channel that are indeed under the control of the design engineer. And here we get to the “union bound” estimate.

In any case, he motivates all the discussion, bringing it to a head, by noting how certain geometries (in signal space) give a transitive (i.e. no coset!) symmetric group with a distance measure that aligns with the desire in L2 inner spaces to minimize the square of the energy (and thus for points and limits to be well-defined, distinct, and limiting – always). The norms in this geometry can thus be a representation of the hamming norm (for linear codes).

While this is all cute, we learn that we are merely in one class of such arrangements (whether thought of in hamming terms or in the equivalent Euclidean geometry):

But, we also see that for any given class of representation for the hamming model, there is a cut-point – a threshold, that is, in the SNR world above/below which convergence is guaranteed – allowing one to separate out the noise from the information when summing up sequences of contributing ratios.

Now Forney makes a leap, taking us into the special region of power-limited communications – where special cases hold, allowing the physics of wave functions to closely align with the discrete math formalisms. We see the classical hamming cube mapping onto a signal constellation geometry, linking one notion of distance to the other:

Now, when discussing Turingismus, we noted how, for an assumed set of depths and a given bit in a bit vector (1 bit of a wheel pattern inferred from an assumed chi stream), one could compute “pairwise” the 2d-measure formed of (i) the number of same-signs and (ii) the number of different-signs amongst the set of (“neighboring”) signs found in a column with a variable number of rows. In the union bound estimate rationale, one similarly computes the “pairwise probability (of error)” – one of the axes on the performance graph, of course.
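That pairwise 2d-measure is simple to sketch: count, over all i&lt;j pairs in a column, how many sign pairs agree and how many disagree. Dots/crosses are coded as 0/1 here, and the column data is invented:

```python
# The Turingismus-style 2d-measure for one column of an assumption
# table: (number of same-sign pairs, number of different-sign pairs).

def pairwise_counts(column):
    """Return (same, different) over all i < j pairs in the column."""
    same = diff = 0
    n = len(column)
    for i in range(n):
        for j in range(i + 1, n):
            if column[i] ^ column[j] == 0:  # XOR gives dot: signs agree
                same += 1
            else:
                diff += 1
    return same, diff

col = [0, 0, 1, 0]           # one column, 4 rows of "evidence"
print(pairwise_counts(col))  # (3, 3): of 6 pairs, 3 agree, 3 disagree
```

The XOR in the comparison is exactly the “core nature of xor” point: same pairs give dot, different pairs give cross.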

Much as Turing did in his fellowship dissertation (1932?), Forney discusses the erfc – the complementary error function of the Gaussian distribution. Forney uses it to estimate Pb(e) from the raw parameters:

Note two things: (i) (d) lots of k/n becomes reformulated as gamma, and (ii) one uses a new parameter called Kmin, in a ratio of that minimum to the space of k.

What is Kmin? Well, its name says it all. It’s the “minimum weight” – that “expression of” the inherent threshold which ensures noise/signal can be separated out. Were we not discussing t-resiliency at length, recently?
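Putting (i) and (ii) together gives the Forney-style union bound estimate, Pb(e) ≈ (Kmin/k)·Q(√(2·γc·Eb/N0)) with γc = d·k/n the nominal coding gain – a hedged sketch with invented example parameters, using Q(x) = erfc(x/√2)/2:

```python
import math

# Union bound estimate of bit error probability:
#   Pb(e) ~= (Kmin / k) * Q( sqrt(2 * gamma_c * Eb/N0) ),
# where gamma_c = d * k/n. Example codes below are toys, not Tunny's.

def Q(x):
    """Gaussian tail probability via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def ube(n, k, d, kmin, ebno_db):
    ebno = 10 ** (ebno_db / 10)
    gamma_c = d * k / n            # nominal coding gain
    return (kmin / k) * Q(math.sqrt(2 * gamma_c * ebno))

# Uncoded BPSK (n=k=d=Kmin=1) vs a toy (8,4,4) code, both at 6 dB:
print(ube(1, 1, 1, 1, 6.0))
print(ube(8, 4, 4, 4, 6.0))  # the coded curve sits well below uncoded
```

The Kmin/k factor only shifts the curve vertically; it is γc that “translates left” – the two knobs Forney’s throwaway remarks allude to.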

Forney goes on to complete the point:

Of course in Tunny land gamma may be 1, meaning their test for Chi-ness (which implies that they got the peephole in K right) is entirely a function of Kmin/k – and whether a given sampling does or does not give an “effective” coding-gain number that falls within certain bounds that characterize a Chi stream (with lots of specifically PSI-induced noise, defining the custom channel and its parameters).

Now, what is interesting is that, to be time-effective, one doesn’t wish to work with the UBE directly. One wishes to work with the “indicator” Kmin/k, whose own character is a good-enough indication for the desired detector – merely detecting the “smell of Chi” in the specific presence of PSI-defined noise, recall.

For this one wants the Pb(e) from each point in the signal set to be lumped with all the others, “pairwise”, to find an “average”. In the Tunny case, each sign is a signal in a signal set, in a channel 1 bit wide, and thus we can do a pairwise comparison – counting up the agreements and disagreements. The ratio falling in the right bounds (on the average, for ALL bits in the assumption table, for ALL tables in the table set) indicates the marker: the effective coding gain due to having correctly chosen the right peephole in the PSI. Since there is a confidence limit in this estimation process, one now does it again for a higher-tolerance bound, based on detecting the signal’s character when an inference based on an inference showcases the desired marker.

Concerning a little Tunny history “probably related” to this topic (but with different application), see

More modern material at

Pairwise Probability of Error for Differential Space-Time

Posted in coding theory, colossus

## modern crypto for kids

Take 2 knobs and twist them. Imagine one is controlling the vertical position of the pen, and the other the horizontal position. You did this with your etch-a-sketch when you were 3 years old, and many interesting designs resulted!

Now learn to control the knobs so you draw a circle. (It’s harder to do than it sounds.)

Go one step further in adding knob-control skills and learn to add a twist so that the former-circle turns out to be the figure 8.

Now transfer the 8 to a piece of paper, and fold it over along the short axis of the 8. Bring the fold back to 90 degrees. You have an 8, containing a twist recall, with a bend such that half of the 8-figure is now upright (and the other half is not).

Now shrink the upright section (and I had such a contraption when I was 10), so the length of the line traced around it is in some geometric proportion to the length of the line traced around the non-upright part of the 8-curve. If you imagine each half to be the apparent edge of a (3d) balloon in 2d, let some air out of the one allowed to rise vertically!!

Put a piece of paper behind your balloons, and spray paint it (with the balloons in the way …), holding the gun at 90° to the paper always.

For the adult teaching the 3 year olds:

In the classical (n,k,d) code, there is redundancy (which defines the code rate, too).

In the non-linear chaotic world, flow lines in orthogonal spaces are embedded in larger phase spaces, redundantly, but with a condition: much as in (n,k,d) one has mutual relationships such as linear-independence constraints, so in the chaotic space one has a “contrarian” rule; similarly, the subspace must be any one of several spaces all of which are equivalent (in the same equivalence class) in the L2 inner space represented by fourier series – under the assumptions of measurability and the requirement of countably infinite sets (and functions).

The redundancy concept that is semantically equivalent to that from (n,k,d) is aligned with the size of the equivalence class (which Gallager tried to deflect, probably threatened if he didn’t). The constraints that ensure decodability, and also the fourier-transformability of L2 zero-measurable functions quantized using fourier series – or quantized fourier transforms of a series of terms from n random variables modeling the analog source as a conditional PMF – of course depend on …

http://www.scribd.com/doc/21374148/An-Introduction-to-Control-Systems

and then

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

Posted in crypto

## imaginary portals and third spaces

In Gallager’s MIT Communications course (notes), he does a good job of building a comprehensive engineering-math model for communications. In particular, he uses advanced math as an engineer should, both for modeling and for calculating solutions. In so doing, he teaches design method (vs a textbook curriculum).

Carlet also teaches, like Gallager building up from a space of L2 functions (that are integrable and measurable) into vectors – which abstract into the world of engineering-math even more than such a space has already abstracted from pure math. Let’s turn off the math-brain, and turn up two complementary engineering knobs, looking for the sweet spot.

also citation for further quotes, below.

Look at the formalism denoted blue-1 as an inner product, and blue-2 as an outer product. The conditions that each blue-x relates are condition 1 and condition 2, and they form a strange loop, with only 1 holding at a time. One contracts, the other expands.

If we think intuitionistically (it’s a holiday!), the inner product condition says that we have a generating relation, at the DNA level. Any offspring, produced by projection, must continue to project onto a subspace bearing the same constraint as the space in which the offspring’s parents are operating. The condition is essentially that we are in hyperspace – unmapped by *any* relative-coordinate system. If we jump in hyperspace (having jumped there from normal space), we are still in hyperspace. In orthogonality terms, we are in a space specifically not-described-by orthogonality – but which is also not not-orthogonal. We are in a third space (complex phase space), that is. We are measured by sign-? (to use a Tunny-ism). If we shape third space suitably, it doesn’t have to impact sign-dot and sign-cross too much. We can control the properties of the unconditionality.

Similarly, if we consider now the outer product, we look at its combinatorial properties as creating an interface between two parallel spaces, whose portal requires that one space has walsh-function orthonormality while the other is third space.

Thus the strange loop means that our ship is actually IN THE PORTAL: the portal needs its exit spaces to have the properties they do, the properties of said exit spaces are precisely those which create the portal, and a jump within the portal cannot exceed the limits of the portal – which is not to say you cannot enter and leave.

It’s within this improbable-duality portal that we have a world of limits based on pure order. And given order, we have direction, and thus asymmetry.

To enter the portal we have to transport up into the higher realm, but we must also only travel in a particular direction, around a cycle. While we can travel around in either direction, cyclically, the properties are not the same – for the field is not symmetric. Any contra-indicating motion creates a tiny oscillator, which puts the two motions ever so slightly out of phase, giving us a non-commutative algebra and thus the imaginary world of portals!

So, ultimately, we are coding the information of the plaintext into the coefficients of all the tiny oscillators, held in a stationary field governed by two parameters guarding each entry/exit point: the key and the plaintext/ciphertext.

Posted in coding theory

## Turingismus looked at as digital signature verification

At 20 I learned of the principle that from an improper random sequence (generated by an unsuitably simplistic LFSR) one could compute the seed value and the feedback coefficients. We can see the history of this technique in the attack on Tunny. For twice in Tunny history folks were faced with the need to compute the Tunny machine’s wheels’ patterns: several times a year at first, and then daily towards the end of the attack. Different techniques were used in each era. We can thus ask: did the core method used really change? Has the core method even changed much since, given 80 years of further analysis?
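The seed-and-coefficients principle is nowadays formalized as the Berlekamp–Massey algorithm (a later formalization, not necessarily the method I met at 20); a sketch over GF(2) with an invented toy register:

```python
# Recovering an LFSR's length and feedback taps from its output stream
# via Berlekamp-Massey over GF(2). The example register is invented.

def berlekamp_massey(bits):
    """Return (L, C): L is the shortest LFSR length reproducing `bits`,
    C = [1, c1, ..., cL] its connection polynomial, meaning
    s[i] = c1*s[i-1] ^ ... ^ cL*s[i-L]."""
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # polynomial before the last length change
    L, m = 0, 1         # current length; shift since last change
    for i in range(n):
        # discrepancy between the stream and the register's prediction
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d == 0:
            m += 1
        elif 2 * L <= i:
            t = c[:]
            for j in range(n + 1 - m):
                c[j + m] ^= b[j]
            L, b, m = i + 1 - L, t, 1
        else:
            for j in range(n + 1 - m):
                c[j + m] ^= b[j]
            m += 1
    return L, c[:L + 1]

# Stream from the register s[i] = s[i-1] ^ s[i-3], seeded 1,0,0:
stream = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
print(berlekamp_massey(stream))  # (3, [1, 1, 0, 1]): length 3, taps 1 and 3
```

Given the recovered taps and the first L bits (the seed), the whole stream regenerates – exactly the “improper random sequence” failure mode.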

In the era of Turingismus, folks took the core intellectual methods learned from Banburismus and applied them to wheel breaking. Whereas in the attack on the naval enigma indicator-protection countermeasures (the bigram tables and wheel order changes) folks invented new decoding-style methods to tease out the bigram tables by predictive-learning methods that tuned into the underlying conditional probabilities of the source channel, by the time folks were attacking Tunny wheels they had available to them the by-then established and solid theory of “weights of evidence” that underpins even modern decoding.

The method of Turingismus looks at the length of key and samples it 6 different ways. The 5 individual F2 impulse streams making up a character are each sampled at their natural period: the number of bits (wi) on that (i'th) wheel. However, the F32 characters as a whole are also sampled, each as 5 (parallel) bits of a key character – where key characters are, remember, elements of a long stream formed from the addition of the characters generated by the chi and extended psi wheels, as they step.

When looking at a particular 1 of 5 impulse streams, one forms an “assumption table” – the stylized organization of “most likelihood”. If wi for i=1 is 41, look at the next 41 bits from the key stream as a digital signature on that wheel’s bit pattern. Form a table by putting the next signature (from the next group of 41 bits) on the next row, thereby forming up a matrix whose number of rows is obviously a simple function of the key/ciphertext/plaintext length divided by 41 – assuming no padding or deception-countermeasures. To any communication engineer (trained in the MIT math tradition) this should remind one of something (where a particular wheel is an F2 frequency band giving support to an F32-wide baseband, and the wheel period is the sampling bandwidth of each band).
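The cage of rows described above is just the key stream folded at the assumed wheel period; a toy sketch of the fold (names mine, and real cage-making had to cope with padding and dud characters that this ignores):

```python
def assumption_table(stream, period):
    """Fold a bit stream at an assumed wheel period: row r holds bits
    [r*period, (r+1)*period). A short final row is dropped, so the
    row count is len(stream) // period. Each column then collects
    every key bit that one cam position of the wheel contributed."""
    rows = len(stream) // period
    return [stream[r * period:(r + 1) * period] for r in range(rows)]
```

For a wheel of period 41 one would call `assumption_table(key_bits, 41)` and read the evidence column by column.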

Folks are initially attempting to figure the Chi signal and treat the Psi as noise – to be eliminated. Given the infamous properties of the Tunny machine stepping action, folks sample the key stream at an assumed position where key is chi (with zero noise from extended Psi). Since the bits forming this character of Chi are each repeated in their impulse stream at the 5 associated wheel periods, one copies the bit at its repeating position. A little statistical learning logic (from the particular machine’s (non-linear) mechanisms) then allows one to fill out the remaining 4 bits of the F32 char anywhere in the stream where 1 F2 bit was inserted. Filled out all 4 as 1 or as 0 (depending on correlation rules with the key stream), one then treated said derived-from-assumed-Psi-non-stepping character as if it too were a correctly decoded Chi F32 character, to obviously also be broken up into 5 F2 characters each on their own period. Thus one could form up another set of 5 assumption tables … as the various bits are copied into the sub-streams of the even-more-assumed Chi stream.

Now the really cute part comes next; probably inherited from Banburismus theory. We can look at those 2 sets of assumption tables not as typically used (to guide inferences from cribbing) but as a digital signature that may or MAY NOT satisfy the rules of “good digital signature design”. This is a modern conception (and is not how folks thought, I suspect)! However, it comes to the same thing. For each table in each table-set, folks count in each column the weight profile (or perhaps, since it’s all non-linear, the “distance” profile): how many times there is agreement or disagreement amongst the signs of the (n) rows; normalizing an n-dimensional vector thereby to a standard measure on 2 dimensions by pairwise comparison, note.
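That pairwise agree/disagree column count might be sketched as follows (rows as 0/1 bits; reading the “pairwise comparison” reduction literally is my interpretation of the text):

```python
from itertools import combinations

def column_scores(table):
    """For each column of an assumption table, count pairwise agreements
    minus disagreements among the rows' bits. Strongly positive means
    the column's evidence mostly matches; strongly negative, mostly
    differs; near zero, no usable evidence."""
    scores = []
    for col in zip(*table):
        agree = sum(1 for a, b in combinations(col, 2) if a == b)
        disagree = sum(1 for a, b in combinations(col, 2) if a != b)
        scores.append(agree - disagree)
    return scores
```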

Totting up these measure values for all columns, and for all assumption tables, for each assumed-Psi-as-no-noise assumption (upon which the conditional probability graph-as-matrix depends), one computes a real number using what is a custom inner-product rule for a non-linear phase space. If the value falls within a region demarcated above and below (within and without?) by a couple of (radial) distance markers, the tables have shown via their additive inner-correlations measure that they align with the measure from a Chi with a no-noise Psi contribution. That is… the initial guess at the no-noise position was right – if the correlation measure reduced to its kernel is “on the line”. (If not, start all over again, and guess another place.) Furthermore, if this bound holds for both table-sets, one gets the level of confidence needed – that this is not all due to the one case in a billion-billion that *could* be randomly instigated. One has driven the experiment design and its associated “improbability” out into the region defined by the tails of the normal curves characterizing the confidence of any and all “statistical” inferences – which is where you want your solutions to lie.

The rest of the story is well known. One wishes to know the plaintext (the wheel pattern) behind the wheel-signature (represented as a matrix, now); one can think of RSA signature methods here by analogy (where the public key is retained as a secret). First realize that each column of the matrix is a decision region. If the evidence from the weight distribution is at least 2 bits from the boundary (and it doesn’t matter with which sign), treat THIS level of certainty as that which allows a deduction: the sign of the weight IS the bit of the plaintext (i.e. the wheel pattern for that bit position). Done 41 times for 41 columns, one is left with a bit vector for the Chi wheel pattern each of whose bits has one of three values (note): sign-dot, sign-cross, or sign-? (uncertain). An integrity test comes into play too: since only certain patterns of signed bits (in what is actually a delta-wheel-i pattern) would “integrate” and still satisfy the known constraints on (non-delta) pattern formation. Enough non-? bits can still identify an invalid (delta) pattern, note, allowing for dismissal of the run.
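The three-valued read-off – sign-dot, sign-cross, sign-? – is in effect an erasure decision on each column’s evidence; a sketch, with the 2-unit margin taken from the text:

```python
def read_wheel_pattern(scores, margin=2):
    """Map each column score to '.', 'x' or '?': commit to a bit only
    when the evidence sits at least `margin` units from the decision
    boundary; otherwise record an erasure ('?')."""
    return ['.' if s >= margin else 'x' if s <= -margin else '?'
            for s in scores]
```

A run whose committed bits already violate the known (delta-)pattern constraints can then be thrown out, per the integrity test above.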

sign-? is the really big breakthrough. Now we take the known key stream (gleaned from depths, recall) and partly remove the chi-stream (whose sub-stream – for wheel 41 – can now be computed by simulation of the real machine – albeit inaccurately – from the incomplete wheel), thus removing just a bit of accumulated noise from this “improving” de-chi.

Now repeat! For the next wheel (i=4), and its de-noised de-chi, given its embryonic wheel, inferred from the correlation constraints. And repeat too for the next wheel, till finally one gets around to doing the initial wheel again (taking advantage of the much-improved (i.e. noise-filtered) Chi now available).

With a good Chi stream, one has a rather better de-chi stream, which gives a reasonable extended-Psi stream by subtraction from the common key, from which the removal of the extensions gives a Psi stream… from which… one can do all the above again (now for the Psi wheel patterns). Or… there are Testery methods available, not well disclosed even today – relying on cribbing-centric use of assumption tables.

So, as we finish up, let’s focus on what is of interest to us today. And that is the way in which a custom inner product was defined, creating a real fractional number whose digits encode the dimensions (.xyz for dimension #1, .123xyz for dimension #2…), using measure theory for nested intervals and L2 functions in general.

Posted in colossus

## System Center 2012 SP1

One of the things I’ve been waiting for was the cloud-centric System Center suite to be made available both to run on the Windows Server 2012 platform and to manage such platforms! Released late December 2012, SP1 of the System Center components seems to achieve that goal.

So HOW HARD is it now to have SCOM manage my Azure images, web roles and VMs running 2012? And HOW HARD is it now to migrate VMs from Azure to my private cloud, using App Manager (which I think of as an add-on to VMM)?

These are the questions we pose and use to motivate some basic experiments.

So, we install on a new Windows 2012 Server Enterprise edition VM (with 4G of RAM) SQL Server 2012 (not SP1, note), and the SP1 edition of System Center VMM 2012. The first improvement noted is that we can use the file server of Windows Server 2012 itself as a “managed storage device” (in addition to acting as a traditional file server):

On the fileserver, make a virtual disk and expose it as iSCSI target to the VM hosting the VMM server:

On the VMM server, create a storage pool in the “primordial storage” – having mounted the target via iSCSI client, having made the target online in disk manager, and having initialized the disk.

Having created a virtual disk within the pool and then a volume, we expose an SMB share tuned up for hosting VM images in Hyper-V.

In the VMM console, we can add our VM hosting the SMB share as a “storage array”

giving us a “managed” file share (alongside a local share for blob storage).

Next we try to install System Center’s App Manager – which gives us a tuned-up view of private clouds. It’s interesting to note what it installs – giving a guarded API (i.e. PowerShell commands backed by web services) for management of the private clouds.

We don’t get very far, as the installed server will not cooperate with the developer edition (we presume) of SQL Server 2012 or 2012 SP1! Honestly, Microsoft! It’s listed as supported with SQL Server (but not the developer edition!?!)

Posted in system center

## Multi-permutations and factor chains

A rather long time ago, we wrote about Latin squares in indicator-style key agreement:

https://yorkporc.wordpress.com/2011/06/05/more-on-latin-squares-moving-beyond-bigrams/

Built on the foundations of good old Vigenère (square-based) transcoding, it’s interesting to see variants of the problem applied to key management, too.

It’s also interesting to see the underlying math lying at the heart of even modern arguments in the domain of crypto:

(citation for further quotes)

On the need for multipermutations- Cryptanalysis of MD4 and SAFER

Building the model and getting used to the formalism, we see at the baseline one:

A different slice on orthogonality is captured next focusing on completeness and “zero-measurability”, contrasting with the usual geometric perpendicular ideas:

The motivating concept for multi-permutation was built around ignoring certain components of the vector, with no impact. This all gets nicely formalized next, leaving behind the operational concept:

We start to see an intuitive rationale for the compression function of ciphering – as being that which produces the required collisions (or rather the space of non-collisions) IN ORDER to construct the orthogonality conditions one is after.

Now we also saw some of this in the advanced material of the Colossus/Tunny wheel breaking, concerning accurate (vs crude) convergence, post-rectangling. There the issue was ensuring that, for sub-fields of the two wheel patterns, one would build a bi-linear constraint set in which each individual constraint constituted a particular factor, n of which (one for each combination of columns) create a factor chain leading to a unique scoring system.

Now that revisionistic view of 1940s thinking reminds us to focus on an earlier 1930s idea – the math version of the discreteness of energy quanta in 1930s physics:

We see in the last phrase what we have seen before: that what one wants is a world in which the cryptanalyst is inherently denied parameters that are outside the system. This is the trapdoor idea, of course.

Struggling with this paper shows me where I need to concentrate in year 4 of a 5-year crypto course! But at least I can see how the history develops to this point. One is seeking to construct a unique scoring system, where the unit is so small that it takes more than brute force to distinguish paths that are indistinguishable within that space.

Posted in crypto

## hypergeometric sampling, Prof’s book for breaking naval enigma

Polya’s Urn:

http://www.math.uah.edu/stat/urn/Polya.html

Sequences of random variables

http://www.math.uah.edu/stat/bernoulli/BetaBernoulli.html

conjugate sets of distributions

http://www.math.uah.edu/stat/point/Bayes.html

Posted in coding theory

## Turing-ismus–and the formation of Turing

Turingismus can be explained in more modern terms – now we understand what 1930s mathematicians were thinking about. We can also explain it in a manner that contrasts with how most folks performing the procedure thought at the time. Time has evolved our concept set so we can now peer into the minds of the genius class who designed the method and reduced it to procedures that any monkey could follow. Now we can follow their train of thought and peer into their intuitionistic reasoning – using modern concepts for what were early notions of sum/product decoding.

If you have taught yourself the properties of the central limit theorem, you have grasped the idea that inner product spaces can be probability measures. The inner product can be a projection of one probability mass function onto another, in some sense. Similarly, if you have been taught real analysis, interval theory, L1 & L2 spaces and multi-dimensional polar coordinate systems, you can see such a measure, being an interval or contained within an outer interval, as a component part (or collection of such components) of a vector. If the vector is a point, one first represents the point on a particular scale in a highly truncated region of the line as (sinh, cosh), and one might assign a measure between 2 such vectors in terms of the angle between them. The notion of angle may take numeric form only as a point that wanders along a hyperbolic curve.

If you are trained in permutation and substitution group theory, you might be found looking for a magical generator set, whose content when combined with the nature of a particular “action” algebra gives you a type of programming apparatus: with inputs, outputs, and generic programs. One such apparatus might be the 1890s permutation algebra(s) whose rules were themselves permutation expressions. Such an algebra might act to alter the formal specification of a permutation given as input, to output a modified permutation specification, consistent with the transformation rules defined for the algebra. Since the rules of such an algebra are themselves merely a set of permutation expressions, as is the input, one has programs acting on programs.

Now permutations have a very obvious feature – when represented in cycle notation. They have cycles of various lengths (and “characteristic” invariants computable from such lengths), not dissimilar to the kind of (Huffman source-encoding) lengths that prefix-free codes might have, computed using an algorithm founded on measure and interval theory – and their extension to sequences of points represented as intervals whose sequence (of intervals) may converge to “condensation points”. This leverages a kind of “limiting distribution” that looks at any particular instance of differentiation as a kind of distance measure, taken in one of possibly many directions (one per dimension). There can be seen to be an algebra of differencing, and thus an algebra of such algebras. As with permutation-centered “programming types” (functional calculi) working with cycle lengths and conjugate classes thereof, differentiating “programming types” may be working with difference sets (and… weight classes).

Now permutation groups and difference groups share something. When using a representation basis of diagonal matrices one can consider the nature of each row. Perhaps it has even parity, or an even number of cycles of a particular length. Perhaps it expresses in essence the same conditions as a Walsh or Hadamard matrix. And for each, perhaps there is a kernel idea that enforces the rules of such matrix (row) formation – a kind of global defining-constraint that bounds the evolution of expressions and meta-expressions alike. For permutations, all expressions, when alternatively represented in their univariate form, may have power terms that are required to sum to zero. Or, for difference sets, the motion of the difference measure is constrained to lie on a particular hyperbolic arc tracing over a very limited set of “kernel coordinates”, defined in terms of points (sinh, cosh).

The kernel of a permutation algebra, a Hadamard transform or the measure of a difference set all share the property of coding the meta-rules that transform rules that are themselves possibly further meta-rules – in a manner that is consistent and decidable.

Getting back to Turingismus: folks used the peep-holes of delta-Psi to see both factors contributing to a key byte, given the stepping action of the Tunny machine. But more than that, they used conditional expressions and aligned linear-programming techniques, augmented with Colossus-based counting when helpful, to define decision regions in probability spaces – over which might be drawn a myriad of different regions. The trick was to reduce things to a single dimension through projection techniques, in order that a single decision threshold can define a binary split between one region and the other. Since this loses so much information – the lack of which may be critical to overcoming the noise of any random Gaussian process and its erfc() function, the like of which one studied when looking at the central limit theorem and its relationship to experiment design for determining causality – one needs to go back and use what little one has reduced to its kernel form to improve the data, to which one might now look again with marginally less error-signal component – filtering at each stage to produce a cascade of multi-dimensional decision regions, one per (projected-onto) dimension/subspace.

A final analogy of Turingismus with modern math concepts comes from the very notion that the kernel of a cipherstream for Tunny is… of course… the bit patterns on each of the wheels. Turingismus treats each wheel independently, sometimes treating the field as F2**1 and sometimes as F2**5, as the kernel-constraints of probability bulges demanded, as the conditional expressions used to compute the final parameters of a conditional probability distribution ranged over impulses, combinations of impulses, and combinations of impulses in different K, Chi or Psi streams.

The very core of what we now know of as turbocode decoding is the computation of the conditional probability distribution of the channel – an iterative process that has as its side-effect the decoding of coded values. Any such distribution can be seen as the condensate of a sequence of “probability intervals” – the component PMFs of each Tunny wheel, of course. It’s also a general form of a particular condensate of a stochastic process formed of n independent random variables – the typical set (in American theory). It’s not hard then to turn this around and conceive of machines, permutation stream generators, that when programmed by a kernel in the form of a rotor upright might act as an LDPC encoder, producing a converged distribution – one that appears uniform, even.

Posted in early computing

## hyperbolic cosines in 1940, and log log scales

http://www.sliderules.info/a-to-z/hyperbolic.htm

Posted in early computing

## hamming weight increase, reliability decrease. BAC, Colossus and Sigaba

The particular Colossus electronics had a bug/feature in its amplifier designs that made it unreliable when presented with long sequences of all crosses. Thus, folks used conditional expressions (a probability bulge, Fourier-transformed or not, of an expression that defined a subspace or “region” of decidability) to count long sequences of dots instead. By duality, constraint expressions can be transformed from one bit-labeling perspective to the other, making the “bug” of Colossus into a feature – since it happened to force an asymmetric-coding understanding.
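The duality trick – counting dot-runs rather than the cross-runs that upset the amplifiers – is just run counting on the complemented stream; a trivial sketch (names mine):

```python
def longest_run(bits, symbol):
    """Length of the longest unbroken run of `symbol` in the stream."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b == symbol else 0
        best = max(best, cur)
    return best

def flip(bits):
    """Swap the dot/cross labeling (0 <-> 1): a cross-run statistic on
    the original stream is a dot-run statistic on the flipped one."""
    return [1 - b for b in bits]
```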

The world of binary asymmetric channel analysis also concerns itself with the reliability of particular transitions between two binary digits (or the particular direction of the shifts one may perform in the q-ary generalization).

(covers all further quotes sans citation)

In particular one can look at the relationship between valid codewords and those non-valid codewords with (correctable) errors:

This is a very intuitive use of the B(x) convention – since now it’s about reliability: the ball covers all the reliable codewords. (We also remember B as the generators used to formulate cosets, given the action of a particular modular group, too.) But also note how we are relating bit patterns – sub-patterns rather – of input vectors to their output forms, as constrained by the weight of the input. We get to see clearly how the cross-over probability parameter AND the probability-of-tolerable-decoding-error parameter combine to form an effective model; intuitively addressing something as basic as a Colossus-era (electronics-based) reliability issue concerning handling crosses (vs dots) due to early-amplifier design properties!
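For concreteness, the size of a ball B(x, t) in the symmetric case, and a plausible asymmetric variant (my own illustrative reading, where only one direction of flip is allowed, as in a Z-channel, so the ball depends on the word’s weight), can be computed as:

```python
from math import comb

def hamming_ball_size(n, t):
    """Number of length-n binary words within Hamming distance t of a
    fixed word: sum_{i<=t} C(n, i). Independent of the center word."""
    return sum(comb(n, i) for i in range(t + 1))

def asymmetric_ball_size(weight, t):
    """If only 1->0 flips can occur, the reachable words around a word
    of the given weight number sum_{i<=t} C(weight, i) - the ball is
    constrained by the input's weight, as the quoted model stresses."""
    return sum(comb(weight, i) for i in range(t + 1))
```

E.g. for the (7,4) Hamming code, each of the 16 codewords owns a radius-1 ball of 8 words, exactly tiling all 128 words.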

So, we see how B is a partition function applied here to denote those several error words all being mapped to only 1 true (corrected) codeword. The reason for the non-intersection is intuitively obvious – in the BAC world. More widely, we should recall the relationship between the indicator/membership/constraint-expression, the related property regarding weight (1s…) being nth-order correlation-immune, the concept of perfect diffusion, and the work addressed in a topic treated earlier: resilience to at most t asymmetric flips. We can see this as a special case of what follows.

But there is more to recall concerning the model of what flips in the input do to the output – relating the number of such flips to minimum distance, WHEN input and output are concatenated and treated as a codeword of an (n,m) boolean function.

Now we also saw, concerning BAC probability models (which have two cross-over probability parameters), how the amplitudes of the cross-over probabilities are mutually constrained, inherently, by “conditional expressions”. And, furthermore, we saw that if we have the constraint models the wrong way around, one can flip the bit model (making all crosses dots, and vice versa, for input and/or output bitvectors) so that the invariant defining the decision regions now satisfies the desired convention. Of course, this idea was built into the Tunny break method, with folks accepting they were working “inside-out” (not that it matters, providing one is consistent) until applying a last step, if necessary, to do some final flipping.

If we summarize the toolkit even back in Colossus days: folks had weight classes; they had conditional expressions in computational graph models that configured a decoding/search machine; they had differential transforms in a transform space bounded by perfect diffusion; they had the sign basis (which is just the theta=pi/4 case of the circular polarity basis); they had an unreliable machine that forced them to understand its own inner resistance (like the battery of an elementary circuit); they had their very own BAC channel that composed with the channels they were interested in decoding; they had “indicator” graphs (the settings!) with Fourier coefficients that could be seen to be broken out as the Fourier coefficients found in ciphertext depths (induced from rectangling), etc.

If you look at the American side of the house in the same period, they had figured Sigaba’s use of substitution and permutation networks as a (parameterizable) “conditional-PMF”-making machine, with control rotors selecting certain bands of frequency, and the permutation then mixing up said bands – as a function of the permutations of bands induced by the indexing rotors – to formulate a second-stage histogram (the conditional PMF) bounded by certain constraints imposed by the generator – the fixed wiring.

Posted in coding theory, colossus

## segment, search and cryptanalysis–Tunny to AES

The following sounds rather like the process used in the Tunny setting (when folks first used a particular conditional probability for a pseudo-channel to figure the Chi1 and Chi2 indicator values, then a second conditional probability for Chi4 and Chi5, similarly). In the 4/5 case, folks were using a second-stage decoder that depends on having found the decision region using the first stage addressing Chi1 and Chi2.

As the General Report on Tunny made perfectly clear, the limited speed of their calculator (Colossus) when set to do searching meant one had to break down the problem into time-computable problem sets.

https://citp.princeton.edu/research/memory/

What is interesting about the more modern language is the clarity of the phrase “order of likelihood”. In at least the academic models portrayed in the Tunny report, the channel parameters are pre-computed for a setting run (the estimated sigma from the length etc., given pdfs that approximate normal) such that the most likely candidate stands out beyond much doubt.

Posted in colossus, crypto

## binary asymmetric channels

I’m not sure I’ve ever studied the “binary asymmetric channel”. So let’s briefly review the paper quoted and cited above. I’m mostly interested in the model (rather than their coding solution) and its relation to standard decision models for decoding (MAP & ML).

We see how the equations constrain the decision region, clearly:

For me what’s most interesting at this point is (a) the use of the I() functions, for which the argument is a post-Colossus “expression” (or run)

and (b) that, much as in differential trials analysis, the BAC can be concerned with Hamming weights (and one sees how each of d11 through d01 is the analogue of the 17/16/18, above)

We recall from Gallager, the topic of pairwise contrasts (for the n-dimensional vector, each with q symbols)

And we see the basis of the Princeton paper: https://citp.princeton.edu/research/memory/

Posted in coding theory

## Feistel ladders, probability of error, and error correction in DES subkey memory

Let’s say we have a bit vector and we apply it to a function that considers the bit values in each component. The result is an output bit vector with slightly more zeroes than the input. More precisely, the value of any component from the left half “PLUS a bit or two” is complemented/shifted, and those of value zero (i.e. whose input is 1 in a binary field, or q-1 in a q-ary field) are duplicated over to the complementary position (now complemented/shifted) on the right side – providing it doesn’t overlap with the “PLUS a bit or two” region. If it does, just drop it.

We sample the left side that is, and use its measure to bias the bits found on the (slightly smaller) right side.

Now the key observation is that the left and right sides are only metaphors for 2 normal curves, each a conditional probability, centered around the antipodal values of a signal set, i.e. {–a, a} for the 2-ary world. And the half-PLUS-A-BIT idea represents the threshold between the poles, a function of the ratio of the likelihoods of the two PMFs (which reflects the ratio of the priors). Since the two curves overlap, the threshold being slightly positive means that the area Al under the curve for –a that is ALSO under the curve for +a AND is to the right of the threshold is smaller than its pair. In classical communications theory (i.e. Gallager), it’s the probability of error, of course.

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.
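The overlap area just described – the probability of error for antipodal signals {–a, +a} decided against a (possibly shifted) threshold – is a Q-function computation; a sketch in roughly Gallager’s setup (function names mine):

```python
from math import erfc, sqrt

def q_func(x):
    """Gaussian tail: Q(x) = P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2))

def error_prob(a, sigma, threshold=0.0):
    """Average error probability for equiprobable antipodal signals
    {-a, +a} in N(0, sigma^2) noise, decided against `threshold`:
    half the chance -a lands above it plus half the chance +a
    lands below it."""
    p_given_minus = q_func((threshold + a) / sigma)  # -a read as +a
    p_given_plus = q_func((a - threshold) / sigma)   # +a read as -a
    return 0.5 * (p_given_minus + p_given_plus)
```

With the threshold at 0 (equal priors) this reduces to Q(a/sigma); shifting the threshold shrinks one error area at the cost of the other, exactly the asymmetry the paragraph plays with.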

Now move the threshold so the area Al becomes small – really small (1 bit’s worth of entropy, in fact) – and then repeat the experiment to populate extra zeros – inducing a certain type of dependency – on the right-hand side of the threshold (the positive pole). The bias is a function of the probability of error, which controls the dropping rate, acting as a constraint or “catalyst” agent. If it helps to think about semiconductors, do!

Let’s not forget that we now swap over left and right halves and start again – in a Feistel ladder.

A nice usage of this model is then in cryptanalysis (if done properly):

https://citp.princeton.edu/research/memory/

via cryptome.org originally (posted while he was in a rare non-vainglorious mood)

Notice how the model is cumulative, using the multiple copies of the key bits stored in a persisted key schedule to allow easy accumulation of the evidence(s) per original bit. At first blush, we are really in a world of

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

But, since in the particular model (of multiple copies of the original key bit), one might better exploit multi-dimensional forms of the same thing, where each hypothesis (of M) codes one of the copies of the bit from the key schedule stored in RAM

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Gallager reinforces the use of a geometric inner product space (albeit with the restriction to only 2 dimensions, to start with), indicating that the decision problem is fundamentally a projection problem – one that easily generalizes to n-ary vectors – using an argument that REALLY ought to appeal to a classically-educated architect (such as my co-worker).

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

Now (shush), don’t we recall the cyclic nature of the extended RM* codes?

Anyway, correct standard-model theory concludes with a call to simply work in the log-likelihood plane:

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.

The model of multi-hypothesis MAP/ML is really not applicable to the Princeton group’s cryptanalysis of DES subkeys stored in RAM cells (since the 14 replicated bits acting as 14 hypotheses/decisions are not mutually exclusive, as is assumed in detection theory trying to pick 1 from 14 alternatives). The model that fits much better is, of course, the Tunny wheel-breaking procedure and the “rectangling” process used specifically to get a “start” for the Tunny break. Having accumulated a combined weight from the contribution made by each period for each position, there one had a “repeat rate” – albeit one corrupted by a noise signal – and of course from the multiple pieces of summable evidence one could infer some of the key bits (“get a start”) and thus correct the rest of the errors by less than brute-force search.

We should also remember that in the Colossus-era version of MAP, evidence from a detected bias in a particular trial (the posterior for 0 is greater than the posterior for 1, say) was accumulated only when the magnitude passed a particular threshold (of certainty). In this way, the errors due to randomness itself were not allowed to accumulate and overwhelm the signal.
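That thresholded accumulation can be sketched in one line – sum the per-trial evidence only when its magnitude clears the certainty bar (the threshold value here is invented for illustration):

```python
def accumulate_evidence(log_ratios, threshold=1.0):
    """Sum per-trial log-likelihood evidence, keeping only trials whose
    magnitude clears the threshold; weak (likely noise-dominated)
    trials are discarded rather than allowed to accumulate."""
    return sum(r for r in log_ratios if abs(r) >= threshold)
```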

Posted in crypto

## supports, walsh transforms and reed muller code sets

Susskind taught us the math for quantum mechanics – and the role of the Fourier basis as a “conjugator” – allowing one to see the duality of space and momentum measures. Furthermore, we were able to comprehend circular polarity – and the Riemann manifold. In the course of that, we got to see modulating ovals that capture the notion of “motion” and evolution in phase space.

Given that we have the trace operator under our belt now (from quantum mechanics, quantum Shannon theory, and good old linear algebra and eigenvectors/values), we can capture the RS and RM classes of codebooks. In much the same way that one doesn’t wish a cryptographic function to be approximated by a set of linear simultaneous equations, by extension, for the case of perfect diffusion, we don’t want a superposition of some n-dimensional flats to approximate the relationship between the support of a function and any L2 function so supported.

To this end, Carlet gives us, focusing on cyclic codes, simple definitions of Reed-Solomon and Reed-Muller codes – in terms of Boolean functions:

Reed-Solomon (and BCH):

Reed-Muller

both quote sets from

so we see that Reed-Muller (which Forney presents as an almost passé topic for coding courses) is very much at the heart of the crypto world – in the sense of acting as a reference frame.

We have to remember, in the era of Colossus and of wheel breaking, how folks would work with only subsets of the chi bits (only those that had “sufficient support”) as they sought to converge the proposed wheels to their maximum correlation with the pseudo-depth created from overlaying successive periods of ciphertext.

Taking f as a generator set for R(1,n) as a group/graph, we see how a partial order can be defined on the resulting cosets/orphans!

The topic comes full circle back to the Fourier basis (and the permutation “function” of this basis)

Posted in coding theory, colossus, crypto