more on Signin

image

https://yorkporc.wordpress.com/2011/04/29/nsa-signin-via-cryptome/

We have more context now to understand the technical information released in the signin disclosures.

Let’s assume the phrase “mixed wiring” (for the separator) means that unlike all other (removable) rotors, 1 input from the left maps to multiple outputs on the right.

This gives us a system with a sub-system on the left and another on the right – of different concepts and design contributions.

On the left, we have a character input being transformed by a compound rod square, where each conjugating rotor has, as its stepping “motor-rotor”, the rotor to its left. The motor concept is one in which, for every 5 transitions FROM a notched position in the motor, the controlled rotor steps once; it is otherwise “limited” by the motor. There is ultimately a “periodicity” of stepping, built from a cascade of notch transitions. This machine is NOT like Tunny (trying to create an aperiodic convolution) and it is not like the T.52 (using impulse permutation to interdict plaintext-cribbing). It’s much more basic and classical. On the other hand, its rotor-cage concept is more “involved”, and feels (when combined with the concept on the right) like a mini-Sigaba concept.
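
To make that stepping rule concrete, here is a minimal sketch of my reading of it. The rotor sizes, the notch positions, the cascade order and the Python names are all illustrative assumptions; only the “one step per 5 transitions out of a notched position” rule comes from the disclosure as read above.

```python
# Hypothetical sketch of the cascaded "motor-rotor" stepping rule described above.
# Sizes, notch positions and the cascade shape are illustrative assumptions.

class Rotor:
    def __init__(self, size, notches):
        self.size = size
        self.notches = set(notches)   # positions that count as "notched"
        self.pos = 0
        self.exits_from_notch = 0     # transitions FROM a notched position so far

    def step(self):
        """Advance one position; return True if this transition left a notch."""
        left_notch = self.pos in self.notches
        self.pos = (self.pos + 1) % self.size
        return left_notch

def clock_cascade(rotors):
    """Leftmost rotor is the fastest; each rotor is the 'motor' of the one to
    its right. A controlled rotor steps once for every 5 transitions out of a
    notched position made by its motor (my reading of the text above)."""
    carry = True                      # the leftmost rotor always steps
    for r in rotors:
        if not carry:
            break
        left_notch = r.step()
        if left_notch:
            r.exits_from_notch += 1
        # pass a step to the controlled rotor on every 5th notch-exit
        carry = left_notch and (r.exits_from_notch % 5 == 0)

cage = [Rotor(26, notches=[0, 7, 13]) for _ in range(4)]
for _ in range(1000):
    clock_cascade(cage)
print([r.pos for r in cage])
```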

On the right, we now have more than 5 inputs – due to the “mixed-wiring” effect of the separator. Presumably this is a Rowlett feature, reusing a little of Sigaba design theory (in a much smaller machine!) in order to create a stream of PMFs that input to the right-hand side. This contrasts with the left-hand side, whose inputs were “characters”, note. In Sigaba, the PMFs control stepping. Here, folks are creating an additive keystream directly (rather than creating a “motor” for the Sigaba enciphering rotor cage).

Now the mechanism on the right is two fast rotors flanking a couple of enigma-rotors. Recall that an operator is to reject any indicator (conveyed under an orthogonal latin-square indicator protection cipher) that would set the 3 fast rotors to the same setting. The purpose of the fast rotors is to stecker – and thus multiply the period of the next inner-rotor.

What you see is someone in the Rowlett school still dominated by Sigaba-type theory, even as they all play with features of the T52 captured in North Africa. However, note how their instruction for permuting the “band” of output didn’t feature what the T.52 had – the feature that interdicted the compromise of long plaintexts from breaking the wheels, which NEVER change in the T.52 (but then they didn’t need to, given the premise of the function and the strength of the permutation generator). In both Tunny and Sturgeon, the 1930s-era German designers are clearly thinking more in terms of modern theory (of convolutional coding, etc.); whereas Rowlett and co are still thinking in terms of PMFs. Of course… we also know which theory won that round of the cipher wars, though!

Posted in crypto

Turing’s next engine; beetle research

In his 1950 works, it’s clear Turing was striving to define his next class of universal computational engine. Another theoretical machine (necessarily), it was to capture the principles of the computational processes only discovered in that timeframe. Since there was considerable overlap with crypto and ciphers, was that enough for the US to kill him like anyone else who didn’t contribute to the state? (yes…!)

In his rambling but entertaining book, Mainzer gives us a highly intuitive comprehension of non-linear science. In the part devoted to the period of Turing’s 1950s work, he builds a model of life processes – from a diffusion reactor. This engine is a step ahead of the improbability drive, and falls into the same category of theorizing. Of course, in crypto, such is not mere theory.

image

image

image

I find the theory very much a “crypto” model. We have channel-coding gains, losses due to noise, and errors due to incorrect copying. We have random variables that are proportions, calculated locally, but whose calculations are then re-normalized against the average computed over all other members of the RV set. A motive is given for the macro-level constraints on the set of RVs as a whole, in terms of conserving energy. Then we have at least 3 different types of generator scheme: one for producing energy itself, another for fabricating calculating-engine nodes, and those running calculations through the engine. The self-reproducing property knows how to at least replicate those three groups’ multipliers. Self-reproduction is due to the self-balancing diffusion of concentrations, which gives rise to the micro/macro interactions that drive the dynamic graph-based computation theory.
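
As a toy illustration of the “diffusion reactor” idea, here is a minimal 1-d activator/inhibitor sketch of the Turing-morphogenesis kind. It is a generic textbook setup with made-up parameters, not Mainzer’s model; it only shows how self-balancing diffusion of concentrations can concentrate the activator into pattern peaks.

```python
# Minimal 1-d activator/inhibitor reaction-diffusion sketch (generic
# Turing-instability setup with made-up parameters, not Mainzer's model).
import numpy as np

n, steps, dt = 200, 10000, 0.01
Da, Di = 0.02, 0.5                          # inhibitor diffuses much faster than activator
rng = np.random.default_rng(0)
act = 1.0 + 0.01 * rng.standard_normal(n)   # activator concentration per cell
inh = 1.0 + 0.01 * rng.standard_normal(n)   # inhibitor concentration per cell

def lap(u):
    # discrete Laplacian on a ring (periodic boundary): the diffusion term
    return np.roll(u, 1) - 2 * u + np.roll(u, -1)

for _ in range(steps):
    # Gierer-Meinhardt-style kinetics: local production, decay, and diffusion
    dact = Da * lap(act) + act * act / inh - act
    dinh = Di * lap(inh) + act * act - inh
    act += dt * dact
    inh += dt * dinh

# cells where the activator has concentrated (pattern peaks, if any formed)
print(np.where(act > act.mean() + act.std())[0])
```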

One sees Turing’s crypto experience all over this, down to fixed-point attractors found in enigma-era beetles!

Posted in crypto

a Turing text book

 

Turing’s university text book – prep for “principles of communication” (and crypto):

image

image

image

http://archive.org/stream/calculusofobserv031400mbp#page/n19/mode/2up

you can see the genesis of the thinking (and notation) on Tunny

image

We don’t think too much about difference tables and interpolation these days (unless your first tutor was called Samet…). The general form can be reduced to (Tunny impulse) bit strings:

image

just look at 10110 (the first 5 digits of my binary value u) as 1.0110, where the four digits after the point are “4 digits’ worth of accuracy” (requiring 3 difference columns to describe it).

Remembering Babbage’s difference engine (for formulating such tables), we can even see a bit of a Turing machine’s read head (moving left and right on the tape, helping to print out the tables) as E−1 and E:

image

Of course, in binary, this works quite LITERALLY (where delta is xor). It’s interesting to see the title of the section, “symbolic operators” – consistent with (i) the operator theory of the era, and (ii) the focus on logical foundations too.
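
A small sketch of that point – the same difference-table machinery run first over integers (as in a 1925-style interpolation table) and then over bits, where the difference operator collapses to xor. The sequences are just examples.

```python
# Sketch: a forward-difference table, and its binary version where the
# difference operator collapses to XOR (delta f = f(x+1) XOR f(x)).

def difference_table(seq, op=lambda a, b: b - a):
    rows = [list(seq)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([op(prev[k], prev[k + 1]) for k in range(len(prev) - 1)])
    return rows

# ordinary (integer) differences: cubes have a constant third difference
for row in difference_table([1, 8, 27, 64, 125]):
    print(row)

# the same machinery over bits: delta is xor
bits = [1, 0, 1, 1, 0]          # e.g. the first five digits of the value u above
for row in difference_table(bits, op=lambda a, b: a ^ b):
    print(row)
```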

One can now sort of see Chi1 [m] + Chi2 [n] = 1271… for Colossus scanning, etc:

image

it makes very “intuitive sense” that (1 + delta) to the power x (times a function) is now a “specification”, where x is the movement on the tape that helps “generate” the magical multiple (a log…)

interesting to see logs (and probabilities of error) and Turing tapes/readers come from a 1925… book! Even 1 + delta (the American AH form of BP’s (.5 + bulge)). As we say, power series (which is what difference tables are all about) lead naturally to terms of the form (1 + x).

It’s also interesting to see how Turing was taught to think about probability (in terms of differences, and Newtonian methods). There is no notion of probability per se – it’s just a distribution function, to be used in Newtonian differencing methods (which iteratively calculate integrals, without using algebraic transforms).

image

Of course, this all comes back when looking at communication theory (wanting a numerical solution to conditional probability modeling) and thus at crypto channels between wheel patterns and depths…

While we cannot really assert an “instrumental” connection between the following and the 32-count of the Colossus-assisted method of attack on Tunny (particularly since, in WWI cryptanalysis, folks already applied frequency spectral attacks)…

image

…we can perhaps note how Turing was oriented by the *argument* – that the deviations are the sum of lots of smaller deviations (per factor). Of course, we saw him playing with these ideas in his fellowship dissertation.

It takes a little getting used to, looking at calculus formulae and “thinking” probabilities! But, armed with Hilbert spaces and L2 functions, it’s not too hard!

Posted in crypto

Nash life neighborhoods pointed out by cryptome; Talbot and Welsh

image

2013-0071.png

One of the fun things about real analysis is that it teaches one that the Fourier transform acts on formulae, much like a programming macro. It builds in one an easy intuition that such transforms are acting on functional forms themselves – in the style of Laplacians in general. This contrasts with schoolboy intuitions about mere Fourier series – being merely a sequence of sin and cos terms that sum to a number.

Now given we are talking of sums we reflect on what we have learned: that the sum may be a sum of tiny functions rather than numbers; and said functions may be probability mass functions. They can still sum up (in a functional adding calculus). In particular, a mass function might factorize a density function – describing an independent sequence of bits.

In the Nash concept, in the quoted picture displayed above, the designer’s combiner algebra builds upon the idea of a functional calculus. In his concept, he partitions a whole up into tiny functions by treating computed “neighborhoods” as the functional forms for probability measures, or functions thereof. To this he adds associated notions of distance and energy. Of course, this is all modern crypto – playing with indeterminism.

With all this in mind, one can conceive of a general cipher “method” as one that exploits an automorphism group (e.g. that of the Fano plane). In support, such a method might call for generators of automorphisms. These are a type of functional program that chooses the order of functions. It permutes them, in other words, or calculates a function that does the permuting.

In the cryptome-distributed book from Talbot and Welsh on Complexity and Cryptography, basic computer science is taught (in the style suiting math students). In one part, we get a good analytical presentation of what we are trying to say, above:

image

 

image

We must remember that the classical Turing machine’s tape is split into a 1-d list of squares, but ANY single grapheme can be written into a square. The particular grapheme could be a “single” 15-digit, multi-dimensional vector for all it matters, encoded in the 45 strokes of a single Japanese character that distinguishes that grapheme-represented value from all others in the same value-type.

Using Turing-era terms from his crypto period of 1939 (vs. his coding period, of 1932), assume a permutation group identifies the classical cycles within an enigma rotor’s alphabet/upright. Such values may be multiplied or, rather, conjugated, and their composite value has factor cycles. Expression trees exist, where one considers each “terminal node” in such a factor graph to be a small permutation mass function in a Markov space – where a cycle such as A-D-T-Y is now a trajectory description within a state space bearing a probability measure, etc., and where the rules of the space are defined by a particular Turing machine whose state sets, i.e. equivalence classes of states, control different types of convergence and limiting functions.

One can match all this apparatus with (log) likelihood thinking – a particular type of measure. Having modeled the data sequence of plaintext in terms of its “intrinsic” log-likelihood metric, can one now express a “metric” (of an input bit-sequence) as the “component metrics” for each bit, re-ordered in a manner controlled by each round key according to the permutation-program generated by the previous round, and whose output is a permutation-program for the next round that permutes the permutation of the state! (Isn’t permutation group theory wonderful; all math concepts being… a permutation!) In computer science terms, the oracle can replace a term with a program that computes the term. The trick is to ensure that the means used to compute that program (a permutation) is itself polynomial in time-complexity, and that its execution also uses polynomial time – to get to Turing-reducibility.
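
A small sketch of the “permutation as program” framing – permutations represented as lists, composed with one another, applied to a state, and decomposed into their factor cycles. The toy round structure and the values are purely illustrative; nothing here is Nash’s or DES’s actual schedule.

```python
# Sketch of "a permutation that permutes a permutation": permutations as
# programs (lists mapping position -> position), composed and applied.

def compose(p, q):
    """Return the permutation 'apply q, then p' (p o q)."""
    return [p[q[k]] for k in range(len(q))]

def apply_perm(p, xs):
    """Re-order the sequence xs under permutation p."""
    return [xs[p[k]] for k in range(len(p))]

def cycles(p):
    """Cycle decomposition, e.g. the 'upright' cycles of a rotor alphabet."""
    seen, out = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        c, k = [], start
        while k not in seen:
            seen.add(k); c.append(k); k = p[k]
        out.append(tuple(c))
    return out

state = list("ABCDE")
round_perm = [2, 0, 3, 1, 4]            # this round's permutation-program
next_perm  = [4, 3, 2, 1, 0]            # a permutation that permutes the program
print(apply_perm(round_perm, state))     # the state re-ordered by this round
print(compose(next_perm, round_perm))    # the program handed to the next round
print(cycles(round_perm))                # its factor cycles
```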

For Nash, he uses the infamous Life neighborhood concept to generate such generators – aiming to produce those conditions that induce the fractal conditions (defined ultimately as inducing perfectly self-replicating patterns).

If you just think in terms of permutation groups, it’s really a lot easier – as that notation lets any term be a lambda expression – be it an upright cycle, a Fano plane design, an automorphism, or an expression in the automorphism group. It’s important that any term is also a function, which is itself a term (that transmutability being the basis of the Church/Turing/Kleene lambda calculus, of course).

Now, in Turing’s time, with L2 spaces (a space of mass functions from [-T/2,T/2] –> Cn), the indeterministic algorithm would factorize the density in terms of L2 decomposition. For Nash, he wanted to move forward to consider also the energy distance assigned to (or generated by) different neighborhoods. Thus he used a permutation-generator to choose which neighborhoods are applied, and which at the next stage of processing act to “generate a further permutation program”  that chooses the next round’s neighborhoods – for each point on that next round’s plane. Thus the energy decomposition is a function of the distances being “extrinsically generated”; and ultimately this all drives the fractal generator to converge.

You see the same thing in DES, where each coset generator is working on the dots or the crosses and estimating its extrinsic metric, a vector of real numbers (logically) that then go into a convolutional code generator to spread the distance from the estimate to the real metric. Multiple rounds generate maximal distance in the dual coding world inhabited now by a cascade of error terms, creating a unique noise-sequence that  is not itself describable in terms of orthogonality or non-orthogonality – being a dynamic sequence!

The Welsh book stays away from anything too controversial (and useful) to actual crypto! It doesn’t give any crypto-centric examples of oracles in the wild (such as those used in the Tunny era, during which reasonable-time estimation could do in time t, with confidence level d, what could not be done by brute force in the time available). And it certainly doesn’t apply that to modern realtime analysis of security protocol messages, to exploit their (IETF designed-in) flaws (the whole IPsec half-IV scandal, to use a CBC oracle).

Posted in crypto, DES

fried fish

Written in March 1944,

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

It’s interesting to note the American (army, not navy) observer’s throwaway line.

Reading this OCR, rather than the manuscripts, is like code breaking – given all the missing pieces!

We see the core practical nature of the orientation, teaching the knowhow behind the core skills:

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

This implies BP training was drill based, not academic.

Next, we see another throwaway remark, implying AH is very much involved in Tunny analysis (itself):

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

We see that the monthly (at that time) breaking of Chi wheel patterns used methods known to the US – being Turingery, of course. Let’s assume that this refers to the detection of random vs. non-random signatures in streams.

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

Next, we hear the author implicitly comparing what he sees – concerning the complexities and the reasons for them – with things he already knows… rooted in key management:

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

In motor breaks relevant to a stopping/stuttering effect, one sees reference to “old” US skills, and the use of erasure channel theory:

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

we see the log scale and bulges reasoned without respect to Tunny (just raw dot excess):

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf006.HTM

we see the rationale given for using the 1 + lambda form (vs .5 + excess):

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf007.HTM

we get the American/Lincoln power-centric way of thinking: terms basically of s²/n for probabilistic decoding:

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf007.HTM

interesting to see someone clearly familiar with the American practice mixing newly-learned British terminology with his local vernacular:

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf007.HTM

This next set of phrasing tells one that the US is not reading the signals from the wire, and that the techniques described were not gleaned from thinking about Tunny specifically.

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf008.HTM

concerning motor rectangles, one sees some of the core design logic (based on column matching) – treating the components of the wheel pattern as a channel:

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf011.HTM

One sees the author’s assumption that some US machines can do “3 wheel runs” (count…) and that the terminology being used is not unfamiliar to his US readership (with little familiarity with UK terms, no doubt).

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf014.HTM

We see reference to the “film machine” – irrespective of the US means of generating Tunny streams using their geared thingy:

image

Posted in colossus, crypto, early computing | 1 Comment

cross here, cross there, dot everywhere

The following  made no sense, when I first read it – a year ago

image

http://www.codesandciphers.org.uk/documents/wffishnotes/wf001.htm

The text follows a description of those at BP who were taught Testery mental arithmetic. For a few hours, I then could not fathom how the method stated in the material quoted matched up. And of course, it doesn’t. I wrongly assumed that the two trainee groups were learning the same method (by two different teaching techniques, befitting folks’ aptitudes). Of course, the second group is being trained for converging rectangles!

What a difference a year makes!

Posted in colossus

cryptome not pissed for once; actually funny

http://cryptome.org/2013/01/spooky-emanations.htm

It’s hilarious (and well done).

The NSA facility has similar issues – since the architecture and the location geography affect the socialization aspects of movement, and thus offer tracking capabilities due to concentration. Look at the NSA facility as the key, and sensor data about traffic patterns from RFIDs sensed at road junctions as the periodic depths of (connected) ciphertext to be reverse-inferred…

Put your sensor car in the (slow-moving) traffic lane, and use its DSP to scan the RFIDs in the tires of (probable) NSA employees’ cars (as folks go to shift work, each day); figured by location and timing cross-correlation. Alternatively, bury the sensor and a $10 solar array for the power source in the dirt somewhere along a funnel road… and wander by it (and its wifi) in your car every so often to pick up the data dump….

Now do the same at major junctions, in all the bedroom community towns, and you can trace the paths, figure shift patterns, guess roles of folks even, and generally do intel (much like was done in 1941-45 on the Tunny and T.52 “nets”)

But I’m not inventing anything, Peter. Stop pretending. It’s already there, done, and working! (It’s just someone else’s boxes…)

The one thing Tunny taught us was that cryptanalysis by itself is kind of useless unless (a) it’s tied to device/people tracking upfront, to get a priori likelihoods sorted, and (b) intelligence evaluation after the fact, so all the bother is worth doing. Of course the after-the-fact evaluation feeds back to target the devices/people you MOST care about!

Posted in rant

Replacing the Quaternion adjacency with X(Z/nZ, S)

In Turing’s On Permutations manuscript, we saw what I think were random walks on the finite circle:

image

http://turingarchive.org/viewer/?id=133&title=31

 

image

http://www.scribd.com/doc/119042559/Fano-plane

 

That is, rotor U1 is in the I setting, and rotor U2 is in the J setting. Whether U1 is I or J (and vice versa for U2) has probability 1/2 – to start. Then, one looks at the 4 possibilities of the next state for the 4 cases: (I in U1, I in U2), (I in U1, J in U2), (J in U1, J in U2), (J in U1, I in U2). In the top-left quadrant of the state machine one sees the computation of the next step (2), for each possible set of 2 inputs – using standard octonion multiplication (which is ordering-sensitive!).

Getting from step 2 to step 3 is just as easy. Assume either i or j and multiply by one of the three columns produced by the last round. In any order, they produce the symbols noted in #3, with the relative (uniform) weights shown. Getting from #3 to #4 just continues the process. Take the outputs from #3 and multiply each by i and j… (or reverse that…)
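
For intuition, here is a minimal sketch of such a walk using the ordering-sensitive multiplication of the quaternion units. The two-generator, i-or-j-per-round reading of the figure is my assumption, and the round and trial counts are arbitrary.

```python
# Sketch of a random walk driven by (ordering-sensitive) multiplication of the
# quaternion units {+/-1, +/-i, +/-j, +/-k}; generators i and j chosen per round.
import random
from collections import Counter

# unit multiplication on the symbols 1, i, j, k (sign tracked separately)
TABLE = {('1', x): (1, x) for x in '1ijk'}
TABLE.update({(x, '1'): (1, x) for x in '1ijk'})
TABLE.update({('i', 'i'): (-1, '1'), ('j', 'j'): (-1, '1'), ('k', 'k'): (-1, '1'),
              ('i', 'j'): (1, 'k'), ('j', 'i'): (-1, 'k'),
              ('j', 'k'): (1, 'i'), ('k', 'j'): (-1, 'i'),
              ('k', 'i'): (1, 'j'), ('i', 'k'): (-1, 'j')})

def mul(a, b):
    (sa, xa), (sb, xb) = a, b
    s, x = TABLE[(xa, xb)]
    return (sa * sb * s, x)

def walk(rounds=4, trials=100000):
    counts = Counter()
    for _ in range(trials):
        state = (1, '1')
        for _ in range(rounds):
            gen = random.choice([(1, 'i'), (1, 'j')])   # generator i or j, prob 1/2 each
            state = mul(state, gen)
        counts[state] += 1
    return counts

print(walk())   # empirical distribution over the signed units after a few rounds
```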

Now, Turing would probably be “thinking in BPSK”; that is, there are only i, j, k… and their component signs (i vs. i′). In terms of Cayley notions, his i and j (at each round) are his “generators” (see next), which generate his 2 cosets (one of which is measure zero).

image

image

image

image

What is interesting to remember, in the setup to Turing’s generalization of graphs (having found an “exceptional group”), is the means used in permutation group theory to prove the exceptional-group property. If one remembers, proving that any and all 2-cycles and 3-cycles were included in the definition involved arguments for a modular world – rather similar to those seen in Z/nZ when treated as (discrete) points on a circle.

We can also see his focus on quaternions (and signed quantities) for the case of 2 rotors as giving him an easy way of thinking about the relationship of one Chi wheel bit and the next cam/bit which follows it, in a likelihood model. In his permutation paper, creating the Fano plane (and its quaternion algebra) is based on setting a parity condition on the univariates of his factor group.

I almost see his mental model for using automorphisms, as it relates to _fast_ convergence to the uniform pmf. In a permutation group world, automorphisms come naturally as coordinate-shuffling “functions” that an LLN transform can act on, too, in an L2 sense. One sees how these relate to Ramanujan graphs, with their restrictions on the diagonalized form of each (automorphism-relabelled) graph that control the rate of convergence to the limiting distribution.

It now occurs to me to consider the 3 DES Hamiltonians (yet again). Could we see two of them defining their modular groups (coset codes), while the third is the finite state machine defining a convolutional code that weights the production of trajectories formed in either one of the 2 coset codings’ codeword sets, ever so slightly, so as to take said codeword as input, use 2-state delay sampling, and manufacture distance in the 1-d bit plane as a function of the particular labels used in the codeword (after n automorphisms)? Hmm. Not well stated, but there is the germ of the idea, here.

Perhaps we can think of the first coset as being the colinear component, and the second as the orthogonal component in a “custom correlation” subspace reduced to calculable form, as internal codebooks that are designed to convolve easily.

Posted in coding theory

black bodies

I remember studying black bodies (and their radiation properties) at school, in Physics class. Perhaps because I was more interested in a particular brown body (of the human female kind), I don’t recall ever figuring the main takeaway for the science!

Put bluntly, the wavelength gets smaller until a point is reached in the ultraviolet frequency band whereupon the intensity curve changes direction. The idea is that the “size” of the smeared-out particle, now a wave with an average wavelength, hits a minimum length representing the discrete quantum unit. Once below that unit, the intensity continues to grow, but in the opposite direction to the growth manifest while above the unit.

image

http://www.myrealitycheck.net/2011/04/ultraviolet-catastrophe.html

Posted in early computing

MIT/lincoln knowhow in statistical cryptanalysis, 1944

So what was US academic “statistical” knowhow on the topic of cryptanalysis, in 1944?

image

image

image

Well, we can assume it was basically what we now think of today as “the principles of communication.” Back then, there was an even greater focus on the no-coding modulation world, with sidebands, low-pass filters, pass bands, Fourier transforms of cos and sin terms, etc.

Concerning Levine:

image

http://www.lib.ncsu.edu/findingaids/mc00308?doc.view=print;chunk.id=

one can see at http://www.princeton.edu/~mudd/finding_aids/mathoral/pmc24.htm a snapshot of Turing’s time at Princeton.

The Schulze and Luders book does a good job in capturing that older world view, expressing it rather more as it was probably thought at the time. This contrasts somewhat with the more modern intellectualized versions of it, formulated since.

image

any quotes below are from Schulze and Luders, unless marked “Small”

We have to remember that crude convergence uses the sign of the log-likelihood only (essentially hard decoding), whereas accurate convergence uses the magnitude value too. We note how the US already has the guts of accurate convergence. It not only has the concept (log-likelihoods and cross-over models, etc.) but it has the “factor” model.

Let’s see how signs play into all this, from first principles.

image

Having learned that we can think of LLN as a norm, and furthermore that the nature of ratios is such that the quantity is some modification of one of the two terms (−1 is −1, but +1 is just the negative of −1, so −1 is a “universal” at the base of opposites, etc.), we see the same trick applied to probability measures. We want a two-layer universal, in which the probability of two things is always expressed in terms of the probability of just one of them (say s), modified “somehow” to express the necessary difference – which itself deconstructs into a likelihood ratio that we have also just learned to express in its own universal, single-dimension notation. So, in short, we end up with the concept of “sign-magnitude”! But we get it for probability “measures” too, not only reals.

When we now look at 3.12 above, we see the very notion of the probability of a “sign” (either a + difference or a − difference) uses the cross-over model: we know the probabilities of each edge in that “unit” computational basis. And we see sign-magnitude notation sT, where T itself is (T = +1).

I just see a clock hand for s = +1 pointing up, able to move left or right according to s. How much it moves per unit depends on the C term. There is another clock hand for s = –1, that also moves (up) left or right, in the same C “minute” unit.

Now, on old clockwork clocks, one would hear a whirring sound and the minute hand would move all at once between minute 35 and 36, say. As a boy looking at my grandmother’s (old) clock, the hand would never be in between the markers for 35 and 36. You could count up the clicking sounds, and then, boom, a discrete movement would happen. So it is with banburismus and soft/reliability decoding. How much the counted seconds matter depends; the minutes may give you most of the evidence anyway (so don’t bother spending time working with the extra digits). If sign-based decoding gives you 6 db, and keeping the magnitudes will only give you at most an extra 3 db, one has a tradeoff. Does 1 place of magnitude give you 2 db (forgoing 1 db, but saving time)? If so, can you still solve a banburismus puzzle despite that loss of accuracy?
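
A hedged sketch of that tradeoff – hard (sign-only) combining versus sign-plus-a-coarse-magnitude versus full soft combining of repeated noisy observations. The Gaussian channel model, the repeat count and all the numbers are illustrative, not Good’s.

```python
# Sketch of the sign/magnitude trade-off: decode a repeated bit using (a) signs
# only ("hard"), (b) signs plus a coarsely quantized magnitude, (c) full soft.
import random, math

def llr(y, sigma):
    # log-likelihood ratio of a BPSK symbol (+1/-1) observed in Gaussian noise
    return 2.0 * y / (sigma * sigma)

def decode(repeats, sigma, levels=None, trials=20000):
    errors = 0
    for _ in range(trials):
        bit = random.choice([+1, -1])
        ys = [bit + random.gauss(0, sigma) for _ in range(repeats)]
        ls = [llr(y, sigma) for y in ys]
        if levels is None:
            total = sum(ls)                                   # full soft combining
        elif levels == 1:
            total = sum(math.copysign(1.0, l) for l in ls)    # sign only (hard)
        else:
            # keep the sign plus a magnitude clipped to a few discrete levels
            total = sum(math.copysign(min(round(abs(l)), levels), l) for l in ls)
        errors += (total < 0) == (bit > 0)                    # count decision errors
    return errors / trials

sigma = 1.2
for name, lv in [("hard (sign only)", 1), ("sign + 3-level magnitude", 3), ("full soft", None)]:
    print(name, decode(5, sigma, lv))
```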

Good’s argument was that by “merely rounding up” with the right rule, you added an additional probabilistic element whose addition entailed that you got back quite a bit of that missing 1db anyways. This was his ultra-violet catastrophe moment. You might as well just guess your last mile, to the quantum boundary. That the clock hand moves between minute markers all at once, doesn’t matter…

Now, let’s go back to the US crypto scene in the 1940s. Since the ultraviolet catastrophe is about reducing the wavelength till the intensity curve suddenly starts growing (even faster) but now in the opposite direction, perhaps we have an intuitive mental model of the stats-logic beneath the 5202 Hollywood film-projector class of cryptanalyzer (and its photodiode, reacting to intensity). One has a way of inducing a binary event (as direction changes) – that flashes a “stop here” light.

Now, I have not studied how WWII-era Japanese code machines were broken. But I half understand that the contraptions were basically enigma-class machines, built with multi-connector relays rather than rotors. The US attack on them was thus essentially refined brute-force search, given depths induced by the daily key dynamics of the key management regime, again working on the assumption that no turnovers occur within the sample space – meaning the depth evidence can be chained, per the “theorem of multiple witnesses”.

Now, in 1941, were folks using sign/magnitude, and log-likelihoods? That’s the real question, I suppose. Folks did know it as “accurate scoring” – using slide-rules and their log/hyperbolic scales, which were “quite normal” thinking constructs (back then).

Posted in colossus

small’s elaboration of tunny wheel breaking runs

image

The Albert Small report on Tunny breaking, of 1944, doesn’t remind one of how to count dots (in delta-D) given a set of dots/crosses in a delta-X space. But we remember the rule. Flip the bits in columns headed by a cross.

Note how the second method (count the bias of dots over crosses, per theta.ij) is far more intuitive… and sums (lower right) to the same value as the top method.

This now explains WHY the two methods are equivalent, and why the original phrasing truly was correct: measure “against the ‘unknown’ wheel character”. The delta-X2 on the left are the unknown wheel pattern’s characters, and folks pick the lowest one in this calculation. Flipping that from dot to cross and converging acts to select WHICH pattern’s character one is interested in pipping/decibanning.

Presumably, if one sets the middle X2 to cross instead, this is equivalent, in the difference-based notation, to summing the differences across the middle row (i.e. |12|).
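
A toy sketch of why the two bookkeepings agree – “flip the entries in columns headed by a cross, then count dots” versus scoring dot/cross as ±1 and counting the excess of agreements. The tiny rectangle and the candidate wheel below are made up.

```python
# Toy check that the column-flip dot count and the +/-1 excess count agree.

DOT, CROSS = 0, 1

rectangle = [                      # rows: stream positions, columns: wheel positions
    [DOT, CROSS, DOT, CROSS],
    [CROSS, CROSS, DOT, DOT],
    [DOT, DOT, CROSS, CROSS],
]
wheel = [DOT, CROSS, CROSS, DOT]   # candidate wheel characters heading each column

# Method 1: flip entries in columns headed by a cross, then count dots.
dots = sum(1 for row in rectangle
             for col, v in enumerate(row)
             if (v ^ wheel[col]) == DOT)

# Method 2: map dot -> +1, cross -> -1 and count the excess of agreements.
sign = lambda v: +1 if v == DOT else -1
excess = sum(sign(v) * sign(wheel[col])
             for row in rectangle for col, v in enumerate(row))

total = sum(len(row) for row in rectangle)
print(dots, (total + excess) // 2)   # the two bookkeepings give the same number
```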

As we see in the Tunny report, folks then for each “unknown wheel” position set a cross in the wheel pattern buffer, and counted the dots, for the conditioned-selection of those R positions matching 1p2=dot. Small’s worksheets, that Tony Sale withheld, would show the BP process.

Seeing as the Tunny report only makes oblique references to the actual BP processes, one sees why Sale, given his background, would seek to induce a “lack of knowledge transfer” – to use an English phrase for the more typically American practice of lying/deceiving by omission.

Posted in colossus

from saml to openid to openid connect: increasing lack of expectation of privacy

home_pw@msn.com:

At the outset of openid, there was an attempt to deny the term websso – as the exceptionals played mind games with the definitions we plebs are to use. One would invert the model, so individuals could see their own trail of interactions with numerous websites. Of course, it failed; having spread spiteful rumor about anything that preceded it. Such is normal US standards technique from the free market (that can often be seen to have military funding trails, when you look a little harder).

Oauth 2 is another attempt at redefining the rules of the mind game. For the phrase “oauth2 is not about authentication,” now read “oauth 2 is not about websso” – unless you stick an openid connect gateway in front.

Having defeated the SAML/XML monster (even if encoded in binary), using largely divisive and manipulative propaganda, openid now takes on its former enemy’s core mantle: websso!

What we lost by ceding saml2-grade websso was privacy, built in; preventing IDPs from tracking and tracing (and spying, and surveillance). In short,  we lost the properties of inversion that openid marketed (because it was just not viable, in practice), and we lost what it dissed. Ensuring that individuals have no expectation of privacy is core to the US model of global control, and I propose it is why US exceptionalism (whatever that is, but it sounds a little 1930s-ish) had to slay that which marketed a notion incompatible with such as Google’s raison-d’etre for providing free websso: so it can spy now on openid-grade websso as an IDP.

Originally posted on leastprivilege.com:

I get this question a lot. Short answer: “you don’t!”.

For the long answer:
http://blogs.msdn.com/b/vbertocci/archive/2013/01/02/oauth-2-0-and-sign-in.aspx

View original

Posted in rant

delta-Z vs delta-D mixing

Computational graphs are necessarily constructive.

https://yorkporc.wordpress.com/2011/03/04/markov-ciphers-and-differential-cryptanalysis-xuejia-lai-james-l-massey-and-sean-murphy/

 

Referring back to the old memo cited above, can we apply some of our more recent Tunny knowhow to the “computational graph”?

Look at X1 through X4 as 4 Chi wheel cam bits, from a particular wheel. It’s specifically 4.

Look at Z as cipher characters from a particular impulse stream. Z5 and Z6 operate on delta-D composed with delta-Z. Z1 through Z4 operate in delta-Z.

It is as if you perform a delta-Z operation, and then enter a world of computing the delta-D (using two delta-Zs). Unlike Tunny, one uses Z5Z6 rather than the aperiodic PSI to perform mixing in the delta-D alphabet. Finally, having spread the delta-Z into delta-D, one updates delta-Z with the result of that spreading.

Let’s GUESS (total guess) that the inner cycle is there in order to introduce cycles into the diffusion and thus counter the use of sum-product type decoding methods.

Posted in crypto

graph based codes to Tunny/Colossus log-likelihoods

When you are as dumb and stupid as me, you recognize quality writing; particularly when it’s standard material you have already read in n other places. But occasionally, a writer actually teaches. Nigel Boston is one such teacher. He teaches on the topic of Tunny-era wheel patterns – focusing in particular on those in which one has insufficient confidence, the method of Turingismus, and the underlying math toolkit shared between the math guys who wrote the Tunny report. Of course, he doesn’t use those terms. Those terms are ours!

Our goal is to show why the following lines are what we have searched for. They lead us to a book that addresses this area of math. Unlike typical Anglo-American style writing on the underlying topics, it is written in a German style of English writing that is VERY effective at technical summarization. It assumes you are competent in the knowhow, and simply gets on with laying out standard engineering methods. It doesn’t see any need to use overly-intellectual arguments to say 1 + 1 = 0.

image

image

See GRAPH-BASED CODES url

citation for following quotes

 

Let’s start the story:

image

Because of its relevance to Shannon-limit capacity-approaching codes, NB focuses our attention on the erasure channel (see red 1). Change context immediately, however. Move from his abstract modeling and consider A or B to be one or more cams on the Tunny Chi wheel and/or the Chi stream – to be cryptanalyzed from Tunny key or cipher impulse streams. Why? Because it was done in Tunny:

image

http://www.ellsbury.com/tunny/tunny-132.htm

of course, nothing is actually restricted to binary!

image

http://www.scribd.com/doc/117022731/Computer-Security-and-Cryptography

Treating the wheel pattern as a sequence of independent values, we might have an “erasure” or two when “a neighboring character” of the wheel pattern is either unknown or has “soft” reliability less than some minimum. Alternatively, we can look at zeros in the sparse Turingismus cage formed from the Chi stream generated by the wheel patterns as representing the erasures occurring in the communication of the generated Chi stream. (The cage JUST HAPPENS to satisfy the (irregular) LDPC condition: be sparse!)

At red 2, below, NB notes a feature of the model that is appropriate for the Tunny world, in which transmitting/receiving machines were not synced. There was no automatic feedback channel (though there were human operators’ feedback channels, which turned out to be crucial to cryptanalysis).

image

What NB does well is show the way to think about conditional probabilities in a specifically boolean truth-table world. He moves the topic from a stats class discussing urns to a crypto class whose topic is communication channels. To think in this manner, take each line from some truth table in turn, for those lines meeting some or other boolean formula defining a ‘constraining’ “conditional expression.” These are the lines that contribute probabilities and define a random conditional: P(A | B). THINK TRUTH TABLE, that is. For each such line, formulate its contributing expression based on analyzing which bit positions in the input/output match up and thus contribute a difference (p) or similarity (1−p) measure. Whether you are comparing columns, bits, probabilities, likelihoods or bulges, it’s always the same story: same vs. difference.
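
A minimal sketch of that way of computing P(A | B): enumerate the truth-table lines, weight each line with a product of p / (1−p) factors, and condition by summing over the lines that satisfy the constraining formula. The 2-bit channel, the uniform prior and the events A and B below are all made up for illustration.

```python
# "Think truth table": P(A | B) from enumerated lines weighted by per-bit
# difference (p) / similarity (1-p) factors of a cross-over model.
from itertools import product

p = 0.1                              # cross-over (difference) probability per bit

def line_prob(x, y):
    # independent per-bit factors: p where the bits differ, (1-p) where they agree
    prob = 1.0
    for xi, yi in zip(x, y):
        prob *= p if xi != yi else (1 - p)
    return prob / 4.0                # uniform prior over the 4 possible inputs x

A = lambda x, y: y[0] == 1           # an event on the output
B = lambda x, y: (x[0] ^ x[1]) == 1  # the constraining boolean formula on the input

lines = [(x, y) for x in product([0, 1], repeat=2) for y in product([0, 1], repeat=2)]
p_B  = sum(line_prob(x, y) for x, y in lines if B(x, y))
p_AB = sum(line_prob(x, y) for x, y in lines if A(x, y) and B(x, y))
print("P(A | B) =", p_AB / p_B)
```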

Having gone from thinking about how to define a condition using a sequence of two bit-differences or bit-similarities, NB has us look at capacity, for different channel types.
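
For reference, the standard capacity formulas being contrasted – these are textbook facts, not numbers read off NB’s worked example.

```python
# Capacity of the three channel models discussed: BSC, BEC and AWGN.
import math

def h2(p):                       # binary entropy
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

bsc_capacity  = lambda p: 1 - h2(p)                     # bits per use, cross-over prob p
bec_capacity  = lambda e: 1 - e                         # bits per use, erasure prob e
awgn_capacity = lambda snr: 0.5 * math.log2(1 + snr)    # bits per real dimension, power S/N

print(bsc_capacity(0.1), bec_capacity(0.1), awgn_capacity(10.0))
```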

image

There are times to think about power and mean squares; which means think in AWGN. At other times, think in terms of the BEC. It’s easiest to see how the channel’s properties impact the formulas by annotating his worked example, given for the more intuitive BSC case:

image

image

NB’s paper goes on to give a nice summarization of using the trellis form of the knowhow expressed above to compute the same values, with less work. It is presented much as Turing would have thought about it: the finite state machine defines a group whose nature defines the cross-over model, and sequences generated by that machine can be cast in terms of the state space in discrete time – through which the sequences are trajectories. In the era of discovering vacuum tubes and modern physics, this is all directly tangible. It’s really just modeling particle physics!

For Turing – a mind-factory making methods for others to use – things can be reduced down from analytical theory normally found at the High Table to more mundane principles. One can simply look at all the possible pairings in the columns of the Turingismus cage – the Chi communication channel as received, with erasures. One computes the sum given above – one per pairing – “condensing” the sameness-difference of the same space to its average. Each sum is his little “neuron” acting as a firing/calculating engine – all done to get a “start” useful for then converging his Tunny rectangle.

With a start established, and thus a reliability value assigned to each character, one can now sort the characters into a list of the most likely.

image

image

If we run forward 20 years, we then see how the cage was related all along to the rules of LDPC. That theory generalizes the notion of likelihood – defined in Turing’s day simply as the ratio of the p term in the cross-over model to the (1−p) term. That is, the relative excess, in probability terms, of the bulge of dots over the bulge of crosses.

With all this setup, we can get back to our main focus: “the trick.” This is that which allows analysis in terms of a likelihood L(X) with a composite X to get to analysis of L(X | Yi) with a componentized Yi; from whence we get to what we want: L(Xi | Yi) with componentized Xi.

image

This develops further…

image

…where the workings get us to a world in which a conditional probability over two d-length bit vectors is a product of the probabilities of the individual component vectors… but only when one notes there is a particular parity constraint on the source bit-vector. The argument develops into a form concerning likelihoods, featuring the (Li − 1) over (Li + 1) ratio.

We now get to define the parity-constrained world of a conditional probability in bit-vector land. It is defined in terms of a product of certain likelihood ratios. Throwing in a little geometry and log theory:

image

We end up with a world in which we set a condition on several wheel cam bits (that their parity is 0) and express the communication model in terms of a log-likelihood measuring-stick scale.
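
A small sketch of that parity-constrained combination – the “tanh rule” used at an LDPC check node, giving the log-likelihood of the XOR of independent bits. The LLR convention (log P(0)/P(1)) and the sample values are mine, for illustration; the brute-force version is only there to confirm the identity.

```python
# The tanh rule: tanh(L_xor / 2) = product over i of tanh(L_i / 2).
import math

def llr_of_xor(llrs):
    t = 1.0
    for L in llrs:
        t *= math.tanh(L / 2.0)
    return 2.0 * math.atanh(t)

def llr_of_xor_direct(L1, L2):
    # brute-force check from the definition, for two bits
    p1 = 1 / (1 + math.exp(-L1))        # P(bit1 = 0), under LLR = log P(0)/P(1)
    p2 = 1 / (1 + math.exp(-L2))
    p_even = p1 * p2 + (1 - p1) * (1 - p2)
    return math.log(p_even / (1 - p_even))

print(llr_of_xor([1.2, -0.7]), llr_of_xor_direct(1.2, -0.7))   # the two agree
```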


I think we can read more from http://www.scribd.com/doc/52493629/Wiley-Theory-and-Applications-of-OFDM-and-CDMA-Wideband-Wireless-Communications-2006- to get to the bigger picture on tanh! And in so doing, we get to think even more like Jack Good, distinguishing signs from magnitudes.

image

image

image

This book is exactly what we need.

Posted in colossus, crypto

lab manager for programming rats

As a first cloud, let’s NOT act as if we were the state of California running a mainframe, full of processes. That is, we don’t want to create a private cloud out of the VMM fabric that is in turn orchestrated using Service Manager (which provides a UI for said processes, much like CA’s infamous Service Center product). We can do that later, once we have more experience. At the same time, we want to be above the layer of (i) using the Windows UI to make VMs, and (ii) using the VMM console to make VMs.

Let’s play with the programming-team centric integration – a private cloud factory that fashions sets of VMs in a custom cloud aligned to a particular build – which then hosts (and presumably auto-tests) the built software system. For this we play with the TFS 2012 express edition, linking its “lab management” concept to VMM’s fabric.

So, first, in Visual Studio, we see how to talk to the TFS project management sub-system (think like a project manager on a building site… building 20 houses in a row) and to the source control sub-system (the quartermaster of said site). Here we see talking to the local support infrastructure, and to a hosted version too.

image

 

image

Now to exploit this, we REALLY need tiered storage, leveraging a SAN that can give us snapshots, LUN cloning, etc.

Posted in system center

System center 2012 SP1, Windows 2012 Server iscsi target and SMI-S; cloud storage

Not having a SAN I can manage myself, I need to build one using Windows 2012 – so that my system center VMM environment can give me “tuned” storage blocks. In this manner, I can see how a (private) cloud administrator (or an administrator of its underlying fabric, at least) really works with storage in a now “cloud-centric manner”.

First we learn to add a “managed” (vs. unmanaged) file share to VMM, as shown. This is not what we want. But it is a stepping stone to see how “managed” file servers relate to VMM. In an SAN-less installation and for a non-clustered set of hosts, they can share the fileserver share (and thus “migrate” VM stores between hosts).

image

This done, my library server now binds to a couple of terabyte usb or eSata drives hanging off some hosts. I used ADUC  on a DC to denote the delegation level of the VMM host: allowing delegation to all remote services (using kerberos). I also assigned to the library resource, hosted by this VMM host, a library management “client” credential. This is just a stored uid/pwd!

While I was at it, I set the configuration on the library server to relate it to other resources used by private clouds: a named group of hypervisor hosts and a named “VM network”. The latter wraps a… these are both fabric-universal resource types. I keep it “unencrypted” so the FBI can spy (not that having it encrypted would make any difference…).

image

The “VM network” should be thought of much as the kind of UI that configures the physical connectivity of blade’s Ethernet ports (sans PHY) in a blade array to the cards inserted into the chassis bearing lots of PHY electronics (for some cable type). For us, for now, we keep it literally that simple:

image

While all this fabric setup gives me a mix of unmanaged and managed LIBRARY shares usable by VMs in hyper-v, able to create switch paths over Ethernet to data services – all of which is indeed then usable by any tenant private cloud hosted in this fabric – it has not solved my “storage problem”. I just have a virtual blade chassis.

Let’s now create a new iscsi target, to see how to upgrade our consciousness to cloud. Unlike a month ago, this time I will not simply expose the iscsi target to an iscsi client, which in turn simply turns around and exposes the now-mounted volume as an SMB file share. Rather, I’ll make the target (not the iscsi client, note) into an SMI-S style resource. To do this, we use the latest drivers from the 2012 SP1 VMM software suite, per http://technet.microsoft.com/en-us/library/jj860422.aspx

image

http://technet.microsoft.com/en-us/library/jj860422.aspx

to make it work, this is what we had to do (though not all elements may be necessary). First, the SMI-S provider is installed on the same host as that hosting the iscsi target (and its associated iscsi virtual drive, in all likelihood when using the Windows 2012 server iscsi target service). Second, we set the authorized initiators as follows (as we fiddled). Third, we REMOVED our iscsi client binding to the target on the VMM host (i.e. don’t add it in the first place!).

image

My thinking, to get around a first-trial error noted below, was that the SMI-S client of the iscsi target may need its own access rights. This could be so, even though the consuming process is on the same host as the target. Therefore we specified the client in 3 ways, as shown. We also added names for the VMM host. Note that I don’t really know if just one, some, all or perhaps none of this is needed. So play! For all I know the error we encountered first time around (when the rules disclosed above were not true) could be related to any of them.

image

Anyways, we fixed it and moved on. Now we can play in the VMM UI rather than use further PowerShell command lines. Now I should be able, cloud-era, to play with more modern concepts than online, offline, and near-line storage.

image

Well, it would be nice if it worked, but it didn’t. And here is a summary of what not to do (including several of the steps given above!). Of course the goal is to get to here:

image

So, the first thing to realize is that you don’t really need to use the PowerShell commands. The GUI works just fine. Next, you don’t need to configure any iscsi target (or fiddle with its security model). You do not need to create any iscsi virtual disks (bound to the not-needed actual target). You DO need to complete Windows Update having installed the iscsi target service, though. (I suspect the latter is critical.) In particular, you do not need to do what I did, and go install an additional SATA drive so that I would have some “primordial” storage out of which I then created a raid with storage spaces, supporting a STORAGE POOL. That’s because the term “storage pool” in VMM has NOTHING TO DO WITH storage pools. (Sigh.) A ‘storage pool’ is any formatted volume that MIGHT host an iscsi virtual drive (if you were to create one in the iscsi LUN-management world). Of course, all that happens when VMM creates a LUN is that an iscsi virtual drive (a VHD, remember) is created remotely…

Anyways, despite being a waste of time, I did learn to install the SATA drive (and fiddle with old BIOS settings, learn to discard a drive that didn’t work when it should have, understand internal eSata connectors vs. those found on the case – since they are different! – and otherwise have a jolly time fiddling around with motherboards). Gone are the days of two IDE cables, where you have to set the master/slave bus bits! I also learned more about storage pools (the Windows type), and got to set dedup on my ‘storage pool’ – the VMM (not Windows) ‘storage pool’ – a volume that might host a vhd, remember! This will presumably be useful, given it’s to be storing virtual machine images, many of whose blocks will be dedup-able.

image

Now, an attempt to repeat this failed miserably. Some set of the following “things to do” made it work.

(1) obviously install latest updates and Iscsi target.

(2) install on the iscsi target host the MOF files and associated DLLs that enable WMI on the host to expose the MIB. This means install the iScsiProvider, found on the SCVMM disk (under install/, amd64/, msi/ etc).

(3) uninstall and reinstall the latter, in some special way.

(4) reboot the target machine and/or restart the WMI services (and thus iscsi service, and VMM host agent service…)

(5) use the powershell scripts to first associate the provider with SCVMM. This seems CRUCIAL (as slow target discovery seems to time out the SCVMM built-in client). This should at least show up “providers” in the SCVMM fabric.

(6) Remove the providers via the SCVMM UI, and then, in the powershell ADD command, repeat the success of (5), but now add a “description” field. I used the host name.

(7) use the array element of storage fabric – not the provider element – to see if there are now listed any pools.

(8) Try using the powershell “$Pro=GET-SC…” and then the “READ- -StorageProvider $Pro”

Scream a lot.

Cursing doesn’t seem to make the slightest difference, note.

This is a miserable process, and needs sorting – perhaps needing only a blog post on what to do, predictably.


Third time lucky… I’m starting to figure out the necessary magic.

Install the iscsi target etc. on the Windows Server 2012 host.

Add an iscsi virtual disk, noting that no “SPC:” target is yet available… to which one may assign the disk.

Use the powershell features of the SCVMM console to add a storage provider (see MSFT notes) to SCVMM. ONLY THIS creates the iscsi target on the server (NOTE THIS WELL).

Now you can assign the iscsi virtual disk to the target. Or, you can use the console’s create-LUN wizard – but only ONCE you have used the properties of the new array to mark which “storage pools” (aka disks on the Windows host bearing the iscsi target) are assigned to storage classes.

Now you can go to a host/cluster and assign the iscsi array, which seems to remotely do the iscsi initiator configuration.

Posted in system center

early NSA detection and filtering

The early “NSA” method of making a coincidence-based counter was discussed at https://yorkporc.wordpress.com/2011/07/10/5205-and-photo-films/. We have a much better understanding of the setting method (for Colossus), based on detecting when certain counts in certain delta-alphabets for certain bit conditions (in X and Y) imply a Chi setting. What is clear is that the 5205 is just another way of executing the same method. There is nothing that Colossus is doing that the 5205 process cannot do, at a theoretical level. One could even do it by hand, given enough monkeys.

It helps to think of the “designer/operator” view of doing setup of the machine. They do “plugging” in order to configure (remembering that steckering was also just a type of “plugging”). When we see

image

http://www.alanturing.net/turing_archive/archive/t/t25/TR25-001.html

we see a plugging spec (where plugging may be several sets of plugs across multiple panels). If we want a spot in a (once-developed) film to appear at 1 of 8 heights above some line, we can define the conditions that induce the placement of the spot there. For the delta-X world, we look at X3X4X5 and simply use the map given above. Why that table has the form it does is not mentioned, note. And why the map from delta-Z to the 8 levels is defined the way it is is similarly not mentioned; but obviously one can plug that way too (when making films).

When one looks at the maps vertically, obviously you see 3 lines, all balanced. You see the classical binary (not Gray) coding… albeit written left to right, for X. For Z, folks rotated Z1 a bit upwards, and Z3 four bits upwards. The net effect is of course just a re-ordering of the rows, treated as binary numbers.

Now note the form of the argument. Given a detector pattern expressed in delta-D, what one wants to detect is a spot at the same level in X and Z (given the particular plugging specified). So the particular mutual orderings of rows in X and Z work to produce the required coincidence, on the “projection plane” of 1..8.
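
A toy sketch of that coincidence idea as I read it – map three delta-X bits and three delta-Z bits onto one of 8 heights and count how often the two spots land on the same level. The bit streams and the stand-in row re-ordering below are entirely made up; only the shape of the test (same level, counted over the run) is taken from the argument above.

```python
# Toy coincidence counter on the 1..8 "projection plane" (illustrative only).
import random

def height_x(b3, b4, b5):
    return (b3 << 2) | (b4 << 1) | b5      # plain binary coding, read left to right

def height_z(z1, z2, z3):
    level = (z1 << 2) | (z2 << 1) | z3
    return (level + 3) % 8                 # a stand-in for the re-ordered Z rows

hits, N = 0, 10000
for _ in range(N):
    x = [random.randint(0, 1) for _ in range(3)]
    z = [random.randint(0, 1) for _ in range(3)]
    hits += height_x(*x) == height_z(*z)

# ~1/8 for unrelated streams; a bulge above that would flag a candidate setting
print(hits / N)
```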

Note the phrasing, next. Only on “inspection” will this effect be apparent. No rationale is given.

Let’s assume that within the ordering is a common probability argument, a kind of BAC or BSC argument linking X to Z (via D…). Without having to be an exact geometric equivalence, the spots come “close” from both projections – close enough (within a probability of error) to be within a suitably small bounding box drawn around them. The distance (in squared-error terms) is within some e. The common probability model means that there is some bounding box that is suitable for any of the pairs, even though the distance per pair may vary (within the limit).

Now, to actually compute the power function, they use a power lamp….and a photo-cell detecting the incidence level of (8?) energy levels.

So does my theory hold?

Let’s see a more complex plugging that defined a class of characters (by a truth table for a (3,3) Boolean function)

image

http://www.alanturing.net/turing_archive/archive/t/t25/TR25-001.html

This simply controls the film-making process, for an example colossus run (that just defines a character selection, which in turn is the indicator function of a particular sub-graph within the finite state machine).

Well, if I draw the map, there is obvious “construction” going on.

image

More later, once we have some system center stuff working! But you can clearly see how this method is general (and applied to WWII Japanese codes too). Which implies that Colossus was applied to Japanese codes, too… not that we have seen much about that, in the record.

Posted in colossus, early computing

wave scattering, sequences, crypto

image

WAVE SCATTERING BY SEMI-INFINITE STRUCTURE

Posted in crypto

gray-coded trellis

Way back when, we wondered just what motivated the ordering of the Tunny alphabet. This was in the days before we really understood how the very listing order and the associated (32) character counts performed by Colossus enabled an experienced eye to detect the chi setting or settings that were out of whack. The consequences of error due to partial chi-setting errors would show up in how the errors in the counts then re-distributed, as explained by mutual dependency on particular impulse lines.

Of course, today we would look at the systematic Hamming parity matrix with 3 redundant bits and know that for each there is a mutual dependency on certain information (and other redundant) bits. This is all that’s really being said! Folks assume relative confidence in the X1X2 setting. Then the linear dependency model of parities between the various bits helps show which of X5X4X3 are consistent with the parity model, which in turn implies whether the Chi setting for that wheel (or wheels) is not correctly set.

Interesting to see core Hamming coding at the heart of cryptanalysis, at the end of the day.
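
A small sketch of that consistency idea, using the textbook systematic (7,4) Hamming code (3 redundant bits) purely as an illustration of the dependency logic – the actual Tunny counts don’t form this code, of course.

```python
# Syndrome check with a systematic (7,4) Hamming code: a non-zero syndrome
# points at the position that is inconsistent with the parity model.
import numpy as np

# parity-check matrix H = [A | I3]
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

word = np.array([1, 0, 1, 1, 0, 1, 0])       # 4 info bits followed by 3 parity bits
print((H @ word) % 2)                        # all-zero syndrome: settings consistent

garbled = word.copy()
garbled[2] = 1 - garbled[2]                  # mis-set one "wheel" bit
syndrome = (H @ garbled) % 2
print(syndrome)                              # equals column 2 (0-based) of H, fingering that bit
```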

Let’s go again from the memory wheel “encoding” of a “program” (for generating a codebook, or decoding a wheel pattern). Assume Colossus plugging is a universal interpreter not for Turing machines but for just this class of “instructions” – any de Bruijn sequence whose compressed internal coding of history/future state transitions for the Turing machine graph (the memory wheel…) codes up how to make the codebook – a bit like 1850s cards would tell the weaving machine how to make a carpet.

image

https://yorkporc.wordpress.com/2011/09/28/cosets-and-unique-decoding/

We see Jack Good’s graph above, from which one can easily build out the trellis of depth p (column count), with 4 states per column (row count). When using mean-square distance minimization in trellis thinking to decode either a signal or a wheel pattern (in the cryptanalysis version of the process), we are using soft decoding via the Viterbi form of BP (in 1944). We are assigning weights, as conditional probabilities, to each edge in the path of inferences along the data path of a given bit stream.

If we now take the standard 2-bit, 4-state, rate-1/2 code, let’s consider using only the delta and double-delta (x2 lookback) electronics of the 1944-era Colossus computer. What happens to the trellis when we use Good’s graph with its Gray coding of edges (rather than Forney’s classical graph, with binary-labelled edges)?

 image

Low-complexity trellis decoding of linear block codes

See TRELLISES AND TRELLIS-BASED DECODING ALGORITHMS …

Gray coding helps us understand the notion of history and future. Given a 001 edge passes through state 01 to become 011, we have the very simple model that the state is due to the overlap of 001 with 011 (001 is seen as -01, and 011 is seen as 01-), where the overlap tells us what is history (h) and what is future (f).

h 01
   01 f

It becomes obvious how the math then formalizes this now intuitive idea from the memory wheel. Any state such as 01 in its space is a vector… that is then projected onto (i) the history subspace and (ii) the future subspace. We can be interested in the length of the vector in terms of its co-linear (vs perpendicular) component on the subspace. In conditional probability terms, do we wish to look at P(s | h) or P(f | s) (and what does that tell us about P(h | f), or more interestingly for crypto P(f | h) and P(f | not-h))?

Turing is playing with lattices in 1954 and Good is playing with the finite-state machine showcasing history and future characters, and both obviously know lots about soft decoding from the days of “decoding” Tunny wheels (or American M209 settings, which even the 1950-era “peasant army” of the finally-free Chinese nation must have been breaking during the war in Korea!). In 1948, the capacity theorem form of Shannon’s work gets published, and systematic codes for RM are disclosed in the early 1950s.

What we need to do next is figure out what know-how on the likes of semi-infinite sequences etc. was in the curriculum or research tools of that period. Forney does a great job of showing the design method for delay-based filters – easily implemented even on a Colossus running in 1948, say.

Posted in colossus

modern cryptanalysis done the hard way..

Cryptome points to something interesting for once, other than the 19th copy of some DoD policy on how to write policies, for and by a policy-writing contractor.

 

image

 

On reading it (in French, since it’s easier than struggling with a crappy translation), here are some first impressions (remembering she has a doctorate…). Remember, it actually helps to block-read more…

It starts out with an over-concentration of focus on the computational graphs.

The language use, perhaps because of the French, is more interesting and lively than your typical American paper, 99% of which trot out the same tone as folks re-package.

At the same time, the argument is all about attacking the graphs, rather than explaining the design principles that gave it strength in the first place (to resist X, Y, Z…)

Figures like the following do suggest that we are in a world of forms of chains/trajectories of formulae, and their generators, each constrained by antecedents – with a view to doing “Something”

image

image

 

Seems like an interesting figure, along with its companions. Reminds me A LOT of a certain graph for decision regions, for binary async channels… just those that create distance… as a function of the data (rather than constructively).

image

If nothing else, I learned the term for standard deviation in French! Of course, this is pertinent to our recent devoirs. Nice to see a Probability of Error argument that is NOT Shannon-era related… or about measure theory, metrics, intervals and projected planes!

image

Note the use of an Enigma-era term (presumably Turing got it from France…, or from Poles speaking French…)

image

We see the differential trail analyzer:

image

Interesting that she is not intimidated by the fast and complete growth of diffusion, but goes looking for early-round vectors.

Part I ends. Hmm. So-so. Obviously diligent, but so is a mouse on a wheel.

Part II didn’t seem to be in my space at all (being public-key centric), until we got to bi-linear differentials:

image

in conclusion:

image

Hmm.

Posted in crypto

modern cryptographic analysis technique applied to Tunny

Let’s try rewriting the following a little.

https://yorkporc.wordpress.com/2012/03/14/tunnycolossus-setting-vs-des-differential-cryptanalysis/

Let’s use Tunny-era terminology, and analogies. Since the material in the chapter is so good, it’s worth contrasting its terminology with what we understand from Tunny. In particular the work focuses on formulating “correlation” tables, which aligns with our intuition concerning projection of keys onto the set of cyclical shifts providing a set of orthonormal wave functions.

In Turingismus, one may relate 1 bit of an “instigating” delta-D character – associated with an extended psi-stream character assumed to be ‘/’ (in the differenced delta-PSI’ stream) – to a second “input” character in the delta-D stream, whose stream-position and full 5-bit value are inferred from the period (Wi) cyclic repeat of the input bit’s sign, the associated Chi wheel pattern (Wi), and the 4 bits that this bit’s sign implies given the ciphertext/key stream found at the same position. A third “output” character may be defined much as the input character was, at a stream-distance of 2 * Wi and with its value determined in correlation with the ciphertext/key stream character at that distance. Call these input/output characters an “output pair”.

Thus, translating:

Assume that the method is based on gathering statistical key information about ciphertext blocks that are obtained by encrypting pairs of (known) plaintext blocks with a specific bitwise difference A’, for which there is a bulge δ (for delta-D12 = dot).

Chi wheel patterns are extracted from the output pairs in the following way. For each pair it is assumed that the intermediate difference (in energy) is equal to B’. The absolute power-values associated with the bit pattern of the output character, and the (assumed) differences of the intermediate values of Chi and PSI’ that form the key character, impose restrictions on a number ν of bits of the associated de-chi character. A pair is said to suggest dechi values that are compatible with these restrictions. While for some pairs many dechi values are suggested, no characters are found for other pairs, implying that the output characters are incompatible with B’. For each suggested dechi value a corresponding entry in a frequency table is incremented.

The attack is successful if the right value of the dechi character is suggested significantly more often than any other value. Pairs with an intermediate power difference not equal to B’ are called wrong pairs. Dechi character values suggested by these pairs are generally wrong. Right pairs, with an intermediate power difference equal to B’, not only suggest the right dechi value but often also a number of wrong dechi values. For Tunny, the wrong suggestions may be considered uniformly distributed among the possible dechi values if the value P(B’ | A’) is significantly larger than P(C’ | A’) for any C’ ≠ B’.

Under these conditions it makes sense to calculate the ratio between the number of times the right value is suggested and the average number of suggestions per entry, the so-called signal-to-noise or S/N ratio. If the size of the table is 2^ν and the average number of suggested subkeys per pair is γ, this ratio is equal to P(B’ | A’) · 2^ν / γ. The S/N ratio strongly affects the number of right pairs needed to uniquely identify the right dechi value. Experimental results showed that for a ratio of x/y about n-m right pairs are enough. For larger ratios only a few right pairs are needed, and for ratios much smaller than 1 the required number of right pairs can make a practical attack infeasible.
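To make that frequency-table bookkeeping concrete, here is a toy sketch of differential counting against an invented one-S-box cipher (SBOX, k0, k1, A_diff and B_diff are all hypothetical names, not Tunny or DES quantities): pairs with a fixed input difference suggest candidate key values, the right value accumulates counts faster than the noise, and the S/N intuition follows.

```python
import random
from collections import Counter

# A made-up toy cipher E(p) = S(S(p ^ k0)) ^ k1 over 4-bit values, for illustration only.
SBOX = [0xE, 4, 0xD, 1, 2, 0xF, 0xB, 8, 3, 0xA, 6, 0xC, 5, 9, 0, 7]
INV = [SBOX.index(x) for x in range(16)]

# Build the difference-distribution table and pick the best differential A' -> B'.
ddt = Counter()
for a in range(1, 16):
    for x in range(16):
        ddt[(a, SBOX[x] ^ SBOX[x ^ a])] += 1
(A_diff, B_diff), _ = ddt.most_common(1)[0]

k0, k1 = 0x6, 0xB                         # the "unknown" key we then try to recover
E = lambda p: SBOX[SBOX[p ^ k0]] ^ k1

# Each pair with plaintext difference A' suggests the k1 guesses for which the
# partially-decrypted difference equals the expected intermediate difference B'.
table = Counter()
for _ in range(200):
    p = random.randrange(16)
    c1, c2 = E(p), E(p ^ A_diff)
    for guess in range(16):
        if INV[c1 ^ guess] ^ INV[c2 ^ guess] == B_diff:
            table[guess] += 1

print(table.most_common(3))               # the right k1 should stand out (S/N > 1)
```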

Posted in colossus, DES

power limited tunny, with points on a sphere.

The Albert Small report on methods of breaking Tunny showed clearly how folks were using what we would now call principles of communication theory, for which Gallager’s highly-engineering-math-oriented course notes ‘book’ is excellent. Gallager writes today as though a 1930s math type, about advanced math research topics of the day. Though Small is not of the same intellectual class as Turing and Newman (etc.) on whose methods he was reporting, he also had that rather American knack of writing clearly. In particular, he focuses our attention on the core nature of xor: that one is counting when pairs of bits are the same or different.

In Small’s report, he is at pains to teach the relevance of the binary symmetric channel to analyzing sequences (of bits). Given a history of the bits in the current sequence, what is the next bit likely to be? Is it the same as the previous one (or different)? Small shows how folks modeled that question, producing an answer in terms of the expressions for the cross-over probabilities of the source channel: whether a motor bit stream is +/-, and/or whether some ciphertext stream has the same value, and whether this means two bits in the sequence should then match or differ.

Going beyond setting, one focused on tuning up the techniques for breaking. In this case the motivation for counting same and different pairings comes from the bigger-picture XOR property. That is:

image

http://www.alanturing.net/turing_archive/archive/t/t07/TR07.php

Now, built into that overly concise, logic-centric statement is Small’s better-expressed mental model: that two “parametric” equations are at work. Having noted the bias to dots over crosses, we know that xor enables us to count when same pairs are encountered, giving dots. The evidence for dots from delta-D holds as (noisy) evidence for the same property in delta-Z.

In GCCS terminology, we have the number of pips (an integer count) as the magnitude of this evidence. The magnitude is the “score” in pips, note. It’s the number of pips, which is the number of xor = dot (against some breaking condition). In general the condition is the antecedents pertaining to one bit in the cam patterns of the wheel being attacked (and then the next…) – which obviously aligns with a single column in the Turingismus cage.
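A minimal sketch of that pip-scoring idea, with invented numbers and a single 31-long wheel standing in for the real machinery: XOR the observed stream against each trial setting of the wheel, count the dots, and take the excess over the chance level as the score.

```python
import random

random.seed(1)
WHEEL_LEN = 31
wheel = [random.randrange(2) for _ in range(WHEEL_LEN)]       # true cam pattern
n = 5000
# the difference stream is biased toward dot (0); observed = wheel XOR delta_D
delta_D = [0 if random.random() < 0.6 else 1 for _ in range(n)]
observed = [wheel[i % WHEEL_LEN] ^ delta_D[i] for i in range(n)]

def pip_score(offset):
    """Count agreements (xor = dot) of the observed stream against the wheel
    started at a trial offset; return the excess over the chance level n/2."""
    dots = sum((observed[i] ^ wheel[(i + offset) % WHEEL_LEN]) == 0 for i in range(n))
    return dots - n // 2

scores = sorted(((pip_score(o), o) for o in range(WHEEL_LEN)), reverse=True)
print(scores[:3])     # offset 0 (the correct setting) should carry the largest score
```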

Now, we look at the column of the Turingismus cage as the state of the “signal space” at time t (where t’s unit is chi wheel-cam turnovers). For a model of the signal space, use the PAM model: a subset of the possible set of M points in a signal constellation. For Tunny, the PAM model of a point in the space, the matrix M, is not square (though recall how Small noted it was made so (23*23) for certain counts of same and different pairings!).

At this point we get to quote Gallager’s engineering model for inner products of wave functions under a correlation transform:

image

where (from earlier chapter) one may project onto a “subspace”

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

and focus on the next point which adds in noise, giving a “parametric” world view:

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

We can think of the abstract pulse function p as a vector subspace (treating it as irrelevant which concrete function is assigned, provided it is a cyclical shift). The subspace is formed from each shift of the pulse in its cycle, in some unit time, and is something onto which one can project. Then, we note Gallager’s admonition: that the noise integral Zk is, for wave functions, the inner product of (i) the noise sequence with (ii) the subspace-of-pulses. In short, the integrated noise sequence is projected onto the pulses as they shift through the cycle-offsets (t/T) forming up their subspace.

With that in mind we can get back to Tunny breaking concerning scores of evidence (in integral pips) and the interaction with the cycles due to turnovers of the cams on the Tunny wheels. Note that the time unit t/T in the Tunny break is 1/1271, for a composite wheel X1/X2. But, what is the “measure” per pip? For this, we look for a theory and for that, first, a rationale:

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

To make sure we are not confusing a random error with the signal, we want the uncertainty of that particular event to be properly modeled, noting that we can drive the uncertainty to the minimum quantum of energy by filtering the data to consider only that which is in the tails of the distribution (n standard deviations from the mean). Of course, govt-endorsed cipher design may add to a signal not Gaussian noise but their own, ahem, “characteristic” noise… But shush!

Getting back to the modern text, let’s rewrite it in Tunny speak:

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

 

To be more specific about the relationship between M (the 1271 points of the Chi 1+2 rectangle), d (the dottage of the motor controlling stop-and-go of the aperiodic “noisy PSI”), and the variance σ² of the Gaussian noise Zk, suppose a function of d is selected to be ασ (a certain number of standard deviations), where α is “noted” to be high enough to make detection sufficiently reliable.

This of course is a summary of chi-setting, on Colossus, where the Tunny d is the d-as-distance between a signal constellation’s “nearest neighbor” points in the PSI noise stream – where neighbors are inherently separated out, as the extended PSI, by the motor stream’s pseudo-Gaussian process (controlled by its dottage).

In Tunny speak, folks converted flagged rectangles with such scores into a set of measures (in decibans/db units) in order to then perform the BP process of inference. But what was the scale of measure to be?

Gallager helps us understand the Tunny report’s own reporting, imposing the stream room model of energy:

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

 

We see the minimum energy, due to just the right distancing and thus mixing of energy sources, as Es, derived from mean-square error arguments (where we can just add/subtract easily, once in the squared world). Of course, rearrangement then allows one to figure the bits per point (which is our Tunny measuring stick, per cipher stream Z):

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

 

Gallager’s next paragraph clears up my personal confusion (held for 2 years now) on how to see Tunny’s use of SNR alongside the Shannon capacity theorem’s use of SNR. (How come 1948 alone gets to be the year of the “US” communication breakout, when clearly Friedman/GCCS folks were looking at the very same “information theory” terms deeply in 1939!) Gallager distinguishes subtly FROM that theorem’s formula:

image

Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology.

Turing didn’t care how small b was, and simply focused on the detection theory (upon which BP then builds). Shannon, for his part, happened to remark upon the bound. b is just some scaling unit, and some long decimal (which perhaps explains why the ACE and Manchester Baby designs had floating point, and counts of 1-bits in a word array, built into the very hardware, as early as 1946!!)

This is what is said in the Tunny report, albeit very obliquely:

image

http://www.alanturing.net/turing_archive/archive/t/t07/TR07.php

One can reduce the process to “normalized” measure theory, i.e. b, and for that one would use the “accurate convergence” process (rather than merely the actually-used estimated-evidence convergence process, a limiting case).

image

image

In Tunny terms, the ratio of “energy” to sigma (SD) in units of “the bulge” is the extra power required (in measure units of beta/b) due to the excess of dots/crosses.

Now how do the Tunny disclosures handle α²? And what were folks in the Tunny era really doing, when they said “find the X1/X2 cams” that make beta a maximum (apart from getting the functional output… the embryonic wheels!)?

We can look at the answer to the latter question as one of geometrically “skewing” the underlying Euclidean plane, so that the distance between the signs, when integrated under the mean-square rule, corresponds to a now-non-linearized placement of signs – a placement that maximizes the measure beta. The operational meaning is that the wheel bits (vectors which code up the needed skews) best reflect the evidence. We have taken the top vertical and left horizontal “vectors” on the originally rectangular plane and progressively lengthened and shortened them (on scales of 43b and 31b) to create a custom angle between them (whose cosine is beta). We have formed a custom inner product rule, that is, working in phase space (now thinking of points on the Bloch sphere), so beta is just the relationship between the skew-vectors, expressed in that coordinate system as (the cosine of) their angle: cos(theta) = beta, necessarily less than 1 to reflect the invariant.
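In symbols (my notation, not the Report’s), the skewed inner-product reading is just:

```latex
\langle u, v \rangle = \|u\|\,\|v\|\cos\theta,
\qquad
\beta = \cos\theta = \frac{\langle u, v \rangle}{\|u\|\,\|v\|},
\qquad
|\beta| \le 1 \quad \text{(Cauchy--Schwarz)}
```

Here u and v stand for the two progressively rescaled axis vectors of the rectangle; the bound on beta is just the usual invariant of any inner product.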

The Tunny Report does a good job, in contrast to its explanation of beta maximization, of then explaining convergence using “inference measure logic” (needed later as part of BP, recall)

image

It’s the stats of delta-D that drive the whole process, of course, so fiddling with optimal skew fields at the end of the day is about fitting chi onto delta-D, as summarized by the conditional expression for delta-D12 = dot in the power regime.

The Tunny report goes on to indicate nicely how flagging is a reduced set of accurate convergence processes, applied not to measure the vector similarity of two wheel patterns (in this custom measure metric with its unique “rod”, recall), but of two rows, when looking for starts.

Posted in colossus

distance and count of terms that differ

I like the way this guy writes, as we gear up for a big one!

image

image

http://www.scribd.com/doc/43395812/Coding-Theory

Having seen that the BP algorithm is about converging a channel’s conditional probability map for all symbols, influencing a series of weights on the interacting nodes of a Tanner graph, we are getting a profound respect for the notion of distance – be it Tunny dots D, the distance between points in a PAM signal constellation, or the d in (n, k, d).
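For completeness, the d being talked about, in a few lines of toy Python (the even-weight code here is just an arbitrary small example, nothing from the post):

```python
from itertools import combinations

def hamming(a, b):
    """Number of coordinates in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

# toy block code: all 4-bit words with an even number of 1s
code = [tuple((w >> i) & 1 for i in range(4))
        for w in range(16) if bin(w).count("1") % 2 == 0]

d_min = min(hamming(a, b) for a, b in combinations(code, 2))
print(len(code), d_min)    # 8 codewords, minimum distance 2: an (n, k, d) = (4, 3, 2) code
```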

Posted in coding theory, colossus

Small and wheel breaking

See http://wp.me/p1fcz8-3qe – it’s interesting to see the presumed bits, derived from a guessed / in delta-K for each wheel period, then counted as if on a period of 23 (psi 5) and then used in the chi1/chi2 rectangle. Flagging the underlying cages for all pairs and pair counts that are the same is analogous to formulating a standard rectangle.

Aside | Posted on by

Laurent Series, Complex Fourier, automorphisms, sigma-cycles

Rauch’s paper broaches communication theory linking series to transforms. It feels a little like the theory presentation made by Gallager, who explained engineering principles of communication similarly. One starts with continuous Fourier series, addresses discreteness, figures in Fourier transforms, worries about L2 convergence, and ends up with a series of numbers (that project onto a sinc(t) cyclic shifter)!

image

See Fourier Series, Integrals, and, Sampling From Basic Complex …

and then

image

See Fourier Series, Integrals, and, Sampling From Basic Complex …

When looking at DES, we were contemplating pseudo-random means of engineering t-resilient functions. In the field of convolutional code encoders, based on relatively simple finite state machine ideas not unknown to Turing (!), we see how the design of the state machine can bias the sequences such that minimum distances can be generated. In the world of Tunny cryptanalysis and the scoring of repeat bits (see Small), we saw how folks modeled scored probabilities in custom (analytic) cryptographic channels. Let’s see how folks say all this properly!

image

image

also citation for following, unless otherwise noted

First, our intuition on leveraging both cyclic codes and non-commutativity seems right (since both were featured in Turing’s On Permutations manuscript, with its use of the octonions, the Fano plane, parity constraints, and automorphisms of the Fano plane).

image

The notion of ideals needs some background, particularly concerning the relationship with generators for normal and dual spaces:

image

 

image

wikipedia

Further (really well written) tutorial information on ideals and coding generators can be found at Cyclic Error-Correcting Codes.

But, getting back to convolutional (or correlational) codes, we see a variant of what we saw in Turing’s code-centric On Permutations manuscript (remembering we also read about 1930s sigma-algebras, in general)

image

Of course, in Turing’s infamous Enigma work, shifts of cycles (between the compartments of boxes in a steckering relation) were of paramount importance, requiring custom measures (in decibans, of course).

Now what is that mapping? It says that a polynomial (with n terms, and a throwaway z placeholder) maps to an automorphism “function”, parameterized by z and sigma. z is the specification of the delay circuit (a multi-tap shift register acting as a custom multiplier for polynomial fields). sigma is the possible re-labelings of the nodes in, say, the 7-point Fano plane – where the set of all possible re-labelings is limited to the potential set by the mutual-dependency relation specification built into each “curve” (line) of the “design”. For rationale and intuition concerning the relevance of such constraint “networks”, just think of Turing bombe menus, or think of a Colossus cryptographer evaluating the 32-count… to see if the sub-patterns are consistent with the dual-space constraint set.

In webby terms, the automorphism keeps changing the names of the references, whose identity is stable despite re-naming. An ontology instance (the Fano plane) constrains the graph-rewriting.

Note now how, when renaming graphs, the world of constrained renaming of nodes can itself have an algebra, and said algebra can itself have properties – which is what folks are saying with sigma-cyclicity. The renaming (changing the address map of the bits in a vector) is a bit like the pulse shifts in PAM!

Posted in coding theory, crypto

turingery calculations of significance in pairwise counting

Continuing the Small analysis:

Having made columns of depths in cages (which Small calls rectangles), folks count pairwise matches. Now one wishes to test the assumption that the start that generated the assumed bits was indeed a delta-PSI’ position aligned with an actual repeat on a motor-stopped position.

image

It’s not a total surprise that folks are forming up a ratio of the random case to the “combined flag” count case.

Presumably, a similar process could be used for the relationship between motor stops and psi wheel patterns.

The rationale and basis for applying a threshold for ignoring weights is given, nicely:

image

The relationship between weight classes and scoring is given here MUCH better than in any other material I’ve seen.

Small summarizes weight classes in terms of odds, in a matching paradigm, assuming a given beta and the usual factor-of-2-to-deciban mapping rule.

image
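For reference, the deciban arithmetic behind that factor-of-2 rule (the standard definition, not a quote from Small):

```python
import math

# decibans = 10 * log10(odds); a factor of 2 in the odds is worth about 3 db
decibans = lambda odds: 10 * math.log10(odds)

print(round(decibans(2), 2))     # 3.01 db per factor of 2
print(round(decibans(50), 1))    # 50:1 odds is about 17 db
```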

Posted in colossus

Small and PSI breaking

Continuing the last memo, we can focus on the method of setting wheels much as it must have been used in the days before machines, for single wheel-set subtractor ciphers anyway.

We see our weight table come back into vogue, where it is essentially a distribution.

image

From this one derives a set of heuristics, used when teasing out two ciphertexts (dechis, remember) that are in depth.

image

Good discussion of the skills required in breaking, versus setting, vs anagramming, vs motor pattern determination. You really get a feel for what methods folks are using.

Also, some Turingery context, rather more operational than the theory. Note how it overlaps with rectangling…

image

Posted in coding theory

Small and wheel breaking

Continuing the last memo:

Small goes on to outline wheel breaking, and the math processes in particular. He assumes the use of (well known to the US) means of guessing wheel bits (probably from cryptanalysis techniques applied against Hagelin or other types of bit-wise subtractors, leveraging “crude convergence”).

What is interesting is then the further acts of convergence of sign-counting rectangles made on the fly, assuming one wheel’s bits and treating the other wheel as a unit vector.

We recall, in the accurate convergence discussion in the General Report, the derivation in which the bulge concept (1 + delta) becomes (1 + delta)/(1 - delta), and why we have the power concept. Small shows clearly how to go from pairwise counts to calculating the entropy per pip – the numeric “basis” for the set of impulses applied by setting, in turn, each other-wheel bit pattern to some bit being 1 while all others are 0.

image

It’s for this reason I always think of wheel breaking as calculating the same value as log(1 + SNR), giving the baseline entropy rate for the specific density of the cipher itself. In American terms, Small goes on to show the estimation process (and the rationale for the estimating method) for the spectral density. This is not too interesting (at my level).
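For reference, the standard Shannon-capacity expression being gestured at here (the textbook formula, not something Small writes down):

```latex
C = \tfrac{1}{2}\log_2\!\bigl(1 + \mathrm{SNR}\bigr)\ \text{bits per real sample},
\qquad
C = W \log_2\!\bigl(1 + \mathrm{SNR}\bigr)\ \text{bits/s over bandwidth } W
```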

If we look at the test applied to determine whether the guessed wheel pips have indeed helped get out a good estimate of the other wheel, they seem to be doing what we see in the BP method at the end, after all the rounds of inference. And, they apply a threshold.

We can look at the scoring of each “component” impulse as similar in BP to gathering the beliefs from each of the parent nodes (before passing on some accumulation onto children). The factor graph is such that all variable nodes connect to all impulse nodes, taken one inference calculation at a time of course, in a serial evaluation model.

But Small conveys, in his tutorial style, numerous interesting things: that wheel breaking and forming up the factor graph is a very interactive process, involving multiple rectangles and multiple messages – done to build up the weighted space. And it’s clear the US had not been doing anything like the same thing – which begs the question: what DID they do, beyond crude convergence of rectangles to then leverage them?? Did the bits just help them cut down brute-force searching a bit!!?

Posted in colossus

Small report on Colossus

In the General Report on Tunny, till now I always struggled to comprehend the disclosures on how folks modeled the machine as a cross-over channel (one thinks BSC, etc.). The Small report makes it intuitively very obvious!

image

image

First, even though it’s implied in the Tunny report, the stats on D were collected via the Testery process. Note also how Small gives D a rather simple operational interpretation (compared to the dechi conception): it is “pseudo-ciphertext”. Having read the Tunny report, I always thought of it in its key-centric conception: that key from which they had removed the multi-periodic Chi. Small focused on later GCCS conceptions, after the era of using key depths.

Now, we must remember our weight classes that extend the model above. For in the later period, where one really is filtering cipher (rather than key), the channel model also considered the cross-overs for each weight class (allowing Colossus then to search and count up for those filtering criteria, specifically). This extended model reminds us of a q-ary symmetric channel, where q is the number of weight classes in the energy spectrum rather than q = 26 or q = 2. Whereas a basic q-ary channel is like a group whose normal subgroup allows one to formulate coset partitions (think weight classes, now), consider now the quotient group formed when the cosets are themselves the group elements, giving one a q = 6 (or so) cosets/weights Cayley table/graph – and a custom cross-over model.
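A minimal sketch of the q-ary symmetric channel being invoked, with q standing in for the number of weight classes (the value q = 6 and the cross-over probability are invented for illustration):

```python
import numpy as np

def q_sym_channel(q, e):
    """q x q transition matrix P[y | x]: keep the symbol with probability 1 - e,
    otherwise replace it uniformly by one of the other q - 1 symbols."""
    P = np.full((q, q), e / (q - 1))
    np.fill_diagonal(P, 1 - e)
    return P

P = q_sym_channel(6, 0.2)       # q = 6 "weight classes", cross-over probability 0.2
print(P.round(3))
print(P.sum(axis=1))            # each row sums to 1
```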

Now, despite having done a great job of motivating 2 of the lines in the table, Small doesn’t focus on the remaining lines (though he prints them):

image

Now, what Small’s particular phrasing does do is point out the “pairwise” counting process – which reminds me for the first time of the counting theory used in Turingismus (and the repeat rate for pairings, in Enigma hand methods’ search for depths). One thinks of delta’ing in terms of its counter-measure rationale (since it was built into various of the machine’s countermeasures’ security-strength concepts, recall). But here, the concept is a “simple observation” phenomenon, usable as in Banburismus and Turingismus decoding or sieving strategies for max-likelihood ordering.

image

With weight classes from JUST the upper set denoted, (partially) matching the weight table above.

While the Colossus room might have been focused on leveraging “fits with distributions” for the filtering problem, it’s interesting to see the comment concerning hand methods:

image

Newmanry-centric folks don’t seem to have captured well, in their post-war reporting, just how much stats tables (and scoring, presumably) were driving “anagramming”. The associated “method” material seems to be still largely withheld. But we can think of it as the means of choosing the univariables ( g(ki) ) in the BP method, so as to project the de-chied cipher (into its two components).

Small conveys the relevance of the conditional expressions nicely, building the rationale on top of the stats of the de-chi’ed cipher (which he calls D). These expressions do not have to align in any way with the nature of the weight classes, note; they are simple stats.

image

The next table, and his phrasing, are also instructive. Rather than thinking that using a pair of conditions is about applying 2 filters SEQUENTIALLY when focused on the 1 and 2 streams, say (a fact which the factor/product graph has to account for), he is simply saying that the / means one can do 2 runs at once (usefully, too).

image

image

The particular case he points out is super-interesting, as one makes settings on the Colossus panels to represent the condition(s). One sees 3 conditions applied: count/filter for 4 being dot or cross; in parallel count 5 xor 4 and count 3 xor 4, filtering the latter two for having value dot and cross respectively.

Now, what Small does not talk about is the notion of sigmage as the ratio of the bulge beta to the SD for the random-noise case. If I’ve got it right, beta varies by conditional expression, and of course the Gaussian SD is obtained from counting a random tape (enabling the table to be specified). So how did they get the betas per expression (just count the excess of dots, perhaps?)
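A back-of-envelope sketch of that sigmage ratio, with invented counts and a chance level of 1/2 assumed for the condition:

```python
import math

n = 12_000                      # characters examined in the run (invented)
expected = n / 2                # chance level for a dot-count condition at 1/2
observed = 6_350                # counted dots satisfying the condition (invented)
sd_random = math.sqrt(n) / 2    # SD of a fair-coin count: sqrt(n * 1/2 * 1/2)

bulge = observed - expected
sigmage = bulge / sd_random
print(round(bulge), round(sd_random, 1), round(sigmage, 2))   # 350, 54.8, 6.39
```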

Now, obviously, the table of expressions is sorted by “expected sigmage”, about which Small talks in plain language:

image

But it’s the next few phrases that are especially interesting – given he is delivering a “tutorial” on the “cryptographic art” to his colleagues back at Arlington Hall. We should think of variance here as an “uncertainty” measure – to give it an operational meaning – that itself reflects the impact of that tiny (phase-space) oscillator due to the bulge. Under such an interpretation, the variance of the run is the (linear) sum of the uncertainties from the characters specified by the conditional expression – which are NOT weight classes, but unique little non-linear truth-tables/functions:

image

Under such an interpretation the non-linear truth table (of a run) really is a graph, with weighted arcs from the weight classes to the target filter – with the weighting assigned to the arcs reflecting the degree to which the particular weight class impacts the stats (through its representatives referenced by the expression). Of course, a class might be just that which is known to be significant:

image

I’m not sure where he then gets 16 from, when he talks about 16 messages:

image

But it seems clear that he is essentially referring to each line in the truth table that substantiates the estimated sigmage of the run class. Perhaps 16 = 2**4, since there may be 4 impulse streams involved?

When we follow his tutorial, looking (post X1X2 setting) at X3X4X5 setting mid-process, how does Colossus know, for the red-noted material, say, to ONLY print the setting of K3!!?

image

 

The ultimate smell test of all assumptions and their consequences is human intuition, applied in the 32-count; the smelliness factors involved are nicely expressed by Small:

image

As Good said, it’s the Gray-code ordering that enables an inference from the error pattern – to predict which settings of the 5 were not correct (so go and look at the next best scores…).
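For reference, the standard reflected-binary Gray ordering Good’s remark relies on (a textbook construction; whether the Colossus listing order was exactly this is not something this sketch decides): adjacent entries differ in exactly one impulse, which is what lets a single mis-set wheel show up as a systematic re-distribution between neighbouring counts.

```python
# reflected-binary Gray code for the 32 teleprinter characters
gray = lambda i: i ^ (i >> 1)

order = [gray(i) for i in range(32)]
print([format(x, '05b') for x in order[:8]])   # 00000, 00001, 00011, 00010, ...
# adjacent entries differ in exactly one bit
assert all(bin(order[i] ^ order[i + 1]).count('1') == 1 for i in range(31))
```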

It would be fascinating to know how folks were trained in all this… or rather, how their experience and abilities to apply smell tests were “pre-honed”. That sounds like a Turing trick!

But it all just reminds me of today’s home-fiasco. Someone wanted to rush out and buy a new water heater, since there is so little hot water for days now (at 15F outside!). Having whined earlier and turned up the dial to max, it turns out someone turned it all the way down (not all the way up) – on the assumption that you turn a dial clockwise… to raise temp. Hmm. Given the probability of exactly such a binary error – when humans are involved – is high, how did such human frailties affect operator training and abilities!?

I’d be interested to see the original now, with its original notation.

Posted in colossus

BP 1941 and Lincoln 1953

image

Image | Posted on by