## Turing’s argument on limiting distributions

In his “On Permutation Groups” paper, Turing uses the term “limiting distribution” in a way that confused me for a long time. In his world, each “evolution” of the wave function produces a limiting distribution: there is one line for each evolution step, and each line is itself called a “limiting distribution”. Furthermore, as the distributions evolve, the n’th limiting distribution eventually shows the defining property of the ideal limiting distribution: the constancy of its terms. Under continued application of one and the same generator (j, below), the output set is a shift of the input set, for a field of size s = 4.
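A minimal sketch of that idea (my own toy model, not Turing’s notation): treat each evolution step as applying a fixed generator j to a distribution over a field of size s = 4. Each step just shifts the distribution, and the constant (uniform) distribution is the fixed point whose terms never change.

```python
# One "evolution step" applies a fixed generator j to a distribution over
# a field of size s = 4: the new distribution is a cyclic shift of the old.
s = 4  # field size from the text

def apply_generator(dist, j):
    """One evolution step: new[x] = old[(x - j) mod s], i.e. a cyclic shift."""
    return [dist[(x - j) % s] for x in range(s)]

dist = [1.0, 0.0, 0.0, 0.0]   # all mass on element 0
for step in range(3):
    dist = apply_generator(dist, j=1)
print(dist)                    # each line of the "evolution table" is a shift

uniform = [1.0 / s] * s
print(apply_generator(uniform, j=1) == uniform)  # constant terms are invariant
```

The second print shows why constancy is the “ideal” property: the uniform distribution is unchanged by any further shift.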

Much of his argument now makes sense to me.

We have to look at H1 as the normalizer of the 7-point geometry – being the maximal normal subgroup. It’s the circle in the middle! It commutes with every other point. The phrase “1-dimensionality being 1” refers to the idea that we are concerned with points on the edge of the space – imagining the circle within the 3 straight lines of the geometry as a Bloch sphere inside a 3d pyramid. Even though the vector space addresses any point on or within the sphere, we are concerned only with those translations that keep a point on the edge on the edge after application of the operator. The translation is “closed” – and the 1-d function being 1 basically says: 1 if a member of the (edge) set, and zero otherwise.
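That “1 if a member of the (edge) set, and zero otherwise” function can be written out literally (a trivial sketch; the set and names here are my own placeholders):

```python
# Indicator function of a hypothetical "edge" set: 1 on the edge, 0 elsewhere.
edge_set = {"e1", "e2", "e3"}   # placeholder boundary points

def indicator(point):
    """Return 1 if the point lies in the edge set, 0 otherwise."""
    return 1 if point in edge_set else 0

print(indicator("e1"), indicator("interior"))  # 1 0
```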

Now look at expressions of the form a·b⁻¹ as the archetype-specification of a walk through p-adic branching space. a is a particular color and b is the color found once the walker has branched as he walks forward in time. p, in p-adic, is the number of wheels in the cage – and is the “reach” of the branching network. One can view a walk in the p-adics as a walk backwards through time – looking back through the past history of the branching. Thus, we can see such a walk

as starting in the future at 1, and walking back n−1 steps to time n (in the past, but indexed as 1…n, just to be confusing). This gives us a visual inner-product space, in which each line of the “V” is a p-adic number, any one digit of which may be color a or color b (based on the directional choices taken as one branches). Unlike Susskind’s 2-ary branching space for qubits, Turing is working in qudits (where the up-to-n branch choices at any branching point are restricted to being one of the k generators in the Cayley graph overlaying the possible points in the group’s inner-product space).

Now, when a limiting distribution k (each k being a line in the evolution table, recall) is acted on by a norm-scalar (φn) derived from the wave function (H1) to give the next evolution, let’s recall that k is also the next line.

Thus k (on the left, having 2 parameters whose terms are past and future) is the successor distribution to the k on the right (of one term, the current branching future), which is evolved under the action of H1. Remember we are walking backwards, from the endpoint (future) to the common past… The branching possibilities – characterized by the norm – are contracting as we walk back towards the apex of the V, until they reach the identity, where further walking reaches an accepting state.

Recall that φ can be viewed in 1-d numbers, which are thus scalars, which thus “re-scale” the norm of each limiting-distribution line – as the wave function produces each evolution and induces the new (scaled) norm.

Turing’s argument then all makes sense!

1. Since k (on the left) shall be k (on the right), obviously the factor must be 1 – when there is no mutual information between the past and future. It must be real, since the k vectors are real. It must be positive, furthermore, since we don’t want it to flip the parity.

2. Alternatively, we can stay in group theory and recall that for cyclic groups, some number of actions (e.g. m of them) eventually brings you back to the starting point, which means you have passed through the unit element (1). Thus a series of m rescalings is the same as the rescaling due to the m’th element.
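Assuming the rescalings live in a cyclic group of order m, that collapse of a chain of rescalings can be checked directly (a sketch; m and φ are chosen arbitrarily here):

```python
# In a cyclic group of order m, applying the same element m times returns you
# to the identity, so a chain of m rescalings collapses to a single rescaling
# by the m'th power of the element.
import cmath

m = 6
phi = cmath.exp(2j * cmath.pi / m)   # a scalar of multiplicative order m

chain = 1.0
for _ in range(m):
    chain *= phi                      # m successive rescalings

print(abs(chain - 1.0) < 1e-9)        # back at the unit element
print(abs(chain - phi ** m) < 1e-9)   # same as one rescaling by phi**m
```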

3. g(x) refers to the frequency of characters input to the wheel initially, recall (i.e. the plaintext, logically). In much the same way that operators act on wave functions (to produce new wave functions), wave functions act on states to produce new states – which is what g is: the input state to the just-re-normed operator. Of course, the output state is one of (or perhaps better, the next of) the limiting distributions. g is always positive (since at least one of the characters has non-zero frequency); so φ must therefore be positive – in order NOT to reduce the product to zero and NOT to flip the parity. For the very first limiting distribution (the starting distribution of the evolution) to be a k derived from g, we need k = g, which means the factor is just 1.

4. Finally, and this was the hardest of all, this means g has to be constant – which means that each of the terms in g is equal (and they sum to 1) – given the constraint that k = 1·g. This means g is a set of fractions, each 1/n.

It’s all constant through each coset of H1 because, as we learn next, we can use the very same character (generator) on the input tape each time as the amount of translation to apply to the H1 subgroup. Products of generators represent the different terms of H, which when applied to H1 enumerate all the cosets…
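As a concrete check (my own encoding, not from the paper; I take H1 to be the centre {1, −1} of the quaternion group Q8, matching the quaternion example later in the text), translating H1 by products of generators does enumerate all the cosets:

```python
# Quaternion group Q8, encoded as (sign, symbol) pairs with symbol in
# {1, i, j, k}. H1 = {+1, -1} is the centre; translating it by products of
# the generators i and j enumerates all four cosets.
TABLE = {
    ("1", "1"): (1, "1"), ("1", "i"): (1, "i"), ("1", "j"): (1, "j"), ("1", "k"): (1, "k"),
    ("i", "1"): (1, "i"), ("i", "i"): (-1, "1"), ("i", "j"): (1, "k"), ("i", "k"): (-1, "j"),
    ("j", "1"): (1, "j"), ("j", "i"): (-1, "k"), ("j", "j"): (-1, "1"), ("j", "k"): (1, "i"),
    ("k", "1"): (1, "k"), ("k", "i"): (1, "j"), ("k", "j"): (-1, "i"), ("k", "k"): (-1, "1"),
}

def mul(a, b):
    """Multiply two Q8 elements given as (sign, symbol) pairs."""
    sa, xa = a
    sb, xb = b
    s, x = TABLE[(xa, xb)]
    return (sa * sb * s, x)

H1 = {(1, "1"), (-1, "1")}            # the centre of Q8
i, j = (1, "i"), (1, "j")

cosets = set()
for t in [(1, "1"), i, j, mul(i, j)]:  # products of generators
    cosets.add(frozenset(mul(h, t) for h in H1))
print(len(cosets))                     # 4 cosets of size 2: all 8 elements
```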

OK, we can now move on to the last point in the argument:

Here, he is justifying why the constraints upon the initial settings of the wheel drum matter – which is what H1 is all about. The basic point is that H1 is closed (see point 2). The orbit is restricted to the subgroup and its cosets.

So for an input a and output b, b is just a function of the input (i.e. a is both the input and the name of the “input function”). b is any member of the group, possibly not in the H1 subgroup. However, the point is that any action of one or more generators shifting the overall parity one way must equal those shifting the parity the other way, for the operation to be closed. See point 1. This is basically the unitarity rule – in the funky world of conjugates and adjoints.

Now, looking at a particular generator – the i of the quaternion group is U1 in his later example – imagine an input tape on which the characters are ALL of value i, which means that i has ALL the probability of the input distribution g. This is not a problem, since powers of any generator give every element of the cyclic subgroup it generates, anyway.

So it makes sense when he then says

that the nominal input character a is represented by the product of the subgroup and some generator to the m’th power (see point 1). Since this is the parity-changing “function”, it must be matched by an a⁻¹ of the same (adjoint) power.

Point 2 is also critical – and his original formulation was clearer. When g is the input distribution and g consists of a single (repeated) generator (i.e. i in the quaternion group), then g(i) has all the probability (i.e. is 1). Obviously, all the other characters that might play the role of a each have 0 (since all the probability is taken already).

So, when we apply the i generator n times and consider the branching rules and where we can thus walk to, we have all the points in p-adic space that are reachable. Thus, for each possible branching path, we can count (as we did a year ago) which paths land where, and thus divvy up / sum up the probabilities of the possible paths.
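A toy version of that count (my illustration, modelling the subgroup as the scalars ±1 and the generator i as the complex unit): with all of g’s probability on i, every branch at every step chooses i, so after n steps each path lands at iⁿ, and the reachable set is the coset H1·iⁿ.

```python
# Toy path count: all of g's probability sits on the single generator i, so
# every branch at every step multiplies by i. After n steps each path lands
# at i**n, and the reachable set is the coset H1 * i**n.
H1 = {1, -1}        # modelling the subgroup by the scalars +1 and -1
i = 1j              # the generator i as the complex unit

for n in range(1, 5):
    coset = {h * i ** n for h in H1}
    print(n, coset)  # the cosets cycle: {±i}, {±1}, {±i}, {±1}
```

Note that with ±1 standing in for H1, the cosets repeat with period 2: i² = −1 already lies back in H1, which is the closure the next point relies on.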

Since, point 3, all we have done is consume H1 (the commutative set of generators) and lots of i (yet more generators), any place we CAN walk to (under the non-commutativity) gets SOME probability, is thus not zero, and is a function of ALL generators. In group-theory terms, it’s in the coset H1·iⁿ, which we know is a closed expression – so the resulting coset is the same coset.

His focus on calculating s concerns the way that there is a periodic cycle, formed by the commutator, as we saw at the outset, above.

Presumably, the point he was making to his audience was that it is not the 8 elements of the quaternion group that are really of interest; equivalently, it’s the wheel wiring and its non-exceptional group that one wants. Then one has a larger space to walk through, which means a longer period of non-predictable generated tape – to be used as an additive cipher for 7 minutes of transcoded voice, perhaps. We see clearly that 4 × 2 gives the 8 terms, and that his rules for finding suitable wheels exclude those generators (in permutation-group land) which commute with half the number of p wheels.

It was useful to have seen the stuff on Gram–Schmidt kernels and the notion of operator norms, etc. One sees clearly how, in Turing’s mind, he was working with quantum theory, even as he used the (really “awful”) pure functional-analysis math of the day to talk about convergence etc. Fortunately, I got to see the final product of that, getting the engineer’s presentation (of all the abstract math reasoning) of just what is so fun about L2, in particular.