## Entanglement and Turing’s limiting distribution

In his lecture on entanglement, Susskind helps us once more with cryptanalysis and cryptography – thought about in terms that Turing probably used. For in the lecture we see presented, by a master theoretician who clearly states the conceptual motivation behind each proof step, the basic elements of entanglement. If you first think in these terms to get the elements of a proof, then the symbols flow easily.

Having already shown that he can build an order-168 group based on the 7-point geometry out of a series of Enigma-style drums/rotors and permutation-group representation properties, in his “On Permutations” manuscript Turing then introduces probability distributions and convolutions. His distributions have properties such as those of g() – the distribution of alphabet characters introduced to the first wheel from a sample (whose frequency counts define g()); and of f() – the output distribution of the wheel, for each output wire.

Towards the end of his argument, as he seeks to prove properties of limiting distributions, he introduces a “one-dimensional projection argument” that we struggled to understand. I suspect Susskind outlines the theory behind it for us, in what is elementary quantum-mechanics math concerning the relationship between wave functions, probability densities, and real-function analysis (open sets, etc.).

First, Susskind gives us standard definitions concerning expectation values calculated by linear operators. To do so, he expands the bra and the ket in terms of a set of chosen basis elements, such as label-a’ for a bra (and label-a for a ket). He goes out of his way to use the term “label” for such eigenkets/eigenvalues, hinting perhaps at automorphism-group properties (automorphism groups being essentially about re-labeling). However, Susskind does all this elementary expansion in a particular context, one in which the state is composite – one sub-system state from Alice and another from Bob. Thus, the basis label for Alice (associated with the bra variety of the state) is a’ (for want of a suitable symbol name, the apostrophe hinting at its being a bra basis label); and that for Bob (associated with the ket) is a (also an a, to hint that it aligns with the label used by Alice, but of the ket variety, having no apostrophe). Obviously, there could be several basis terms, such as ab, or a’b’, if Alice and Bob each use multiple labels for their bra and ket basis components.

Then he rearranges the now-expanded terms of the second line above. He collects the wave functions together, and then recasts the L with the <a’| and |a> as a single matrix term, La’a – the matrix form of the averaging operator.
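The rearrangement above can be sketched numerically. This is a minimal numpy sketch in my own notation (not Susskind’s exact symbols): the “wave function” is just the array of expansion coefficients psi(a) = <a|psi>, and the average <L> is the double sum over the matrix La’a.

```python
import numpy as np

# <L> = sum over a', a of  psi*(a') L[a', a] psi(a)
# psi holds the expansion coefficients (the "wave function");
# L holds the matrix elements L[a', a] = <a'|L|a>.

psi = np.array([1.0, 1.0j]) / np.sqrt(2)   # example wave function psi(a)
L = np.array([[1.0, 0.0], [0.0, -1.0]])    # example operator (sigma_z)

expectation = np.einsum('a,ab,b->', psi.conj(), L, psi)
# identical to the compact matrix statement psi.conj() @ L @ psi
```

For this particular psi the up and down coefficients have equal weight, so the sigma_z average comes out zero.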

Earlier, Susskind motivated both “wave functions” and the matrix form of operators such as La’a, summarized very simply. First, the “wave function” is just the function that calculates the expansion terms; nothing more. That the various terms of such a function may themselves be operated on by “macros” that produce additional expansion terms as the combinatorics of the matrix operation act (a bit like waves cohering to produce ripples) is incidental (but a useful analogy).

As he says, and I paraphrase, if you know the wave function and the state vector, one has a “computational medium”. For Susskind, good candidates for a wave function involving, say, 3 basis states/labels (e.g. ‘a’, ‘b’, ‘c’ in ket convention) would be the up/down, right/left, in/out eigenvectors of the half-spin model (the Bloch sphere), giving a wave function of the form ψ(abc) – one that “acts on” the (multi-label) basis vector |abc>.

Susskind motivates introducing the matrix form of an averaging operator by definition. If one has an L operator act on some ket (with label a), and projects the result onto a bra (with label a’), one gets a matrix element defined as La’a – just a notational change from the braket “wave function” style of statement.

So let’s summarize the “elements” – stated without fanfare. What matters is that you have a wave function: a macro-style formula that outputs expansion coefficients, given an input of an earlier round of such expansion coefficients. You have a class of “expected value” operators, acting both right and left on ket and bra, with their associated basis labels. And you have a matrix notation for such averaging operators, in which the basis labels index the rows and columns of the matrix. It may take 2 labels to index a row (or column), if one is using multi-label wave functions for each bra/ket. A good example of a multi-label world is one in which Alice has label a and Bob has label b. Their “shared” matrix is simply indexed by Alice’s a rows and Bob’s b columns. If there are two terms each for Alice and Bob, then one might have a “shared” matrix indexed by Alice’s a’b’-named rows and Bob’s ab-named columns.

Next, we see a higher-level projection theorem applied to fashion “composite states” and their associated “composite wave functions”. In the composite form of the matrix, each row/column requires two indexing terms. The state can then be projected onto Alice’s view of the Hilbert space (as it can onto Bob’s) – which indexes her matrix using only 1 indexing term (she does not really know about Bob). If one has a matrix averaging operator L acting on the composite state (a two-term formula, for both bra and ket), note that one needs a similar 2-term form for the projected-onto space. But, though one keeps the form, the wave-function component loses an indexing basis label (either Bob’s or Alice’s, depending on perspective) – which gets re-attached to a throw-away “identity / Kronecker delta” operator that acts (in a no-change fashion) nominally, on nominal variables of the other party.
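That “operator on Alice’s side, Kronecker delta on Bob’s side” structure can be checked directly. A sketch under my own notation (the singlet state and sigma_z are illustrative choices, not taken from the text): the einsum sums Alice’s labels against L while Bob’s label simply rides through unchanged, and this agrees with building the explicit L ⊗ I matrix.

```python
import numpy as np

# Composite wave function psi(a, b): Alice's label a, Bob's label b.
# An operator L acting only on Alice's side is really L (x) I:
#   <L (x) I> = sum over a', a, b of  psi*(a', b) L[a', a] psi(a, b)
# -- Bob's label b is "contracted" by an identity (Kronecker delta).

psi = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2)  # e.g. the singlet
L = np.array([[1.0, 0.0], [0.0, -1.0]])                 # sigma_z for Alice

avg_alice = np.einsum('ab,ac,cb->', psi.conj(), L, psi).real

# The same thing done the long way, with an explicit identity for Bob:
full = np.kron(L, np.eye(2))       # L (x) I on the 4-dimensional space
vec = psi.reshape(-1)              # flatten psi(a, b) to a 4-vector
avg_full = (vec.conj() @ full @ vec).real
```

Both routes give the same number, which is the point: keeping the 2-term form costs nothing, because the partner’s half of the operator is a no-change identity.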

OK. So we have a world of quantum computing.

Next, Susskind eliminates the need even to think about “wave function collapse” by simply modeling the observer (doing the measurement acts) as a third component of the (now 3-subsystem) composite system. Of course, one could retain a 2-party (composite) system in which one of the parties acts as the observer.

Susskind then moves on to product states, giving us a good feeling for why we should care (i.e. it makes no difference, provided one has a product). Things change little, except that the composite wave function is clearly factorized into 1) Alice’s wave function and 2) Bob’s wave function. Both terms are retained as one projects onto each of the 2 constituent Hilbert spaces. However, the terms due to the partner’s wave function (summed over all bra and ket basis labels) reduce to 1.
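The “partner’s terms reduce to 1” claim is easy to exhibit. A minimal sketch, assuming my own example factors alpha (Alice) and beta (Bob), neither taken from the text: for a product state psi(a, b) = alpha(a)·beta(b), the composite average of an Alice-side operator equals Alice’s own average, because Bob’s normalized factor contributes sum over b of |beta(b)|², which is 1.

```python
import numpy as np

# Product state: psi(a, b) = alpha(a) * beta(b)
alpha = np.array([1.0, 1.0]) / np.sqrt(2)    # Alice's wave function
beta = np.array([1.0, 1.0j]) / np.sqrt(2)    # Bob's wave function
psi = np.outer(alpha, beta)                  # composite psi(a, b)

L = np.array([[0.0, 1.0], [1.0, 0.0]])       # sigma_x on Alice's side

# Average taken in the full composite space...
avg_composite = np.einsum('ab,ac,cb->', psi.conj(), L, psi).real
# ...versus Alice's average computed with no reference to Bob at all.
avg_alice_alone = (alpha.conj() @ L @ alpha).real
```

The two numbers agree exactly; Bob’s factor has silently summed to 1 and dropped out.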

Now, the value of focusing on the product state – a product of two wave-function factors of a composite wave function – is shown when next considering, similarly, a product of two expectation operators. One gets to the notion of correlation.

For a product of operators, Susskind has us focus on something “very Turing”: the difference (even semantically) between a product of averages (<L><M>) and an average of products (<LM>). A major point is that when these two differ, one has “encountered” entanglement – and the difference in the measure is known as a correlation. A non-zero measure implies entanglement.
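The <LM> − <L><M> measure can be computed in a few lines. A sketch in my own notation (L and M both chosen as sigma_z, one on Alice’s side and one on Bob’s; the unentangled |ud> state and the singlet are my illustrative inputs): the correlation vanishes for the product state and is non-zero for the entangled one.

```python
import numpy as np

sz = np.array([[1.0, 0.0], [0.0, -1.0]])

def correlation(psi):
    """<LM> - <L><M> for L = sigma_z on Alice, M = sigma_z on Bob."""
    avg_L = np.einsum('ab,ac,cb->', psi.conj(), sz, psi).real   # Alice only
    avg_M = np.einsum('ab,bd,ad->', psi.conj(), sz, psi).real   # Bob only
    avg_LM = np.einsum('ab,ac,bd,cd->', psi.conj(), sz, sz, psi).real
    return avg_LM - avg_L * avg_M

product = np.outer([1.0, 0.0], [0.0, 1.0])                # |ud>, a product state
singlet = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2)

# correlation(product) is 0; correlation(singlet) is -1
```

The singlet’s correlation of −1 is exactly the number the next paragraphs dwell on: the average of the product is −1 even though each individual average is 0.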

We also get a glimpse of what we might call a Tunny-moment, in the same sense. Susskind notes how the average of certain operators is all about either <ud| having 1/2 probability (or else |du> also having said probability). What he also says is rather relevant to crypto: despite knowing everything about the composite system, one knows nothing about each component system.

He builds on this theme to introduce intuition about the correlation measure, as it relates to entanglement. He wants to point out that the average of the product is –1 (whereas each individual expected value averages 0). He goes to quite some trouble to point out how in some cases knowledge of one state gives information about another (related) state, but not in others. This is very different from the classical world, where knowledge of the whole gives knowledge of the components (and any combinations of components), and in which, furthermore, the “amount” of information is “higher”.

This all leads up to a crypto principle, nicely characterized in terms of entanglement (and the difference/similarity ideas we already saw in dots vs crosses, or dots versus random, in Tunny theory).

For every direction (component) you take (sigma.x, sigma.y, sigma.z, principally), the average is zero (50.10 minutes). Upon entanglement, there is no direction that is definite.
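That “no definite direction” claim can be verified for the singlet in one loop. A minimal numpy sketch, my notation again (the singlet state and the three Pauli matrices are standard, but their use here as a check is my own): Alice’s average spin along each of the three axes is computed against the composite state, and all three come out zero.

```python
import numpy as np

# The three Pauli matrices -- the sigma.x, sigma.y, sigma.z directions.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

singlet = np.array([[0, 1], [-1, 0]], dtype=complex) / np.sqrt(2)

# Alice's <sigma (x) I> for each direction: all zero for the singlet.
avgs = [np.einsum('ab,ac,cb->', singlet.conj(), s, singlet).real
        for s in (sx, sy, sz)]
```

No single-axis measurement on Alice’s side reveals anything; her spin has no definite direction, in every direction.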

Now, when I heard that, for the first time late one night, I immediately thought of DES – in which the two “states” being entangled were the DES key and the DES plaintext. Their “product” (i.e. the ciphertext) has average 1 in any direction. Now, I find that a lot more “tenable” than the stuff about affine planes, etc. Furthermore, it is the very non-commutativity of the algebra which means that neither Alice nor Bob can know sigma.x and sigma.y at the same time – which prevents them, verily, from knowing the linear combination of sigma.x, sigma.y, and sigma.z. Furthermore, the very energy process involved ensures that the poles attempt to make themselves anti-aligned (in every direction).

Finally we get to the notion of the “probability density” (which is the distribution k, in Turing’s model). Note how there are two “terms”, with a common b but different a/a’, before the density.

Having introduced the notion, he notes how the density might be factorized, in which case one sees (from Alice’s perspective) that there is no contribution from Bob.

Note then the claim: Alice’s probability density matrix has one and only one eigenvector (namely ψ(a)) with a non-zero eigenvalue.

The original wave function is an eigenvector of the density matrix, with eigenvalue 1. Considering next all eigenvectors perpendicular to this one, the eigenvalue is zero. That is, we have a single path (in the Cayley graph).

When we are not dealing with product states, there are multiple eigenvectors with non-zero eigenvalues. This tells you about the “measure” of entanglement: the more eigenvectors there are with non-zero eigenvalues, the “more” entangled the state is.
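The eigenvalue picture is easy to make concrete. A sketch in my own notation (the |ud> product state and the singlet are my illustrative inputs): Alice’s density matrix is rho[a’, a] = sum over b of psi(a’, b)·psi*(a, b) – the two “terms” with common b and different a/a’ noted above – and its spectrum distinguishes the two cases.

```python
import numpy as np

def alice_density(psi):
    """rho[a', a] = sum over b of psi(a', b) * psi*(a, b)."""
    return np.einsum('ab,cb->ac', psi, psi.conj())

product = np.outer([1.0, 0.0], [0.0, 1.0])                # |ud>
singlet = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2)

eig_product = np.linalg.eigvalsh(alice_density(product))  # ascending order
eig_singlet = np.linalg.eigvalsh(alice_density(singlet))

# product state: spectrum (0, 1) -- one non-zero eigenvalue, no entanglement
# singlet:       spectrum (1/2, 1/2) -- two non-zero eigenvalues, maximal
```

The singlet’s spectrum (1/2, 1/2) is the n = 2 instance of the uniform 1/n spectrum discussed next: the maximally entangled case.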

The maximum is, of course, Turing’s infamous uniform distribution, in which there is a maximum number of eigenvectors, each with eigenvalue 1/n (a surrogate for the maximum amount of entanglement).

And we get to see why he chose k as his right-hand term (being the eigenspace).