The ACM hosts forum in which lectures from its Turing Award winners are delivered, to the illustrious. See http://amturing.acm.org/lectures.cfm.
In short, only in 1965 was the domain of “computational complexity” invented (by an American, of course). And so say we all, the also illustrious. But is it true? Lets look at history, and some of the first “codes on graphs” from Turing himself. TO be fair, the author only implies that he was the first to “call” something by a name (that gets popular). But, why do we care (who [also] named something, or who [also] called it something else)?
Now, I accept – since I got to study it (as an awful-grade student) – that the American automata-analytical world for studying complexity was rigorous, and is useful. But, it always felt like a refined-academic model, designed for analysis (and the pursuit of pure research). For that purpose, it seemed great (and largely beyond me). I do remember the thrust of my 6 week course in complexity analysis — that hinted at the type of Turing machines discussed below. But the hint was very strained…behind layers of theory supporting only a “mathematician’s analysis”. We were studying the language of analyzing complexity, rather than complexity as found by engineers in actual decoding graphs automated by little machine types. We were computing 1Z + 2Z = 3Z in group theory, rather than just saying 3 = 1 + 2 to a 5-year old learner.
Let’s see what happens when we go back in time, and talk to the Turing machine itself.
In 1948, we see Turing himself taking a very straightforward approach to measuring work for one particular highly generalizable process (matrix math, supporting the operator theory and the world of digital filtering/transformations):
See Off-print, ‘Rounding-off errors in matrix processes’ from The Quarterly Journal of Mechanics and Applied Mathematics, (Vol. I, Part 3, Sept. 1948).
Now, we also remember Forney hinting at the same general approach of counting primitive operations (implementing a custom algebra, or algorithm, tuning up a decoding machine). Decoding is of course rather more general, than the analysis centric automata-theory for strings. Everyone time I read Forney’s material, I keep thinking: he is (now) characterizing all that Turing didn’t actually say about late 1930s and early 1940s design work (but was thinking, albeit probably in different analytical frameworks based on permutation theory).
We know from Forney’s MIT course lecture notes (see here, chapter 6) that in the 1950s folks were counting operations (what a surprise) for the world of decoding. That world was very much about engineering (applying math), rather than analytical math for its own sake.
David Forney, MIT, lecture notes
This study of decoding is interesting because its 1950s relevant and seems to capture the shannon/turing/newman/good “cryptanalytical” spirit – the end of the 1930s work on foundations of math that grew into concrete math-engineering (with L2 spaces, complex sphere, hypercubes, and graph based handling of quantum-state representation/measurement). We seem to have lost so much in computer science (as folks trawl through endless notational analysis about, ultimately, strings). Strings seem so boring, compared to time and frequency! We have engineers talking about 1930s concepts, of metric spaces, normed spaces, orthogonal transforms, etc.
When talking about moderate complexity decoders, Forney discusses maximum-likelihood decoding, analysing complexity in terms of states versus branching. Wasn’t Turing interested in “branching” in lattice generators, too (i.e. phyllotaxis)? Were not the imagined machines and real computing devices of 1947 very much interesting in the cost of branching?
Later on in the course, once the trellis has been generalized to general graph theory, don’t we see a design method being taught, that reduces the graph specifically in a manner that does tradeoffs between performance and complexity – thereby characterizing different complexity classes? Don’t we see analysis of what happens when we decode convolutional vs block codes – in terms of complexity?
Getting back to matrix methods and errors, doesn’t this all remind one of the (weight-centric) of the union bound estimate world (capturing the relationship between coding gains, etc and Gaussian error Q function)?
Anyways, it seems pretty clear to me that the concepts and tools for considering the complexity of “computational graphs” was in evidence long before 1965. When we look at the topic of handling codeword generators, we recall of course the founding work:
Extract, ‘On computable numbers, with an application to the Entscheidungsproblem’ from Proceedings of the London Mathematical Society, (Ser. 2, Vol. 42, 1937); and extract ‘On Computable Numbers, with an Application to the Entscheidungsproblem. A correction.’ From Proceedings of the London Mathematical Society, (Ser. 2, Vol. 43, 1937).
What do we see Turing doing (apart from giving a tutorial on m-configs vs configs)? We see him using a first “universal” machine that interprets an instruction on a tape (that happens to be a binary string with form), which outputs a result that is now itself treated as an instruction, which generates it successor codeword – which just happens of course to be yet another instruction (to this limited universal machine for this instruction-type) to generate its successor…
What is fun is to see the “purpose” of the begin state – to write the seed instruction. We also see the “purpose” of the final state, to reset the pointer to the initial assumption about the metastate concerning the instruction under decoding (a condition that the seed instruction writing initially also satisfies, of course).
Now, given the obvious representation of the m-configs + “arcs” in a graph derived from the operations performed in the output transition that turing didn’t count his nodes, and his arcs, cosndiering their complexity? And would he not have compared that decoder design to another, his own u-machine spitting out the m-machine just discussed, interpreting the m-machine (in order to have it then do the job above)? Is it not obvious that the complexity of that meta-meta-machine – in terms of nodes and arcs – is very different to the simple decoder?
I struggle to believe that Turing was not thinking in terms much as we are all taught to think today, about complexity of code generators. Turing simply called a codeword a computable number. His “tables” of m-configurations are generators – that happen to be recursion-class formulations . Just as one can have recursions chains within recursion chains, one has orthonormal representations of a sequence, each sampled by one of the recursion chain – which is an embodiment of a particular rademacher function (a generrator line with a particular number of sign transitions). We are really doing multi-dimensional sampling, a la PAM.