# The Limits of Computation under the Turing Machine Model

Angela Tosca, AC'11
Senior Honors Thesis

## Possible Structuring of Investigation

First added by JOR 24Sep10, and subsequently updated. Quite tentative....

1. Computability
    1. The Church-Turing Thesis for idealized human computation
    2. What is physically computable, i.e., computable with all we know about physics (e.g., computing with DNA)
        1. Including speculation about what we don't yet know about physics (black holes, relativistic speeds, etc.)
    3. Can computers model physical processes? (Not the same as what can be computed using physical processes.)
        1. Martin-Lof random sequences are not computable
        2. Pour-El & Richards? (Discussed by Pitowsky, and by Copeland)
        3. Brownian motion (see below)
2. Complexity
    1. Distinguish the tractable (P) from the likely intractable (NP-complete and worse)
    2. Various models: quantum computing, DNA computing, molecular computing, membrane computing, etc.

## Computability


### Physical Limits


### Philosophical Implications

What do we lose if we really can't compute beyond the "Turing barrier"?

## Complexity

### Various Models of Computation in Terms of Complexity

For each model of computation: its associated complexity classes, and the known relationships among those classes.

**Deterministic Turing Machine (DTM)**

• P (Polynomial Time)
The class of problems that can be solved by a DTM in time bounded by a polynomial function of the input length.
• NP (Nondeterministic Polynomial Time)
The class of decision problems such that, if the answer is "yes," there is a proof of this fact, of length polynomial in the size of the input, that can be verified in P (i.e., by a deterministic polynomial-time algorithm).
• NP-complete
The set of problems X in NP such that every other NP problem reduces to X.
• PSPACE (Polynomial Space)
The class of problems that can be solved by a DTM that uses a polynomially bounded amount of space, regardless of how long the computation takes.
• EXP (Exponential Time)
The class of problems that can be solved by a DTM subject to a time bound O(2^p(n)), where p(n) is a polynomial function of n.
• L (Logarithmic Space)
The class of problems solvable by a DTM using a logarithmic amount of memory.

Relationships:
• P != NP? (open)
• L ⊆ P ⊆ NP ⊆ PSPACE ⊆ EXP
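The distinction between finding and verifying that underlies NP can be made concrete. Below is a minimal sketch using Subset Sum as the example problem; the function name and test instances are illustrative, not from the source.

```python
# NP's defining feature: a "yes" answer comes with a short certificate that
# can be checked in polynomial time, even if *finding* the certificate
# seems to require exponential search. Subset Sum as the example.

def verify_subset_sum(numbers, target, certificate):
    """Check a purported solution (a list of indices) in polynomial time."""
    if len(set(certificate)) != len(certificate):
        return False  # indices must be distinct
    if not all(0 <= i < len(numbers) for i in certificate):
        return False  # indices must be in range
    return sum(numbers[i] for i in certificate) == target

# Verification is instant; finding the subset may mean trying 2^n of them.
print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [2, 4]))  # True: 4 + 5 = 9
print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [0, 1]))  # False
```

A brute-force solver for the same problem would loop over all 2^n index subsets, which is exactly the gap the P vs. NP question asks about.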
**Probabilistic Turing Machine**

• BPP (Bounded-Error Probabilistic Polynomial Time)
The class of decision problems solvable by an NP machine such that (1) if the answer is 'yes,' at least 2/3 of the computation paths accept, and (2) if the answer is 'no,' at most 1/3 of the computation paths accept. (Here all computation paths have the same length.)
Often identified as the class of feasible problems for a computer with access to a genuine random-number source.
• RP (Randomized Polynomial Time)
The class of decision problems solvable by an NP machine such that (1) if the answer is 'yes,' at least 1/2 of the computation paths accept, and (2) if the answer is 'no,' all computation paths reject.
• PP (Probabilistic Polynomial Time)
The class of decision problems solvable by an NP machine such that (1) if the answer is 'yes,' at least 1/2 of the computation paths accept, and (2) if the answer is 'no,' less than 1/2 of the computation paths accept.

Relationships and notes:
• It is clear that BPP contains P, but it is an open question whether P = BPP. Many researchers now conjecture that it does.
• One key finding that hints at this is the AKS primality test, which deterministically tests the primality of an integer; it was long thought that this could be accomplished only with randomized algorithms. According to the Complexity Petting Zoo, this is a key example of derandomization.
• Aaronson also refers to the 1997 paper by Impagliazzo and Wigderson, "P = BPP if E Requires Exponential Circuits: Derandomizing the XOR Lemma," which he says "gave strong evidence that P = BPP."
• According to the Wikipedia article on BPP, polynomial identity testing is an example of a problem that is in both BPP and co-RP but has not yet been shown to be in P.
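The flavor of these one-sided-error randomized classes can be illustrated with primality testing, which was known to be in co-RP (via tests like the one below) long before AKS placed it in P. This is a sketch of the standard Miller-Rabin test; the small-prime shortcuts and round count are illustrative choices, not from the source.

```python
import random

def miller_rabin(n, rounds=20):
    """Randomized primality test. Primes are never rejected; a composite
    survives one round with probability at most 1/4, so the overall error
    is at most 4**(-rounds). Illustrative sketch, not production code."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # modular exponentiation: a^d mod n
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # found a witness: n is certainly composite
    return True  # no witness found: n is almost certainly prime
```

Composites are rejected with certainty once a witness is found; only "probably prime" answers carry (bounded) error. That one-sided pattern is exactly what the RP/co-RP definitions above formalize, and AKS's contribution was to remove the randomness entirely.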

**Quantum Turing Machine**

• BQP (Bounded-Error Quantum Polynomial Time)
The class of problems solvable by a quantum computer. Formally: the class of languages decidable by a uniform family of quantum circuits (that is, each circuit is produced by a P machine) involving no more than a polynomial number of gates drawn from a specified set of universal quantum gates, subject to the same bounded-error constraints as BPP. (The Complexity Petting Zoo maintains that a description in terms of circuits is more natural for quantum computing than a generalized TM model.)
• According to the Complexity Petting Zoo, "Since QTMs are inherently probabilistic, BQP is a closer analog to BPP than to P."
• Also per the above, "given the present state of complexity theory, we cannot hope for an unconditional proof that BQP is strictly larger than P or BPP. However, Shor's famous result that the factoring and discrete logarithm problems are in BQP provides evidence that BPP != BQP."
• The Complexity Zoo notes the conjecture that NP is not contained in BQP. I expect to fill in more details about this statement from the Aaronson paper soon.

### Notes Triggered by Scott Aaronson's "Limits of Quantum Computers"

A few interesting points from the Scientific American article:

• p.1: Characterization of "magic computers" is quite revealing. What it would imply if we could solve NP-complete problems quickly. Could search through all possible proofs or disproofs of the Riemann Hypothesis up to a billion symbols.
• p.4: "Black-box" algorithms speed up search only by a square root. This leads to the widespread belief that quantum computers cannot solve NP-complete problems in polynomial time.
• Back to magic computers: "we could simply ask our computer to search through all possible proof and disproofs containing up to, say, a billion symbols."
• Cook's "Can Computers Routinely Discover Mathematical Proofs?"
• Hartmanis' "Computational Complexity and Mathematical Proofs"
• Both make it evident that recognizing that a purported proof is correct is in NP. But wouldn't searching through an exponential number of purported proofs still take exponential time?
• I think the resolution is that Scott simply mis-stated the claim.
• Cook makes it clear that e.g., Goldbach's Conjecture can be phrased in Predicate Calculus. Then the Satisfiability question would resolve Goldbach's conjecture.
• But Cook also says that satisfiability question cannot be solved by brute-force search. But if P=NP, then somehow there would be a way to resolve that NP question in polynomial time.
• One question remains: Would we also obtain a proof? Or just yes/no, this is or is not a theorem?
• Risked a posting to Theoretical Computer Science. It was answered nicely, and the ensuing discussion was enlightening.
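On the Goldbach example above: each individual instance of the conjecture is mechanically checkable; the difficulty under discussion concerns the universal statement (and, if P = NP, finding short proofs of it). A brute-force instance checker, purely illustrative:

```python
# Each *instance* of Goldbach's conjecture ("this even n is a sum of two
# primes") is easy to verify; no finite search settles the conjecture
# itself. Helper names here are illustrative.

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def goldbach_witness(n):
    """Return primes (p, q) with p + q == n, for even n >= 4, else None."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None

# Verified for every even number in a finite range; the conjecture is the
# claim that this never returns None, which no such loop can establish.
assert all(goldbach_witness(n) for n in range(4, 1000, 2))
print(goldbach_witness(100))  # (3, 97)
```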


### Experimental/Non-classical Models of Computation


## Machine Learning

### Notes on Mitchell [JOR: 8Feb11]

• Tom Mitchell, "The Discipline of Machine Learning," 2006.
• Little high-end theory in this paper. More pragmatic, a snapshot of where the field is now, and some of the driving practical problems in the immediate future.
• "How can we build computer systems that automatically improve with experience...?"
• Longer-Term Research Questions:
• Never-ending learning? Involves self-supervised learning
• Relevance of machine learning to human learning?
• Programming languages for machine learning
• Computer perception and machine learning: multiple input modalities.

### Notes on Solomonoff [JOR: 8Feb11]

• Ray Solomonoff, "Machine Learning--Past and Future," 2009
• Begins by distinguishing between those models that employ recursion and those that do not.
• "Minsky-Papert recursion barrier"
• Surpassed by many models
• Universal Distribution
• extrapolation of a sequence of binary sequences -- includes all induction problems
• p.5: PM: "if there is any describable regularity in a batch of data, PM will find it, using a relatively small amount of data."
• cites Marcus Hutter [the most interesting work in this area, in my opinion]
• but PM is not computable
• so: approximations
• p.7: "Any complete induction system cannot be computable."
• p.12: Cites Schmidhuber's work: "It uses Levin search over a Turing-complete set of instructions to find solutions to problems."

### Notes on Burgin and Klinger [JOR: 8Feb11]

• Mark Burgin and Allen Klinger, "Experience, generations, and limits in machine learning," 2004
• This paper is potentially interesting but I had difficulty extracting the essence without reading it in detail.
• Distinguishes three models:
• polynomially bounded Turing machines
• Turing machines
• inductive Turing machines of the 1st order
• p.89, Conclusion: "It is demonstrated that neither recursive nor subrecursive algorithms give any increase in potentially learnable knowledge. Only super-recursive algorithms allow achieving better results in stratified learning."

### Notes on Hutter [JOR: 8Feb11]

• Marcus Hutter, "Open Problems in Universal Induction & Intelligence," 2009 http://arxiv.org/abs/0907.0746
• AIXI model, historically based on Solomonoff's theory
• p.3, Contents is a good high-level summary (using lots of jargon).
• p.6: "AIXI is an agent that interacts with an environment in cycles... In one cycle k, AIXI takes action ak based on past perceptions...."
• p.7: "AIXI tries to maximize its total future reward... Since q [environment program] is unknown, AIXI has to maximize its expected reward... AIXI's actions ak are limit-computable via the expression above (all quantities are known)...In practice, AIXI needs to be approximated."
• p.9: "Solomonoff's induction scheme represents a universal (formal, but incomputable) solution to all passive prediction problems." (It appears that Hutter's main contribution is to include actions.)
• p.13: "a loophole in the proof of Martin-Lof convergence ... has been found...this loophole cannot be fixed."!!

### Notes on Reviews of Hutter's book [JOR: 14Feb11]

I tried to find substantive reviews. The best I found is by Lee Carlson @amazon. Here are some useful quotes from the review:

• the `Solomonoff universal prior', which is defined as the probability that the output of a universal Turing machine starts with the string when presented with fair coin tosses on the input tape. As the author points out, this quantity is however not a probability measure, but only a `semimeasure', since it is not normalized to 1, but he shows how to bound it by expressions involving the Kolmogorov complexity.
• The author then studies the case where the probability distribution of the environment is known, in order to motivate the notion of a `universal algorithmic agent (AIXI).' This type of agent does not attempt to learn the true probability distribution of the environment, but instead replaces it by a generalized universal prior that converges to it. This prior is a generalization of the Solomonoff universal prior and involves taking a weighted sum over all environments (programs) that give a certain output given the history of a particular sequence presented to it.
• The process or program by which the machine does this is `optimal' in the sense that no other program can solve or learn significantly faster than it can. The machine is `universal' in that it is independent of the true environment, and thus can function in any domain.

Another Amazon review, by Shane Legg:

• (AIXI) is really just Solomonoff's model with an expectimax tree added to examine the potential consequences of the agent's actions

A review by Peter Ortega:

• Given the model, all possible futures can be simulated (up to a predefined horizon) by trying out all possible interaction paths. Essentially, a huge decision tree is constructed. Having this information, it is easy to solve for the best policy. Just pick at each step the action that promises the highest expected future reward. This is done using Bellman's optimality equations.
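Ortega's description can be sketched mechanically. Below is a toy expectimax over a small, fully known stochastic environment; the environment, action set, and horizon are all illustrative assumptions (the real AIXI replaces the known environment with an incomputable universal mixture over programs).

```python
# Toy expectimax in the spirit of Ortega's description: simulate all
# interaction paths to a fixed horizon, then pick the action with the
# highest expected total future reward. Everything here is illustrative.

ACTIONS = ("left", "right")

def env(history, action):
    """Hypothetical known environment: list of
    (probability, percept, reward) outcomes for taking `action`."""
    if action == "right":
        return [(0.8, "win", 1.0), (0.2, "lose", 0.0)]
    return [(1.0, "safe", 0.3)]

def expectimax(history, horizon):
    """Expected total reward of the best policy from `history`."""
    if horizon == 0:
        return 0.0
    return max(
        sum(p * (r + expectimax(history + [(a, o)], horizon - 1))
            for p, o, r in env(history, a))
        for a in ACTIONS)

def best_action(history, horizon):
    return max(ACTIONS, key=lambda a: sum(
        p * (r + expectimax(history + [(a, o)], horizon - 1))
        for p, o, r in env(history, a)))

print(best_action([], horizon=3))  # "right": 0.8 expected per step beats 0.3
```

The tree grows exponentially in the horizon even for this known environment, which is one reason Hutter stresses that AIXI "needs to be approximated" in practice.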

An interesting point made by Robert Jones:

• But there is experimental evidence that Occam's razor is incorrect. (The Myth of Simplicity, M. Bunge, Prentice-Hall, 1963) See also, Machine Learning, Tom Mitchell, McGraw-Hill, 1997, pg 65-66. Rather than saying that nature IS simple I believe that it is more correct to say that we are forced to approximate nature with simple models because "our" (both human and AI) memory and processing power is limited.

#### New papers by Hutter (15Feb11)

• http://arxiv.org/abs/1102.2468
• "Algorithmic Randomness as Foundation of Inductive Reasoning and Artificial Intelligence"
• Abstract. This article is a brief personal account of the past, present, and future of algorithmic randomness, emphasizing its role in inductive inference and artificial intelligence. It is written for a general audience interested in science and philosophy. Intuitively, randomness is a lack of order or predictability. If randomness is the opposite of determinism, then algorithmic randomness is the opposite of computability. Besides many other things, these concepts have been used to quantify Ockham's razor, solve the induction problem, and define intelligence.
• http://arxiv.org/abs/1102.2467
• "Universal Learning Theory"
• Abstract. This encyclopedic article gives a mini-introduction into the theory of universal learning, founded by Ray Solomonoff in the 1960s and significantly developed and extended in the last decade. It explains the spirit of universal learning, but necessarily glosses over technical subtleties.

### Solomonoff Induction

(23Feb11).

• Here is a few-sentence summary of how it works: http://wiki.lesswrong.com/wiki/Solomonoff_induction.
• Here is a longer explanation: http://singinst.org/blog/2007/06/25/solomonoff-induction/
• "Solomonoff induction predicts sequences by assuming they are produced by a random program."
• "This means if a sequence is generated by a short program it is likely, and if it is only generated by long programs it is unlikely. This is a form of Occam's razor: simpler sequences (i.e. those described by short programs) are given higher prior probability than complex (i.e. long description) ones."
• "For any given sequence, there is a finite set of programs which generate it (...). The probability the series begins with this sequence is the probability any of these programs are generated,..."
• Here is a long (28-page) explanation, Solomonoff Induction by Shane Legg, 2004: disSol.pdf. Unfortunately, highly technical.
• S-Induction is not the same as Minimum Description Length (MDL) induction, I think.
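The quoted summaries can be made concrete with a deliberately tiny, non-universal machine: here a "program" is a bitstring that the machine emits cyclically forever. This toy keeps only the 2^-length Occam weighting of the real (incomputable) Solomonoff prior; the machine, the `MAX_LEN` cutoff, and all names are illustrative assumptions.

```python
from itertools import product

MAX_LEN = 8  # enumerate programs only up to this length (toy cutoff)

def runs_to(program, prefix):
    """Does this toy machine, repeating `program` forever, output `prefix`?"""
    out = (program * (len(prefix) // len(program) + 1))[:len(prefix)]
    return out == prefix

def prior(prefix):
    """M(prefix): total 2^-|p| weight of programs whose output starts with prefix."""
    total = 0.0
    for n in range(1, MAX_LEN + 1):
        for bits in product("01", repeat=n):
            if runs_to("".join(bits), prefix):
                total += 2.0 ** (-n)
    return total

def predict_next(prefix):
    """Predicted probability that the next bit is '1'."""
    m0, m1 = prior(prefix + "0"), prior(prefix + "1")
    return m1 / (m0 + m1)

print(predict_next("010101"))  # well below 1/2: the short program "01" predicts '0'
```

Even this restricted machine shows the mechanism in the quotes: the short program "01" dominates the sum, so continuations consistent with it get nearly all the probability mass. A universal machine makes the same scheme complete over all describable regularities, and incomputable.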

### Models for Machine Learning

This link may be useful, as it attempts to summarize the strengths and weaknesses of all extant machine learning models: Models of M.L.. Does not mention MDL (minimum description length).

### Learning of Regular Languages (6Mar11)

• Regular languages are equivalent to DFAs, so the literature phrases it either way.
• Quotes from the first paper cited below.
• "Exact learning of a DFA from a finite set of positive and negative examples is NP-hard."
• "Efficient learning of DFA is a challenging research problem in grammatical inference. It is known that both exact and approximate (in the PAC sense) identifiability of DFA is hard."
• "We prove that the class of DFA is efficiently learnable under the PACS (PAC learning with simple examples) model wherein positive and negative examples are sampled according to the universal distribution conditional on a description of the target concept."
• Found some other papers that I have not yet investigated. The 2010 survey on "Grammatical Inference" seemed the best bet. (9Mar11) Oops: turns out that survey is originally from 1986, so I didn't look it up. Found another, De La Higuera's 2005 bibliographical study (last item below), which is proving useful.
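A note on where the hardness in the quotes above lives: checking a candidate DFA against a labeled sample is the easy, linear-time part of learning; it is *finding* a small consistent DFA that is NP-hard. A sketch, with an illustrative DFA (accepting bitstrings containing an even number of 1s) standing in for a learner's hypothesis:

```python
# Verifying a hypothesis DFA against examples is cheap; searching the
# exponentially many small DFAs for a consistent one is the hard part.
# The DFA below and the sample are illustrative.

dfa = {
    "start": "even",
    "accepting": {"even"},
    "delta": {("even", "0"): "even", ("even", "1"): "odd",
              ("odd", "0"): "odd", ("odd", "1"): "even"},
}

def accepts(dfa, word):
    """Run the DFA on a string over {0, 1}."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]

def consistent(dfa, sample):
    """Does the DFA agree with every (word, label) pair? Linear time."""
    return all(accepts(dfa, word) == label for word, label in sample)

sample = [("", True), ("11", True), ("1", False), ("10", False), ("1001", True)]
print(consistent(dfa, sample))  # True
```

Results like the PACS one quoted above change what the learner is allowed to see (simple examples under the universal distribution), not the cost of this consistency check.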
```
@article{DBLP:journals/ml/ParekhH01,
  author    = {Rajesh Parekh and Vasant Honavar},
  title     = {Learning DFA from Simple Examples},
  journal   = {Machine Learning},
  volume    = {44},
  number    = {1/2},
  year      = {2001},
  pages     = {9--35},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

@article{de2005learning,
  title     = {{Learning context-free languages}},
  author    = {De La Higuera, C. and Oncina, J.},
  year      = {2005}
}

@article{lee1996learning,
  title     = {{Learning of context-free languages: A survey of the literature}},
  author    = {Lee, L.},
  journal   = {Techn. Rep. TR-12-96, Harvard University},
  year      = {1996},
  publisher = {Citeseer}
}

@article{fu2010grammatical,
  title     = {{Grammatical Inference: Introduction and Survey: Part I}},
  author    = {Fu, K.S. and Booth, T.L.},
  journal   = {Systems, Man and Cybernetics, IEEE Transactions on},
  number    = {1},
  pages     = {95--111},
  issn      = {0018-9472},
  year      = {2010},  % Originally 1986
  publisher = {IEEE}
}

@article{de2005bibliographical,
  title     = {{A bibliographical study of grammatical inference}},
  author    = {De La Higuera, C.},
  journal   = {Pattern Recognition},
  volume    = {38},
  number    = {9},
  pages     = {1332--1348},
  issn      = {0031-3203},
  year      = {2005},
  publisher = {Elsevier}
}
```

## Intro/Conclusion Ideas (6Apr11)

#### Introduction

There are three boundaries that limit computations:

1. The Turing-computable functions vs. undecidable problems.
2. P vs. NP: feasible computations vs. likely infeasible (but still Turing-computable).
3. Learnable vs. unlearnable languages.

These three are listed in an order that reflects (a) a broad-to-narrow scope, (b) the historical development and investigation of each, and consequently (c) how settled each boundary is.

1. The first boundary is the focus of more than half this thesis. It goes back to Turing's 1936 paper, and although work on it continues today (relevant papers have been published as this thesis was being written), there appears to be general agreement on where the line falls and on its significance. [Expand...?]
2. The P vs. NP boundary was defined in the 1960s and '70s, and although there is widespread belief that P ≠ NP, mathematicians and computer scientists seem far from proving it. The introduction of quantum computation in the 1980s raised the possibility that, even if P ≠ NP, some problems believed to lie outside P (such as factoring) might be feasibly attacked on quantum computers. The second half of this thesis explores these computational complexity issues.
3. Finally, there has been intense activity from the 1990s to today in the area that might be labeled the theory of machine learning. This topic is very much unsettled, with several models in play to capture what can be learned. Here I've done no more than scratch the surface of this complex topic, so it is left for "future work."

#### Future Work: Machine Learning

• One prominent model, proposed by Leslie Valiant, is the PAC model, standing for Probably Approximately Correct. The idea is that it is too stringent to demand exact learning (say, of the grammar of an unknown language, when presented with samples); rather, it makes sense to declare success for an algorithm that, with arbitrarily high probability, achieves an approximately correct answer, to any desired degree of approximation. After intense study, it seems that even this model is too stringent, in that some languages that humans seem able to learn are NP-hard to learn under the PAC model.
• For example, exact learning of regular languages (i.e., DFAs) is NP-hard, and even PAC-learning of regular languages is NP-hard. However, researchers have explored variations in how the samples are presented to the learning algorithm (distributions, counterexamples, etc.) and can prove that under some conditions learning of regular languages is in P.
• This is an example illustrating that there seems to be a search for the ideal model.
• The above could be characterized as a search for what can be feasibly learned. Another approach, epitomized in the work of Solomonoff and Hutter, works from the other end. They define universal models that can essentially learn anything, but the models go beyond Turing computability to achieve this universality. The work in this domain starts from these very general models and tries to see what can be accomplished if we insist upon Turing-computability, or even computations in P.
• Perhaps the future will see a meeting between the bottom-up approach (trying to define what can feasibly be learned) and the top-down approach (progressively weakening a universal learning algorithm).
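The PAC criterion sketched above has a standard quantitative form for finite hypothesis classes (see, e.g., Mitchell's Machine Learning text, cited in the notes above): a learner that outputs any hypothesis consistent with m examples is probably (confidence 1 - δ) approximately (error < ε) correct provided

```latex
% Sample-complexity bound for a consistent learner over a finite
% hypothesis class H: with probability at least 1 - \delta, the learned
% hypothesis has error below \epsilon whenever
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

Bounds of this kind make "too stringent" precise: for classes like DFAs the obstacle is typically not the number of samples but the computational hardness of finding a consistent hypothesis.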