CSC430 Angela
Contents
 1 The Limits of Computation under the Turing Machine Model
 1.1 Possible Structuring of Investigation
 1.2 Computability
 1.3 Complexity
 1.4 Machine Learning
 1.4.1 Notes on Mitchell [JOR: 8Feb11]
 1.4.2 Notes on Solomonoff [JOR: 8Feb11]
 1.4.3 Notes on Burgin and Klinger [JOR: 8Feb11]
 1.4.4 Notes on Hutter [JOR: 8Feb11]
 1.4.5 Notes on Reviews of Hutter's book [JOR: 14Feb11]
 1.4.6 Solomonoff Induction
 1.4.7 Models for Machine Learning
 1.4.8 Learning of Regular Languages (6Mar11)
 1.5 Intro/Conclusion Ideas (6Apr11)
The Limits of Computation under the Turing Machine Model
 Angela Tosca, AC'11
 Senior Honors Thesis
 Advisor: Joseph O'Rourke
Possible Structuring of Investigation
First added by JOR 24Sep10, and subsequently updated. Quite tentative....
 Computability
 The Church-Turing Thesis for idealized human computation
 What is physically computable, i.e., computable with all we know about physics. E.g., computing with DNA.
 Including speculation about what we don't yet know about physics (black holes, relativistic speeds, etc.)
 Can computers model physical processes? (Not the same as what can be computed using physical processes.)
 Martin-Löf random sequences are not computable.
 Pour-El & Richards? (Discussed by Pitowsky, and by Copeland)
 Brownian motion (see below)
 Complexity
 Distinguish the tractable (P) from the likely intractable (NP-complete and worse)
 Various models: quantum computing, DNA computing, molecular computing, membrane computing, etc.
Computability
Physical Limits
Philosophical Implications
What do we lose if we really can't compute beyond the "Turing barrier"?
Complexity
Various Models of Computation in Terms of Complexity
Model  Associated Complexity Classes  Relationships Among/Between Classes
Deterministic Turing Machine  P, PSPACE  P ⊆ NP ⊆ PSPACE
Probabilistic Turing Machine  BPP, PP  P ⊆ BPP ⊆ PP ⊆ PSPACE
Quantum Turing Machine  BQP  BPP ⊆ BQP ⊆ PSPACE
Notes Triggered by Scott Aaronson's "Limits of Quantum Computers"
A few interesting points from the Scientific American article:
 p.1: The characterization of "magic computers" is quite revealing: if we could solve NP-complete problems quickly, we could search through all possible proofs or disproofs of the Riemann Hypothesis up to a billion symbols.
 p.4: "Black-box" algorithms only speed up search by a square-root factor. This leads to the widespread belief that quantum computers cannot solve NP-complete problems in polynomial time.
 Back to magic computers: "we could simply ask our computer to search through all possible proofs and disproofs containing up to, say, a billion symbols."
 Cook's "Can Computers Routinely Discover Mathematical Proofs?"
 Hartmanis' "Computational Complexity and Mathematical Proofs"
 Both make it evident that recognizing that a purported proof is correct is in NP. But wouldn't searching through an exponential number of purported proofs still take exponential time?
 I think the resolution is that Scott simply misstated the claim.
 Cook makes it clear that e.g., Goldbach's Conjecture can be phrased in Predicate Calculus. Then the Satisfiability question would resolve Goldbach's conjecture.
 But Cook also says that the satisfiability question cannot be solved by brute-force search. But if P=NP, then somehow there would be a way to resolve that NP question in polynomial time.
 One question remains: Would we also obtain a proof? Or just yes/no, this is or is not a theorem?
 Risked a posting to Theoretical Computer Science; it was answered nicely, and the ensuing discussion was enlightening.
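The tension in the bullets above (checking any one purported proof is cheap, but the space of candidates is exponential) can be sketched in a few lines. Everything here is a toy: "proofs" are just bit-strings, and `verifies` is an invented stand-in predicate, not a real proof checker.

```python
from itertools import product

def verifies(statement: str, proof: str) -> bool:
    """Stand-in for a polynomial-time proof checker (an NP verifier).
    Here a 'proof' is valid iff it is the statement's bit-reversal --
    a placeholder for genuine proof checking."""
    return proof == statement[::-1]

def search_proofs(statement: str, max_len: int):
    """Brute-force search over all bit-string 'proofs' up to max_len.
    Each individual check is cheap, but the loop visits on the order
    of 2^(max_len+1) candidates: the exponential blow-up at issue."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            candidate = "".join(bits)
            if verifies(statement, candidate):
                return candidate
    return None

# search_proofs("0110", 5) -> "0110" (a palindromic statement
# "proves" itself under this toy predicate)
```

If P=NP, this exhaustive loop could be replaced by a polynomial-time procedure, which is exactly the point Cook and Hartmanis raise.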
Experimental/Nonclassical Models of Computation
Machine Learning
Notes on Mitchell [JOR: 8Feb11]
 Tom Mitchell, "The Discipline of Machine Learning," 2006.
 Little high-end theory in this paper. More pragmatic: a snapshot of where the field is now, and some of the driving practical problems in the immediate future.
 "How can we build computer systems that automatically improve with experience...?"
 Longer-Term Research Questions:
 Never-ending learning? Involves self-supervised learning
 Relevance of machine learning to human learning?
 Programming languages for machine learning
 Computer perception and machine learning: multiple input modalities.
Notes on Solomonoff [JOR: 8Feb11]
 Ray Solomonoff, "Machine Learning - Past and Future," 2009
 Begins by distinguishing between those models that employ recursion and those that do not.
 "Minsky-Papert recursion barrier"
 Surpassed by many models
 Universal Distribution
 extrapolation of a sequence of binary sequences; this includes all induction problems
 p.5: P_{M}: "if there is any describable regularity in a batch of data, P_{M} will find it, using a relatively small amount of data."
 cites Marcus Hutter [the most interesting work in this area, in my opinion]
 but P_{M} is not computable
 so: approximations
 p.7: "Any complete induction system cannot be computable."
 p.12: Cites Schmidhuber's work: "It uses Levin search over a Turing-complete set of instructions to find solutions to problems."
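The Levin search quoted above can be illustrated with a toy version. The two-instruction "language" here (add one / double) is an invented stand-in for a Turing-complete instruction set; what the sketch preserves is Levin's schedule, in which a program of length l receives a time budget proportional to 2^-l, so short programs are tried earlier and more generously.

```python
from itertools import product

def run(program: str, steps: int) -> int:
    """Tiny straight-line interpreter: '0' adds 1, '1' doubles.
    Executes at most `steps` instructions, returning the value so far."""
    x = 0
    for op in program[:steps]:
        x = x + 1 if op == "0" else x * 2
    return x

def levin_search(target: int, max_phase: int = 12):
    """Toy Levin search: in phase t, every program p with len(p) <= t
    is run for 2^(t - len(p)) steps. The first program found whose
    output equals `target` is returned; shorter programs dominate
    the schedule, as in the length-weighted search quoted above."""
    for t in range(1, max_phase + 1):
        for n in range(1, t + 1):
            budget = 2 ** (t - n)
            for bits in product("01", repeat=n):
                p = "".join(bits)
                if run(p, budget) == target:
                    return p
    return None
```

For example, `levin_search(6)` finds the four-instruction program "0001" (1, 2, 3, then double to 6) once phase 6 grants length-4 programs a full budget.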
Notes on Burgin and Klinger [JOR: 8Feb11]
 Mark Burgin and Allen Klinger, "Experience, generations, and limits in machine learning," 2004
 This paper is potentially interesting but I had difficulty extracting the essence without reading it in detail.
 Distinguishes three models:
 polynomially bounded Turing machines
 Turing machines
 inductive Turing machines of the 1st order
 p.89, Conclusion: "It is demonstrated that neither recursive nor subrecursive algorithms give any increase in potentially learnable knowledge. Only superrecursive algorithms allow achieving better results in stratified learning."
Notes on Hutter [JOR: 8Feb11]
 Marcus Hutter, "Open Problems in Universal Induction & Intelligence," 2009 http://arxiv.org/abs/0907.0746
 AIXI model, historically based on Solomonoff's theory
 p.3, Contents is a good highlevel summary (using lots of jargon).
 p.6: "AIXI is an agent that interacts with an environment in cycles... In one cycle k, AIXI takes action a_{k} based on past perceptions...."
 p.7: "AIXI tries to maximize its total future reward... Since q [environment program] is unknown, AIXI has to maximize its expected reward... AIXI's actions a_{k} are limit-computable via the expression above (all quantities are known)... In practice, AIXI needs to be approximated."
 p.9: "Solomonoff's induction scheme represents a universal (formal, but incomputable) solution to all passive prediction problems." (It appears that Hutter's main contribution is to include actions.)
 p.13: "a loophole in the proof of Martin-Löf convergence ... has been found... this loophole cannot be fixed."!!
Notes on Reviews of Hutter's book [JOR: 14Feb11]
I tried to find substantive reviews. The best I found is by Lee Carlson @amazon. Here are some useful quotes from the review:
 the `Solomonoff universal prior', which is defined as the probability that the output of a universal Turing machine starts with the string when presented with fair coin tosses on the input tape. As the author points out, this quantity is however not a probability measure, but only a `semimeasure', since it is not normalized to 1, but he shows how to bound it by expressions involving the Kolmogorov complexity.
 The author then studies the case where the probability distribution of the environment is known, in order to motivate the notion of a `universal algorithmic agent (AIXI).' This type of agent does not attempt to learn the true probability distribution of the environment, but instead replaces it by a generalized universal prior that converges to it. This prior is a generalization of the Solomonoff universal prior and involves taking a weighted sum over all environments (programs) that give a certain output given the history of a particular sequence presented to it.
 The process or program by which the machine does this is `optimal' in the sense that no other program can solve or learn significantly faster than it can. The machine is `universal' in that it is independent of the true environment, and thus can function in any domain.
Another Amazon review, by Shane Legg:
 (AIXI) is really just Solomonoff's model with an expectimax tree added to examine the potential consequences of the agent's actions
A review by Peter Ortega:
 Given the model, all possible futures can be simulated (up to a predefined horizon) by trying out all possible interaction paths. Essentially, a huge decision tree is constructed. Having this information, it is easy to solve for the best policy. Just pick at each step the action that promises the highest expected future reward. This is done using Bellman's optimality equations.
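Ortega's description (simulate all futures up to a horizon, then apply Bellman's optimality equations over the resulting decision tree) reduces, for a toy memoryless environment with known probabilities, to a few lines of expectimax. The numbers in P and R below are made up for illustration; real AIXI replaces the known environment with the universal mixture over all environments.

```python
# Toy two-action, two-percept environment (hypothetical numbers):
# after action a, percept o arrives with probability P[a][o] and
# carries reward R[a][o].
P = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 3.0}}

def expectimax(depth: int) -> float:
    """Value of the best policy over `depth` remaining cycles:
    max over actions of the expected (reward + value-to-go), i.e.
    the Bellman optimality recursion over the full expectimax tree.
    Exponential in the horizon in general; keep `depth` small."""
    if depth == 0:
        return 0.0
    return max(
        sum(P[a][o] * (R[a][o] + expectimax(depth - 1)) for o in (0, 1))
        for a in (0, 1)
    )
```

Here the riskier action 1 (expected reward 1.5 per cycle) beats the safe action 0 (expected reward 0.8), so the optimal value just accumulates 1.5 per remaining cycle.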
An interesting point made by Robert Jones:
 But there is experimental evidence that Occam's razor is incorrect. (The Myth of Simplicity, M. Bunge, Prentice-Hall, 1963) See also, Machine Learning, Tom Mitchell, McGraw-Hill, 1997, pp. 65-66. Rather than saying that nature IS simple I believe that it is more correct to say that we are forced to approximate nature with simple models because "our" (both human and AI) memory and processing power is limited.
New papers by Hutter (15Feb11)
 http://arxiv.org/abs/1102.2468
 "Algorithmic Randomness as Foundation of Inductive Reasoning and Artificial Intelligence"
 Abstract. This article is a brief personal account of the past, present, and future of algorithmic randomness, emphasizing its role in inductive inference and artificial intelligence. It is written for a general audience interested in science and philosophy. Intuitively, randomness is a lack of order or predictability. If randomness is the opposite of determinism, then algorithmic randomness is the opposite of computability. Besides many other things, these concepts have been used to quantify Ockham's razor, solve the induction problem, and define intelligence.
 http://arxiv.org/abs/1102.2467
 "Universal Learning Theory"
 Abstract. This encyclopedic article gives a mini-introduction into the theory of universal learning, founded by Ray Solomonoff in the 1960s and significantly developed and extended in the last decade. It explains the spirit of universal learning, but necessarily glosses over technical subtleties.
Solomonoff Induction
(23Feb11).
 Here is a fewsentence summary of how it works: http://wiki.lesswrong.com/wiki/Solomonoff_induction.
 Here is a longer explanation: http://singinst.org/blog/2007/06/25/solomonoffinduction/
 "Solomonoff induction predicts sequences by assuming they are produced by a random program."
 "This means if a sequence is generated by a short program it is likely, and if it is only generated by long programs it is unlikely. This is a form of Occam's razor: simpler sequences (i.e. those described by short programs) are given higher prior probability than complex (i.e. long description) ones."
 "For any given sequence, there is a finite set of programs which generate it (...). The probability the series begins with this sequence is the probability any of these programs are generated,..."
 Here is a long (28page) explanation, Solomonoff Induction by Shane Legg, 2004: disSol.pdf. Unfortunately, highly technical.
 Solomonoff induction is not the same as Minimum Description Length (MDL) induction, I think.
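The "short programs make sequences likely" idea in the quotes above can be made concrete with a crude, computable approximation. The two-rule interpreter below is an invented stand-in for a universal machine, and the sum double-counts program extensions, so this is only a semimeasure-flavored toy, not the real (incomputable) M(s).

```python
from itertools import product

def interpret(p: str) -> str:
    """Toy 'universal' machine (an assumption for illustration, not a
    real UTM): a program '1x' loops, emitting body x four times;
    a program '0x' emits x verbatim."""
    if not p:
        return ""
    body = p[1:]
    return body * 4 if p[0] == "1" else body

def prior(s: str, max_len: int) -> float:
    """Crude approximation of the Solomonoff prior of s: sum 2^-|p|
    over all programs p of up to max_len bits whose output starts
    with s. Short 'loop' programs make repetitive strings like
    '0000' far more probable than irregular ones of the same
    length -- Occam's razor falling out of length weighting."""
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            if interpret(p).startswith(s):
                total += 2.0 ** (-n)
    return total
```

With programs up to 6 bits, `prior("0000", 6)` already dwarfs `prior("0111", 6)`, because the 2-bit program "10" generates "0000" while "0111" needs 5 bits.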
Models for Machine Learning
This link may be useful, as it attempts to summarize the strengths and weaknesses of all extant machine learning models: Models of M.L. It does not mention MDL (minimum description length).
Learning of Regular Languages (6Mar11)
 Regular languages are equivalent to DFAs, so the literature phrases it either way.
 Quotes from the first paper cited below.
 "Exact learning of a DFA from a finite set of positive and negative examples is NPhard."
 "Efficient learning of DFA is a challenging research problem in grammatical inference. It is known that both exact and approximate (in the PAC sense) identifiability of DFA is hard."
 "We prove that the class of DFA is efficiently learnable under the PACS (PAC learning with simple examples) model wherein positive and negative examples are sampled according to the universal distribution conditional on a description of the target concept."
 Found some other papers that I have not yet investigated. The 2010 survey on "Grammatical Inference" seemed the best bet. (9Mar11) Oops: turns out that survey is originally from 1986, so I didn't even look it up. Found another, De La Higuera's bibliographical study (last item below), which is proving useful.
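The hardness quoted above ("Exact learning of a DFA from a finite set of positive and negative examples is NP-hard") is easy to feel in code: the only obvious exact learner is brute force over all machines. A minimal sketch over the alphabet {0,1}, with the DFA encoded as a transition table plus a set of accepting states:

```python
from itertools import product

def accepts(delta, final, s: str) -> bool:
    """Run the DFA (transition table delta[state][symbol], accepting
    set `final`) on string s, starting from state 0."""
    q = 0
    for c in s:
        q = delta[q][int(c)]
    return q in final

def smallest_consistent_dfa(pos, neg, max_states=4):
    """Brute-force exact learning: enumerate every DFA with
    k = 1, 2, ... states until one accepts all of `pos` and rejects
    all of `neg`. There are k^(2k) * 2^k candidate machines per k,
    which is why exact identification from examples is intractable
    in general."""
    for k in range(1, max_states + 1):
        for table in product(range(k), repeat=2 * k):
            delta = [table[2 * i:2 * i + 2] for i in range(k)]
            for finals in product([False, True], repeat=k):
                final = {i for i in range(k) if finals[i]}
                if all(accepts(delta, final, s) for s in pos) and \
                   not any(accepts(delta, final, s) for s in neg):
                    return delta, final
    return None
```

For the samples pos = {"1", "01", "11"}, neg = {"", "0", "10"} (strings ending in 1), the search correctly reports that no 1-state machine works and returns a 2-state one.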
@article{DBLP:journals/ml/ParekhH01,
author = {Rajesh Parekh and
Vasant Honavar},
title = {Learning DFA from Simple Examples},
journal = {Machine Learning},
volume = {44},
number = {1/2},
year = {2001},
pages = {9--35},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@article{de2005learning,
title={{Learning context-free languages}},
author={De La Higuera, C. and Oncina, J.},
year={2005}
}
@article{lee1996learning,
title={{Learning of context-free languages: A survey of the literature}},
author={Lee, L.},
journal={Techn. Rep. TR-12-96, Harvard University},
year={1996},
publisher={Citeseer}
}
@article{fu2010grammatical,
title={{Grammatical Inference: Introduction and Survey: Part I}},
author={Fu, K.S. and Booth, T.L.},
journal={Systems, Man and Cybernetics, IEEE Transactions on},
number={1},
pages={95--111},
issn={0018-9472},
year={2010}, %Originally 1986
publisher={IEEE}
}
@article{de2005bibliographical,
title={{A bibliographical study of grammatical inference}},
author={De La Higuera, C.},
journal={Pattern Recognition},
volume={38},
number={9},
pages={1332--1348},
issn={0031-3203},
year={2005},
publisher={Elsevier}
}
Intro/Conclusion Ideas (6Apr11)
Introduction
There are three boundaries that limit computations:
 The Turing-computable functions vs. undecidable problems.
 P vs. NP: feasible computations vs. likely infeasible (but still Turing-computable).
 The class of learnable languages.
These three are listed in an order that reflects (a) a broad-to-narrow scope, (b) the historical development/investigation of each, and consequently (c) how settled those boundaries are.
 The first boundary is the focus of more than half this thesis. It goes back to Turing's 1936 paper, and although it continues today (relevant papers have been published on this topic as this thesis was being written), there appears to be general agreement on where the line falls and its significance. [Expand...?]
 The P vs. NP boundary was defined in the 1960s and '70s, and although there is widespread belief that P ≠ NP, mathematicians and computer scientists seem far from proving it. The introduction of quantum computation in the 1980s(?) has raised the possibility that, even if P ≠ NP, some NP-complete problems might be feasibly attacked on quantum computers. The second half of this thesis explores these computational complexity issues.
 Finally, there has been intense activity from the 1990s to today in the area that might be labeled the theory of machine learning. This topic is very much unsettled, with several models in play to capture what can be learned. Here I have done no more than scratch the surface of this complex topic, so it is left for "future work."
Future Work: Machine Learning
 One prominent model was proposed by Leslie Valiant: the PAC model, standing for Probably Approximately Correct. The idea is that it is too stringent to demand exact learning (say, of the grammar of an unknown language, when presented with samples). Rather, it makes sense to declare success for an algorithm that, with arbitrarily high probability, can achieve an approximately correct answer, to any degree of approximation desired. After intense study, it seems that this model is still too stringent, in that some languages that humans seem able to learn are NP-hard to learn under the PAC model.
 For example, exact learning of regular languages (i.e., DFAs) is NP-hard, and even PAC-learning of regular languages is NP-hard. However, researchers have explored variations in how the samples are presented to the learning algorithm (distributions, counterexamples, etc.), and can prove that under some conditions, learning of regular languages is in P.
 This is an example illustrating that there seems to be a search for the ideal model.
 The above could be characterized as a search for what can be feasibly learned. Another approach, epitomized in the work of Solomonoff and Hutter, works from the other end. They define universal models that can essentially learn anything, but the models go beyond Turing computability to achieve this universality. The work in this domain starts from these very general models and tries to see what can be accomplished if we insist upon Turingcomputability, or even computations in P.
 Perhaps the future will see a meeting between the bottomup approach (trying to define what can feasibly be learned) and the topdown approach (progressively weakening a universal learning algorithm).
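The PAC criterion described above has a standard quantitative form for finite hypothesis classes (this is the bound in Mitchell's Machine Learning, 1997, cited earlier): m ≥ (1/ε)(ln|H| + ln(1/δ)) labeled examples suffice for a consistent learner to be probably (1-δ) approximately (within ε) correct. A quick calculator, with the DFA count below as an illustrative assumption:

```python
from math import ceil, log

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Classic PAC sample-complexity bound for a consistent learner
    over a finite hypothesis class H: with
    m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples, any hypothesis
    consistent with the sample has error <= epsilon with probability
    at least 1 - delta."""
    return ceil((log(h_size) + log(1.0 / delta)) / epsilon)

# Illustrative |H|: DFAs with up to 4 states over {0,1} number at
# most 4^8 * 2^4 (transition tables times accepting subsets).
m = pac_sample_bound(4 ** 8 * 2 ** 4, epsilon=0.05, delta=0.01)
```

The bound grows only logarithmically in |H|, so sample size is rarely the obstacle; as the regular-language results above show, the hardness lies in the computation needed to find a consistent hypothesis.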