Probabilistic Turing Machines

Reading

Read section 10.2 in Sipser (pp. 368-380). Skim Lemma 10.8 and the section on read-once branching programs.

Reading Questions

1) When would a Probabilistic Turing Machine (PTM) be useful?

2) By what mechanism does a PTM choose which non-deterministic branch to follow?

3) What is the amplification lemma? Why does it work?

4) Give a rough argument for why the amplification lemma works.

5) Draw a diagram showing the relationship between NP and BPP. (Is BPP a subset of NP? Is NP a subset of BPP? Are the sets disjoint? etc.) Explain the relationship depicted by your diagram.

6) In the definition of BPP, why is the error probability cap set at 1/3?

7) p is a prime number ____ a^(p-1) mod p = 1 (for an integer a not divisible by p). What goes in the blank: =>, <=, or <=>? Why?

8) Why is it important to define the complexity class RP?

9) An open problem in computer science is proving or disproving the hypothesis that P = BPP. Describe what this hypothesis means in English, and explain why resolving it would be significant.

Answers to Reading Questions

1) A PTM is useful when an approximate, high-probability answer to a problem is acceptable. A PTM follows only one branch of a non-deterministic computation instead of all of them, so a problem can sometimes be decided in polynomial time rather than by exhaustive search, at the cost of introducing a probability of error.

2) A coin flip: if a PTM reaches a non-deterministic junction with n possibilities, each branch is chosen with probability 1/n.
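As a minimal sketch (the branch names and counts here are illustrative, not from the reading), the uniform branch choice can be simulated like this:

```python
import random

def choose_branch(branches, rng=random):
    """Pick one of n non-deterministic branches uniformly at random,
    so each branch is followed with probability 1/n."""
    return rng.choice(branches)

# Example: three branches, each chosen with probability 1/3.
counts = {b: 0 for b in ("left", "middle", "right")}
rng = random.Random(0)  # fixed seed so the demo is reproducible
for _ in range(30000):
    counts[choose_branch(list(counts), rng)] += 1
# Each count should land near 10000, i.e. probability 1/3 per branch.
```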

3) The amplification lemma states that the error probability of a probabilistic algorithm can be made arbitrarily small by repeating the algorithm many times and taking the majority result, provided that the error probability of any one trial is strictly less than 1/2. It is useful because it allows us to make any such probabilistic algorithm accurate to any desired degree of precision.

4) If an erroneous outcome is less likely than an accurate one, then it is even less likely that erroneous outcomes will form a majority across many trials. Thus, by running more trials we can shrink the error probability to whatever we want.

5) The diagram should show BPP inside NP, with P inside both. A Probabilistic Turing Machine computes only one branch of a Non-deterministic Turing Machine; any single branch of a polynomial-time NTM takes polynomial time, so the probabilistic algorithm also runs in polynomial time. However, it is possible that the probability of an accurate outcome will be less than 1/2 for a given problem, so it is not guaranteed that all problems in NP are in BPP. (Strictly speaking, even the containment of BPP in NP is an open question; the diagram reflects the intuition above.)

6) 1/3 is an arbitrary choice: what matters is that the error probability cap is below 1/2. By the amplification lemma, the probability of an erroneous outcome can be made arbitrarily small if the error probability of a single trial is below 1/2.

7) => goes in the blank. Fermat's little theorem gives the forward direction: if p is prime, then a^(p-1) mod p = 1 for every a not divisible by p. However, the equation also holds for some composite numbers and certain bases a, so the statement is not an "iff".
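Both directions of this answer are easy to check with Python's built-in three-argument `pow` (modular exponentiation); 341 = 11 * 31 is the classic composite that fools the test for base 2:

```python
def fermat_passes(p, a):
    """True iff a^(p-1) is congruent to 1 mod p: the Fermat test, base a."""
    return pow(a, p - 1, p) == 1

# Forward direction: a prime passes for any base not divisible by it.
print(fermat_passes(97, 2))    # True: 97 is prime
# The converse fails: 341 = 11 * 31 is composite yet passes for a = 2,
# so a^(p-1) mod p = 1 does not imply that p is prime.
print(fermat_passes(341, 2))   # True, even though 341 is composite
print(fermat_passes(341, 3))   # False: base 3 exposes 341
```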

8) RP (1-sided error) is an important complexity class to define because it sits between P (0-sided error) and BPP (2-sided error): a problem in RP can err in only one direction, so P is a subset of RP, which is a subset of BPP.

9) The hypothesis states that any polynomial-time algorithm with bounded error could be replaced by a deterministic polynomial-time algorithm with no error at all. This would be significant because it would mean randomness adds no fundamental power: every probabilistic polynomial-time algorithm could be made exact while remaining polynomial-time.

Lecture

Probabilistic Turing Machines

In-class Activities

1) Discuss reading questions as a group.

2) Go over lecture as a class, confirming reading questions answers along the way.

3) In groups, brainstorm problems that could be solved with probabilistic algorithms.

4) Discuss brainstormed problems as a class, implement one or two of them in pseudo-code on the boards.

5) Begin homework assignment.

Quiz

1) Suppose Fermat's little theorem were an "if and only if" relationship (i.e., a^(p-1) mod p would only equal 1 if p is prime). How would this affect the complexity of PRIMES?

2) You want to implement a probabilistic algorithm that decides CARMICHAEL, the set of all Carmichael numbers. Assuming that your algorithm uses Fermat and Carmichael tests, does your algorithm suggest that CARMICHAEL is in RP? Why or why not?

3) You want to design a probabilistic algorithm to decide 3-SAT. This is the algorithm you come up with

    • Input: a 3-SAT statement with n variables

    • Repeat the following process f(n) times, where f(n) is some function of n

      • Randomly assign a value to each of the n variables

      • If the formula is satisfied, return true

    • If none of the trials resulted in a satisfied formula, return false

Is this a good probabilistic algorithm? Why or why not? Think in terms of run-time and error probability.
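Before answering, it may help to see the steps above as code. This is a sketch under assumed conventions (a formula as a list of clauses, each clause a list of signed 1-based variable indices; none of this comes from the course materials):

```python
import random

def random_assignment_sat(clauses, n, trials, rng=random):
    """The quiz algorithm: try `trials` random assignments and report
    satisfiable iff one of them satisfies every clause.
    A clause is a list of non-zero ints: literal v means variable |v|
    must be True (v > 0) or False (v < 0)."""
    for _ in range(trials):
        # Randomly assign a value to each of the n variables
        # (index 0 is unused so variables are 1-based).
        assign = [None] + [rng.random() < 0.5 for _ in range(n)]
        if all(any(assign[abs(v)] == (v > 0) for v in clause)
               for clause in clauses):
            return True     # found a satisfying assignment
    return False            # possibly a false negative: one-sided error

# (x1 or x2 or x3) and (not x1 or not x2 or x3): clearly satisfiable,
# and 3/4 of all assignments satisfy it, so 50 trials all but
# certainly find one.
easy = [[1, 2, 3], [-1, -2, 3]]
print(random_assignment_sat(easy, 3, 50, random.Random(1)))
```

Note the error is one-sided: a `True` answer is always correct, but `False` may be wrong, and the question is how `trials` must scale with n for that error to stay small.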

Answers to Quiz Questions

1) The "iff" relationship implies that ALL prime numbers would pass Fermat's test, and prime numbers are the ONLY numbers to pass this test. Thus, the algorithm has 0-sided error, making it NOT a probabilistic algorithm. This would therefore show that PRIMES is in P.

2) This algorithm does NOT suggest that CARMICHAEL is in RP. A Carmichael number should pass most Fermat tests and then fail some Carmichael test. If some Fermat tests fail, the input number might be either an ordinary composite or a Carmichael number; but even if all Fermat tests pass, the input might still be either. Therefore the algorithm has two-sided error, which suggests that CARMICHAEL is NOT in RP.

3) This is not a good algorithm. There are two possible cases to examine:

    • f(n) is on the order of a polynomial expression (or less). Since the number of possible variable assignments is 2^n, the number of possible assignments grows much faster than f(n) as n increases. This means that the error probability of the algorithm increases with n (because the algorithm relies on randomly hitting a satisfying assignment). Thus for large values of n, the error probability will exceed 1/2, making the algorithm unusable.

    • f(n) is on the order of an exponential expression (or greater). In this case the number of possible variable assignments grows at the same speed as or slower than f(n), so the error probability decreases or stays the same as n increases. However, since on the order of 2^n (or more) assignments are tested, the algorithm does not run in polynomial time.

The algorithm is either prone to error, or it offers no complexity advantage over a non-probabilistic algorithm. Therefore it is a bad algorithm.

Homework

For this homework, you will be simulating a Probabilistic Turing Machine (PTM) that decides PRIMES with arbitrary accuracy. To start you off, I have provided primetester.py, as well as primetester_test.py - a module that tests the accuracy of the PTM.

primetester_test.py displays a plot of the probability that a PTM accepts a given input as a function of k, the number of Fermat/Carmichael trials the algorithm runs per test. We expect accuracy to increase as k increases. primetester_test.py will test a given PTM on three numbers - a prime number, a composite number, and a Carmichael number. If the PTM is designed correctly, the prime number will yield an accepting probability of 1, while the composite and Carmichael numbers will yield accepting probabilities that approach 0.

Note: you need Matplotlib installed for this homework assignment.

To do

1) Implement a version of PRIMES that ONLY runs Fermat tests and not Carmichael tests. After implementing this PTM, run primetester_test.py. What should the graph produced by primetester_test.py look like?

2) Now add Carmichael tests to your algorithm and run primetester_test.py again. What should the graph output look like now? Do you notice an increase in accuracy with the Carmichael numbers?
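If you get stuck, here is an independent sketch of the overall shape of such a tester. It does not use (and makes no assumptions about) the interface of the provided primetester.py; the "Carmichael test" here is interpreted as the square-root-of-1 check from Sipser's section 10.2, and the function name and parameters are my own:

```python
import random

def prime_ptm(p, k, rng=random):
    """Probabilistic primality test: k rounds, each a Fermat test plus
    a check that the square roots of 1 along the way are trivial (+1/-1),
    which also catches Carmichael numbers.  One-sided error: primes are
    always accepted; composites are accepted only with small probability."""
    if p < 2:
        return False
    if p % 2 == 0:
        return p == 2
    s, h = p - 1, 0
    while s % 2 == 0:           # write p - 1 = s * 2^h with s odd
        s, h = s // 2, h + 1
    for _ in range(k):
        a = rng.randrange(1, p)
        if pow(a, p - 1, p) != 1:
            return False        # failed the Fermat test
        x = pow(a, s, p)        # start of the sequence a^s, a^(2s), ...
        if x == 1:
            continue
        for _ in range(h):      # square repeatedly, looking for -1
            if x == p - 1:
                break           # hit -1: this round passes
            x = pow(x, 2, p)
        else:
            return False        # reached 1 via a nontrivial square root
    return True

rng = random.Random(0)
print(prime_ptm(104729, 20, rng))   # prime: always accepted
print(prime_ptm(561, 20, rng))      # 561 = 3*11*17 is a Carmichael
                                    # number: rejected with overwhelming
                                    # probability thanks to the second check
```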

Please turn in code that contains a copy of your final algorithm, as well as the two graphs produced in steps 1 and 2.