Friday, January 9, 2015

Testing for Prime Numbers


On first glance, prime numbers seem pretty trivial; and yet, most of their more interesting properties still elude us. These numbers have been bugging mathematicians for as long as we've cared about them. All sorts of questions are still completely open:
(That last one is literally a million-dollar question, by the way.)

From a computer science standpoint though, the fact that we know so little about prime numbers is actually kind of useful. If I don't even know how the primes are distributed, I can't easily factorise a massive composite number into its prime constituents; not even if I used a computer. However the reverse problem – taking the primes and multiplying them together to get to the composite number – is much easier. Such 'one-way functions' make for awesome cryptographic algorithms, which are absolutely vital for digital security these days.

We're not going to worry too much about that end of things though, and instead consider the problem of determining whether a given integer is prime. (Inspired by the SICP exercises as always: the relevant Scheme solutions are 1.22, 1.24, 1.27 and 1.28.)

We'll be using good old Java for our programming language today as we explore three different ways to determine the primality of a number:
  • an exhaustive search of all the possible candidates,
  • the Fermat test, which uses Fermat's Little Theorem in a probabilistic primality test,
  • the Miller-Rabin test, a variant of the Fermat test that fixes one particular deficiency.


Our Common Test Interface


Now if we're using Java then we have ready access to all of the awesome techniques of the object-oriented paradigm. So first, we should decide how we want our classes and objects to look. Given that we'll be looking at three methods that all work to determine primality of a given integer, there ought to be some common behaviours that we can abstract away.

So let's construct a base class PrimalityTest. There are three main things our class will need to take into account:
  • We don't want to be too restrictive in regards to how big our primes are allowed to get. So we'll use Java's BigInteger class and the arbitrary-precision integers that it affords us, and write everything in the interface to work with this representation.
  • Each of the tests should have a method, isPrime(), that signals whether a given number was found to be prime or not according to the test. The implementation of the method will of course be dependent on the test that is run, but we can declare the method signature here.
  • And given that method, we can have a method timedPrimeTest() in this base class that times the test for primality and outputs the results in a pretty way. (This is a small example of the Template Method design pattern. The base class implements this method in its entirety, but the details of the testing procedure are handled by the derived classes.)

This base class for the tests is declared as follows:


 

The Exhaustive Prime Search


The simplest algorithm to solve a given problem is usually the one that enumerates the entire solution domain, compares each of them, and returns the best answer. This is the exhaustive search algorithm. The upside is that we will have 100% certainty that we've got the result we want (since we checked them all.) The downside is the efficiency of such an approach – often the solution domain is prohibitively huge, and the algorithm can find itself churning away for truly ridiculous amounts of time.

That said, it's as good a place to start as any. In fact even with the slightly-awkward semantics of the BigInteger class, the code pretty much writes itself here. We inherit from our PrimalityTest class, so we've got the timedPrimeTest() method already. All we need to do is provide an implementation for the isPrime() method.

In our case, to check whether a given number is prime, we simply check all of the potential factors (or divisors) one by one. Is 2 a factor? No.. then is 3 a factor? No.. then is 4 a factor? … and so on. Note that we only have to search up to the square root of the given number for factors (since mathematically, if there was a factor greater than the square root, then there would have to be another factor less than the square root which multiplies with it.)

The code for our ExhaustivePrimeSearch class is then complete:


 
We can now use this class to run some timed tests for primality:

     java ExhaustivePrimeSearch 1234514132413241
     //> Running test: ExhaustivePrimeSearch
     //> n: 1234514132413241
     //> Prime?: true
     //> Time taken: 11129 milliseconds.

     java ExhaustivePrimeSearch 1234567654321
     //> Running test: ExhaustivePrimeSearch
     //> n: 1234567654321
     //> Prime?: false
     //> Time taken: 5 milliseconds.


As you can see, it takes a lot longer to determine that a number is prime, rather than that it isn't. This makes sense: if a number isn't prime then it has a factor, and we stop any further checking once we've found it.

So our class here will always report whether a given number is prime or not correctly (for numbers greater than 1, at least.) However as the size of the number increases, the time taken to make this determination increases in general. We have O(√n) time here; to determine whether n is prime, we need to check at most √n – 1 numbers and see whether they are factors. This sounds good, but remember that prime numbers can be as large as we want; and the more interesting ones definitely will be.

The space-complexity is assumed to be O(1) – at each point we only need room for one more variable. (Note though that if we were to be completely rigorous, our use of arbitrary-precision integers would affect both of our complexity calculations here.)


The Fermat Test


To save time, we can use probabilistic tests for primality instead. This time around, instead of being certain about determining a prime number, we are almost certain.

Our tests will give us a certainty that a number is not prime (that is, if a number ever fails the test, we are 100% sure that it isn't prime), but we may have false positives: numbers that aren't actually prime that were nonetheless reported to be prime. To increase our certainty we can run more iterations of the test; thus trading running-time for accuracy (in general, this technique of iterative improvement is used in all sorts of algorithms.)

The algorithm for the Fermat test works using Fermat's Little Theorem:


All that notation means is that an is congruent to a modulo n – in other words, n divides evenly into an-a.

Given this, notice that when n is not prime, we should be able to find an a such that this congruence does not hold. So our algorithm does exactly that; we randomly pick a number a between 1 and n, and test the congruence. If it doesn't hold then we definitely don't have a prime number in n. If it does hold though, then we have some evidence that n might be prime.

We can repeat this test as much as we want. If n passes the test for all a < n, then we're pretty much guaranteed that n is prime (* with a caveat! More on that later...)

In terms of our code, this will be another class derived from the PrimalityTest interface. As before, we provide the implementation for the isPrime() method. But since we also need to know how many tests to run, we have a constructor that takes this amount as an input when the object is created. Then our isPrime() method simply repeats the Fermat test the given amount of times. The code for the Fermat test is encapsulated in the separate passesFermatTest() method.

Now since we might want more deterministic behaviour (ie. for testing purposes), we add in a special passesAllFermatTests() method that runs the Fermat test with all possible choices of a (where 1 < a < n).

(As the passesFermatTest() and passesAllFermatTests() methods are useful on their own and don't need the number of tests to do their job, I've made them static here, too.)

So here's the code for our FermatTest class:



We can then use this class to run some timed primality tests in precisely the same way as before with the exhaustive search code. We'll use 20 tests for each given number (this won't give us too much certainty, but keep in mind that the more tests we run, the longer it will take!)

     java FermatTest 1234514132413241
     //> Running test: FermatTest
     //> n: 1234514132413241
     //> Prime?: true
     //> Time taken: 8 milliseconds.


     java FermatTest 1234567654321
     //> Running test: FermatTest
     //> n: 1234567654321
     //> Prime?: false
     //> Time taken: 1 milliseconds.


We can handle larger numbers too in basically no time at all (the prime number was hand-picked from the list here)

     java FermatTest 6075380529345458860144577398704761614649
     //> Running test: FermatTest
     //> n: 6075380529345458860144577398704761614649
     //> Prime?: true
     //> Time taken: 41 milliseconds.


     java FermatTest 2738264293843432410989878979887787232331
     //> Running test: FermatTest
     //> n: 2738264293843432410989878979887787232331
     //> Prime?: false
     //> Time taken: 2 milliseconds.


The running time of these tests is tricky to calculate since it will be based directly on the time it takes the modPow() function to do its job. Given that algorithms exist that can perform the task in O(log n) time though, we will definitely have an improvement upon the exhaustive search method at any rate.

For those instances where we want to be exactly certain, we can use the passesAllFermatTests() method to run all of the tests for us. We can see that here in the following output:

     // Not prime: 50688 = 123 * 456
     FermatTest.passesAllFermatTests(BigInteger.valueOf(50688));
     //> false
       
     // is prime:
     FermatTest.passesAllFermatTests(BigInteger.valueOf(13451));
     //> true
   
     // is prime:
     FermatTest.passesAllFermatTests(BigInteger.valueOf(34217));
     //> true
       
     // isn't prime... 29341 = 13 * 2257
     FermatTest.passesAllFermatTests(BigInteger.valueOf(29341));
     //> true


Hold on, 29341 isn't a prime number...

     // isn't prime... 41041 = 7 * 5863
     FermatTest.passesAllFermatTests(BigInteger.valueOf(41041));
     //> true
       
     // isn't prime... 52633 = 73 * 721
     FermatTest.passesAllFermatTests(BigInteger.valueOf(52633));
     //> true

41041 and 52633 are not prime numbers either! What's going on?

Remember that caveat I mentioned? This is it. Those numbers are called Carmichael numbers. They aren't prime, but their unique property allows them to fool the Fermat test completely.

In reality, this is hardly an issue worth concerning ourselves with. Carmichael numbers, particularly large ones on the scale of numbers used in cryptography, are incredibly rare in occurrence. (One of the footnotes in SICP humorously puts this in perspective for us: if ever you come across a large enough number that fools the Fermat test, you've actually got a higher chance that cosmic radiation interfered with your computer's processor and threw off the calculations, than you do of having randomly landed on a Carmichael number.)

Of course, if we really want to be sure, there is a variant of the Fermat test that can't be fooled in this way...


The Miller-Rabin Test


Another probabilistic test that isn't fooled by the Carmichael numbers is the Miller-Rabin test. Other than that fact, it works very similarly to the Fermat test. In fact, we start by stating Fermat's Little Theorem in a different, but equivalent way:


The trick here* is to rewrite the exponent n-1 in terms of a power of two (2s) multiplied by an odd integer (d). With that done, we can then pick a random number a between 1 and n, and calculate ad mod n.

Then we iterate through squaring this result (to make it up to an-1 mod n). However, at each point we check whether we had a nontrivial square root of 1 mod n. If such a nontrivial root exists, then n cannot be prime.

[* Those who've seen the SICP solutions will notice that I've used a different algorithm for the Miller-Rabin test here. The procedure used in SICP turns out to be heavily dependent on the fact that its expmod procedure uses delayed operations under recursion, and so rewriting it as an iterative process is quite complicated. On the other hand, we can quickly find iterative algorithms for the Miller-Rabin test that are easy enough to both understand and translate into working Java code.]

In terms of our code, we've got a bit more work to do with this test. The structure of the code is very similar to the Fermat test one, except that for a given n, we need to first determine the representation n-1 = 2s * d. Since that's a relatively expensive calculation to be doing in every single test we want to run, we can move it out into a separate private method, findSDRepresentation(), that is invoked whenever we call the isPrime() or passesAllMRTests() methods.

A consequence of this though is that our object has state that it needs to keep track of, so we either have to pass s and d around as parameters (and the method signatures are already pretty long as it is), or store them as class fields (which I've done here). The downside of this is that it no longer allows us to make any of the methods static, so we have to instantiate a new class even to run the “all values” test. In that regard it differs radically from the FermatTest class we had earlier.

Then the main brunt of the work occurs in the passesMRTest() method. Here we work through the algorithm detailed above and return true or false depending on whether the test passes.

The class definition for our MillerRabinTest class is quite long, and not as readable as the others, but here it is:



We can then use it to run tests in the usual way:

     java MillerRabinTest 1234514132413241
     //> Running test: MillerRabinTest
     //> n: 1234514132413241
     //> Prime?: true
     //> Time taken: 9 milliseconds.


     java MillerRabinTest 6075380529345458860144577398704761614649
     //> Running test: MillerRabinTest
     //> n: 6075380529345458860144577398704761614649
     //> Prime?: true
     //> Time taken: 40 milliseconds.


     java MillerRabinTest 1234567654321
     //> Running test: MillerRabinTest
     //> n: 1234567654321
     //> Prime?: false
     //> Time taken: 1 milliseconds.

And more importantly, we can show that it doesn't get tricked by the Carmichael numbers:

     MillerRabinTest test = new MillerRabinTest(1);

     // Not prime: 50688 = 123 * 456
     test.passesAllMRTests(BigInteger.valueOf(50688));
     //> false
       
     // is prime:
     test.passesAllMRTests(BigInteger.valueOf(13451));
     //> true
   
     // is prime:
     test.passesAllMRTests(BigInteger.valueOf(34217));
     //> true
       
     // isn't prime... 29341 = 13 * 2257
     test.passesAllMRTests(BigInteger.valueOf(29341));
     //> false

     // isn't prime... 41041 = 7 * 5863
     test.passesAllMRTests(BigInteger.valueOf(41041));
     //> false
       
     // isn't prime... 52633 = 73 * 721
     test.passesAllMRTests(BigInteger.valueOf(52633));
     //> false

(The running time here is also tricky to determine, particularly since we have that extra work in factoring the powers of two from n-1. Other than that it will also depend directly on the speed of the modPow() function, as the Fermat test did. So we won't really worry about this.)


So we've seen some different ways to test for prime numbers here today. In fact, these are in active use! That BigInteger class we used actually has a method called probablePrime() that will tell us whether the integer is prime to a given degree of certainty. If you have a quick read of the documentation for the method in that link, you can probably guess what it's doing! Sure enough, when we check the source code we find that uses the Miller-Rabin test that we just saw.

No comments:

Post a Comment