Sunday, June 13, 2010

Primes in R (Part III): Schoolmath is Broken!

Here we are in Part III. Wait!? What happened to Parts I and II? Well, I started to write an article about Amdahl's law, parallelism and prime numbers, but found myself buried three levels deep trying to resolve problems with prime numbers in R. My normal inclination is to use Mathematica for such things, but I happened to already be using R for another reason so, I thought I'd see what it had to offer for calculating with primes. It now looks like that might have been a mistake.


Putting primes in R

The first problem is that prime functions are not installed with R base packages by default (because they're not often used in statistical calculations, I suppose) so, you have to find and install a package that has them. Googling around, I landed on primeFactors in the R.basic package, but that failed to install. Grrr!

Added Note: On Mon, Jun 14, 2010, David Smith (VP of Marketing, Revolution Analytics) pointed me at the GMP package on CRAN. Although it does have an isprime() tester, there is no prime number generator. Also, since isprime returns a 2, 1 or 0 depending on whether the number is, might be, or is not a prime, it's probably easier to use it with an ifelse function:

> isprime(29)
[1] 2
> ifelse(isprime(29)==2,"yes","no")
[1] "yes"

Undaunted, I finally found the package called schoolmath, mentioned over at R-bloggers, and it appeared to have just the functions I needed. For example, it has a convenient function called primes that lists all the prime numbers within a specified range.

Good. Let's try listing the primes between 2 and 10:

> library(schoolmath)
> primes(2,10)
[1]  1  2  3  5  7 11 13 17
Huh!? That's not right. It turns out that this result occurs because the underlying R code says:

if (start < 18) { primzahlen <- c(1, 2, 3, 5, 7, 11, 13, 17) 
which is bloody annoying! The other glaring problem is the 1. Who ordered that? (see Muon) The number 1 was once considered prime, but not these days. The usage note states: primes(start = 12, end = 9999). Sigh!


Counting primes

Ignoring that strangeness (I can always decrement by 1), it appears to do better if you ask for a bigger range. Let's look at primes less than 500:

> rsm500 <- primes(2,500)
> rsm500
 [1]   1   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
[22]  73  79  83  89  97 101 103 107 109 113 127 131 133 137 139 149 151 157 163 167 173
[43] 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 259 263 269 271 277 281
[64] 283 293 301 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
[85] 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499
>length(rsm500)
[1] 99
Hang on, that's wrong too! There are only 95 primes less than 500. You can check against this correct list on the web, and here they are from Mathematica:

> mma500
 [1]   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73
[22]  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181
[43] 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307
[64] 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433
[85] 439 443 449 457 461 463 467 479 487 491 499
> length(mma500)
[1] 95
Now, I have to find out where the inconsistencies reside.


Debugging the count

To find the wrong additional entries, I wrote an R script and here's the output. The first column is just the row number, which starts at zero (because of the incorrect inclusion of 1 in the R package) so that successive entries line up. The second column (MMA) contains the Mathematica primes and the third column (Rpkg) is the primes produced by the schoolmath package.

Comparison with Mathematica list of primes
   MMA Rpkg
 0 NaN   1 *
 1   2   2 
 2   3   3 
 3   5   5 
 4   7   7 
 5  11  11 
 6  13  13 
 7  17  17 
 8  19  19 
 9  23  23 
10  29  29 
11  31  31 
12  37  37 
13  41  41 
14  43  43 
15  47  47 
16  53  53 
17  59  59 
18  61  61 
19  67  67 
20  71  71 
21  73  73 
22  79  79 
23  83  83 
24  89  89 
25  97  97 
26 101 101 
27 103 103 
28 107 107 
29 109 109 
30 113 113 
31 127 127 
32 131 131 
33 137 133 *
34 139 137 *
35 149 139 *
36 151 149 *
37 157 151 *
38 163 157 *
39 167 163 *
40 173 167 *
41 179 173 *
42 181 179 *
43 191 181 *
44 193 191 *
45 197 193 *
46 199 197 *
47 211 199 *
48 223 211 *
49 227 223 *
50 229 227 *
51 233 229 *
52 239 233 *
53 241 239 *
54 251 241 *
55 257 251 *
56 263 257 *
57 269 259 *
58 271 263 *
59 277 269 *
60 281 271 *
61 283 277 *
62 293 281 *
63 307 283 *
64 311 293 *
65 313 301 *
66 317 307 *
67 331 311 *
68 337 313 *
69 347 317 *
70 349 331 *
71 353 337 *
72 359 347 *
73 367 349 *
74 373 353 *
75 379 359 *
76 383 367 *
77 389 373 *
78 397 379 *
79 401 383 *
80 409 389 *
81 419 397 *
82 421 401 *
83 431 409 *
84 433 419 *
85 439 421 *
86 443 431 *
87 449 433 *
88 457 439 *
89 461 443 *
90 463 449 *
91 467 457 *
92 479 461 *
93 487 463 *
94 491 467 *
95 499 479 *
96 NaN 487 *
97 NaN 491 *
98 NaN 499 *

Bad R pkg values:  1 133 259 301
The end of this output (scroll down) shows 4 incorrectly included composite numbers (integers that are not prime).

In row 0, Mathematica (correctly) does not produce 1 as a prime, so its value is shown as NaN and an asterisk is appended to that row to indicate a discrepancy. Rows 1 through 32 are consistent with each other. The trouble starts at line 33 where R incorrectly inserts 133 as a prime in its list. We can easily check that 133 is not prime using the schoolmath function is.prim():

> is.prim(133)
[1] FALSE
With 4 composites wrongly listed as primes you get a count of 99 instead of the correct 95.

This is rather troubling. My experience with R is that it has surprisingly few bugs. This schoolmath package is the worst case of bugs I've seen on CRAN. Perhaps I should have been warned by the name of the package, but it was recommended by a mathematician. Schoolmath needs to go back to school. I hope the authors will enroll it and soon.

4 comments:

Neil Gunther said...

Prof. Christian Robert (the mathematician who recommended Schoolmath in good faith) has kindly noted my analysis on his blog and also urges the authors of Schoolmath to enroll it in a Primes 101 class.

Robert D. Brown III said...

I have an implementation of Eratosthenes Sieve that that I wrote in R that finds all the primes between 2 an X. If you don't have a similar routine, I'll be glad to send it to you.

Neil Gunther said...

Thanks but take a look at primegen in the 1st box of my Primes Part II post and see if you can convince yourself how it works in R. It's rather cute.

Neil Gunther said...

Note added about using isprime() in the GMP package on CRAN.