March 28th, 2013, 12:30 AM

Primes
Hello all. I am trying to make a program to generate all the primes up to 10^9 however i am not able to make an array of integers of that size. I understand how the sieve works but how can I go about generating all the primes up to said number?
March 28th, 2013, 02:27 AM

One way would be to have an array of bits, rather than an array of bytes (or worse, integers).
So you only use one bit to indicate whether a number is prime or not.
Comments on this post
March 28th, 2013, 09:33 AM

Certainly one efficient way of doing it by Salems method. But however an alternative way is to use linked list?
ssharish2005
March 28th, 2013, 12:11 PM

Originally Posted by thomas12
i am not able to make an array of integers of that size.
You don't need an array of integers for a sieve  just an array of booleans. That should cut down your memory consumption by a factor of 4, down to 1GB, which should fit into your memory fine.
Using bits instead of bytes, as salem suggested, will cut it down by another factor of 8, down to 125MB. If you're using C++, you get this for free by using a vector<bool>.
Also note that you should allocate large arrays on the heap, not the stack. The amount of memory that can be allocated on the stack is much more limited than on the heap. On my system it's 8MB, so a sieve for one billion numbers won't fit on the stack even if you use bits.
Originally Posted by ssharish2005
But however an alternative way is to use linked list?
Using a linked list instead of an array will consume more memory, not less. Also sieve algorithms rely on random access being O(1). If you use a linked list for a sieve, that completely destroys its runtime  you might just as well use a naive nonsieve algorithm then.
March 28th, 2013, 12:27 PM

Code:
p: inv 1e9
50847534
p: 50847534
1000000007
You should end up with 50,847,533 prime numbers.
You could use MillerRabin test instead of sieve.
To quickly compute a large sieve you could compute in separate blocks. Maybe I'll implement this to better understand timing based on working within my level 2 cache.
In other words, instead of sieving all billion numbers at once do them in blocks of a kilobyte or something. Probably needless to say, the blocking is separable from bit representation versus unsigned int. Bits give you a factor of 32 more numbers you can sieve.
[code]
Code tags[/code] are essential for python code and Makefiles!
March 28th, 2013, 12:28 PM

There's a significant speed cost to implementing the sieve of Eratosthenes using a linked list, though. The usual implementation makes pretty heavy use of random access.
A very common optimization both for time and memory is not to represent integers that are divisible by a small prime. For example, for a positive integer N not to be divisible by 2 or 3, N % 6 must be 1 or 5. That automatically eliminates 2/3 of the composite numbers that a naive implementation would consider. If you're careful, you don't need to waste array space on those obviously composite numbers:
array[0] > 1
array[1] > 5
array[2] > 7
array[3] > 11
etc.
You can eliminate more than two small primes, but you'll find diminishing returns quickly. I found that eliminating up to 7 small primes worked pretty well.
Combined with the bitarray optimization, this should put this problem well within reach. To get all the primes up to 10^9, if you eliminate multiples of 2 and 3 and use a single bit for each candidate prime, you're looking at an array of about 10 million 32bit integers.
There are algorithms that are a lot more efficient memorywise, but they tend to sacrifice something in speed. A pretty good compromise is in this paper:
http://www.cs.tufts.edu/~nr/comp150f.../SieveJFP.pdf
It uses a lazy functional programming model, but the algorithm is pretty easy to understand. I implemented a C++ version of it a while ago, and it came out to about 200 lines of code including the basic heap operations which I decided to implement myself.