March 28th, 2013, 12:30 AM
Hello all. I am trying to make a program to generate all the primes up to 10^9 however i am not able to make an array of integers of that size. I understand how the sieve works but how can I go about generating all the primes up to said number?
March 28th, 2013, 02:27 AM
One way would be to have an array of bits, rather than an array of bytes (or worse, integers).
So you only use one bit to indicate whether a number is prime or not.
Comments on this post
March 28th, 2013, 09:33 AM
Certainly one efficient way of doing it by Salems method. But however an alternative way is to use linked list?
March 28th, 2013, 12:11 PM
You don't need an array of integers for a sieve - just an array of booleans. That should cut down your memory consumption by a factor of 4, down to 1GB, which should fit into your memory fine.
Originally Posted by thomas12
Using bits instead of bytes, as salem suggested, will cut it down by another factor of 8, down to 125MB. If you're using C++, you get this for free by using a vector<bool>.
Also note that you should allocate large arrays on the heap, not the stack. The amount of memory that can be allocated on the stack is much more limited than on the heap. On my system it's 8MB, so a sieve for one billion numbers won't fit on the stack even if you use bits.
Using a linked list instead of an array will consume more memory, not less. Also sieve algorithms rely on random access being O(1). If you use a linked list for a sieve, that completely destroys its runtime - you might just as well use a naive non-sieve algorithm then.
Originally Posted by ssharish2005
March 28th, 2013, 12:27 PM
You should end up with 50,847,533 prime numbers.
p: inv 1e9
You could use Miller-Rabin test instead of sieve.
To quickly compute a large sieve you could compute in separate blocks. Maybe I'll implement this to better understand timing based on working within my level 2 cache.
In other words, instead of sieving all billion numbers at once do them in blocks of a kilobyte or something. Probably needless to say, the blocking is separable from bit representation versus unsigned int. Bits give you a factor of 32 more numbers you can sieve.
[/code] are essential for python code and Makefiles!
March 28th, 2013, 12:28 PM
There's a significant speed cost to implementing the sieve of Eratosthenes using a linked list, though. The usual implementation makes pretty heavy use of random access.
A very common optimization both for time and memory is not to represent integers that are divisible by a small prime. For example, for a positive integer N not to be divisible by 2 or 3, N % 6 must be 1 or 5. That automatically eliminates 2/3 of the composite numbers that a naive implementation would consider. If you're careful, you don't need to waste array space on those obviously composite numbers:
array -> 1
array -> 5
array -> 7
array -> 11
You can eliminate more than two small primes, but you'll find diminishing returns quickly. I found that eliminating up to 7 small primes worked pretty well.
Combined with the bit-array optimization, this should put this problem well within reach. To get all the primes up to 10^9, if you eliminate multiples of 2 and 3 and use a single bit for each candidate prime, you're looking at an array of about 10 million 32-bit integers.
There are algorithms that are a lot more efficient memory-wise, but they tend to sacrifice something in speed. A pretty good compromise is in this paper:
It uses a lazy functional programming model, but the algorithm is pretty easy to understand. I implemented a C++ version of it a while ago, and it came out to about 200 lines of code including the basic heap operations which I decided to implement myself.