### Thread: Get a series of random numbers from a big list

1. Registered User (joined Mar 2013)

#### Get a series of random numbers from a big list

Guys,
I want to generate a series of random numbers WITHOUT REPLACEMENT, in ascending order. Basically I have a list of strings (many millions of lines), and a small percentage of them are repeated. What I want to do is:
1. take 10,000 random strings from this list, store them in file 1, and remove these 10,000 picks from the rest of the sampling
2. then take another 10,000 random strings from the remainder of the list, store them along with the contents of file 1 in file 2, and remove these 10,000 picks
3. take another 10,000 random strings from the remainder of the list, store them along with the contents of file 2 in file 3, and remove these new 10,000 picks
...
until I have sampled all of the available strings.

I kind of have a rough idea of how to implement it. Since I'm fairly new to Python, I would like to know if there is an efficient way to do this. So I'd love to hear your thoughts on this!

Thanks!
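For the "random numbers without replacement, in ascending order" part of the question, here is a minimal sketch. It assumes the numbers are positions in the list; the function name `ascending_sample_indices` and the `seed` parameter are made up for illustration and are not from the thread.

```python
import random

def ascending_sample_indices(n, k, seed=None):
    """Return k distinct indices from range(n), sorted in ascending order."""
    rng = random.Random(seed)  # a private generator so a seed gives repeatable picks
    # random.sample draws without replacement, so no index appears twice
    return sorted(rng.sample(range(n), k))

# Example: 10 distinct positions out of a million-line list
indices = ascending_sample_indices(1000000, 10, seed=0)
```

Sampling from `range(n)` avoids building a million-element list just to pick positions from it.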
2. Contributing User (joined Dec 2012)
Probably your best bet is to shuffle the list using random.shuffle() and then pop items off the end of it using the list's pop() method.
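A rough sketch of that shuffle-then-pop idea, in case it helps. The name `draw_batches` and the `seed` parameter are assumptions added for illustration; only `random.shuffle()` and `list.pop()` come from the suggestion itself.

```python
import random

def draw_batches(items, batch_size, seed=None):
    """Yield random batches without replacement until nothing is left."""
    pool = list(items)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(pool)  # shuffle once, up front
    while pool:
        count = min(batch_size, len(pool))
        # pop() takes from the end of the list, which is O(1) per item
        yield [pool.pop() for _ in range(count)]

for batch in draw_batches(["a", "b", "c", "d", "e"], 2, seed=0):
    print(batch)
```

Because the list is shuffled only once, every item lands in exactly one batch, and the last batch is simply shorter when the length is not a multiple of `batch_size`.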
3. Registered User (joined Mar 2013)
Originally Posted by Nyktos
Probably your best bet is to shuffle the list using random.shuffle() and then pop items off the end of it using the list's pop() method.
I see. I'll take a look at those two methods. Thanks!
4. But to eliminate duplicates first make the list into a set.

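import random

# LIST is the original list of strings
LIST_WITHOUT_DUPLICATES = list(set(LIST))
random.shuffle(LIST_WITHOUT_DUPLICATES)
L = LIST_WITHOUT_DUPLICATES
# the rest begs for a loop; store_in_file is not defined here
file_number = 0
while L:
    file_number += 1
    store_in_file('file%d' % file_number, L[-10000:])  # last 10,000 shuffled items
    del L[-10000:]  # remove them from the remainder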
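The post never defines `store_in_file`, so here is one possible version. It is only a guess at what was meant: it assumes the first argument is a file name and that each string goes on its own line.

```python
def store_in_file(name, items):
    """Write each string in items to the file called name, one per line."""
    # Hypothetical helper: the thread only shows the call, not the body.
    with open(name, "w", encoding="utf-8") as handle:
        for item in items:
            handle.write(item + "\n")
```

If the strings already end in a newline (for example, when they were read with `readlines()`), drop the `+ "\n"`.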
5. Registered User (joined Mar 2013)
Originally Posted by b49P23TIvg
But to eliminate duplicates first make the list into a set.

So a set is an unordered collection of unique elements. I actually don't want to remove duplicates (or rather, non-unique elements).
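If the duplicates should stay, one option is to skip the `set()` step and shuffle the list as it is. The sketch below also covers the "file 2 contains file 1 plus the new picks" requirement by yielding growing prefixes of the shuffled list. The name `cumulative_samples` and the `seed` parameter are made up for illustration.

```python
import random

def cumulative_samples(strings, batch_size, seed=None):
    """Yield growing samples: the first batch, the first two batches, and so on."""
    pool = list(strings)               # duplicates are kept exactly as they are
    random.Random(seed).shuffle(pool)
    # Each prefix extends the previous one by batch_size fresh picks.
    for end in range(batch_size, len(pool) + batch_size, batch_size):
        yield pool[:end]               # slicing past the end just returns everything

for number, sample in enumerate(cumulative_samples(["a", "a", "b", "c", "d"], 2, seed=0), start=1):
    print(number, sample)
```

Each yielded sample is a copy, so with many millions of lines it is probably better to write each one to its file right away instead of collecting them all in memory.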