#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    3
    Rep Power
    0

    Get a series of random numbers from a big list


    Guys,
    I want to generate a series of random numbers WITHOUT REPLACEMENT in ascending order. Basically I have a list of strings (many millions of lines), and a small percentage of them are repeated. What I want to do is:
    1. take 10,000 random strings from this list, store them in file 1, and remove these 10,000 picks from the rest of the sampling
    2. then take another 10,000 random strings from the remainder of the list, store them along with the contents of file 1 in file 2, and remove these 10,000 picks
    3. take another 10,000 random strings from the remainder of the list, store them along with the contents of file 2 in file 3, and remove these new 10,000 picks
    ...
    until I have sampled all of the available strings.

    I kind of have a rough idea of how to implement it. Since I'm fairly new to Python, I would like to know if there is an efficient way to do this. I'd love to hear your thoughts on this!

    Thanks!
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    114
    Rep Power
    3
    Probably your best bet is to shuffle the list using random.shuffle() and then pop items off the end of it using the list's pop() method.
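    A minimal sketch of that idea, assuming the whole list fits in memory. The names draw_batches, batch_size and seed are made up for illustration, and the snapshots list only stands in for writing file 1, file 2, and so on. Shuffling does not drop repeated strings, so duplicates in the input stay in the pool.

```python
import random

def draw_batches(items, batch_size=10000, seed=None):
    """Yield successive random batches drawn without replacement."""
    rng = random.Random(seed)
    pool = list(items)   # work on a copy; repeated strings are kept
    rng.shuffle(pool)
    while pool:
        # pop() removes from the end, so an item can never be drawn twice
        count = min(batch_size, len(pool))
        yield [pool.pop() for _ in range(count)]

# Toy data: seven strings, one of them repeated, drawn three at a time.
strings = ["a", "b", "b", "c", "d", "e", "f"]
cumulative = []
snapshots = []   # stand-in for file 1, file 2, ... (each holds all picks so far)
for batch in draw_batches(strings, batch_size=3, seed=0):
    cumulative.extend(batch)
    snapshots.append(list(cumulative))
```

    Each snapshot contains the previous one plus the new picks, which matches the "store them along with file 1 into file 2" requirement. The last batch is simply smaller when the list length is not a multiple of the batch size.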
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    3
    Rep Power
    0
    Originally Posted by Nyktos
    Probably your best bet is to shuffle the list using random.shuffle() and then pop items off the end of it using the list's pop() method.
    I see. I'll take a look at those two methods. Thanks!
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    But to eliminate duplicates first make the list into a set.

    import random

    # LIST is your list of strings; store_in_file(name, items) is yours to write
    LIST_WITHOUT_DUPLICATES = list(set(LIST))
    random.shuffle(LIST_WITHOUT_DUPLICATES)
    L = LIST_WITHOUT_DUPLICATES
    # peel 10,000 items off the end per pass until nothing is left
    n = 0
    while L:
        n += 1
        store_in_file('file%d' % n, L[-10000:])
        L = L[:-10000]
    [code]Code tags[/code] are essential for python code and Makefiles!
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    3
    Rep Power
    0
    Originally Posted by b49P23TIvg
    But to eliminate duplicates first make the list into a set.

    So a set is an unordered collection of unique elements. I actually don't want to remove duplicates (or rather, non-unique elements).
  10. #6
  11. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    OK, I'll read your post carefully.

    "I want to generate a series of random numbers WITHOUT REPLACEMENT in ascending order."

    Here are 10 random numbers drawn from a range of 10 WITHOUT REPLACEMENT in ascending order:

    1 2 3 4 5 6 7 8 9 10
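    The result only stops being trivial when fewer numbers are drawn than the range holds. A small sketch, assuming the goal is a sorted set of distinct positions; n, k and the seed are made-up values:

```python
import random

rng = random.Random(42)   # fixed seed only so the run is repeatable

# Drawing all 10 numbers from a range of 10 and sorting them
# always gives back the full range, whatever the seed.
everything = sorted(rng.sample(range(1, 11), 10))

# Drawing k distinct positions out of n, then sorting them.
n, k = 1000, 10           # hypothetical list size and sample size
indices = sorted(rng.sample(range(n), k))
```

    random.sample() draws without replacement, so the positions are distinct, and sorted() puts them in ascending order.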
