#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Dec 2017 | Posts: 2 | Rep Power: 0

    Python/NumPy: have I already written the fastest code for a large array?


    **GOAL:**
    I would like to get my script's total execution time down from 4 minutes to under 30 seconds. I have a large 1-D array (3,000,000+ elements) of distances containing many duplicates. I am trying to write the fastest function that returns every distance appearing exactly n times in the array. I have written a NumPy version, but one line is a bottleneck. Speed matters because the calculation runs inside a for loop over 2400 different large distance arrays.

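    import numpy as np

    def find_repeats(n=3):  # hypothetical wrapper so that `return` is valid
        results = []
        for t in range(0, 2400):
            # int64 because 5000000000 exceeds int32, the default integer type on some platforms
            a = np.random.randint(1000000000, 5000000000, 3000000, dtype=np.int64)
            b = np.bincount(a, minlength=np.size(a))  # b[v] = number of times v occurs in a
            c = np.where(b == n)[0]  # SLOW STATEMENT/BOTTLENECK
            results.append(c)
        return results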
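
A possible alternative, sketched under the assumption that only the distinct values occurring exactly n times are needed and that sorted output is acceptable. np.bincount allocates one counter for every integer from 0 up to max(a), roughly 5 billion here, so the count array alone is on the order of 40 GB of int64, and `b == 3` then has to scan all of it even though a holds only 3 million values. Swapping in np.unique with return_counts=True sorts the 3 million values instead, so memory stays proportional to len(a). The helper name distances_seen_n_times is hypothetical, and no timings are claimed here.

```python
import numpy as np


def distances_seen_n_times(a, n):
    """Hypothetical helper: sorted distinct values occurring exactly n times in a.

    np.unique sorts a copy of a and returns each distinct value once,
    together with how many times it occurs, so no array of length max(a)
    is ever allocated.
    """
    values, counts = np.unique(a, return_counts=True)
    return values[counts == n]


# The small example array from the post
a = np.array([2000000000, 3005670000, 2000000000, 12345667, 4000789000,
              12345687, 12345667, 2000000000, 12345667], dtype=np.int64)
print(distances_seen_n_times(a, 3).tolist())  # [12345667, 2000000000]
```

np.unique returns values in ascending order, so the example comes back as [12345667, 2000000000] rather than in the order given in the post. For the real workload, the call would replace the bincount/where pair inside the 2400-iteration loop.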
    **EXPECTED RESULTS:**
    Given the 1-D array of distances [2000000000,3005670000,2000000000,12345667,4000789000,12345687,12345667,2000000000,12345667],
    a query for all distances that appear exactly 3 times should return [2000000000,12345667].
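
To check the expected results by hand, the counting can also be spelled out with one sort plus run lengths instead of np.bincount: after sorting, equal distances sit next to each other, and the length of each run is the distance between consecutive run starts. This is a sketch; values_with_count is a hypothetical name, and the output is sorted ascending.

```python
import numpy as np


def values_with_count(a, n):
    """Hypothetical helper: distinct values of a that occur exactly n times."""
    s = np.sort(a)
    if s.size == 0:
        return s
    # True at index 0 and wherever the value differs from the previous one
    starts = np.flatnonzero(np.concatenate(([True], s[1:] != s[:-1])))
    # Run length = gap to the next run start (or to the end of the array)
    lengths = np.diff(np.append(starts, s.size))
    return s[starts[lengths == n]]


a = np.array([2000000000, 3005670000, 2000000000, 12345667, 4000789000,
              12345687, 12345667, 2000000000, 12345667], dtype=np.int64)
print(values_with_count(a, 3).tolist())  # [12345667, 2000000000]
```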

    What should I do?

#2
    Contributing User
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date: Aug 2011 | Posts: 5,968 | Rep Power: 509
    I tried numpy.compress, which might give you the result you need, but it is no faster. And by gosh, you've got a lot of memory.
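
For reference, np.compress(cond, arr) keeps the elements of arr where cond is True, so applied to np.arange(b.size) it returns the same indices as np.where(b == 3)[0]; np.flatnonzero is a third spelling. All three still evaluate b == 3 over every element of b, which is consistent with compress being no faster: the cost is driven by the length of b. The tiny input below is made up for illustration.

```python
import numpy as np

# Made-up input; b[v] counts how often v occurs (b has length max(a) + 1)
a = np.array([5, 1, 5, 3, 5, 1], dtype=np.int64)
b = np.bincount(a)

mask = b == 3  # boolean array as long as b
via_where = np.where(mask)[0]
via_compress = np.compress(mask, np.arange(b.size))
via_flatnonzero = np.flatnonzero(mask)
print(via_where.tolist(), via_compress.tolist(), via_flatnonzero.tolist())  # [5] [5] [5]
```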

    [code]Code tags[/code] are essential for python code and Makefiles!

#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Dec 2017 | Posts: 2 | Rep Power: 0
    Thanks for the reply!!!
