#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    3
    Rep Power
    0

    what is the formula


    I have a matrix M
    M= [[ 0. 0. 0. ..., 0. 0. 0.]
    [ 0. 0. 0. ..., 0. 0. 0.]
    [ 0. 0. 0. ..., 0. 0. 0.]
    ...,
    [ 0. 0. 0. ..., 0. 0. 0.]
    [ 0. 0. 0. ..., 0. 0. 0.]
    [ 0. 0. 0. ..., 0. 0. 0.]]
    The entries are TF-IDF scores.

    and a centroid
    [ 0. 0.00259728 0. ..., 0. 0. 0. ]
    I want to find the distance between the centroid and each score in matrix M.
    I imported the newsgroups dataset from sklearn and computed the TF-IDF scores. I got the centroid by taking the mean of one class of the dataset (the average of the first 200 rows of M).
    What formula gives the distance between each score in the matrix and the corresponding score in the centroid, so that the shape of matrix M is not affected? Or is there another way?
  2. #2
  3. Contributing User
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Aug 2011
    Posts
    5,858
    Rep Power
    509
    Maybe you want the square root of the dot product as a distance metric?
    Python 3.5 and later have a matrix multiplication operator, @, which works as a dot product here.
    You probably want map(math.sqrt, M @ centroid) to compute them all at once.
    You could use scipy for the dot product.
    Code:
    # Running    python3 -m doctest thisfile.py    executes the doctest below.
    
    def dot(a, x):
        '''
        Return the dot product of matrix a with (column) vector x.
        >>> dot([[0, 1, 2, 3], [4, 5, 6, 7]], [2, 3, 5, 7])
        [34, 102]
        '''
        return [sum(ai[j] * xj for (j, xj) in enumerate(x)) for ai in a]
    
    
    you_might_want = [a**0.5 for a in dot(M, centroid)]
    [code]Code tags[/code] are essential for python code and Makefiles!
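For readers who want a conventional metric, here is a minimal pure-Python sketch of the Euclidean distance from each row of a matrix to a vector, written in the same style as dot() above. This is a different technique from the square root of the dot product suggested in the post. The names euclidean_distances, rows and point are illustrative and do not come from the thread.

```python
import math


def euclidean_distances(a, x):
    """Return the Euclidean distance from each row of a to the vector x."""
    return [
        math.sqrt(sum((aij - xj) ** 2 for aij, xj in zip(ai, x)))
        for ai in a
    ]


# Sample values borrowed from the doctest in the post (not real TF-IDF data).
rows = [[0, 1, 2, 3], [4, 5, 6, 7]]
point = [2, 3, 5, 7]

# One distance per row, so the result has len(rows) entries.
print(euclidean_distances(rows, point))
```

Assuming M and centroid are dense NumPy arrays, np.linalg.norm(M - centroid, axis=1) should give the same per-row distances.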
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Posts
    3
    Rep Power
    0
    Originally Posted by b49P23TIvg
    Maybe you want the square root of the dot product as a distance metric?
    Python 3.5 and later have a matrix multiplication operator, @, which works as a dot product here.
    You probably want map(math.sqrt, M @ centroid) to compute them all at once.
    You could use scipy for the dot product.
    Code:
    # Running    python3 -m doctest thisfile.py    executes the doctest below.
    
    def dot(a, x):
        '''
        Return the dot product of matrix a with (column) vector x.
        >>> dot([[0, 1, 2, 3], [4, 5, 6, 7]], [2, 3, 5, 7])
        [34, 102]
        '''
        return [sum(ai[j] * xj for (j, xj) in enumerate(x)) for ai in a]
    
    
    you_might_want = [a**0.5 for a in dot(M, centroid)]
    Is there any way to find the distance between 0 and 2, 1 and 3, 2 and 5, 3 and 7,
    so that the result has the same size as matrix M?
  6. #4
  7. Contributing User
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    Aug 2011
    Posts
    5,858
    Rep Power
    509
    The data in M must represent something and be in a particular order.
    The distance between two points on a line is the absolute value of their difference.
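The elementwise reading of the question in post #3 can be sketched as follows: take the absolute difference between each entry of a row and the matching entry of the vector, so the result keeps the shape of the matrix. elementwise_distance is a hypothetical helper, and the sample values are the ones from the doctest in post #2, not real TF-IDF data.

```python
def elementwise_distance(a, x):
    """Return |a[i][j] - x[j]| for every entry, keeping the shape of a."""
    return [[abs(aij - xj) for aij, xj in zip(ai, x)] for ai in a]


# Sample values from the doctest earlier in the thread.
M = [[0, 1, 2, 3], [4, 5, 6, 7]]
centroid = [2, 3, 5, 7]

distances = elementwise_distance(M, centroid)
print(distances)  # [[2, 2, 3, 4], [2, 2, 1, 0]]
```

Assuming M and centroid are dense NumPy arrays, abs(M - centroid) would do the same thing through broadcasting, with no explicit loop.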
