April 25th, 2017, 09:56 AM

What is the formula?
I have a matrix M
M= [[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
The data are TF-IDF scores.
I also have a centroid:
[ 0. 0.00259728 0. ..., 0. 0. 0. ]
I want to find the distance between the centroid and each score in matrix M.
I imported the newsgroup dataset from sklearn and computed the TF-IDF scores. I got the centroid by taking the mean of one class of the dataset (the average of the first 200 rows of M).
What is the formula for the distance between each score in the matrix and the corresponding score in the centroid, such that the result keeps the shape of matrix M? Or is there another way?
April 25th, 2017, 02:08 PM

Maybe you want the square root of the dot product as a distance metric?
Python 3.5 and later have a matrix multiplication operator, @, which numpy arrays support.
You probably want map(math.sqrt, M @ centroid) to compute them all at once.
You could use numpy or scipy for the dot product.
Code:
# running `python3 -m doctest thisfile.py` runs the doctest
def dot(a, x):
    '''
    return the dot product of matrix a with (column) vector x
    >>> dot([[0, 1, 2, 3], [4, 5, 6, 7]], [2, 3, 5, 7])
    [34, 102]
    '''
    return [sum(ai[j] * xj for (j, xj) in enumerate(x)) for ai in a]

# with M and centroid defined as in the question:
you_might_want = [a**0.5 for a in dot(M, centroid)]
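If what you are after is the usual Euclidean distance from each row to the centroid, a rough sketch could look like the following. This is an assumption about what you mean by "distance"; row_distances, M_small and centroid_small are made-up names, and the numbers are just the doctest data from above, not your TF-IDF scores.

```python
import math

def row_distances(m, c):
    """Euclidean distance from each row of m to the vector c."""
    return [
        math.sqrt(sum((mij - cj) ** 2 for mij, cj in zip(row, c)))
        for row in m
    ]

# Small stand-ins for M and the centroid (hypothetical values).
M_small = [[0, 1, 2, 3], [4, 5, 6, 7]]
centroid_small = [2, 3, 5, 7]

# One distance per row, so the result has as many entries as M has rows.
distances = row_distances(M_small, centroid_small)
```

This gives one number per row rather than something with the shape of M.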
[code]
Code tags[/code] are essential for python code and Makefiles!
April 25th, 2017, 09:19 PM

Is there any way to find the distance between 0 and 2, 1 and 3, 2 and 5, 3 and 7,
so that the result has the same size as matrix M?
April 26th, 2017, 08:40 AM

The data in M must represent something and be in a particular order.
The distance between two points on a line is the absolute value of their difference.
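Taking that literally, a minimal sketch that compares every entry of a row with the matching entry of the centroid might be the following. elementwise_distance is a hypothetical helper name, and the data are the small doctest values from earlier in the thread, not the real TF-IDF matrix.

```python
def elementwise_distance(m, c):
    """Absolute difference between each entry of each row of m and the
    entry of c in the same column; the result has the same shape as m."""
    return [[abs(mij - cj) for mij, cj in zip(row, c)] for row in m]

result = elementwise_distance([[0, 1, 2, 3], [4, 5, 6, 7]], [2, 3, 5, 7])
# result == [[2, 2, 3, 4], [2, 2, 1, 0]]
```

If M and the centroid are dense numpy arrays, abs(M - centroid) should give the same thing through broadcasting, assuming the centroid has one entry per column of M.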