### Thread: what is the formula

1. Registered User, Devshed Newbie (0 - 499 posts)
Join Date: Apr 2017; Posts: 3; Rep Power: 0

#### what is the formula

I have a matrix M:
M= [[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
The entries are tf-idf scores.

and a centroid:
[ 0. 0.00259728 0. ..., 0. 0. 0. ]
I want to find the distance between the centroid and each score in matrix M.
I imported the 20 Newsgroups dataset from sklearn and computed the tf-idf scores. I got the centroid by taking the mean of one class of the dataset (the average of the first 200 rows of M).
What formula gives the distance between each score in the matrix and the corresponding score in the centroid, without changing the shape of matrix M? Or is there another way?
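For reference, here is a minimal sketch of the usual "distance from each document to the class centroid" computation with NumPy. It assumes M is a dense NumPy array (TfidfVectorizer returns a scipy.sparse matrix, so `M.toarray()` would be needed first if it fits in memory); the small matrix and the 3-row "class" are hypothetical stand-ins for the real tf-idf data and its first 200 rows.

```python
import numpy as np

# Hypothetical stand-in for the tf-idf matrix: 6 documents x 4 terms.
# The question's M is far larger.
M = np.arange(24, dtype=float).reshape(6, 4) / 24.0

# Centroid of one class = mean of that class's rows. The question uses the
# first 200 rows; here the first 3 rows play that role (assumed split).
n_class_rows = 3
centroid = M[:n_class_rows].mean(axis=0)   # shape (4,): one value per term

# Euclidean distance from each document (row) to the centroid:
# broadcast-subtract the centroid from every row, square, sum over terms,
# then take the square root.
row_distances = np.linalg.norm(M - centroid, axis=1)   # shape (6,)

print(centroid.shape)       # (4,)
print(row_distances.shape)  # (6,)
```

This yields one number per document. scikit-learn's `sklearn.metrics.pairwise.euclidean_distances` does the same job and also accepts sparse input; passing `centroid.reshape(1, -1)` as its second argument returns an `(n_documents, 1)` array.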
2. Maybe you want the square root of the dot product as a distance metric?
Python 3.5 and newer have a matrix-multiplication operator, @ (PEP 465), which NumPy arrays support.
You probably want map(math.sqrt, M@centroid) to compute them all at once.
You could also use scipy (or numpy.dot) for the dot product.
Code:
```python
# Running   python3 -m doctest thisfile.py   runs the doctest below.

def dot(a, x):
    '''
    Return the dot product of matrix a with (column) vector x.

    >>> dot([[0, 1, 2, 3], [4, 5, 6, 7]], [2, 3, 5, 7])
    [34, 102]
    '''
    return [sum(ai[j] * xj for (j, xj) in enumerate(x)) for ai in a]


def you_might_want(M, centroid):
    '''Square root of each row's dot product with the centroid.'''
    return [a**0.5 for a in dot(M, centroid)]
```
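With NumPy arrays, the suggestion above can be written without the pure-Python helper. A minimal sketch, assuming M and centroid are dense NumPy arrays and Python 3.5+ (for @); the toy values are the doctest's. Note that the square root of a dot product grows as two nonnegative vectors overlap more, so it behaves like a similarity score rather than a distance (a vector's value against itself is not zero).

```python
import math
import numpy as np

# Toy data (assumed), taken from the doctest: M is 2x4, centroid has 4 entries.
M = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]], dtype=float)
centroid = np.array([2, 3, 5, 7], dtype=float)

# With a 2-D M and a 1-D centroid, @ is a matrix-vector product:
# one dot product per row of M, the same values dot() returns.
products = M @ centroid

# np.sqrt works on the whole array at once, like map(math.sqrt, M @ centroid).
roots = np.sqrt(products)

print(products.tolist())   # [34.0, 102.0]
```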
3. Originally Posted by b49P23TIvg
Maybe you want the square root of the dot product as a distance metric?
Is there any way to find the distance between 0 and 2, 1 and 3, 2 and 5, and 3 and 7, so that the result has the same size as matrix M?
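Written out, the pairing asked for here matches entry $j$ of each row of $M$ with entry $j$ of the centroid $c$; the pairs listed appear to come from the doctest's row [0, 1, 2, 3] and vector [2, 3, 5, 7] (an inference, not stated in the thread). Taking a one-dimensional distance for each pair gives a matrix $D$ with the same shape as $M$, while combining a row's entries gives the usual Euclidean distance $d_i$ from document $i$ to the centroid:

$$
D_{ij} = \lvert M_{ij} - c_j \rvert, \qquad
d_i = \sqrt{\sum_{j} \left(M_{ij} - c_j\right)^2} = \sqrt{\sum_{j} D_{ij}^2}
$$

For the doctest row this gives $(\lvert 0-2\rvert, \lvert 1-3\rvert, \lvert 2-5\rvert, \lvert 3-7\rvert) = (2, 2, 3, 4)$ and $d_0 = \sqrt{4 + 4 + 9 + 16} = \sqrt{33}$.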
4. The data in M must represent something and be in a particular order.
The distance between two points on a line is the absolute value of their difference.
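The absolute-difference idea can be applied to the whole matrix at once with NumPy broadcasting. A minimal sketch, assuming M is a dense NumPy array (a scipy.sparse matrix would need `M.toarray()` first); the toy values reuse the doctest's matrix and vector as hypothetical stand-ins for the tf-idf matrix and its centroid.

```python
import numpy as np

# Toy data (assumed): the doctest's matrix and vector stand in for the
# tf-idf matrix M and its centroid.
M = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]], dtype=float)
centroid = np.array([2, 3, 5, 7], dtype=float)

# Broadcasting subtracts the centroid from every row of M, so each entry is
# paired with the centroid entry in the same column: (0, 2), (1, 3), (2, 5), (3, 7), ...
elementwise_distance = np.abs(M - centroid)

print(elementwise_distance.shape)     # (2, 4), the same shape as M
print(elementwise_distance.tolist())  # [[2.0, 2.0, 3.0, 4.0], [2.0, 2.0, 1.0, 0.0]]
```

Because broadcasting stretches the 1-D centroid across the rows, no loop is needed and the result keeps M's shape, which is what the question asks for.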