September 3rd, 2013, 02:00 AM

Help on using nearest neighbour interpolation in SciPy
Hi all,
I am very new to Python. I want to use the functionality within NumPy and SciPy to do a nearest neighbour interpolation for multivariable data I have.
1. I have a text file (points.dat) in which each line holds one sample (x1, x2, x3, ..., x14). The file has 1600 lines, so I have 1600 samples of 14 variables.
2. I have a second text file (value.dat) in which each line holds a single value corresponding to the sample on the same line of points.dat.
3. I want to use SciPy's nearest neighbour interpolation to interpolate at a new sample. This sample is saved in another text file (TestSample.txt), but column-wise: line 1 holds only x1, line 2 only x2, and so on up to x14.
Can you please help me write a few lines of code that use NumPy loadtxt and SciPy interpolate.NearestNDInterpolator to import the data and build the interpolant, and use scipy.spatial.cKDTree to test it with a new sample and obtain the interpolated value?
I have written some code, but I have problems with the data dimensions, etc.
Thank you.
September 3rd, 2013, 09:46 PM

I do not understand your item 2. You claim a single number; I think you need a vector of length 14.
My file points.dat has 1600 rows by 14 columns of pseudo-random values between 0 and 1, generated by the J sentence
1600 14?@$0
www.jsoftware.com
The following Python session shows the scipy imports, loading the data with loadtxt, building a KDTree object, a failed query using a vector of the wrong length (your item 2), a successful query for the point nearest a vector of fourteen 0.4's, and a vector length computation showing that I understand the output from tree.query.
Code:
$ python3
Python 3.3.1 (default, Apr 17 2013, 22:30:32)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> A = scipy.loadtxt('points.dat')
>>> import scipy.spatial
>>> tree = scipy.spatial.KDTree(A)
>>> tree.query([0.4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/scipy/spatial/kdtree.py", line 405, in query
    raise ValueError("x must consist of vectors of length %d but has shape %s" % (self.m, np.shape(x)))
ValueError: x must consist of vectors of length 14 but has shape (1,)
>>> tree.query([0.4]*14)
(0.64333815234291836, 182)
>>> A[182]
array([ 0.677797,  0.116046,  0.388321,  0.254809,  0.511037,  0.427214,
        0.333014,  0.420218,  0.511805,  0.292786,  0.320155,  0.656409,
        0.19426 ,  0.119901])
>>> (A[182]-0.4).dot(A[182]-0.4)**0.5
0.64333815234291836
>>>
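As a rough sketch of how the index returned by tree.query connects to item 2 of the question: that index can be used to look up the y value stored for the nearest sample, which is what nearest neighbour interpolation amounts to. The arrays below are synthetic stand-ins (random numbers, not the real points.dat or values.dat), cKDTree is used because the question names it, and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic stand-ins for the contents of points.dat and values.dat (assumption).
rng = np.random.default_rng(0)
points = rng.random((1600, 14))      # 1600 samples, 14 variables each
values = rng.random(1600)            # one y value per sample

tree = cKDTree(points)
query = np.full(14, 0.4)             # a vector of fourteen 0.4's, as in the session
distance, index = tree.query(query)  # Euclidean distance to, and row of, the nearest sample

# Nearest neighbour interpolation is then a lookup of the value for that row.
nearest_value = values[index]
print(distance, index, nearest_value)
```

The query vector must have the same length as one row of points (14 here), which is what the ValueError in the session is reporting.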
[code]Code tags[/code] are essential for python code and Makefiles!
September 5th, 2013, 02:00 AM

Thank you.
Item 2 means y = f(x1, x2, ..., x14). The file points.dat lists only the independent variables x1, x2, ..., x14, and the file values.dat lists the corresponding values of y.
Example (file points.dat, which has 1400 lines, i.e. 1400 samples of the vector X with 14 members):
1 4 5 6 9 8 6 8 9 0 4 5 2 1
3 4 5 3 4 2 4 5 9 8 8 6 5 2
...
1 2 4 7 9 6 5 7 6 6 7 5 5 6
Example (file values.dat, 1400 lines, each holding the y value for the vector X on the same line of points.dat):
4
9
..
3
September 5th, 2013, 10:45 PM

In my experiment, which you could easily have done yourself, loadtxt of a column of numbers produces a rank-1 array, also known as a vector.
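A small check of that claim, as a sketch: in-memory text stands in for the files (the numbers are taken from the examples in the previous post, shortened), and numpy.loadtxt is called directly.

```python
import io
import numpy as np

# One number per line, like values.dat or TestSample.txt (stand-in data).
column_file = io.StringIO("4\n9\n3\n")
column = np.loadtxt(column_file)
print(column.shape)   # (3,)  -> rank 1, a plain vector

# Several numbers per line, like points.dat (stand-in data, 3 columns only).
table_file = io.StringIO("1 4 5\n3 4 5\n")
table = np.loadtxt(table_file)
print(table.shape)    # (2, 3) -> rank 2, one row per sample
```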
Now, in point 3, are there just 7 numbers in the file TestSample.txt? If there are many sites at which to interpolate, then after reading the data, also with loadtxt, take the transpose.
You've got all the pieces you need.
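For reference, one way the pieces might fit together, offered as a sketch rather than a definitive solution. It assumes the three files are laid out as described in the thread, and it first writes small random files into a temporary directory so that it runs on its own; with real data, only the loadtxt calls and what follows would be needed. The variable names and the temporary directory are illustrative assumptions.

```python
import os
import tempfile
import numpy as np
from scipy.interpolate import NearestNDInterpolator

# Create synthetic stand-ins for the three files (assumption: random data).
rng = np.random.default_rng(1)
workdir = tempfile.mkdtemp()
points_path = os.path.join(workdir, "points.dat")
values_path = os.path.join(workdir, "values.dat")
sample_path = os.path.join(workdir, "TestSample.txt")
np.savetxt(points_path, rng.random((1600, 14)))  # 1600 lines, 14 numbers each
np.savetxt(values_path, rng.random(1600))        # 1600 lines, 1 number each
np.savetxt(sample_path, rng.random(14))          # 14 lines, 1 number each

points = np.loadtxt(points_path)   # shape (1600, 14)
values = np.loadtxt(values_path)   # shape (1600,)
sample = np.loadtxt(sample_path)   # shape (14,) for a single column-wise sample

# The interpolant expects queries with one row per site and 14 columns.
if sample.ndim == 1:
    queries = sample.reshape(1, -1)  # one site: (14,) -> (1, 14)
else:
    queries = sample.T               # many sites stored as columns: (14, k) -> (k, 14)

interpolant = NearestNDInterpolator(points, values)
result = interpolant(queries)        # one interpolated value per site
print(result)
```

The reshape and transpose are only there to get the query into shape (number of sites, 14), which is where dimension problems tend to come from.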