#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Posts
    2
    Rep Power
    0

    Python import dataset question


    The following snippet occurs in "sample.py". My question is: where on the filesystem are the files used by these statements located? Should they be named "iris.??"?

    from sklearn import datasets
    iris = datasets.load_iris()



    The above code is in /home/xxx/examples/sample.py.

    Where would the datasets be located relative to that directory?

    The following directory exists:

    /home/xxx/sklearn/dataset

    The listing inside dataset is:

    ls -l
    total 228
    -rw-r--r-- 1 syedk syedk 14819 Sep 21 2011 base.py
    -rw-r--r-- 1 root root 14958 Jul 23 02:32 base.pyc
    drwxr-xr-x 2 syedk syedk 4096 Jul 23 19:27 data
    -rw-r--r-- 1 syedk syedk 6152 Sep 21 2011 DATASET_PROPOSAL.txt
    drwxr-xr-x 2 syedk syedk 4096 Jul 23 19:32 descr
    drwxr-xr-x 2 syedk syedk 4096 Sep 21 2011 images
    -rw-r--r-- 1 syedk syedk 1601 Sep 21 2011 __init__.py
    -rw-r--r-- 1 root root 2227 Jul 23 02:32 __init__.pyc
    -rw-r--r-- 1 syedk syedk 16520 Sep 21 2011 lfw.py
    -rw-r--r-- 1 root root 14036 Jul 23 02:32 lfw.pyc
    -rw-r--r-- 1 syedk syedk 3703 Sep 21 2011 mlcomp.py
    -rw-r--r-- 1 root root 3362 Jul 23 02:32 mlcomp.pyc
    -rw-r--r-- 1 syedk syedk 6563 Sep 21 2011 mldata.py
    -rw-r--r-- 1 root root 5453 Jul 23 02:32 mldata.pyc
    -rw-r--r-- 1 syedk syedk 3882 Sep 21 2011 olivetti_faces.py
    -rw-r--r-- 1 root root 3844 Jul 23 02:32 olivetti_faces.pyc
    -rw-r--r-- 1 syedk syedk 30412 Sep 21 2011 samples_generator.py
    -rw-r--r-- 1 root root 28918 Jul 23 02:32 samples_generator.pyc
    -rw-r--r-- 1 syedk syedk 515 Sep 21 2011 setup.py
    -rw-r--r-- 1 root root 773 Jul 23 02:31 setup.pyc
    -rw-r--r-- 1 syedk syedk 3523 Sep 21 2011 svmlight_format.py
    -rw-r--r-- 1 root root 3515 Jul 23 02:32 svmlight_format.pyc
    drwxr-xr-x 3 syedk syedk 4096 Jul 23 02:32 tests


    I cd'd to data but did not see any file referring to "iris".
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    Are you asking where the python module sklearn is located within your filesystem? Are you looking for the source that defines the function datasets.load_iris()?

    The answer to the question depends on how you installed sklearn and what platform you're on. I'm on an SL6 system right now, so any add-on modules I've got installed begin at /usr/lib/python2.6/site-packages/ or /usr/lib/python3.2/site-packages/. Inside there you would probably find an sklearn/ directory and a datasets.py file. So something like /usr/lib/python2.6/site-packages/sklearn/datasets.py is probably what you're looking for. Within that file you'd want to read the lines beginning with "def load_iris".

    The location is different on different platforms. I don't recall where it is on my Vine and Gentoo systems, but it's something very similar to the above paths.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    2
    Ok, so it is a bit of a wild goose chase.
    When you run this line:
    Code:
    from sklearn import datasets
    the file __init__.py in the datasets folder runs.

    In that file you will find the line:
    Code:
    from .base import load_iris
    This means: import the function load_iris from a module called base, which is also in the datasets directory.

    So let's look there. In base we find the load_iris function. In that function you find these lines:
    Code:
    data_file = csv.reader(open(join(module_path, 'data', 'iris.csv')))
    fdescr = open(join(module_path, 'descr', 'iris.rst'))
    These folders are also in the datasets directory. Within the datasets/data folder you will find 'iris.csv':
    https://github.com/scikit-learn/scik.../data/iris.csv
    and within the datasets/descr folder you will find 'iris.rst':
    https://github.com/scikit-learn/scik...descr/iris.rst

    Cheers,
    -Mek
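The lookup described above (a path built from the package's own directory plus 'data' and a file name) can be sketched in a few lines. This is only an illustration, not sklearn's code: `data_file_path` is a hypothetical helper, and the standard-library `json` package stands in for `sklearn.datasets` so the snippet runs without sklearn installed.

```python
import json  # stand-in package; any imported package object works
import os


def data_file_path(package, *parts):
    """Join path components onto the directory that contains `package`.

    Hypothetical helper: it mirrors the join(module_path, 'data', 'iris.csv')
    pattern quoted from base.py, but it is not part of sklearn.
    """
    package_dir = os.path.dirname(os.path.abspath(package.__file__))
    return os.path.join(package_dir, *parts)


# decoder.py ships inside the json package, so this path should exist.
path = data_file_path(json, "decoder.py")
print(path, os.path.isfile(path))
```

With the sklearn layout discussed in this thread, `data_file_path(datasets, "data", "iris.csv")` would presumably point at the file that load_iris opens; other releases may organize their data files differently.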
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    I think there was also a builtin or function somewhere that can tell you the filesystem location of an imported module. But that might just be something I'm mixing up with a ghci or sbcl feature... Anybody?
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    2
    Well you can do stuff like this:
    Code:
    >>> import pygame
    >>> print(pygame.__file__)
    C:\Python27\lib\site-packages\pygame\__init__.pyc
    >>>
    -Mek

    Edit:
    Also some other interesting answers here regarding cases where the source file isn't actually written in python:
    how-do-i-find-the-location-of-python-module-sources
    Last edited by Mekire; July 25th, 2013 at 02:16 AM.
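As a complement to printing `__file__`, here is a small sketch for Python 3 that asks the import system where a module comes from and, if possible, where its source file is. `locate` is a hypothetical helper name; `importlib.util.find_spec` and `inspect.getsourcefile` are standard-library functions. Built-in modules such as `sys` have no source file, which the sketch reports as None.

```python
import importlib
import importlib.util
import inspect


def locate(module_name):
    """Return where `module_name` is loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    module = importlib.import_module(module_name)
    try:
        # Path of the .py source file, when one exists.
        source = inspect.getsourcefile(module)
    except TypeError:
        # Raised for built-in modules, which have no file at all.
        source = None
    return {"origin": spec.origin, "source": source}


print(locate("json"))
print(locate("sys"))
```

Calling `locate("sklearn.datasets")` on a machine with sklearn installed should report the copy that Python actually imports, which may not be the directory you expected.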
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    Pow! There it is.
    So I didn't just imagine that.
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Posts
    2
    Rep Power
    0

    Mapping import statement to filesystem


    Trying to chase down the filesystem location of these statements:

    from sklearn import datasets
    iris = datasets.load_iris()

    I located two files called iris.csv and iris.rst. However, as a test I renamed them to xxx.csv and xxx.rst and re-ran the script with no error. I had assumed the missing files would cause an error.
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    194
    Rep Power
    2
    For me on windows the files are here:
    Code:
    >>> from sklearn import datasets
    >>> datasets.__file__
    'C:\\Python27\\lib\\site-packages\\sklearn\\datasets\\__init__.pyc'
    >>> from sklearn.datasets import base
    >>> base.__file__
    'C:\\Python27\\lib\\site-packages\\sklearn\\datasets\\base.pyc'
    >>>
    The load_iris function is on line 202 of that base.py file.

    You can see it here:
    https://github.com/scikit-learn/scik...s/base.py#L202

    iris.csv is located here for me:
    Code:
    C:\\Python27\\lib\\site-packages\\sklearn\\datasets\\data\\iris.csv
    and if I rename it to "xxx.csv" and run this:
    Code:
    iris = datasets.load_iris()
    I get this:
    Code:
    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
      File "C:\Python27\lib\site-packages\sklearn\datasets\base.py", line 232, in load_iris
        data_file = csv.reader(open(join(module_path, 'data', 'iris.csv')))
    IOError: [Errno 2] No such file or directory: 'C:\\Python27\\lib\\site-packages\\sklearn\\datasets\\data\\iris.csv'
    >>>
    -Mek
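One possible explanation for a rename that causes no error (an assumption, since the thread does not confirm it) is that more than one copy of the package exists and Python imports a different copy from the one that was edited. The sketch below lists every directory on the search path that looks like a given package. `find_package_copies` is a hypothetical helper, and it only checks for regular directories containing `__init__.py`, so zipped or namespace packages are not detected.

```python
import os
import sys


def find_package_copies(name, search_paths=None):
    """List directories on the search path that look like package `name`.

    Results keep search-path order, so the first entry is normally the copy
    that a plain `import name` would pick up.
    """
    if search_paths is None:
        search_paths = sys.path
    copies = []
    for entry in search_paths:
        # An empty sys.path entry means the current working directory.
        candidate = os.path.join(entry or os.getcwd(), name)
        has_init = os.path.isfile(os.path.join(candidate, "__init__.py"))
        if has_init and candidate not in copies:
            copies.append(candidate)
    return copies


for copy in find_package_copies("json"):
    print(copy)
```

If `find_package_copies("sklearn")` prints more than one directory, comparing the first entry with `datasets.__file__` shows which copy's data files need to be renamed for the test to have any effect.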
