#1
  1. Registered User, Devshed Newbie (0 - 499 posts)
    Join Date: Mar 2013 | Posts: 3 | Rep Power: 0

    Newbie: Find a variable outside a function


    I'm new to Python and have a question about indexes and data frames. I have three source files whose rows can be uniquely identified by concatenating two fields (district_code, district_type). I extract data from these files based on a unique key that I produce, dist_key. The totals extracted from the database stored procedure are compared to the aggregate totals from the source files. The unique identifier (dist_key) matters because duplicate district codes would otherwise affect the results, and the results must match what is in the database.

    The function below produces the error "KeyError: u'no item named dist_key'". The unique identifier (dist_key) is only defined in this function. Being a newbie to this language and to scripting, I'm unsure how to reference the needed variable (dist_key) outside the function.


    Code:
    def addNamesCodes(testframe,districtnamedata,districtcodedata):
            """ Function that will correct any missing data such as district names or district codes.  Parameter is a pandas dataframe, dictionaries which map the district names and district codes """
    
    
            #contain list of correct district codes and district names
            districtnames=[]
            districtkeys=[]
            #Non matches
            fdistrictnames=[]
            fdistrictkeys=[]
            #fill empty values in names and codes
    
            testframe['district_name']=testframe['district_name'].apply(lambda x: str(x))
            testframe['district_name']=testframe['district_name'].fillna('')
            testframe['district_code']=testframe['district_code'].fillna('')
            testframe['dist_key']=testframe['dist_key'].fillna('')
            testframe['dist_key']=testframe['dist_key']+testframe['district_code']
            #Create two new columns containing the district names and district codes in same format as enrollment and teacher data  
            for i in range(len(testframe.index)):
                #both district code and district name are present
                if districtnamedata.has_key(testframe['dist_key'][testframe.index[i]]) and districtcodedata.has_key(testframe['district_name'][testframe.index[i]]):
                    #district code and district name are a match
                    if ((districtnamedata[testframe['dist_key'][testframe.index[i]]]==testframe['district_name'][testframe.index[i]]) and (districtcodedata[testframe['district_name'][testframe.index[i]]]==testframe['dist_key'][testframe.index[i]])):
                        districtnames.append(districtnamedata[testframe['dist_key'][testframe.index[i]]])
                        districtkeys.append(districtcodedata[testframe['district_name'][testframe.index[i]]])
                    #potential wrong mappings
                    else:
                        districtkeys.append(testframe['dist_key'][testframe.index[i]])
                        districtnames.append(districtnamedata[testframe['dist_key'][testframe.index[i]]])
                else:
                    #check if district code is present
                    if districtnamedata.has_key(testframe['dist_key'][testframe.index[i]]):
                        districtkeys.append(testframe['dist_key'][testframe.index[i]])
                        districtnames.append(districtnamedata[testframe['dist_key'][testframe.index[i]]])
                    #check if only district name is present 
                    elif districtcodedata.has_key(testframe['district_name'][testframe.index[i]]):
                        districtnames.append(testframe['district_name'][testframe.index[i]])
                        districtkeys.append(districtcodedata[testframe['district_name'][testframe.index[i]]])
                    #complete nonmatches
                    else:
                        fdistrictnames.append(testframe['district_name'][testframe.index[i]])
                        fdistrictkeys.append(testframe['dist_key'][testframe.index[i]])
            #extend the list by the complete nonmatches
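
For reference, the branching in the loop above can be sketched without has_key, which Python 3 removed in favour of the `in` operator. This is a minimal sketch, assuming pandas is installed and assuming the branches reduce to "a known key wins, otherwise fall back to a known name, otherwise treat the row as a nonmatch". The lookup tables and sample rows are hypothetical, invented for illustration.

```python
import pandas as pd

# Hypothetical lookup tables (key -> name, name -> key) and sample rows.
districtnamedata = {"001A": "Adams", "002B": "Baker"}
districtcodedata = {"Adams": "001A", "Baker": "002B"}
testframe = pd.DataFrame({
    "dist_key": ["001A", "", "999Z"],
    "district_name": ["Adams", "Baker", "Nowhere"],
})

# `key in mapping` replaces mapping.has_key(key), which Python 3 removed.
assert "001A" in districtnamedata

# Series.map looks each value up in the dict; misses become NaN.
name_from_key = testframe["dist_key"].map(districtnamedata)
key_from_name = testframe["district_name"].map(districtcodedata)
key_known = name_from_key.notna()
name_known = key_from_name.notna()

# Known key wins; otherwise fall back to a known name; otherwise leave NaN.
resolved_name = name_from_key.where(
    key_known, testframe["district_name"].where(name_known)
)
resolved_key = testframe["dist_key"].where(key_known, key_from_name)

# Rows where neither the key nor the name is recognised.
unmatched = ~(key_known | name_known)
```

Here `unmatched` plays the role of the fdistrictnames/fdistrictkeys lists: it flags the rows that neither dictionary recognises, so they can be reviewed separately.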
#2
  2. Contributing User, Devshed Demi-God (4500 - 4999 posts)
    Join Date: Aug 2011 | Posts: 4,997 | Rep Power: 481

    Your function doesn't have any syntax errors! Great.

    Panda: fuzzy creature from China.

    Code:
    testframe['district_name']=testframe['district_name'].apply(lambda x: str(x))
    Where you have
    apply(lambda x:str(x))
    you could simply use
    apply(str)
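
A quick check of that equivalence, as a minimal sketch assuming pandas and NumPy are installed; the sample values are invented for illustration. It also shows a side effect worth knowing: converting to str before calling fillna turns missing values into the literal string 'nan', so a later fillna('') has nothing left to fill.

```python
import numpy as np
import pandas as pd

# Hypothetical district_name values, including one missing entry.
names = pd.Series(["Adams", np.nan, 42])

via_lambda = names.apply(lambda x: str(x))
via_str = names.apply(str)

# Both spellings produce the same Series.
assert via_lambda.equals(via_str)

# NaN has already become the string 'nan' at this point.
print(via_str.tolist())

# Filling missing values first keeps them as empty strings.
cleaned = names.fillna("").apply(str)
print(cleaned.tolist())
```

If empty strings are the goal, swapping the order (fillna first, then apply(str)) is one way to get them.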

    Need test data and desired result please. I am installing pandas.
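
While waiting on real test data, here is a minimal sketch with made-up rows (the values and the three-row frame are hypothetical, not the poster's data). It assumes the frame passed to addNamesCodes has no dist_key column yet. In that case the KeyError comes from reading testframe['dist_key'] before anything has assigned it, which is a missing column and not a variable scope problem. Building the key from district_code and district_type first, as the question describes, avoids the failed lookup.

```python
import pandas as pd

# Hypothetical test data; the column values are invented for illustration.
testframe = pd.DataFrame({
    "district_code": ["001", "001", None],
    "district_type": ["A", "B", "A"],
})

# Reading a column that was never created raises KeyError.
try:
    testframe["dist_key"].fillna("")
    raised = False
except KeyError:
    raised = True

# Build the key first, from columns that do exist.
code = testframe["district_code"].fillna("").astype(str)
kind = testframe["district_type"].fillna("").astype(str)
testframe["dist_key"] = code + kind

print(testframe["dist_key"].tolist())
```

Because the function mutates the frame it receives, the new column is visible to the caller afterwards as testframe['dist_key']; alternatively the function can return the frame explicitly.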
    [code]Code tags[/code] are essential for python code and Makefiles!
