#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    4
    Rep Power
    0

    Finding the first usage of either of two words, negating, and moving to the next use


    I'm writing code that negates certain phrases. For example, I want to read data from an inFile, replace every "has" with "has not", and replace every "has not" with "has". My code does this successfully when a line contains only one of the two phrases, but it fails when both appear in the same line.

    What I would like to do is search the line for both phrases and get their index values, then use an if statement to compare the indexes (to see which phrase appears first), slice the string at that point, negate the portion I just sliced, add it to an initially empty string, and repeat until no phrases are left. I don't have a clear idea of how to do this (I am a beginner). The following code is what I've written so far, but it's incomplete.

    Code:
    def replace(inFile):
        inFile = inFile.readlines()
     #this is where I'm attempting to find, slice and replace
    #I got a tip to use the "has not" == "has not" and "has"=="has" that way when searching for has, it only brings a result when it matches "has" exactly, instead of finding "has not" 
        for line in inFile:
            newline = ""
            while "has not" == "has not" and "has" == "has" in line:
                x = line.find(line, (has" == "has"))
                y = line.find(line, ("has not" == "has not"))
                if x < y:
                    line2 = line[:x]
                    line2 = line2.replace("is" == "is", "is not")
                    newline = newline + line2
                else:
                    line3 = line[:x]
                    line3 = line3.replace(("has not" == "has not"), "has")
                    newline = newline + line3
            
    #these two statements work if only one or the other is in a line
            if "has not" == "has not"  in line:
                line = line.replace("has not", "has")
                print(line)
    
                
            elif "has" == "has" in line:
                line = line.replace("has" , "has not")
    
               print(line)
    The section where I attempt to find, slice, replace, and move on raises several errors. I have a feeling that there is a much better way to go about this. Many thanks in advance for any help.
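For reference, the find-and-slice idea described in this post can be sketched roughly as follows. This is only one possible reading of that description, not the poster's code: the function name `negate_has` is made up for illustration, and the sketch matches "has" as a plain substring, so a word such as "hash" would also be rewritten.

```python
def negate_has(line):
    """Swap 'has' and 'has not' by scanning left to right with str.find."""
    result = ""
    pos = 0
    while True:
        x = line.find("has", pos)  # next "has" (also the start of any "has not")
        if x == -1:
            result += line[pos:]   # nothing left to negate; keep the tail
            return result
        result += line[pos:x]      # copy the untouched text before the match
        if line.startswith("has not", x):
            result += "has"        # "has not" -> "has"
            pos = x + len("has not")
        else:
            result += "has not"    # "has" -> "has not"
            pos = x + len("has")
```

Because every "has not" starts with "has", a single `find` per step is enough; checking what follows the match decides which replacement applies.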
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    This is a case of parsing with a single lookahead. Getting all crazy with indexing the whole line and whatnot is going to overcomplicate your code.

    Without getting into grammar theory, here is an example of using a few Python string builtins to emulate the way a single-lookahead parser works:

    python Code:
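    text = 'I am an input string which has a lot of characters, but has not a clue why I am.'
     
    def replace(instring):
        in_has = False  # True while a 'has' is held back, waiting for the next word
        splitstring = instring.split(' ')
        newstring = []
        for word in splitstring:
            if word != 'has':
                if not in_has:
                    newstring.append(word)
                else:
                    if word == 'not':
                        newstring.append('has')  # 'has not' -> 'has'
                    else:
                        newstring.extend(['has not', word])  # bare 'has' -> 'has not'
                    in_has = False
            else:
                if in_has:
                    newstring.append('has not')  # 'has has': flush the pending one
                in_has = True
        if in_has:
            newstring.append('has not')  # input ended on a bare 'has'
        newstring = ' '.join(newstring)
        print(instring)
        print(newstring)
     
    replace(text)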

    Produces this:
    Code:
    [ceverett@taco python]$ python parse_replace.py 
    I am an input string which has a lot of characters, but has not a clue why I am.
    I am an input string which has not a lot of characters, but has a clue why I am.
    That should give you some ideas. There are, of course, ways to shorten the logic above, regex solutions, and other ways to make this work, but the function above is a bare example of how a single-lookahead can be made to work taking advantage of a few shortcuts in Python.
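As a rough illustration of the regex route mentioned above, a single `re.sub` pass with a replacement function could look like this. The names `PATTERN` and `negate` are chosen here for the example and do not come from the thread.

```python
import re

# \b keeps "has" from matching inside longer words such as "hash".
# "has not" is listed first so the longer phrase is tried first at each position.
PATTERN = re.compile(r"\bhas not\b|\bhas\b")


def negate(text):
    """Swap 'has' and 'has not' in one pass over text."""
    def swap(match):
        # Decide per match which way to flip.
        return "has" if match.group(0) == "has not" else "has not"
    return PATTERN.sub(swap, text)
```

Since each match is replaced exactly once, there is no risk of a freshly inserted "has not" being flipped back again.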

    Also, I recommend that you don't treat your file as single lines. What if there is a linebreak between "has" and "not"? If the input might be really huge, then you want to split it into chunks -- and in this case you need to provide a single word of overlap between chunks in case there is a chunk split between "has" and "not". You can address that by either passing the current state of in_has to the next chunk's pass along with a single word to be prepended to instring, or think up another way to guarantee that you don't get duplicates but can replace across chunk/line boundaries.
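A minimal sketch of the state-passing idea, assuming chunks are split on whitespace (never in the middle of a word) and that collapsing all whitespace, including line breaks, into single spaces is acceptable. The helper names `replace_chunk` and `replace_stream` are hypothetical.

```python
def replace_chunk(words, in_has=False):
    """Negate one chunk of words; return (new_words, in_has) for the next call."""
    out = []
    for word in words:
        if word == 'has':
            if in_has:
                out.append('has not')   # the previous 'has' was bare
            in_has = True               # hold this 'has' until the next word is seen
        elif in_has:
            if word == 'not':
                out.append('has')       # 'has not' -> 'has'
            else:
                out.extend(['has not', word])  # bare 'has' -> 'has not'
            in_has = False
        else:
            out.append(word)
    return out, in_has


def replace_stream(chunks):
    """Negate an iterable of text chunks (for example, the lines of a file)."""
    out, in_has = [], False
    for chunk in chunks:
        new_words, in_has = replace_chunk(chunk.split(), in_has)
        out.extend(new_words)
    if in_has:
        out.append('has not')           # the stream ended on a bare 'has'
    return ' '.join(out)
```

Here the pending "has" is simply not emitted until the next chunk supplies the following word, so no overlap word has to be prepended.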

    Incidentally, if you want to learn more about parsers (they are pretty fascinating) try doing "info bison" and sitting back with a cup of coffee...
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,854
    Rep Power
    481
    Or use the quick-and-dirty swap
    t=a; a=b; b=t;
    Code:
    >>> A
    'I am an input string which has a lot of characters, but has not a clue why I am.'
    >>> A.replace('has not','ZZZZZ').replace('has', 'has not').replace('ZZZZZ','has')
    'I am an input string which has not a lot of characters, but has a clue why I am.'
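The same swap can be wrapped in a small helper that refuses to run if the placeholder already occurs in the input. This is a sketch only: `swap_phrases` and its parameter names are invented for illustration, and like the one-liner it matches plain substrings, so "hash" would be affected too.

```python
def swap_phrases(text, longer, shorter, placeholder='\0'):
    """Swap two phrases, where `longer` contains `shorter`, via a placeholder."""
    if placeholder in text:
        raise ValueError('placeholder already occurs in the input')
    # Order matters: hide the longer phrase first so that replacing the
    # shorter one cannot touch it.
    return (text.replace(longer, placeholder)
                .replace(shorter, longer)
                .replace(placeholder, shorter))
```

Called as `swap_phrases(A, 'has not', 'has')`, it performs the same three replacements as the session above.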

    Comments on this post

    • zxq9 agrees
    [code]Code tags[/code] are essential for python code and Makefiles!
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    Usually Japan when not on contract
    Posts
    240
    Rep Power
    12
    Originally Posted by b49P23TIvg
    Or use the quick-and-dirty swap
    t=a; a=b; b=t;
    Indeed! (^.^)

    If that is too dirty (and if it is, then regex magic would be as well) and the end case you're building up to is actually a tokenizing/parsing problem where you actually do have a need to re-write tokens by index, then look into how the enumerate() function works. In particular enumerate(instring.split(' ')) is something you might find useful.
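To make the enumerate() suggestion concrete, here is one way rewriting tokens by index might look. The function name `negate_by_index` is hypothetical and the approach is only a sketch of the idea.

```python
def negate_by_index(instring):
    """Swap 'has' and 'has not' by rewriting tokens at known indexes."""
    tokens = instring.split(' ')
    drop = set()                      # indexes of 'not' tokens to remove
    for i, word in enumerate(tokens):
        if word != 'has':
            continue
        if i + 1 < len(tokens) and tokens[i + 1] == 'not':
            drop.add(i + 1)           # 'has not' -> 'has'
        else:
            tokens[i] = 'has not'     # bare 'has' -> 'has not'
    return ' '.join(t for i, t in enumerate(tokens) if i not in drop)
```

Assigning to `tokens[i]` during the loop is safe because the list never changes length; removals are deferred to the final join.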

    If the problem is truly non-trivial and is file or stream-wide consider forming an array across lists of lines, each line being split() and enumerated -- in other words turn the problem into a 2-dimensional array replacement/matrix function problem. That can be a useful approach to parsing non-trivial grammars where n-lookahead is insufficient (but introduces memory and chunking management).
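A small sketch of the two-dimensional variant, assuming the whole input fits in memory. `negate_grid` is a made-up name, and each row of `grid` holds the words of one line.

```python
def negate_grid(lines):
    """Swap 'has' and 'has not' across a list of lines, keeping line breaks."""
    grid = [line.split() for line in lines]
    # Flat list of (row, col) coordinates so lookahead can cross line ends.
    cells = [(r, c) for r, row in enumerate(grid) for c, _ in enumerate(row)]
    for k, (r, c) in enumerate(cells):
        if grid[r][c] != 'has':
            continue
        if k + 1 < len(cells):
            nr, nc = cells[k + 1]
            if grid[nr][nc] == 'not':
                grid[nr][nc] = None   # mark the 'not' for removal
                continue              # 'has not' -> 'has'
        grid[r][c] = 'has not'        # bare 'has' -> 'has not'
    return [' '.join(w for w in row if w is not None) for row in grid]
```

For example, `negate_grid(['it has', 'not failed but has', 'passed'])` returns `['it has', 'failed but has not', 'passed']`: the "has"/"not" pair split across the first line break is still recognized, and the line structure is preserved.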
