#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    8
    Rep Power
    0

    Loop through each file in a folder and look for specific character strings


    Hi,

    First off, I should point out that I have never programmed anything in Python, but I would like to write this little program to learn more about the language.

    I currently work with VBA in Excel, but I would like to know whether I could do this job with Python.

    I have an Excel file with 600 lines. Those lines contain the specific character strings I want to search for in each .txt file in a folder.

    Here is my current code in VBA. I commented the code a bit so you can understand each step.

    First, how do I input my 600 values in Python in order to look for them in each file?

    Second, how do I rewrite the following code in Python?

    Code:
    Sub FINDlinesinfolder()
        Dim fd As String, fn As String
        Dim ws1 As Worksheet, ws2 As Worksheet
        Dim lr2 As Long, i As Long, y As Long
        Dim mydate As String, myaccount As String, myamount As String

        MsgBox "Please choose the folder"

        With Application.FileDialog(msoFileDialogFolderPicker)
            .AllowMultiSelect = False
            .Show
            If .SelectedItems.Count > 0 Then
                fd = .SelectedItems(1)
            End If
        End With
        If fd = "" Then Exit Sub ' no folder chosen

        Application.ScreenUpdating = False
        fn = Dir(fd & "\" & "*.*")

        'I set my sheet that contains the values I want to look at
        Set ws1 = Workbooks("myvalue.xls").Sheets(1)
        ws1.Cells(1, 17) = "Found in following file"
        ws1.Cells(1, 18) = "Found on following line"

        Do While fn <> "" ' I loop through each file
            Set ws2 = Workbooks.Open(fd & "\" & fn).Sheets(1)
            lr2 = ws2.Cells.Find(What:="*", After:=[A1], SearchDirection:=xlPrevious).Row

            For i = 1 To 600 ' I loop through my 600 values
                'My values are a combination of the formatted value of 3 cells
                '(I could input only the end result in Python)
                mydate = Right(ws1.Cells(i, 2), 4) & Mid(ws1.Cells(i, 2), 3, 2) & Left(ws1.Cells(i, 2), 2)
                myaccount = WorksheetFunction.Substitute(ws1.Cells(i, 5), "-", "")
                myamount = WorksheetFunction.Substitute(ws1.Cells(i, 3), ",", "")

                'If the line in the text file is like "wildcard" & mydate & "wildcard" & myaccount
                '& "wildcard" & myamount & "wildcard", write the file name and line number in my original Excel file
                For y = 1 To lr2
                    If mydate <> "" Then
                        If ws2.Cells(y, 1) Like "*" & mydate & "*" & myaccount & "*" & myamount & "*" Then
                            ws1.Cells(i, 17) = fn
                            ws1.Cells(i, 18) = y
                        End If
                    End If
                Next y
            Next i ' loop each line

            ws2.Parent.Close False
            fn = Dir
        Loop ' loop each file

        Application.ScreenUpdating = True
    End Sub
    Hope you understand what I am trying to do.

    Thank you for your help and time.
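For readers comparing the two languages, here is a minimal Python 3 sketch of the per-row logic in the VBA above. It assumes the date cell's text is eight digits in ddmmyyyy order (e.g. "15032013"), which Right/Mid/Left rearrange into yyyymmdd; the helper names build_key and like_pattern and the sample values are hypothetical. The Like test with "*" wildcards becomes a regular expression in which each value is escaped and the values are joined with ".*".

```python
import re


def build_key(date_text, account, amount):
    """Python version of the three VBA formulas (hypothetical helper).

    Assumes date_text is 8 digits in ddmmyyyy order, e.g. "15032013".
    """
    mydate = date_text[-4:] + date_text[2:4] + date_text[:2]  # Right(,4) & Mid(,3,2) & Left(,2)
    myaccount = account.replace("-", "")  # Substitute(account, "-", "")
    myamount = amount.replace(",", "")    # Substitute(amount, ",", "")
    return mydate, myaccount, myamount


def like_pattern(*parts):
    """Regex equivalent of: Like "*" & p1 & "*" & p2 & "*" & p3 & "*"."""
    # re.escape makes each value literal; ".*" plays the role of the "*" wildcard.
    return re.compile(".*".join(re.escape(p) for p in parts))


mydate, myaccount, myamount = build_key("15032013", "123-456", "1,250.00")
pattern = like_pattern(mydate, myaccount, myamount)
print(mydate, myaccount, myamount)                              # 20130315 123456 1250.00
print(bool(pattern.search("20130315 ACC 123456 AMT 1250.00")))  # True
print(bool(pattern.search("123456 20130315 1250.00")))          # False: wrong order
```

Because Like's pattern starts and ends with "*", re.search (which matches anywhere in the line) is the closer equivalent than re.match.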
#2
    Registered User

    Code


    This is my first attempt at a simplified version of what I want to do. I want to loop through each file in a folder and print the filename if a line contains the text 'mytest'.

    The problem is that I can't get it to run.

    Can you please help me?

    Code:
    >>> import os
    rootdir='c:\test\'
    def myscan(line):
        return line
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            f=open(file, 'r')
            lines=f.readlines()
            for line in lines:
                if "mytest"
                in line: print f.path
            f.close()
#3
    Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,696
    Rep Power
    480
    Code:
    import os
    
    rootdir='c:\\test\\'  # The backslashes are a problem: each one must be doubled.
    
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            with open(file, 'r') as f:
                lines=f.readlines()
            for line in lines:
                if "mytest" in line:
                    print f.path
    
    
    # In unix, I'd use this command
    
    # find root_path -type f -exec grep --silent mytest {} \; -print
    [code]Code tags[/code] are essential for python code and Makefiles!
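To illustrate the backslash comment: in the original 'c:\test\', the final \' escapes the closing quote, so the string never ends, and even 'c:\test' contains a tab character because \t is an escape sequence. A small Python 3 sketch of the usual workarounds (the file name data.txt is only an example):

```python
import os

# In an ordinary string literal "\t" is a TAB, so 'c:\test' is not the path you meant.
broken = 'c:\test'
print(len(broken))     # 6: 'c', ':', TAB, 'e', 's', 't'

doubled = 'c:\\test'   # escape each backslash
raw = r'c:\test'       # raw string (a raw string cannot end with a single backslash)
forward = 'c:/test'    # Windows file APIs generally accept forward slashes too

print(doubled == raw)  # True
# os.path.join adds the separator for you, so no trailing backslash is needed.
print(os.path.join(raw, 'data.txt'))
```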
#4
    Registered User
    Replying to b49P23TIvg (post #3):
    Thank you for the reply.

    I get an invalid syntax error at f.path, with the f highlighted.

    Can you please help me solve this problem?
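A likely explanation, assuming the script runs under Python 3: there print is a function, so print f.path is a syntax error and the interpreter highlights the f. Once that is fixed, f.path would still fail with AttributeError, because file objects expose the name passed to open() as f.name. The sketch below uses a throwaway temporary file and also shows that a with block closes the file on exit while f.name stays readable:

```python
import os
import tempfile

# Throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("aaa\n")

with open(path, "r") as f:
    lines = f.readlines()

# The with block has closed f, but its attributes are still readable.
print(f.closed)            # True
print(f.name == path)      # True: .name is the path given to open()
print(hasattr(f, "path"))  # False: file objects have no .path attribute
print(lines)               # ['aaa\n']
```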
#5
    Registered User
    I got it to work, but I still have a lot of questions.

    Code:
    import os
    
    rootdir='c:\\test\\'  # rootdir seems to be the directory where my Python project (.py file) is located, not c:\test\. How do I solve this?
    
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            with open(file, 'r') as f:
                lines=f.readlines()
            for line in lines:
                if "aaa" in line:
                    with open("Output.txt", "w") as text_file:text_file.write(f.name)
                    f.close() # my loop does not seem right: even though I have multiple files that contain 'aaa', it only prints one
    Also, since I am quite a beginner, I would really appreciate it if you could help me write the final program. I learned quite a lot of VBA just by looking at other people's code, and I would be grateful if you could help me with this.

    What I want it to do (I realize that my first post might not have been clear) is this:

    #1 I will have a .txt file (myvalues.txt) that will contain 600 lines, each with three values separated by a space.
    Let's call those three variables mydate, myaccount and myamount.

    #2 Open each file in a specified folder and read each of its lines.

    #3 If the opened file contains a line matching the following pattern built from myvalues.txt (I will use "*" as a wildcard and & to join strings, even though I'm not sure this is how you do it in Python): "*" & mydate & "*" & myaccount & "*" & myamount & "*", write this file's name to an output file named output.txt.

    #4 Loop through each file.

    Hope this is clearer and you can help me with this.

    Thank you for your help and time.
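A minimal Python 3 sketch of steps #1 to #4, under these assumptions: each line of myvalues.txt holds mydate, myaccount and myamount separated by spaces, already formatted as they appear in the text files; only the files directly inside the chosen folder are scanned (os.walk, used elsewhere in this thread, would include subfolders); and each matching file name is written once to output.txt. The helper names load_patterns and scan_folder are hypothetical, and the demo uses temporary files instead of c:\test.

```python
import os
import re
import tempfile


def load_patterns(values_path):
    """Step #1: one (mydate, myaccount, myamount) triple per line of myvalues.txt."""
    patterns = []
    with open(values_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            # Same test as "*" & mydate & "*" & myaccount & "*" & myamount & "*"
            patterns.append(re.compile(".*".join(re.escape(v) for v in fields)))
    return patterns


def scan_folder(folder, patterns, output_path):
    """Steps #2 to #4: write each file name that has at least one matching line."""
    with open(output_path, "w") as out:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # ignore subfolders
            with open(path, errors="replace") as f:
                if any(p.search(line) for line in f for p in patterns):
                    out.write(name + "\n")


# Demo with temporary files; for real use, pass r"c:\test" as the folder.
work = tempfile.mkdtemp()
data = os.path.join(work, "data")
os.mkdir(data)
with open(os.path.join(work, "myvalues.txt"), "w") as f:
    f.write("20130315 123456 1250.00\n")
with open(os.path.join(data, "a.txt"), "w") as f:
    f.write("header\n20130315;123456;1250.00\n")
with open(os.path.join(data, "b.txt"), "w") as f:
    f.write("20130315;999999;1250.00\n")

scan_folder(data, load_patterns(os.path.join(work, "myvalues.txt")),
            os.path.join(work, "output.txt"))
with open(os.path.join(work, "output.txt")) as f:
    print(f.read())  # a.txt
```

The values file is kept outside the scanned folder so it is not itself searched; any() stops reading a file as soon as one line matches one of the 600 patterns.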
#6
    Contributing User
    Instead of `f.path' use `f.name', since name is an attribute of file objects.

    Instead of `print string' use `print(string)', as this works in both Python 2 and Python 3.

    Remove the `f.close()'. The with block has already closed your file f.

    You could open the output file in append mode instead. As written, the statement
    with open("Output.txt", "w")
    truncates the file each time it executes, so each name overwrites the previous one. Restructuring is better.

    Therefore:
    Code:
    import os
    
    rootdir='c:\\test\\'  # rootdir seems to be the directory where my Python project (.py file) is located, not c:\test\. How do I solve this?
    
    with open("Output.txt", "w") as text_file:
    
        for subdir, dirs, files in os.walk(rootdir):
            for file in files:
    
                with open(file, 'r') as f:
                    lines=f.readlines()
                # f is closed when the code finishes the block
    
                for line in lines:
                    if "aaa" in line:
                        text_file.write(f.name)
                        break # I assume you don't need to write the name for each occurrence of the target string 
    
    # python closes text_file when you get back to this indentation level.
    Untested. My untested codes almost never work.
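To make the append-mode remark concrete, here is a small Python 3 sketch with a temporary Output.txt and two made-up file names: "w" truncates on every open, "a" keeps earlier writes, and opening once outside the loop (as in the restructured code above) gives the same result as "a" without reopening the file.

```python
import os
import tempfile

out = os.path.join(tempfile.mkdtemp(), "Output.txt")
names = ["a.txt", "b.txt"]  # made-up file names

# "w" truncates the file each time it is opened: only the last name survives.
for name in names:
    with open(out, "w") as text_file:
        text_file.write(name + "\n")
with open(out) as f:
    after_w = f.read()
print(repr(after_w))  # 'b.txt\n'

# "a" appends, so every name is kept (remove the file first to start clean).
os.remove(out)
for name in names:
    with open(out, "a") as text_file:
        text_file.write(name + "\n")
with open(out) as f:
    after_a = f.read()
print(repr(after_a))  # 'a.txt\nb.txt\n'

# Opening once, outside the loop, gives the same result without reopening the file.
with open(out, "w") as text_file:
    for name in names:
        text_file.write(name + "\n")
with open(out) as f:
    once = f.read()
print(once == after_a)  # True
```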
#7
    Registered User
    Thank you for the reply.

    Two concerns :

    The program only writes one filename in output.txt, even though I have multiple files with the string 'aaa' in them.

    The program reads files from the directory where the program file (.py) is. I want it to read data from c:\test\.

    Can you help me solve this?
#8
    Contributing User
    Code:
    # the file is in subdir. Use os.path.join
    with open(os.path.join(subdir,file), 'r') as f:

    # you'll probably want a separator between the files
    # listed in the output.
    text_file.write(f.name+'\n')
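Putting the os.path.join and newline fixes together, here is a sketch of the whole 'aaa' search in Python 3. The generator name files_containing and the main() wrapper are hypothetical; main() is defined but not called, and the demo runs against a temporary folder rather than c:\test.

```python
import os
import tempfile


def files_containing(rootdir, target):
    """Yield the path of each file under rootdir with a line containing target."""
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            # os.walk returns bare names; join them with subdir, otherwise
            # open() looks for them in the current working directory.
            with open(os.path.join(subdir, file), "r", errors="replace") as f:
                for line in f:
                    if target in line:
                        yield f.name
                        break  # one entry per file is enough


def main(rootdir=r"c:\test", output="Output.txt"):
    """Hypothetical entry point: call main() to scan the default folder."""
    with open(output, "w") as text_file:
        for name in files_containing(rootdir, "aaa"):
            text_file.write(name + "\n")  # one file name per line


# Demo on a temporary folder with a subfolder.
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "sub"))
samples = {"one.txt": "xx aaa\n", "two.txt": "nothing\n",
           os.path.join("sub", "three.txt"): "aaa aaa\n"}
for rel, text in samples.items():
    with open(os.path.join(demo, rel), "w") as f:
        f.write(text)
found = sorted(os.path.basename(p) for p in files_containing(demo, "aaa"))
print(found)  # ['one.txt', 'three.txt']
```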
#9
    Registered User
    Works like a charm, thanks for the reply.

    Now let's say I have a text file named myvalues.txt with the following contents:


    aaa bbb ccc
    ddd eee fff
    ggg hhh iii

    Instead of having if "aaa" in line:

    How do I get something like this:

    myline(x) # This illustrates a variable that would contain the value of line x, so that I could loop over the lines

    if "*" & left(myline(x)) & "*" & mid(myline(x),5,3) & "*" & right(myline(x),3) & "*" #Here & is used to join strings together and "*" is a wildcard (not sure how you do this in python)

    How do I code this?

    Thank you for your help and time with this.

    Really appreciated.
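One way to express this in Python 3, assuming every line of myvalues.txt has exactly three space-separated values: split() gives the three parts (for fixed-width lines, the slices myline[:3], myline[4:7] and myline[-3:] correspond to Left(myline, 3), Mid(myline, 5, 3) and Right(myline, 3)), and joining the escaped parts with ".*" gives a regex equivalent to the "*" wildcards. The function name line_to_pattern and the sample text line are hypothetical.

```python
import re


def line_to_pattern(myline):
    """Hypothetical helper: 'aaa bbb ccc' -> regex for "*aaa*bbb*ccc*"."""
    first, second, third = myline.split()  # exactly three space-separated values
    # Fixed-width alternative: myline[:3], myline[4:7], myline[-3:]
    return re.compile(".*".join(re.escape(v) for v in (first, second, third)))


# With the real file: with open("myvalues.txt") as f: lines = f.read().splitlines()
lines = ["aaa bbb ccc", "ddd eee fff", "ggg hhh iii"]
patterns = [line_to_pattern(myline) for myline in lines]

text_line = "12 aaa 34 bbb 56 ccc 78"  # a made-up line from one of the scanned files
for myline, pattern in zip(lines, patterns):
    print(myline, "->", bool(pattern.search(text_line)))
# aaa bbb ccc -> True
# ddd eee fff -> False
# ggg hhh iii -> False
```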
#10
    Registered User
    bump...
#11
    Contributing User
    Input:

    aaa bbb ccc
    ddd eee fff
    ggg hhh iii


    What is the output you desire?


    I can't derive a sane meaning from
    if "*" & left(myline(x)) & "*" & mid(myline(x),5,3) & "*" & right(myline(x),3) & "*"

    The nonsense I see:
    ) myline(x) is not connected in any way to the input
    ) "*"&left
    why would there be anything to the left of left? (or to the right of right)


    Returning to your first post, you had
    mydate = Right(ws1.Cells(i, 2), 4) & Mid(ws1.Cells(i, 2), 3, 2) & Left(ws1.Cells(i, 2), 2)


    Study the re module (regular expressions) in the Python documentation.

    You can join Python strings with + or with the join method.

    >>> A='abc'
    >>> A += 'def'
    >>> ' , '.join((A,'ghi'))
    'abcdef , ghi'
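If regular expressions feel heavy, the standard fnmatch module accepts the same * wildcard as VBA's Like (plus ? and [...], though not Like's #), and the join method shown above can build the pattern. A small sketch using fnmatchcase, which, like Like under the default Option Compare Binary, is case-sensitive; the sample values are made up:

```python
from fnmatch import fnmatchcase

values = ["aaa", "bbb", "ccc"]  # made-up values
# "*".join puts a wildcard between the values; the empty strings add one at each end.
pattern = "*".join([""] + values + [""])
print(pattern)                                          # *aaa*bbb*ccc*
print(fnmatchcase("12 aaa 34 bbb 56 ccc 78", pattern))  # True
print(fnmatchcase("ccc bbb aaa", pattern))              # False: wrong order
```

Values that themselves contain ?, [ or ] would be read as wildcards here; the re.escape approach avoids that.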
#12
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Location
    Santa Clara, CA
    Posts
    5
    Rep Power
    0
    On a side note I've been developing a board on Python resources related to looping. Any recommendations you guys might have to add to this? http://www. verious.com/board/AKumar/looping-in-python/ (Sorry, I can't post links yet)
