March 4th, 2013, 04:22 PM
-
Loop through each file in folder and look for specific character strings
Hi,
First off, I should mention that I have never programmed anything in Python, but I would like to write this little program to learn more about the language.
I currently work with VBA within Excel, but I would like to know whether I could do this job with Python.
I have an Excel file with 600 lines. Those lines contain the specific character strings I want to search for in each .txt file in a folder.
Here is my current code in VBA. I commented the code a bit so that you can understand each step.
To start, how do I input my 600 values in Python in order to look for them in each file?
Secondly, how do I rewrite the following code in Python?
Code:
Sub FINDlinesinfolder()
    MsgBox ("Please choose the folder")
    Application.ScreenUpdating = False
    With Application.FileDialog(msoFileDialogFolderPicker)
        .AllowMultiSelect = False
        .Show
        If .SelectedItems.Count > 0 Then
            fd = .SelectedItems(1)
        End If
    End With
    fn = Dir(fd & "\" & "*.*")
    Set ws1 = Workbooks("myvalue.xls").Sheets(1)
    'I set my sheet that contains the values I want to look at
    ws1.Cells(1, 17) = "Found in following file"
    ws1.Cells(1, 18) = "found on following line"
    Do While fn <> "" ' I loop through each file
        Set ws2 = Workbooks.Open(fd & "\" & fn).Sheets(1)
        lr2 = ws2.Cells.Find(What:="*", After:=[A1], SearchDirection:=xlPrevious).Row
        For i = 1 To 600 ' I loop through my 600 values
            mydate = Right(ws1.Cells(i, 2), 4) & Mid(ws1.Cells(i, 2), 3, 2) & Left(ws1.Cells(i, 2), 2)
            myaccount = WorksheetFunction.Substitute(ws1.Cells(i, 5), "-", "")
            myamount = WorksheetFunction.Substitute(ws1.Cells(i, 3), ",", "")
            'My values are a combination of the formatted value of 3 cells
            'I could input only the end result in Python
            For y = 1 To lr2
                If mydate <> "" Then
                    If ws2.Cells(y, 1) Like "*" & mydate & "*" & myaccount & "*" & myamount & "*" Then
                        ws1.Cells(i, 17) = fn
                        ws1.Cells(i, 18) = y
                    End If
                End If
            Next y
            'If the line in the text file is like "wildcard" & mydate & "wildcard" & myaccount & "wildcard" & myamount then write filename and line in my original excel file
        Next i ' loop each line
        ws2.Parent.Close False
        fn = Dir
    Loop 'loop each file
End Sub
I hope you understand what I am trying to do.
Thank you for your help and time.
March 4th, 2013, 07:14 PM
-
Code
This is my first attempt at a simplified version of what I want to do. I want to loop through each file in a folder and print the filename if a line contains the text 'mytest'.
The only thing is that I am unable to make it run.
Can you please help me?
Code:
>>> import os
rootdir='c:\test\'
def myscan(line):
    return line
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file, 'r')
        lines=f.readlines()
        for line in lines:
            if "mytest" in line: print f.path
        f.close()
March 4th, 2013, 08:23 PM
-
Code:
import os
rootdir='c:\\test\\' # The backslashes are a problem.
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        with open(file, 'r') as f:
            lines=f.readlines()
        for line in lines:
            if "mytest" in line:
                print f.path
# In unix, I'd use this command
# find root_path -type f -exec grep --silent mytest {} \; -print
[code]
Code tags[/code] are essential for python code and Makefiles!
March 4th, 2013, 08:45 PM
-
Originally Posted by b49P23TIvg
Code:
import os
rootdir='c:\\test\\' # The backslashes are a problem.
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        with open(file, 'r') as f:
            lines=f.readlines()
        for line in lines:
            if "mytest" in line:
                print f.path
# In unix, I'd use this command
# find root_path -type f -exec grep --silent mytest {} \; -print
Thank you for the reply.
I get an invalid syntax error at f.path, with f highlighted.
Can you please help me solve this problem?
March 5th, 2013, 01:42 AM
-
Got it to work, but I still have a lot of questions.
Code:
import os
rootdir='c:\\test\\' # The files seem to be read from the directory where my python project (.py file) is located and not from c:\test\. How do I solve this?
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        with open(file, 'r') as f:
            lines=f.readlines()
        for line in lines:
            if "aaa" in line:
                with open("Output.txt", "w") as text_file:text_file.write(f.name)
        f.close() # my loop does not seem right because even though I have multiple files that contain 'aaa' it only prints one
Also, since I am quite a beginner, I would really like it if you could help me write the final program. I learned quite a lot in VBA just by looking at other people's code, and I would be grateful if you could help me with this.
What I want it to do (I'm realizing that my first post might not be comprehensible) is this:
#1 I will have a .txt file (myvalues.txt) that will contain 600 lines, and on each line there will be three values separated by a space.
Let's declare those 3 variables as follows: mydate, myaccount and myamount
#2 Open each file in a specified folder and read each line from it
#3 If the opened file contains a line matching the following string built from myvalues.txt (I will use "*" as a wildcard and & to join the strings, even though I'm not sure this is how you do it in Python): "*" & mydate & "*" & myaccount & "*" & myamount & "*" ; print this file name in an output file named output.txt
#4 Loop through each file
I hope this is clearer and that you can help me with this.
Thank you for your help and time.
March 5th, 2013, 10:45 AM
-
instead of `f.path' use `f.name' since name is an attribute of files.
instead of `print string' use `print(string)' as this will work in python 2 and python 3.
Where you have `f.close()' remove it. The with context already closed your file f .
Maybe you should open this file in append mode. What you've got overwrites itself each time you execute the statement.
with open("Output.txt", "w")
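The difference between the two modes can be seen in a small experiment. This is only a sketch: the scratch file location and the two file names written into it are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
out_path = os.path.join(tempfile.mkdtemp(), "Output.txt")

# Mode "w" truncates the file every time it is opened,
# so only the last write survives.
for name in ["a.txt", "b.txt"]:
    with open(out_path, "w") as text_file:
        text_file.write(name + "\n")
with open(out_path) as f:
    overwritten = f.read()

# Mode "a" appends to whatever is already there,
# so earlier writes are kept.
os.remove(out_path)
for name in ["a.txt", "b.txt"]:
    with open(out_path, "a") as text_file:
        text_file.write(name + "\n")
with open(out_path) as f:
    appended = f.read()

print(repr(overwritten))  # 'b.txt\n'
print(repr(appended))     # 'a.txt\nb.txt\n'
```

Append mode keeps growing the file across runs of the program, which is one reason to prefer opening the output once, outside the loop.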
Restructuring is better.
Therefore:
Code:
import os
rootdir='c:\\test\\' #rootdir seems to be the directory where is located my python project (.py file) and not c:\test\. How do I solve this.
with open("Output.txt", "w") as text_file:
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            with open(file, 'r') as f:
                lines=f.readlines()
            # f is closed when the code finishes the block
            for line in lines:
                if "aaa" in line:
                    text_file.write(f.name)
                    break # I assume you don't need to write the name for each occurrence of the target string
# python closes text_file when you get back to this indentation level.
Untested. My untested code almost never works.
March 5th, 2013, 07:22 PM
-
Thank you for the reply.
Two concerns :
The program only writes one filename in output.txt even though I have multiple files with the 'aaa' string in them.
The program reads files from the directory where the program file (.py) is. I want it to read data from c:\test\
Can you help me solve this?
March 5th, 2013, 08:54 PM
-
# the file is in subdir. Use os.path.join
with open(os.path.join(subdir,file), 'r') as f:
# you'll probably want a separator between the files
# listed in the output.
text_file.write(f.name+'\n')
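Putting these two fixes together with the earlier restructuring, the whole program might look like the sketch below. It assumes Python 3, and the function names files_containing and write_report, as well as the throwaway demo files, are invented for illustration.

```python
import os
import tempfile


def files_containing(rootdir, needle):
    """Return the paths of files under rootdir with a line containing needle."""
    matches = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # os.walk yields bare file names; join them with subdir
            # so the file is opened where it actually lives.
            path = os.path.join(subdir, name)
            # errors="replace" is an assumption: it keeps the scan going
            # if a file is not valid text in the default encoding.
            with open(path, "r", errors="replace") as f:
                for line in f:
                    if needle in line:
                        matches.append(path)
                        break  # one hit per file is enough
    return matches


def write_report(rootdir, needle, out_path):
    """Write one matching path per line to out_path."""
    with open(out_path, "w") as text_file:
        for path in files_containing(rootdir, needle):
            text_file.write(path + "\n")


# Small self-contained demonstration on a throwaway directory tree.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
samples = {
    "one.txt": "nothing here\n",
    "two.txt": "xx aaa yy\n",
    os.path.join("sub", "three.txt"): "first line\naaa again\naaa twice\n",
}
for rel_path, text in samples.items():
    with open(os.path.join(demo_root, rel_path), "w") as f:
        f.write(text)

report_path = os.path.join(tempfile.mkdtemp(), "Output.txt")
write_report(demo_root, "aaa", report_path)
with open(report_path) as f:
    reported = sorted(os.path.basename(p) for p in f.read().splitlines())
print(reported)  # ['three.txt', 'two.txt']
```

For the folder in this thread the call would be something like write_report('c:\\test', 'aaa', 'Output.txt').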
March 5th, 2013, 09:42 PM
-
Works like a charm, thanks for the reply.
Now let's say I have a text file (named myvalues.txt) with the following contents:
aaa bbb ccc
ddd eee fff
ggg hhh iii
Instead of having if "aaa" in line:
how do I get something like this:
myline(x) # This is to illustrate a variable that would contain the value of line x, which I could loop over
if "*" & left(myline(x)) & "*" & mid(myline(x),5,3) & "*" & right(myline(x),3) & "*" # Here & is used to join strings together and "*" is a wildcard (not sure how you do this in python)
How do I code this?
Thank you for your help and time with this.
Really appreciated.
March 6th, 2013, 09:57 PM
-
March 6th, 2013, 10:21 PM
-
Input:
aaa bbb ccc
ddd eee fff
ggg hhh iii
What is the output you desire?
I can't derive a sane meaning from
if "*" & left(myline(x)) & "*" & mid(myline(x),5,3) & "*" & right(myline(x),3) & "*"
The nonsense I see:
) not connected in any way to the input
) "*"&left
why would there be anything to the left of left? (or to the right of right)
Returning to your first post, you had
mydate = Right(ws1.Cells(i, 2), 4) & Mid(ws1.Cells(i, 2), 3, 2) & Left(ws1.Cells(i, 2), 2)
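In Python, Left, Mid and Right are usually written as slices, and Substitute as str.replace. The sketch below is a rough translation only: the helper names left, mid and right are invented, and the sample cell values (a date stored as DDMMYYYY, an account with dashes, an amount with a thousands comma) are assumptions about what the spreadsheet holds.

```python
def left(s, n):
    """VBA Left(s, n): the first n characters."""
    return s[:n]


def mid(s, start, length):
    """VBA Mid(s, start, length): start counts from 1, Python slices from 0."""
    return s[start - 1:start - 1 + length]


def right(s, n):
    """VBA Right(s, n): the last n characters."""
    return s[-n:] if n > 0 else ""


# Hypothetical cell contents, assumed for illustration only.
date_cell = "04032013"       # assumed DDMMYYYY
account_cell = "12-345-678"
amount_cell = "1,234.50"

mydate = right(date_cell, 4) + mid(date_cell, 3, 2) + left(date_cell, 2)
myaccount = account_cell.replace("-", "")   # like Substitute(cell, "-", "")
myamount = amount_cell.replace(",", "")     # like Substitute(cell, ",", "")

print(mydate, myaccount, myamount)  # 20130304 12345678 1234.50
```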
Study the re module.
You can join python strings with + or with the join method.
>>> A='abc'
>>> A += 'def'
>>> ' , '.join((A,'ghi'))
'abcdef , ghi'
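If the goal is the VBA Like pattern from the first post ("*" & mydate & "*" & myaccount & "*" & myamount & "*"), one way to express it with the re module is sketched here. It assumes each line of myvalues.txt holds three space-separated values; build_pattern and load_patterns are hypothetical helper names, and the sample line is made up.

```python
import re


def build_pattern(mydate, myaccount, myamount):
    """Compile a regex that finds the three values in order, anything between."""
    # re.escape makes characters such as "." in an amount match literally.
    parts = [re.escape(p) for p in (mydate, myaccount, myamount)]
    # ".*" plays the role of the "*" wildcard between the values.
    return re.compile(".*".join(parts))


def load_patterns(lines):
    """Turn lines of three space-separated values into compiled patterns."""
    patterns = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            patterns.append(build_pattern(*fields))
    return patterns


# Stand-in for the contents of myvalues.txt shown earlier in the thread.
values = ["aaa bbb ccc", "ddd eee fff", "ggg hhh iii"]
patterns = load_patterns(values)

# A made-up line from one of the searched files.
line = "xx aaa 123 bbb --- ccc yy"
# search() scans the whole line, so no leading or trailing wildcard is needed.
hits = [p.pattern for p in patterns if p.search(line)]
print(hits)  # ['aaa.*bbb.*ccc']
```

With a real file, load_patterns(open("myvalues.txt")) would build the patterns, and the inner test of the folder scan would become any(p.search(line) for p in patterns) in place of "aaa" in line.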
March 7th, 2013, 01:30 PM
-
On a side note I've been developing a board on Python resources related to looping. Any recommendations you guys might have to add to this? http://www. verious.com/board/AKumar/looping-in-python/ (Sorry, I can't post links yet)