October 22nd, 2003, 01:08 PM
-
os problem
The following code is supposed to count the files on a computer:
Code:
import os,sys

def count(filepath):
    global counter
    print filepath #temporary
    print os.listdir(filepath) #temporary
    for file in os.listdir(filepath):
        if os.path.isfile(file) or os.path.islink(file):
            counter += 1
        elif os.path.isdir(file):
            count(file)
        else:
            print "Erreur."
            sys.exit(1)

counter = 0
root = 'C:/' #place your own system root here
count(root)
print 'There are ',counter,' files on this computer.'
For some reason, my C:\ is not considered as a file, nor a directory (and obviously not a symlink, since I'm on XP). Why could that be ?
Time is the greatest of teachers ; sadly, it kills all of its students.
- Hector Berlioz
October 22nd, 2003, 02:42 PM
-
hmmm
i looked up the specifications for os.path.isdir() and all those functions...
you have to specify the path, not just the filename 
so, what you'd want to do is os.path.isfile('%s%s'%(filepath,file))
'C:/' should work just fine as a directory specification...
so yeah... that answers why you're unable to do that. because you had a sys.exit(1) in there, you never got any useful messages telling you what was going on
the way i figured out that the 'C:/' part was working while the isdir and isfile etc functions weren't... was that i replaced "sys.exit(1)" with
print 'nope',file
so, when i saw a file printed after 'nope' i immediately saw that the file was not determined to be a valid file or directory... that led me to digging around the os.path module, and that caused me to learn that the full path must be specified, not just the name that os.listdir() returns
however, i'm still running into problems in trying to count all the files on my computer. i'm still getting 'nope + file' printouts to screen and i can't figure that part out; i changed the recursive call to count(filepath+file) and that allowed me to get into subfolders of C:/, but it stopped working after that =/ ah well... i'll maybe play around with it more later, but for now i'll give it back to you
October 22nd, 2003, 02:44 PM
-
oh yeah
i also got this error after doing the whole '%s%s'%(filepath,file) thing...
WindowsError: [Errno 5] Access is denied: 'C:/System Volume Information/*.*'
i guess that means there is a protected system folder that you won't be able to get into? well, just be aware of it...
October 22nd, 2003, 02:53 PM
-
haha
sorry for spamming this post... but i figured out why i wasn't able to get beyond the subfolders...
instead of this:
os.path.isfile('%s%s'%(filepath,file))
do this:
os.path.isfile('%s/%s'%(filepath,file))
...the added '/' tells the computer that you're dealing with adding on directory names
then, for the recursive call, you'll want to do this:
count('%s/%s/'%(filepath,file))
yeah... that should do it.
i still don't know what to do about the 'C:/System Volume Information' directory though =/
October 22nd, 2003, 03:05 PM
-
ok, last one from me i swear
soo sorry for all these msgs
in my last post, i said to do this:
count('%s/%s/'%(filepath,file))
that's not correct. do this instead:
count('%s/%s'%(filepath,file))
sorry!
i did that and i counted up to 9354 files on my computer before i ran into 'C:/System Volume Information' and was forced to stop with the error =/
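For reference, the path-joining fix worked out above can also be written with os.path.join(), which inserts the separator only where one is needed. This is a minimal sketch written for Python 3 (an assumption; the thread's own code is Python 2), and classify() plus the file names in the self-check are made up for illustration:

```python
import os
import tempfile

def classify(dirpath):
    """Return (files, dirs) found directly inside dirpath."""
    files, dirs = [], []
    for name in os.listdir(dirpath):
        # listdir() returns bare names such as 'notes.txt', so the parent
        # directory has to be joined back on before testing the entry.
        full = os.path.join(dirpath, name)
        if os.path.isdir(full):
            dirs.append(name)
        elif os.path.isfile(full):
            files.append(name)
    return sorted(files), sorted(dirs)

# Small self-check in a throwaway directory (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "notes.txt"), "w") as fh:
        fh.write("hello")
    print(classify(tmp))
```

Testing the bare name instead of the joined path only works when the current working directory happens to be dirpath, which is why the first version of count() rejected everything under C:/.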
October 22nd, 2003, 04:32 PM
-
Thanks a lot for the input ; I really missed the insertion of /.
As for that problem of yours, that's easily fixed. You can catch that error with a try/except and record it in a flag (os.access() can also test permissions on a path before you touch it) :
Code:
import os

def count(filepath):
    global counter,access_error
    for file in os.listdir(filepath):
        path = filepath+'/'+file
        try:
            if os.path.isfile(path) or os.path.islink(path):
                counter += 1
            elif os.path.isdir(path):
                count(path)
        except OSError:
            # os.listdir() in the recursive call failed (e.g. access denied)
            access_error = True

counter = 0
access_error = False
root = 'C:/' #place your own system root here
count(root)
if access_error:
    print 'You have access to ',counter,' files on this computer.'
    print 'Could not access some of the files.'
else:
    print 'There are ',counter,' files on this computer.'
EDIT : does anybody see a problem in this code ? Last time I ran Ad-Aware, it scanned 300k+ files, and I had about 138k according to my program.
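As a cross-check for the count, here is a sketch of the same idea in Python 3 (an assumption, since the code above is Python 2). It uses os.scandir() with an explicit stack instead of recursion, and it returns the list of directories it could not read rather than a single flag. count_files() is a made-up name, not something from the thread:

```python
import os

def count_files(root):
    """Return (file_count, unreadable_dirs) for the tree under root."""
    count = 0
    unreadable = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Symlinks are counted as files and never followed,
                    # mirroring the isfile/islink test in the thread.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        count += 1
        except OSError:
            # Covers "access denied" as well as directories that vanish
            # mid-scan; the rest of that directory is skipped.
            unreadable.append(current)
    return count, unreadable
```

Calling count_files('C:/') and printing the second element shows exactly which folders were skipped, which helps when comparing against another tool's total.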
Last edited by SolarBear; October 22nd, 2003 at 04:42 PM.
October 22nd, 2003, 08:52 PM
-
:)
cool, thanks for the pointer about access...
as for the file count, maybe Ad-Aware is able to go into the access protected directories? Also... doesn't ad-aware also scan the inside of zip files? i don't think the code ya got there reaches inside zip files... i could be wrong of course.
October 22nd, 2003, 10:34 PM
-
If Python can't access protected directories, I don't believe Ad-Aware could ; however, you could be right about ZIP, RAR, etc. files : it does scan inside them.
However, I wonder if there's any way of getting the system's root folder without specifying it in the file...
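One possible answer, sketched for Python 3 (an assumption): asking for the absolute form of the bare path separator gives the root of whatever drive or filesystem the current directory is on. system_root() is a made-up helper name:

```python
import os

def system_root():
    """Root of the drive (Windows) or filesystem (POSIX) that holds
    the current working directory."""
    # os.sep is '\\' on Windows and '/' on POSIX; abspath() fills in
    # the current drive letter on Windows.
    return os.path.abspath(os.sep)

print(system_root())
```

On Windows this only covers the current drive, so a machine with several drives would still need each one listed separately.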
October 23rd, 2003, 10:35 AM
-
instead of using listdir and isfile I chose to use os.path.walk
I was able to use os.path.walk to list all files including those in system folders. (I also have XP) I'm not sure why there'd be a difference between the two approaches. But if we're just after a count of files....
Code:
import os

def walkfunc(ext,dir,files):
    global TotalFiles
    for file in files:
        TotalFiles += 1

TotalFiles = 0
os.path.walk('c:\\',walkfunc,'.*')
print TotalFiles
Last edited by irishtek; October 23rd, 2003 at 11:05 AM.
October 23rd, 2003, 11:00 AM
-
also with a minor modification to the last program you can count the number of contents in a zip file....
Code:
import os
import zipfile

def walkfunc(ext,dir,files):
    global TotalFiles,ZipFiles
    for file in files:
        if file[-3:] == "zip":
            zipfilepath = os.path.join(dir,file)
            z = zipfile.ZipFile(zipfilepath)
            ZipFiles += len(z.namelist())
        TotalFiles += 1

TotalFiles = 0
ZipFiles = 0
os.path.walk('c:\\',walkfunc,'.*')
print TotalFiles
print ZipFiles
Last edited by irishtek; October 23rd, 2003 at 11:05 AM.
October 23rd, 2003, 12:43 PM
-
I've written a small, hopefully efficient, self-contained function based on Irish's first program, using os.walk() instead of os.path.walk(). Could someone else give it a test and tell me if it works properly? I'm getting different results from the two programs.
My guess is that os.walk() does the opposite of os.path.walk() when dealing with protected directories and just ignores them, but I can't find anything to confirm or deny this.
Code:
import os

def countree(path, type = 2, count = 0):
    for object in os.walk(path):
        count = count + len(object[type])
    return count
Note: you can set the type option to 1 to count directories instead of files (2 is the default). Index 0 is the directory path itself, a string, so its length is not a useful count.
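On the question of protected directories: the os.walk() documentation says that errors raised while listing a directory are ignored by default, and that an optional onerror callback receives the OSError instead. The sketch below is written for Python 3 (an assumption) and uses that callback to record what was skipped; count_tree() is a made-up name. One likely reason the two programs disagree is that os.path.walk() hands its callback the raw os.listdir() names, sub-directories included, while os.walk() reports directories and files separately.

```python
import os

def count_tree(path):
    """Return (file_count, dir_count, errors) for the tree under path."""
    errors = []
    files = dirs = 0
    # onerror is called with the OSError raised while listing a
    # directory; appending it keeps a record of every skipped folder.
    for dirpath, dirnames, filenames in os.walk(path, onerror=errors.append):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs, errors
```

Each entry in errors carries a filename attribute naming the directory that could not be read.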
Oh, Irish, your zip count isn't exactly accurate, since the list returned by ZipFile.namelist() includes directories as well as files!
Mark.
October 23rd, 2003, 01:34 PM
-
You're right.
The file count also counts directories... although in the case of viruses, directories themselves could be infected. At any rate there's an even higher level of complication,
which is probably quickest to handle with a little recursion:
files that are compressed as 'zip', and then a series of those zip files compressed together as another 'zip'.
The files in the nested zip files don't get counted with this algorithm either.
October 24th, 2003, 04:35 PM
-
Ok, I've written a little function which should give you a list of ALL the FILES inside a zip archive, regardless of whether they are inside another zip! You shouldn't have a problem modifying this to include directories or both. Obviously using recursion.
Code:
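#!/usr/bin/env python
import os, zipfile

def rezipe(path, files = None):
    # Fresh list per top-level call; a mutable default would be shared between calls
    if files is None:
        files = []
    zip = zipfile.ZipFile(path)
    for name in zip.namelist():
        if name.endswith('.zip'):
            # Write the nested archive to the current directory, recurse, then clean up
            temp = os.path.basename(name)
            out = open(temp, 'wb')
            out.write(zip.read(name))
            out.close()
            rezipe(temp, files)
            os.remove(temp)
        elif not name.endswith('/'):
            files.append(name)
    zip.close()
    return files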
Note: The zip archives themselves are not actually counted, but again that's easily fixed!
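The same recursion can be done without writing temporary files, by reading each nested archive into memory. This is only a sketch for Python 3.6 or later (an assumption, since it relies on ZipInfo.is_dir()); list_zip_files() and the "!" separator used to show nesting are made up for illustration:

```python
import io
import zipfile

def list_zip_files(source, prefix=""):
    """List file entries in a zip archive, descending into nested
    .zip members. source is a path or a binary file-like object."""
    found = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries are not files
            if info.filename.lower().endswith(".zip"):
                # Read the nested archive into memory and recurse.
                inner = io.BytesIO(archive.read(info))
                found.extend(
                    list_zip_files(inner, prefix + info.filename + "!"))
            else:
                found.append(prefix + info.filename)
    return found
```

A member that is named .zip but is not a valid archive raises zipfile.BadZipFile, and very large nested archives are held entirely in memory, so this suits counting more than bulk extraction.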
Have fun,
Mark.