Thread: split lines

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Posts
    4
    Rep Power
    0

    split lines


    I am reading in a log file in the following format:

    64.68.82.38 - - [16/Nov/2003:08:17:57 +0000] "GET /pipermail/gm2/2003-September/author.html HTTP/1.0" 200 4537

    and I am trying to count the number of accesses to the pipermail directory.

    Code:
    def ProcessLine ()
           count = 0
           for line in open(logfile.data).readlines():
                 words = string.split(line)
                 space = string.split(words[6],' ')
                 line = string.split('/')
                 site = space[0]
                 print 'Filename:', site
                 if site == 'pipermail':
                      count += 1
           print 'No. of accesses in pipermail:',count
    This code displays all the filenames, but it only gives a count of 0, which is obviously wrong.

    Edit: Added [ code ] tags so that your code maintains its indentation, which is very important in Python.
    Last edited by netytan; January 14th, 2004 at 07:55 AM.
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
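    OK, the problem here is that site is never 'pipermail', so count never becomes more than 0. words[6] is the whole request path (for your sample line, '/pipermail/gm2/2003-September/author.html'), and splitting it on a space leaves it unchanged because it contains no spaces.

    What you need to do is check whether site starts with '/pipermail/'. I could have fixed your function, but it seemed easier to start over...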

    Code:
    #!/usr/bin/env python
    
    def processline():
    	count = 0
    	for line in open('test.txt', 'r'):
    		words = line.split()
    		words = words[6]
    		print 'Filename:', words
    		if words.startswith('/pipermail/'):
    			count = count + 1
    	print 'No. of accesses in pipermail:', count
    
    if __name__ == '__main__':
    	processline()
    Another thing I noticed: your function would also cause a SyntaxError because there's no ':' after the function definition.
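
To make the difference between the two checks concrete, here is a minimal sketch using the sample line from the first post. It is written in Python 3 syntax (print as a function), unlike the code above, and the variable names are only illustrative.

```python
# Sample log line quoted in the first post.
line = ('64.68.82.38 - - [16/Nov/2003:08:17:57 +0000] '
        '"GET /pipermail/gm2/2003-September/author.html HTTP/1.0" 200 4537')

fields = line.split()   # split on whitespace
path = fields[6]        # the request path is the 7th whitespace-separated field

# Splitting the path on a space changes nothing because it contains no spaces,
# so `site` is still the whole path.
site = path.split(' ')[0]

print(site)
print(site == 'pipermail')              # the original comparison
print(site.startswith('/pipermail/'))   # the prefix check

# An alternative: compare only the first component between the slashes.
top_dir = path.split('/')[1]
print(top_dir == 'pipermail')
```

The prefix check and the first-component check agree for this line; which one to prefer depends on whether a bare '/pipermail' request (without a trailing slash) should count.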

    Edit: fixed the typo in the shebang

    Mark.
    Last edited by netytan; January 14th, 2004 at 03:20 PM.
    programming language development: www.netytan.com Hula

  4. #3
  5. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,642
    Rep Power
    4247
    You have a minor typo too netytan. Your first line reads:
    #!/usr/bun/env python
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Thanks Scorpi. The thing about Windows is that it runs regardless, so it's hard to debug little things like that... Well spotted!

    Mark.

  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Posts
    4
    Rep Power
    0

    logfile help


    Code:
    #!/usr/bin/python
    
    import sys, string, getopt
    
    def Usage():
        # Print a short reminder of the expected options
        print "usage: python.py -f logfile -p logfile"
    
    def count_lines(logfile):
        # Count every line (one access per line) in the log file
        linecount = 0
        for line in open(logfile).readlines():
            linecount = linecount + 1
    
        print "accesses:", linecount
    
    def processline(logfile):
        # Count the accesses whose request path starts with /pipermail
        count = 0
        total = 0
        percentage = 0
        for line in open(logfile).readlines():
            words = string.split(line)
            path = words[6]
            total = total + 1
            if path.startswith('/pipermail'):
                count = count + 1
        # Work out the percentage once, after the whole file has been read
        if total > 0:
            percentage = count * 100 / total
        print 'Accesses directory:', count, '(', percentage, '%)'
    
    try:
        optlist, args = getopt.getopt(sys.argv[1:], ':p:f:')
    
    except getopt.GetoptError:
        Usage()
        print "called exception"
        sys.exit(2)
    
    for opt in optlist:
        if opt[0] == '-p':
            processline(opt[1])
        if opt[0] == '-f':
            count_lines(opt[1])
    When this code is run in a terminal using the following options:

    ./python.py -f access.data -p access.data
    I get the following output:

    accesses: 6
    Accesses directory: 3 ( 50% )

    But instead of putting the filename after the -p option, I would like to be able to put in a directory name such as:
    /pipermail or
    /Glamorgan
    and then the output would display the number of accesses to that specific directory and not the whole file.

    The logfile which I am processing contains the following data (just a sample, as it contains many entries, some with /pipermail and some with /Glamorgan):

    213.1.145.506 --{12/Dec/2002:07:41:19 +0000} " GET /pipermail/notes/web/m2f.html HTTP/1.0" 301 -
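
One possible way to get that behaviour, as a rough sketch rather than a definitive answer: let -f name the log file and -p name the directory, then count only the request paths that start with that directory. The sketch is in Python 3 syntax, the function names `count_directory` and `parse_options` are hypothetical, and it assumes the log format from the first post (request path in the 7th whitespace-separated field). The second sample line is made up purely for illustration.

```python
import getopt


def count_directory(lines, directory):
    """Return (matches, total) for request paths under `directory`."""
    # Normalise '/pipermail', 'pipermail' and '/pipermail/' to '/pipermail/'.
    prefix = '/' + directory.strip('/') + '/'
    matches = 0
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip lines without a request-path field
        total += 1
        if fields[6].startswith(prefix):
            matches += 1
    return matches, total


def parse_options(argv):
    """Return (logfile, directory) from '-f logfile -p directory'."""
    optlist, _args = getopt.getopt(argv, 'f:p:')
    options = dict(optlist)
    return options.get('-f'), options.get('-p')


# First line is the sample from the first post; the second is invented.
sample = [
    '64.68.82.38 - - [16/Nov/2003:08:17:57 +0000] '
    '"GET /pipermail/gm2/2003-September/author.html HTTP/1.0" 200 4537',
    '64.68.82.38 - - [16/Nov/2003:08:18:02 +0000] '
    '"GET /Glamorgan/index.html HTTP/1.0" 200 1024',
]

logfile, directory = parse_options(['-f', 'access.data', '-p', '/pipermail'])
matches, total = count_directory(sample, directory)
percentage = matches * 100 // total if total else 0
print('accesses:', total)
print('Accesses directory:', matches, '(', percentage, '%)')
```

With a real file, `count_directory(open(logfile), directory)` would take the place of the in-memory `sample` list, and `sys.argv[1:]` would be passed to `parse_options`, so the command line would look like ./python.py -f access.data -p /pipermail.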
