#1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Location
    UK
    Posts
    3
    Rep Power
    0

    Webserver log script


    Hi all, I'm new to all this, so I apologise. I've looked at tutorials for Python and I can't make head nor tail of them.

    I need to make a Python program which will summarise an Apache web server log file. It should report the total number of successful web server accesses.
  2. #2
  3. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,643
    Rep Power
    4248
    Adapting some code from the Python Cookbook (original author for this snippet was Mark Nenadov):
    Code:
    #!/usr/bin/env python 
    def CalculateApacheIpHits(logfile_pathname):
        IpHitListing = {}
        # Use readlines(), if xreadlines() doesn't work.
        Contents = open(logfile_pathname, "r").xreadlines()
    
        # Alternatively, use the following code, if you don't have
        # xreadlines and don't want to slurp the code with readlines. 
        # Comment the "Contents = open..." and "for line in..." and 
        # replace with the lines below:
        #input = open(logfile_pathname, "r") 
        #while 1:
        #    line = input.readline()
        #    if not line: break
    
        for line in Contents:
            #Split the string to isolate the IP address
            Ip = line.split(" ")[0]
    
            if 6 < len(Ip) <= 15:
                #Update the hash
                IpHitListing[Ip] = IpHitListing.get(Ip, 0) + 1;
    
        return IpHitListing
    
    if __name__ == "__main__":
        # Substitute your own log file name here
        hits = CalculateApacheIpHits("/var/log/apache/access_log");
        for ip in hits.keys():
            print ip, "\t\t", hits[ip]
    This one only summarises the hits by IP address. However, you can also get it to check the HTTP status code (if you're logging that, of course) and only count a hit when a successful code (e.g. 200, 301, 302) is returned.
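    A minimal sketch of that status-code check, written in current Python 3 syntax rather than the Python 2 of the snippet above. It assumes the log uses Apache's Common Log Format (IP first, status code right after the quoted request); the names LOG_LINE, count_successful_hits and ok_codes are made up for illustration, and which codes count as "successful" is your call.

```python
import re

# Assumed layout (Apache Common Log Format), for example:
# 10.0.0.1 - - [18/Jan/2004:10:00:00 +0000] "GET / HTTP/1.0" 200 1043
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')

def count_successful_hits(lines, ok_codes=("200", "301", "302")):
    """Return {ip: hits}, counting only lines whose status is in ok_codes."""
    hits = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        ip, status = match.groups()
        if status in ok_codes:
            hits[ip] = hits.get(ip, 0) + 1
    return hits

if __name__ == "__main__":
    # Hypothetical sample lines; pass an open log file instead for real use.
    sample = [
        '10.0.0.1 - - [18/Jan/2004:10:00:00 +0000] "GET / HTTP/1.0" 200 1043',
        '10.0.0.1 - - [18/Jan/2004:10:00:05 +0000] "GET /missing HTTP/1.0" 404 209',
        '10.0.0.2 - - [18/Jan/2004:10:00:09 +0000] "GET /old HTTP/1.0" 302 -',
    ]
    per_ip = count_successful_hits(sample)
    print(per_ip)
    print("total successful hits:", sum(per_ip.values()))
```

    Summing the values of the returned dictionary gives the total number of successful accesses, which is what the original question asked for.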
    Last edited by Scorpions4ever; January 18th, 2004 at 11:32 AM.
  4. #3
  5. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Location
    UK
    Posts
    3
    Rep Power
    0
    Thank you very much... this will help me tremendously. Will apply it Monday, hopefully.

    Words can't explain.
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Just a small, more up-to-date example of how this task could be done...

    Code:
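    #!/usr/bin/env python
    
    def hits(path):
    	# Status codes to treat as successful; extend as needed.
    	codes = ('200', '301', '302')
    	count = 0
    	for line in open(path, 'r'):
    		fields = line.split()
    		# In Common Log Format the status code is the second-to-last field.
    		if len(fields) >= 2 and fields[-2] in codes:
    			count += 1
    	return count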
    This should do what scorpy was describing (untested). This actually has a lot of potential; I'd be interested in seeing your finished piece if you're willing to show it.

    Note: xreadlines was deprecated in Python 2.3 and replaced with iterating over the file directly ('for line in file').
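    To illustrate the note above, here is a small sketch of iterating over a file object directly. It is written for Python 3, where the with statement (not available in Python 2.3) also closes the file for you; count_lines is a hypothetical helper name.

```python
def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    total = 0
    with open(path, "r") as log:  # the file is closed when the block ends
        for line in log:          # yields one line at a time, as xreadlines() did
            total += 1
    return total
```

    The loop body is where per-line work, such as splitting out the IP address or status code, would go.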

    Anyway, have fun.

    Mark.

  8. #5
  9. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Location
    UK
    Posts
    3
    Rep Power
    0
    Cool thx
