Python Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsProgramming LanguagesPython Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old October 4th, 2012, 08:07 PM
auda_kidd[&] auda_kidd[&] is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Oct 2012
Posts: 1 auda_kidd[&] User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 12 m 58 sec
Reputation Power: 0
Debug python program

pls can anyone help me point out the errors in this/bugs in this program; especially at the except syntax.I have tried all i could.

from query import CrawlerDb
from content_processor import ContentProcessor
from settings import LOGGING
import sys
import urlparse
import urllib2
import shutil
import glob
import robotparser
import logging
import logging.config
import traceback

# ===== Init stuff =====

# db init
cdb = CrawlerDb()
cdb.connect()

# content processor init
processor = ContentProcessor(None, None, None)

# logging setup
logging.config.dictConfig(LOGGING)
logger = logging.getLogger("crawler_logger")

# robot parser init
robot = robotparser.RobotFileParser()

if len(sys.argv) < 2:
logger.info("Error: No start url was passed")
sys.exit()

l = sys.argv[1:]

cdb.enqueue(l)

def crawl():
logger.info("Starting (%s)..." % sys.argv[1])
while True:
url = cdb.dequeue()
u = urlparse.urlparse(url)
robot.set_url('http://'+u[1]+"/robots.txt")
if not robot.can_fetch('PyCrawler', url.encode('ascii', 'replace')):
logger.warning("Url disallowed by robots.txt: %s " % url)
continue
if not url.startswith('http'):
logger.warning("Unfollowable link found at %s " % url)
continue
except name= urllib2.HTTPError, e
status = e.code
if cdb.checkCrawled(url):
continue
if url is False:
break
status = 0
req = urllib2.Request(str(url))
req.add_header('User-Agent', 'PyCrawler 0.2.0')
request = None

try:
request = urllib2.urlopen(req)
except urllib2.URLError, e:
logger.error("Exception at url: %s\n%s" % (url, e))
continue

if status == 0:
status = 200
data = request.read()
processor.setInfo(str(url), status, data)
ret = processor.process()
if status != 200:
continue
add_queue = []
for q in ret:
if not cdb.checkCrawled(q):
add_queue.append(q)

processor.setInfo(str(url), status, data)
add_queue = processor.process()
l = len(add_queue)
logger.info("Got %s status from %s (Found %i links)" % (status, url, l))
if l > 0:
cdb.enqueue(add_queue)
cdb.addPage(processor.getDataDict())
processor.reset()

logger.info("Finishing...")
cdb.close()
logger.info("Done! Goodbye!")

if __name__ == "__main__":
try:
crawl()
except KeyboardInterrupt:
logger.error("Stopping (KeyboardInterrupt)")
sys.exit()
except:name = Exception, e:
logger.error("EXCEPTION: %s " % e)
traceback.print_exc()

Reply With Quote
  #2  
Old October 4th, 2012, 09:19 PM
Lux Perpetua Lux Perpetua is offline
Contributing User
Dev Shed Intermediate (1500 - 1999 posts)
 
Join Date: Feb 2004
Location: San Francisco Bay
Posts: 1,936 Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level)Lux Perpetua User rank is General 5th Grade (Above 100000 Reputation Level) 
Time spent in forums: 1 Month 1 Week 2 h 12 m 42 sec
Reputation Power: 1312
Put [code][/code] around your code, please.

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesPython Programming > Debug python program

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap