Thread: Python MySQLdb

    #1

    Hi.

    I have a MySQL database set up on my local machine called urls, which I can connect to with no problem.
    The database has a table called links.

    In that table are two columns:

    one is an auto-increment primary key ID,
    the other is called url and is a varchar (rough sketch below).
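
    Roughly this shape, if it helps -- the id column name and the varchar length below are my assumptions, not the exact definitions:
    Code:
    import MySQLdb

    con = MySQLdb.connect('localhost', 'user', 'pass', 'urls')
    cur = con.cursor()
    # rough sketch of the schema described above -- the id column name
    # and varchar length are assumptions, not the exact definitions
    cur.execute("""
        CREATE TABLE IF NOT EXISTS links (
            id  INT AUTO_INCREMENT PRIMARY KEY,
            url VARCHAR(255)
        )
    """)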

    My little script runs as if it's adding information to the database, but when I open phpMyAdmin to browse the rows, the table comes back empty.

    Nothing is written to the database.

    I'd be grateful if someone could check my code for me and spot my mistake.
    Code:
    import sys
    import re
    import urllib2
    import urlparse
    import MySQLdb


    """ The beginning of the DB setup... """
    con = None

    try:
        con = MySQLdb.connect('localhost', 'user', 'pass', 'urls')
        print "Connected"
        cur = con.cursor()
        cur.execute("SELECT VERSION()")

        data = cur.fetchone()

        print "Database version : %s " % data

    except MySQLdb.Error, e:
        print "Error %d: %s" % (e.args[0], e.args[1])
        sys.exit(1)


    tocrawl = set(["http://dmoz.org"])
    crawled = set([])
    keywordregex = re.compile('<meta\sname=["\']keywords["\']\scontent=["\'](.*?)["\']\s/>')
    linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')

    while 1:
        try:
            # pop an arbitrary URL from the set of pages still to visit
            crawling = tocrawl.pop()
            print crawling

            cur.execute("""INSERT INTO links(url) VALUES('crawling')""")
            con.commit()

        except KeyError:
            raise StopIteration
        url = urlparse.urlparse(crawling)
        try:
            response = urllib2.urlopen(crawling)
        except:
            continue
        msg = response.read()
        # pull the page title out of the raw HTML
        startPos = msg.find('<title>')
        if startPos != -1:
            endPos = msg.find('</title>', startPos + 7)
            if endPos != -1:
                title = msg[startPos + 7:endPos]
                print title
        keywordlist = keywordregex.findall(msg)
        if len(keywordlist) > 0:
            keywordlist = keywordlist[0]
            keywordlist = keywordlist.split(", ")
            print keywordlist
        links = linkregex.findall(msg)
        crawled.add(crawling)
        # resolve relative links against the current page before queueing
        for link in (links.pop(0) for _ in xrange(len(links))):
            if link.startswith('/'):
                link = 'http://' + url[1] + link
            elif link.startswith('#'):
                link = 'http://' + url[1] + url[2] + link
            elif not link.startswith('http'):
                link = 'http://' + url[1] + '/' + link
            if link not in crawled:
                tocrawl.add(link)
    I've read that I need to run cur.commit after my insert statement, but I get a traceback like this...

    Traceback (most recent call last):
      File "C:\Python27\spider.py", line 46, in <module>
        cur.commit
    AttributeError: 'Cursor' object has no attribute 'commit'

    EDIT** I made a mistake in my code. I was trying to run commit on the cursor rather than the connection.
    I've edited the code but it still doesn't work.
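
    For reference, a minimal sketch of the MySQLdb insert-and-commit pattern, assuming the same urls database and links table as above. Note the value is handed to execute() as a bound %s parameter; writing VALUES('crawling') inside the SQL string inserts the literal text 'crawling' rather than the variable:
    Code:
    import MySQLdb

    con = MySQLdb.connect('localhost', 'user', 'pass', 'urls')
    cur = con.cursor()

    crawling = "http://dmoz.org"
    # bind the Python variable as a parameter instead of quoting
    # its name inside the SQL string
    cur.execute("INSERT INTO links(url) VALUES(%s)", (crawling,))

    con.commit()  # commit belongs on the connection, not the cursor
    con.close()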
    TIA
    Last edited by sheffieldlad; February 8th, 2013 at 10:20 AM. Reason: spotted mistake in my code
    #2

    Get rid of the pain: download the Django framework for Python, then download or create a standalone Django-ORM application. Now you can use Django's queryset language instead of horrid SQL statements.
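
    For example, a rough sketch of what that looks like, assuming a standalone project whose settings already point DATABASES at the urls database -- the model below is a guess at a mapping for the links table, not a drop-in config:
    Code:
    from django.db import models

    # assumed mapping onto the existing links table
    class Link(models.Model):
        url = models.URLField(max_length=255)

        class Meta:
            app_label = 'links'
            db_table = 'links'  # reuse the existing table name

    # inserts and queries go through querysets instead of raw SQL
    Link.objects.create(url='http://dmoz.org')
    for link in Link.objects.filter(url__startswith='http://dmoz'):
        print link.url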
