#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    4
    Rep Power
    0

    Python + BSoup AttributeError 'NavigableString' object has no attribute 'has_attr'


    So I am trying to parse data from the 'onclick' attribute of an 'a' tag. The data is held inside a 'div' tag with
    'id=accordion'. There are two 'a' tags inside this 'div' container, but I am only trying to access the first one.

    The two 'a' tags are:

    PHP Code:
    <a onclick="getProductsBasedOnCategoryID('Asus','AC Adaptor','ET10B','6941','E Series')">
        
    <a onclick="getProductInformationModal('Asus','04G265003580')">
    Basically I am having two problems, which both relate to the same issue.

    When I test the first part of the code the following way, it works fine, however...:

    PHP Code:
    datas = s.find(id='accordion')
    a = datas.findAll('a')[0]
    print a

    ...This only prints the first category, and I want all the categories to be printed, e.g. AC Adaptor, Bracket, Cable, Camera,
    HDD, etc. The only problem is that it doesn't work with the following for loop:

    PHP Code:
    for data in a:
        if(data.has_attr('onclick')):
            arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
            model_info.append(arguments)
            print arguments
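    As a side note, the literal_eval line is easier to follow when it is run on a single onclick string by itself. A minimal sketch, written for Python 3 (the thread uses Python 2); the helper name parse_onclick is made up for illustration:

```python
from ast import literal_eval


def parse_onclick(onclick):
    """Turn "fn('a','b', this, 'c')" into the tuple ('a', 'b', 'c')."""
    # Drop the bare JavaScript name `this`, which is not a Python literal.
    cleaned = onclick.replace(', this', '')
    # Keep everything after the first '(' and put the parenthesis back,
    # leaving a string that looks like a Python tuple literal.
    args = '(' + cleaned.split('(', 1)[1]
    return literal_eval(args)


onclick = "getProductsBasedOnCategoryID('Asus','AC Adapter','ET10B','6941', this, 'E Series')"
print(parse_onclick(onclick))
```

    The ', this' replacement matters because literal_eval only accepts Python literals, so a bare name like this would be rejected.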
    I get the following error:

    PHP Code:
     line 80, in <module>
        if(data.has_attr('onclick')):
      File "C:\Python27\lib\site-packages\bs4\element.py", line 667, in __getattr__
        self.__class__.__name__, attr))
    AttributeError: 'NavigableString' object has no attribute 'has_attr'
    However, if I take the 'index' out, it works with the for loop but prints out BOTH the 'a' tags, as there is no 'index' of 0.
    But when there is an 'index' of 1, it DOES print out the SECOND 'a' tag. So I don't get why only the second 'index' works but
    not the first. And how can I solve this issue so that it prints out the first 'index' with ALL categories?
    This is the desired output I want:

    Code:
        Asus	AC Adapter	ET10B	6941	E Series		
        Asus	Bracket		ET10B	7138	E Series
        Asus	Cable		ET10B	6983	E Series
        Asus	Camera		ET10B	6985	E Series
        Asus	Cooling		ET10B	6999	E Series
        Asus	Cover		ET10B	6984	E Series
    Any help would be appreciated, thanks.
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,996
    Rep Power
    481
    You worry about "index" but the error fingers "has_attr".

    hasattr is the usual Python built-in function; it works by calling
    getattr() and checking whether AttributeError is raised.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    4
    Rep Power
    0
    I changed it to 'hasattr' but it is still showing the same error. I am not sure where to go from there.
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,996
    Rep Power
    481
    Your beautiful soup trouble might interest someone who can help you if you posted enough code to demonstrate the problem. You showed some input---that's better than many who ask questions.

    Input:
    <a onclick="getProductsBasedOnCategoryID('Asus','AC Adaptor','ET10B','6941','E Series')">

    <a onclick="getProductInformationModal('Asus','04G265003580')">


    Output:
    Code:
        Asus	AC Adapter	ET10B	6941	E Series		
        Asus	Bracket		ET10B	7138	E Series
        Asus	Cable		ET10B	6983	E Series
        Asus	Camera		ET10B	6985	E Series
        Asus	Cooling		ET10B	6999	E Series
        Asus	Cover		ET10B	6984	E Series
    Originally Posted by littlea5ma
    However, if i take the 'index' out,
    I searched this thread web page for "index" finding the first occurrence of "index" in that quote, and all occurrences within that paragraph.

    Provide reasonable information to increase the chance you'll get a useful answer.
    Last edited by b49P23TIvg; April 25th, 2013 at 11:42 AM.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    4
    Rep Power
    0
    Would it help if I posted the entire code I have? If so...


    PHP Code:
    import string, urllib2, urlparse, csv, sys, codecs, cStringIO, glob, re, os
    from urllib import quote
    from urlparse import urljoin
    from bs4 import BeautifulSoup
    from ast import literal_eval

    class UnicodeWriter:
        """
        A CSV writer which will write rows to CSV file "f",
        which is encoded in the given encoding.
        """

        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            # Redirect output to a queue
            self.queue = cStringIO.StringIO()
            self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()

        def writerow(self, row):
            self.writer.writerow([s.encode("utf-8") for s in row])
            # Fetch UTF-8 output from the queue ...
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            # ... and reencode it into the target encoding
            data = self.encoder.encode(data)
            # write to the target stream
            self.stream.write(data)
            # empty queue
            self.queue.truncate(0)

        def writerows(self, rows):
            for row in rows:
                self.writerow(row)

    changable_url = 'asusparts.eu/partfinder/Asus/All In One/E Series'
    page = urllib2.urlopen(changable_url)
    base_url = 'asusparts.eu'
    soup = BeautifulSoup(page)

    selects = []
    redirects = []
    model_info = []

    # Opening CSV writer
    c = UnicodeWriter(open(r"asus_stock.csv", "wb"))

    print "FETCHING OPTIONS"
    select = soup.find(id='myselectListModel')
    selects.append(select)

    for item in selects:
        print item.get_text()

    options = select.findAll('option')

    for option in options:
        if(option.has_attr('redirectvalue')):
            redirects.append(option['redirectvalue'])

    for r in redirects:
        rpage = urllib2.urlopen(urljoin(base_url, quote(r)))
        s = BeautifulSoup(rpage)

        # Fetching the main title for each specific model and printing it out
        print "FETCHING MAIN TITLE"
        maintitle = s.find(id='puffBreadCrumbs')
        print maintitle.get_text()

        datas = s.find(id='accordion')

        a = datas.findAll('a')[1]
        print a

        content = datas.findAll('span')

        print "FETCHING CATEGORY"
        for data in a:
            if(data.has_attr('onclick')):
                arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
                model_info.append(arguments)
                print arguments

        # Fetching name of product
        print "\n"
        print "FETCHING PRODUCT NAME"
        name = s.find('b').get_text()
        print "Product Name: " + name

        # Fetching datatable which contains all information about product
        table = s.find(class_='ProduktLista')

        # Fetching part-number for product
        parttable = datas.findAll('td')[1]

        print "\n"
        print "FETCHING PART NUMBER"
        partnum = parttable.findAll('span')[1]
        print partnum.get_text()

        # Fetching price with VAT
        pricetable = datas.findAll('td')[2]

        print "FETCHING PRICE (inc. VAT)"
        incprice = pricetable.findAll('span')[0]
        print "Price: " + incprice.get_text()

        # Fetching price without VAT
        print "FETCHING PRICE (ex. VAT)"
        exprice = pricetable.findAll('span')[1]
        print "Price: " + exprice.get_text()

        # Fetching images
        print "\n"
        print "FETCHING IMAGES"
        img = s.find('td')

        images = img.findAll('img')
        print images
        print "\n"

    c.writerows(model_info)
    Also, because I am a new user, the forum wouldn't allow me to post the whole URL, so I had to take the http out.

    Here is a snippet of where I'm parsing the data from:

    PHP Code:
    <div id="accordion" class="ui-accordion ui-widget ui-helper-reset ui-accordion-icons" style="width: 760px;" role="tablist">
        <h3 class="ui-accordion-header ui-helper-reset ui-state-active ui-corner-top" role="tab" aria-expanded="true" aria-selected="true" tabindex="0">
            <span class="ui-icon ui-icon-triangle-1-s"></span>
            <a onclick="getProductsBasedOnCategoryID('Asus','AC Adapter','ET10B','6941', this, 'E Series')" href="#AC Adapter" tabindex="-1" loaded="Loaded">AC Adapter </a>
        </h3>
        <div id="6941" class="ui-accordion-content ui-helper-reset ui-widget-content ui-corner-bottom ui-accordion-content-active" role="tabpanel" style="display: block;">
            <table class="productTableList">
                <tbody>
            </table>
            <table class="productTableList">
                <tbody>
                    <tr style="height:90px;background-color:#ebf4ff;">
                        <td class="ProduktLista" width="70px">
                        <td class="ProduktLista" width="315">
                            <a onclick="getProductInformationModal("Asus","14G110008340");">
                            <br>
    The first 'a' tag is the one I want.
  10. #6
  11. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,996
    Rep Power
    481
    Running your program gave me
    1) indentation error because of the space character you thoughtfully included at the start of your code post.
    2) fixing that, I get unknown url type error.
    3) fixing that, with
    Code:
    changable_url = 'http://www.asusparts.eu/partfinder/Asus/All In One/E Series'
    gives me urllib2.HTTPError: HTTP Error 400: Bad Request
    4) fixing that, I replace changable_url and page with
    Code:
     
    #changable_url = 'http://www.asusparts.eu/partfinder/Asus/All In One/E Series'
    #page = urllib2.urlopen(changable_url)
    base_url = 'asusparts.eu'
    import io
    page = io.StringIO(u'''<div id="accordion" class="ui-accordion ui-widget ui-helper-reset ui-accordion-icons" style="width: 760px;" role="tablist"> 
        <h3 class="ui-accordion-header ui-helper-reset ui-state-active ui-corner-top" role="tab" aria-expanded="true" aria-selected="true" tabindex="0"> 
            <span class="ui-icon ui-icon-triangle-1-s"></span> 
            <a onclick="getProductsBasedOnCategoryID('Asus','AC Adapter','ET10B','6941', this, 'E Series')" href="#AC Adapter" tabindex="-1" loaded="Loaded">AC Adapter </a> 
        </h3> 
        <div id="6941" class="ui-accordion-content ui-helper-reset ui-widget-content ui-corner-bottom ui-accordion-content-active" role="tabpanel" style="display: block;"> 
            <table class="productTableList"> 
                <tbody> 
            </table> 
            <table class="productTableList"> 
                <tbody> 
                    <tr style="height:90px;background-color:#ebf4ff;"> 
                        <td class="ProduktLista" width="70px"> 
                        <td class="ProduktLista" width="315"> 
                            <a onclick="getProductInformationModal("Asus","14G110008340");"> 
                            <br>  
    ''')
    page.seek(0)
    gives
    Code:
    FETCHING OPTIONS
    Traceback (most recent call last):
      File "s.py", line 72, in <module>
        print item.get_text()
    AttributeError: 'NoneType' object has no attribute 'get_text'
    5) Next I search for 'myselectListModel' within this program, and there's only one occurrence. I classify the NoneType message as "expected".


    If you were me, would you give up?
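
    A note on steps 2 and 3 above: the "unknown url type" error comes from the missing http:// scheme, and the 400 may come from the unencoded spaces in the path (that second part is a guess, nothing in this thread confirms it). A minimal Python 3 sketch of building the URL; in Python 2 the same functions are urllib.quote and urlparse.urljoin:

```python
from urllib.parse import quote, urljoin

base_url = 'http://www.asusparts.eu'           # scheme included, unlike the posted code
path = '/partfinder/Asus/All In One/E Series'  # raw path, spaces and all

# quote() percent-encodes the spaces but leaves '/' alone by default.
safe_path = quote(path)
full_url = urljoin(base_url, safe_path)
print(full_url)
```

    Nothing is fetched here, so whether the server accepts the encoded URL still has to be checked against the live site.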
  12. #7
  13. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,996
    Rep Power
    481
    Unfortunately, I'm tenacious.
    If I substitute the actual page source at 'http://www.asusparts.eu/partfinder/Asus/All In One/E Series' for the junk you passed off,
    and then I fix
    base_url = 'http://asusparts.eu'
    your program does run and reproduces the error.
    I then change the code to show what's happening:
    Code:
        print "FETCHING CATEGORY"
        for data in a:
            print('--->.{}'.format(data))##############this line is new
            if(data.has_attr('onclick')):
                arguments = literal_eval('(' + data['onclick'].replace(', this', '').split('(', 1)[1])
                model_info.append(arguments)
                print arguments
    and then pipe the output through
    gawk 'a{print;a=0}/FETCHING CATEGORY/{a=1}'
    We find
    Code:
    --->.<span><b>POWER ADAPTER 65W19V 3PIN</b>
    --->.<span><b>POWER CORD 3P L:80CM,UK(B)</b>
    --->.<span><b>POWER CORD 3P L:150CM,US(B)</b>
    --->.<span><b>POWER ADAPTER 65W19V 3PIN</b>
    --->.<span><b>POWER ADAPTER 65W19V 3PIN</b>
    --->.<span><b>ADAPTER 40W/19V</b>
    --->.<span><b>POWER CORD 2P L:150CM,US(B)</b>
    --->.<span><b>POWER ADAPTER 65W19V 3PIN BLACK</b>
    --->.<span><b>POWER ADAPTER 65W19V 3PIN</b>
    --->.<span><b>POWER CORD 3P L:80CM,AU(B)</b>
    --->.<span><b>POWER ADAPTER 65W19V 3PIN</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-65W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-65W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-65W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK PCA60 AC ADAPTER-65W-MEX B</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK PCA60 AC ADAPTER-65W-MEX B</b>
    --->.<span><b>TK NCL30 AC ADAPTER-90W</b>
    --->.<span><b>TK PCA60 AC ADAPTER-65W-MEX B</b>
    --->.<span><b>TK PCA60 AC ADAPTER-65W-MEX B</b>
    --->.<span><b>TK NCL30 AC ADAPTER-120W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-120W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-120W</b>
    --->.<span><b>TK NCL30 AC ADAPTER-120W</b>
    --->.Accessory
    One of these is not the same as the others. Can you tell which one? One of them caused the error. Was it one of the ones in the middle? The first? NO! It was the last.

    See if the information on this web page helps:
    http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#Searching%20the%20Parse%20Tree

    Otherwise, you could catch this sort of problem with a try: except: block, or use other tests.
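
    One of those "other tests" could be an isinstance check. A minimal sketch, assuming bs4 is installed and using a made-up fragment shaped like the output above (a couple of tags followed by a bare string). Written for Python 3:

```python
from bs4 import BeautifulSoup, Tag

# Made-up fragment: an <a> whose children are two <span> tags and one bare string.
html = ('<a href="#"><span><b>POWER ADAPTER 65W19V 3PIN</b></span>'
        '<span><b>ADAPTER 40W/19V</b></span>Accessory</a>')
a = BeautifulSoup(html, "html.parser").a

tags = []
strings = []
for child in a:
    # Iterating over a Tag yields its direct children. Those can be text
    # nodes (NavigableString) as well as tags, and only tags have has_attr().
    if isinstance(child, Tag):
        tags.append(child.name)
    else:
        strings.append(str(child))

print(tags)
print(strings)
```

    The point is that "for data in a" walks the children of one tag; it does not walk a list of 'a' tags, so a text node such as "Accessory" can turn up in the loop.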
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    4
    Rep Power
    0
    Oh, I have felt like giving up a million times lol. I am a newbie to Python programming so it is taking me a while to get used to it. I will have to read through your posts more thoroughly to get an understanding, but I have changed my for loop to this:

    PHP Code:
        print "FETCHING CATEGORY"
        atag = s.h3
        for data in atag:
            while getattr(atag, 'name', None) != 'h3':
                atag = atag.nextSibling
            atag.a
            print atag
    Which prints out this:

    Code:
    <h3><a href="#AC Adapter" onclick="getProductsBasedOnCategoryID('Asus','AC Adapter','ET1611PUK','6941', this, 'E Series')">AC Adapter
    
                </a></h3>
    <h3><a href="#AC Adapter" onclick="getProductsBasedOnCategoryID('Asus','AC Adapter','ET1611PUT','6941', this, 'E Series')">AC Adapter
    
                </a></h3>
    An improvement, but it still prints out only one category, which is the AC Adapter..?

    I am guessing that if I grab the ['onclick'] attribute it might print out all categories? But how I will accomplish that is a puzzle to me.
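
    One possible way to grab every category's onclick is sketched below. It assumes bs4 is installed and that every category header is already present in the fetched HTML; the sample markup is invented (only its shape follows the snippet posted earlier), and it is written for Python 3:

```python
from ast import literal_eval
from bs4 import BeautifulSoup

# Invented sample: two category headers plus one per-product link.
html = """
<div id="accordion">
  <h3><a href="#AC Adapter" onclick="getProductsBasedOnCategoryID('Asus','AC Adapter','ET10B','6941', this, 'E Series')">AC Adapter</a></h3>
  <div><a onclick="getProductInformationModal('Asus','04G265003580')">part</a></div>
  <h3><a href="#Bracket" onclick="getProductsBasedOnCategoryID('Asus','Bracket','ET10B','7138', this, 'E Series')">Bracket</a></h3>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
accordion = soup.find(id="accordion")

rows = []
# onclick=True matches only <a> tags that carry an onclick attribute.
for link in accordion.find_all("a", onclick=True):
    onclick = link["onclick"]
    if not onclick.startswith("getProductsBasedOnCategoryID"):
        continue  # skip the per-product links
    args = "(" + onclick.replace(", this", "").split("(", 1)[1]
    rows.append(literal_eval(args))

for row in rows:
    print("\t".join(row))
```

    Because find_all returns only tags, the loop never sees a NavigableString, so has_attr is not needed at all. If the live page fills in the other categories with JavaScript after loading, they will not be in the HTML that urllib2 fetches and this will still show only what is there.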
