#1
  1. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,232
    Rep Power
    593

    XPath and getting the next child node


    This is my first time using XPath to parse a DOM document. I was able to find the tag I want using query. However, I can't find an example that explains how to do what I want next. All the examples simply echo the found node(s). In my case I need to drill down until I find a DIV child to extract some text. How do I use the result from XPath to get subsequent child nodes with DOM? TIA.
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    XPath is to SQL as DOMDocument/SimpleXMLElement are to MySQL. One pair lets you do a query for stuff handled by the second pair.

    What's your code? XPath gives you a list of nodes, so from there you can just use ->childNodes on a node (assuming DOM) to go further. But I bet you can write a better XPath query to get what you want, or at least get a lot closer than you're getting now.
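A minimal sketch of that idea, assuming PHP's DOM extension. The markup is hypothetical, since the thread never shows the real page; only the class name `smallTextList` comes from the discussion.

```php
<?php
// Hypothetical markup standing in for the real page.
$html = '<html><body><ul class="smallTextList">'
      . '<li><a href="#">A</a><div>first</div></li>'
      . '<li><span>no div here</span></li>'
      . '</ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);              // loadHTML() takes markup, not a URL
$xpath = new DOMXPath($dom);

$tags = $xpath->query('//*[@class="smallTextList"]');
$ul = $tags->item(0);               // an ordinary DOM node, or null if nothing matched

$names = [];
if ($ul !== null) {
    // childNodes is a property (no parentheses) holding another DOMNodeList
    foreach ($ul->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {   // skip text and whitespace nodes
            $names[] = $child->nodeName;
        }
    }
}
print_r($names);
```

The nodes in the result are the same kind of objects you would reach by walking the document by hand, so every DOM property and method is available on them.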
  4. #3
  5. No Profile Picture
    Thanks for the reply. I didn't realize it was that simple since there were no examples I could find. I thought XPath returned an object different than that of DOM. This is my code with my presumed next step.
    PHP Code:
    $dom=new DOMDocument();
    $dom->loadHTML("http://somewebsite.com/judges.php?str=".str_replace(" ","+",$_POST['name']."&search=Search"));
    $xpath=new DOMXPath($dom);
    $tags = $xpath->query('//*[@class="smallTextList"]');
    $children=$tags->item(0)->childNodes(); // there is only one tag with a class of this type 
    I don't know whether there is an easier way, since there are no classes or ids on any tags below this point, but any suggestions would be appreciated.
    Last edited by gw1500se; October 6th, 2012 at 06:25 PM.
  6. #4
  7. Come play with me!
    - urlencode() has a really specific use case: putting arbitrary stuff in a URL. Use it instead of str_replace().
    - DOMXPath::query() returns a DOMNodeList which consists of the exact same DOMNodes that were in the original DOMDocument. In fact that list will modify itself automatically if you make changes to the document or nodes.
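To illustrate the first point, here is a rough sketch comparing the two approaches. The sample name is made up; the URL pattern is the one from the code posted earlier in the thread.

```php
<?php
// Hypothetical input; in the thread it comes from $_POST['name'].
$name = 'John Q. Smith & Sons';
$base = 'http://somewebsite.com/judges.php?';

// str_replace() only handles spaces, so the "&" in the name would split the query string.
$naive = $base . 'str=' . str_replace(' ', '+', $name) . '&search=Search';

// urlencode() escapes every character that is unsafe inside a query value.
$safe = $base . 'str=' . urlencode($name) . '&search=Search';

// http_build_query() applies the same encoding to a whole array of parameters.
$built = $base . http_build_query(['str' => $name, 'search' => 'Search']);

echo $naive, "\n", $safe, "\n", $built, "\n";
```

With default PHP settings the last two lines print the same URL, while the first one leaves the `&` unescaped.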

    It would really help to know the HTML you're looking at.
  8. #5
  9. No Profile Picture
    Thanks for the reply. I will take your suggestion and make the change. As for what I am parsing, it is a bit complicated (to me). Somewhere in the document there will be an unordered list with the class 'smallTextList'. Getting that was the easy part with XPath. From there I'm not certain how to approach it. Each 'li' contains various tags in various orders, but some of them will contain a <div>, and that is what I'm after. I need to find those tags and parse out the text.
  10. #6
  11. No Profile Picture
    After messing with this for a while I am missing something that is probably obvious to everyone else. If '$xpath' is a DOMNodeList then $xpath[0].childNodes should point to a DOMNodeList of the child nodes of the first element. However, I get an error saying a DOMNodeList cannot be used as an array. Yet numerous examples use DOMNodeLists in precisely that manner.

    I've tried a couple of different things which is why my initial post tries to use 'item' which is wrong too.

    Perhaps the problem is that I am not understanding what a DOMNodeList is supposed to look like. If a var_dump($tags) displays this:

    object(DOMNodeList)#3 (0) { }

    That appears to me to be an empty list, which means the problem is the query, which I thought was working. I do not see how the query could fail to find the class, since it is there, unless my pattern structure is wrong.
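One possible explanation, offered as a sketch rather than a diagnosis: the code posted earlier passes a URL to `loadHTML()`, which expects a string of markup, so the document being queried may never have contained the page at all. The example below reproduces that situation with a made-up search URL and inline stand-in markup, and checks the result with `length` and `item()` instead of array syntax or `var_dump()`.

```php
<?php
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them

// Hypothetical search URL in the same shape as the one in the thread.
$url = 'http://somewebsite.com/judges.php?str=Smith&search=Search';

// loadHTML() expects markup. Given a URL, it parses the URL text itself,
// so the resulting document has no element with the class we want.
$wrong = new DOMDocument();
$wrong->loadHTML($url);
$wrongXpath = new DOMXPath($wrong);
$emptyList = $wrongXpath->query('//*[@class="smallTextList"]');

// To fetch a page, loadHTMLFile($url) is the call that takes a file name or URL
// (not run here because it needs network access). Inline markup stands in for it.
$right = new DOMDocument();
$right->loadHTML('<ul class="smallTextList"><li><div>text</div></li></ul>');
$rightXpath = new DOMXPath($right);
$tags = $rightXpath->query('//*[@class="smallTextList"]');

// length says whether the query matched; item(0) returns the first node.
echo $emptyList->length, "\n";
echo $tags->length, "\n";
echo $tags->item(0)->nodeName, "\n";
```

Checking `$tags->length` before calling `item(0)` avoids calling a property on null when nothing matched.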
    Last edited by gw1500se; October 7th, 2012 at 03:56 PM.
  12. #7
  13. No Profile Picture
    I changed my query to try to find a number of different things, and my earlier statement that finding the 'ul' was the easy part is wrong. My query never finds anything no matter what I look for, so I need to back up and figure out what is wrong with my query.
  14. #8
  15. No Profile Picture
    This was supposed to be an exercise for me to learn DOM and XPath but was taking too much time with little progress. However, what I did learn was how Simple HTML DOM Parser got its name.
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    4
    Rep Power
    0
    Originally Posted by gw1500se
    This was supposed to be an exercise for me to learn DOM and XPath but was taking too much time with little progress. However, what I did learn was how Simple HTML DOM Parser got its name.
    Have you tried:

    Code:
    //ul[@class='smallTextList']/li/div
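A sketch of how a query along those lines could be used from PHP, assuming the `<div>` elements are direct children of each `<li>`. The markup and the judge names are invented for illustration. Note that a path starting with a single `/ul` only matches when `ul` is the root element, so for an HTML page the path generally needs to start with `//`.

```php
<?php
// Hypothetical markup; the real page is never shown in the thread.
$html = '<html><body><div>outside the list</div><ul class="smallTextList">'
      . '<li><a href="#">link</a><div> Judge One </div></li>'
      . '<li><span>no div</span></li>'
      . '<li><b>bold</b><div>Judge Two</div></li>'
      . '</ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Let XPath do the drilling: every div that is a child of an li in the list.
// Use li//div instead if the divs may be nested deeper inside each li.
$texts = [];
foreach ($xpath->query('//ul[@class="smallTextList"]/li/div') as $div) {
    $texts[] = trim($div->textContent);
}
print_r($texts);

// The same search can also start from a node you already hold,
// by passing it as the context node (second argument of query()).
$ul = $xpath->query('//ul[@class="smallTextList"]')->item(0);
$viaContext = $xpath->query('li/div', $ul)->length;
echo $viaContext, "\n";
```

The context-node form is handy when the first query has already located the list and only the inner part of the path is still unknown.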
