#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2017
    Posts
    25
    Rep Power
    0

    fetching data with PHP Simple HTML DOM Parser


    Good day, dear experts,


    I need to fetch the data from this page:

    Web Filter

    First I view the page source to find the HTML elements: view-source:https://europa.eu/youth/volunteering...rganisation_en

    Note: I need to fetch the data that comes right below this line:


    <h3>EVS accredited organisations search results: <span class="ey_badge">6066</span></h3> </div>


    I have several options. One is to do this with PHP Simple HTML DOM Parser (cf. PHP Simple HTML DOM Parser: Manual). This way I need to create an HTML DOM object.

    BTW, there are other options, such as a special function, pc_link_extractor, which extracts all the links:

    Code:
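    function pc_link_extractor($s) {
        $a = array();
        // The pattern lost characters when it was posted; this is a reconstruction
        // that captures the href value and the link text of each <a> element.
        $pattern = '/<a\s+.*?href=["\']?([^"\' >]*)["\']?[^>]*>(.*?)<\/a>/i';
        if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
            foreach ($matches as $match) {
                array_push($a, array($match[1], $match[2]));
            }
        }
        return $a;
    }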

    Or I could do it with preg_match_all; see for example:

    Code:
    preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
        "<b>example: </b><div align=\"left\">this is a test</div>",
        $out,
        PREG_PATTERN_ORDER);

    Here is the dataset I am interested in, taken from the site: Web Filter
    Code:
      <div class="view-content">
        
    <div id="views-bootstrap-grid-1" class="views-bootstrap-grid-plugin-style">
                <div class="row is-flex">
                      <div class="col-md-4">
                <div class="vp ey_block block-is-flex">
      <div class="ey_inner_block">
        <h4 class="text-center"><a href="/youth/volunteering/organisation/948417016_en" target="_blank">&quot;Academy for Peace and Development&quot; Union</a></h4>
              <div class="org_cord"><strong>Topics: </strong>Access for disadvantaged; Youth (Participation, Youth Work, Youth Policy); Intercultural/intergenerational education and (lifelong)learning</div>
                <p class="ey_info">
        <i class="fa fa-location-arrow fa-lg"></i>
        Tbilisi, <strong>Georgia</strong>
    </p>    <p class="ey_info"><i class="fa fa-hand-o-right fa-lg"></i> Receiving, Sending</p>
              <p class="ey_info"><i class="fa fa-external-link fa-lg"></i><span> <a href="http://www.apd.ge" target="_blank">www.apd.ge</a></span></p>
                      <p><strong>PIC no:</strong> 948417016</p>
            <div class="empty-block">
          <a href="/youth/volunteering/organisation/948417016_en" target="_blank" class="ey_btn btn btn-default pull-right">Read more</a>    </div>
      </div>
    </div>
              </div>
                      <div class="col-md-4">

    Note: there are hundreds of pages [see the pagination below].

    Well, you see that we have some options here.

    Which way should I go? Which way would you go?

    I would love to hear from you.

    Greetings
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2017
    Location
    Lithuania
    Posts
    48
    Rep Power
    46
    I always use HTML DOM Parser for all my data parsing needs because it's so much easier to use than writing your own preg_match rules.
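
    To illustrate the point, here is a minimal sketch of link extraction done with a DOM parser instead of a regular expression. It uses PHP's bundled DOMDocument rather than Simple HTML DOM Parser, purely so that it runs without a third-party file. The helper name extract_links() and the sample markup are made up for this example.

```php
<?php
// Sketch only: extract_links() is a hypothetical helper, not part of any library.
// It relies on PHP's bundled DOM extension instead of Simple HTML DOM Parser.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Same shape as pc_link_extractor returns: [href, link text]
        $links[] = [$a->getAttribute('href'), trim($a->textContent)];
    }
    return $links;
}

// Illustrative markup modelled on the links in the posted sample.
$sample = '<p><a href="http://www.apd.ge" target="_blank">www.apd.ge</a> '
        . '<a class="btn" href="/youth/volunteering/organisation/948417016_en">Read more</a></p>';

$links = extract_links($sample);
foreach ($links as [$href, $text]) {
    echo $href, ' => ', $text, "\n";
}
```

    Attribute order, quoting style and extra attributes do not matter here, which is exactly what is hard to get right in a hand-written preg_match rule.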
    Do you license and update your PHP scripts?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2017
    Posts
    25
    Rep Power
    0
    Hello dear phpmillion,

    Many thanks for the quick answer. Your answer is very convincing.

    Originally Posted by phpmillion
    I always use HTML DOM Parser for all my data parsing needs because it's so much easier to use than writing your own preg_match rules.

    BTW: what about XPath? (See PHP: SimpleXMLElement::xpath - Manual.)

    Code:
    <?php
    $string = <<<XML
    <a>
     <b>
      <c>text</c>
      <c>stuff</c>
     </b>
     <d>
      <c>code</c>
     </d>
    </a>
    XML;
    
    $xml = new SimpleXMLElement($string);
    
    /* Search for <a><b><c> */
    $result = $xml->xpath('/a/b/c');
    
    // foreach replaces the each() loop, which was removed in PHP 8
    foreach ($result as $node) {
        echo '/a/b/c: ',$node,"\n";
    }
    
    /* Relative paths work as well ... */
    $result = $xml->xpath('b/c');
    
    foreach ($result as $node) {
        echo 'b/c: ',$node,"\n";
    }
    ?>


    For the DOM parser I have found a great source: PHP Simple HTML DOM Parser: Manual

    It provides a quick intro to all of the following topics:

    Get HTML elements
    Modify HTML elements
    Extract contents from HTML
    Scraping Slashdot!


    Again, dear phpmillion, many thanks for the post!
  6. #4
  7. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2017
    Location
    Lithuania
    Posts
    48
    Rep Power
    46
    PHP SimpleXMLElement is for XML data (as its name suggests). Hence, it shouldn't be used for HTML data.
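
    XPath itself can still be used on HTML, though, through PHP's bundled DOMDocument and DOMXPath classes. The following is only a rough sketch under a few assumptions: the markup is trimmed from the sample posted in the first message, the variable names are illustrative, and the class names on the live page may differ from what is shown here.

```php
<?php
// Sketch only: markup trimmed from the sample in post #1; names are illustrative.
$html = <<<'HTML'
<h3>EVS accredited organisations search results: <span class="ey_badge">6066</span></h3>
<div class="ey_inner_block">
  <h4 class="text-center"><a href="/youth/volunteering/organisation/948417016_en">&quot;Academy for Peace and Development&quot; Union</a></h4>
  <p class="ey_info"><i class="fa fa-location-arrow fa-lg"></i> Tbilisi, <strong>Georgia</strong></p>
  <p><strong>PIC no:</strong> 948417016</p>
</div>
HTML;

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Total number of results, taken from the badge in the heading.
$total = (int) $xpath->evaluate('string(//span[@class="ey_badge"])');

$orgs = [];
foreach ($xpath->query('//div[@class="ey_inner_block"]') as $block) {
    // Each expression is evaluated relative to the current organisation block.
    $orgs[] = [
        'name'    => trim($xpath->evaluate('string(.//h4/a)', $block)),
        'url'     => $xpath->evaluate('string(.//h4/a/@href)', $block),
        'country' => trim($xpath->evaluate('string(.//p[@class="ey_info"]/strong)', $block)),
        'pic'     => trim($xpath->evaluate('string(.//p[strong[contains(., "PIC no")]]/text())', $block)),
    ];
}

print_r(['total' => $total, 'orgs' => $orgs]);
```

    Unlike SimpleXMLElement, loadHTML() does not require well-formed XML, so unclosed tags in the page source do not stop the parse.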
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2017
    Posts
    25
    Rep Power
    0
    Good evening, dear phpmillion,


    Again, many thanks for the quick reply. You encouraged me to dig deeper and to look for some manual pages and examples.
    I found some very nice resources that I want to share with anyone who might need help in this area.



    See the two resources:

    PHP Simple HTML DOM Parser

    Top 10 Best Usage Examples of PHP Simple HTML DOM Parser

    nimishprabhu has made a list of code snippets that he uses from time to time and that can come in handy for us all.
    It helped me understand the usage of Simple HTML DOM Parser and get ready-made PHP code for it.


    Downloading and storing structured data
    Data can be obtained from mainly three different sources: URL, static file or HTML string. Use the following code to create a DOM from three different alternatives.
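
    This is not the article's code. As a rough sketch of the same idea, the three sources can be handled with PHP's bundled DOMDocument; Simple HTML DOM Parser has its own file_get_html() and str_get_html() functions for this. The helper name load_dom() and the example.com address are hypothetical, and loading a URL assumes PHP's stream wrappers allow remote files.

```php
<?php
// Sketch only: load_dom() is a hypothetical helper built on PHP's bundled
// DOMDocument, not the Simple HTML DOM Parser functions the article describes.
function load_dom(string $source, bool $isMarkup = false): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    if ($isMarkup) {
        $doc->loadHTML($source);      // HTML string
    } else {
        $doc->loadHTMLFile($source);  // file path, or a URL if remote streams are allowed
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// 1. From an HTML string
$fromString = load_dom('<html><body><h1>Hello</h1></body></html>', true);

// 2. From a static file (a temporary file stands in for a saved page)
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><body><h1>From file</h1></body></html>');
$fromFile = load_dom($path);
unlink($path);

// 3. From a URL (placeholder address, commented out so the sketch runs offline)
// $fromUrl = load_dom('https://example.com/');

echo $fromString->getElementsByTagName('h1')->item(0)->textContent, "\n";
echo $fromFile->getElementsByTagName('h1')->item(0)->textContent, "\n";
```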

    The author offers very useful and handy tools on his page:
    Top 10 Best Usage Examples of PHP Simple HTML DOM Parser


    I do not want to infringe any rights or board rules here, so I am not posting any of his examples in this thread.


    @phpmillion: many thanks for your continued help.

    Greetings
