#1
  1. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    830
    Rep Power
    0

    DOM Parser Stuffs


    PHP folks,

    If you find or know of any DOM parser tutorial links that would be good for newbies, let us all know here.
    When you paste the links, mention whether they're for PHP beginners, intermediates or advanced programmers.
    Anyway, I just came across this one and am reading it.
    I thought I might as well open a thread and mention the link in case it comes in handy for others.

    Top 10 Best Usage Examples of PHP Simple HTML DOM Parser
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    830
    Rep Power
    0
    Folks,

    Check out example number 6 in this tutorial:
    Top 10 Best Usage Examples of PHP Simple HTML DOM Parser

    It says we can change an attribute from one thing to another.
    The code is this:
    PHP Code:
    <?php

    /*
    Adding / changing attributes of the elements
    Let’s say you want to change the value of an attribute of a particular element. For example, if you wished to change all the hyperlinks having class=postlink to class=topiclink, you can do so as follows:
    */

    include('simple_html_dom.php');

    $url = 'https://www.phpbb.com/community/viewtopic.php?f=46&t=543171';

    // Fetch the page and parse it into an in-memory DOM object.
    $html = file_get_html($url);

    // Change the class on the fetched copy only.
    foreach ($html->find('a.postlink') as $a) {
        $a->class = 'topiclink';
    }

    // Print the modified copy.
    echo $html;
    ?>
    How can you change the HTML attributes of a third-party webpage? That is a security issue, is it not?
    Unless, of course, you are saving a copy of the third party's HTML onto your own HDD and changing the attribute in your saved file.
    But I don't see any lines of code that tell the DOM parser to download the page before changing the attribute.
    What am I missing here?
    I remember we had the same thing with Ubot Studio. The bot would fetch the page and display it on your screen with the attribute changed. No downloading was done. Back then, I used to get confused, just like I'm getting confused by this DOM parser thing.
    Any ideas? Sure you do. So, what is the answer? I fetched the page with that piece of code (no page downloading) and checked the source code of the fetched page. I don't see the attribute changed. That's the confusion!
  4. #3
  5. Code Monkey V. 0.9
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2005
    Location
    A Land Down Under
    Posts
    2,411
    Rep Power
    2105
    Originally Posted by UniqueIdeaMan
    How can you change the HTML attributes of a third-party webpage? That is a security issue, is it not?
    You can't. Well, you can, but only on the copy that you have. As for a security risk, it could be. It all depends on what you're changing and how your system and server are set up.

    Originally Posted by UniqueIdeaMan
    Unless, of course, you are saving a copy of the third party's HTML onto your own HDD and changing the attribute in your saved file.
    You don't need to save it to display it. Just download it into memory and run whatever processes you need to on it. That's what this script is doing.

    Originally Posted by UniqueIdeaMan
    But I don't see any lines of code that tell the DOM parser to download the page before changing the attribute.
    PHP Code:
    $html = file_get_html($url);
    Seriously... If you missed that you weren't even looking at anything!!!

    Originally Posted by UniqueIdeaMan
    What am I missing here?
    A basic understanding of how that process is meant to work.
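
    To make the "only on the copy that you have" point concrete, here is a minimal sketch. It is not the tutorial's code: it assumes PHP's built-in DOMDocument instead of Simple HTML DOM, and it uses a hard-coded string ($remoteHtml, a made-up stand-in) in place of a page fetched from a real site, so it runs without network access.

    ```php
    <?php
    // Stand-in for markup fetched from a remote site (an assumption for this demo).
    $remoteHtml = '<p><a class="postlink" href="/a">A</a> <a class="other" href="/b">B</a></p>';

    // Parse a copy of the markup into memory; nothing is written to disk.
    $doc = new DOMDocument();
    $doc->loadHTML($remoteHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Change class="postlink" to class="topiclink" on the in-memory copy only.
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('class') === 'postlink') {
            $a->setAttribute('class', 'topiclink');
        }
    }

    // Serialize the modified copy; $remoteHtml (and the real site) is unchanged.
    $modifiedHtml = $doc->saveHTML();
    echo $modifiedHtml;
    ```

    The original string still says postlink after the loop runs. Only the parsed copy inside the script changes, which is why viewing the source of the real third-party page shows nothing different.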
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    830
    Rep Power
    0
    Originally Posted by Catacaustic
    You can't. Well, you can, but only on the copy that you have. As for a security risk, it could be. It all depends on what you're changing and how your system and server are set up.

    You don't need to save it to display it. Just download it into memory and run whatever processes you need to on it. That's what this script is doing.

    PHP Code:
    $html = file_get_html($url);
    Seriously... If you missed that you weren't even looking at anything!!!

    A basic understanding of how that process is meant to work.
    No. I thought the download went onto my HDD. Now I understand: it was downloaded into my RAM. Right?
  8. #5
  9. Code Monkey V. 0.9
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2005
    Location
    A Land Down Under
    Posts
    2,411
    Rep Power
    2105
    Originally Posted by UniqueIdeaMan
    No. I thought the download went onto my HDD. Now I understand: it was downloaded into my RAM. Right?
    That's right. It's only stored in memory unless you tell your program to save it to a file somewhere.

    And that's another thing that you will need to consider: memory usage. You really don't want to be storing a heap of downloaded pages in memory, as it will eat up your available memory very quickly. If you have to parse and process things, make sure that you do it one page at a time.
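
    A rough sketch of the "one page at a time" advice, under some assumptions: collectTitles() and the $fetch callable are hypothetical names made up for this example, and the built-in DOMDocument is used rather than Simple HTML DOM. Each pass keeps only a small result (the page title) and drops the page itself before the next one is fetched.

    ```php
    <?php
    // Hypothetical helper: handles one page per iteration and keeps only the title.
    function collectTitles(array $urls, callable $fetch): array
    {
        libxml_use_internal_errors(true); // real-world HTML is rarely well formed
        $titles = [];
        foreach ($urls as $url) {
            $html = (string) $fetch($url); // only this page is held in memory
            if ($html === '') {
                $titles[$url] = null;
                continue;
            }
            $doc = new DOMDocument();
            $doc->loadHTML($html);
            $node = $doc->getElementsByTagName('title')->item(0);
            $titles[$url] = $node ? trim($node->textContent) : null;
            // Release the page and its DOM before moving on to the next URL.
            unset($node, $doc, $html);
        }
        return $titles;
    }

    // Usage with a fake fetcher; a real one could wrap file_get_contents() or cURL.
    $fakeFetch = function (string $url): string {
        return '<html><head><title>Page for ' . $url . '</title></head><body></body></html>';
    };
    $titles = collectTitles(['a', 'b'], $fakeFetch);
    print_r($titles);
    ```

    With Simple HTML DOM itself, the library's documentation suggests calling $html->clear() and then unset($html) once you are done with a page, for the same reason.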
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    830
    Rep Power
    0
    Originally Posted by Catacaustic
    That's right. It's only stored in memory unless you tell your program to save it to a file somewhere.

    And that's another thing that you will need to consider: memory usage. You really don't want to be storing a heap of downloaded pages in memory, as it will eat up your available memory very quickly. If you have to parse and process things, make sure that you do it one page at a time.
    Mmm. When we used to build bots (.exe) with Ubot, we could open many threads in the background that opened many windows and downloaded many pages simultaneously. I'm guessing the RAM was enough to open 100 pages in the background. Can't PHP do the same? Open 100 threads and get cURL to load 100 pages in the background, so the user only sees one page loading on his screen while the other 99 are out of sight.
    I ask because, when you submit your URL to my SE (search engine), it will first crawl the link you submitted, and when it finds more links, say 50, I will get it to open 50 threads to load all 50 pages in the background simultaneously and scrape their content (meta tags, links, etc.) for spidering purposes.
    And so, do you mind showing us newbies a code snippet on how to open many threads, and another snippet showing how to load many pages in the background, out of the user's sight, to scrape the pages?
    And finally, another snippet that does both of the things mentioned above?

    Thanks!
    Last edited by UniqueIdeaMan; May 23rd, 2018 at 12:06 PM.
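
    For what it's worth, a stock PHP install does not give you Ubot-style threads, but the cURL extension's curl_multi_* functions can run many transfers concurrently inside one script. The sketch below is only one possible shape for this, not a finished crawler: fetchAll() and scrapePage() are hypothetical names invented here, the timeout is an arbitrary choice, and there is no error handling, politeness delay or robots.txt check.

    ```php
    <?php
    // Hypothetical helper: download several URLs concurrently with curl_multi.
    function fetchAll(array $urls, int $timeoutSeconds = 10): array
    {
        $multi = curl_multi_init();
        $handles = [];
        foreach ($urls as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print it
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
            curl_multi_add_handle($multi, $ch);
            $handles[$url] = $ch;
        }

        // Drive all transfers until none of them is still running.
        do {
            $status = curl_multi_exec($multi, $running);
            if ($running) {
                curl_multi_select($multi, 1.0); // wait for activity instead of spinning
            }
        } while ($running && $status === CURLM_OK);

        $pages = [];
        foreach ($handles as $url => $ch) {
            $pages[$url] = (string) curl_multi_getcontent($ch);
            curl_multi_remove_handle($multi, $ch);
        }
        curl_multi_close($multi);
        return $pages;
    }

    // Hypothetical helper: pull named meta tags and link targets out of one page.
    function scrapePage(string $html): array
    {
        $result = ['meta' => [], 'links' => []];
        if ($html === '') {
            return $result;
        }
        libxml_use_internal_errors(true); // tolerate sloppy real-world markup
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            $name = $meta->getAttribute('name');
            if ($name !== '') {
                $result['meta'][$name] = $meta->getAttribute('content');
            }
        }
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if ($href !== '') {
                $result['links'][] = $href;
            }
        }
        return $result;
    }

    // Combined usage (not run here, because it needs network access):
    // foreach (fetchAll($urls) as $url => $html) {
    //     $data = scrapePage($html);
    // }
    ```

    Note that fetchAll() holds every downloaded page in memory at once, so for 50 or 100 links it is probably safer to feed it the URLs in smaller batches and scrape each batch before fetching the next.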
