#1
  1. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    663
    Rep Power
    0

    Question: Best Way to Save CPU Usage When Looking for Banned Words?


    PHP Experts & Gurus,

    I need to add a filter to my web proxy (the Mini Proxy GPL script) so that when a user tries to load a page containing a banned word, the page does not load. Instead, the user gets an error alert stating which banned word was found, and the script then calls exit().
    Please have a look at the following 2 scripts. What do you think?
    In the end I favoured the 2nd one over the 1st. What is your opinion of my choice?
    The major requirement is that my webhost's CPU is used as little as possible. I was told that if I explode() the proxied page into words and have the filter check each word, one by one, against a list of banned words, it would take too long on big pages (pages with lots of text). The CPU would be used too much, and my site would slow down every time this filtering runs on a page my web proxy fetches. Plus, when a lot of users use my web proxy simultaneously, it would be the end of my venture. In other words, everyone disliked my 1st script.
    Anyway, if you agree with what I've been told but reckon my 2nd script is no good and does not follow that advice (I reckon it does, though), how should I proceed with building the filter? Where should I make changes in my script? How would you code it yourself?
    I tried my best:

    1st Script:

    PHP Code:

    <?php

    ini_set('display_errors', '1');
    ini_set('display_startup_errors', '1');
    error_reporting(E_ALL);
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    // 1). $curl is going to be data type curl resource.
    $curl = curl_init();

    // 2). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'http://www.tcm.com/this-month/article/297159|0/Dirty-Harry.html');
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // 3). Run cURL (execute http request).
    $result = curl_exec($curl);

    /**
     * It is possible for cURL to return a response that
     * is not a good response. You can't count on the response
     * being usable, so you need to do a little error handling.
     * What I've provided here is the most basic way to see
     * if the response is good.
     */
    $response = curl_getinfo($curl);
    if ($response['http_code'] == '200') {
        // Set banned words.
        $banned_words = array("Dirty Harry", "Callahan", "Clint Eastwood");

        // Separate each word found on the cURL fetched page.
        $word = explode(" ", $result);

        //var_dump($word);

        for ($i = 0; $i < count($word); $i++) {
            foreach ($banned_words as $ban) {
                if (stripos($word[$i], $ban) !== FALSE) {
                    echo "word: $word[$i]<br />";
                    echo "Match: $ban<br>";
                } else {
                    echo "word: $word[$i]<br />";
                    echo "No Match: $ban<br>";
                }
            }
        }
    }

    // 4). Close cURL resource.
    curl_close($curl);
    ?>
    On a scale of 1 to 10, what mark would you give the above code? 1 = worst; 5 = OK; 10 = best.
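    A side note on the 1st script: because the page is split on spaces, no single token can ever contain a multi-word entry such as "Dirty Harry", so only single-word entries can match. A minimal sketch of the difference, where the sample sentence is made up for illustration and stands in for the fetched page:

```php
<?php
// Hypothetical sample text standing in for the fetched page.
$page = "Clint Eastwood plays Inspector Callahan in Dirty Harry.";
$banned_words = array("Dirty Harry", "Callahan", "Clint Eastwood");

// Approach 1: split on spaces, then test each token (as in the 1st script).
$found_by_token = array();
foreach (explode(" ", $page) as $token) {
    foreach ($banned_words as $ban) {
        if (stripos($token, $ban) !== false) {
            $found_by_token[$ban] = true;
        }
    }
}

// Approach 2: test each banned entry against the whole page, no splitting.
$found_in_page = array();
foreach ($banned_words as $ban) {
    if (stripos($page, $ban) !== false) {
        $found_in_page[$ban] = true;
    }
}

print_r(array_keys($found_by_token)); // only the single-word entry
print_r(array_keys($found_in_page));  // all three entries
```

    Searching the whole page once per banned entry also avoids building a large array of tokens.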

    2nd Script:

    PHP Code:
    <?php

    /*
    ERROR HANDLING
    */

    // 1). Set banned words.
    $banned_words = array("Prick", "****", "***");

    // 2). $curl is going to be data type curl resource.
    $curl = curl_init();

    // 3). Set cURL options.
    curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // 4). Run cURL (execute http request).
    $result = curl_exec($curl);
    $response = curl_getinfo($curl);

    if ($response['http_code'] == '200') {
        $regex = '/\b';                              // The beginning of the regex string syntax
        $regex .= implode('\b|\b', $banned_words);   // Joins all the banned words to the string with correct regex syntax
        $regex .= '\b/i';                            // Adds ending to regex syntax. Final i makes it case insensitive
        $substitute = '****';
        $cleanresult = preg_replace($regex, $substitute, $result);
        echo $cleanresult;
    }

    curl_close($curl);

    ?>
    On a scale of 1 to 10, what mark would you give the above code? 1 = worst; 5 = OK; 10 = best.

    Anyone is welcome to grab my code and fix it as they see fit. Just make sure to post your sample here for everyone's benefit.

    Thanks!
    Last edited by UniqueIdeaMan; December 18th, 2017 at 09:03 AM.
  2. #2
  3. Code Monkey V. 0.9
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2005
    Location
    A Land Down Under
    Posts
    2,386
    Rep Power
    2105
    Exploding the entire page's content would create more memory issues, so that's about the worst idea that you could use.

    But... if you really need to show which word/words were found, your second example doesn't do that, so that's no good.
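
    If the matched word does need to be reported, one option is to keep the single-regex idea but use preg_match(), which returns the matched text and stops at the first hit. A rough sketch under those assumptions; find_banned_word() is a hypothetical helper name, and preg_quote() is added so that entries containing regex metacharacters do not break the pattern:

```php
<?php
/**
 * Returns the first banned word found in $html, or null if none is found.
 * find_banned_word() is a hypothetical helper, not part of Mini Proxy.
 */
function find_banned_word(string $html, array $banned_words): ?string
{
    if (count($banned_words) === 0) {
        return null;
    }

    // Escape regex metacharacters (and the "/" delimiter) in every word.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $banned_words);

    // One alternation wrapped in word boundaries; "i" = case-insensitive.
    $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // preg_match() stops at the first match instead of scanning for all of them.
    if (preg_match($regex, $html, $match) === 1) {
        return $match[0];
    }
    return null;
}

$found = find_banned_word('<p>Inspector Callahan returns.</p>', array('Dirty Harry', 'Callahan'));
if ($found !== null) {
    echo "Banned word found: " . htmlspecialchars($found);
}
```

    The returned text is the word as it appears on the page, so it is escaped with htmlspecialchars() before being echoed.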

    If you can live without knowing which words were found (and you don't need that, just an explanation saying "bad words found"), then I'd do something like this.

    PHP Code:
    // str_replace() sets $replace_count to the number of replacements it made.
    $replace_count = 0;
    $cleanresult = str_replace($banned_words, '', $result, $replace_count);

    if (strlen($cleanresult) != strlen($result)) {
        echo '<p>Bad words found</p>';
    }
    In a small test that I did, that ran in about half the time that the regex did, and it's substantially easier to read and modify later on. Remember that if your main concern is performance, regexes are never the tool to use.
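
    A timing comparison like the one described can be sketched as follows. The page text, word list, and run count are invented for illustration, and the numbers will vary by server and PHP version, so no results are claimed here. A plain stripos() loop is included as a third option because it neither copies the page nor compiles a pattern:

```php
<?php
// Hypothetical test data: a large clean page with one banned word at the very end.
$banned_words = array('Callahan', 'Eastwood');
$page = str_repeat('lorem ipsum dolor sit amet ', 20000) . 'Callahan';

/** Times $check over $runs calls and returns the elapsed seconds. */
function time_check(callable $check, int $runs = 50): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $check();
    }
    return microtime(true) - $start;
}

// str_replace(): builds a cleaned copy and reports the number of replacements.
$by_str_replace = function () use ($page, $banned_words) {
    str_replace($banned_words, '', $page, $count);
    return $count > 0;
};

// preg_match(): one case-insensitive alternation, stops at the first hit.
$regex = '/\b(?:' . implode('|', array_map('preg_quote', $banned_words)) . ')\b/i';
$by_regex = function () use ($page, $regex) {
    return preg_match($regex, $page) === 1;
};

// stripos(): no copy of the page is made, returns on the first hit.
$by_stripos = function () use ($page, $banned_words) {
    foreach ($banned_words as $word) {
        if (stripos($page, $word) !== false) {
            return true;
        }
    }
    return false;
};

printf("str_replace: %.4f s\n", time_check($by_str_replace));
printf("preg_match:  %.4f s\n", time_check($by_regex));
printf("stripos:     %.4f s\n", time_check($by_stripos));
```

    Note that str_replace() is case-sensitive while the other two checks are not; str_ireplace() is the case-insensitive counterpart.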
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,653
    Rep Power
    1822
    Sorry to intrude, but I LOVE the idea of a web page that splatters a 'banned word' across the screen, saying, "oooh, look what we found, bad word, we won't see the page that contains it ..."

    Comments on this post

    • Catacaustic agrees : I had the same thoughts, but it's still one of the least crazy ideas that they've had so far!
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    663
    Rep Power
    0
    Originally Posted by SimonJM
    Sorry to intrude, but I LOVE the idea of a web page that splatters a 'banned word' across the screen, saying, "oooh, look what we found, bad word, we won't see the page that contains it ..."
    Naah! Just trying to learn the basics, that's all. I won't echo the found banned word to the user.
    Just give me your best shot (with a code sample) and see how I change it. Afterwards, see whether you agree with the way I updated it.
    Same reply goes to Catacaustic.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,653
    Rep Power
    1822
    Originally Posted by UniqueIdeaMan
    Naah! Just trying to learn the basics, that's all. I won't echo the found banned word to the user.
    Just give me your best shot (with a code sample) and see how I change it. Afterwards, see whether you agree with the way I updated it.
    Same reply goes to Catacaustic.
    Sadly (happily?) I don't know PHP, so I would not be much help. However ... stopping someone from seeing bad words on a web page means pre-parsing the page, and that means either intercepting the rendered page before it is seen by the client (I can imagine a world of pain down that route) or writing your own dedicated client. With caches of web pages being held in a variety of places, even before you look at systems that deliberately use off-location caching, such as CloudFlare, the interception would need to be very close to the client. I guess that is your intent with this web proxy idea. Good luck!
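
    The intercept-before-the-client step can be sketched as a small gate between the fetch and the output. This is only an illustration: gate_response(), the status codes chosen, and the messages are assumptions, not part of Mini Proxy:

```php
<?php
/**
 * Decides what the proxy should send for a fetched page.
 * gate_response() and its return shape are hypothetical, for illustration only.
 *
 * @return array [HTTP status code, body to send to the client]
 */
function gate_response(int $upstream_status, string $body, array $banned_words): array
{
    if ($upstream_status !== 200) {
        return array(502, '<p>The requested page could not be fetched.</p>');
    }
    foreach ($banned_words as $word) {
        // Case-insensitive substring search; stops at the first banned word.
        if (stripos($body, $word) !== false) {
            return array(403, '<p>This page was blocked by the content filter.</p>');
        }
    }
    // Nothing matched: pass the page through untouched.
    return array(200, $body);
}

list($status, $output) = gate_response(200, '<p>Hello world</p>', array('Callahan'));
echo $status . "\n" . $output . "\n";
```

    In the proxy itself, the returned status would be passed to http_response_code() before the body is echoed.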
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    663
    Rep Power
    0
    Originally Posted by SimonJM
    Sadly (happily?) I don't know PHP, so I would not be much help. However ... stopping someone from seeing bad words on a web page means pre-parsing the page, and that means either intercepting the rendered page before it is seen by the client (I can imagine a world of pain down that route) or writing your own dedicated client. With caches of web pages being held in a variety of places, even before you look at systems that deliberately use off-location caching, such as CloudFlare, the interception would need to be very close to the client. I guess that is your intent with this web proxy idea. Good luck!
    Thanks!
