#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    307
    Rep Power
    1

    How To Filter Content Before Loading On Screen?


    My Fellow Php Folks!

    I am now peeking into the Php-Proxy script to gain experience in building a web proxy, as I am struggling to build one from scratch.

    The script has 2 main PHP files:
    index.php
    config.php

    I am including the code of both below.
    I need your help adding comments to the lines, so that by reading your comments I can learn what each line does. So, who will be the good Samaritan?
    In the past, the following people have helped me, listed in the order in which they started helping (as far as I remember):

    SedoPati
    Kicken

    If I've forgotten anybody, please excuse me.

    Actually, let's play a little game and make this forum a little fun.
    The first person who reads this comments on the first few lines, the second person comments on the next few lines, and so on.
    That way, everyone contributes a little.
    OK, I believe index.php is the actual script. Even though it has comments, they are very brief and not very friendly to a newbie, so anyone volunteering to make them more in-depth would be appreciated by all newbies.
    Once I finish learning from your comments, it will be easy for me to complete my other learning project, where I am trying to build a web proxy from scratch.

    Thanks :blush:

    PS - In index.php, on which line do I add the content filtering code?
    This is where I will keep a list of banned words. If any of these banned words are found on the page the user is trying to load via the web proxy, the page will not load. I'll write code to alert the user that the page won't load because it has banned words in its content/title/meta keywords/meta descriptions/file names/img names/link anchors/etc.
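    A minimal sketch of the banned-word check itself, assuming the page (or its title, meta tags, file names, etc.) has already been turned into a plain string. The function name `find_banned_words()` and the sample words are hypothetical. Matching whole words with `\b` avoids flagging innocent words that merely contain a banned one (a banned "bet" would otherwise block "alphabet"), at the cost of missing variants such as plurals unless you list them too. As for where it could go in index.php: the script's own comment says a streaming response is already sent before `$response->send()` is reached, so a check placed between `$proxy->forward($request, $url)` and `$response->send()` would only see non-streamed pages. How to read the HTML out of Php-Proxy's `$response` object is not shown here; check the library's `Response` class, or look into writing the check as a plugin picked up by the plugin loop in index.php.

```php
<?php
// Hypothetical helper: returns the banned words that occur in $text.
// Matching is case-insensitive and on whole words only.
function find_banned_words(string $text, array $bannedWords): array
{
    $found = [];
    foreach ($bannedWords as $word) {
        // preg_quote() escapes regex special characters in the word,
        // \b marks a word boundary, "i" ignores case, "u" reads UTF-8.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';
        if (preg_match($pattern, $text) === 1) {
            $found[] = $word;
        }
    }
    return $found;
}
```

For example, `find_banned_words('Free WAREZ downloads here', ['warez', 'torrent'])` returns `['warez']`, and an empty array means the page passed.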

    index.php

    PHP Code:
    <?php

    // remember when the script started (microtime(true) returns the current Unix time in seconds as a float)
    define('PROXY_START', microtime(true));

    // load Composer's autoloader so the classes in vendor/ can be used
    require("vendor/autoload.php");

    // import these classes so they can be referred to by their short names
    use Proxy\Http\Request;
    use Proxy\Http\Response;
    use Proxy\Plugin\AbstractPlugin;
    use Proxy\Event\FilterEvent;
    use Proxy\Config;
    use Proxy\Proxy;

    // start the session
    session_start();

    // load config...
    Config::load('./config.php');

    // custom config file to be written to by a bash script or something
    Config::load('./custom_config.php');

    // stop with an error message if a required setting or PHP extension is missing
    if(!Config::get('app_key')){
        die("app_key inside config.php cannot be empty!");
    }

    if(!function_exists('curl_version')){
        die("cURL extension is not loaded!");
    }

    // how will our URLs be generated from this point? this must be set here so the proxify_url function below can make use of it
    // url_mode 2: the key depends on the visitor's IP address; url_mode 3: the key depends on the session ID
    if(Config::get('url_mode') == 2){
        Config::set('encryption_key', md5(Config::get('app_key').$_SERVER['REMOTE_ADDR']));
    } else if(Config::get('url_mode') == 3){
        Config::set('encryption_key', md5(Config::get('app_key').session_id()));
    }

    // very important!!! otherwise requests are queued while waiting for session file to be unlocked
    session_write_close();

    // form submit in progress...
    if(isset($_POST['url'])){

        // the URL typed into the proxy's form; add_http() presumably adds "http://" when no scheme was typed
        $url = $_POST['url'];
        $url = add_http($url);

        // redirect the browser to the proxified version of that URL
        header("HTTP/1.1 302 Found");
        header('Location: '.proxify_url($url));
        exit;

    } else if(!isset($_GET['q'])){

        // must be at homepage - should we redirect somewhere else?
        if(Config::get('index_redirect')){

            // redirect to...
            header("HTTP/1.1 302 Found");
            header("Location: ".Config::get('index_redirect'));

        } else {
            // no redirect configured: show the proxy's home page
            echo render_template("./templates/main.php", array('version' => Proxy::VERSION));
        }

        exit;
    }

    // decode q parameter to get the real URL
    $url = url_decrypt($_GET['q']);

    $proxy = new Proxy();

    // load plugins (the names come from $config['plugins'] in config.php)
    foreach(Config::get('plugins', array()) as $plugin){

        $plugin_class = $plugin.'Plugin';

        if(file_exists('./plugins/'.$plugin_class.'.php')){

            // use user plugin from /plugins/
            require_once('./plugins/'.$plugin_class.'.php');

        } else if(class_exists('\\Proxy\\Plugin\\'.$plugin_class)){

            // does the native plugin from php-proxy package with such name exist?
            $plugin_class = '\\Proxy\\Plugin\\'.$plugin_class;
        }

        // otherwise plugin_class better be loaded already through composer.json and match namespace exactly \\Vendor\\Plugin\\SuperPlugin
        $proxy->getEventDispatcher()->addSubscriber(new $plugin_class());
    }

    try {

        // request sent to index.php
        $request = Request::createFromGlobals();

        // remove all GET parameters such as ?q=
        $request->get->clear();

        // forward it to some other URL
        $response = $proxy->forward($request, $url);

        // if that was a streaming response, then everything was already sent and script will be killed before it even reaches this line
        $response->send();

    } catch (Exception $ex){

        // if the site is on server2.proxy.com then you may wish to redirect it back to proxy.com
        if(Config::get("error_redirect")){

            $url = render_string(Config::get("error_redirect"), array(
                'error_msg' => rawurlencode($ex->getMessage())
            ));

            // Cannot modify header information - headers already sent
            header("HTTP/1.1 302 Found");
            header("Location: {$url}");

        } else {

            // show the home page again, with the error message
            echo render_template("./templates/main.php", array(
                'url' => $url,
                'error_msg' => $ex->getMessage(),
                'version' => Proxy::VERSION
            ));

        }
    }

    ?>

    config.php
    PHP Code:
    <?php

    // all possible options will be stored
    $config = array();

    // a unique key that identifies this application - DO NOT LEAVE THIS EMPTY!
    $config['app_key'] = '04e8155d1ddc8d00c578a7ffc0018692';

    // a secret key to be used during encryption
    $config['encryption_key'] = '';

    /*
    how unique is each URL that is generated by this proxy app?
    0 - no encoding of any sort. People can link to proxy pages directly: ?q=http://www.yahoo.com
    1 - Base64 encoding only, people can hotlink to your proxy
    2 - unique to the IP address that generated it. A person that generated that URL can bookmark it and visit it at any point
    3 - unique to that session and IP address - URL no longer valid anywhere when that browser session that generated it ends
    */

    $config['url_mode'] = 2;

    // plugins to load - plugins will be loaded in this exact order as in array
    $config['plugins'] = array(
        'HeaderRewrite',
        'Stream',
        // ^^ do not disable any of the plugins above
        'Cookie',
        'Proxify',
        'UrlForm',
        // site specific plugins below
        'Youtube',
        'DailyMotion',
        'RedTube',
        'XHamster',
        'XVideos',
        'Twitter'
    );

    // additional curl options to go with each request
    $config['curl'] = array(
        // CURLOPT_PROXY => '',
        // CURLOPT_CONNECTTIMEOUT => 5
    );

    //$config['replace_title'] = 'Google Search';

    //$config['error_redirect'] = "https://unblockvideos.com/#error={error_msg}";
    //$config['index_redirect'] = 'https://unblockvideos.com/';

    // $config['replace_icon'] = 'icon_url';

    // this better be here, otherwise Config::load fails
    return $config;

    ?>
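    To make the `url_mode` comment in config.php concrete, here are the two lines from index.php that derive `encryption_key`, wrapped in a hypothetical function `derive_encryption_key()` so the IP address and session ID can be passed in directly instead of being read from `$_SERVER['REMOTE_ADDR']` and `session_id()`. Since index.php notes that `proxify_url` makes use of this key, a URL generated under mode 2 presumably decodes only for the same IP address, and under mode 3 only within the same session.

```php
<?php
// Same derivation as index.php, with the IP and session ID as parameters.
function derive_encryption_key(string $appKey, int $urlMode, string $ip, string $sessionId): ?string
{
    if ($urlMode == 2) {
        // mode 2: key depends on the visitor's IP address
        return md5($appKey . $ip);
    }
    if ($urlMode == 3) {
        // mode 3: key depends on the session ID
        return md5($appKey . $sessionId);
    }
    // modes 0 and 1: index.php does not derive a key at this point
    return null;
}
```

For example, with `app_key` `'k'`, the IPs `'1.2.3.4'` and `'5.6.7.8'` get different keys in mode 2, while in mode 3 only the session ID matters.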
#2
    Php Folks,

    In what way would you build a content filter for your web proxy?
    Imagine you gave your children a web proxy. You installed it on their computers so they can surf the web privately, while you can still track their movements to make sure they are behaving.
    OK, child-safe filters exist, but I'm talking about building my own on top of an existing web proxy (Php-Proxy, which I did not write) so I can customize the features to my needs. Plus, I get to learn more PHP this way. It is all about learning, really. :wink:
    Now, imagine you don't want them viewing or downloading from bad sites like software piracy sites, porn sites, music piracy sites, etc.
    Now, imagine you are using Php-Proxy and find out it does not have a content filter or banned-words filter, and you decide to write your own code. Let us call this "your mini script": a chunk of PHP code (a few lines) that you would add to your chosen web proxy (e.g. Php-Proxy). How would you write it?
    Here are a few methods I am guessing you could use. Which one would you use, and why that one over the others? Which ones would you stay away from, and why? What are the flaws in these methods?

    Q1. Would you build a mini script that would:

    1) Check the meta content of the page they load on their screen?
    2) Check the img file names and ALT tags?
    3) Check the content on the page as a general check, which would be enough since it would also cover all the things mentioned above?
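    One way to cover all three options at once, sketched under the assumption that PHP's DOM extension is available: parse the page with `DOMDocument`, collect the meta keywords/description, image file names and ALT text, and the page text, then run a banned-word check over the result. Note that option 3 alone does not cover options 1 and 2 if "content" means the visible text, because meta `content`, `src` and `alt` are attributes, not text. The function name `extract_checkable_text()` is hypothetical.

```php
<?php
// Hypothetical helper: gathers the parts of a page that Q1 mentions into one
// string, so a single banned-word check can run over all of them.
function extract_checkable_text(string $html): string
{
    if ($html === '') {
        return '';
    }
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; keep the parser's warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $parts = [];

    // 1) meta keywords and meta description
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'keywords' || $name === 'description') {
            $parts[] = $meta->getAttribute('content');
        }
    }

    // 2) image file names (path only, query string dropped) and ALT text
    foreach ($doc->getElementsByTagName('img') as $img) {
        $path = (string) parse_url($img->getAttribute('src'), PHP_URL_PATH);
        $parts[] = basename($path);
        $parts[] = $img->getAttribute('alt');
    }

    // 3) all text in the document: title, headings, paragraphs, link anchors
    //    (this also includes the contents of <script> and <style> blocks)
    if ($doc->documentElement !== null) {
        $parts[] = $doc->documentElement->textContent;
    }

    return implode(' ', $parts);
}
```

The returned string can then be passed to whatever banned-word check you use, so one check covers the meta tags, image names, ALT text and page text together.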

    Q2. How would you prevent downloads such as video downloads, img downloads and software (.exe) downloads? I've got to block downloads to stop them downloading trashy imgs, trashy clips and trashy .exe files (that might be malware).
    So, what method would you use to prevent these downloads?
    I'm guessing you would get your mini script to check the links for their extension types. Right? Yes or no? If so, how would you get your mini script to deal with them so the downloads don't happen? This is how I might do it, and I need your advice on whether the method is sound.
    I'd get my mini script to replace (str_replace/preg_replace) the file extensions of links on the proxified pages. Only links that download something, not links (.html, .shtml, .php, .jpeg, .gif, .pdf, etc.) that take you to another page. That way, the download links would become useless, because the browser won't recognize them as links that download something. If you deem this method OK, then tell me: how do I know which link leads to another page and which leads to a download? OK, I can check for .zip and .rar (archive files), but are there any other extensions, or anything else, my mini script should check for to spot a "download link"?
    Is there a PHP function that checks for download links?
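    As far as I know, PHP has no built-in function that recognizes download links. Checking the extension of the link's path is a reasonable first pass, and `parse_url()` plus `pathinfo()` do most of the work. A second, often more reliable signal is the response itself: a `Content-Disposition: attachment` header tells the browser to save the body as a file, whatever the URL looks like, and a proxy sees that header before passing the body on. The function names and the extension list below are hypothetical.

```php
<?php
// Hypothetical list of extensions treated as "downloads"; extend to taste.
const DOWNLOAD_EXTENSIONS = ['zip', 'rar', '7z', 'exe', 'msi', 'mp3', 'mp4', 'avi', 'mkv', 'iso'];

// True when the URL's path ends in one of the listed extensions.
// The query string is ignored, so "archive.zip?dl=1" still counts.
function looks_like_download(string $url, array $extensions = DOWNLOAD_EXTENSIONS): bool
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $extensions, true);
}

// True when a Content-Disposition header value asks the browser to save the body as a file.
function is_attachment_header(string $contentDisposition): bool
{
    return stripos(ltrim($contentDisposition), 'attachment') === 0;
}
```

For example, `looks_like_download('http://example.com/get.php?file=a.zip')` returns `false`, because the path ends in `.php`; a download served that way is only caught by the header check.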

    Q3. Is it possible to load a webpage in the background, get your mini script to check the content, and only load the page on screen if the filter does not flag it? That way, users don't view pages containing banned content. I managed to do this in my .exe tool (a free tool which I may upload to this forum for you guys to check out), but I don't have enough experience with PHP and so need your advice and tips.
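    This fits how a server-side proxy works: it downloads the page itself before passing anything to the browser, so it can inspect the page first and decide what to send. A minimal sketch, assuming the cURL extension (which index.php already requires) and a plain case-insensitive substring check, which, unlike a whole-word match, also flags a banned word inside a longer word. `fetch_page()` and `page_or_block_notice()` are hypothetical names, not Php-Proxy functions. One caveat from index.php's own comments: a response Php-Proxy streams is already sent by the time `$response->send()` is reached, so a check like this only applies to pages that are fully downloaded first.

```php
<?php
// Hypothetical: download the whole page into a string with cURL.
// Nothing is sent to the browser here.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT => 20,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical: decide what the browser gets, the page or a "blocked" notice.
function page_or_block_notice(string $html, array $bannedWords): string
{
    foreach ($bannedWords as $word) {
        if (stripos($html, $word) !== false) {
            return '<p>This page was blocked because it contains a banned word.</p>';
        }
    }
    return $html;
}
```

In a standalone script the two would be combined as `echo page_or_block_notice(fetch_page($url), $banned);`, so the browser only ever receives one or the other.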

    Q4. How would you prevent viewing streaming sound or video files? How do you detect streaming?
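    Streaming audio and video can usually be recognized by the `Content-Type` response header rather than by the link: `audio/*` and `video/*` types, plus the playlist types used by HLS (`application/vnd.apple.mpegurl`, also seen as `application/x-mpegURL`) and MPEG-DASH (`application/dash+xml`). A proxy sees these headers before passing the body on. A sketch with a hypothetical function name; it only catches common cases, since some sites serve media segments under generic types such as `application/octet-stream`.

```php
<?php
// Hypothetical helper: true when a Content-Type header value announces
// audio/video, or a playlist format used for adaptive streaming.
function is_media_content_type(string $contentType): bool
{
    // Drop parameters such as "; charset=utf-8" and normalize case.
    $type = strtolower(trim(explode(';', $contentType, 2)[0]));

    if (strpos($type, 'audio/') === 0 || strpos($type, 'video/') === 0) {
        return true;
    }

    // HLS (.m3u8) and MPEG-DASH (.mpd) manifests
    $streamingManifests = [
        'application/vnd.apple.mpegurl',
        'application/x-mpegurl',
        'application/dash+xml',
    ];
    return in_array($type, $streamingManifests, true);
}
```

For example, `is_media_content_type('text/html; charset=UTF-8')` is `false`, while `is_media_content_type('video/mp4')` is `true`.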

    Q5. Reading the 4 questions above has no doubt given you some ideas about which PHP functions I should be using. Which ones do you have in mind that would do the job?

    So, what do you think? What are your answers to all my questions?

    And no, I can't create a whitelist that only allows certain sites; it would become too restrictive. I can't create a blacklist either, as there are too many sites to blacklist and we will never know them all. So, we are back to square one: content and file-type filtering.

    Doing a search now for:
    banned words filtering in proxies

    And checking this out to see if its source code is available, so I can get some code snippets:
    2.5.2. URL Filter Administrative Web Page
    (I'm not affiliated with them. I'm checking the site out for the first time, and I'm just mentioning the link so you understand what kind of things I want the script to do.)
#4
    I'm surprised to read my original post and find that I have ranked SedoPati as a better helper than others here.
    Hmm. I can't remember what he helped me with in the past to earn that favour. Oh well, if you help someone for many days and then start biting them for a few days, they forget the favours you did them. With me, it is not the first impression that counts but the most recent ones.
