#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    53
    Rep Power
    12

    Site Copier/Mover (non FTP)


    I am fairly new to working with PHP. I have successfully written a script that will upload a file from a URL (e.g. http://www.webbywarehouse.com/image.gif ).

    For my clients I need a site mover/copier: they provide an address such as http://www.wtv-zone.com/woordrack/bars (which, by the way, is one of the sites/directories that I need to move), and the script opens the given URL and copies the contents of that directory.

    I need to be able to copy ONLY from directories where directory listings are enabled. The site mentioned above is on an Apache server with directory listings enabled.

    I wrote a script to read the contents of the directory, but I do not know how to then move the files (typically only images and sound files) to my client's new location on my site/server.

    To be honest, I don't even know where to begin. Any help on how to start, complete, approach this would be greatly appreciated. I know for many this is not hard, for me though it is a leap.

    Thank you for your assistance.
    - Brian
  2. #2
  3. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,906
    Rep Power
    3969
    PHP Code:
    $site = 'http://www.wtv-zone.com/woordrack/bars';
    $savedir = '';
    $files = array();

    $con = file($site);
    // Parse the HTML page into the links you want to copy over and put them into the $files array. You mentioned you had this part, so I won't bother.

    foreach ($files as $url) {
        $savepath = $savedir . basename($url);
        $remote = fopen($url, 'rb');
        $local = fopen($savepath, 'wb');
        while ($data = fread($remote, 2048)) {
            fwrite($local, $data);
        }
        fclose($remote);
        fclose($local);
    }

    That should do the trick.
  4. #3
  5. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    53
    Rep Power
    12

    Thank you.


    Thank you for your very speedy reply and assistance.

    I may have misspoken. I do not know how to parse the contents in such a way that I can then either transfer all of the files in the directory or create a list of the files.

    Honestly, I just want to be able to offer a very basic script, nothing fancy. If they could simply drop in the URL of the directory and have the files within that directory copied, that would be fine.

    So, I can now read the directory, yes. Where I still get lost is what to do with the contents of the file that I am reading. In my URL uploader it is easy because the user provides the exact file name within the URL, but here no filename is given.

    So, should I look for the best way to parse the file, or is there a better way to do this?

    Thanks again for your assistance. I hope this benefits others as well.
    - Brian
  6. #4
  7. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,906
    Rep Power
    3969
    This is a script for the command line. It should give you an idea of how to do it, though, and you can make a web front end for it.

    Copydir.php
    PHP Code:
    <?php
    if (!isset($_SERVER['argv'][1])) {
        echo "Usage: {$_SERVER['argv'][0]} <url>[ <save path>]\n";
        exit;
    }

    $url = $_SERVER['argv'][1];
    $savepath = isset($_SERVER['argv'][2]) ? $_SERVER['argv'][2] : './';

    // Make sure the save path has a / on the end.
    if (substr($savepath, -1) != '/' && substr($savepath, -1) != '\\') {
        $savepath .= '/';
    }

    // Make sure the url path has a / on the end.
    if (substr($url, -1) != '/' && substr($url, -1) != '\\') {
        $url .= '/';
    }

    // Get the directory listing for the given url into a string.
    $con = implode('', file($url));

    // Remove the html tags from the listing, except the <a> tags. Everything else is not needed.
    $con = strip_tags($con, '<A>');

    // Get all the HREF="..." attribute values from the A tags.
    // Ignore values starting with / (parent directory) and ? (sorting functions).
    $files = array();
    preg_match_all('/HREF="([^\/?].[^"]*)"/', $con, $files);

    // Now, loop through the files, fetch them from the URL, and save them locally.
    foreach ($files[1] as $file) {
        $remotefile = $url . $file;

        // Entries that end in a slash are subdirectories, so skip them.
        if (substr($remotefile, -1) == '/' || substr($remotefile, -1) == '\\') {
            echo "$remotefile is a directory, skipping\n";
            continue;
        }
        $currentSavePath = $savepath . basename($remotefile);

        // Open the remote file for reading only. Use the 'b' mode for binary safe.
        $remote = fopen($remotefile, 'rb');
        // Open the local file for writing only. Use the 'b' mode for binary safe.
        $local = fopen($currentSavePath, 'wb');

        // Read the data in 2K intervals and write it to the local file.
        while ($data = fread($remote, 2048)) {
            fwrite($local, $data);
        }

        // Close both files.
        fclose($remote);
        fclose($local);

        // Print a done message.
        echo "Done copying file $remotefile to $currentSavePath\n";
    }
    echo "Done copying all files.\n";
    ?>
  8. #5
  9. Mobbing Gangster
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Sep 2001
    Location
    "Best City" 2002 and 2003- Melbourne, Australia
    Posts
    4,912
    Rep Power
    32
    Not to nitpick, kicken, but fopen() is extremely inefficient compared to fsockopen().
    And you know I mean that.
  10. #6
  11. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,906
    Rep Power
    3969
    Yeah, for most of the stuff I do with HTTP and PHP I use fsockopen() myself, but for the sake of simplicity I figured it would be easier to just use fopen() than to have to teach the HTTP protocol as well as PHP.
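
    For anyone curious what the fsockopen() route involves, here is a minimal sketch rather than a tested transloader. It assumes a plain http:// URL, no redirects, and an HTTP/1.0 request (so the server should not answer with chunked encoding). The function names build_get_request(), split_response() and fetch_url() are made up for this example.

    ```php
    <?php
    // Hypothetical helper: build a minimal HTTP/1.0 GET request for a URL.
    function build_get_request($url)
    {
        $parts = parse_url($url);
        $path = isset($parts['path']) ? $parts['path'] : '/';
        if (isset($parts['query'])) {
            $path .= '?' . $parts['query'];
        }
        return "GET $path HTTP/1.0\r\n"
             . "Host: {$parts['host']}\r\n"
             . "Connection: close\r\n\r\n";
    }

    // Hypothetical helper: split a raw response into its header block and body.
    function split_response($raw)
    {
        $pos = strpos($raw, "\r\n\r\n");
        if ($pos === false) {
            return array($raw, '');
        }
        return array(substr($raw, 0, $pos), substr($raw, $pos + 4));
    }

    // Hypothetical helper: fetch a URL over a raw socket.
    // Returns the response body, or false if the connection failed.
    function fetch_url($url, $timeout = 30)
    {
        $parts = parse_url($url);
        $port = isset($parts['port']) ? $parts['port'] : 80;
        $sock = @fsockopen($parts['host'], $port, $errno, $errstr, $timeout);
        if (!$sock) {
            return false;
        }
        fwrite($sock, build_get_request($url));
        $raw = '';
        while (!feof($sock)) {
            $chunk = fread($sock, 2048);
            if ($chunk === false || $chunk === '') {
                break;
            }
            $raw .= $chunk;
        }
        fclose($sock);
        list($headers, $body) = split_response($raw);
        return $body;
    }
    ```

    The extra work compared to fopen() is visible here: you write the request yourself and you have to separate the headers from the body before saving anything. The sketch ignores the status line, so a 404 page would be returned as if it were the file.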
  12. #7
  13. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    53
    Rep Power
    12

    Thank you again.


    While the script that you provided does seem for me overwhelming, I am not proud nor shy that I am a newbie to PHP.

    I see how the extraction of the individual files is done, but you know how it is with newbies: the hard part is actually putting the puzzle together and getting the smaller pieces into their places.

    Until recently I struggled with putting data into arrays, extracting it later, and creating scripts with several switch() actions, but now it is easy. My latest file manager for Webbywarehouse is IMHO very robust. But I worked my tail off to get it. And certainly your programming would run circles around mine.

    Not being proud, I'll ask: help me connect the dots, if you don't mind. Here is how I would open the directory and create the initial file (containing the hrefs that we want to parse).

    <?php
    function file_get_contents($filename)
    {
    $fp = @fopen($filename, "r");
    $fpsize = filesize($fp);
    if (!($fp))
    {
    return 0;
    }
    /*
    This ensures maximum file size, if $fp = "" it was too big!!! Though not needed for this example I suppose. I put it in for uploading images and limitting their size or I would have HUGE bitmaps.
    */

    $temp .= fread($fp, 131072);
    if (!feof($fp))
    {
    $temp="";
    }
    fclose ($fp);
    return $temp;
    }
    if ($getcontents)
    {
    $data=file_get_contents("$remote_file");
    if ($data != "")
    {
    $rfilesize = filesize($temp);
    $fp=fopen(" $local_file","wb");
    fputs($fp,$data,strlen($data));
    fclose($fp);
    $rext = pathinfo($local_file);
    $rr = $rext['extension'];
    }
    else
    {
    $toobig="1";
    }
    }
    else
    {
    ?>
    <html>
    <head>
    <title>start</title>
    </head>
    <body>
    <form name="getcontents" action="<?php echo $PHP_SELF; ?>">
    <input name="remote_file" value="http://">
    <input type="hidden" name="getcontents" value="go">
    <input type="submit" value="Get the file!!!">
    </form>
    </body>
    </html>
    <?php
    }
    ?>
    <html>
    <head>
    <title>finish</title>
    </head>
    <body>
    <?php
    echo $data;
    ?>
    </body>
    </html>


    This is taken from the actual script that I use for my URL uploader for Webbywarehouse, and it does work pretty well, I guess. Nothing to write home about though.

    Anyway, once this file is read, and it is not too big, I don't see how I can then parse this with the script that you have generously shared. That is where I am lost. Oh, just a quick follow-up: how is fsockopen() better, in a nutshell? Once I have mastered this I can move on to a better method. I love this stuff, you know.

    Thank you for sticking with me on this one.
    Last edited by Webbywarehouse; December 6th, 2002 at 02:04 AM.
    - Brian
  14. #8
  15. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,906
    Rep Power
    3969
    With what you posted, there are a couple of problems I see offhand that you will need to look at.

    function file_get_contents($filename)
    A function with that name already exists, so you would get a fatal error trying to declare it. If you aren't getting one now, it may be because you are using an older PHP version.

    $fpsize = filesize($fp);
    There are a couple of things here. First, you don't pass a handle from fopen() to filesize(); you give it the path to the actual file, like filesize('myfile.txt');

    Secondly, in this particular situation it won't work at all. filesize() will only work on local files, not remote ones. That kind of information is usually not available, and even when it is, PHP won't get it.

    $temp .= fread($fp, 131072);
    This isn't really a problem, but since $temp doesn't already exist, you should just use $temp = fread() rather than $temp .= fread();
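
    Putting those three points together, a size-capped reader could look roughly like the sketch below. The name read_limited() and the 2048-byte chunk size are assumptions for illustration. It reads in a loop because a single fread() on a network stream may return fewer bytes than requested, and it enforces the limit by counting what was actually read instead of calling filesize().

    ```php
    <?php
    // Hypothetical replacement for the hand-rolled file_get_contents():
    // read a file or URL in chunks and give up once it exceeds $maxBytes.
    // Returns the data as a string, or false if the source could not be
    // opened or turned out to be larger than the limit.
    function read_limited($source, $maxBytes = 131072)
    {
        $fp = @fopen($source, 'rb');
        if (!$fp) {
            return false;
        }
        $data = '';
        while (!feof($fp)) {
            $chunk = fread($fp, 2048);
            if ($chunk === false || $chunk === '') {
                break;
            }
            $data .= $chunk;
            if (strlen($data) > $maxBytes) {
                fclose($fp);
                return false; // too big
            }
        }
        fclose($fp);
        return $data;
    }
    ```

    Callers should compare the result with === false, since an empty file legitimately returns an empty string.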


    Ok, now with that said, about your questions.
    While the script that you provided does seem for me overwhelming, I am not proud nor shy that I am a newbie to PHP
    Nothing wrong with being a newbie; everyone is at one point or another. If by chance I offended you with the post about taking the easy way, I didn't mean to. I just wanted to keep it simple enough that hopefully there wouldn't be a lot of confusion.


    I don't see how I can then parse this with the script that you have generously shared.
    Getting the file list is going to be the more complicated part. I used a regular expression to get it; if you don't know much about regular expressions, they would be a really good thing to learn.

    PHP Code:
    preg_match_all('/HREF="([^\/?].[^"]*)"/', $con, $files);
    That is the line in my script that actually extracts the files to download from the listing ($con). The results of running the regex are put into the $files array for you to use. The regex itself is fairly simple:
    All Perl-style regexes need a delimiter, so that is what the /'s on both sides are for.
    The HREF=" part tells the regex to look for that literal string.
    The () around the next part tell PHP to store what they match in the $files array.
    The [^\/?] part tells PHP to match any single character EXCEPT / or ? (the / is escaped with a \ because / is also the delimiter).
    The . tells PHP to match any single character.
    The [^"]* tells PHP to match any number of characters, so long as none of them is a ".
    Finally, the last " finishes the HREF attribute value.

    After php is done processing things, the $files array should contain two more arrays, one containing everything matched and one containing the stuff that we told it to keep using the ()'s in the regex. What we told it to keep is stored in $files[1] (you can do a print_r($files) to see how this is setup)
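
    To see the regex in action without touching a server, here is a small sketch run against a made-up listing fragment. The file names and the uppercase HREF markup are assumptions chosen to resemble an Apache listing; they are not taken from the real site.

    ```php
    <?php
    // A made-up fragment in the style of an Apache directory listing.
    $con = '<A HREF="?N=D">Name</A> '
         . '<A HREF="/woordrack/">Parent Directory</A> '
         . '<A HREF="bar1.gif">bar1.gif</A> '
         . '<A HREF="sounds/">sounds/</A> '
         . '<A HREF="tune.mid">tune.mid</A>';

    $files = array();
    preg_match_all('/HREF="([^\/?].[^"]*)"/', $con, $files);

    // $files[0] holds the full matches, $files[1] holds what the () captured.
    // Here $files[1] contains bar1.gif, sounds/ and tune.mid; the sorting link
    // and the parent directory link are left out.
    print_r($files[1]);
    ```

    If the server writes the attribute in lowercase (href), the pattern will find nothing as written; adding the i modifier after the closing delimiter makes the match case-insensitive.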

    foreach ($files[1] as $file){
    $remotefile=$url.$file;
    //Make sure url path has a / on the end.
    ......
    //Print a done message.
    echo "Done copying file $remotefile to $currentSavePath\n";
    }
    This section loops through the captured filenames and pulls them off the server using fopen(). I added a couple of checks in there to make sure paths end in a / or \, even though they probably aren't needed.


    Finally, my script isn't really a ready-to-use thing that you can plug in to yours; it will need to be adapted. But all the more power to you: it's a good learning experience. Good luck with your project.
  16. #9
  17. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    53
    Rep Power
    12

    Light at the end of the tunnel.


    Thank you again. I see that there is light at the end of the tunnel. Your breakdown is very helpful and the time you spent is appreciated.

    I do not have as much time each day as I would like, but over the next few days I will be able to use what you have shown me to create the basic script that I need. I am confident of that.

    I will post another follow up at that point. Thank you again for your help. I really was stuck.

    Take care.
    - Brian
  18. #10
  19. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    53
    Rep Power
    12

    GOT IT!!!!!!!!!!


    kicken, thank you so much for all your help. I got my uploader working BUT I have another issue now.

    I do READ the remote file with 'rb' and WRITE the local file with 'wb' (binary), but for some reason some of the GIFs get corrupted such that they will not display in browsers.

    The strange thing is that if you take them to an online image editing site like Image Magik or Gif Works, the site can view the image; you can then FTP it back to yourself and voilà, it works again.

    If I view it from the remote server, the server it was originally transferred FROM (using the directory mover), then it displays just fine. This is how I have determined that it is being corrupted in transit or when being written.

    I am wondering if you have any idea how or why this is occurring and what I may be able to do about it. My clients will be doing quite a bit of image editing with these online editors, as most of them are WebTV users and/or Ebay users.

    All suggestions and thoughts are greatly appreciated. Oh, so far the problem seems to affect ONLY the GIF file format, and not all of the images either: maybe one in 20, and only certain ones. So it does not happen all the time, and there is a remedy.

    Thanks again for the help, now the directory mover is up and running (with some error protections as well)

    - Brian
  20. #11
  21. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,906
    Rep Power
    3969
    Glad to know that it's working. As for the corruption problem, I've had that problem myself once, but I don't remember what I did to fix it. It may just be that a couple of extra characters are being read or something. The first transloader I did used fsockopen() rather than fopen(), so it was a bit different in the way things were retrieved, but they should work the same nonetheless.
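
    One way to narrow this down, offered as a sketch rather than a diagnosis, is to compare the copied bytes with the original and see where they first diverge. The helper name first_difference() is made up for this example.

    ```php
    <?php
    // Hypothetical helper: return the first byte offset at which two byte
    // strings differ, or -1 if they are identical. If one string is a prefix
    // of the other, the length of the shorter one is returned.
    function first_difference($a, $b)
    {
        $len = min(strlen($a), strlen($b));
        for ($i = 0; $i < $len; $i++) {
            if ($a[$i] !== $b[$i]) {
                return $i;
            }
        }
        return strlen($a) === strlen($b) ? -1 : $len;
    }
    ```

    Reading both the remote file and the local copy into strings (for example with the 'rb' loop already used in the copier) and passing them to this function would show whether the copy is truncated, has extra bytes on the end, or differs somewhere in the middle.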
  22. #12
  23. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Posts
    53
    Rep Power
    12

    Site Copier is BACK.


    I am reviving this topic for two reasons. I have had good success with the script that I received so much help developing. Here is the main part of the script that I use to make copies of a remote site (per directory).

    The second script below is the one I use to populate an array with a list of directories and sub-directories. I am wondering if there is a way to adapt it to reading the directory structure on the remote server. In all cases the remote server(s) are Apache servers with directory listings turned on.

    PHP Code:
    $remote = fopen($remote_url . $tmpArray[1], 'rb');
    $data = fread($remote, 131072);
    if (strlen($data) > 131072)
    {
         $data = "";
         echo "<strong><span style=\"color:red\">" . $remote_url . $tmpArray[1] . "</span> was TOO big and was skipped!!!</strong><br>";
    }
    else
    {
         $total_written += strlen($data);
         $local = fopen($basedir . $directory_list . $tmpArray[1], 'wb');
         echo $directory_list . "<strong>" . $tmpArray[1] . "</strong> is<span style=\"color: red;\"> being written.</span><br>";
         fwrite($local, $data);
         $last_file = $tmpArray[1];
         fclose($local);
         fclose($remote);
    }

    This is just the heart of the script. What I am starting to read up on is how to use fsockopen() and the associated functions. I am pretty much lost on this, as none of the manuals that I have cover it very well. The online manual has given me some info, but I really don't quite get it.

    I want to build a script that will copy an ENTIRE site without having to go directory by directory. I am wondering whether using fsockopen() offers advantages, and how I might go about copying more than one directory by having the script go down into the subdirectories.

    PHP Code:
    function dirsize($dir=".")
    {
         $sizes = array();
         $dirs = array();
         $retVal = array();
         $count = 0;
         $handle = opendir("$dir");
         // Compare against false so a file named "0" does not end the loop early.
         while (($file = readdir($handle)) !== false)
              $retVal[] = $file;
         closedir($handle);
         asort($retVal);
         foreach ($retVal as $k => $v)
         {
              if ($v != "." && $v != "..")
              {
                   $count++;
                   if (is_dir("$dir/$v"))
                   {
                        $dirs[$count] = ("$dir/$v");
                        dirsize("$dir/$v");
                   }
              }
         }
         global $xl;
         foreach ($dirs as $k => $vv)
         {
              $vvv = (substr($vv, $xl));
              echo "<option value=\"$vvv/\">$vvv/</option>";
         }
    }

    Any help would be appreciated on the multitude of inquiries that I have made. Thank you.
    - Brian
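
    One possible way to make the copier descend into subdirectories is sketched below. It is an illustration under assumptions, not a finished site copier: it reuses the HREF regex from earlier in the thread (with the i modifier added), treats any link ending in a slash as a subdirectory, and takes the page fetcher as a parameter so it can be tried without a network. The names parse_listing() and list_remote_files() and the $maxDepth guard are hypothetical.

    ```php
    <?php
    // Hypothetical helper: pull link targets out of a directory listing and
    // sort them into files and subdirectories (entries ending in a slash).
    function parse_listing($html)
    {
        $matches = array();
        preg_match_all('/HREF="([^\/?].[^"]*)"/i', $html, $matches);
        $result = array('files' => array(), 'dirs' => array());
        foreach ($matches[1] as $name) {
            if (substr($name, -1) == '/') {
                $result['dirs'][] = $name;
            } else {
                $result['files'][] = $name;
            }
        }
        return $result;
    }

    // Hypothetical helper: walk a remote tree and return every file URL found.
    // $fetch is any callable that returns the listing HTML for a URL (or false).
    // $maxDepth guards against runaway recursion.
    function list_remote_files($url, $fetch, $maxDepth = 5)
    {
        $found = array();
        $html = call_user_func($fetch, $url);
        if ($html === false) {
            return $found;
        }
        $listing = parse_listing($html);
        foreach ($listing['files'] as $file) {
            $found[] = $url . $file;
        }
        if ($maxDepth > 0) {
            foreach ($listing['dirs'] as $dir) {
                $found = array_merge($found, list_remote_files($url . $dir, $fetch, $maxDepth - 1));
            }
        }
        return $found;
    }
    ```

    With allow_url_fopen enabled, passing 'file_get_contents' as $fetch would read the real listings, and each returned URL could then be copied with the fopen()/fread()/fwrite() loop shown earlier in the thread. Links that are absolute URLs would be misread as file names by this sketch, so real listings should be checked first.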
