#1
    LWP browser->get challenge


    I convert data for a living and have not dealt with browsers or the internet directly in Perl. Recently a client asked us to download their data directly from their secure website. This was new (and exciting!) to me, so I did some research and wrote a program that has worked fairly well. The last time I ran it, I received a certificate error that I had never seen before. I researched that and added code that bypasses it. The challenge I have not been able to resolve is that the program no longer downloads the entire page of data. It gets maybe 90% - 95% of the page, then stops and moves on to the next page. The only difference I can think of is that I upgraded from ActiveState Perl 5.10 to 5.16; I wouldn't expect that to matter, but it might. If I open the URL for any page of data directly in my browser, the entire page downloads just fine. I'm not sure what you guys might need in order to help, but I need to be conscious of proprietary information.

    Here is the major piece of code doing the work, with names changed to protect the innocent.

    while ($more) {
        $page++;
        $url = "https://[server name is here]/[path information here]/$element/HAY/?page=$page";
        $filepage = "0" x (3 - length($page)) . $page;
        $response = $browser->get($url, ':content_file' => $tempxml);
        $file = "$output\\$element" . "_" . $filepage . ".xml";
        $response = $browser->get($url, ':content_file' => $file);
        die "Couldn't get $url\n" unless defined $response;
        $more = &check_tmp;
        unlink("temp.xml");
        print "Completed $element page ($page) file ($filepage) ($more) ...\n";
    }

    Because there is more than one page of data and I do not know which page is the last, I download each page to a temp.xml file and check whether that file contains data. If it does, I copy it to another location, delete temp.xml, and grab the next page. The loop repeats until no more page data is available.
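
One way to narrow down a short download is to inspect the response object itself, since `get` returns a response object even when the request fails, so a `defined` test alone will not catch a problem. The following is a minimal sketch, not the poster's code. `download_problem` is a hypothetical helper, and it assumes the response behaves like an `HTTP::Response` (it only calls `is_success`, `status_line`, and `header`). LWP reports an interrupted body transfer through the `X-Died` and `Client-Aborted` headers; the size comparison is only meaningful when the server sends a `Content-Length`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a short description of what looks wrong
# with a download, or undef when nothing suspicious is found.
# $response is assumed to behave like an HTTP::Response object.
sub download_problem {
    my ($response, $file) = @_;

    return 'request failed: ' . $response->status_line
        unless $response->is_success;

    # LWP sets these headers when the body transfer was cut short.
    for my $name ('X-Died', 'Client-Aborted') {
        my $value = $response->header($name);
        return "$name: $value" if defined $value;
    }

    # Compare the advertised size with what landed on disk, if both are known.
    my $expected = $response->header('Content-Length');
    if (defined $expected && defined $file && -e $file) {
        my $actual = (-s $file) || 0;
        return "expected $expected bytes, got $actual"
            if $actual != $expected;
    }

    return;
}
```

Inside the loop it could be called as `my $problem = download_problem($response, $file);` followed by `warn "$url: $problem\n" if defined $problem;`, which would show whether the transfer is being aborted or the server is simply sending less data.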

    To get past the certificate issue I added code...

    $browser = LWP::UserAgent->new(
        ssl_opts => { verify_hostname => 0, SSL_verify_mode => SSL_VERIFY_NONE },
    );
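
For reference, `SSL_VERIFY_NONE` is a constant exported by `IO::Socket::SSL`, so that module needs to be loaded for the name to resolve under `use strict`. The sketch below shows the same constructor with the import in place, plus an alternative that leaves verification on. It assumes `LWP::UserAgent` and `IO::Socket::SSL` are installed; the variable names and the `client-ca.pem` path are placeholders, not details from the original program.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use IO::Socket::SSL;   # exports SSL_VERIFY_NONE and SSL_VERIFY_PEER

# Same intent as the snippet above: certificate checks are switched off.
my $insecure = LWP::UserAgent->new(
    ssl_opts => { verify_hostname => 0, SSL_verify_mode => SSL_VERIFY_NONE },
);

# Alternative that keeps verification on and trusts a specific CA bundle.
# 'client-ca.pem' is a placeholder path, not a file from the original post.
my $verified = LWP::UserAgent->new(
    ssl_opts => { verify_hostname => 1, SSL_ca_file => 'client-ca.pem' },
);
```

Turning verification off gets past the certificate error but also removes the check that the server is who it claims to be, so the second form may be preferable if the client's CA certificate can be obtained.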

    I also have browser credentials, etc., that work fine. So, any clue as to why I am no longer getting the entire page of XML data?

    And thanks for your time folks!
#2
    Please use the code tags when posting your code.

    Please see my answer to your cross-post at PerlGuru.
