#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    304
    Rep Power
    0

Issues with MozRepl in WWW::Mechanize::Firefox


    good evening dear friends here at devshed

    well - I have some trouble with a Perl script that turns out to be not 100% optimal. Now I am trying to find a better solution, either in Perl or Ruby - but if you have ideas for reworking the Perl script, I would be glad too.

    The question: Is there a way to specify the Net::Telnet timeout with WWW::Mechanize::Firefox?
    At the moment my internet connection [a quite fast DSL one] is very slow, and sometimes I get this error

    with $mech->get():

    PHP Code:
    command timed-out at /usr/local/share/perl/5.12.3/MozRepl/Client.pm line 186

    So I tried this one:

    PHP Code:
    $mech->repl->repl->timeout(100000);

    Unfortunately it does not work: Can't locate object method "timeout" via package "MozRepl"

    The documentation says this should work:

    PHP Code:
    $mech->repl->repl->setup_client( { extra_client_args => { timeout => +80 } } ); 
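    Here is a minimal sketch of how I read that documented call - assuming extra_client_args really is passed through to the underlying Net::Telnet client, with the timeout given in seconds:

    PHP Code:
    use WWW::Mechanize::Firefox;

    my $mech = WWW::Mechanize::Firefox->new();

    # extra_client_args should reach the Net::Telnet connection that
    # MozRepl uses under the hood; 5 * 60 = 300 seconds
    $mech->repl->repl->setup_client(
        { extra_client_args => { timeout => 5 * 60 } }
    );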
    The problem: I have a list of 2,500 websites and need to grab a thumbnail screenshot (!) of each of them. How do I do that?
    I could try to parse the sites with Perl - Mechanize would be a good thing for that.
    Note: I only need the results as thumbnails that are a maximum of 240 pixels in the long dimension.
    At the moment I have a solution which is slow and does not give back thumbnails.
    How do I make the script run faster with less overhead - spitting out the thumbnails?

    My prerequisites:
    the addon mozrepl
    the module WWW::Mechanize::Firefox
    the module Imager
    This is my source ... below is a snippet [example] of the sites I have in the URL list.

    urls.txt [the list of sources in a file]

    www.google.com
    www.cnn.com
    www.msnbc.com
    news.bbc.co.uk
    www.bing.com
    www.yahoo.com
    ... and so on and so forth


    What I have tried already; here it is:


    PHP Code:
    #!/usr/bin/perl

    use strict;
    use warnings;
    use WWW::Mechanize::Firefox;

    my $mech = WWW::Mechanize::Firefox->new();

    # read the URL list, one host per line
    open(INPUT, "<", "urls.txt") or die $!;

    while (<INPUT>) {
        chomp;
        print "$_\n";
        $mech->get($_);
        my $png  = $mech->content_as_png();
        my $name = $_;
        $name =~ s/^www\.//;       # strip the leading www.
        $name .= ".png";
        open(OUTPUT, ">", $name) or die $!;
        binmode OUTPUT;            # PNG is binary data
        print OUTPUT $png;
        close(OUTPUT);
        sleep(5);
    }
    close(INPUT);

    Well, this does not take care of the size:

    See the output commandline:

    PHP Code:
    linux-vi17:/home/martin/perl # perl mecha_test_1.pl
    www.google.com
    www.cnn.com
    www.msnbc.com
    command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
    linux-vi17:/home/martin/perl #
    Question: how do I extend the solution to make sure that it does not stop with a timeout? Note again: I only need the results as thumbnails that are a maximum of 240 pixels in the long dimension.
    As a prerequisite, I already have installed the module Imager.
    How do I make the script run faster with less overhead - spitting out the thumbnails?
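    Here is a minimal sketch of the scaling step as I picture it - assuming Imager's scale() with type => 'min' fits the screenshot into a 240x240 box while keeping the aspect ratio ($png and $name are the variables from my script above):

    PHP Code:
    use Imager;

    # ... inside the while loop, right after content_as_png():
    my $img = Imager->new();
    $img->read( data => $png, type => 'png' )
        or die $img->errstr;

    # type => 'min' picks the smaller scale factor, so the result
    # fits inside 240x240 and the long side ends up at 240 pixels
    my $thumb = $img->scale( xpixels => 240, ypixels => 240, type => 'min' );

    $thumb->write( file => $name, type => 'png' )
        or die $thumb->errstr;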


    i also tried out this one here:

    PHP Code:
    $mech->repl->repl->setup_client( { extra_client_args => { timeout => 5*60 } } ); 
    putting the links into @list and using eval:
    PHP Code:
    while (scalar(@list)) {
        my $link = pop(@list);
        print "trying $link\n";
        eval {
            $mech->get($link);
            sleep(5);
            my $png  = $mech->content_as_png();
            my $name = $link;          # was "$_", which is empty here - use $link
            $name =~ s/^www\.//;
            $name .= ".png";
            open(OUTPUT, ">", $name) or die $!;
            binmode OUTPUT;
            print OUTPUT $png;
            close(OUTPUT);
        };
        if ($@) {
            print "link: $link failed\n";
            push(@list, $link);        # put it back at the end of the list
            next;
        }
        print "$link is done!\n";
    }



    Question: is there a Ruby / Python / PHP solution that runs more efficiently - or can you suggest a Perl solution that is more stable?


    Looking forward to hearing from you

    Thx for any and all help in advance

    have a great day

    greetings
#2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    304
    Rep Power
    0
    hello - i have mozrepl running

    but i've gotten this error:

    PHP Code:
    martin@linux-wyee:~/perl> perl moz_test1.pl
    Can't locate WWW/Mechanize/Firefox.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.16.0/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.16.0 /usr/lib/perl5/vendor_perl/5.16.0/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.16.0 /usr/lib/perl5/5.16.0/i586-linux-thread-multi /usr/lib/perl5/5.16.0 /usr/lib/perl5/site_perl/5.16.0/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.16.0 /usr/lib/perl5/site_perl .) at moz_test1.pl line 4.
    BEGIN failed--compilation aborted at moz_test1.pl line 4.
    martin@linux-wyee:~/perl> ^C
    martin@linux-wyee:~/perl>
    hmm - dunno what's wrong here... trying to figure it out
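    Update: if I read the message right, it just means WWW::Mechanize::Firefox is not installed on this machine (it is nowhere in @INC). I guess the usual fix is installing it from CPAN, for example:

    PHP Code:
    # install the missing module from CPAN
    cpan WWW::Mechanize::Firefox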
#3
    !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,252
    Rep Power
    1810
    Actually, perl can be very fast when downloading HTML from a web server. It's often surprised me how I can receive and parse a response in a fraction of a second as compared to the speed with which my browser displays the same page.

    The difference, of course, is that when I request a page with perl I only get a simple text response: the base HTML.
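    For comparison, here is a minimal sketch of the kind of plain-text fetch I mean - ordinary WWW::Mechanize (not the ::Firefox variant), which pulls only the base HTML and nothing else:

    PHP Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.google.com');    # fetches just the raw HTML

    print length( $mech->content() ), " bytes of HTML\n";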

    When my browser requests the same page, it receives not only the HTML, but asks for all the associated CSS and javascript files as well. It requests all the linked images, and has to wait on every ad server to deliver its content. After that, it has to parse all of that data and then calculate and render a graphical display from all of the commands. Some of the data may have to be fetched by the executed javascript, with more server requests made. It's complicated, time-consuming, and memory-intensive.

    And all of that is happening in compiled C, not perl. The only way to speed that up on your end is to make sure you have enough memory so there is no page swapping. Opening more threads could make the memory situation worse and slow you down. A modern browser uses a lot of RAM.

    If the perl-Firefox solution works at all, you are doing well. What you are doing is not a pure perl solution so much as it is trying to automate Firefox.

    What you really want is an operating-system scripting language to automate the task of running multiple applications. Perl wasn't designed for it, though there are hacks to make some things work on some systems.

    If you were using a Mac for example, I would recommend using Automator or AppleScript to control Safari and the Grab utility. I have no idea of the Windows equivalent.

    Regardless, it won't be super fast. It will run at about the speed of a human manually navigating to different pages.
    Last edited by keath; October 26th, 2012 at 09:51 AM.
#4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    304
    Rep Power
    0
    hello dear Keath

    many thanks for this in-depth explanation. Always very refreshing to talk to you. You do a lot here -

    you show solutions and lead to ways that can be fruitful - even for newbies like me.

    I will try to get mozrepl (which is not running well at the moment) working
    on my OpenSuse machine.

    I run OpenSuse 12.2 -

    I will come back and report all the findings.

    greetings
    matze
