#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    332
    Rep Power
    0

    WWW::Mechanize: how to iterate (or loop) over x pages




    Good day, dear community,

    I am getting into Perl programming and want to learn something. I am currently working on a small solution. I have tried various tutorials (examples of Mechanize that I found on CPAN); not all of them work, and some of them are broken!

    Now I am trying to take on a real-world task!

    That is especially interesting for me as a PHP/Perl beginner.

    I have approximately 10000 pages to parse.



    See what I have found out about the logic behind this search page:
    http://katholisch.at/content/site/pfarrfinder/index.html

    The pages are organized like this:

    PHP Code:
    http://Www.address/5307.html
    http://Www.address/5308.html
    http://Www.address/5309.html 
    Approach: how to loop through a set of pages, that is the question. What I have so far:

    I want to use WWW::Mechanize, particularly for doing the form-based search and selecting the individual entries.

    Hmm, I guess the algorithm would basically be two nested loops: the outer loop runs the form-based search, the inner loop processes the search results.

    PHP Code:
    $mech->follow_link(
        url_regex => qr{webgrab_path=http://evs2000.*?Id=\d+$},
        n         => $result_nbr,   # follow the n-th matching link
    );
    Well, I guess that Perl is as suitable for this as Python is, isn't it?
  2. #2
  3. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,264
    Rep Power
    1810
    It's really just one loop, isn't it?

    pseudo code
    Code:
    foreach my $url (@urls) {
       my $response = $ua->post($url, \%data);
       my $result = parse_response($response);
    }
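
A minimal sketch of that single loop, assuming the pages can be fetched with a plain GET (the pseudo code above uses post) and that parse_response is replaced by whatever extraction is actually needed. The names process_pages, mech_fetcher and extract_title are made up for illustration; new, get, success and content are the WWW::Mechanize methods used. The fetcher is passed in as a code reference so the loop can be tried without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a code ref that fetches one URL with
# WWW::Mechanize and returns the HTML, or undef if the request failed.
sub mech_fetcher {
    require WWW::Mechanize;    # loaded only when a real fetcher is wanted
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    return sub {
        my ($url) = @_;
        $mech->get($url);
        return $mech->success ? $mech->content : undef;
    };
}

# Hypothetical stand-in for parse_response: pull out the <title> text.
sub extract_title {
    my ($html) = @_;
    return $html =~ m{<title>\s*(.*?)\s*</title>}is ? $1 : '';
}

# The single loop: fetch each URL, parse it, collect one record per page.
sub process_pages {
    my ( $urls, $fetch ) = @_;
    my @results;
    for my $url (@$urls) {
        my $html = $fetch->($url);
        next unless defined $html;    # skip pages that could not be fetched
        push @results, { url => $url, title => extract_title($html) };
    }
    return \@results;
}
```

With network access this would be called as `my $results = process_pages(\@urls, mech_fetcher());`. Any other code reference that maps a URL to HTML works as well.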
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    332
    Rep Power
    0
    Hello dear keath,

    many thanks for the reply, great to hear from you. I am happy.

    Thanks for the hint, you're right.

    Originally Posted by keath
    It's really just one loop, isn't it?

    pseudo code

    Code:
    foreach my $url (@urls) {
       my $response = $ua->post($url, \%data);
       my $result = parse_response($response);
    }

    Great, that looks like exactly the solution that is needed.

    With that I can arrange the Mechanize part.
    Note the goal: I have approximately 10000 pages to parse.

    They are organized like this:

    http://Www.address/5307.html
    http://Www.address/5308.html
    http://Www.address/5309.html

    Do you think that I should put the different URLs in here?


    Code:
    foreach my $url (@urls) {
       my $response = $ua->post($url, \%data);
       my $result = parse_response($response);
    }
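
One way to arrange the URLs, sketched under the assumption that the page numbers really are consecutive integers: build @urls from a numeric range. http://Www.address is the placeholder from above, not a real host, and the range 5307 to 5309 is only taken from the three examples; build_urls is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: one URL per page number in the range $first..$last.
sub build_urls {
    my ( $base, $first, $last ) = @_;
    my @list;
    push @list, "$base/$_.html" for $first .. $last;
    return @list;
}

# Placeholder base address and range taken from the examples above.
my @urls = build_urls( 'http://Www.address', 5307, 5309 );
print "$_\n" for @urls;
```

If the numbering has gaps, the missing pages simply need to be skipped when the request fails.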




    Note: the info in the initial posting refers to the task of parsing the
    results from this page: http://katholisch.at/content/site/pfarrfinder/index.html


    Hmm, I guess the algorithm needs only one loop: yours! Many thanks, keath.


    Now I will try to combine everything with Mechanize and get some output out of it.
    I think I will organize the output so that I end up with CSV data.
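
A rough sketch of the CSV step, assuming each parsed page ends up as a hash reference with url and title keys (both field names and the sample records are invented here). The quoting is done by hand to keep the example free of dependencies; for real data the Text::CSV module from CPAN is the more robust choice. csv_field and to_csv are hypothetical helper names.

```perl
use strict;
use warnings;

# Quote one field: wrap it in double quotes and double any embedded quotes
# when it contains a comma, a quote or a line break.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ( $value =~ /[",\r\n]/ ) {
        $value =~ s/"/""/g;
        $value = qq{"$value"};
    }
    return $value;
}

# Turn a list of records into CSV text with a header row.
sub to_csv {
    my ( $columns, $records ) = @_;
    my @lines = ( join ',', map { csv_field($_) } @$columns );
    for my $rec (@$records) {
        push @lines, join ',', map { csv_field( $rec->{$_} ) } @$columns;
    }
    return join( "\n", @lines ) . "\n";
}

# Made-up sample records, only to show the shape of the data.
my @records = (
    { url => 'http://Www.address/5307.html', title => 'Example, with comma' },
    { url => 'http://Www.address/5308.html', title => 'Plain title' },
);
print to_csv( [ 'url', 'title' ], \@records );
```

The column list is passed explicitly so the order of the CSV columns does not depend on hash ordering.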
    Last edited by metabo; October 3rd, 2012 at 01:18 PM.
