Perl Programming
 

Dev Shed Forums > Programming Languages > Perl Programming

#1  October 2nd, 2012, 02:55 PM
metabo
WWW::Mechanize: how to iterate (or loop) over x pages


Good day, dear community,

I am learning Perl programming and am currently working on a small solution. I have tried various tutorials and Mechanize examples that I found on CPAN. Not all of them work; some of them are broken!

Now I am trying a real-world task, which is especially interesting for me as a PHP/Perl beginner.

I have approximately 10000 pages to parse.



Here is what I have found out about the logic behind this search page:
http://katholisch.at/content/site/pfarrfinder/index.html

The result pages are organized like this:

Code:
http://Www.address/5307.html
http://Www.address/5308.html
http://Www.address/5309.html
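
Since the addresses above differ only in a numeric ID, the list of URLs could be generated instead of typed in. A minimal sketch, assuming the IDs are consecutive integers; the `build_urls` helper, the base address and the ID range are illustrative placeholders and not taken from the real site:

```perl
use strict;
use warnings;

# Build one URL per numeric ID. The base address and the ID range are
# placeholders (assumptions); the real site may well skip some IDs.
sub build_urls {
    my ($base, $first, $last) = @_;
    return map { "$base/$_.html" } $first .. $last;
}

my @urls = build_urls('http://Www.address', 5307, 5309);
print "$_\n" for @urls;
```

Generating the list up front keeps the later fetch loop simple: it only has to walk over `@urls`.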


Approach: how to loop through a set of pages is the question. Here is what I have so far:

I want to use WWW::Mechanize, particularly for doing the form-based search and selecting the individual entries.

I guess the algorithm would basically be two nested loops: the outer loop runs the form-based search, and the inner loop processes the search results.

Code:
# Follow the $result_nbr-th link whose URL matches the pattern
$mech->follow_link(url_regex => qr{webgrab_path=http://evs2000.*?Id=\d+$},
                   n         => $result_nbr);

Well, I guess that Perl is as suitable for this as Python is, isn't it?

#2  October 3rd, 2012, 11:35 AM
keath
It's really just one loop, isn't it?

pseudo code
Code:
foreach my $url (@urls) {
   my $response = $ua->post($url, \%data);
   my $result = parse_response($response);
}
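
The pseudo code above could be fleshed out roughly as follows. This is only a sketch under assumptions: `fetch_all` and `parse_response` are hypothetical names, the parser merely pulls out the page title as a stand-in for the real fields, and the user agent is passed in as a parameter so that any object with an `LWP::UserAgent`-style `post` method can be used.

```perl
use strict;
use warnings;

# Hypothetical parser: returns the page title, or undef if there is none.
# A real parser would extract whatever fields the pages actually contain.
sub parse_response {
    my ($response) = @_;
    my $html = $response->decoded_content() // '';
    my ($title) = $html =~ m{<title>\s*(.*?)\s*</title>}is;
    return $title;
}

# Post the same form data to every URL and collect the parsed results.
# $ua is any object with an LWP::UserAgent-style post() method.
sub fetch_all {
    my ($ua, $data, @urls) = @_;
    my %result_for;
    foreach my $url (@urls) {
        my $response = $ua->post($url, $data);
        unless ($response->is_success) {
            # Report the failure and carry on with the next page.
            warn "Skipping $url: ", $response->status_line, "\n";
            next;
        }
        $result_for{$url} = parse_response($response);
    }
    return \%result_for;
}
```

With the real modules this might be called as `fetch_all(LWP::UserAgent->new, \%data, @urls)`. Because WWW::Mechanize is a subclass of LWP::UserAgent, a `$mech` object should work in the same place.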

#3  October 3rd, 2012, 12:14 PM
metabo
Hello, dear keath,

Many thanks for the reply; it is great to hear from you. I am happy.

Thanks for the hint, you are right.

Quote:
Originally Posted by keath
It's really just one loop, isn't it?

pseudo code

Code:
foreach my $url (@urls) {
   my $response = $ua->post($url, \%data);
   my $result = parse_response($response);
}



Great, that looks like exactly the solution I need.

With that I can put the Mechanize part together.
Note the goal: I have approximately 10000 pages to parse.

They are organized like this:

http://Www.address/5307.html
http://Www.address/5308.html
http://Www.address/5309.html

Do you mean that I should put the different URLs into this loop?


Code:
foreach my $url (@urls) {
   my $response = $ua->post($url, \%data);
   my $result = parse_response($response);
}

Note: the information in my initial posting refers to the task of parsing the results of this page: http://katholisch.at/content/site/pfarrfinder/index.html

I guess the algorithm basically needs only one loop: yours! Many thanks, keath.

Now I will try to combine everything with Mechanize and get some output out of it.
I plan to organize the output so that I get CSV data.
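
For the CSV output, one possible sketch is shown here. It hand-rolls the quoting so that it needs no extra modules; for real data the Text::CSV module from CPAN is probably the safer choice. The `csv_line` helper and the sample record are illustrative assumptions, not fields taken from the actual pages:

```perl
use strict;
use warnings;

# Turn a list of fields into one CSV line. Fields containing a comma,
# a double quote or a line break are wrapped in double quotes, and
# embedded double quotes are doubled. Undefined fields become empty.
sub csv_line {
    my @fields = @_;
    for my $field (@fields) {
        $field = '' unless defined $field;
        if ($field =~ /[",\r\n]/) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
    }
    return join(',', @fields);
}

# Hypothetical record: URL, name, address.
print csv_line('http://Www.address/5307.html', 'St. Example', 'Main Street 1, Vienna'), "\n";
```

Writing one line per parsed page keeps memory use flat even for around 10000 pages, since nothing has to be held until the end.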

Last edited by metabo : October 3rd, 2012 at 12:18 PM.
