Hello Laurent, good evening,
many thanks for the quick reply. Great to hear from you.
Quote:
Originally Posted by Laurent_R
It is a useful module. The point is: what do you need, exactly?
Running the script gave back the following:
linux-wyee:/home/martin/perl # perl kath_test_1.pl
Loosdorf Ledochowskastraße 4 3382 Loosdorf Telefonnummer: 02754 6257 FAX-Nummer: 02754 6257-4
linux-wyee:/home/martin/perl #
The script below gives back only one result, for this page:
http://katholisch.at/content/site/pfarrfinder/address/4000.html
Quote:
Loosdorf
Ledochowskastraße 4
3382 Loosdorf
Telefonnummer: 02754 6257
FAX-Nummer: 02754 6257-4
See more results:
Quote:
Maria Neustift Neustift 28 4443 Maria Neustift Telefonnummer: 07250/204 FAX-Nummer: 07250/204-4 E-Mail: pfarre.marianeustift@dioezese-linz.at
Maria Puchheim Gmundner Straße 1b 4800 Attnang-Puchheim Telefonnummer: 07674/62334 FAX-Nummer: 07674/62334-4 E-Mail: pfarre.mariapuchheim@dioezese-linz.at
Maria Scharten Scharten 1 4612 Scharten Telefonnummer: 07272/5210
Maria Schmolln Maria Schmolln 2 5241 Maria Schmolln Telefonnummer: 07743/2209-12 FAX-Nummer: 07743/2209-17 E-Mail: pfarre.mariaschmolln@dioezese-linz.at
Mattighofen Römerstraße 12 5230 Mattighofen Telefonnummer: 07742/2273 0676/87765221 FAX-Nummer: 07742/2273-22 E-Mail: propsteipfarre.mattighofen@dioezese-linz.at
Mauerkirchen Pfarrhofstraße 4 5270 Mauerkirchen Telefonnummer: 07724/2262
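Since each result arrives as one flat line, it may help to split it into fields before writing CSV. The following is only a minimal sketch: `parse_record` is a hypothetical helper, and it assumes that the labels "Telefonnummer:", "FAX-Nummer:" and "E-Mail:" always appear in that order when present. Everything before the first label is kept together as one address field, because the name, street and postal code cannot be separated reliably from the flat text.

```perl
use strict;
use warnings;

# Hypothetical helper: split one flattened result line into labelled fields.
# Assumes the labels appear in the order Telefonnummer, FAX-Nummer, E-Mail.
sub parse_record {
    my ($line) = @_;
    my %rec;
    # Everything before the first label is the name-and-address part.
    ($rec{address}) = $line =~ /^(.*?)\s*(?=Telefonnummer:|FAX-Nummer:|E-Mail:|$)/;
    # Each labelled value runs up to the next label or the end of the line.
    ($rec{phone})   = $line =~ /Telefonnummer:\s*(.*?)\s*(?=FAX-Nummer:|E-Mail:|$)/;
    ($rec{fax})     = $line =~ /FAX-Nummer:\s*(.*?)\s*(?=E-Mail:|$)/;
    ($rec{email})   = $line =~ /E-Mail:\s*(\S+)/;
    # Missing fields become empty strings so every row has the same columns.
    $rec{$_} //= '' for qw(address phone fax email);
    return \%rec;
}

my $rec = parse_record(
    'Maria Neustift Neustift 28 4443 Maria Neustift Telefonnummer: 07250/204 '
  . 'FAX-Nummer: 07250/204-4 E-Mail: pfarre.marianeustift@dioezese-linz.at'
);
print "$_: $rec->{$_}\n" for qw(address phone fax email);
```

Records without a fax number or e-mail address, such as the Maria Scharten line, simply get empty strings in those columns.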
It does count up, which is great!
What I want is to make the script run from page 00000 to page 10000,
with the results stored in CSV format.
Therefore I made these changes:
I set $max_page_num to the highest page number and $page to the starting number.
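As a rough sketch of that page range, the URLs could be generated up front. `page_urls` is a hypothetical helper, and the code assumes the site uses plain, unpadded page numbers (as in 4000.html); if the numbers really are zero-padded, `sprintf('%05d', $_)` could be used in place of `$_`.

```perl
use strict;
use warnings;

# Hypothetical helper: build the page URLs for an inclusive range of numbers.
sub page_urls {
    my ($base, $first, $last, $suffix) = @_;
    return map { $base . $_ . $suffix } $first .. $last;
}

my @urls = page_urls(
    'http://katholisch.at/content/site/pfarrfinder/address/', 0, 10000, '.html'
);
printf "%d URLs, first %s, last %s\n", scalar @urls, $urls[0], $urls[-1];
```

Iterating over the range itself, rather than over 0..$max_page_num with a separate counter, makes the loop stop at the last page number.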
At the moment this only prints the data to stdout (the console).
Now I am trying to modify it so that it writes CSV values instead; I am sure that can be done... :-)
Usually this is done with Text::CSV_XS (which Class::CSV is based on).
A friend also suggested using Text::CSV, which will load Text::CSV_XS when it is installed.
I just installed Text::CSV_XS; I took it from here: http://search.cpan.org/~hmbrand/Text-CSV_XS-0.91/CSV_XS.pm
Now I am trying to figure out which attributes I should use.
Laurent, what do you suggest?
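For what it is worth, a small sketch of attributes that seem relevant here; the values are assumptions, not a recommendation from the module's author. `binary => 1` allows non-ASCII characters such as ß in fields, `sep_char` sets the separator (a semicolon here, purely as an example), and `always_quote => 1` quotes every field.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Assumed attribute choices for this data; adjust as needed.
my $csv = Text::CSV_XS->new({
    binary       => 1,    # allow non-ASCII characters in fields
    sep_char     => ';',  # field separator (the default is a comma)
    always_quote => 1,    # quote every field, not only those that need it
}) or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag();

# combine() builds one CSV line in memory; string() returns it.
$csv->combine('Maria Scharten', 'Scharten 1', '4612 Scharten', '07272/5210')
    or die "combine failed";
my $line = $csv->string;
print "$line\n";
```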
PHP Code:
#!/usr/bin/perl
## This is how I would go about doing what I understand you are trying to do
## EXAMPLE only
use 5.014;
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TokeParser;
binmode STDOUT, ':encoding(UTF-8)'; # print umlauts and ß correctly
my $target_url = 'http://katholisch.at/content/site/pfarrfinder/address/'; # base URL
my $page = 4000; # first page number
my $format = '.html'; # URL suffix
my $max_page_num = 4100; # last page number
# create the user agent once; autocheck => 0 so a missing page does not abort the run
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->agent_alias('Windows Mozilla');
# loop through the pages from $page to $max_page_num (inclusive)
for my $num ($page .. $max_page_num) {
    # build the URL from base, page number and suffix
    my $url = $target_url . $num . $format;
    # fetch the page and skip it if the request failed
    $mech->get($url);
    next unless $mech->success;
    # get the decoded page content
    my $page_content = $mech->content();
    # parse the HTML
    my $html = HTML::TokeParser->new(\$page_content);
    # print the text that follows each <strong> tag, up to the next <script> tag
    while (my $tag = $html->get_tag('strong')) {
        my $text = $html->get_trimmed_text('script');
        say $text;
    }
}
1;
And now my question: how should I add the CSV part to the code above?
Any and all help is greatly appreciated,
greetings, martin
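One possible shape for the CSV part, as a minimal sketch rather than a finished solution: open an output file once, create one Text::CSV_XS object, and call its `print` method once per row. The file name `pfarren.csv`, the two-column layout (page number, text) and the sample row are assumptions for illustration only.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $out_file = 'pfarren.csv';   # hypothetical output file name

# binary => 1 allows non-ASCII text, eol adds a newline after each row,
# auto_diag reports CSV errors automatically.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" })
    or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag();

open my $fh, '>:encoding(UTF-8)', $out_file
    or die "Cannot open $out_file: $!";

# Header row, then one row per scraped result (sample data here).
$csv->print($fh, [ 'page', 'text' ]);
my @rows = (
    [ 4000, 'Loosdorf 3382 Loosdorf Telefonnummer: 02754 6257 FAX-Nummer: 02754 6257-4' ],
);
$csv->print($fh, $_) for @rows;

close $fh or die "Cannot close $out_file: $!";
```

In the scraping loop, the `say $text;` line would then become a `$csv->print($fh, [...])` call with the page number and `$text`, with the `open` before the loop and the `close` after it.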
