#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2004
    Posts
    252
    Rep Power
    0

    From text to csv with Text::CSV_XS


    Hello dear Perl fans,


    First of all, many thanks for the help. You have helped me a lot so far.


    I ran the script below, and it gave back the following:

    linux-wyee:/home/martin/perl # perl kath_test_1.pl
    [see below]

    Loosdorf Ledochowskastraße 4 3382 Loosdorf Telefonnummer: 02754 6257 FAX-Nummer: 02754 6257-4

    linux-wyee:/home/martin/perl #
    The script below gives back results like this one:
    Loosdorf
    Ledochowskastraße
    3382 Loostown
    Telefonnummer: 0002754 6257
    FAX-Nummer: 0002754 6257-4

    See more results:
    Marias Neustift Neustift 28 4443 Maria Neustift Telefonnummer: 007250/204 FAX-Nummer: 07250/204-4 E-Mail: prre.marianeustift@dioezese-linz.at
    Marias Puchheim Gmundner Straße 1b 4800 Attnang-Puchheim Telefonnummer: 007674/62334 FAX-Nummer: 07674/62334-4 E-Mail: prre.mariapuchheim@dioezese-linz.at
    Marias Scharten Scharten 1 4612 Scharten Telefonnummer: 007272/5210
    Marias Schmolln Maria Schmolln 2 5241 Maria Schmolln Telefonnummer: 007743/2209-12 FAX-Nummer: 07743/2209-17 E-Mail: prre.mariaschmolln@dioezese-linz.at
    Mattighofen Römerstraße 12 5230 Mattighofen Telefonnummer: 007742/2273 0676/87765221 FAX-Nummer: 07742/2273-22 E-Mail: peipfarre.mattighofen@dioezese-linz.at
    Mauerkirchens Pfarrhofstraße 4 5270 Mauerkirchen Telefonnummer: 007724/2262
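    Each result comes out as one long line that mixes the name and address with labelled fields. To store such lines as CSV, one option is to cut them at the labels that appear in the sample output. A minimal sketch, assuming the labels are always spelled "Telefonnummer:", "FAX-Nummer:" and "E-Mail:" and appear in that order; split_result_line is a hypothetical helper name:

```perl
use strict;
use warnings;

# Split one scraped result line into four columns:
# (name/address, phone, fax, e-mail).
# ASSUMPTION: the labels are spelled exactly "Telefonnummer:", "FAX-Nummer:"
# and "E-Mail:" and appear in that order, as in the sample output;
# missing parts come back as empty strings.
sub split_result_line {
    my ($line) = @_;
    my ( $rest, $phone, $fax, $email ) = ( $line, '', '', '' );

    # peel the labelled parts off the end of the line, right to left
    if ( my @m = $rest =~ /^(.*?)\s*E-Mail:\s*(\S+)\s*$/ )        { ( $rest, $email ) = @m }
    if ( my @m = $rest =~ /^(.*?)\s*FAX-Nummer:\s*(.*?)\s*$/ )    { ( $rest, $fax )   = @m }
    if ( my @m = $rest =~ /^(.*?)\s*Telefonnummer:\s*(.*?)\s*$/ ) { ( $rest, $phone ) = @m }

    return ( $rest, $phone, $fax, $email );
}

# usage: my @columns = split_result_line($text);
```

    Each returned list can then be written out as one CSV row. Lines whose labels are spelled differently would simply keep everything in the first column.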


    It does count up, which is great!

    1. What I want is to make the script run from 00000 to 10000.
    Note: the results should be stored in CSV format.

    For point 1, I made the following changes: I set $max_page_num to the maximum page number and $page to the starting number. So far this still only prints the data to stdout (the console).
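    Note that in the posted script, for (0..$max_page_num) always runs $max_page_num + 1 times no matter what start value $page has; with $page = 4000 and $max_page_num = 4100 it would request pages 4000 through 8100. A sketch that iterates over exactly the wanted range follows; page_urls is a hypothetical helper, and the optional zero padding (00000, 00001 and so on) is only an assumption about how the site names its pages:

```perl
use strict;
use warnings;

# Build the URLs for page numbers $first .. $last (both inclusive).
# $pad is an ASSUMPTION: set it to 5 only if the site really uses
# zero-padded names such as 00042.html; 0 or undef means "no padding".
sub page_urls {
    my ( $base, $first, $last, $ext, $pad ) = @_;
    $pad //= 0;
    return map { $base . ( $pad ? sprintf( '%0*d', $pad, $_ ) : $_ ) . $ext }
        $first .. $last;
}

# usage with the values from kath_test_1.pl:
# for my $url ( page_urls( 'http://katholisch.at/content/site/pfarrfinder/address/', 4000, 4100, '.html' ) ) {
#     ...fetch and parse $url...
# }
```

    Looping over the range itself (instead of counting from 0 and incrementing a separate variable) means the start and end numbers can be changed independently.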


    Now I am trying to modify it. :-)

    Now I have to turn the output into CSV values.

    Usually this can be done with Text::CSV_XS (on which Class::CSV is based).
    Note: a friend also suggested using Text::CSV, which will load Text::CSV_XS or,

    At the moment, all the results are only printed to stdout (the console), but I'm sure I can modify that. :-)

    I just installed Text::CSV_XS,
    which I took from here: http://search.cpan.org/~hmbrand/Text-CSV_XS-0.91/CSV_XS.pm


    Now I am trying to figure out which attributes to use.
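    A minimal sketch of the Text::CSV_XS attributes that usually matter for this job: binary => 1 (allows non-ASCII characters such as ß and ö in fields), eol => "\n" (print() ends every record with a newline) and auto_diag => 1 (reports errors automatically). sep_char can also be set if a separator other than the comma is wanted. write_csv is a hypothetical helper name, and each row is assumed to be an array reference of already-split fields:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Write a list of rows (array references) to $file as CSV.
#   binary    => 1     allow bytes/characters outside plain ASCII (e.g. German umlauts)
#   eol       => "\n"  print() terminates every record with a newline
#   auto_diag => 1     warn automatically if a record cannot be combined
sub write_csv {
    my ( $file, @rows ) = @_;
    my $csv = Text::CSV_XS->new( { binary => 1, eol => "\n", auto_diag => 1 } )
        or die 'Cannot create CSV object: ' . Text::CSV_XS->error_diag;
    open my $fh, '>', $file or die "Cannot open $file: $!";
    $csv->print( $fh, $_ ) for @rows;    # fields are quoted only when needed
    close $fh or die "Cannot close $file: $!";
    return scalar @rows;                 # number of records written
}

# usage: write_csv( 'pfarren.csv', [ 'Loosdorf', '3382', '02754 6257' ] );
```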


    What do you suggest?
    How can I make the script give back CSV?


    Here is the script, so far without any CSV modules or CSV handling:


    kath_test_1.pl
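      #!/usr/bin/perl

      ## This is how I would go about doing what I understand you are trying to do.
      ## EXAMPLE only

      use 5.014;    # enables the 'say' feature (and strict)
      use strict;
      use warnings;

      use WWW::Mechanize;
      use HTML::TokeParser;

      my $target_url   = 'http://katholisch.at/content/site/pfarrfinder/address/';    # base url
      my $page         = 4000;       # page start number
      my $format       = '.html';    # ending format
      my $max_page_num = 4100;       # 2300 max page number

      # create the mech object once and reuse it for every page;
      # autocheck => 0 so that one missing page does not abort the whole run
      my $mech = WWW::Mechanize->new( autocheck => 0 );
      $mech->agent_alias('Windows Mozilla');

      # loop through the pages from $page up to and including $max_page_num
      for my $page_num ( $page .. $max_page_num ) {

          # this combines to make the url
          my $url = $target_url . $page_num . $format;

          # get the page; skip it if the request failed
          $mech->get($url);
          unless ( $mech->success ) {
              warn "Skipping $url (HTTP status ", $mech->status, ")\n";
              next;
          }

          # get the page content
          my $page_content = $mech->content();

          # filter the html
          my $html = HTML::TokeParser->new( \$page_content );

          # search and match: print the trimmed text after each <strong> tag,
          # up to the next <script> tag
          while ( my $tag = $html->get_tag('strong') ) {
              my $text = $html->get_trimmed_text('script');
              say $text;
          }
      }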
      
      
      
    One question:

    How do I combine the Mechanize script with the one that takes care of the
    text-to-CSV transformation?
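    To combine the two parts, one option is to collect the texts of one page into a list instead of printing them, and then write that list as one CSV row. The sketch below reuses the get_tag('strong') / get_trimmed_text('script') calls from the script; page_to_row and row_to_csv_line are hypothetical helper names, and "one page = one CSV record" is an assumption about the desired layout:

```perl
use strict;
use warnings;
use HTML::TokeParser;
use Text::CSV_XS;

# Same extraction as kath_test_1.pl, but instead of say()-ing each text,
# collect all texts of one page so they can become ONE CSV row.
sub page_to_row {
    my ($page_content) = @_;
    my $html = HTML::TokeParser->new( \$page_content );
    my @row;
    while ( $html->get_tag('strong') ) {
        push @row, $html->get_trimmed_text('script');
    }
    return @row;
}

# Turn a list of fields into one CSV line (no trailing newline).
sub row_to_csv_line {
    my (@fields) = @_;
    my $csv = Text::CSV_XS->new( { binary => 1 } )
        or die 'Cannot create CSV object: ' . Text::CSV_XS->error_diag;
    $csv->combine(@fields) or die 'combine failed: ' . $csv->error_diag;
    return $csv->string;
}
```

    In the page loop, the output file would be opened once before the loop, a Text::CSV_XS object created there with eol => "\n", and the while/say block replaced by something like $csv->print( $out, [ page_to_row($page_content) ] );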
    Last edited by metabo; October 5th, 2012 at 05:27 AM.
    #2
    By the way, there also has to be some sanitizing, namely ISO-8859 sanitizing:

      use Text::CSV::Encoded;

      my $csv = Text::CSV::Encoded->new ({
          encoding_in  => "iso-8859-1", # the encoding that comes into Perl
          encoding_out => "cp1252",     # the encoding that goes out of Perl
      });
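    Garbled characters (a replacement character where "ß" should be) are typically what a UTF-8 terminal shows when it receives a single ISO-8859-1 byte, such as 0xDF for "ß". As an alternative to Text::CSV::Encoded (a swapped-in technique, not part of that module), the core Encode module can do the conversion explicitly; recode is a hypothetical helper name:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Re-encode a byte string from one character set to another.
#   decode(): bytes in $from  -> Perl characters
#   encode(): Perl characters -> bytes in $to
sub recode {
    my ( $bytes, $from, $to ) = @_;
    return encode( $to, decode( $from, $bytes ) );
}

# usage: the ISO-8859-1 byte \xDF ("ß") becomes the two UTF-8 bytes \xC3\x9F,
# which a UTF-8 terminal can display:
# print recode( "Ledochowskastra\xDFe", 'iso-8859-1', 'UTF-8' ), "\n";
```

    If the text comes from WWW::Mechanize's content(), it is usually already decoded into Perl characters; in that case only the encoding step (or an output layer such as binmode STDOUT, ':encoding(UTF-8)') should be needed.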
