Dev Shed Forums - Perl Programming
http://forums.devshed.com/
Perl Programming forum discussing coding in Perl, utilizing Perl modules, and other Perl-related topics. Perl, the Practical Extraction and Reporting Language, is the choice of many for parsing textual information.
Mon, 19 Feb 2018 07:40:55 GMT

Using printf
http://forums.devshed.com/perl-programming/980023-using-printf-new-post.html
Sun, 11 Feb 2018 15:16:27 GMT

I have an echo command inside a while loop that gives me the terminal output I want, except that the width of part of the output changes (for example "This is 100% and this is 79%"). When a value drops from 100% to two digits, the string gets one character shorter, and when it rises from two digits to 100%, it gets one character longer, so the overall string length keeps changing.
My sample rate is 1 second, which makes the output hard to follow as the string expands and contracts on the command line.
I am sure I can stabilize this string with printf, but even though I have read and tried several things, I am not getting where I need to be.
My echo code is
Code:

echo -en "\rCPU: $DIFF_USAGE% CPU AVERAGE: $AVG%  \b\b"
I really need to format the "$DIFF_USAGE%" and "$AVG%" output to a fixed width of 4 characters, right-justified.
I hope someone can help me out.
99% of what I have tried results in printf syntax errors.
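A minimal Perl sketch of the fixed-width formatting. A literal percent sign in a printf/sprintf format has to be written as %%, which is a common cause of format errors. %3d right-justifies a number in 3 columns, so "%3d%%" always occupies 4 characters. The helper name status_line is hypothetical, and the sketch assumes both values are integers (%d truncates fractions; %3.0f would round instead).

```perl
use strict;
use warnings;

# Hypothetical helper: build the status line with fixed-width fields.
# %3d right-justifies the number in 3 columns and %% prints a literal
# percent sign, so every "NNN%" field is exactly 4 characters wide.
sub status_line {
    my ($diff_usage, $avg) = @_;
    return sprintf "\rCPU: %3d%% CPU AVERAGE: %3d%%", $diff_usage, $avg;
}

$| = 1;   # flush output immediately so \r updates appear at once

# These two lines have the same length despite the 1- and 3-digit values.
print status_line(100, 79), "\n";
print status_line(5, 100), "\n";
```

The shell's printf builtin uses the same conversions, so printf '\rCPU: %3d%% CPU AVERAGE: %3d%%' "$DIFF_USAGE" "$AVG" should behave the same way as long as both variables hold integers.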
Posted in Perl Programming by additude: http://forums.devshed.com/perl-programming-6/using-printf-980023.html

LWP or Mechanize: which is the best method to use in a little Perl script?
http://forums.devshed.com/perl-programming/979915-lwp-mechanize-method-little-perl-script-new-post.html
Sun, 28 Jan 2018 12:15:29 GMT

Dear Perl experts,

I'm trying to write a very simple spider for web crawling. It uses a hash (%visited) to check whether $url has been seen before, and it fetches URLs. But I have to tinker with it a bit: I want to fetch the content of the pages, and that needs some tailoring.

Finally, I want to store everything in a file, or even better in CSV format.
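For the CSV part, the CPAN module Text::CSV is the usual robust choice. As a dependency-free alternative, here is a minimal sketch of RFC 4180-style quoting using only core Perl: fields containing a comma, double quote, or line break are wrapped in double quotes, and embedded quotes are doubled. The function names csv_field and csv_line and the example columns (URL, title, content length) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one CSV field. Fields containing a comma,
# double quote, CR or LF are wrapped in double quotes, and embedded
# double quotes are doubled.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        $value =~ s/"/""/g;
        return qq{"$value"};
    }
    return $value;
}

# Hypothetical helper: join fields into one CSV line (no trailing newline).
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

# Hypothetical example row: URL, page title, length of the fetched content
print csv_line('http://example.com/page', 'Volunteering, "EVS"', 1234), "\n";
```

Each fetched page could then be appended to the output file as one csv_line(...) row.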

Here is my crawler code:

Code:

#!C:\Perl\bin\perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;

# Append the links found to links.txt; stop if the file can't be opened
open my $file1, '>>', 'links.txt' or die "Cannot open links.txt: $!";
select($file1);   # plain print now writes to $file1 by default

my @urls = ('http://europa.eu/youth/volunteering/evs-organisation#open');
my %visited;  # The % sigil indicates it's a hash
my $browser = LWP::UserAgent->new();
$browser->timeout(5);

while (@urls) {
  my $url = shift @urls;

  # Skip this URL and go on to the next one if we've
  # seen it before
  next if $visited{$url};

  # Mark it as visited now, so a URL that fails is not retried
  $visited{$url} = 1;

  my $request = HTTP::Request->new(GET => $url);
  my $response = $browser->request($request);

  # No real need to invoke printf if we're not doing
  # any formatting. Skip pages that could not be fetched.
  if ($response->is_error()) {
    print $response->status_line, "\n";
    next;
  }

  # decoded_content() undoes gzip and charset encoding; content() does not
  my $contents = $response->decoded_content() // '';

  # Passing $url as the base makes every extracted link absolute
  my $page_parser = HTML::LinkExtor->new(undef, $url);
  $page_parser->parse($contents)->eof;
  my @links = $page_parser->links;

  foreach my $link (@links) {
    # Each link is [tag, attr => url, ...], so $$link[2] is the first URL
    print "$$link[2]\n";
    push @urls, $$link[2];
  }
  sleep 60;   # pause 60 seconds after each fetched page
}

The results look like this:

Code:

http://www.cems.org/about/mission
http://www.cems.org/about-cems/overview/key-facts-figures
http://www.cems.org/about/alumni-profiles
http://www.cems.org/about/global

and so forth. That is not what I want. I want to fetch the content of the pages, starting with

europa.eu/youth/volunteering/evs-organisation#open

and, once the data of the first page has been fetched, move on to the second page, and so forth.

In other words: the content of $url is already in $contents. I also want the content of the pages that the links point to, so I have to fetch those the same way I fetched $url: once I have their URLs (the links), I need to fetch their content. I would like to do this with the current approach, and I will try to achieve that.
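One way to structure "fetch the content of every page the links point to" is a breadth-first loop that keeps each page's content instead of only printing its links. To keep this sketch testable without network access, fetching and link extraction are passed in as code refs; crawl, $fetch, $extract and the in-memory %site are hypothetical. It also strips #fragments before the %visited check, because "...evs-organisation#open" and "...evs-organisation" are the same page but different hash keys.

```perl
use strict;
use warnings;

# Breadth-first crawl that keeps the content of every page it reaches.
#   $fetch   - code ref: URL -> page content, or undef if the fetch failed
#   $extract - code ref: (content, URL) -> list of absolute link URLs
#   $max     - stop after this many pages have been fetched
# Returns a list of [url, content] pairs in the order they were fetched.
sub crawl {
    my ($start, $fetch, $extract, $max) = @_;
    my @queue = ($start);
    my (%visited, @pages);

    while (@queue && @pages < $max) {
        my $url = shift @queue;
        $url =~ s/#.*\z//s;          # "page#open" and "page" are the same page
        next if $visited{$url}++;    # already seen: skip it

        my $content = $fetch->($url);
        next unless defined $content;

        push @pages, [ $url, $content ];
        push @queue, $extract->($content, $url);
    }
    return @pages;
}

# Hypothetical in-memory "site" standing in for real HTTP requests.
my %site = (
    'http://x/a' => 'A links to <http://x/b> and <http://x/c#top>',
    'http://x/b' => 'B links back to <http://x/a>',
    'http://x/c' => 'C has no links',
);
my $fetch   = sub { $site{ $_[0] } };
my $extract = sub { $_[0] =~ /<([^>]+)>/g };    # toy link extractor

for my $page (crawl('http://x/a#open', $fetch, $extract, 10)) {
    print "$page->[0]: $page->[1]\n";
}
```

In the real script, $fetch could wrap the existing user agent, for example sub { my $r = $browser->get($_[0]); $r->is_success ? $r->decoded_content : undef }, and $extract could run HTML::LinkExtor with $_[1] as the base URL. Each [url, content] pair could then be written out as one CSV row.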


One last note: surely this can also be done with WWW::Mechanize, which can "follow" the links instead. I have had a closer look at its link methods:

Quote:

Follows a specified link on the page. You specify the match to be found using the same parms that find_link() uses.

Here some examples:

3rd link called "download"

$mech->follow_link( text => 'download', n => 3 );

first link where the URL has "download" in it, regardless of case:

$mech->follow_link( url_regex => qr/download/i );

or

$mech->follow_link( url_regex => qr/(?i:download)/ );

3rd link on the page

$mech->follow_link( n => 3 );

the link with the url

$mech->follow_link( url => '/other/page' );

or

$mech->follow_link( url => 'http://example.com/page' );

Returns the result of the GET method (an HTTP::Response object) if a link was found. If the page has no links, or the specified link couldn't be found, returns undef.
$mech->find_link( ... )

Finds a link in the currently fetched page. It returns a WWW::Mechanize::Link object which describes the link. (You'll probably be most interested in the url() property.) If it fails to find a link it returns undef.

You can take the URL part and pass it to the get() method. If that's your plan, you might as well use the follow_link() method directly, since it does the get() for you automatically.

Note that <FRAME SRC="..."> tags are parsed out of the HTML and treated as links so this method works with them.

You can select which link to find by passing in one or more of these key/value pairs:

text => 'string', and text_regex => qr/regex/,

text matches the text of the link against string, which must be an exact match. To select a link with text that is exactly "download", use

$mech->find_link( text => 'download' );

text_regex matches the text of the link against regex. To select a link with text that has "download" anywhere in it, regardless of case, use
and so forth ....
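The quoted docs point out that you can take a link's URL and pass it to get(). A minimal sketch of doing that for every link on a start page with WWW::Mechanize, assuming the module is installed; it uses links() and the url_abs() method of WWW::Mechanize::Link. The function name fetch_linked_pages, the 10-page cap and the 1-second delay are hypothetical choices, and the module is loaded with require so the file still compiles without it.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch a start page with WWW::Mechanize, then fetch
# the content of each http(s) link found on it (at most $max pages,
# pausing $delay seconds between requests).
# Returns a hash ref mapping absolute URL => page content.
sub fetch_linked_pages {
    my ($start, $max, $delay) = @_;
    require WWW::Mechanize;    # loaded only when the function is called
    my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 5);

    $mech->get($start);
    return {} unless $mech->success;

    # url_abs() resolves relative hrefs against the page; the #fragment
    # is stripped so "page#x" and "page" count as one URL
    my @urls = grep { m{^https?://}i }
               map  { my $u = $_->url_abs->as_string; $u =~ s/#.*\z//s; $u }
               $mech->links;

    my (%seen, %content);
    for my $url (@urls) {
        last if scalar(keys %content) >= $max;
        next if $seen{$url}++;
        sleep $delay if $delay;
        $mech->get($url);
        $content{$url} = $mech->content if $mech->success;
    }
    return \%content;
}

# Runs only when a start URL is given on the command line, e.g.
#   perl crawl_sketch.pl http://europa.eu/youth/volunteering/evs-organisation
if (@ARGV) {
    my $pages = fetch_linked_pages($ARGV[0], 10, 1);
    printf "%s: %d characters\n", $_, length $pages->{$_} for sort keys %$pages;
}
```

Compared with the LWP::UserAgent plus HTML::LinkExtor version, Mechanize handles link extraction and base-URL resolution itself, at the cost of an extra CPAN dependency.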
Posted in Perl Programming by gibraltar: http://forums.devshed.com/perl-programming-6/lwp-mechanize-method-little-perl-script-979915.html