Perl Programming
 
  #1  
February 27th, 2001, 03:32 PM
tron
Does anyone know how I can use Perl to get the HTML source for a given URL?

For example, I want to get the HTML, line for line, from, say, http://www.yahoo.com and print it to the screen.

  #2  
February 27th, 2001, 04:18 PM
mickalo
Check out the LWP module; it works great for this type of application. Once you have fetched the data from the remote web site, it's then a matter of extracting the data or strings you want from the HTML with a regex.
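
As a rough illustration of the extraction step, here is a minimal sketch that pulls the page title and the link targets out of an HTML string. The sample HTML and its example.com URLs are made up for the example; in practice `$html` would hold whatever the fetch returned.

```perl
use strict;
use warnings;

# Hypothetical HTML; in practice this would be the string returned by the fetch.
my $html = <<'HTML';
<html>
<head><title>Example Page</title></head>
<body>
<a href="http://www.example.com/one">One</a>
<a href="http://www.example.com/two">Two</a>
</body>
</html>
HTML

# Capture the text between <title> and </title>.
# /i ignores case, /s lets . match newlines.
my ($title) = $html =~ m{<title>(.*?)</title>}is;

# Collect every double-quoted href value.
# With /g in list context the match returns all captures.
my @links = $html =~ m{<a\s+[^>]*href="([^"]*)"}gi;

print "Title: $title\n";
print "Link:  $_\n" for @links;
```

Regexes like these are fine for quick jobs on pages whose layout you know, but they break easily on unusual markup; a real parser such as HTML::Parser from CPAN is the sturdier choice.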

Cheers,

Mickalo
  #3  
February 27th, 2001, 08:30 PM
tron
Yeah, I had luck doing it with LWP, but the main box I am using does not have the module, and I don't have root permission.

Any other ideas?

  #4  
February 28th, 2001, 12:58 AM
JonLed
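Download the module from cpan.org (you can search for it at http://search.cpan.org) and just upload it to your directory. Then in your script, add this:
Code:
BEGIN { push(@INC, '.'); }   # BEGIN runs this at compile time, before the use below is resolved
use LWP::Simple;

Make sure to keep the LWP directory structure intact.

Also notice that the push() comes before the use/require. Since use is resolved at compile time, the push() has to sit inside a BEGIN block to take effect in time.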

  #5  
February 28th, 2001, 11:47 AM
tron
Err, thank you guys for the help. I had tried using push(@INC, '.'), but I get an error listing what @INC contains, and my directory is not in it. I even tried typing in my complete directory path. Any ideas?

  #6  
February 28th, 2001, 11:53 AM
mickalo
Try using:
Code:
BEGIN {
    unshift(@INC, "/path/to/folder");   # prepend your module directory at compile time
}

use LWP::Simple;


Then put the module in the path specified above.
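
The BEGIN block matters because `use` is resolved at compile time, so a plain `push` or `unshift` on @INC runs too late. The core `lib` pragma is shorthand for the same thing. Here is a minimal sketch, assuming the uploaded module tree lives in a `lib` directory next to the script; that directory name is only an example, and FindBin is used so the path does not depend on where the script is started from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# FindBin (a core module) sets $Bin to the directory containing this script.
use FindBin qw($Bin);

# `use lib` prepends the directory to @INC at compile time, much like
# BEGIN { unshift(@INC, ...) }. "$Bin/lib" is a hypothetical location
# for the uploaded LWP directory tree.
use lib "$Bin/lib";

# A later `use LWP::Simple;` would now also look for $Bin/lib/LWP/Simple.pm.
print "Module search path:\n";
print "  $_\n" for @INC;
```

Printing @INC this way is also a quick check that the directory you meant to add is really the one Perl will search.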

Mickalo

  #7  
March 1st, 2001, 01:14 PM
dsb
You shouldn't need root to get the source for the page. LWP::UserAgent, together with HTTP::Request and HTTP::Response, will get the job done.

Pretty much what happens here is that you write an HTTP request that makes the remote server think you are a browser making a valid GET request, so it returns the source. Take a look:

Code:

#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI::Heuristic;   # only needed if you uncomment the uf_urlstr() line below

my $raw_url = "http://www.oreilly.com";   # URL of site you want source for
#my $url     = URI::Heuristic::uf_urlstr($raw_url);   # expands partial URLs
                                                      # only needed for urls
                                                      # like 'www.oreilly.com'
                                                      # not needed here

my $ua = LWP::UserAgent->new();   # creates a virtual browser
$ua->agent("Mozilla/4.0");        # the User-Agent string sent to the server

my $req = HTTP::Request->new( GET => $raw_url );   # creates GET request but does not send
$req->referer("http://hi.there.com");  # obviously not a real URL - part of the HTTP headers

my $res = $ua->request($req);   # request is sent; $res is your response object

# check for errors before using the body
die "GET $raw_url failed: ", $res->status_line(), "\n" if $res->is_error();

my $src = $res->content();   # gets the source and assigns it to a variable

print $src, "\n";  # print the source code


This worked on the dev server at work, and I don't have root anywhere, much less on http://www.oreilly.com.
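
If LWP cannot be installed at all, one alternative that nobody in the thread mentions is HTTP::Tiny, which has shipped with Perl itself since version 5.14, so it needs neither root access nor an upload. The sketch below is a loose rewrite of the script above under that assumption, not dsb's method: `fetch_source` and `body_or_die` are hypothetical helper names, and the agent string and referer are simply carried over from the example.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Return the body of an HTTP::Tiny response hash, or die with the
# status if the request failed.
sub body_or_die {
    my ($url, $res) = @_;
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Fetch a page and return its HTML source.
sub fetch_source {
    my ($url) = @_;
    my $http = HTTP::Tiny->new( agent => 'Mozilla/4.0', timeout => 10 );
    my $res  = $http->get( $url,
        { headers => { Referer => 'http://hi.there.com' } } );
    return body_or_die( $url, $res );
}

# Only touch the network when a URL is given on the command line.
print fetch_source( $ARGV[0] ), "\n" if @ARGV;
```

Saved under a name of your choosing, it would be run with the URL as its only argument. HTTP::Tiny is deliberately small, so LWP remains the better fit once you need cookies, form handling, or finer control over the request.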

Hope that helps.



  #8  
March 1st, 2001, 02:28 PM
JonLed
Have you been paying attention to what we've been saying :P? I'm guessing not, since we're trying to get the LWP module working for him.

I'll be more specific with my answer from above (since it was right in the first place).
Code:
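BEGIN { push(@INC, '.'); }   # must run at compile time, before the use below
use LWP::Simple;

my $data = get('http://www.whatever.com');   # get() returns undef on failure
die "Couldn't fetch the page\n" unless defined $data;

print $data;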


Now to make this work, you need to download the module from cpan.org, untar/gunzip it, and then upload it to the directory your script is in. OK, this is the important part: your directory structure should now look something like this:
./ <-- main directory, with the script trying to run.
./LWP <-- Directory containing ALL the lwp stuff.

If you still get the @INC error, that means that your structure is still not right. Mess around with it.
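
A rough way to see why the @INC error persists is to ask Perl which directories it actually searches and whether LWP/Simple.pm sits under any of them. `find_module` below is a hypothetical helper, not part of LWP or core Perl; it only looks at the file system and loads nothing.

```perl
use strict;
use warnings;

# Return every directory in the given search path that contains the
# file for the named module.
sub find_module {
    my ($module, @dirs) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::Simple -> LWP/Simple.pm
    # Skip code references, which @INC may also contain.
    return grep { !ref $_ && -e "$_/$file" } @dirs;
}

my @hits = find_module( 'LWP::Simple', '.', @INC );

if (@hits) {
    print "Found LWP::Simple under: @hits\n";
}
else {
    print "LWP::Simple not found. Directories searched:\n";
    print "  $_\n" for '.', grep { !ref $_ } @INC;
}
```

If the script reports nothing found, compare the directories it lists with where the LWP directory actually landed after the upload.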

  #9  
March 1st, 2001, 02:41 PM
dsb
Settle down there, tiger... and apparently your way isn't working for him. Besides, TMTOWTDI (there's more than one way to do it).

BTW, the docs for LWP::Simple suggest using LWP::UserAgent for more control over what's going on.


[Edited by dsb on 03-01-2001 at 01:48 PM]

  #10  
March 1st, 2001, 08:09 PM
tron
Thanks for the help, all. I just got the sysadmin to install LWP. Thank you all for the posts.


  #11  
March 1st, 2001, 08:55 PM
JonLed
If my method wasn't working, it's because he wasn't doing it right. _Your_ method used the exact same module (set) as mine... which is what I was trying to show him how to install.

And yes, ::UserAgent will provide more control, but if a person doesn't know how to get ::Simple working, then why would they know how to use the UserAgent method?

Besides, I didn't mean what I said before in a negative way, so you don't need to tell me to settle down, tiger.

  #12  
March 1st, 2001, 09:36 PM
dsb
JonLed,
There is no doubt you know what you are talking about. I didn't realize he didn't have the library installed. I was merely suggesting a different method, one by which he might be able to understand how HTTP functions a little better. But, of course, if he doesn't have the library installed, nothing is going to work.

Didn't mean to step on anyone's toes.

Humble apologies.

Viewing: Dev Shed Forums > Programming Languages > Perl Programming > get html source

