Perl Programming
 
  #1  
June 25th, 2009, 02:24 PM
titans8904
Pattern-matching an html document

I want to grab the HTML from a web page and insert certain lines into an array.

Code:
#!/usr/bin/perl -w
use LWP::Simple;

$html2009 = get('http://www.ustreas.gov/offices/domestic-finance/debt-management/interest-rate/yield_historical.shtml');
@dates2009 = ();

#scrape dates#
while ($line = $html2009){

        if ($line =~ m/^"<td class=\"smaller\" headers=\"1\">\d\d\/\d\d\/\d\d<\/td>"/i){

                push (@dates2009, $line);

        }

}

print @dates2009;


It occurred to me that I'm not really assigning a single line to $line, but the whole document. How do I read in one line at a time, or is there a better way to extract the data I'm looking for?

  #2  
June 25th, 2009, 02:28 PM
Axweildr
Code:
@lines = split(/\n/, $html2009);
foreach (@lines) {
...
}

But looking at your pattern match, ^"<td means you want the start of the line to have "<td (a literal double quote before the tag). Is that likely?
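The two points above (split the document into lines, and don't anchor the pattern on a literal quote) could be combined roughly as follows. This is only a sketch: the sample markup in $html2009 is a made-up stand-in for the fetched page, and the real page's layout may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the document returned by get();
# the actual page markup is an assumption here.
my $html2009 = join "\n",
    '<table>',
    '<tr><td class="smaller" headers="1">01/02/09</td><td>x</td></tr>',
    '<tr><td class="smaller" headers="1">01/05/09</td><td>y</td></tr>',
    '</table>';

my @dates2009;
foreach my $line (split /\n/, $html2009) {
    # No ^ anchor and no literal double quotes around the tag.
    # m{...} avoids escaping every slash; the parentheses
    # capture only the date text, not the whole line.
    if ($line =~ m{<td class="smaller" headers="1">(\d\d/\d\d/\d\d)</td>}i) {
        push @dates2009, $1;
    }
}

print "$_\n" for @dates2009;
```

Pushing $1 instead of $line stores just the date; push $line instead if the whole line is wanted.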

You may want to look into HTML::Parser on CPAN; there are a good few HTML parsing modules over there.
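For what it's worth, a parser-based version might look something like the sketch below. It assumes HTML::Parser is installed from CPAN, and that the date cells are the td elements with headers="1" as in the original pattern; the sample $html string is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample row; the real page markup is an assumption.
my $html = '<tr><td class="smaller" headers="1">01/02/09</td>'
         . '<td class="smaller" headers="2">other</td></tr>';

my @dates;
my $in_date_cell = 0;    # true while inside a <td headers="1">

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tag: note when we enter a cell with headers="1".
    start_h => [ sub {
        my ($tag, $attr) = @_;
        $in_date_cell = 1
            if $tag eq 'td' && ($attr->{headers} // '') eq '1';
    }, 'tagname, attr' ],
    # Text: keep it only if we are in a date cell and it looks like a date.
    text_h => [ sub {
        my ($text) = @_;
        push @dates, $text
            if $in_date_cell && $text =~ m{^\d\d/\d\d/\d\d$};
    }, 'dtext' ],
    # End tag: leaving any td ends the date cell.
    end_h => [ sub {
        my ($tag) = @_;
        $in_date_cell = 0 if $tag eq 'td';
    }, 'tagname' ],
);

$parser->unbroken_text(1);   # deliver each text run as a single event
$parser->parse($html);
$parser->eof;

print "$_\n" for @dates;
```

Unlike the line-by-line regex, this does not care how the markup is broken across lines or how the attributes are ordered and quoted.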

Last edited by Axweildr : June 25th, 2009 at 02:30 PM.

© 2003-2009 by Developer Shed. All rights reserved. DS Cluster 4 Hosted by Hostway