#1
Hey All --
I'm having problems getting a pattern match to work the way I need it to. I have loaded the following HTML from a file into a scalar, using chomp to remove newlines: Code:
<!--start1-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Products</h1>
<ul>
<li><a href="#" class="side1"><!--startside1-->Application Virtualization<!--endside1--></a>
<div id="subnavs1" style="display: none;">
<ul>
<li><a href="#" class="sub1"><!--startsub1-->Activity Recording<!--endsub1--></a>
<li><a href="#" class="sub2"><!--startsub2-->Rollback<!--endsub2--></a>
<li><a href="#" class="sub3"><!--startsub3-->Application Migration<!--endsub3--></a>
<li><a href="#" class="sub4"><!--startsub4-->Workspace Mobility<!--endsub4--></a>
<li><a href="#" class="sub5"><!--startsub5-->Application Footprint Cleanup<!--endsub5--></a>
</ul>
</div>
<li><a href="#" class="side2"><!--startside2-->Asset Management<!--endside2--></a>
<li><a href="#" class="side3"><!--startside3-->Remote Terminal<!--endside3--></a>
<li><a href="#" class="side4"><!--startside4-->What's New<!--endside4--></a>
</ul>
</div>
<!--end1-->
<!--start2-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Technology</h1>
<ul>
<li><a href="#" class="side5"><!--startside5-->Virtualization<!--endside5--></a>
<li><a href="#" class="side6"><!--startside6-->SaaS<!--endside6--></a>
<li><a href="#" class="side7"><!--startside7-->Deployment<!--endside7--></a>
<li><a href="#" class="side8"><!--startside8-->Reporting<!--endside8--></a>
</ul>
</div>
<!--end2-->
<!--start3-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Resources</h1>
<ul>
<li><a href="#" class="side9"><!--startside9-->Trial Evaluation<!--endside9--></a>
<li><a href="#" class="side10"><!--startside10-->Demo<!--endside10--></a>
<li><a href="#" class="side11"><!--startside11-->ROI Calculator<!--endside11--></a>
<li><a href="#" class="side12"><!--startside12-->Links to external sources (hidden)<!--endside12--></a>
</ul>
</div>
<!--end3-->
<!--start4-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Company</h1>
<ul>
<li><a href="#" class="side13"><!--startside13-->About Us<!--endside13--></a>
<li><a href="#" class="side14"><!--startside14-->Investors<!--endside14--></a>
<li><a href="#" class="side15"><!--startside15-->Office Locations<!--endside15--></a>
</ul>
</div>
<!--end4-->
<!--start5-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Support</h1>
<ul>
<li><a href="#" class="side16"><!--startside16-->Login<!--endside16--></a>
</ul>
</div>
<!--end5-->
<!--start6-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Partners</h1>
<ul>
<li><a href="#" class="side17"><!--startside17-->MSP<!--endside17--></a>
<li><a href="#" class="side18"><!--startside18-->Become a partner<!--endside18--></a>
<li><a href="#" class="side19"><!--startside19-->Login<!--endside19--></a>
</ul>
</div>
<!--end6-->
<!--start7-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>News</h1>
<ul>
<li><a href="#" class="side20"><!--startside20-->Press Releases<!--endside20--></a>
<li><a href="#" class="side21"><!--startside21-->In the News<!--endside21--></a>
<li><a href="#" class="side22"><!--startside22-->Events<!--endside22--></a>
<li><a href="#" class="side23"><!--startside23-->Media Kit<!--endside23--></a>
</ul>
</div>
<!--end7-->
I am then attempting to remove the links (<a> tags) in a couple of places with the following: Code:
if ($sideunlink){
    $holder =~ s/<a href=".*" class="sidenavlit">(<!--startside$sidelit-->.*<!--endside$sidelit-->)<\/a>/$1/;
}
if ($subunlink){
    $holder =~ s/<a href=".*" class="subnavlit">(<!--startsub$sublit-->.*<!--endsub$sublit-->)<\/a>/$1/;
}
$sideunlink and $subunlink are parameters passed in by the web browser that requests the script (it's a CGI script). The statement for $sideunlink appears to work: the link tags are removed, leaving only the text. The statement for $subunlink does not match the way I want. Here is the entire script: Code:
#!/usr/bin/perl -wT
use strict;
use CGI ':standard';
use CGI::Carp qw(fatalsToBrowser);

my $sidegroup = param('sidegroup'); # indicates which sidenav group to show
my $sidelit = param('sidelit'); # indicates which sidenav is to be lit
my $sideunlink = param('sideunlink'); # any non-zero value indicates the lit sidenav should not be linked
my $subgroup = param('subgroup'); # indicates which subnav group to show
my $sublit = param('sublit'); # indicates which subnav is to be lit
my $subunlink = param('subunlink'); # any non-zero value indicates the lit subnav should not be linked
my $holder; # holds the navs to be printed

### Open nav template and put into $holder
### You must chomp because the upcoming pattern matching breaks on newlines
open(NAVS,"sidenavs.html") || die "Couldn't open file: $!";
foreach (<NAVS>){
    chomp;
    $holder .= $_;
}
close(NAVS);

### Extract the required group of navs
$holder =~ m/(<!--start$sidegroup-->.*<!--end$sidegroup-->)/g;
$holder = $1;

### Apply styles according to params
if ($sidelit){
    $holder =~ s/class="side$sidelit"/class="sidenavlit"/;
}
if ($subgroup){
    $holder =~ s/<div id="subnavs$subgroup" style="display: none;">/<div id="subnavs">/;
}
if ($sublit){
    $holder =~ s/class="sub$sublit"/class="subnavlit"/;
}

### Remove links if specified by params
if ($sideunlink){
    $holder =~ s/<a href=".*" class="sidenavlit">(<!--startside$sidelit-->.*<!--endside$sidelit-->)<\/a>/$1/;
}
if ($subunlink){
    $holder =~ s/<a href=".*" class="subnavlit">(<!--startsub$sublit-->.*<!--endsub$sublit-->)<\/a>/$1/;
}

### Apply styles to non-specific navs (those not specified in params)
$holder =~ s/class="side\d+"/class="sidenav"/g;
$holder =~ s/class="sub\d+"/class="subnav"/g;

### Print the navs
print "Content-type: text/html\n\n";
print $holder;
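On the script's comment about chomping: the matching most likely breaks on newlines because `.` does not match a newline unless the `/s` modifier is given. Here is a rough sketch of that, not a drop-in replacement. It assumes a made-up three-line template in place of sidenavs.html, read through an in-memory filehandle so the example runs on its own; `$template`, `$without_s` and `$group` are illustrative names that are not in the script.

```perl
use strict;
use warnings;

# Hypothetical three-line template standing in for sidenavs.html
my $template = "<!--start1-->\n<h1>Products</h1>\n<!--end1-->\n";

# Read the whole "file" at once through an in-memory filehandle,
# keeping the newlines instead of chomping them away
open(my $fh, '<', \$template) or die "Couldn't open in-memory file: $!";
my $holder = do { local $/; <$fh> };
close($fh);

my $sidegroup = 1;

# Without /s, "." stops at a newline, so the group is not found
my $without_s = ($holder =~ /(<!--start$sidegroup-->.*<!--end$sidegroup-->)/) ? 1 : 0;

# With /s, "." also matches newlines; .*? stops at the first end marker
my ($group) = $holder =~ /(<!--start$sidegroup-->.*?<!--end$sidegroup-->)/s;

print "without /s: ", ($without_s ? "match" : "no match"), "\n";
print "with /s:\n$group\n";
```

With `/s` the template could keep its line breaks, which would also make the printed output easier to read.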
When run with the following parameters: Code:
sidenavs.pl?sidegroup=1&sidelit=1&sideunlink=1&subgroup=1&sublit=3&subunlink=1
The result is the following (I have formatted for readability, though the actual output has no newlines): Code:
<!--start1-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Products</h1>
<ul>
<li>
<!--startside1-->
Application Virtualization
<!--endside1-->
<div id="subnavs">
<ul>
<li>
<!--startsub3-->
Application Migration
<!--endsub3-->
<li><a href="#" class="subnav">
<!--startsub4-->
Workspace Mobility
<!--endsub4-->
</a>
<li><a href="#" class="subnav">
<!--startsub5-->
Application Footprint Cleanup
<!--endsub5-->
</a>
</ul>
</div>
<li><a href="#" class="sidenav">
<!--startside2-->
Asset Management
<!--endside2-->
</a>
<li><a href="#" class="sidenav">
<!--startside3-->
Remote Terminal
<!--endside3-->
</a>
<li><a href="#" class="sidenav">
<!--startside4-->
What's New
<!--endside4-->
</a>
</ul>
</div>
<!--end1-->
I want to get this result, though (which I'm not getting): Code:
<!--start1-->
<link href="sidenavs.css" rel="stylesheet" type="text/css" />
<div id="sidenavs">
<h1>Products</h1>
<ul>
<li>
<!--startside1-->
Application Virtualization
<!--endside1-->
<div id="subnavs">
<ul>
<li><a href="#" class="subnav">
<!--startsub1-->
Activity Recording
<!--endsub1-->
</a>
<li><a href="#" class="subnav">
<!--startsub2-->
Rollback
<!--endsub2-->
</a>
<li>
<!--startsub3-->
Application Migration
<!--endsub3-->
<li><a href="#" class="subnav">
<!--startsub4-->
Workspace Mobility
<!--endsub4-->
</a>
<li><a href="#" class="subnav">
<!--startsub5-->
Application Footprint Cleanup
<!--endsub5-->
</a>
</ul>
</div>
<li><a href="#" class="sidenav">
<!--startside2-->
Asset Management
<!--endside2-->
</a>
<li><a href="#" class="sidenav">
<!--startside3-->
Remote Terminal
<!--endside3-->
</a>
<li><a href="#" class="sidenav">
<!--startside4-->
What's New
<!--endside4-->
</a>
</ul>
</div>
<!--end1-->
I have tried using a ? after the various .* in the script, but have not had any success. I'm assuming I am misusing .* in some way, but I don't know how. Any help would be appreciated. Thanks!
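The behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming a shortened, made-up stand-in for the chomped template; `$html`, `$greedy`, `$lazy` and `$fixed` are illustrative names that are not in the script. It suggests why adding `?` does not help. The regex engine tries the leftmost starting point first, so `<a href="` matches at the first link still in the string, and `.*` (greedy or not) then stretches across the other links until it reaches `" class="subnavlit">`. Restricting the href to `[^"]*` keeps the match inside a single tag.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for $holder after the class substitutions
my $html = '<li><a href="#" class="sub1"><!--startsub1-->Activity Recording<!--endsub1--></a>'
         . '<li><a href="#" class="sub2"><!--startsub2-->Rollback<!--endsub2--></a>'
         . '<li><a href="#" class="subnavlit"><!--startsub3-->Application Migration<!--endsub3--></a>'
         . '<li><a href="#" class="sub4"><!--startsub4-->Workspace Mobility<!--endsub4--></a>';
my $sublit = 3;

# Pattern from the script: the match starts at the first <a href=" it finds
(my $greedy = $html) =~ s/<a href=".*" class="subnavlit">(<!--startsub$sublit-->.*<!--endsub$sublit-->)<\/a>/$1/;

# Same pattern with .*? : the match still starts at the first <a href="
(my $lazy = $html) =~ s/<a href=".*?" class="subnavlit">(<!--startsub$sublit-->.*?<!--endsub$sublit-->)<\/a>/$1/;

# Assumed fix: [^"]* cannot run past the closing quote of the href value
(my $fixed = $html) =~ s/<a href="[^"]*" class="subnavlit">(<!--startsub$sublit-->.*?<!--endsub$sublit-->)<\/a>/$1/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "fixed:  $fixed\n";
```

If this is right, the `$sideunlink` statement probably only appears to work because side1 happens to be the first link in the group; with a later `sidelit` value it would likely swallow the earlier links in the same way.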
#2
Using regexps to parse HTML is a bad idea in general. You're far better off using a proper tag-aware HTML parser to do it for you. Useful modules include HTML::TokeParser or HTML::TokeParser::Simple.
~ishnid
#3
Quote:
The other thing I noticed in the past is that when you use a regex in an if statement, you may want to change your second if to an elsif, especially if you are looping through these two regexes multiple times. It has something to do with having a match the first time with the first regex. Once the match has been made, the second time through it seems to remember that there was already a match and skips the second if...?! Just try it my way and see if you don't get it to actually work. Good luck! John
Viewing: Dev Shed Forums > Programming Languages > Perl Programming > Regex/pattern matching trouble