Perl Programming
 
  #1  
November 12th, 2012, 06:19 PM
Laurent_R
Regex to replace white space between brackets in Perl

Hi folks,

This is not a problem that I really need to solve, but I was trying to answer the question of someone in another section of this forum and got partly stuck.

In a question posted in the Regex section of this forum (http://forums.devshed.com/regex-programming-147/regex-to-replace-white-space-between-brackets-933871.html), the original poster asked the following question:

Quote:
Originally Posted by benwenger
I know how to replace everything between brackets but not how to replace parts of it. I need a regex to replace all white space between curly brackets with &nbsp;

example
$string="lorum {ipsum dolor sit} et amed {nucas nullum} est";
after regex
lorum {ipsum&nbsp;dolor&nbsp;sit} et amed {nucas&nbsp;nullum} est


I warned that this was a bit complicated for a single regex and came up with a rather tedious solution: iterate over the string with a while loop, extract each {...} substring, apply a simple substitution to that substring to replace the spaces, and put the modified substring back in place of the original. The next iteration does the same to the next {...} substring, and so on until the job is done.

It works, but it looks tedious and rather ugly. I also said that I personally would probably not do it with regexes alone, but would rather find the substrings with the index function, extract each one, modify it with a regex, and replace it in place, doing the whole thing in a while loop until the job is done.
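
For readers who want to see what the index-based approach described above could look like, here is a minimal sketch. It is not the code from the other thread: the helper name nbsp_in_braces is made up for illustration, and it assumes the braces are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace spaces with &nbsp;
# inside every {...} group, locating the groups with index() and substr().
sub nbsp_in_braces {
    my ($str) = @_;
    my $pos = 0;
    while ( ( my $open = index( $str, '{', $pos ) ) >= 0 ) {
        my $close = index( $str, '}', $open );
        last if $close < 0;    # unbalanced brace: leave the rest untouched

        # Copy the group, braces included, and modify the copy.
        my $group = substr( $str, $open, $close - $open + 1 );
        $group =~ s/ /&nbsp;/g;

        # 4-argument substr replaces the original span in place.
        substr( $str, $open, $close - $open + 1, $group );

        # Resume the search after the (now longer) rewritten group.
        $pos = $open + length $group;
    }
    return $str;
}

my $test = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
print nbsp_in_braces($test), "\n";
```

It does the job, but it shows why this approach feels heavy compared with a single substitution.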

But then, another poster came with an illuminating remark on something that I had not thought about for one second (even though I knew it in theory, I have probably never used this functionality, so it had not come to my mind):

Quote:
Originally Posted by Jacques1
What you're doing is completely unnecessary effort. PHP (and I'm sure also Perl) can replace patterns with the return value of a callback function.


Jacques1 then gave a piece of PHP code that achieves the required result (the original question was about PHP); it is irrelevant here.

Of course. This is sooooh much better.

So, for the fun of it, I tried to do it in Perl, but found that it was not as easy as I thought.

I finally managed to do it this way, using a (sort of callback) function:

Code:
sub remove_sp {
    local $_ = shift;    # localize $_ so the caller's $_ is not clobbered
    s/ /&nbsp;/g;        # replace each space with &nbsp;
    return $_;
}
my $test = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
$test =~ s/(\{[^}]*\})/remove_sp($1)/eg;


This works fine, $test now contains: "lorum {ipsum&nbsp;dolor&nbsp;sit} et amed {nucas&nbsp;nullum} est", which was the required result.

It is pretty good and far better than the regex progressive match constructs within a while loop that I had suggested originally.

But I came up with that solution with a separate function definition only as a fall-back option after I tried unsuccessfully to inline what is in the remove_sp function above as an anonymous function in the replacement part of the s/// expression.

I tried all kinds of ways to inline an anonymous function, but, for example, something like this:
Code:
$test =~ s/(\{[^}]*\})/{$_=$1; s/ /&nbsp;/g}/eg

or
Code:
$test =~ s/(\{[^}]*\})/{$1 =~ s/ /&nbsp;/g}/eg

gave me an "Unmatched right curly bracket" error. I played with a number of variations on that, but I still can't find how to do it. I must be missing something or perhaps making a silly mistake.



In brief, I am fairly sure it should be possible to do this with an anonymous or inline subroutine within the replacement section of the s/// statement, and I would like to understand why I can't find the right way to do it. Does anyone have an idea on how to solve this?

Thanks for your thoughts.
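
One variant that stays close to the "callback" idea is to store an anonymous subroutine in a variable and call it from the replacement part. This is only a sketch under assumptions: the name $to_nbsp is made up for illustration, and the sub works on a copy of its argument because $1 itself cannot be modified.

```perl
use strict;
use warnings;

# $to_nbsp is a hypothetical name; the sub copies its argument
# because $1 itself cannot be modified.
my $to_nbsp = sub {
    my ($group) = @_;
    $group =~ s/ /&nbsp;/g;
    return $group;
};

my $test = "lorum {ipsum dolor sit} et amed {nucas nullum} est";

# /e evaluates the replacement as Perl code, so the code reference
# can be called on each captured {...} group.
$test =~ s/(\{[^}]*\})/$to_nbsp->($1)/ge;

print "$test\n";
```

The replacement contains no "/" and no nested s///, so the outer statement parses without trouble.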

  #2  
November 12th, 2012, 06:53 PM
OmegaZero
Read "Gory details of parsing quoted constructs" in perlop. The problem is that perl sees the '/' of the nested s/// as the end of the outer s///. If you use different delimiters it parses OK. Also, you need to use a temporary variable since $1 is read-only. And finally, you need to be running on a new enough perl that the regex engine is reentrant.
Code:
$test =~ s/(\{[^}]\})/(my $t = $1) =~ s! !&nbsp;!g; $t/ge


For simplicity, I'd probably ditch the inner regex and write something like this:
Code:
$test =~ s/(\{[^}]\})/join '&nbsp;', split '\s', $1/ge
Comments on this post
Laurent_R agrees: Thank you for your enlightening ideas.
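
As a side note for readers on a more recent perl than the versions discussed in this thread: the /r modifier (available since Perl 5.14) makes the inner substitution return a modified copy, so no temporary variable is needed. A minimal sketch, assuming Perl 5.14 or later and using "!" as the outer delimiter so that the inner s/// can keep "/":

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier first appeared in Perl 5.14

my $test = "lorum {ipsum dolor sit} et amed {nucas nullum} est";

# Outer s!!! uses "!" delimiters so the inner s/// can keep "/".
# Inner /r returns the modified copy instead of trying to modify $1.
$test =~ s!(\{[^}]*\})! $1 =~ s/ /&nbsp;/gr !ge;

print "$test\n";
```

This will not work on Perl 5.8 or 5.10, where the temporary-variable form is the way to go.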
  #3  
November 13th, 2012, 01:11 AM
Laurent_R
Thank you very much for your answer.

I will definitely read the article you mention.

I did try other delimiters, on either the inner or the outer s/// statement, but did not succeed. Possibly I had another syntax error at the time.

I am using Perl 5.10, but I would assume it is reentrant, since I could do it with the function call, which, I imagine, would have the same problem if the regex engine were not reentrant.

I also appreciate the split-join idea; it is a clever and simple way of doing it.

Thanks a lot. Your post shed a lot of light on the matter and will help me make further attempts in this direction to improve my understanding of this whole shebang.

  #4  
November 13th, 2012, 06:59 AM
Laurent_R
Quote:
Originally Posted by OmegaZero
Read Gory details of parsing quoted constructs. The problem is perl sees the '/' of the nested s/// as the end of the outer s///. If you use different delimiters it parses OK. Also you need to use a temporary variable since $1 is read-only. And finally you need to be running on a new enough perl that the regex engine is reentrant.
Code:
$test =~ s/(\{[^}]\})/(my $t = $1) =~ s! !&nbsp;!g; $t/ge


For simplicity, I'd probably ditch the inner regex and write something like this:
Code:
$test =~ s/(\{[^}]\})/join '&nbsp;', split '\s', $1/ge


Hi OmegaZero,

I have now tried your suggestions, and they did not work as posted. I first thought this had to do with the re-entrancy problem (especially since I am now using a server with Perl 5.8 installed, not 5.10 as in my tests yesterday), but it turns out there is simply a small mistake in the code you presented: a + quantifier is missing in the search part of the s/// statement, so [^}] matches only a single character. For the benefit of others reading this thread and wanting to test, here are your regexes with that minor error corrected:

Code:
$test =~ s/(\{[^}]+\})/(my $t = $1) =~ s! !&nbsp;!g; $t/ge;

Code:
$test =~ s/(\{[^}]+\})/join '&nbsp;', split ' ', $1/ge;


With these minor corrections, they work exactly as expected even on Perl 5.8 (even though this Perl version, just like the 5.10 I used yesterday, is not re-entrant). So the fact that newer Perl versions are re-entrant must apply to some other functionality of the Perl regex engine.
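
For anyone who wants to check the two details involved here, the following sketch shows the effect of the missing quantifier and the difference between split ' ' and split '\s'. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "lorum {ipsum dolor sit} et {x} est";

# Without a quantifier, [^}] matches exactly one character,
# so only a group like {x} is found.
my @no_quant = $text =~ /(\{[^}]\})/g;

# With +, every non-empty {...} group matches.
my @with_plus = $text =~ /(\{[^}]+\})/g;

# split ' ' is special-cased: it splits on runs of whitespace and
# drops leading whitespace, whereas split /\s/ splits on each
# single whitespace character and can yield empty fields.
my @awk_mode = split ' ',  "a  b";
my @each_ws  = split /\s/, "a  b";

print scalar(@no_quant), " ", scalar(@with_plus), "\n";
print join( '&nbsp;', @awk_mode ), "\n";
print join( '&nbsp;', @each_ws ),  "\n";
```

One consequence worth keeping in mind: with split ' ', a run of several spaces collapses into a single &nbsp;, which is not the same result as replacing every space with s/ /&nbsp;/g.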

Thank you again for your input.
