#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    355
    Rep Power
    11

    Regex to replace white space between brackets


    I know how to replace everything between brackets, but not how to replace only parts of it. I need a regex that replaces all white space between curly brackets with &nbsp;.

    Example:
    $string="lorum {ipsum dolor sit} et amed {nucas nullum} est";
    After the regex:
    lorum {ipsum&nbsp;dolor&nbsp;sit} et amed {nucas&nbsp;nullum} est
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    833
    Rep Power
    496
    First, in which language? I ask this, because this is quite complicated and some regex constructs may not be available in all languages.

    I tend to think that you should not try to do this with pure regexes. I managed to do it, but only by applying a succession of regexes to:
    - extract the first substring between curly brackets (something like: $sub = $1 if $string =~ /(\{.*?\})/)
    - modify the substring, replacing the spaces
    - replace the first substring with the modified substring
    - start again in the string from the end of the first substring, and so on until there are no more matches.

    This is quite complicated, and I think you should probably use other methods: find the curlies (the index function in Perl), extract a substring, modify it with a regex, use a substring function to replace the first substring, and take it again from there.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    355
    Rep Power
    11
    it's in php
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    833
    Rep Power
    496
    Hi,

    This is a start in Perl terms:

    Code:
    $_ = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
    $s = $1 if /(\{.*?\})/; $s2=$s; $s2 =~ s/ /&nbsp;/g; s/\Q$s\E/$s2/;print;
    which prints:
    Code:
    lorum {ipsum&nbsp;dolor&nbsp;sit} et amed {nucas nullum} est
    (I know that the work is only half done, but keep on reading.)

    This makes relatively heavy use of Perl's default $_ special variable to make the syntax more concise (but less readable for those not knowing Perl). I don't know enough about PHP to write it in PHP, but if I write it again in simpler Perl, without using those special features of Perl, it would look more or less like this:

    Code:
    $string = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
    $s = $1 if $string =~ /(\{.*?\})/;
    $s2 = $s;
    $s2 =~ s/ /&nbsp;/g;
    $string =~ s/\Q$s\E/$s2/;
    print $string;
    But this, of course, changes only the first string enclosed in curly braces. How to continue from there? Well, in Perl, we can wrap this in a while loop and slightly modify the regular expression so that the next match starts only at the place where the previous match left off. Such a possibility most probably also exists in PHP, but it is almost certainly very different from the way it is done in Perl, which relies on some of its unique features, so there is no point in my giving you the full Perl version of this route.
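
    For PHP readers, here is a minimal sketch of the "resume where the previous match ended" idea. It assumes preg_match's PREG_OFFSET_CAPTURE flag and its offset argument; the function name nbsp_in_braces_by_offset is made up for illustration and is not from this thread.

    ```php
    <?php
    // Hypothetical helper: rewrites one {...} group per loop iteration.
    function nbsp_in_braces_by_offset($string)
    {
        $offset = 0;
        // The 5th argument tells preg_match where to start searching.
        while (preg_match('/\{[^}]*\}/', $string, $m, PREG_OFFSET_CAPTURE, $offset)) {
            list($group, $pos) = $m[0];          // matched text and its byte offset
            $fixed = str_replace(' ', '&nbsp;', $group);
            $string = substr_replace($string, $fixed, $pos, strlen($group));
            $offset = $pos + strlen($fixed);     // resume just after the rewritten group
        }
        return $string;
    }

    echo nbsp_in_braces_by_offset('lorum {ipsum dolor sit} et amed {nucas nullum} est'), "\n";
    ```

    Offsets here are byte offsets, which is fine for this sample string; like the pattern it is built on, the sketch does not handle nested braces.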

    An alternative method is to modify this line:

    Code:
    $s = $1 if $string =~ /(\{.*?\})/;
    so that the match occurs only if there is at least one space in the string between the curly braces. In this case, if we wrap this in a while loop, the first string will be matched and the spaces within it replaced in the first iteration of the loop. In the second iteration, the first string will no longer be recognized (there is no space left), so the second iteration will match the second substring to be modified, and so on until there is nothing left to be matched.

    Here is the new expression to match the substring between curly braces only if there is at least one space in it (using negated character classes):
    Code:
    $s = $1 if $string =~ /(\{(?:[^} ]+ )+[^} ]+\})/;
    So wrapping the code below:
    Code:
    $string = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
    $s = $1 if $string =~ /(\{(?:[^} ]+ )+[^} ]+\})/;
    $s2 = $s;
    $s2 =~ s/ /&nbsp;/g;
    $string =~ s/\Q$s\E/$s2/;
    in a while loop that knows how to stop when there is no longer a match (it could be on whether or not $1 is defined) will do the trick. I'll leave it there for you to try to implement that in PHP.
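
    One possible PHP translation of that loop, as a sketch only: it reuses the "at least one space" pattern and stops when preg_match finds nothing more. The function name nbsp_in_braces_loop is an assumption for illustration.

    ```php
    <?php
    // Hypothetical helper: keeps rewriting the first group that still contains a space.
    function nbsp_in_braces_loop($string)
    {
        // Matches {...} only when the words inside are separated by single spaces.
        $pattern = '/\{(?:[^} ]+ )+[^} ]+\}/';
        while (preg_match($pattern, $string, $m, PREG_OFFSET_CAPTURE)) {
            list($group, $pos) = $m[0];
            $fixed = str_replace(' ', '&nbsp;', $group);
            // Once rewritten, the group has no spaces left and cannot match again.
            $string = substr_replace($string, $fixed, $pos, strlen($group));
        }
        return $string;
    }

    echo nbsp_in_braces_loop('lorum {ipsum dolor sit} et amed {nucas nullum} est'), "\n";
    ```

    Note that this pattern skips groups with leading, trailing, or doubled spaces (for example "{ a b }"), so those would be left unchanged.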

    But, as I said before, we are getting into fairly hairy regexes and a relatively complicated algorithm. Have you considered the other proposal I made: using the built-in PHP functions (rather than regexes) to find the first { and the first }, extract the substring between { and }, and then use regexes only to modify the substring and insert its replacement into the original string? Then repeat the process with the next substring until you are done. I think it will be much easier and far more readable than using pure regex constructs.
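
    A rough sketch of that non-regex route in PHP, assuming strpos, substr and substr_replace for the bookkeeping and a single small preg_replace for the substring itself. The function name nbsp_in_braces_strpos is hypothetical.

    ```php
    <?php
    // Hypothetical helper: locates each {...} with strpos instead of a regex.
    function nbsp_in_braces_strpos($string)
    {
        $offset = 0;
        while (($open = strpos($string, '{', $offset)) !== false) {
            $close = strpos($string, '}', $open);
            if ($close === false) {
                break; // unbalanced brace: leave the rest untouched
            }
            $length = $close - $open + 1;
            $sub = substr($string, $open, $length);
            $fixed = preg_replace('/ /', '&nbsp;', $sub);   // regex only on the substring
            $string = substr_replace($string, $fixed, $open, $length);
            $offset = $open + strlen($fixed);               // continue after this group
        }
        return $string;
    }

    echo nbsp_in_braces_strpos('lorum {ipsum dolor sit} et amed {nucas nullum} est'), "\n";
    ```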
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2005
    Posts
    355
    Rep Power
    11
    Thanks a lot for your comprehensive answer.
    I've tried to translate some of your thoughts and code into PHP but to no avail.
    I then came up with a totally different approach that works fine.
    First I put all matches (text between curly brackets) into an array using
    PHP Code:
    preg_match_all('/{([^}]*)}/',$text,$matched); 
    I then loop through the array replacing the matches in the original string as follows

    PHP Code:
    if(count($matched[0])>0){
        foreach($matched[0] as $match){
            $match2=str_replace(" ","&nbsp;",$match);
            $text=str_replace($match,$match2,$text);
        }
    }

    Probably not the most efficient way in terms of speed and resources, but it works for me.
  10. #6
  11. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    What you're doing is completely unnecessary effort. PHP (and I'm sure also Perl) can replace patterns with the return value of a callback function. So you don't need all this. In PHP, it's simply

    PHP Code:
    $test = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
    $result = preg_replace_callback('/{[^}]*\}/', function ($match) {return str_replace(' ', '&nbsp;', $match[0]);}, $test);

    var_dump($result);
    If you have an outdated PHP version (< 5.3, i.e. without anonymous functions), you need to define the callback function with a normal function declaration and then pass the name as a string to preg_replace_callback.
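
    For illustration, the named-function variant might look like the sketch below. The callback name nbsp_callback is made up; everything else uses the same calls as the snippet above.

    ```php
    <?php
    // Hypothetical named callback; receives the match array from preg_replace_callback.
    function nbsp_callback($match)
    {
        return str_replace(' ', '&nbsp;', $match[0]);
    }

    $test = 'lorum {ipsum dolor sit} et amed {nucas nullum} est';
    // The callback is passed by name, as a string.
    $result = preg_replace_callback('/\{[^}]*\}/', 'nbsp_callback', $test);

    var_dump($result);
    ```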

    Note that this does not work with nested braces (like "{ abc { def } xyz}"). If you want to do this, it will get more complicated.
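
    One way the nested case could be approached, as a sketch and not a tested production solution: PCRE supports recursive patterns, so (?R) can be used to match a balanced outer group, and the callback then replaces the spaces at every nesting level inside it. The function name nbsp_in_nested_braces is an assumption for illustration.

    ```php
    <?php
    // Hypothetical helper: handles balanced, possibly nested, {...} groups.
    function nbsp_in_nested_braces($string)
    {
        // [^{}]++ consumes non-brace text; (?R) recurses into an inner {...}.
        $pattern = '/\{(?:[^{}]++|(?R))*\}/';
        return preg_replace_callback($pattern, function ($match) {
            // $match[0] is the whole outermost group, inner groups included.
            return str_replace(' ', '&nbsp;', $match[0]);
        }, $string);
    }

    echo nbsp_in_nested_braces('x { abc { def } xyz} y'), "\n";
    ```

    With unbalanced input, the pattern only matches the parts that are balanced, so how stray braces should be treated is left open here.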
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    833
    Rep Power
    496
    You are absolutely right Jacques, I did not think about having a callback function as a replacement part.
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    833
    Rep Power
    496
    For the benefit of potentially interested readers, here is a way to use a (sort of callback) function within the replacement part of the s/// statement in Perl:

    Code:
    sub remove_sp { my $t = shift; $t =~ s/ /&nbsp;/g; return $t; }   # lexical copy, so the global $_ is not clobbered
    my $string = "lorum {ipsum dolor sit} et amed {nucas nullum} est";
    $string =~ s/(\{[^}]+\})/remove_sp($1)/eg;
    $string now contains: "lorum {ipsum&nbsp;dolor&nbsp;sit} et amed {nucas&nbsp;nullum} est".

    It is also possible to inline the replacement code directly (the /e modifier evaluates the replacement part as Perl code), e.g.:

    Code:
    $string =~ s/(\{[^}]+\})/(my $t = $1) =~ s! !&nbsp;!g; $t/ge;
    This last solution was proposed by another member of this forum, OmegaZero, after I reported that I had trouble finding the exact right syntax.

    He suggested an even shorter and simpler form:

    Code:
    $string =~ s/(\{[^}]+\})/join '&nbsp;', split ' ', $1/ge;
