1. Registered User (joined Mar 2010)

    Preg_replace: Changing patterns depending on structure in subsection

    Hi regex experts,

    I'm looking for a regular expression for PHP's preg_replace() to adjust concatenated strings (single-line only).

    These strings are fragments in German which form a sentence and therefore need to be (automatically) edited to adhere to grammar conventions.

    Roughly speaking, the regular expression should perform the following task:

    Given a starting point "der"
    and an ending point "Zahl",
    match lowercase strings ending in "e"
    and append an "n" to each match.

    If you know a bit of German, you probably know that "... der [...] Zahl" denotes a dative declension caused by a preceding preposition like "mit" (with), e.g.:

    "die positive Zahl" (nominative ending in "e", which is the string's default form)
    "mit der positiven Zahl" (dative ending in "en" due to the preceding preposition "mit")
    (In other words, words ending in "e" get an "n" appended.)

    Something like:
    "mit die ${attributes} Zahl"
    gets expanded to the full string:
    "mit die positive Zahl"

    Now, as a first step I apply the regex
    preg_replace('/(?<=mit)(\sdie)/', ' der', $full_string);
    to it and get:
    "mit der positive Zahl"

    Provided that there is only one adjective between "der" and "Zahl", I could continue with a second positive lookbehind that points to "der", so I'd get "mit der positiven Zahl", which is the intended result.
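
For the single-adjective case just described, the two steps might look roughly like the sketch below. The second pattern is an assumption, not something taken from the post: it uses a lookbehind for "der " and a lookahead for " Zahl", and it lists the German letters explicitly with the /u modifier (the script file must be saved as UTF-8).

```php
<?php
// Hypothetical example input: exactly one adjective between "der" and "Zahl".
$full_string = 'mit die positive Zahl';

// Step 1 (from the post): turn " die" into " der" after "mit".
$step1 = preg_replace('/(?<=mit)(\sdie)/', ' der', $full_string);

// Step 2 (assumed pattern): a lowercase word ending in "e" that sits
// directly between "der " and " Zahl" gets an "n" appended.
$step2 = preg_replace('/(?<=\bder\s)([a-zäöüß]+e)(?=\sZahl\b)/u', '$1n', $step1);

echo $step2, PHP_EOL; // mit der positiven Zahl
```

With two adjectives, e.g. "mit der gerade positive Zahl", this pattern changes nothing, because no single word is both directly after "der" and directly before "Zahl".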

    However, things are not that simple. I'll also encounter concatenations like:
    "mit der positive oder negative Zahl" ("oder" means "or" and mustn't be matched)
    > mit der positiven oder negativen Zahl

    "mit der gerade positive Zahl" [with the even positive number]
    > mit der geraden positiven Zahl

    "mit der größte negative Zahl" [with the largest negative number]
    > mit der größten negativen Zahl
    ... and so on ...

    That means there might be more than one match between "der" and "Zahl", as well as words that have to be skipped ("oder").

    I was thinking of some kind of regex that first grabs the entire substring between "der" and "Zahl", then finds the /\b[a-z]+?e\b/ matches in it, i.e. each word consisting of lowercase characters (incl. umlauts and the German sz ligature ß) and ending in "e", and finally appends an "n" to each.
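
One detail about the inner pattern: a plain [a-z] class does not cover umlauts or ß. A possible adjustment, shown here only as a sketch, is to list those letters explicitly and add the /u modifier. The sample substring is made up for illustration, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical substring, as it might be captured between "der" and "Zahl".
$between = ' größte negative oder positive ';

// Plain [a-z] cannot match "ö" or "ß", so "größte" is never
// returned as a whole word by this pattern.
preg_match_all('/\b[a-z]+?e\b/', $between, $ascii);

// Assumed fix: add the German letters to the class and use /u
// so the pattern and subject are treated as UTF-8.
preg_match_all('/\b[a-zäöüß]+e\b/u', $between, $german);

print_r($german[0]); // the three adjectives; "oder" is skipped
```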

    Unfortunately, I have no clue how to write this!
    Any ideas on that?

    Thank you very much for your help!
  2. #2
  3. Registered User (joined Dec 2007)

    I don't have an elegant one-line solution for this, and I don't think we can get the desired result with a single preg_replace() (if anyone else has a solution, I would love to see it).

    Something like this will work.
    $str = 'This is a string der positive Zahl and another der positive oder negative Zahl. so on... ok another positive oder negative and last der negative oder positive Zahl';
    // Capture everything between "der" and "Zahl", together with its byte offset.
    if (preg_match_all('/\bder(.*?)Zahl\b/i', $str, $match, PREG_OFFSET_CAPTURE)) {
        $shiftOffset = 0;
        for ($i = 0; $i < count($match[1]); $i++) {
            if ($i) { // the first match needs no offset correction
                // Each earlier 'e ' -> 'en ' replacement made $str one byte longer.
                $shiftOffset += substr_count($match[1][$i-1][0], 'e ');
                $match[1][$i][1] += $shiftOffset;
            }
            $sub = str_replace('e ', 'en ', $match[1][$i][0]);
            $str = substr_replace($str, $sub, $match[1][$i][1], strlen($match[1][$i][0]));
        }
    }
    echo $str;
    Note that it catches everything between "der" and "Zahl", so please modify it as per your requirements.

    Please also update this post with your final solution.
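
One way to narrow the approach above to whole lowercase words is preg_replace_callback(): an outer pattern finds each "der ... Zahl" span, and an inner preg_replace() touches only words ending in "e". This is a sketch under assumptions, not a tested final solution from the thread: the function name decline_dative is made up, and the character class [a-zäöüß] with the /u modifier expects UTF-8 input and a UTF-8 script file.

```php
<?php
/**
 * Hypothetical helper (the name is an assumption, not from the thread):
 * appends "n" to every lowercase word ending in "e" that appears
 * between "der" and "Zahl".
 */
function decline_dative(string $text): string
{
    $result = preg_replace_callback(
        '/\bder\b(.*?)\bZahl\b/u',
        function (array $m): string {
            // Inner pass: only whole lowercase words ending in "e" change,
            // so "oder" and capitalised words stay untouched.
            $inner = preg_replace('/\b([a-zäöüß]+e)\b/u', '$1n', $m[1]);
            return 'der' . $inner . 'Zahl';
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo decline_dative('mit der positive oder negative Zahl'), PHP_EOL;
echo decline_dative('mit der gerade positive Zahl'), PHP_EOL;
echo decline_dative('mit der größte negative Zahl'), PHP_EOL;
```

Because already declined words end in "n" rather than "e", running the function twice on the same string should leave it unchanged. Any other lowercase word ending in "e" inside the span would also get an "n", so the inner pattern may need an exclusion list for real data.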
