#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    3
    Rep Power
    0

    Need help with Pattern


    Hi all,

    I am having trouble extracting text with a regex. The rest of the text stays the same each time; only the marked part changes, and that is the part I need.
    I would appreciate it if someone could give me the right pattern.

    --
    Code:
    הודעה מ 972-52-8647478
    אל: 495925384_180925652@reply.wms.telemessage.com(דואר אלקטרוני)
    
    
    בדיקה 13.12.16
    בדיקה 13.12.16
    
    
    <table dir="rtl" border="0" cellpadding="0" cellspacing="0" width="100%" align="left">
    <tr>
    	<td style="color: #000000; font-size: 9pt; font-family: Arial, Helvetica, sans-serif;"><I>השירות נתמך ע"י
    		<a style="color:#030091;font-size: 9pt; font-family: Arial, Helvetica, sans-serif; font-weight: bold;" href="http://www.telemessage.com/">טלמסג'</a></I><br>
    	</td>
    </tr>
    </table>


    Thanks a lot for your time!
  2. #2
  3. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,018
    Rep Power
    9616
    Base your regex according to the location of the בדיקה.

    What have you tried so far?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    3
    Rep Power
    0
    Originally Posted by requinix
    Base your regex according to the location of the בדיקה.

    What have you tried so far?


    I have tried things I found on Google by searching for "get string between 2 strings",
    because the text I need is always between the same two strings:
    Code:
    start  - דואר אלקטרוני)
    
    end - <table
    
    using this regex: (?<=beginningstringname)(.*\n?)(?=endstringname)
    I think my problem is caused by all those empty lines, but I can't get it to work.
  6. #4
  7. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,018
    Rep Power
    9616
    Are you trying to get the two dates, or literally the stuff that comes after the "(דואר אלקטרוני)" and before the <table>?

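    If it's the two dates then anchor on the בדיקה. In the sample as posted the word comes first in the stored character order (right-to-left rendering just draws the date on the left), so the "start" is the בדיקה and the "end" is the end of the line.
    Code:
    (?<=בדיקה )([\d.]+)$ in multi-line mode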
    If it's the stuff in between, then what you have is a good start, but the bit in the middle needs fixing: . doesn't normally match newline characters, so you need to enable "single-line mode" (also called dot-all, or the s flag) in your regex engine, or, if that's not available, explicitly allow anything with something like [\s\S]+, which matches whitespace and non-whitespace (so everything). You should also use lazy quantifiers: it's an optimization that will help, and it stops the match at the first endstringname rather than the last.
    Code:
    (?<=beginningstringname)(.*?)(?=endstringname) in single-line mode
    (?<=beginningstringname)([\s\S]*?)(?=endstringname) otherwise
    Adding the question mark to .* makes it lazy: it will match as little as possible until it finds endstringname, as opposed to a regular .* which would match all the way until the end of the string and then start backtracking until it found endstringname.
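
As a rough illustration of the points above (a sketch, not code from the thread), the following Java program runs the variants against a made-up input. The `START`/`END` markers, the class name `BetweenMarkers`, and the helper `firstGroup` are placeholders invented for this example; `Pattern.DOTALL` is Java's name for single-line mode.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenMarkers {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up input: line breaks between the markers and a second END later on.
        String input = "START\nline one\nline two\nEND tail END";

        // Default mode: '.' does not match '\n', so there is no match (null).
        System.out.println(firstGroup("(?<=START)(.*?)(?=END)", 0, input));

        // Single-line (DOTALL) mode, lazy: stops before the first END.
        System.out.println(firstGroup("(?<=START)(.*?)(?=END)", Pattern.DOTALL, input));

        // No flag needed: [\s\S] matches any character, line breaks included.
        System.out.println(firstGroup("(?<=START)([\\s\\S]*?)(?=END)", 0, input));

        // DOTALL, greedy: runs on to the last END, so the first END and " tail "
        // end up inside the captured group.
        System.out.println(firstGroup("(?<=START)(.*)(?=END)", Pattern.DOTALL, input));
    }
}
```

With two `END` markers in the input, the lazy versions stop at the first one while the greedy version runs to the last, so the choice of quantifier changes the result as well as the amount of backtracking.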
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    3
    Rep Power
    0

    Unfortunately I didn't manage to do it with a regex, so I used a bit of "long code" instead, but it works. Thanks a lot for your help; you have been more than nice and helpful!

    How I did it:
    -------------------
    int j = 0; // counts characters up to the first '<', where the text I need ends
    for (char c : text.toCharArray()) { // text = the whole message I posted
        if (c == '<') // '<' marks the end of the text I need
            break;
        j++;
    }

    message = text.substring(87, j); // 87 is the index where the text I need starts
    ------------------


    Thanks a lot for the help again.
    If someone can solve it with a regex I will be more than thankful! :}
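
One possible regex version of the loop above, offered as a sketch under assumptions rather than a tested solution for the real messages. It captures whatever lies between two literal markers and trims the surrounding blank lines, which removes the need for the hard-coded index 87. The class name `MessageBody`, the method `between`, and the ASCII sample text (with `(email)` standing in for the Hebrew marker) are all made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageBody {
    // Returns the trimmed text between the first `start` and the nearest
    // following `end`, or null when the markers do not occur in that order.
    static String between(String text, String start, String end) {
        // Pattern.quote makes the markers literal, so ')' and '<' need no escaping.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end),
                Pattern.DOTALL); // DOTALL lets '.' cross the blank lines
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // ASCII stand-in for the message in the first post (hypothetical content).
        String text = "Message from 972-00-0000000\n"
                + "To: someone@example.com(email)\n\n\n"
                + "test 13.12.16\ntest 13.12.16\n\n\n"
                + "<table dir=\"rtl\">\n<tr><td>footer</td></tr>\n</table>\n";
        // Prints the two "test 13.12.16" lines without the surrounding blank lines.
        System.out.println(between(text, "(email)", "<table"));
    }
}
```

For the message in the first post, the call would presumably be `between(text, "דואר אלקטרוני)", "<table")`, using the start and end strings named earlier in the thread. A capture group is used here instead of lookarounds only because the markers are consumed anyway; the lookbehind and lookahead form suggested above should give the same group.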
