Thread: Extract a float

    #1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2008
    Posts
    496
    Rep Power
    88


    Hello,

    I'm trying to extract a float from the following strings:
    -10K
    10K
    2M
    <0.2%

    Basically, there can be textual information before the float and textual information after it. The regexp I'm using now is:

    /(\-?[0-9]+\.?[0-9]+)[^0-9]*/

    I'm not very good at regexps, so I don't understand why it does not match the string "-1K" but does match "-10K". I guess it has something to do with the "\.?[0-9]+" part: the dot and the digits after it should form one unit, and the [0-9] should not be required if the \. is not found. How do I make this entire part optional?
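A quick way to see what is going on: `[0-9]+\.?[0-9]+` asks for at least one digit, an optional dot, and then at least one more digit, so it needs two digits in total. "-10K" supplies them ("1" and "0"), while "-1K" does not. The sketch below checks this with Python's `re` module, which is an assumption here since the post does not say which engine is in use; `first_number` is a hypothetical helper name.

```python
import re

# Pattern from the post, without the /.../ delimiters.
ORIGINAL = re.compile(r"(\-?[0-9]+\.?[0-9]+)[^0-9]*")

def first_number(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

samples = ["-10K", "10K", "2M", "<0.2%", "-1K"]
results = {s: first_number(ORIGINAL, s) for s in samples}

for s, r in results.items():
    print(s, "->", r)
```

Note that "2M" from the original list fails for the same reason as "-1K": it contains only one digit.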
  2. #2
  3. No Profile Picture
    Contributing User
    I've modified it to:

    /(\-?[0-9]+\.[0-9]+|\-?[0-9]+)[^0-9]*/

    It seems to work now, but I'm not sure whether it will cause any problems.
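One thing to watch with an alternation like this: backtracking engines such as Perl, PCRE, and Python try the alternatives left to right and keep the first one that lets the overall match succeed, so the longer form (with the fractional part) has to come first, as it does above. A small check, assuming Python's `re` as a stand-in for the engine actually in use; `REVERSED` is a hypothetical variant added only for contrast and does not appear in the thread.

```python
import re

# Pattern from the post: decimal alternative first, integer alternative second.
POSTED = re.compile(r"(\-?[0-9]+\.[0-9]+|\-?[0-9]+)[^0-9]*")
# Hypothetical variant with the two alternatives swapped.
REVERSED = re.compile(r"(\-?[0-9]+|\-?[0-9]+\.[0-9]+)[^0-9]*")

def number_part(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

posted = {s: number_part(POSTED, s) for s in ["-1K", "2M", "<0.2%"]}
swapped = number_part(REVERSED, "<0.2%")

print(posted)
print(swapped)  # the integer alternative wins and the fraction is lost
```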
  4. #3
  5. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,846
    Rep Power
    6351
    You basically just want to pull the number bits out of the string, including a sign (if any), right? You're overcomplicating it:
    Code:
    /-?[0-9]+(\.[0-9]+)?/
    -Dan
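To check that the optional group finds the same numbers as the alternation from the previous post, the two can be run side by side. This is only a sketch, assuming Python's `re`; the sample list adds "-1K" and a digit-free string to the ones from the first post, and `whole_match` is a hypothetical helper name.

```python
import re

ALTERNATION = re.compile(r"\-?[0-9]+\.[0-9]+|\-?[0-9]+")  # number part of post #2
OPTIONAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")              # pattern from post #3

def whole_match(pattern, text):
    """Return the full text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

samples = ["-10K", "10K", "2M", "<0.2%", "-1K", "abc"]
table = [(s, whole_match(ALTERNATION, s), whole_match(OPTIONAL, s)) for s in samples]

for row in table:
    print(row)
```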
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  6. #4
  7. No Profile Picture
    Contributing User
    Well, almost. I also wanted to extract the part that comes after the number, so I just added brackets around the last part, [^0-9]. But then I realised I was probably overcomplicating that too, and I could have done it with /[0-9]+([^0-9]*)$/ (or probably even without the [0-9]+ part).
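The two suffix-only variants mentioned here can be compared directly. A minimal sketch, assuming Python's `re` in place of whatever engine is actually in use; `suffix` is a hypothetical helper name.

```python
import re

WITH_DIGITS = re.compile(r"[0-9]+([^0-9]*)$")  # pattern from the post
WITHOUT_DIGITS = re.compile(r"([^0-9]*)$")     # the variant without the [0-9]+ part

def suffix(pattern, text):
    """Return the captured trailing text, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

samples = ["-10K", "2M", "<0.2%", "10", "abc"]
with_digits = [suffix(WITH_DIGITS, s) for s in samples]
without_digits = [suffix(WITHOUT_DIGITS, s) for s in samples]

print(with_digits)
print(without_digits)
```

The two agree whenever the string contains a digit. Without the `[0-9]+` part the pattern also matches strings that contain no number at all, returning the whole string as the "suffix".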
  8. #5
  9. Sarcky
    Oh, you want the bits afterward too. You don't even have to use [^0-9]; you can just use a dot:

    Code:
    /(-?[0-9]+(\.[0-9]+)?)(.+?)$/
    That puts the number (with any sign) into $1 and the rest into $3; $2 holds the optional fractional part.

    -Dan
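The group numbering of this pattern is easy to inspect by printing all captures. The sketch below assumes Python's `re` (the `$1`-style names in the post correspond to groups 1, 2, and 3 here); `groups_for` is a hypothetical helper name. It also shows two edge cases: `.+?` needs at least one character, and the dot accepts digits and minus signs.

```python
import re

# Pattern from the post, without the /.../ delimiters.
LAZY_DOT = re.compile(r"(-?[0-9]+(\.[0-9]+)?)(.+?)$")

def groups_for(text):
    """Return the tuple of all capture groups, or None if nothing matches."""
    m = LAZY_DOT.search(text)
    return m.groups() if m else None

print(groups_for("-10K"))   # number, fraction (None here), rest
print(groups_for("<0.2%"))  # the fraction ".2" lands in the second group
print(groups_for("10"))     # no suffix: the last digit is pushed into the rest
print(groups_for("1-2-3"))  # still matches, with "-2-3" as the rest
```

If strings without a suffix can occur, `(.*)` in place of `(.+?)` would avoid splitting "10" into "1" and "0"; that change is a suggestion, not something taken from the thread.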
  10. #6
  11. No Profile Picture
    Contributing User
    Thanks, that's a neat expression. But now I remember why I went to all the trouble. Sometimes there's a weird string like "1-2-3", and this should never be interpreted as a number (it shouldn't match at all). Should I add an additional expression such as [0-9]\-[0-9]\-[0-9] for this situation, or is it possible to modify your idea? The general rules are:

    * the string starts with non-numeric characters or nothing at all
    * followed by an optional minus sign
    * followed by a number (possibly with a decimal point)
    * the rest can't contain any digits or minus signs

    I think an expression based on these rules should be able to match all the variants. But then again, I'm not that good at regexps, so maybe I'm not thinking about it the right way.
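The four rules map onto a pattern almost one to one. A sketch using Python's `re.VERBOSE` flag (an assumption, since the thread does not name a language) so that each rule sits on its own commented line. The non-capturing `(?:...)` around the fraction is a choice made here so that the trailing text becomes group 2; `RULES` and `parse` are hypothetical names.

```python
import re

RULES = re.compile(
    r"""
    ^[^0-9\-]*              # rule 1: leading non-numeric text, or nothing
    (-?                     # rule 2: optional minus sign
     [0-9]+(?:\.[0-9]+)?)   # rule 3: digits with an optional fractional part
    ([^0-9\-]*)$            # rule 4: trailing text without digits or minus signs
    """,
    re.VERBOSE,
)

def parse(text):
    """Return (number_text, trailing_text), or None if the rules are violated."""
    m = RULES.match(text)
    return (m.group(1), m.group(2)) if m else None

for s in ["-10K", "10K", "2M", "<0.2%", "1-2-3"]:
    print(s, "->", parse(s))
```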


    Update

    I've modified your version of the regexp to this:
    /^[^0-9\-]*(-?[0-9]+(\.[0-9]+)?)([^0-9\-]*)$/

    It seems to work, but I still had to change the .+? part to [^0-9\-]* to make it work. If I didn't miss anything, that's probably the final solution. I'll add some more comments when I've fully tested it.
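For reference, here is one way the final pattern might be wrapped to return an actual float plus the trailing text. It is a sketch under the assumption that Python's `re` is an acceptable stand-in for the engine in use; `extract_float` is a hypothetical name. The pattern is kept exactly as posted, so the trailing text is in group 3 (group 2 is the optional fraction).

```python
import re

# Final pattern from the post, unchanged apart from dropping the /.../ delimiters.
FINAL = re.compile(r"^[^0-9\-]*(-?[0-9]+(\.[0-9]+)?)([^0-9\-]*)$")

def extract_float(text):
    """Return (value, trailing_text), or None when the string does not match."""
    m = FINAL.match(text)
    if m is None:
        return None
    # Group 1 is the signed number, group 3 the text after it.
    return float(m.group(1)), m.group(3)

for s in ["-10K", "10K", "2M", "<0.2%", "1-2-3", ".5"]:
    print(s, "->", extract_float(s))
```

One edge case to be aware of: a leading-dot value such as ".5" matches with the dot treated as leading text, so it is read as 5 rather than 0.5.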

    Update 2

    Well, I've tested every situation that might occur (unless I've forgotten something) and it works ;]

    Thanks a lot for your help!
    Last edited by murklys; October 5th, 2009 at 01:48 AM.
