#1
    parse csv like fgetcsv


    I'm trying to write a PHP regex that parses lines from a CSV file, similar to the fgetcsv function. Here is the regex:

    Code:
    (((?:")(([\d\w\s,]*|(?:(\\)")*)*([\d\w\s,]*|(?:(\\)")*)*)(?:(?<!\\)"))|([\d\w\s]*))(\s*)(?:,?)
    and here is the test string:

    Code:
      12  ,"this is a string, with ,comma"  ,   24, "yay",awesome, "this,is 
    a test", "string\"with\" double quote"
    I've been testing on http://regex101.com and it's very close to working. The problem is the last field ("string\"with\" double quote"): the match includes the backslashes before the escaped quotes, even though they are supposed to be in a non-capturing group. Any help is greatly appreciated.
#2
    A non-capturing group will not prevent an outer capturing group from capturing. All it will do is not create a new capturing group for it.

    A regex will not be able to get "string"with" double quote". That is going to require additional work.
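
    Both points can be seen in a minimal sketch. The pattern below is a simplified stand-in for a single quoted field, not the regex from the first post, and $pattern, $subject, and $m are names chosen only for this illustration. The inner (?:...) group gets no index of its own, but the text it matches still ends up in the outer group 1, backslashes included:

```php
<?php
// Simplified quoted-field pattern (an assumption for this sketch):
//   "                 opening quote
//   ( ... )           group 1: the raw field contents
//   (?:\\.|[^"\\])*   non-capturing: an escaped character, or any
//                     character that is neither a quote nor a backslash
//   "                 closing quote
$pattern = '/"((?:\\\\.|[^"\\\\])*)"/s';
$subject = '"string\"with\" double quote"';

preg_match($pattern, $subject, $m);

// Only two entries: the full match and group 1.
// The (?:...) group did not create a third one.
var_dump(count($m));

// Group 1 still contains the backslashes matched inside (?:...),
// because a match returns text exactly as it appears in the input.
echo $m[1], PHP_EOL;
```

    Since the match can only hand back the input text as written, removing the backslashes has to happen in a separate step.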
#3
    Originally Posted by requinix
    A non-capturing group will not prevent an outer capturing group from capturing. All it will do is not create a new capturing group for it.

    A regex will not be able to get "string"with" double quote". That is going to require additional work.
    By "additional work", does that imply that it will require additional parsing with PHP?
#4
    Should just be a matter of converting \\ to \ and \" to ".
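
    In plain PHP that conversion could look roughly like the sketch below. unescapeField is a hypothetical helper name, and it assumes the surrounding quotes have already been stripped from the field. strtr with an array does both replacements in one pass and does not rescan text it has already replaced, so an escaped backslash followed by an escaped quote is not converted twice:

```php
<?php
// Hypothetical helper: turn \\ into \ and \" into " in a single pass.
// Assumes $field is the raw text between the outer quotes.
function unescapeField(string $field): string
{
    return strtr($field, [
        '\\\\' => '\\', // two backslashes -> one backslash
        '\\"'  => '"',  // backslash + quote -> quote
    ]);
}

echo unescapeField('string\"with\" double quote'), PHP_EOL;
```

    Two chained str_replace calls would probably work for this test string too, but they can mangle a field where an escaped backslash sits directly before an escaped quote, which is why the sketch uses strtr.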
#5
    Originally Posted by requinix
    Should just be a matter of converting \\ to \ and \" to ".
    Just to clarify, you mean doing that with PHP, right? I don't know of a way to do that with a single regex.
#6
    If you can do it with PHP then you should. You could do it with a regex (a second one, after the first, run on each quoted field in the line) but it's simpler and easier not to.
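
    For comparison, the second-regex variant might look like the sketch below, run on each quoted field after the first regex has split the line. The name unescapeFieldRegex is hypothetical, and the field is assumed to be the raw text between the outer quotes:

```php
<?php
// Hypothetical helper: drop the backslash in front of an escaped
// quote or backslash. Assumes $field is the raw text between the quotes.
function unescapeFieldRegex(string $field): string
{
    // Pattern as PCRE sees it: \\(["\\])
    // A literal backslash, then a captured quote or backslash.
    return preg_replace('/\\\\(["\\\\])/', '$1', $field);
}

echo unescapeFieldRegex('string\"with\" double quote'), PHP_EOL;
```

    The extra layer of backslash escaping in the pattern string is a good part of why the non-regex route tends to be easier to read and maintain.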
#7
    Originally Posted by requinix
    If you can do it with PHP then you should. You could do it with a regex (a second one, after the first, run on each quoted field in the line) but it's simpler and easier not to.
    That makes sense, thank you!
