#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2008
    Posts
    58
    Rep Power
    7

    Quotation Marks in a CSV file


    Hi,

    I have a CSV file which has data stored similar to:

    Code:
    "r1c1", "r1c2", "r1c3"
    "r2c1", "r2c2", "r2c3"
    "r3c1", "r3c2", "r3c3"
    "r4c1", "r4c2", "r4c3"
    Fields are comma separated and rows are separated by a line break. A problem occurs, however, when I have data like this:

    Code:
    "r1c1", "r1c2 "quote something"", "r1, "quote", c3"
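To illustrate why a line like this is a problem, here is a minimal Perl sketch of a naive split on commas. The helper name `naive_split` is hypothetical and used only for illustration. It treats every comma as a field separator, so the commas inside the last field cut it into several pieces.

```perl
use strict;
use warnings;

# Hypothetical helper: treat every comma (plus trailing spaces) as a separator.
sub naive_split {
    my ($line) = @_;
    return split /,\s*/, $line;
}

my $line   = '"r1c1", "r1c2 "quote something"", "r1, "quote", c3"';
my @pieces = naive_split($line);

# The row has 3 fields, but the commas inside the last field yield 5 pieces.
print scalar(@pieces), " pieces\n";
print "[$_]\n" for @pieces;
```

A separator-only approach cannot tell a comma inside a field from one between fields, which is why the quoting has to be taken into account.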
    I am a bit stuck on what I need to do. Obviously, I need to change the data to:

    Code:
    "r1c1", "r1c2 "quote something"", "r1, "quote", c3"
    Is there a way to do this using regular expressions?

    Any help is greatly appreciated!

    Thanks,

    Ian
  2. #2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2005
    Posts
    227
    Rep Power
    17
    From my limited knowledge, I would say the easiest and most reliable solution would be to escape each record when the CSV file is made.

    1. split the records into an array
    2. loop through the array and "escape" each row
    3. join the array again

    This would allow for rogue single " characters and all sorts of different possibilities which might be impossible to catch with a regex.
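The three steps above can be sketched in Perl roughly as follows, assuming you do control how the file is written. The helper names `escape_field` and `make_row` and the sample records are hypothetical. The sketch escapes an embedded quote by doubling it, which is the convention described in RFC 4180; other consumers may expect a different escape.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one raw value in quotes, doubling embedded quotes.
sub escape_field {
    my ($field) = @_;
    $field =~ s/"/""/g;
    return qq{"$field"};
}

# Hypothetical helper: build one CSV row from raw field values.
# The ", " separator mirrors the format in this thread; strict parsers
# may require a bare comma instead.
sub make_row {
    my @fields = @_;
    return join ', ', map { escape_field($_) } @fields;
}

# 1. the records as an array (each record is an array of raw values)
my @records = (
    [ 'r1c1', 'r1c2 "quote something"', 'r1, "quote", c3' ],
    [ 'r2c1', 'r2c2', 'r2c3' ],
);

# 2. loop through the array and escape each row
my @rows = map { make_row(@$_) } @records;

# 3. join the array again
print join("\n", @rows), "\n";
```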

    (I deleted my example because it didn't catch the first record.)
    Last edited by ryel01; November 11th, 2009 at 09:47 PM.
  4. #3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2008
    Posts
    58
    Rep Power
    7
    Originally Posted by ryel01
    From my limited knowledge, I would say the easiest and most reliable solution would be to escape each record when the CSV file is made.
    Thanks for your suggestion! Unfortunately, I don't have control over the CSV when it is created; I can only deal with the data I am supplied, which is in the above format!

    Any other ideas?
  6. #4
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2007
    Posts
    765
    Rep Power
    929
    I don't think you can unambiguously parse that since you have quotes within quotes without any escaping.

    My best shot would be to assume that quotes will always be paired in a string:

    Code:
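    use strict;
    use warnings;

    my $line = '"r1c1", "r1c2 "quote something"", "r1, "quote", c3"';
    # A field ends at a quote that is followed by a comma or the end of the line;
    # any quotes inside a field are assumed to come in pairs.
    while( $line =~ /" ( (?: [^"]+ | " [^"]* " )*? ) " (?: , | $ )/xg ) {
        print "$1\n";
    }
    
    # output:
    # r1c1
    # r1c2 "quote something"
    # r1, "quote", c3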
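Building on the regex above, here is a rough sketch of how the recovered fields could be written back out with the inner quotes escaped. The helper names `split_fields` and `repair_line` are hypothetical, and doubling the quotes is an assumption about what the consumer of the file expects. It only works under the same assumption as the regex: quotes inside a field always come in pairs.

```perl
use strict;
use warnings;

# Hypothetical helper: extract fields with the regex above, relying on the
# same assumption that inner quotes always come in pairs.
sub split_fields {
    my ($line) = @_;
    my @fields;
    while ( $line =~ /" ( (?: [^"]+ | " [^"]* " )*? ) " (?: , | $ )/xg ) {
        push @fields, $1;
    }
    return @fields;
}

# Hypothetical helper: rebuild the line with embedded quotes doubled.
sub repair_line {
    my ($line) = @_;
    my @escaped;
    for my $field ( split_fields($line) ) {
        ( my $copy = $field ) =~ s/"/""/g;
        push @escaped, qq{"$copy"};
    }
    return join ', ', @escaped;
}

print repair_line('"r1c1", "r1c2 "quote something"", "r1, "quote", c3"'), "\n";
# prints: "r1c1", "r1c2 ""quote something""", "r1, ""quote"", c3"
```

A line with an unpaired inner quote will not split the way you expect, so it is worth checking the field count of each repaired row against the expected number of columns.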
