Quotation Marks in a CSV file
November 11th, 2009, 11:01 AM
Hi,
I have a CSV file which has data stored similar to:
Code:
"r1c1", "r1c2", "r1c3"
"r2c1", "r2c2", "r2c3"
"r3c1", "r3c2", "r3c3"
"r4c1", "r4c2", "r4c3"
Fields are comma separated and rows are separated by a line break. A problem occurs, however, when I have data like this:
Code:
"r1c1", "r1c2 "quote something"", "r1, "quote", c3"
I am a bit stuck on what I need to do. Obviously I need to change the data to:
Code:
"r1c1", "r1c2 "quote something"", "r1, "quote", c3"
Is there a way to do this using regular expressions?
Any help is greatly appreciated!
Thanks,
Ian

November 11th, 2009, 08:36 PM
From my limited knowledge, I would say the easiest and most reliable solution would be to escape each record when the CSV file is made.
1. split the records into an array
2. loop through the array and "escape" each row
3. join the array again
This would allow for rogue single " characters and all sorts of different possibilities which might be impossible to catch with a regex.
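For illustration, here is a minimal sketch of what escaping at creation time could look like. Python and its standard `csv` module are assumptions here (the thread does not name a language), and `write_rows` is a hypothetical helper name. The writer doubles any embedded quotation mark, which is the usual CSV escaping convention, so the data can be read back without ambiguity.

```python
import csv
import io

def write_rows(rows):
    """Serialize rows as CSV, quoting every field and doubling embedded quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

rows = [["r1c1", 'r1c2 "quote something"', 'r1, "quote", c3']]
text = write_rows(rows)

# Reading the text back recovers the original fields exactly.
assert list(csv.reader(io.StringIO(text))) == rows
```

This only helps when you control how the file is produced.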
DELETED MY EXAMPLE - IT DIDN'T CATCH THE FIRST RECORD.
Last edited by ryel01 : November 11th, 2009 at 08:47 PM.

November 12th, 2009, 05:54 AM
Quote: Originally posted by ryel01: From my limited knowledge, I would say the easiest and most reliable solution would be to escape each record when the CSV file is made.
Thanks for your suggestion! Unfortunately I don't have control over the CSV when it is created; I can only deal with the data I am supplied, which is in the above format!
Any other ideas?

November 12th, 2009, 04:11 PM
I don't think you can unambiguously parse that since you have quotes within quotes without any escaping.
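A small sketch of why the format is ambiguous, written in Python as an assumption (the thread does not name a language). `naive_join` is a hypothetical stand-in for whatever produced the file: it wraps each field in quotes without escaping anything. Two different records then serialize to the same line, so no parser can tell them apart.

```python
def naive_join(fields):
    """Wrap each field in quotes with no escaping, as the supplied file appears to do."""
    return ", ".join('"' + field + '"' for field in fields)

two_fields = ["a", "b"]      # two separate fields
one_field = ['a", "b']       # one field that happens to contain quote-comma-quote

# Both records produce the identical line "a", "b".
assert naive_join(two_fields) == naive_join(one_field) == '"a", "b"'
```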
My best shot would be to assume that quotes will always be paired in a string:
Code:
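use strict;
use warnings;

my $line = '"r1c1", "r1c2 "quote something"", "r1, "quote", c3"';
# A field is plain text and/or paired "..." runs, closed by a quote
# that is followed by a comma or the end of the line.
while ( $line =~ /" ( (?: [^"]+ | " [^"]* " )*? ) " (?: , | $ )/xg ) {
    print "$1\n";    # one field per line
}
# output:
# r1c1
# r1c2 "quote something"
# r1, "quote", c3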
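For readers not using Perl, here is a rough Python sketch that uses the same pattern and the same assumption (quotes inside a field always come in pairs). It then goes one step further and writes the fields back out with embedded quotes doubled. The doubling convention and the helper names `split_fields` and `repair` are assumptions for illustration, not something the thread specifies.

```python
import csv
import io
import re

FIELD = re.compile(
    r'''
    "                                # opening quote of the field
    ( (?: [^"]+ | " [^"]* " )*? )    # plain text or a paired "..." run, as few as possible
    "                                # closing quote of the field
    (?: , | $ )                      # followed by a comma or the end of the line
    ''',
    re.VERBOSE,
)

def split_fields(line):
    """Return the fields of one line, assuming inner quotes are always paired."""
    return FIELD.findall(line)

def repair(line):
    """Re-emit the line as CSV with every embedded quote doubled."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(split_fields(line))
    return buffer.getvalue()

line = '"r1c1", "r1c2 "quote something"", "r1, "quote", c3"'
print(split_fields(line))
print(repair(line), end="")
```

If a field contains an unpaired quote, the pairing assumption breaks and the split will be wrong, so it is worth checking the field count of each row after parsing.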