Regex Programming
 
#1  urbantricker, November 11th, 2009, 11:01 AM
Quotation Marks in a CSV file

Hi,

I have a CSV file which has data stored similar to:

Code:
"r1c1", "r1c2", "r1c3"
"r2c1", "r2c2", "r2c3"
"r3c1", "r3c2", "r3c3"
"r4c1", "r4c2", "r4c3"


Fields are comma-separated and rows are separated by a line break. A problem occurs, however, when I have data like this:

Code:
"r1c1", "r1c2 "quote something"", "r1, "quote", c3"


I am a bit stuck on what to do. Obviously I need to change the data to:

Code:
"r1c1", "r1c2 "quote something"", "r1, "quote", c3"


Is there a way to do this using regular expressions?

Any help is greatly appreciated!

Thanks,

Ian

#2  ryel01, November 11th, 2009, 08:36 PM
From my limited knowledge, I would say the easiest and most reliable solution would be to escape each record when the CSV file is made.

1. split the records into an array
2. loop through the array and "escape" each row
3. join the array again

This would handle stray single " characters and many other cases that might be impossible to catch with a regex.
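
As a rough illustration of escaping at creation time, here is a minimal Python sketch. The thread does not say what language produces the file, so Python and its standard csv module are assumptions here, as are the helper names write_csv and read_csv. The writer doubles any embedded quote, which is the usual CSV convention, and a standard reader then recovers the original fields.

```python
import csv
import io

# Sample rows taken from the thread; the second and third fields of the
# first row contain embedded quotes and commas.
rows = [
    ["r1c1", 'r1c2 "quote something"', 'r1, "quote", c3'],
    ["r2c1", "r2c2", "r2c3"],
]

def write_csv(rows):
    """Serialize rows, quoting every field and doubling embedded quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def read_csv(text):
    """Parse text produced by write_csv back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))

text = write_csv(rows)
print(text.splitlines()[0])
# "r1c1","r1c2 ""quote something""","r1, ""quote"", c3"
```

Because every inner quote is doubled on the way out, the reader never has to guess where a field ends.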

(Deleted my example; it didn't catch the first record.)

Last edited by ryel01 : November 11th, 2009 at 08:47 PM.

#3  urbantricker, November 12th, 2009, 05:54 AM
Quote:
Originally Posted by ryel01
From my limited knowledge, I would say the easiest and most reliable solution would be to escape each record when the CSV file is made.



Thanks for your suggestion! Unfortunately, I don't have control over how the CSV is created; I can only deal with the data I am supplied, which is in the above format.

Any other ideas?

#4  OmegaZero, November 12th, 2009, 04:11 PM
I don't think you can unambiguously parse that, since you have quotes within quotes without any escaping.

My best shot would be to assume that quotes will always be paired in a string:

Code:
# Assumes inner quotes always come in pairs within a field.
my $line = '"r1c1", "r1c2 "quote something"", "r1, "quote", c3"';
while( $line =~ /" ( (?: [^"]+ | " [^"]* " )*? ) " (?: , | $ )/xg ) {
    print "$1\n";   # one captured field per line
}

# output:
# r1c1
# r1c2 "quote something"
# r1, "quote", c3
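
For anyone not working in Perl, the same idea can be sketched in Python. This is only a port of the pattern above under the same assumption (inner quotes always come in pairs); the names FIELD, split_fields and repair_line are made up for the example. It also takes one extra step that the Perl snippet does not: it rewrites each line with embedded quotes doubled, so that an ordinary CSV parser can read the result.

```python
import csv
import re

# Same pattern as the Perl version, written with re.VERBOSE.
FIELD = re.compile(r'''
    "                  # opening quote of the field
    (                  # capture the field body
      (?: [^"]+        #   a run of non-quote characters
        | " [^"]* "    #   or one complete inner quoted pair
      )*?
    )
    "                  # closing quote of the field
    (?: , | $ )        # then a comma or the end of the line
''', re.VERBOSE)

def split_fields(line):
    """Return the field bodies, assuming inner quotes are always paired."""
    return [m.group(1) for m in FIELD.finditer(line)]

def repair_line(line):
    """Re-emit the line with embedded quotes doubled (standard CSV escaping)."""
    fields = split_fields(line)
    return ",".join('"' + f.replace('"', '""') + '"' for f in fields)

line = '"r1c1", "r1c2 "quote something"", "r1, "quote", c3"'
fields = split_fields(line)
repaired = repair_line(line)
print(fields)
# ['r1c1', 'r1c2 "quote something"', 'r1, "quote", c3']
print(repaired)
# "r1c1","r1c2 ""quote something""","r1, ""quote"", c3"

# The repaired line round-trips through a standard CSV reader.
reparsed = next(csv.reader([repaired]))
```

If a field contains an odd number of inner quotes, the pairing assumption breaks and the split will be wrong, so it is worth checking the field count per row after repairing.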
