#1
    Convert line like 'a,,c,,,,g' to 'a,"",c,"","","",g'


    Suppose I have a csv line like: a,,c,,,,g

    Two consecutive commas mean the field between them is empty. I need to fill each empty field with an empty quoted string (two consecutive double quotes). For the line above, I am looking for a solution that produces: a,"",c,"","","",g

    How do I do that with regular expressions?

    I tried the pattern:
    /,,/g
    with the replacement:
    ,"",

    but that misses some of the repeated commas. It gives back:
    a,"",c,"",,"",g <-- notice there is still an instance of two consecutive commas

    Thanks,
    Joe
#2
    Try this:
    Code:
    echo "a,,c,,,,g,h,,j,k" | sed -e 's/,,/,"",/g' -e 's/,,/,"",/g'
#3
    Interesting. I was ready to respond that you're simply applying the same regular expression twice, and that it wouldn't be able to handle a more extreme line like:
    a,,c,,,,g,h,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,zz

    I was thinking such a line would require running the regex pattern many times. But then I ran your script and it worked on that problem also.

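    What I failed to realize is that the regex, when run twice, can handle any number of consecutive commas. The first pass leaves only isolated pairs of commas behind, each with a quote on both sides, and the second pass fills those in. Works for me!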
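    For anyone who wants to check the two-pass behaviour outside of sed, here is a minimal sketch in Python. The function name fill_empty_two_pass is made up for illustration, and it assumes plain fields with no quoted commas inside them.

```python
import re


def fill_empty_two_pass(line: str) -> str:
    """Apply the ',,' -> ',"",' substitution twice, like the two sed -e expressions."""
    for _ in range(2):
        # Each pass replaces non-overlapping pairs of commas, left to right.
        line = re.sub(r",,", ',"",', line)
    return line


print(fill_empty_two_pass("a,,c,,,,g"))
```

    After the first pass, any leftover pair of commas sits between two quotes, so no two leftover pairs can overlap and the second pass catches all of them.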

    Thank you!
#4
    The problem was that the regex engine continues from where the previous match stopped. The previous match consumed the second comma, so if that comma was also the start of another empty field, the engine could not match it again and that field was not replaced.
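    To see the consumed comma concretely, here is a small Python sketch that lists where the original pattern matches on the line from the first post. Python is used only for illustration; the matching rule is the same non-overlapping, left-to-right scan.

```python
import re

line = "a,,c,,,,g"

# Matches of ',,' never overlap: each one starts at or after the end of the last.
spans = [m.span() for m in re.finditer(r",,", line)]
print(spans)  # [(1, 3), (4, 6), (6, 8)]

# The pair starting at index 5 is never tried, because index 5 was already
# consumed by the match at (4, 6). That is the empty field left unfilled.
result = re.sub(r",,", ',"",', line)
print(result)  # a,"",c,"",,"",g
```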

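    A lookahead would solve that, since it checks for the next comma without consuming it. It needs an engine that supports lookaheads, such as Perl or PCRE; sed does not.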
    Code:
    s/,(?=,)/,""/g
    Note that nothing presented so far will work on the first field if it is empty (a line that starts with a comma), and the same goes for an empty last field, but those may not be possible in your data.
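    If empty first or last fields can occur, something along these lines might cover them. This is only a sketch in Python: the function name fill_empty is made up, filling the edge fields with "" is an assumption about what is wanted, and it does not handle fields that are already quoted and contain commas.

```python
import re


def fill_empty(line: str) -> str:
    """Fill empty comma-separated fields with "" (naive: no quoted-comma handling)."""
    # A comma followed by another comma marks an empty field. The lookahead
    # tests for the second comma without consuming it, so one pass is enough.
    line = re.sub(r",(?=,)", ',""', line)
    # Assumption: an empty first field (line starts with a comma) also gets "".
    line = re.sub(r"^(?=,)", '""', line)
    # Assumption: an empty last field (line ends with a comma) also gets "".
    line = re.sub(r",$", ',""', line)
    return line


print(fill_empty("a,,c,,,,g"))
print(fill_empty(",a,,b,"))
```

    The two edge cases are kept as separate substitutions so they are easy to drop if the data never starts or ends with a comma.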
