Regex Programming
 
  #1  
June 10th, 2009, 10:20 AM
xyexz
Contributing User
Join Date: May 2004
Posts: 45
Python Regex - Replace all occurrences of multiple spaces with dashes

Here is my scenario:
May 24th 2009 10:10:00 PM something something 10.10.10.10 url uri port something something "something 80 info" 10.10.10.10 something

I need the above string to have all occurrences of "   " (3 spaces) replaced with " - - " (space dash space dash space) and "  " (2 spaces) replaced with " - " (space dash space).

The trick is if the spaces are in a set of double quotes I need them to be ignored.

I've tried lots of things and I'm just not seeing what to do. Sorry, I really have no starting regex to show.

I've got code I use now to take care of it, but it takes more than one regex; my goal is to use one regex if it can be done.

Anyone got some ideas?

  #2  
June 10th, 2009, 11:46 AM
salem
Contributed User
Join Date: Jun 2005
Posts: 3,839
> I've got code I use now to take care of it, but it's more than one regex, my goal is to use one regex if it can be done.
So use it, and get on with solving more pressing problems.

You're not going to buy anything by making it one hellishly complicated regex. Even if you manage to create it, you'll need half a page of comment describing it in nauseating detail if you want to have any hope of changing it in the future (as early as next week say).

If it's a few simple steps, you'll be able to change any one of them in a few seconds with minimal effort.

Some hairy monster of the kind you're trying to get would take you another week to figure out.
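
To make the "few simple steps" idea concrete, here is a minimal sketch of one possible multi-step version. It is not the original poster's existing code, which was never shown in the thread. The function name dash_fill_steps is made up for illustration, and the sketch assumes the double quotes in the input are balanced.

```python
import re


def dash_fill_steps(text):
    """Replace runs of spaces outside double quotes, one simple step at a time.

    Hypothetical helper for illustration; assumes balanced double quotes.
    """
    # Step 1: split on the quote character. With balanced quotes, the
    # even-indexed parts lie outside quotes and the odd-indexed parts inside.
    parts = text.split('"')

    # Step 2: in the outside parts only, turn every space that is followed
    # by another space into " -".
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r" (?= )", " -", parts[i])

    # Step 3: put the quotes back.
    return '"'.join(parts)


print(dash_fill_steps('abc  def   ghi "ab   cd" pkm'))
```

Each step can be changed on its own, which is the maintainability argument made above.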

  #3  
June 11th, 2009, 06:06 AM
xyexz
Quote:
Originally Posted by salem
> > I've got code I use now to take care of it, but it's more than one regex, my goal is to use one regex if it can be done.
> So use it, and get on with solving more pressing problems.
>
> You're not going to buy anything by making it one hellishly complicated regex. Even if you manage to create it, you'll need half a page of comment describing it in nauseating detail if you want to have any hope of changing it in the future (as early as next week say).
>
> If it's a few simple steps, you'll be able to change any one of them in a few seconds with minimal effort.
>
> Some hairy monster of the kind you're trying to get would take you another week to figure out.
That's all well and good, but I was just curious whether someone had an actual solution, not a paragraph trying to talk me out of it haha, no offense.
You tell me to "get on with solving more pressing problems", as if you know what problems I have?
As far as "hellishly complicated regex" goes, who says it has to be complicated? In my experience some of the crazier-sounding regexes often end up looking very simple, followed by a moment of "Ah....". I will gain something from the regex: fewer lines and lower resource use than what I have now.
Trust me when I say this: I don't need a lot of comments with my code, and when I do use comments, clear and succinct is my goal, not long and nauseating.
Thank you for your words.

  #4  
June 11th, 2009, 07:01 AM
prometheuzz
User 165270
Join Date: Oct 2005
Posts: 497
I concur with the previous poster: doing this in one regex isn't advisable. Especially when your strings are large, this will lead to poor performance.

But this will do the trick:

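python Code:
import re

text = 'abc  def   ghi "ab   cd" pkm'
# Match a space that is followed by another space (first lookahead) and by an
# even number of double quotes up to the end of the string (second lookahead),
# i.e. a space that lies outside any quoted section.
text = re.sub(r' (?= )(?=(?:[^"]*"[^"]*")*[^"]*$)', " -", text)
print(text)

# output:
# abc - def - - ghi "ab   cd" pkm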


Note that it also replaces four successive spaces (outside quotes) with " - - - ", and so on for five or more successive spaces.
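
As a quick check of that note, the sketch below wraps the same pattern in a precompiled regex and prints the result for runs of two to five spaces. The names OUTSIDE_QUOTES_GAP and dash_fill are illustrative and do not come from the post.

```python
import re

# Same pattern as above, compiled once so it can be reused in a loop.
# (Illustrative name; not from the original post.)
OUTSIDE_QUOTES_GAP = re.compile(r' (?= )(?=(?:[^"]*"[^"]*")*[^"]*$)')


def dash_fill(text):
    """Replace each space that precedes another space, outside quotes, with ' -'."""
    return OUTSIDE_QUOTES_GAP.sub(" -", text)


# A run of n spaces becomes n - 1 dashes, each surrounded by single spaces.
for n in range(2, 6):
    print(repr(dash_fill("a" + " " * n + "b")))

# The same run inside double quotes is left alone.
print(repr(dash_fill('a "b    c" d')))
```

If only runs of exactly two or three spaces should be rewritten, the pattern would need an extra condition. The thread does not say which behaviour is wanted for longer runs.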

  #5  
June 11th, 2009, 07:18 AM
xyexz
prometheuzz, thanks so much for this. I was on the right track with the positive lookaheads but lacked the extra positive lookahead at the beginning.
I know that this regex wouldn't be highly efficient on large strings given the greedy (*) matches, but sometimes you can't get around using a greedy regex, and this would be one of those times.
My string data doesn't usually get over 300 chars in length.

Thanks again!

  #6  
June 11th, 2009, 07:25 AM
prometheuzz
Quote:
Originally Posted by xyexz
> prometheuzz, thanks so much for this, I was on the right track with the positive lookaheads but lacked the extra positive lookahead at the beginning. I know that this regex wouldn't be highly efficient on large strings given the (*) greedy matches but sometimes you can't get around using greedy regex,


It's more the lookaheads and their contents that make this regex not too efficient: for every space that is followed by another space, the second lookahead scans ahead to the end of the string, which makes the running time quadratic in Big-O terms, while a linear-time algorithm is easily crafted "by hand".
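
A hand-written linear-time version might look like the sketch below. It walks the string once and tracks whether it is currently inside double quotes. The function name dash_fill_linear is hypothetical. It decides "inside quotes" from the quotes seen so far, whereas the regex counts the quotes still ahead, so the two only agree when the quotes are balanced, which is assumed here.

```python
def dash_fill_linear(text):
    """Single left-to-right pass, O(n) in the length of the string.

    Hypothetical helper for illustration; assumes balanced double quotes.
    """
    out = []
    in_quotes = False
    last = len(text) - 1
    for i, ch in enumerate(text):
        if ch == '"':
            # Every double quote toggles the inside/outside state.
            in_quotes = not in_quotes
        if ch == " " and not in_quotes and i < last and text[i + 1] == " ":
            # A space outside quotes that is followed by another space.
            out.append(" -")
        else:
            out.append(ch)
    return "".join(out)


print(dash_fill_linear('abc  def   ghi "ab   cd" pkm'))
```

Each character is inspected a constant number of times, so the cost grows linearly with the input rather than quadratically.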

Quote:
Originally Posted by xyexz
> this would be one of those times.
> My string data doesn't usually get over 300 chars in length.
>
> Thanks again!


Ah, 300 characters is indeed peanuts.

Last edited by prometheuzz : June 11th, 2009 at 08:11 AM.

  #7  
June 13th, 2009, 12:03 PM
ghostdog74
Contributing User
Join Date: Apr 2006
Posts: 177
Why make it so complicated? There's no need for a regular expression. Use the csv module.
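Code:
import csv

filename = "file"
with open(filename, newline="") as handle:
    # Splitting on single spaces keeps a double-quoted section together as one
    # field, and every extra space in a run shows up as an empty field.
    for row in csv.reader(handle, delimiter=" "):
        fields = []
        for item in row:
            if item == "":
                item = "-"              # one dash per extra space
            elif " " in item:
                # csv stripped the quotes; restore them (a quoted field
                # without spaces stays unquoted)
                item = '"%s"' % item
            fields.append(item)
        print(" ".join(fields))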

  #8  
June 15th, 2009, 07:55 AM
xyexz
I'll take a look at your solution, but to be honest one line of regex looks less complicated than what you just posted. Also, you can't use a single space as your delimiter because spaces inside of double quotes must not be touched. If this doesn't touch those spaces then great!
I have the regex inside of a loop (regex compiled outside of the loop), so I'm not sure which would be more efficient: something like this, or the regex.
Also, I have no file reference, it's just a string. Can you supply this function with just a string?
Thanks for the suggestion!
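
On the string question: csv.reader accepts any iterable of lines, not only a file object, so a one-element list holding the string works. The sketch below is one way the csv idea could be applied to a single string. The function name dash_fill_csv is made up for illustration. It assumes quoted sections are separated from their neighbours by spaces, and it does not preserve quotes around a quoted section that contains no spaces.

```python
import csv


def dash_fill_csv(text):
    """Apply the csv approach to one string (hypothetical helper)."""
    # With a single space as delimiter, a double-quoted section stays together
    # as one field, and each extra space in a run produces an empty field.
    row = next(csv.reader([text], delimiter=" "))
    fields = []
    for item in row:
        if item == "":
            fields.append("-")                  # one dash per extra space
        elif " " in item:
            fields.append('"' + item + '"')     # csv stripped the quotes
        else:
            fields.append(item)
    return " ".join(fields)


print(dash_fill_csv('abc  def   ghi "ab   cd" pkm'))
```

The spaces inside the quoted section are not touched, because the csv parser never splits there.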

  #9  
June 17th, 2009, 07:09 AM
ghostdog74
Quote:
Originally Posted by xyexz
> I'll take a look at your solution but to be honest one line of regex looks less complicated than what you just posted.

I will give you an analogy: reading an essay written in English versus reading an essay full of numbers, where the numbers represent the letters of the alphabet. Which is more complicated? That's what regex does: it uses symbols to represent logic. It's OK for short expressions, but if your string manipulation requirement gets more complex, being more verbose will help you a lot. How much time have you wasted coming up with that regular expression?

Quote:
> Also you can't use a single space as your delimiter because spaces inside of double quotes must not be touched. If this doesn't touch those spaces then great!

The module is there for you to experiment with and find out for yourself what's best. I am only giving you an example of how you can do it the easier way.
