#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0

    Need code to remove text strings


    Hi,

    I'm new to Python, as I am to any other programming language.
    I'd appreciate some code to remove all sentences in inverted commas (quoted speech) from a text.

    Many thanks!
    Gabriele
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    157
    Rep Power
    2
    It seems like a good application of the re module:
    Code:
    >>> s="abc def ghi \"ajjfj aljjfoweul sjfajlj\" dklio ioia"
    >>> s
    'abc def ghi "ajjfj aljjfoweul sjfajlj" dklio ioia'
    >>> import re
    >>> pat=r'\"[^\"]*\"'
    >>> m=re.search(pat,s)
    >>> m.group(0)
    '"ajjfj aljjfoweul sjfajlj"'
    >>>
    So to really remove them, you would use:
    Code:
    re.sub(pat,"",s)
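A minimal sketch of the substitution above, wrapped in a function. The name remove_quoted is hypothetical, and collapsing the leftover double space is an added assumption about the desired output, not part of the original answer. The pattern is the same one, written without the backslashes, which are not needed inside a raw string delimited by single quotes.

```python
import re

# Same pattern as in the post above, without the redundant backslashes.
QUOTED = re.compile(r'"[^"]*"')

def remove_quoted(text):
    """Return text with every double-quoted span removed (hypothetical helper)."""
    stripped = QUOTED.sub("", text)
    # Assumption: collapse the run of spaces left where a quoted span was cut out.
    return re.sub(r" {2,}", " ", stripped)

s = 'abc def ghi "ajjfj aljjfoweul sjfajlj" dklio ioia'
print(remove_quoted(s))  # abc def ghi dklio ioia
```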
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0

    Removing a text string...


    Hi Rrashkin,

    What does the entire code look like? Does it start something like:



    filename="C:/.../python/Dropcity.txt"
    fileobj=file(filename, "r")
    ptext=fileobj.read()
    fileobj.close()

    and then the rest of your code?

    Sorry for asking about such basic steps, but I really am a beginner.

    Thanks,
    Gabriele


  6. #4
  7. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    157
    Rep Power
    2
    gabrielemucho,
    Basically yes, except that I think
    Code:
    fileobj=file(filename, "r")
    should be
    Code:
    fileobj=open(filename, "r")
    So I think it would look like this:
    Code:
    import re
    filename="C:/.../python/Dropcity.txt"
    fileobj=open(filename, "r")
    ptext=fileobj.read()
    fileobj.close()
    pat=r'\"[^\"]*\"'
    ptext=re.sub(pat,"",ptext)
    Try that and see if it gives you anything plausible. There may be trouble with any linefeeds so perhaps we'll need to look into something there.
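On the linefeed concern: a negated character class such as [^"] also matches newline characters, so a quotation that runs over a line break should still be removed in one piece. The sketch below illustrates this under stated assumptions: the sample text is invented, and a temporary file stands in for the real one so the example is self-contained. It also uses a with-statement, which closes the file automatically.

```python
import os
import re
import tempfile

pat = r'"[^"]*"'

# Hypothetical sample text standing in for the real file; the quoted
# speech deliberately runs over a line break.
sample = 'He said "hello\nthere" and left.\nShe nodded.\n'

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    filename = tmp.name

# The with-statement closes the file even if read() raises.
with open(filename, "r") as fileobj:
    ptext = fileobj.read()
os.remove(filename)

# [^"] also matches "\n", so the two-line quotation is removed as a whole.
ptext = re.sub(pat, "", ptext)
print(ptext)
```

What this pattern cannot handle is an unbalanced quotation mark: a single stray `"` would pair up with the next one and remove the text in between.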
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0
    hmmm, what should I write for the print command?
    G

  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0
    well, I think I've figured it out:

    print (ptext)

    It returned the text with the speech in inverted commas left out, so it seems to work.

    Many thanks!
    G


  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0
    Hi Rrashkin,

    What would be the reverse code, for selecting only those text strings that are surrounded by inverted commas?

    Thanks,
    Gabriele
  14. #8
  15. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    157
    Rep Power
    2
    Use:
    Code:
    m=re.findall(pat,s)
    instead of the "re.sub()" stuff. Then "m" should be a list of all the matches.
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0
    Hi,

    s is not defined, so the code does not work. How and when should I define s?

    Thanks,
    Gabriele

  18. #10
  19. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    157
    Rep Power
    2
    Sorry, that should have been "ptext".
  20. #11
  21. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    15
    Rep Power
    0
    Hi, no, this does not work either.

    It returns the error as follows: 'list' object has no attribute 'group'.

    But if I use this at the end of the code:
    m=re.search(pat, ptext)
    print m.group ()

    It returns only the very first sentence in inverted commas, and I need the output for all such sentences in the text.

  22. #12
  23. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    157
    Rep Power
    2
    The findall() function returns a list of the matched strings rather than a match object, so there is no group() method to call on it. Sorry, I should have mentioned that earlier. So you would just do:
    Code:
    m=re.findall(pat,ptext)
    for i in m: print i
    unless you're using Python 3, in which case print() is a function, so
    Code:
    m=re.findall(pat,ptext)
    for i in m: print(i)
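To make the difference between the list returned by findall() and a match object concrete, here is a short sketch in Python 3. The sample text and the variable names with_quotes, without_quotes and positions are assumptions for illustration. It also shows two variations not in the posts above: a capturing group, which makes findall() return the text without the surrounding inverted commas, and finditer(), which yields match objects for cases where group() or the match position is wanted.

```python
import re

# Hypothetical sample text; in the thread this would be the file contents.
ptext = 'He said "hello" and she said "goodbye" before leaving.'

# With no group in the pattern, findall() returns the matched strings,
# quotation marks included.
with_quotes = re.findall(r'"[^"]*"', ptext)

# With one capturing group, findall() returns only the group's text.
without_quotes = re.findall(r'"([^"]*)"', ptext)

# finditer() yields match objects, so group() and start() are available.
positions = [(m.start(), m.group(0)) for m in re.finditer(r'"[^"]*"', ptext)]

for quote in without_quotes:
    print(quote)
```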
