Thread: Sed Problem

    #1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2003
    Posts
    22
    Rep Power
    0

    Sed Problem


    Hi, I am facing a problem with the sed command. I want to find a string and replace it within a file.

    Eg.

    from
    <a href="BuildPage.cgi?body=index.html">

    change to

    <a href=BuildPage.cgi?body=index.html">

    What command should I use? Thanks for helping.
    #2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    137
    Rep Power
    0
    echo <string> | sed 's/"//'

    But I guess you want to delete the second " too:

    echo <string> | sed 's/"//g'
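
    To illustrate the difference between the two commands above, here is a minimal sketch. The sample line is the one from the first post; the variable names are only illustrative.

```shell
# Sample input line from the first post (held in a variable for illustration).
line='<a href="BuildPage.cgi?body=index.html">'

# Without the g flag, sed replaces only the first match on each line.
first_only=$(printf '%s\n' "$line" | sed 's/"//')

# With the g flag, every match on the line is replaced.
all_quotes=$(printf '%s\n' "$line" | sed 's/"//g')

printf '%s\n' "$first_only"   # <a href=BuildPage.cgi?body=index.html">
printf '%s\n' "$all_quotes"   # <a href=BuildPage.cgi?body=index.html>
```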
    #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2003
    Posts
    22
    Rep Power
    0

    Thanks!


    Yes, actually I want to remove both double quotes here.

    But the file I want to change has lots of double quotes, and I only want to change the double quotes of the href attribute. So how do I do that? Is it like this:

    sed 's/href=\"*.*\"/href=*.*/'
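
    A note on the attempt above, as a hedged sketch: in the replacement part of a sed s command, * and . are literal characters, not wildcards, so the matched text is not carried over. The snippet shows what that expression produces on the sample line. It is written without the backslashes before the quotes, which are not needed inside single quotes; the variable names are illustrative.

```shell
line='<a href="BuildPage.cgi?body=index.html">'

# In the replacement, *.* is inserted literally; nothing from the match is reused.
attempt=$(printf '%s\n' "$line" | sed 's/href="*.*"/href=*.*/')

printf '%s\n' "$attempt"   # <a href=*.*>
```

    To reuse part of the match, capture it with \( ... \) in the pattern and refer to it as \1 in the replacement.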
    #4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    137
    Rep Power
    0
    You need a good regular expression to do this.

    I can give an example, but it depends on how the HTML page was written. More specifically: how the <a href......> line is written.

    - Is it by itself on a line,
    - Do you use title="..." in the href tag,
    - Are there multiple html statements on 1 line,
    - etc

    Just 3 of many examples:

    <a href="BuildPage.cgi?body=index.html">
    <a href="BuildPage.cgi?body=index.html" title="Build Page">Name</a>
    <a href="BuildPage.cgi?body=index.html">Name</a><p class="article">

    For the above examples the following statement will do the trick:

    cat <htm_file> | sed 's/a href="\(.*\)html"/a href=\1html/'

    But you might need to change the regular expression for your specific needs.

    Hope this helps.
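
    As one possible adjustment (a sketch, not the only approach): the expression above uses a greedy .* and relies on the URL ending in html. A pattern built on [^"]* (any run of characters other than a double quote) is less sensitive to how the line is written. The function name strip_href_quotes and the a.html / b.html links in the third input line are made up for illustration.

```shell
# Hypothetical helper: strip the quotes around every href value read from stdin.
# [^"]* matches only up to the next double quote, so it cannot run past the
# closing quote of the href; the g flag handles several links on one line.
strip_href_quotes() {
    sed 's/href="\([^"]*\)"/href=\1/g'
}

result=$(strip_href_quotes <<'EOF'
<a href="BuildPage.cgi?body=index.html">
<a href="BuildPage.cgi?body=index.html" title="Build Page">Name</a>
<a href="a.html">A</a> <a href="b.html#top">B</a><p class="article">
EOF
)

printf '%s\n' "$result"
```

    Other quoted attributes such as title="..." and class="..." are left alone because the pattern only matches text that starts with href=".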
    #5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2003
    Posts
    22
    Rep Power
    0
    I tried the code you posted, but it doesn't work.

    The output I want is
    <a href=BuildPage.cgi?body=index.html>
    but the output I get is
    <a href=html>
    Everything between href= and html is gone.

    Besides this, the href does not always end with html, for example
    <a href="BuildPage.cgi?body=index.html#adac">

    So I want to remove the double quotes from the line above too.

    Do you have a better solution? Thank you.
    #6
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    137
    Rep Power
    0
    Sorry, but the solution I gave does work:

    $ cat somepage.html

    <a href="BuildPage.cgi?body=index.html">
    <a href="BuildPage.cgi?body=index.html" title="Build Page">Name</a>
    <a href="BuildPage.cgi?body=index.html">Name</a><p class="article">

    $ cat somepage.html | sed 's/a href="\(.*\)html"/a href=\1html/'

    <a href=BuildPage.cgi?body=index.html>
    <a href=BuildPage.cgi?body=index.html title="Build Page">Name</a>
    <a href=BuildPage.cgi?body=index.html>Name</a><p class="article">

    As you can see, only the " around "BuildPage.cgi?body=index.html" are stripped; everything else stays the same.

    Did you make a typo?

    As for the ...html#sometag" case, you need to adjust the regular expression.

    This takes care of the example you gave (and still works with the 3 example lines shown above):

    sed 's/a href="\(.*\)html[#]*[a-z]*"/a href=\1html/'
    Last edited by druuna; November 14th, 2003 at 07:29 AM.
    #7
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2003
    Posts
    22
    Rep Power
    0
    Thanks for your help, I have now solved the first problem. The second problem is the typo.

    The code you gave me removes the typo at the end, but I actually want the typo to remain there.

    For example,
    <a href="BuildPage.cgi?body=index.html#abc">
    should change to
    <a href=BuildPage.cgi?body=index.html#abc>
    #8
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    137
    Rep Power
    0
    Maybe you misunderstood me.

    Both examples I gave will do what you want.
    The second one will take care of the 'html#abc' stripping.
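
    For reference, a small sketch of the difference: as written in post #6, the second expression matches the #... part outside the \( \) group, so that part is not copied into the replacement. Moving it inside the group keeps it. This is one possible variant, not necessarily what was used to solve the problem in this thread; the variable names are illustrative.

```shell
line='<a href="BuildPage.cgi?body=index.html#abc">'

# Second expression from post #6: the fragment is matched but not captured,
# so it is dropped from the output.
dropped=$(printf '%s\n' "$line" | sed 's/a href="\(.*\)html[#]*[a-z]*"/a href=\1html/')

# Variant with the fragment inside the capture group, so \1 carries it over.
kept=$(printf '%s\n' "$line" | sed 's/a href="\(.*html[#]*[a-z]*\)"/a href=\1/')

printf '%s\n' "$dropped"   # <a href=BuildPage.cgi?body=index.html>
printf '%s\n' "$kept"      # <a href=BuildPage.cgi?body=index.html#abc>
```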
    #9
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2003
    Posts
    22
    Rep Power
    0
    Thanks druuna, I have solved the problem. Thanks for your guidance and help.
    It helped me a lot, and I appreciate it very much.
    All the best!
