#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    8
    Rep Power
    0

    How to include compound words with a dash (-) sign?


    Hi,

    I'm a total regExp newbie. How do we identify compound words (including a dash - sign) in a text with
    regExp? Such compound words include:
    T-cell
    co-receptor
    cell-Adhesion

    Thanks.
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Posts
    16
    Rep Power
    0
    I hope this helps:
    Code:
    /[A-Za-z]\-[A-Za-z]/
    Depending on the host language, you might need to drop the wrapping slashes.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Posts
    16
    Rep Power
    0
    ... actually it is:
    Code:
    /[A-Za-z]+\-[A-Za-z]+/
    ... for words before and after the dash.
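
A quick way to check the pattern against the three words from the first post. This is only a sketch in Python (not the original poster's language); the sample sentence is made up for illustration.

```python
import re

# One or more letters, a hyphen, then one or more letters,
# exactly as in the pattern posted above (without the wrapping slashes).
COMPOUND = re.compile(r"[A-Za-z]+\-[A-Za-z]+")

# Hypothetical sample text containing the words from the question.
text = "The T-cell co-receptor mediates cell-Adhesion in vitro."

compounds = COMPOUND.findall(text)
print(compounds)
```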

    Does anyone know how to edit posts?
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    8
    Rep Power
    0
    Originally Posted by ivanvodisek
    ... actually it is:
    Code:
    /[A-Za-z]+\-[A-Za-z]+/
    ... for words before and after the dash.
    I see. So, \- escapes the dash (-) sign.

    And how do we apply it to specifics? For instance,

    let w = "T-cell"
    Code:
    // need to make w var clickable and load into a DIV
    // hence, the following w2 var
    <cfset w2 = '<a href="##" onclick="loadRightDoc(&apos;kPoints.cfm?t=&apos;,&apos;#w#&apos;)" style="color:##7A30A0">#w#</a>'>
    
    // body is a var whose value is a long text string
    // the second parameter for the following REreplace function is for RegExp
    // how do we construct it to escape -
    // how about ^-
    <cfset body = REreplace(body,"^-#w#","#w2#","all")>
    Thanks.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2013
    Posts
    16
    Rep Power
    0
    I guess you are going to use HTML + JavaScript? For regexps, search for "regexp javascript". Then you have to build your string with the + operator in JavaScript. Then you'll have to call DOM object methods from JavaScript; search for "innerHTML javascript" for dynamic node content.

    I'm afraid you are going to have to learn a bit of programming with JavaScript. W3C has a nice reference to JavaScript functions, but for learning, I recommend searching for some JavaScript tutorials.
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    8
    Rep Power
    0
    No, no, sorry for the misunderstanding, I'm using ColdFusion as the server-side scripting language.
    The REreplace function is a part of it...
    As mentioned above, the var w could be "T-cell", then, how do I escape the - for the var, as in
    REreplace(body,"^-#w#","#w2#","all")

    Thanks.
    P.S. Sorry, I was using Javascript commenting convention but I'm not using Javascript for this need.
  12. #7
  13. Forgotten Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,013
    Rep Power
    9616
    Originally Posted by ivanvodisek
    Does anyone know how to edit posts?
    Editing isn't allowed on new accounts as an anti-spam measure, but "new" is a pretty low bar so you'll pass that soon. Don't worry about editing - making a follow-up post is perfectly fine.

    Originally Posted by regLearner
    So, \- escapes the dash (-) sign.
    Actually that's only required when the hyphen is inside a [] character set. Inside, it represents a range (which is how "A-Z" works) so the backslash to escape it turns it into a regular hyphen. Outside of a character set, hyphens don't have any special meaning so they don't need to be escaped. (But it doesn't hurt.)
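
The difference between a hyphen inside and outside a character set can be seen in a small Python sketch. The sample strings are invented, and Python's regex engine is assumed to behave like ColdFusion's on this particular point.

```python
import re

# Inside [...] an unescaped hyphen between two characters denotes a range.
range_set = re.compile(r"[a-c]")     # matches a, b, or c
literal_set = re.compile(r"[a\-c]")  # matches a, a literal hyphen, or c

range_hits = range_set.findall("a-b-c")
literal_hits = literal_set.findall("a-b-c")

# Outside [...] the hyphen is an ordinary character, escaped or not.
plain = re.search(r"T-cell", "a T-cell receptor")
escaped = re.search(r"T\-cell", "a T-cell receptor")

print(range_hits, literal_hits, plain.group(), escaped.group())
```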

    Originally Posted by regLearner
    And how do we apply it to specifics?
    Your problem is a little more complicated: you need it to match a hyphenated word but ignore other HTML markup.

    The good news is that it doesn't look like there are hyphens being used anywhere else in that string, so you may be able to get away with the
    Code:
    [A-Za-z]+\-[A-Za-z]+
    that ivanvodisek suggested.
    Ignore all that; it's not actually the problem.

    However, if you're asking how you can use terms like "T-cell" in the regular expression, then you don't need to do anything special: letters, apostrophes, and hyphens (as I said above) can be used in a regex just fine.
    As for what you need to do in ColdFusion, it seems like
    Code:
    REreplace(body,"#w#","#w2#","all")
    would do the job. Note that the ^- was removed: it forces the regex to match a hyphen at the beginning of the string, and you don't want that.
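
To illustrate why the ^- prefix prevents any match, here is a rough Python equivalent of the two calls. The body text and the simplified link markup are stand-ins, not the real values from the page.

```python
import re

# Hypothetical body text and a simplified stand-in for the real link markup.
body = "A T-cell binds antigen; another T-cell does not."
w = "T-cell"
w2 = '<a href="#">' + w + "</a>"

# "^-" + w can only match if the whole string begins with "-T-cell",
# so nothing is replaced here.
anchored = re.sub("^-" + re.escape(w), lambda m: w2, body)

# Without "^-", every occurrence of the term is replaced.
# A function is used as the replacement so w2 is inserted literally.
linked = re.sub(re.escape(w), lambda m: w2, body)

print(anchored)
print(linked)
```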

    Comments on this post

    • Will-O-The-Wisp agrees : Thank you!
    Last edited by requinix; December 23rd, 2016 at 10:20 PM. Reason: not actually the problem
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2016
    Posts
    8
    Rep Power
    0
    Thanks for the note, however,
    <cfset body = REreplace(body,"#w#","#w2#","all")>
    didn't work.

    still found cell from T-cell

    good thing:
    found cell from cells

    Another issue:
    found "oncogene" from oncogenesis
    same is true below:
    found "cell" from cellular
    That is, we need to match each item as a whole word rather than as a plain substring. How do we add a whitespace requirement (except at the start ^ and end $ of the text)?
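
One possible approach to the whole-word requirement, sketched in Python rather than ColdFusion: instead of requiring whitespace, refuse a match when a letter, digit, or hyphen sits directly before or after the term. The function name link_term and the sample text are hypothetical, and whether REreplace accepts these lookaround assertions (the lookbehind in particular) is an assumption that would need checking.

```python
import re

def link_term(body: str, term: str, link: str) -> str:
    """Replace `term` only where it stands alone as a word.

    A letter, digit, or hyphen on either side counts as part of a larger
    word, so "cell" is left alone inside "T-cell" and "cellular".
    """
    pattern = r"(?<![A-Za-z0-9-])" + re.escape(term) + r"(?![A-Za-z0-9-])"
    # A function is used as the replacement so `link` is inserted literally.
    return re.sub(pattern, lambda m: link, body)

# Hypothetical sample text covering the cases reported above.
body = "A cell, a T-cell, cellular debris, and oncogenesis."
result = link_term(body, "cell", "<a>cell</a>")
unchanged = link_term(body, "oncogene", "<a>oncogene</a>")

print(result)
```

If plurals such as "cells" should still be picked up, an optional s? could be added just before the trailing lookahead. Note also that when several terms are replaced one after another, a later term can match inside markup inserted by an earlier replacement, for example inside the quoted onclick argument.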
