Thread: REGEX help

    #1

    Okay, this is the crux of my problem: I have a string that I want to separate according to a pattern.

    This is my example string "atg........tgaatg.......taaatg......tag"
    Where the dots represent the characters a, t, g, or c.

    What I want to do is match each occurrence of atg.....tga, atg.....taa, or atg.....tag, and slam each one into an array. The regex I came up with, however, matches the entire string, even when I use the /g option. I don't want to use split because it destroys some of the data. Here's my code, if that helps:

    Code:
    my $DNA7 = "atgtttaaataaatgggccgctagatgaaaaaatag";
    
    my @array = ($DNA7 =~ /(atg)(.+)(taa)|(tag)|(tga)/g);
    EDIT: I think I realize what my problem is. I don't think a regex can solve my problem.
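
One likely culprit, offered as a guess since the thread never diagnoses the pattern: `|` has the lowest precedence in a regex, so the pattern above is read as three separate alternatives, `(atg)(.+)(taa)`, `(tag)` and `(tga)`, and only "taa" is tied to the leading atg. A minimal sketch of what that pattern returns on the sample string (the names `@captures` and `@defined` are illustrative, not from the post):

```perl
use strict;
use warnings;

my $DNA7 = "atgtttaaataaatgggccgctagatgaaaaaatag";

# Original pattern: the top-level | splits it into three alternatives,
# so the greedy .+ only ever has to end in front of a "taa".
my @captures = ($DNA7 =~ /(atg)(.+)(taa)|(tag)|(tga)/g);

# In list context with /g, every match contributes one slot per capture
# group; groups that did not take part in a match come back as undef.
my @defined = grep { defined } @captures;

print scalar(@captures), " slots, ", scalar(@defined), " defined\n";
print "$_\n" for @defined;
```

Grouping the stop codons together, as in `(?:taa|tag|tga)`, is one way to tie all three to the leading atg.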
    Last edited by seedofwinter; December 23rd, 2013 at 08:07 AM. Reason: Never mind
    #2
    /.+/ matches as much as possible, /.+?/ matches as little as possible. For example, /atg.+tga/ will run on to the last "tga" it can reach, but /atg.+?tga/ will stop at the first instance of "tga".
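
To make that concrete, here is a small sketch of the lazy-quantifier route on the sample string from the first post, with all three stop codons grouped in one alternation. One caveat that the thread does not raise: a plain `.+?` ignores the codon reading frame, so it can stop at a "taa" that straddles two codons. The second pattern steps three bases at a time to avoid that. It assumes the sequence contains only lowercase a, c, g and t; the names `@lazy` and `@in_frame` are illustrative.

```perl
use strict;
use warnings;

my $dna = 'atgtttaaataaatgggccgctagatgaaaaaatag';

# Lazy match: stop at the first stop codon found anywhere after atg,
# whether or not it lines up with the reading frame.
my @lazy = $dna =~ /(atg.+?(?:tga|taa|tag))/g;

# Frame-aware variant: consume whole codons ([acgt]{3}) lazily, so a stop
# codon only counts when it starts on a codon boundary.
my @in_frame = $dna =~ /(atg(?:[acgt]{3})*?(?:tga|taa|tag))/g;

print "lazy:     @lazy\n";
print "in frame: @in_frame\n";
```

On this string the lazy pattern cuts its first piece short at "atgtttaa", while the frame-aware one returns atgtttaaataa, atgggccgctag and atgaaaaaatag.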

    Although you could solve it like that, when faced with this type of problem, it's often easier to describe things in terms of split:

    Code:
    F:\temp>type x.pl
    $x = 'atgtttaaataaatgggccgctagatgaaaaaatag';
    
    @y = split /(?<=tga|taa|tag)(?=atg)/, $x;
    
    print for @y;
    
    F:\temp>perl -l x.pl
    atgtttaaataa
    atgggccgctag
    atgaaaaaatag
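
For anyone puzzling over that pattern, the same split can be written with the /x modifier so that each piece carries a comment. This is only a restatement of the code above, assuming lowercase input and that every stop codon immediately followed by atg marks a boundary; the variable names are illustrative.

```perl
use strict;
use warnings;

my $dna = 'atgtttaaataaatgggccgctagatgaaaaaatag';

# Both parts are zero-width assertions, so split cuts between characters
# and no bases are consumed or lost.
my @pieces = split /
    (?<= tga | taa | tag )   # lookbehind: a stop codon ends just before this point
    (?=  atg )               # lookahead: a start codon begins right here
/x, $dna;

print "$_\n" for @pieces;
```

Because nothing is consumed at the split point, joining the pieces back together reproduces the original string.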
    #3
    Thanks for the help, Omega. It really helped.
    #4
    Also, on another note, that regex in the split looks really complicated. Do you have any sources, books or web, that you can recommend to help me understand it? LOL, I am so used to doing split with the most basic of delimiters that I don't even know where to begin in trying to comprehend that monster.
    #5
    Here is one of the best resource books on regular expressions, if not the best.

    Mastering Regular Expressions, 3rd Edition By Jeffrey E.F. Friedl
    #6
    Jeff Friedl's book is certainly, in my view, the best resource book on regexes, but for a shorter introduction to Perl regexes, you could try this:

    perlre
