#1
  1. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,468
    Rep Power
    594

    Can regexp determine if string is present


    I'm writing a Perl script that parses email headers. One type of header is the 'Received' header. When I detect a particular type of IP address (a private one such as 192.168.x.x), I need to determine whether a host or domain name immediately follows the 'from' string and, if so, extract it. For example:
    Code:
    Received: from mail.universia.net (baterno.univ.corp [192.168.41.103]) by daganzo.mx1.universia.net (Postfix) with SMTP id D191B1F5572; Thu,  7 Aug 2008 01:12:34 +0200 (CEST)
    I need to extract 'mail.universia.net'. The problem is that I am not sure the host name will always be there. Somehow I need to know whether there is a string made up of any number of alphanumeric strings separated by dots, and extract it. Whitespace would end the string, and a non-alphanumeric character means the string is not present. I hope I described that clearly. Thanks.
  2. #2
  3. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3886
    Can you also post an example of the data when that first host name (i.e. the mail.universia.net in that example) is not present?
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,468
    Rep Power
    594
    Thanks for the reply. I don't have a specific example, as that condition is uncommon but does occur. Generally it will look quite similar, except that there will be parens or some other non-alphanumeric character delimiting it, like square brackets ([). I think it is safe to assume that if the first string after 'from' contains a non-alphanumeric character (other than a dot) then there is nothing to extract. I should also note, in case it is not obvious, that the first string after 'from' will be delimited by whitespace.
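
    For reference, the rule described so far can be sketched as a single regex. This is only a rough sketch: the sub name host_after_from is made up for illustration, and it assumes the header has already been unfolded onto one line. It follows the rule exactly as stated (letters, digits and dots only), so a host name containing a hyphen would be rejected unless '-' is added to the character classes.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the host name that directly follows
    # "Received: from", or nothing if that token is not a plain host name.
    sub host_after_from {
        my ($header) = @_;

        # One or more alphanumeric labels separated by single dots.
        # (?!\S) requires whitespace or end of string right after the name,
        # so tokens such as "[192.168.41.103]" or "host(" do not match.
        if ($header =~ /^Received:\s+from\s+([A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*)(?!\S)/i) {
            return $1;
        }
        return;    # undef in scalar context: nothing to extract
    }

    my $line = 'Received: from mail.universia.net (baterno.univ.corp '
             . '[192.168.41.103]) by daganzo.mx1.universia.net (Postfix)';
    my $host = host_after_from($line);
    print defined $host ? "host: $host\n" : "no host name after 'from'\n";
    ```

    Testing the result with defined() separates "no host name present" from a successful extraction.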
    Last edited by gw1500se; August 28th, 2008 at 07:39 AM.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2007
    Posts
    765
    Rep Power
    929
    This problem might be best solved without a regex. Since you're using Perl, you have all of CPAN available. A simple search turns up Mail::Field::Received and Email::Received.

    Since your fields are space-delimited, another method would be to split on whitespace, then step through the results and examine the one following a "from".
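
    A rough sketch of the split approach, in case it helps. The sub name host_after_from_split is made up for illustration, and the check on the candidate token simply mirrors the rule from the earlier posts (letters, digits and dots only), which is an assumption about what counts as a host name.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split the header on whitespace and inspect the
    # token that follows the first "from".
    sub host_after_from_split {
        my ($header) = @_;

        # split ' ' splits on runs of whitespace and skips leading whitespace.
        my @tokens = split ' ', $header;

        for my $i (0 .. $#tokens - 1) {
            next unless lc($tokens[$i]) eq 'from';

            my $candidate = $tokens[$i + 1];

            # Accept only dot-separated alphanumeric labels.
            return $candidate
                if $candidate =~ /\A[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*\z/;

            return;    # token after "from" is not a plain host name
        }
        return;        # no "from" token with a successor
    }

    my $host = host_after_from_split(
        'Received: from mail.universia.net (baterno.univ.corp [192.168.41.103])'
    );
    print defined $host ? "host: $host\n" : "no host name after 'from'\n";
    ```

    Only the first "from" is examined, so a later "from" elsewhere in the header is ignored.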
    sub{*{$::{$_}}{CODE}==$_[0]&& print for(%:: )}->(\&Meh);
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,468
    Rep Power
    594
    Hmm. I was not aware of those Perl modules. Thanks. I'll give them a try.
