Thread: Parsing

    #1
    I'm trying to collect data from an email account. I tried to find a way in PHP to open the mail file and search it for a string, but I had no idea how to do it.

    I came across Perl, but my expertise isn't that great there either. Some people also told me about grep, but they couldn't really explain how to use it.

    Can someone tell me how to open a file, let's say one called some.file, extract everything between <x> and </x>, and write it to a file?

    Thanks,
    Till
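    For the task described above, here is a minimal Perl sketch, assuming the <x> tags are never nested and the mail file fits in memory. The sub names (extract_x, extract_file) and the script name are made up for illustration, and the extraction uses a non-greedy match rather than any of the patterns discussed in the replies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every piece of text found between <x> and </x> in $text.
# Assumes the tags are never nested. The non-greedy .*? stops at the
# first </x> after each <x>, and /s lets '.' match newlines too, so a
# body may span several lines of the mail file.
sub extract_x {
    my ($text) = @_;
    my @found = ($text =~ m{<x>(.*?)</x>}gs);
    return @found;
}

# Read the whole input file, extract the tagged text, and write one
# match per line to the output file.
sub extract_file {
    my ($in, $out) = @_;

    open(my $in_fh, '<', $in) or die "Can't open $in: $!";
    my $text = do { local $/; <$in_fh> };    # with $/ undefined, <> returns the whole file
    close($in_fh);

    open(my $out_fh, '>', $out) or die "Can't write $out: $!";
    print {$out_fh} "$_\n" for extract_x($text);
    close($out_fh) or die "Can't close $out: $!";
}

# Usage: perl extract_x.pl some.file output.txt
extract_file(@ARGV) if @ARGV == 2;
```

    Run it with the input and output file names as arguments; each match ends up on its own line of the output file.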
    #2
    To open a file, just use the open function (perldoc -f open).

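    The easiest way to get its contents into a variable is to use read (perldoc -f read). read takes the number of bytes to read, so pass it the file's size, which you can get with -s.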
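    A minimal sketch of those two steps, assuming a plain local file; read_whole_file is a hypothetical helper name, not anything built in.

```perl
use strict;
use warnings;

# Read an entire file into one string with open and read.
# read() needs a length, so ask -s for the file's size in bytes
# (-s returns false for an empty file, hence the || 0).
sub read_whole_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $size = (-s $fh) || 0;
    my $contents = '';
    # read() returns the number of bytes read, or undef on error.
    defined(read($fh, $contents, $size)) or die "Can't read $path: $!";
    close($fh);
    return $contents;
}

# Hypothetical usage:
# my $c = read_whole_file('some.file');
```

    A common alternative is to slurp the file by localizing $/ (see perlvar), which avoids needing the size at all.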

    Then, once you've done that, it's a matter of using regular expressions. Those are a bit harder to understand, but try "man perlre".

    Here's an example you may be able to use. I haven't tested it, and I haven't exactly mastered REs (understatement).

    # contents of file in $c
    if ( $c =~ m|<x>([^</x>]*)</x>|im ) {
        $d = $1;
    }
    # now $d has everything between <x> and </x>

    I don't have time to check whether '<' and '>' are special characters in a regex, but I don't think they are.
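    A quick check of that point, as a small sketch (has_literal_tag is just an illustrative name): in Perl, '<' and '>' are not regex metacharacters, so they match themselves without escaping.

```perl
use strict;
use warnings;

# '<' and '>' are not regex metacharacters in Perl, so the pattern
# below matches the literal text <x>hello</x> with no escaping.
# The '/' needs no backslash either, because m|...| uses '|' as the
# delimiter.
sub has_literal_tag {
    my ($text) = @_;
    return $text =~ m|<x>hello</x>| ? 1 : 0;
}

print has_literal_tag('say <x>hello</x> there'), "\n";   # prints 1
print has_literal_tag('say xhellox there'), "\n";        # prints 0
```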

    I'm sure someone will correct me if this is wrong...

    Good luck.
    #3
    Except that you used ([^</x>]*). That is a character class: it matches a run of characters other than '<', '/', 'x' and '>', not everything up to the sequence "</x>". So if the text between the tags contains an 'x', the regular expression will fail.
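    A small sketch that runs the pattern from reply #2 on two inputs to show that failure; first_x_body is a hypothetical helper name.

```perl
use strict;
use warnings;

# The pattern from reply #2. [^</x>]* matches a run of characters that
# are not '<', '/', 'x' or '>' (and, because of /i, not 'X' either).
# It does not mean "anything except the string </x>".
sub first_x_body {
    my ($text) = @_;
    return $text =~ m|<x>([^</x>]*)</x>|im ? $1 : undef;
}

for my $mail ('<x>hello</x>', '<x>text</x>') {
    my $body = first_x_body($mail);
    print "$mail => ", (defined $body ? $body : 'no match'), "\n";
}
# prints:
# <x>hello</x> => hello
# <x>text</x> => no match
```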

    A better way is to use ([^<]*), which stops at the '<' of the end tag.
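    A sketch of that suggestion (the helper name is made up). A negated class like [^<] also matches newlines, so this works when the text spans several lines, without the /s flag.

```perl
use strict;
use warnings;

# [^<]* takes everything up to the next '<', which is normally the
# start of </x>. A negated class also matches "\n", so the body may
# span lines without needing /s.
sub first_x_body {
    my ($text) = @_;
    return $text =~ m|<x>([^<]*)</x>| ? $1 : undef;
}

print first_x_body('<x>text</x>'), "\n";        # prints text
print first_x_body("<x>two\nlines</x>"), "\n";  # prints "two" and "lines" on two lines
```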

    Don
    #4
    Don, wouldn't that break on any '<' inside the text between the tags?
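    A sketch that checks this objection and shows one common alternative not proposed in the thread: a non-greedy .*? with the /s flag. The helper names are illustrative, and the approach still assumes <x> tags are never nested.

```perl
use strict;
use warnings;

# [^<]* cannot step over a '<', so it misses a body such as "if a < b".
sub with_negated_class {
    my ($text) = @_;
    return $text =~ m|<x>([^<]*)</x>| ? $1 : undef;
}

# Non-greedy alternative: .*? takes the shortest run that is followed
# by </x>, and /s lets '.' match newlines. A '<' inside the body is
# fine, but nested <x> tags are still not handled.
sub with_non_greedy {
    my ($text) = @_;
    return $text =~ m|<x>(.*?)</x>|s ? $1 : undef;
}

my $mail = '<x>if a < b</x>';
my $r1 = with_negated_class($mail);
print defined $r1 ? "$r1\n" : "no match\n";   # prints no match
print with_non_greedy($mail), "\n";           # prints if a < b
```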
