Thread: Parsing Help

    #1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2000
    Location
    Salem, OR, USA
    Posts
    41
    Rep Power
    14

    I am trying to extract data from a wrap-up data file that contains conversations between support staff and customers, and then store that information in a MySQL database. I know how to separate the data when it is on a single line, but how do you get data that spans multiple lines? For example (see below), the wrap-up note contains multiple lines. How do I assign all of those lines to a single string variable so that I can insert it into the database? Do I look for the line "******** Agent Notes **********" and make it the starting point, then look for the line "*** Customer's Email Message ***" and make it the ending point, and assign everything between those two lines to a variable?

    Some sample code (reg. exp.) please!

    Thanks a lot.

    --DVN

    SAMPLE DATA FILE:

    ********** Binding ************
    Email Address: ThaBomb@weareit.com
    Problem Type: Pricing/Promotion
    Customer Type: Consumer

    ******** Agent Notes **********
    Wrap-up Note:
    this is email wrap up from conversation
    between DVN and customer #12345. Customer
    inquired on prices for image capture
    software for Nikon microscope 12/20/00.

    *** Customer's Email Message ***
    #2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2000
    Location
    Southern California
    Posts
    73
    Rep Power
    14
    Here's one possible solution:

    [code]
    # slurp in the whole file
    undef $/;

    open FILE, "<$file" or die "Cannot open $file: $!\n";
    $input = <FILE>;
    close FILE;

    foreach my $section ( split /\*\*?\s*(.*?)\s*\*\*/, $input) {
        chomp($section);
        $section =~ s{^\s+|\s+$}{}g;
        next unless $section =~ m{Agent\s+Notes};

        ($notes) = $section =~ m{Wrap-up\s+Note.(.*)$};
        $agent_notes{some_identifier} = $notes;
    }
    [/code]

    *Note that I had to use a kludge for the "Wrap-up" regexp because this forum CGI turns part of it into a smiley, even if you request that smilies be disabled :-(


    [This message has been edited by vpopper (edited January 01, 2001).]
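
    A minimal sketch of the same split-based idea, assuming every section header sits on its own line wrapped in asterisks, as in the sample data from the first post. When the split pattern contains a capture group, split returns the captured header text as separate list elements, so headers and bodies alternate and can be paired up. The names $input and %section and the inline sample text are illustrative only, and the colon is written literally here instead of the "." kludge:

```perl
use strict;
use warnings;

# Sample text modelled on the data file in the first post.
my $input = join "\n",
    '********** Binding ************',
    'Email Address: ThaBomb@weareit.com',
    'Problem Type: Pricing/Promotion',
    '',
    '******** Agent Notes **********',
    'Wrap-up Note:',
    'this is email wrap up from conversation',
    'between DVN and customer #12345.',
    '',
    q{*** Customer's Email Message ***},
    '';

# With a capture group in the pattern, split also returns the captured
# header text, so the list alternates: header, body, header, body, ...
# The first element is whatever precedes the first header (empty here).
my ( undef, @parts ) = split /^\*+[ \t]*(.*?)[ \t]*\*+[ \t]*$/m, $input;

my %section;    # hypothetical name: header text => body text
while ( my ( $header, $body ) = splice @parts, 0, 2 ) {
    $body = '' unless defined $body;
    $body =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    $section{$header} = $body;
}

# /s lets "." match newlines, so the capture spans every line of the note.
my ($notes) = $section{'Agent Notes'} =~ /Wrap-up\s+Note:\s*(.*)/s;
print "$notes\n";
```

    Keying the bodies by header text also makes the other sections (for example "Binding") available if more fields need to go into the database later.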
    #3

    Thank you very much for your response. Looking at your code briefly, it looks like it will work. But I don't understand one line:
    ($notes) = $section =~ m/Wrap-up\s+Note.(.*)$/;

    ($notes) <=== How does this work?

    This is the first time I have seen this usage.

    --DN
    #4

    [quote]Originally posted by ThaBomb:
    I don't understand one line:
    ($notes) = $section =~ m/Wrap-up\s+Note.(.*)$/;

    ($notes) <=== How does this work?
    [/quote]

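    The parentheses around $notes put the assignment in list context, where a successful match returns the text captured by the parens in the regexp. So $notes will be assigned that captured text, i.e. everything after "Wrap-up Note:". You could think of it as $1.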
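
    To make the difference concrete, here is a small sketch comparing the same match in list and scalar context. The sample string and the variable names $matched and $missing are made up for illustration, and /s is added so that "." can cross the line break:

```perl
use strict;
use warnings;

my $section = "Wrap-up Note:\nok";

# List context: the match returns the list of captured groups,
# so $notes receives what $1 would hold.
my ($notes) = $section =~ /Wrap-up\s+Note:\s*(.*)/s;

# Scalar context: the same match only reports success or failure.
my $matched = $section =~ /Wrap-up\s+Note:\s*(.*)/s;

# If the match fails, the list is empty and the variable stays undefined.
my ($missing) = "no notes here" =~ /Wrap-up\s+Note:\s*(.*)/s;

print "list context:   '$notes'\n";      # 'ok'
print "scalar context: '$matched'\n";    # '1'
print "failed match:   ", ( defined $missing ? $missing : 'undef' ), "\n";
```

    Checking whether the variable is defined after the assignment is one simple way to tell a section without a wrap-up note from one with an empty note.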
    #5

    Hello vpopper,

    I finally had a chance to try your code snippet today and I couldn't get it to work. Am I doing something wrong?

    I tried split /\*\*?\s*(.*?)\s*\*\*/ but it didn't work. So I changed it to split /^\*+\s*(.*?)\s*\*+$/ and it still didn't work.

    Here is my code:

    #!/usr/bin/perl

    $/;

    $datadir = "/home/httpd/html/wrap";
    open(INFILE, "$datadir/sssss") || die "Cannot open file: $!\n";
    $input = <INFILE>;
    close(INFILE);

    open(OUTFILE,"> $datadir/sample_output.txt") or die "Cannot open file: $!\n";
    foreach my $section ( split /^\*+\s*(.*?)\s*\*+$/, $input)
    {
        chomp($section);
        $section =~ s/^\s+|\s+$//g;
        next unless $section =~ m/Agent\s+Notes/;
        ($notes) = $section =~ m/Wrap-up\s+Note.(.*)$/;
        $agent_notes{agent_notes} = $notes;

        # Testing printing
        print OUTFILE "=============================\n";
        print OUTFILE "Header ...... $section \n";
        print OUTFILE "$agent_notes{agent_notes} \n";
        print OUTFILE "==============================\n";
    }
    close(OUTFILE);

    Here is my data:

    ************* Agent Notes ******************
    Wrap-up Note:
    ok


    ********* Customer Transcript **************
    Transcript:
    Connect Wednesday, November 01, 2000 - 05:11:09 PM
    Connected. Ready to assist customer Ijattsu.

    Greeting message: Wednesday, November 01, 2000 - 05:11:09 PM
    Greetings from blah blah blah
    #6

    [QUOTE]Originally posted by ThaBomb:
    I finally had a chance to try your code snippet today and I couldn't get it to work. Am I doing something wrong?
    [/QUOTE]

    This line:
    $/;

    Should be changed to this:
    undef $/;

    The $/ variable is the input record separator, which defaults to a newline character. If we undef it, we slurp in the whole file rather than one line at a time.

    Since you didn't undef it, you read only one line. This line would then cause it to be skipped:

    next unless $section =~ m/Agent\s+Notes/;

    Also, if you are going to output the section as it is read, you don't need to store it in a hash. You can just output $notes. But you'll also need some identifier for the notes, unless you are just printing them all without association.
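
    For reference, a small sketch of the difference between reading with the default $/ and reading with $/ undefined. The temporary file and its contents are stand-ins for the real data file, and "local $/" inside a do block is just one common way to limit the change to a single read:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small stand-in for the data file (contents are illustrative).
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "*** Agent Notes ***\nWrap-up Note:\nok\n";
close $fh or die "Cannot close $path: $!\n";

# Default $/ is "\n": a scalar read returns only the first line.
open my $in, '<', $path or die "Cannot open $path: $!\n";
my $first_line = <$in>;
close $in;

# With $/ undefined, the same read returns the whole file. "local"
# restores the previous value of $/ when the enclosing block ends.
my $whole_file = do {
    local $/;
    open my $slurp, '<', $path or die "Cannot open $path: $!\n";
    <$slurp>;
};

print "line mode read ",  length($first_line), " characters\n";
print "slurp mode read ", length($whole_file), " characters\n";
```

    Using local instead of a bare undef means any later line-by-line reads in the same script still see the default separator.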

