#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    2
    Rep Power
    0

    Parse file, split


    I have a file that is a list of cars, with the year/make/model as the description, e.g. 2011 Chevy Camaro

    Some of the cars have extended model names like:
    2011 Dodge Ram Crew Cab Short Bed

    What I want to do is parse each line and split it into year/make/model. This is what I have so far:

    Code:
    open (FILE, 'E:\ptest\cars.txt');
    while (<FILE>) {
        chomp;
        ($year, $make, $model) = split(" ");
        print "$year ";
        print "$make ";
        print "$model \n";
    }
    close (FILE);
    exit;

    I'm not sure how to adjust for the model name being more than one word.
#2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    163
    Rep Power
    19
    perl Code:
    #! /usr/bin/perl
     
    use warnings;
    use strict;
     
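    # Lexical filehandle and three-argument open; stop if the file cannot be opened.
    open my $fh, '<', 'file' or die "Cannot open 'file': $!";
    while (<$fh>) {
        chomp;
        # Print only on a match, so values from an earlier line are never repeated.
        if (/^([0-9]+)\s*([a-zA-Z]+)\s*(.*)$/) {
            my ($year, $make, $model) = ($1, $2, $3);
            print "Year: $year, Make: $make, Model: $model\n";
        }
    }
    close $fh;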
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    2
    Rep Power
    0
    Could you explain the IF conditional?
#4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    163
    Rep Power
    19
    Code:
    /^([0-9]+)\s*([a-zA-Z]+)\s*(.*)$/
    This regular expression is meant to match each line of your file in full.
    ^([0-9]+) => Match the year (actually any run of digits) at the beginning of the line and capture it in a group (referred to later as group #1, or $1)
    \s* => Match any whitespace that follows
    ([a-zA-Z]+) => Match the make (actually any run of upper- or lowercase letters) after the whitespace and capture it in the next group (group #2, or $2)
    \s* => Match any whitespace that follows
    (.*)$ => Match the model, which may be more than one word (actually every character after the preceding whitespace), up to the end of the line, and capture it in the next group (group #3, or $3)

    Hope that helps.
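
    To see the pattern at work, here is a minimal sketch that runs it against the two sample descriptions from the first post, plus one line that has no year. The helper name parse_car is made up for illustration; it is not part of the code posted earlier.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns (year, make, model), or an empty list
    # when the line does not fit the pattern.
    sub parse_car {
        my ($line) = @_;
        return $line =~ /^([0-9]+)\s*([a-zA-Z]+)\s*(.*)$/ ? ($1, $2, $3) : ();
    }

    for my $line ('2011 Chevy Camaro',
                  '2011 Dodge Ram Crew Cab Short Bed',
                  'Chevy Camaro') {
        # The list assignment is false when parse_car returns nothing.
        if (my ($year, $make, $model) = parse_car($line)) {
            print "Year: $year, Make: $make, Model: $model\n";
        }
        else {
            print "No match: $line\n";
        }
    }
    ```

    One thing to watch for: a hyphenated make such as Mercedes-Benz would be cut at the hyphen, because [a-zA-Z]+ stops at the first non-letter, and the rest (hyphen included) would end up in the model.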
#5
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,930
    Rep Power
    1225
    If you read the documentation for the split function, you'll find that it accepts a limit parameter, which is the maximum number of fields to split the string into.

    In your case, you need to use a limit of 3.

    Code:
    my ($year, $make, $model) = split /\s+/, $_, 3;
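
    A quick sketch of what the limit changes, using the longer sample line from the first post. The helper name split_car is hypothetical. Without a limit the line breaks into seven fields; with a limit of 3, everything after the second field stays together in the third.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split a description into at most three fields.
    sub split_car {
        my ($line) = @_;
        return split /\s+/, $line, 3;
    }

    my $line = '2011 Dodge Ram Crew Cab Short Bed';

    my @all = split /\s+/, $line;          # no limit: one field per word
    print scalar(@all), " fields without a limit\n";

    my ($year, $make, $model) = split_car($line);
    print "Year: $year, Make: $make, Model: $model\n";
    ```

    One difference from split(" ") in the original code: the special pattern " " skips leading whitespace, whereas /\s+/ gives an empty first field if a line starts with whitespace.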
