#1 · OP (Registered User, Devshed Newbie)

    Optimizing speed in double loop


    Hello Perlers,

    I am having an issue I really hope someone can help me with. It's basically for inventory control of items.
    I am reading in from 2 different files:
    1) items going out
    2) items existing everywhere in inventory

I have a working version of this, but it's way too slow. Please forgive the code; my Perl skills are novice-level and I haven't written anything in a long time either.

    Here's an example of what is inside items.txt:
    (Item number, count going out, item name, item static location, amount of items per box)

    Code:
    0020339        [12]   ITEM1                 CAA011A   12
    9010000        [4]   ITEM2              CAA011A   4
    9031484        [24]   ITEM3                 CAA011A   24
    9138655        [24]   ITEM4      CAA011A   24
    1222827        [12856]   ITEM5                 CAA021C   12
Now, I need to match the first number (the item number, not the CAA location) to all the occurrences inside export.txt and string it all together.

    Here's an example of what is inside export.txt:
(unimportant, static location, unimportant, unimportant, unimportant, unimportant, item number, amount on hand (in items-per-case "bpc" units), amount available (in items-per-case "bpc" units), unimportant, unimportant, unimportant, receive date, unimportant, unimportant, unimportant, unimportant, unimportant, unimportant, unimportant)

Code:
| CP6|CAA011A | |001|PAL| |20339 | 444 | 444 |EA | 71| 0 | 0 |08/01/2012| | | | |10/12/2012| | |
    | CP6|CAA011A | |001|PAL| |9010000 | 268 | 268 |EA | 38| 0 | 0 |08/29/2012| | | | |10/11/2012| | |
    | CP6|CAA011A | |001|PAL| |9031484 | 216 | 216 |EA | 15| 0 | 0 |09/21/2012| | | | |10/12/2012| | |
    | CP6|CAA011A | |001|PAL| |9103893 | 318 | 318 |EA | 1| 0 | 0 |12/05/2011| | | | |10/12/2012| | |
    | CP6|R54121A | |001|PAL| |9103893 | 1,104 | 1,104 |EA | 71| 0 | 0 |08/01/2012| | | | |10/11/2012| | |
(see: the same item number, 9103893, appears again on the last line, but in a different location)

I need to take the static location and item number from items.txt, go through the huge export.txt, and find all the lines with matching item numbers, so I can find dynamic locations I can pull items from to refill the static locations when they're low or empty.

The following code works, but it's slow. It took 15 minutes to do 500 compares, and I will most likely have between 2,000 and 5,000 compares.

Again, forgive my code; it has been a long night and I'm definitely rusty:

    Code:
#!/usr/bin/perl

open(FILE, "<items.txt");
@itemarr = <FILE>;
close(FILE);

open(FILE2, "<export.txt");
@exportarr = <FILE2>;
close(FILE2);

foreach $line (@itemarr) {
    # Collapse runs of 3+ spaces into exactly 3 spaces for splitting by space delimiter further down.
    $line =~ s/\s\s\s+/\ \ \ /gi;
    # Drop the leading zero padding so item numbers can match export.txt.
    $line =~ s/^00//gi;

    # Split by 3-spaces delimiter.
    ($material, $amount, $desc, $pickloc, $bpc) = split("   ", $line);
    # Remove commas, to stop problems with numbers over 1,000.
    $amount =~ s/,//gi;
    # Remove the brackets I used with quotename[].
    $amount =~ s/\[//gi;
    $amount =~ s/\]//gi;

    # Count the amount and multiply it by itemspercase.
    $amount = $amount * $bpc;

    foreach $sapd (@exportarr) {
        # Clear export commas too.
        $sapd =~ s/,//gi;
        # NOTE: these two were meant to trim spaces around the pipes, but the bare |
        # is regex alternation, so the first one strips ALL whitespace from the line.
        $sapd =~ s/\s+|//gi;
        $sapd =~ s/|\s+//gi;
        $sapd =~ s/\s\s\s+/ /gi;

        ($none, $type, $storagebin, $bt, $sec, $sut, $s5, $aterial, $stock,
         $astock, $bun, $til, $pastock, $pickq, $grdate, $pb, $rb, $ia, $lm,
         $sp, $sc) = split(/\|/, $sapd);

        # Skip bins whose names start with 4+ non-digits.
        if ($storagebin =~ /^\D\D\D\D/gi) {
            next;
        }

        $lol      = $stock;
        $stock    = $stock / $bpc;
        $stockfo  = $lol / $bpc;
        $amountfo = $amount / $bpc;
        $diff     = $stockfo - $amountfo;

        if ($aterial == "$material") {
            if ($storagebin =~ /^\D\d\d/) {
                # Dynamic location: queue up an indented line.
                $hike = "\t\t\t\t\t\t\t\t\t\t $storagebin - $grdate - ($stockfo)\r\n";
            }
            if ($storagebin =~ /^\D\D\D/) {
                # Pick location: print the header line with quantities.
                if ($diff < 0) { $diff = "0 ($diff)] [" . $diff * -1 . " Needed*"; }
                print "\r\n$storagebin - $desc | [QOH=$stock QGO=$amountfo RQ=$diff] \n";
            } else {
                print $hike;
            }
        }
    }
}
I know this can be heavily optimized, and I'm turning to the pros for some advice on this one. It doesn't have to be lightning quick, but this is way too slow. I can most likely work out the other bugs I'm dealing with once the speed is sorted.

Should I first parse export.txt and strip all the unneeded fields/lines before reading it into an array, or is that not going to help much?

    Thank you all for any guidance.
#2 · FishMonger (Contributing User, Devshed Intermediate)
    You should use a hash to store the data from items.txt using the item number as the key. Then as you loop over export.txt, you should use the split function to extract the fields and then do a simple hash lookup on the hash key.

    You should add these 2 pragmas, which should be in every Perl script you write, after the shebang line (line number 1) and fix the problems that they point out.
    Code:
    use strict;
    use warnings;
    There are many other improvements I can suggest, but start with that and then we'll move on to the other issues.
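Untested illustration of that hash-lookup structure (the field position and split patterns are assumptions based on your samples, and the leading zeros on your item numbers would still need your s/^00// treatment):
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: hash items.txt by item number (one record per key).
my %items;
open my $items_fh, '<', 'items.txt' or die "items.txt: $!";
while (my $line = <$items_fh>) {
    chomp $line;
    my ($itn, $qty, $desc, $pickloc, $bpc) = split /\s{2,}/, $line;
    $items{$itn} = { qty => $qty, desc => $desc, loc => $pickloc, bpc => $bpc };
}
close $items_fh;

# Pass 2: one cheap hash lookup per export.txt line, no inner loop.
open my $export_fh, '<', 'export.txt' or die "export.txt: $!";
while (my $line = <$export_fh>) {
    my $material = (split /\|/, $line)[7];    # item number field (assumed position)
    next unless defined $material;
    $material =~ s/^\s+|\s+$//g;              # trim the column padding
    print "match: $material at $items{$material}{loc}\n" if exists $items{$material};
}
close $export_fh;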

    Comments on this post

    • ishnid agrees
#3 · OP (Registered User, Devshed Newbie)
    Originally Posted by FishMonger
    You should use a hash to store the data from items.txt using the item number as the key. Then as you loop over export.txt, you should use the split function to extract the fields and then do a simple hash lookup on the hash key.

    You should add these 2 pragmas, which should be in every Perl script you write, after the shebang line (line number 1) and fix the problems that they point out.
    Code:
    use strict;
    use warnings;
    There are many other improvements I can suggest, but start with that and then we'll move on to the other issues.
Thanks for the reply. I actually do use warnings when I'm having an issue. I don't use strict because I always **** up the "my" declarations with arrays. Basically, a lack of experience with them, but I will take it into consideration.

I did notice I had several warnings, so maybe I will go back, address those, and see if that helps things speed-wise.
#4 · FishMonger (Contributing User, Devshed Intermediate)
I don't use strict because I always **** up the "my" declarations with arrays.
    That's a very odd reason for not using strict. The strict pragma will point out a number of mistakes in your code which you could easily miss without its help and fixing those mistakes pushes you into writing better code.

    The code you're using here for those tasks is extremely questionable and is part of the reason why your script is running slow.

    If you really want to know which sections need to be optimized, you should use the Devel::NYTProf module to profile the script.
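For example, assuming the module is installed from CPAN, profiling is a two-step run from the command line:
Code:
perl -d:NYTProf yourscript.pl   # runs the script and writes nytprof.out
nytprofhtml                     # converts nytprof.out into an HTML report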
#5 · OP (Registered User, Devshed Newbie)
    Originally Posted by FishMonger
    That's a very odd reason for not using strict. The strict pragma will point out a number of mistakes in your code which you could easily miss without its help and fixing those mistakes pushes you into writing better code.

    The code you're using here for those tasks is extremely questionable and is part of the reason why your script is running slow.

If you really want to know which sections need to be optimized, you should use the Devel::NYTProf module to profile the script.
    Lol I understand. Hence the double apology for my code.

I don't write Perl much anymore. You'll notice I used == for the compare; it worked because the values are numeric, but that habit is stored in my memory from PHP. I love eq.

Using warnings I was able to clear them all except "Use of uninitialized value" for $stock and $storagebin in the pattern matches.

I might have to do this with SQL and insert it all; that might make my life easier. I just had 2 text files and thought I could complete it in Perl pretty quickly. Which I did - I just didn't expect it to be this slow.
#6 · Laurent_R (Contributing User, Devshed Novice)
The really important issue FishMonger pointed out in terms of performance is that you should use a hash rather than an array to store the data from one of your files (presumably items.txt, using the item number as the key for the hash).

Right now, if each of your files has, say, 1,000 lines, you are basically doing all the work of cleaning and splitting lines, retrieving the right data, etc., roughly a million times. If you first store the useful data from one of the files in a hash and then read the other file, you'll do that work only 2,000 times, which is about 500 times faster and will probably take only a few seconds.

There are other improvements to be made in terms of performance (for example, in your split you can retrieve only the fields that you need and discard the rest), but the use of a hash is by far the single most important change for making your script faster.
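For example, a list slice on split keeps just the wanted columns; a small sketch (field positions guessed from the sample export lines, counting the empty field created by the leading pipe):
Code:
use strict;
use warnings;

my $sline = '| CP6|CAA011A   |  |001|PAL| |20339     |      444 |      444 |EA |    71|        0 |        0 |08/01/2012|  |  |  |  |10/12/2012|      |      |';

# Index 0 is the empty string before the leading pipe, so the
# location is field 2, item number 7, stock 8, GR date 14.
my ($location, $material, $stock, $grdate) = (split /\|/, $sline)[2, 7, 8, 14];
print "$location / $material / $stock / $grdate\n";   # fields still carry their space padding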
#7 · OP (Registered User, Devshed Newbie)
    Originally Posted by Laurent_R
The really important issue FishMonger pointed out in terms of performance is that you should use a hash rather than an array to store the data from one of your files (presumably items.txt, using the item number as the key for the hash).

Right now, if each of your files has, say, 1,000 lines, you are basically doing all the work of cleaning and splitting lines, retrieving the right data, etc., roughly a million times. If you first store the useful data from one of the files in a hash and then read the other file, you'll do that work only 2,000 times, which is about 500 times faster and will probably take only a few seconds.

There are other improvements to be made in terms of performance (for example, in your split you can retrieve only the fields that you need and discard the rest), but the use of a hash is by far the single most important change for making your script faster.
It makes sense in my mind, it really does, but that doesn't mean I know how to execute it. I'm not experienced with hashes, and the 101 tutorials didn't seem to cover what I'm trying to do, or I'm missing it.

Essentially I need to do the Perl equivalent of:
"SELECT * FROM table WHERE itemnumber=$material"

    So, let's go back to hashing.

I want to read each line from items.txt, set each item number as a hash key, and then loop through export.txt to fill in the values. A few issues arise because I don't know hashes very well at all: how do I set multiple values on the same key and continue with my looping?

    [in foreach loop
    Item=1112234 ($material)
    desc=item1 ($desc)
    loc=R12121A ($location)
    ] ?

    %scanhash = ( $$material { "Item#" => $material, "desc" => $desc, "loc" => $location }); ?

Now, once I have them in the hash, how do I pull out only the key info that I need? It should be one item number with several locations.
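(For reference, the standard answer to "multiple values under the same key" is a hash of array references: each key holds an array reference, and push appends to it, autovivifying the array on first use. A tiny sketch with made-up records:)
Code:
use strict;
use warnings;

my %locations;    # item number => reference to a list of location records

# push autovivifies the array the first time a key is seen:
push @{ $locations{1112234} }, { loc => 'R12121A', stock => 660 };
push @{ $locations{1112234} }, { loc => 'P24591A', stock => 216 };

# Pull out everything stored under one item number:
for my $rec ( @{ $locations{1112234} } ) {
    print "$rec->{loc} has $rec->{stock}\n";
}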

    Thanks
#8 · FishMonger (Contributing User, Devshed Intermediate)
    In items.txt are the item numbers unique or could you have multiple lines with the same item number?

    Please post as attachments a reasonable sample of both files so that we can have a better understanding of their format.
#9 · OP (Registered User, Devshed Newbie)
    Originally Posted by FishMonger
    In items.txt are the item numbers unique or could you have multiple lines with the same item number?

    Please post as attachments a reasonable sample of both files so that we can have a better understanding of their format.
items.txt shouldn't have multiple occurrences of the same item number, but it may under unusual circumstances. The source database DOES, but I build the text file with SQL using a count(quantity) and GROUP BY location to roll them up, so there should only be one occurrence of each item number.

I can't attach files yet, so I will go back to the OP-style items.txt:
    Code:
    0020339        [12]   AAABatteries                 CAA011A   12
    9010000        [4]   Large Easter Eggs              CAA011A   4
    9031484        [24]   HolidayTrinkets                 CAA011A   24
    9138655        [24]   Trinkets24perpack      CAA011A   24
    1222827        [126]   GIFT PAXX                 CDD214G   12
    Item number, amount going out, description, pick location, amount of items per box

I have a regex to normalize the delimiter to 3 spaces before doing the split; any split scheme is workable, since I control how items.txt is built.
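(For what it's worth, splitting on a run of two or more spaces directly could skip the normalization pass, assuming descriptions never contain two spaces in a row:)
Code:
use strict;
use warnings;

my $line = '9010000        [4]   Large Easter Eggs              CAA011A   4';

# Fields are separated by runs of 2+ spaces; single spaces inside the
# description survive the split.
my ($itn, $qty, $desc, $pickloc, $bpc) = split /\s{2,}/, $line;
print "$itn | $qty | $desc | $pickloc | $bpc\n";   # 9010000 | [4] | Large Easter Eggs | CAA011A | 4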

As for export.txt, I don't control it; it's generated by another database program, from which I can only export a text file. Here is a fuller sample of export.txt:
    Code:
    | CP6|CAA011A   |  |001|PAL| |20339     |      444 |      444 |EA |    71|        0 |        0 |08/01/2012|  |  |  |  |10/12/2012|      |      |
    | CP6|CAA011A   |  |001|PAL| |9010000   |      268 |      268 |EA |    38|        0 |        0 |08/29/2012|  |  |  |  |10/11/2012|      |      |
    | CP6|CAA011A   |  |001|PAL| |9031484   |      216 |      216 |EA |    15|        0 |        0 |09/21/2012|  |  |  |  |10/12/2012|      |      |
    | CP6|CAA011A   |  |001|PAL| |9103893   |      318 |      318 |EA |     1|        0 |        0 |12/05/2011|  |  |  |  |10/12/2012|      |      |
    | CP6|CAA011A   |  |001|PAL| |9138655   |    1,104 |    1,104 |EA |    71|        0 |        0 |08/01/2012|  |  |  |  |10/11/2012|      |      |
    | RAP|P24591A   |R2|A02|PAL| |20339     |      660 |      660 |EA |    28|        0 |        0 |09/14/2012|  |  |  |  |10/11/2012|24    |      |
See here, I list all locations and their counts. I have to manually exclude the CAA (and similar) pick locations because they are where I pick the items from, not where I need to go find the product. Also, the stock in export.txt is counted in individual items rather than boxes going out, hence all the double math and division.

The above should produce: CAA011A - 20339 - 67 - 4 - 25 - (negative if it is negative), then P24591A (and then more locations where the item number matches).

Sorry, it's even harder without attachments. I can probably upload a screenshot of what my output looks like... standby.
#10 · OP (Registered User, Devshed Newbie)
    Keep in mind, my export has a lot more matching locations because it's several thousand lines.
#11 · OP (Registered User, Devshed Newbie)
I think I figured some of this out: the global regex pass was doing a lot of unnecessary work... like a lot of the other things I have going on. I started over from scratch, and I'm noticing a huge improvement from regexing the single variable after I extract it, instead of globally across the entire array.

500 compares in 15 minutes went down to 1,800 compares in 7 minutes.

    This is very reasonable for what I needed it to do.
#12 · FishMonger (Contributing User, Devshed Intermediate)
    That still seems pretty slow to me, but if you're satisfied then that's ok with me.

    Unless you have some heavy parsing that you're not showing us, I'd expect it to process 18000 lines in a few seconds.
#13 · Laurent_R (Contributing User, Devshed Novice)
Same idea: we haven't seen your data, but I would think that 7 minutes is still very, very slow for the amount of data you are talking about. To me, it should take a few seconds, or at most a few dozen seconds.
#14 · OP (Registered User, Devshed Newbie)
I don't doubt it could be much faster, but 7-10 minutes is fine. Faster would be better, but chances are nobody is going to write that for me. In time I do plan to speed it up, but I'm not sure how yet.

I wish I could show you both the full data, but it's confidential.

export.txt is 2MB of lines like this:
Code:
| RAP|P24591A   |R2|A02|PAL| |20339     |      660 |      660 |EA |    28|        0 |        0 |09/14/2012|  |  |  |  |10/11/2012|24    |      |

items.txt is 130KB, but it can range from 130KB all the way to 5MB, depending on how much data I pull with SQL.

    Here's the new code:
    Code:
#!/usr/bin/perl

open(FILE, "<items.txt");
@items = <FILE>;
close(FILE);
open(FILE2, "<export.txt");
@export = <FILE2>;
close(FILE2);
open(FILE3, ">daily_print.txt");
print "Opening 'daily_print.txt' for write ... Please Wait\n";
print "Generating ... standby\n";

foreach $line (@items) {
    # Normalize runs of 4+ spaces to the 3-space delimiter, then split.
    #$line =~ s/\d\d\d\d\d\d\d\s\s\s\s\s\s\s\s(.*)\D\D/\ \ \ $1\ \ \ /gi;
    $line =~ s/\s\s\s\s+/\ \ \ /gi;
    ($itn, $qty, $desc, $pickloc, $bpc) = split("   ", $line);

    # Clean the quantity and boxes-per-case fields.
    $qty =~ s/,//gi;
    $qty =~ s/\[//gi;
    $qty =~ s/\]//gi;
    $bpc =~ s/\s+//gi;
    #$pickloc =~ s/\s\s+/\ \ \ /gi;
    #$desc =~ s/\s\s\s+/\ \ \ /gi;

    foreach $sline (@export) {
        ($un, $type, $location, $bt, $sec, $sut, $s, $material, $stock, $astock,
         $bun, $til, $ps, $pq, $grdate, $pb, $rb, $br, $ia, $lm, $sp, $sc)
            = split(/\|/, $sline);
        $location =~ s/\s\s+//gi;
        $stock =~ s/,//gi;
        $stock =~ s/\s+//gi;
        $stocker = $stock / $bpc;    # stock in boxes
        $diff = $stocker - $qty;

        if ($itn != $material) { next; }
        if ($itn < 1)          { next; }

        if ($itn == $material) {
            # Skip bins whose names start with five non-digits.
            if ($location =~ /^\D\D\D\D\D/g) { next; }
            $qty =~ s/\]//g;
            $qty =~ s/\[//g;
            if ($location =~ /^\D\D\D\d/g) {
                # Pick location: print the header line with the quantities.
                $math2 = $stocker - $qty;
                $tag2  = "Remaining";
                if ($math2 < 0) { $tag = " * * * * * * * * * *"; $tag2 = "Short"; }
                print FILE3 "\n$tag ($itn) $location $desc |[QOH=$stocker GoingOut=$qty $tag2=$math2]$tag\n\n";
                $tag  = "";
                $tag2 = "Remaining";
            }
            else {
                # Dynamic location: print it indented under the item.
                $stockn = $stock / $bpc;
                print FILE3 "\t\t\t\t\t $location - $grdate - ($stockn) \n";
            }
        }
    }
    $i = $i + 1;
}
print "$i done\n";
close(FILE3);
Forgive that commented-out regex too; I was manually testing for a stray character that found its way in there and that I couldn't track down. I should go clean that up now.

I do appreciate the hashing suggestions, though. It was playing with hashes that made me realize I was regexing way too much just to get one value. I edited 4 regexes and the speed increased a lot.
#15 · Laurent_R (Contributing User, Devshed Novice)
The main point is not whether you use a hash or some other data structure (although a hash has a high chance of being faster); the main point is that you should most probably not use two nested foreach loops (or at least you should reduce considerably the amount of work done in the inner loop, by doing it only once beforehand instead of millions of times).

In other words: read one of the input files once at the beginning; do all the work of cleaning the lines, splitting them, extracting the data, and storing the part you will need in an appropriate data structure (probably a hash) in memory; then read the second file once, processing it one line at a time with the help of the data saved in the first phase.

As I said, if each of your files has 1,000 lines, you will then read 2,000 lines in total, instead of processing a million records because the inner loop reprocesses each record 1,000 times.
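To make that concrete, here is a minimal sketch of that structure (field positions, cleanup rules, and the pick-location test are all guesses from the samples posted in this thread, so treat it as a shape to copy rather than a drop-in replacement):
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Phase 1: read items.txt once; key the records by item number.
my %items;
open my $items_fh, '<', 'items.txt' or die "items.txt: $!";
while (my $line = <$items_fh>) {
    chomp $line;
    $line =~ s/^00//;                        # match export.txt's unpadded numbers
    my ($itn, $qty, $desc, $pickloc, $bpc) = split /\s{2,}/, $line;
    next unless defined $bpc;                # skip malformed lines
    $qty =~ tr/[],//d;                       # strip the brackets and commas
    $items{$itn} = { qty => $qty, desc => $desc, pickloc => $pickloc, bpc => $bpc };
}
close $items_fh;

# Phase 2: read export.txt once; one hash lookup replaces the whole inner loop.
my %found;                                   # item number => list of location records
open my $export_fh, '<', 'export.txt' or die "export.txt: $!";
while (my $line = <$export_fh>) {
    my ($location, $material, $stock, $grdate) = (split /\|/, $line)[2, 7, 8, 14];
    next unless defined $grdate;             # skip short or blank lines
    s/^\s+|\s+$//g for $location, $material, $stock;
    $stock =~ s/,//g;
    next unless exists $items{$material};    # the whole "join" is this one lookup
    push @{ $found{$material} }, { loc => $location, stock => $stock, grdate => $grdate };
}
close $export_fh;

# Phase 3: print the report, grouped per item as in the original output.
for my $itn (sort keys %items) {
    my $item = $items{$itn};
    for my $rec ( @{ $found{$itn} || [] } ) {
        my $boxes = $rec->{stock} / $item->{bpc};
        if ( $rec->{loc} eq $item->{pickloc} ) {
            my $diff = $boxes - $item->{qty};
            print "\n$rec->{loc} - $item->{desc} | [QOH=$boxes GoingOut=$item->{qty} Remaining=$diff]\n";
        }
        else {
            print "\t\t$rec->{loc} - $rec->{grdate} - ($boxes)\n";
        }
    }
}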