#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    4
    Rep Power
    0

    Perl script needed for comparison of two text files!


    Hi guys,

    Can anybody help me with this?

    File1.dat

    London|AC132|LAV|456|F
    London|VA653|MAC|500|A
    Newyork|GC496|LKM|0|U

    Location ID entity Amt flag
    London AC132 LAV 456 F
    London VA653 MAC 500 A
    Newyork GC496 LKM 0 U


    file2.dat

    London|AC132|XXX|400|A
    London|VA653|XXX|500|A
    Newyork|GC496|XXX|100|U


    Location ID entity Amt flag
    London AC132 XXX 400 A
    London VA653 XXX 500 A
    Newyork GC496 XXX 100 U

    Column II (ID) is unique.

    The output should be as below.

    Location Column Count
    London entity 2
    London Amt 1
    London Flag 1
    Newyork entity 1
    Newyork Amt 1

    Can anybody provide me with the Perl script?
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Location
    spaceBAR Central
    Posts
    229
    Rep Power
    42
    I don't understand. If your pipe-delimited (|) files have the fields location, id, entity, amt, flag:
    Code:
    ***file1.dat
    location id    entity amt flag
    London   AC132 LAV    456 F
    London   VA653 MAC    500 A
    Newyork  GC496 LKM      0 U
    
    ***file2.dat
    location id    entity amt flag
    London   AC132 XXX    400 A
    London   VA653 XXX    500 A
    Newyork  GC496 XXX    100 U
    What do you mean by 'comparison of two text files' to come up with this output?
    Code:
    location column count
    London   entity     2
    London   Amt        1
    London   Flag       1
    Newyork  entity     1
    Newyork  Amt        1
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    830
    Rep Power
    496
    Same question. I fail to understand how comparing the two input files gives you the presented output file.

    Please explain what you are looking for in these two files that you want to report in the output.

    Also, if you have started to code something, please show it to us; it might help put us on the right track to help you.
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    4
    Rep Power
    0
    The pipe-delimited files don't contain a header row with location, id, entity, amt, flag.
    The values in each line are meant to be location, id, entity, amt, flag respectively.

    The two files are different (the differing values were underlined in the original post).
    file1.dat

    London|AC132|LAV|456|F
    London|VA653|MAC|500|A
    Newyork|GC496|LKM|0|U

    file2.dat

    London|AC132|XXX|400|A
    London|VA653|XXX|500|A
    Newyork|GC496|XXX|100|U

    I need the output to report the differences in columns III, IV and V, which are entity, amt and flag respectively.

    The differences must be grouped by location. For example, for location = London, the entity column has two differences.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    4
    Rep Power
    0
    Column II is my primary key.

    Basically, I have two files which will have different line counts. I need to join them on column II and then find the differences.

    The two files contain millions of records.

    Let me know if any other info is needed.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,930
    Rep Power
    1225
    This is not a code writing service. We will assist you in troubleshooting your code, but we will not write the script for you.
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    830
    Rep Power
    496
    Alright, it is clear now. Your 2 files have a unique identifier, you want to join them on it and pick up whatever differences there are in the other fields. That sounds relatively easy, subject to my final remark below.

    As Fishmonger pointed out, this forum is not a code writing service, but I am certainly willing to put you on the right track or at least to give you precise ideas on how to try to solve it.

    Originally Posted by akshay02
    The two files contain millions of records.
    Well, can you be a bit more specific about the exact data volumes? There is an easy algorithm if one file fits entirely into memory: probably less than a dozen lines of code for the actual comparison work. If, however, the files are too big for one of them to fit in memory, it might get somewhat more complicated. In my experience, one or two million lines will probably not be a problem, while 20 million lines or more will probably start to be one. Between these two values it is your call, depending on the line size, your computer hardware and configuration, etc.
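
A minimal sketch of the in-memory approach described above, not a definitive implementation: load the first file into a hash keyed on the ID column, then stream the second file and count differing fields per location. The sub names `count_differences` and `print_report` are made up for illustration, and the column names come from the first post. It assumes the first file fits in memory, that IDs present in only one file are simply skipped, and that the location is taken from the first file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names as given in the thread; the ID (second field) is the join key.
my @COLS = qw(Location ID entity Amt Flag);

# Returns a hash ref: $counts->{location}{column} = number of joined rows
# whose values differ in that column.
sub count_differences {
    my ($file1, $file2) = @_;

    # Pass 1: load the first file into a hash keyed on ID.
    my %first;
    open my $in1, '<', $file1 or die "Cannot open $file1: $!";
    while (my $line = <$in1>) {
        chomp $line;
        my @f = split /\|/, $line, -1;    # -1 keeps trailing empty fields
        next unless @f == @COLS;          # skip blank or malformed lines
        $first{ $f[1] } = \@f;
    }
    close $in1;

    # Pass 2: stream the second file and compare field by field.
    my %counts;
    open my $in2, '<', $file2 or die "Cannot open $file2: $!";
    while (my $line = <$in2>) {
        chomp $line;
        my @g = split /\|/, $line, -1;
        next unless @g == @COLS;
        my $f = $first{ $g[1] } or next;  # ID not in file 1: skipped (assumption)
        for my $i (2 .. $#COLS) {
            # String comparison, so "500" and "500.0" would count as different.
            $counts{ $f->[0] }{ $COLS[$i] }++ if $f->[$i] ne $g[$i];
        }
    }
    close $in2;

    return \%counts;
}

# Prints one "Location Column Count" line per location and differing column.
sub print_report {
    my ($counts) = @_;
    print "Location Column Count\n";
    for my $loc (sort keys %$counts) {
        for my $col (@COLS[2 .. $#COLS]) {
            next unless exists $counts->{$loc}{$col};
            print "$loc $col $counts->{$loc}{$col}\n";
        }
    }
}

# Run as: perl compare.pl file1.dat file2.dat
print_report(count_differences(@ARGV)) if @ARGV == 2;
```

Only the first file is held in memory, so the larger file is probably better passed as the second argument. If rows that exist in only one file also need reporting, the `or next` line is the place to record them instead of skipping.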
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    4
    Rep Power
    0
    Approx. 1 million lines. Please give me a basic structure or plan to do it. Note: the line counts of the two files will not be the same.
