Perl Programming
 
  #1  
November 12th, 2012, 11:25 PM
manigrover
Find common entries in first column and fetch whatever is in front of it

Hi all,

I have two files.

The first looks like this:

Quote:
PTGS2
IL2RB
IGF1R
CALR
ABCC1
RET
ABCB4
MMP2
ERBB4
TP53
IL7R
PIK3CG
SYK
IL9
CNTFR
SLC6A2
PDGFRA
PRLR


The second looks like this:
Quote:
CALR Antigen processing and presentation CPSab CALR Antigen processing and presentation CPSab ü tttt n 19p13.13c
KIR2DL5A Antigen processing and presentation CPSab KIR2DL5A ü tttt n 19p13.13
KIR2DS1 Antigen processing and presentation CPSab KIR2DS1 ü tttt n 19q13.4
KIR2DS2 Antigen processing and presentation CPSab KIR2DS2 ü tttt n 19q13.4
KIR2DS3 Antigen processing and presentation CPSab KIR2DS3 ü tttt n 19q13.4
KIR2DS5 Antigen processing and presentation CPSab KIR2DS5 ü tttt n 19q13.4
PSME1 Antigen processing and presentation CPSab PSME1 ü tttt n 14q12a
PSME2 Antigen processing and presentation CPSab PSME2 ü tttt n 14q12a
PTK2 Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab PTK2 Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab ü tt n 8q24.3c
SYK Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab SYK ü tt n 9q22.2b
PIK3C2G CCR3 signaling in Eosinophils CPS CPSab PIK3C2G CCR3 signaling in Eosinophils CPSab ü t n 12p12.3b
PTK2 CCR3 signaling in Eosinophils CPS CPSab PTK2 ü t n 8q24.3c
CHUK CD40L Signaling Pathway CPSab CHUK CD40L Signaling Pathway CPSab ü ü tttt n 10q24.2c
DUSP1 CD40L Signaling Pathway CPSab DUSP1 ü ü ü tttt n 5q35.1e
IKBKAP CD40L Signaling Pathway CPSab IKBKAP ü ü ü ü tttt n 9q31.3a
MAP3K1 CD40L Signaling Pathway CPSab MAP3K1 ü ü ü ü ü tttt n 5q11.2f
TRAF6 CD40L Signaling Pathway CPSab TRAF6 ü ü ü ü ü tttt n 11p12d
CCNE1 CDK Regulation of DNA Replication C CPSab CCNE1 CDK Regulation of DNA Replication CPSab ü ü ü tttt n 19q12c
KITLG CDK Regulation of DNA Replication C CPSab KITLG ü ü ü tttt n 12q21.32a
MCM5 CDK Regulation of DNA Replication C CPSab MCM5 ü ü tttt n 22q12.3c
ORC4L CDK Regulation of DNA Replication C CPSab ORC4L ü ü ü tttt n 2q23.1a
PIK3C2G CXCR4 Signaling Pathway CPS CPSab PIK3C2G CXCR4 Signaling Pathway CPSab


I need to check whether any entry in the first file also appears in the first column of the second file. If it does, I need to fetch whatever is in front of it in the second file, i.e. the whole matching line.

So if CALR is common, the output is:

Quote:
CALR Antigen processing and presentation CPSab CALR Antigen processing and presentation CPSab ü tttt n 19p13.13c


Please suggest a Perl script for this; I am trying to help a friend.

  #2  
November 13th, 2012, 02:56 AM
Laurent_R
Hi,

The way to do it really depends on the size (in number of lines) of each file. If one file has many more lines than the other, the algorithm may differ.

In almost all cases, though, I think the first thing to do with this type of problem is to read the first file line by line, chomp each line, and store each line as a key in a hash (the associated values don't really matter; they could be 1 for every hash entry).
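
That first step could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for the real data: the sub name `load_keys` and the file name `file1.txt` are placeholders, and it assumes the first file holds one entry per line.

```perl
use strict;
use warnings;

# Read a file with one entry per line and return a hash reference whose
# keys are the entries. load_keys is a hypothetical helper name.
sub load_keys {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim stray whitespace, including a trailing \r
        next if $line eq '';        # skip blank lines
        $seen{$line} = 1;           # the value is irrelevant; only the key matters
    }
    close $fh;
    return \%seen;
}

# Example use (file1.txt is an assumed file name):
# my $wanted = load_keys('file1.txt');
# print "CALR is listed\n" if exists $wanted->{CALR};
```

Trimming whitespace is an extra precaution that the description above does not ask for; it guards against Windows line endings, which would otherwise make every lookup fail silently.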

Then read the second file line by line and, for each line, test whether it matches a hash entry. The best way to do that depends on the data: the relative size of the files, the length of each line in the second file, the volume of data (i.e. whether you want to optimize for code simplicity or for speed), and also the Perl version you are using. You could use:
- Regular expressions to find a match and capture whatever comes before the match in the line
- The index and substr functions
- Possibly the smart match operator (if your Perl version allows it; it was introduced in Perl 5.10)
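
As a rough illustration of the first two options, the sketch below pulls out the first field of each line either with a regular expression or with index and substr, then looks it up in the hash. The helper names (`first_field_regex`, `first_field_substr`, `matching_lines`) are made up for this example, the keys are hard-coded so that it runs on its own, and the sample lines are shortened versions of the ones posted above. It assumes fields are separated by spaces or tabs.

```perl
use strict;
use warnings;

# Keys from the first file; a hard-coded subset so the sketch stands alone.
my %wanted = map { $_ => 1 } qw(CALR SYK TP53);

# Option 1: a regular expression captures the first whitespace-delimited field.
sub first_field_regex {
    my ($line) = @_;
    return $line =~ /^\s*(\S+)/ ? $1 : undef;
}

# Option 2: index and substr cut the line at the first space or tab.
# Unlike option 1, this does not skip leading whitespace.
sub first_field_substr {
    my ($line) = @_;
    my ($pos) = sort { $a <=> $b }
                grep { $_ >= 0 } index($line, ' '), index($line, "\t");
    return defined $pos ? substr($line, 0, $pos) : $line;
}

# Keep only the lines whose first field is a key of the hash.
sub matching_lines {
    my ($keys, @lines) = @_;
    return grep {
        my $field = first_field_regex($_);
        defined $field && exists $keys->{$field};
    } @lines;
}

# Shortened sample lines (the real lines in the second file are longer).
my @sample = (
    'CALR Antigen processing and presentation CPSab CALR',
    'PTK2 Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab PTK2',
    'SYK Aspirin Blocks Signaling Pathway Involved in Platelet Activation CPSab SYK',
);

print "$_\n" for matching_lines(\%wanted, @sample);
```

With these sample lines, the CALR and SYK lines are printed and the PTK2 line is skipped. Because only the first field is compared, a gene name that appears later in a line does not trigger a match.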

Another possible approach is to use the List::Util module (and/or possibly List::MoreUtils) to compare the list of words in the first file with the list of words in each line of the second file.
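
A list-comparison version might look something like this. It is only a sketch of one way to use `first` from List::Util; the sub name `common_word` is invented for the example. Note that it compares every word on the line, not just the first column, so it can match more lines than the original question asks for.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the first word of $line that also appears in @$wanted, or undef
# if the two lists have no word in common. common_word is a hypothetical name.
sub common_word {
    my ($wanted, $line) = @_;
    my @words = split ' ', $line;    # split on any run of whitespace
    return first {
        my $word = $_;
        scalar grep { $_ eq $word } @$wanted;    # true if $word is in the list
    } @words;
}

# Example use:
# my $hit = common_word([qw(CALR SYK)], 'PTK2 Aspirin SYK');
# print "found $hit\n" if defined $hit;
```

This scans the whole list of wanted words for every word of every line, so for a large second file the hash lookup is likely to be both simpler and faster.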

  #3  
November 13th, 2012, 03:07 AM
manigrover
Hi

Thanks for the reply.

Yes, the second file is larger, but it follows the same pattern as the sample shown here.

The first file is small; it contains only what I posted above.

Initially I tried some code on Unix. It worked to a certain extent on a very small sample, but not on the large original data.

Here is the shell code I tried:

Quote:
awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");sub($1,"",$0);for(i=1;i<=n;i++){if(X[P[i]]){print P[i],$0}}}' file1 FS="\t" file2


Now I am looking for help doing this in Perl!

  #4  
November 13th, 2012, 08:37 AM
FishMonger
What have you tried?

How big are the files?

Is there a possibility of one or both of the files having duplicate entries in the first column?
