#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    22
    Rep Power
    0

    Problem taking many input files


    Hi all,

    The following script takes one FASTA input file (goodProteins.fasta) and many list files (listnnnn.txt) as its second input:


    Code:
    from Bio import SeqIO
    
    def process(wanted_file, result_file):
        fasta_file = "goodProteins.fasta" # First input 
    
        wanted = set()
        with open(wanted_file) as f:
            for line in f:
                line = line.strip()
                if line != "":
                    wanted.add(line)
    
        fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
        with open(result_file, "w") as f:
            for seq in fasta_sequences:
                if seq.id in wanted:
                    SeqIO.write([seq], f, "fasta")
    
    for i in range(100):
        wanted_file = "list" + str(i) + ".txt"
        result_file = "gene" + str(i) + ".txt"
        process(wanted_file, result_file)


    This script works fine if the list files are numbered consecutively (e.g. list1.txt, list2.txt, etc.), but my list file numbers are large and do not follow any specific pattern (such as list33488.txt, list2781.txt, etc.). Please suggest a solution.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    163
    Rep Power
    19
    Originally Posted by utpalmtbi
    This script works fine if the list files are numbered consecutively (e.g. list1.txt, list2.txt, etc.), but my list file numbers are large and do not follow any specific pattern (such as list33488.txt, list2781.txt, etc.). Please suggest a solution.
    You can use the glob module to build a list of all the "list*" files you need.

    Code:
    import glob
    listFilesArr = glob.glob('/home/dir/list*txt')
    for file in listFilesArr:
        <statements>
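
    To see what glob returns before wiring it into the real script, here is a minimal sketch. It assumes Python 3 and runs in a temporary directory so it does not touch real data; the helper name find_list_files and the file notes.txt are made up for illustration.

```python
import glob
import os
import tempfile


def find_list_files(directory):
    """Return the sorted paths of all list*.txt files in directory."""
    # Hypothetical helper; the pattern mirrors the "list*" idea above.
    pattern = os.path.join(directory, "list*.txt")
    return sorted(glob.glob(pattern))


# Demo in a throwaway directory with non-consecutive file numbers.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("list33488.txt", "list2781.txt", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()

    found = [os.path.basename(path) for path in find_list_files(demo_dir)]
    print(found)  # notes.txt is skipped because it does not match the pattern
```

    glob.glob returns paths in arbitrary order, so the sketch sorts them (as strings, not numerically) to make runs repeatable.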
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    22
    Rep Power
    0
    I changed the code as follows:

    Code:
    from Bio import SeqIO
    import glob
    def process(wanted_file, result_file):
        fasta_file = "goodProteins.fasta" # First input 
    
        wanted = set()
        with open(wanted_file) as f:
            for line in f:
                line = line.strip()
                if line != "":
                    wanted.add(line)
    
        fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
        with open(result_file, "w") as f:
            for seq in fasta_sequences:
                if seq.id in wanted:
                    SeqIO.write([seq], f, "fasta")
    
    listFilesArr = glob.glob('list*txt')
    for file in listFilesArr:
    wanted_file = "list*.txt"
    result_file = "gene*.txt"
    process(wanted_file, result_file)
    and it gives the following error:

    Code:
    File "script.py", line 25
        wanted_file = "list*.txt"
                  ^
    IndentationError: expected an indented block
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    438
    Rep Power
    67
    Originally Posted by utpalmtbi
    I changed the code as follows:

    Code:
    for file in listFilesArr:
    wanted_file = "list*.txt"
    and it gives the following error:

    Code:
    File "script.py", line 25
        wanted_file = "list*.txt"
                  ^
    IndentationError: expected an indented block
    Isn't the error message self-explanatory? The line before your line 25 (doing quick math: 24) starts a for loop and expects an indented block.
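
    As a small illustration of what "expects an indented block" means, here is a sketch with made-up file names (file_names and visited are not from the original script). The statements that belong to the loop are indented one level under the for line.

```python
file_names = ["list33488.txt", "list2781.txt"]  # hypothetical names

visited = []
for name in file_names:
    # Everything indented under the "for" line runs once per item.
    visited.append(name)

# Back at the original indentation level: this runs once, after the loop.
print(visited)
```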
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    163
    Rep Power
    19
    Along with the indentation error, when you write wanted_file = "list*.txt", you are storing the literal string "list*.txt". Here the asterisk * does not act as a wildcard the way it does in a UNIX shell. You have to do a bit of string processing to build each output name from the matching input name:

    Code:
    listFilesArr = glob.glob('list*txt')
    for wanted_file in listFilesArr:
        result_file = "gene" + wanted_file[4:-4] + ".txt" # assuming the file name is "list<some_number>.txt"
        process(wanted_file, result_file)
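
    The slice above relies on the name being exactly "list<some_number>.txt" with no directory in front. If that assumption might not hold (for example when glob is given a path such as the '/home/dir/list*txt' pattern from earlier), a regular expression is one possible alternative. This is only a sketch; the helper name result_name_for is made up for illustration.

```python
import os
import re

# Matches names of the form list<digits>.txt and captures the digits.
LIST_NAME = re.compile(r"^list(\d+)\.txt$")


def result_name_for(wanted_path):
    """Map .../list<number>.txt to .../gene<number>.txt.

    Returns None when the file name does not have the expected form.
    """
    directory, base = os.path.split(wanted_path)
    match = LIST_NAME.match(base)
    if match is None:
        return None
    return os.path.join(directory, "gene" + match.group(1) + ".txt")


print(result_name_for("list33488.txt"))
print(result_name_for("listing.txt"))  # matches 'list*txt' but has no number
```

    In the loop, a file for which this returns None can simply be skipped instead of being passed to process().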

    Comments on this post

    • utpalmtbi agrees
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    22
    Rep Power
    0
    Got it. Thank you.
