
Read file contents in Python conditionally [closed]

I am trying to read a single chromosome sequence from a genome file in Python. The genome file is formatted like the following, but with more lines of sequence for each chromosome:

Chr1

ATCGTGTGATGGTGCGTAGATGCTGAT

GCTGATGTGTCGAGCGATGCTGAGTCG

Chr2

TGCGTGATGCTGAGCGATGCTGATGCT

TAGCTGACCACACACCTGTTTTGTAGG

Chr3

CAGTCGTAGCGATGCTGATGATGCTGA

GGTTGGTTGGCGGACCACCATTACTAT

I use the following code to read the whole genome sequence. However, I only want the sequence of one chromosome (e.g. the whole sequence of Chr2). Rather than reading the whole genome and then searching it for Chr2, is there another way to do this?

Thank you

   genome = []  # every line of the file ends up in this list
   with open("genome.txt") as f:
       for line in f:
           genome.append(line.rstrip())


Answer

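Open the file and read it line by line until you find the ‘Chr2’ header.

Then consume all non-empty lines until you reach EOF or the next line beginning with ‘Chr’. Because both loops share the same file iterator, nothing after that next header is processed.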

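def getgenomes(gfile):
    """Collect sequence lines until the next 'Chr' header or EOF."""
    g = []
    for line in gfile:
        if line.startswith('Chr'):
            break  # next chromosome header: stop reading
        if (line := line.strip()):  # skip blank lines (walrus needs Python 3.8+)
            g.append(line)
    return g

with open('genome.txt', encoding='utf-8') as gfile:
    genomes = None
    for line in gfile:
        # Compare the whole header so 'Chr2' does not also match 'Chr20', 'Chr21', ...
        if line.strip() == 'Chr2':
            # getgenomes() continues from the current position in the file
            genomes = getgenomes(gfile)
            break
    print(genomes)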

Output (for the sample file above):

['TGCGTGATGCTGAGCGATGCTGATGCT', 'TAGCTGACCACACACCTGTTTTGTAGG']
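If you want the chromosome's sequence as one string and want to choose the chromosome by name, the same idea can be expressed with `itertools.dropwhile` and `itertools.takewhile`. This is a minimal sketch, not the answerer's code: the function name `read_chromosome` is hypothetical, and it assumes (as in the sample file) that header lines are exactly the chromosome name, that every header starts with `Chr`, and that blank lines can be ignored.

```python
from itertools import dropwhile, takewhile


def read_chromosome(path, name):
    """Return the sequence of chromosome `name` as one string, or None if absent.

    Hypothetical helper. Assumes headers are whole lines such as 'Chr2' and
    that every header starts with 'Chr'; blank lines are ignored.
    """
    with open(path, encoding='utf-8') as f:
        lines = (line.strip() for line in f)
        # Skip lines until the exact header is found (the header itself is kept).
        rest = dropwhile(lambda s: s != name, lines)
        if next(rest, None) is None:
            return None  # header never found
        # Take sequence lines until the next header; iteration stops there.
        seq = takewhile(lambda s: not s.startswith('Chr'), rest)
        return ''.join(s for s in seq if s)
```

For the sample file in the question, `read_chromosome('genome.txt', 'Chr2')` should return `'TGCGTGATGCTGAGCGATGCTGATGCTTAGCTGACCACACACCTGTTTTGTAGG'`. Comparing the stripped line with `!=` rather than `startswith` avoids accidentally matching `Chr20` when looking for `Chr2`.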
User contributions licensed under: CC BY-SA