As the title suggests, I’m struggling to figure out how to make a multiline block of text fit in a single cell. For context, I’m using Beautiful Soup to extract the mtDNA sequence, along with other data on the site, and write these values to a CSV file.
I’ve tried using str.strip('\n') to make the text a single line, but that didn’t work: the text still flowed into the following rows. Below is the code for my program.
import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2]

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + ", " + haplogroup.replace(",", "|") + ", " + mtDNA_sequence + "\n")
Any help towards solving this would be much appreciated.
Answer
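The problem is that the DNA sequence has newline characters in it. str.strip() only removes characters from the start and end of the string, so the line breaks inside the sequence remain and each one starts a new row in the CSV. You will have to replace the newline characters instead.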
import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence: keep the text after 'genome' and remove every newline
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2].strip().replace("\n", "")

#placeholder values; you can ignore the genbank and haplogroup stuff
genbank_ID = "hi"
haplogroup = "world"

f = open("a.csv", "w")
f.write(genbank_ID + ", " + haplogroup.replace(",", "|") + ", \"" + mtDNA_sequence + "\"\n")
f.close()
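As a possible variation on the answer above, here is a minimal sketch that uses Python's standard csv module instead of building the row by hand. It is not the original poster's code: the helper names flatten_sequence and write_row and the sample sequence are made up for illustration. It assumes the sequence may contain Windows-style line endings as well, which is why it drops all whitespace rather than only "\n".

```python
import csv
import io


def flatten_sequence(fasta_body: str) -> str:
    """Join a multi-line sequence into one line by dropping all whitespace."""
    # str.split() with no arguments splits on any run of whitespace,
    # so "\n", "\r\n", spaces, and tabs are all removed in one pass.
    return "".join(fasta_body.split())


def write_row(handle, genbank_id: str, haplogroup: str, sequence: str) -> None:
    """Write one CSV row; csv.writer quotes any field that contains a comma."""
    csv.writer(handle).writerow([genbank_id, haplogroup, flatten_sequence(sequence)])


# Hypothetical sample data standing in for the downloaded sequence.
sample = "GATCACAGGT\nCTATCACCCT\r\nATTAACCACT\n"

# strip() only trims the ends, so the interior line breaks survive.
print(repr(sample.strip()))

# An in-memory buffer is used here so the sketch runs without touching disk.
buffer = io.StringIO()
write_row(buffer, "hi", "A1, B2", sample)
print(repr(buffer.getvalue()))
```

Because csv.writer handles the quoting, the haplogroup no longer needs its commas replaced with "|". When writing to a real file rather than the in-memory buffer, the csv documentation recommends opening it with newline="", for example open("a.csv", "w", newline="").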