
Adding multi-line text to a single cell in a csv after scraping a site

As the title suggests, I’m struggling to figure out how to make a multiline block of text fit in a single cell. For context, I’m using Beautiful Soup to extract the mtDNA sequence, along with other data on the site, and writing these values to a csv.

I’ve tried using str.strip('\n') to make the text a single line, but that didn’t work, and the text still flowed into the following rows. Below is the code for my program.

import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2]

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + ", " + haplogroup.replace(",", "|") + ", " + mtDNA_sequence + "\n")

Any help towards solving this would be much appreciated.


Answer

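The problem is that the DNA sequence has newline characters in it, and strip() only removes them from the start and end of the string. So, you will have to replace the newline characters inside the sequence as well.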
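To see the difference between stripping and replacing, here is a minimal sketch. The sample text and the helper name flatten_sequence are made up for illustration; the real response from the site may look different.

```python
# Hypothetical multi-line sequence text (an assumption, not the real response).
raw = "\nACGT\nTTGA\nCCAG\n"


def flatten_sequence(text):
    """Remove every line break so the sequence sits on one line."""
    # Drop carriage returns too, in case the text uses Windows line endings.
    return text.replace("\r", "").replace("\n", "")


# strip() only trims the ends, so the inner newlines survive.
stripped = raw.strip("\n")

# replace() removes every newline, including the ones in the middle.
flattened = flatten_sequence(raw)
```

Here stripped still contains line breaks between the chunks, while flattened is a single line that stays in one row of the csv.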

import requests
theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2].strip().replace("\n","")

f = open("a.csv","w")
genbank_ID = "hi"
haplogroup = "world"

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + ", " + haplogroup.replace(",", "|") + ", \"" + mtDNA_sequence + "\"\n")
f.close()
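
As an alternative to building the row by hand, Python's standard csv module can do the quoting. This is only a sketch, not the approach from the answer: the helper write_row and the sample values are hypothetical, and an in-memory buffer stands in for the real file. The writer quotes any field that contains a comma or a newline, so the haplogroup would not need its commas replaced and the sequence could even keep its line breaks and still land in one cell.

```python
import csv
import io


def write_row(handle, genbank_id, haplogroup, sequence):
    """Write one csv row; csv.writer quotes fields containing commas or newlines."""
    writer = csv.writer(handle)
    writer.writerow([genbank_id, haplogroup, sequence])


# In-memory stand-in for the file; the values below are made up for illustration.
buffer = io.StringIO(newline="")
write_row(buffer, "hi", "world, x", "ACGT\nTTGA")

# Read the data back to check that the multi-line text stayed in a single cell.
buffer.seek(0)
rows = list(csv.reader(buffer))
```

With a real file, the csv documentation recommends opening it with newline="", for example open("a.csv", "w", newline=""), and passing that handle in place of the buffer.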
User contributions licensed under: CC BY-SA