I have a file that looks like this:
    gRNA_A
    gene_a
    140626
    gene_b
    227598
    gene_c
    115781
    gRNA_B
    gene_a
    125003
    gene_b
    102000
    gene_c
    200300
I want to read this into a pandas dataframe and re-shape it so that it looks like this:
            gene_a  gene_b  gene_c
    gRNA_A  140626  227598  115781
    gRNA_B  125003  102000  200300
Is this possible? If so, how?
Notes: the file will not always be this size, so the solution needs to be size-independent. The input will be at most about 200 gRNAs x 20 genes. The gRNA names will always have the form gRNA_somelettercombo, but the genes will not be named gene_lettercombo; each gene will have the name of an actual gene (like GAPDH, ACTB, etc.).
Answer
You need to write a parser for your custom format. Rely on the gRNA_ prefix
to start a new group, then treat the odd lines within a group as keys (gene names) and the even lines as values (counts):
    import pandas as pd

    d = {}
    current_gRNA = None
    gene = None
    with open('gRNA.txt') as f:
        for line in f:  # iterate over lines
            line = line.strip()
            if not line:  # skip blank lines
                continue
            if line.startswith('gRNA_'):  # start new group
                current_gRNA = line
                d[current_gRNA] = {}
            else:
                if gene:  # even line of a group = data
                    d[current_gRNA][gene] = int(line)
                    gene = None
                else:  # odd line of a group = gene name
                    gene = line
    # outer keys (gRNAs) become the index, inner keys (genes) the columns
    df = pd.DataFrame.from_dict(d, orient='index')
output:
            gene_a  gene_b  gene_c
    gRNA_A  140626  227598  115781
    gRNA_B  125003  102000  200300
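
To try the approach without a file on disk, here is a minimal sketch of the same parser wrapped in a function that accepts any iterable of lines. The function name `parse_grna_lines` and the in-memory sample are assumptions made for illustration. The sketch also assumes one token per line, that the first non-blank line is a gRNA name, and that no gene name starts with `gRNA_`.

```python
import io

import pandas as pd


def parse_grna_lines(lines):
    """Build a gRNA x gene DataFrame from one-token-per-line input.

    Hypothetical helper: assumes the first non-blank line is a gRNA name
    and that gene names never start with 'gRNA_'.
    """
    records = {}
    current = None  # gRNA group currently being filled
    gene = None     # gene name waiting for its count
    for raw in lines:
        token = raw.strip()
        if not token:  # skip blank lines
            continue
        if token.startswith('gRNA_'):  # start a new group
            current = token
            records[current] = {}
            gene = None
        elif gene is None:  # odd line of a group = gene name
            gene = token
        else:  # even line of a group = count
            records[current][gene] = int(token)
            gene = None
    # outer keys (gRNAs) become the index, inner keys (genes) the columns
    return pd.DataFrame.from_dict(records, orient='index')


# Sample data copied from the question, one token per line.
sample = """gRNA_A
gene_a
140626
gene_b
227598
gene_c
115781
gRNA_B
gene_a
125003
gene_b
102000
gene_c
200300
"""

df = parse_grna_lines(io.StringIO(sample))
print(df)
```

For a real file, the same function can be called as `parse_grna_lines(open('gRNA.txt'))`, ideally inside a `with` block. Nothing in the loop depends on the number of gRNAs or genes, so the size of the input does not matter.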