I have a file that looks like this:
gRNA_A
gene_a
140626
gene_b
227598
gene_c
115781
gRNA_B
gene_a
125003
gene_b
102000
gene_c
200300
I want to read this into a pandas dataframe and re-shape it so that it looks like this:
        gene_a  gene_b  gene_c
gRNA_A  140626  227598  115781
gRNA_B  125003  102000  200300
Is this possible? If so, how?
Notes: the file will not always be this size, so the solution needs to be size-independent. The input file will be at most ~200 gRNAs x 20 genes. The gRNA names will always be gRNA_ followed by some letter combination, but the genes will not be named gene_ plus letters; each will be the name of an actual gene (like GAPDH, ACTB, etc.).
Answer
You need to write a parser for your custom format, relying on the gRNA_ prefix
to start a new group, and then taking the odd lines within a group as keys (gene names) and the even lines as values (counts):
import pandas as pd

d = {}               # maps gRNA name -> {gene name: count}
current_gRNA = None  # gRNA group currently being filled
gene = None          # gene name waiting for its count line

with open('gRNA.txt') as f:
    for line in f:  # iterate over lines
        line = line.strip()
        if not line:  # skip blank lines
            continue
        if line.startswith('gRNA_'):  # start new group
            current_gRNA = line
            d[current_gRNA] = {}
        else:
            if gene:  # even line of a group = data
                d[current_gRNA][gene] = int(line)
                gene = None
            else:  # odd line of a group = gene name
                gene = line

# one row per gRNA, one column per gene
df = pd.DataFrame.from_dict(d, orient='index')
output:
        gene_a  gene_b  gene_c
gRNA_A  140626  227598  115781
gRNA_B  125003  102000  200300
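If you would rather let pandas do the reshaping, here is a minimal sketch of a variant that collects the data in long format first and then pivots it. It assumes the same file layout as in the question (a gRNA_ line, followed by alternating gene-name and count lines). The names parse_grna_counts, long_df and wide_df are made up for this illustration, and the sample is read from an in-memory string instead of gRNA.txt so the snippet runs on its own.

```python
import io

import pandas as pd


def parse_grna_counts(lines):
    """Return a long-format DataFrame with columns gRNA, gene, count.

    Hypothetical helper: assumes each group starts with a line beginning
    with 'gRNA_' and is followed by alternating gene-name / count lines.
    """
    records = []
    current_grna = None
    gene = None
    for raw in lines:
        line = raw.strip()
        if not line:  # skip blank lines
            continue
        if line.startswith("gRNA_"):  # start a new group
            current_grna = line
            gene = None
        elif gene is None:  # odd line of a group = gene name
            gene = line
        else:  # even line of a group = count for that gene
            records.append((current_grna, gene, int(line)))
            gene = None
    return pd.DataFrame(records, columns=["gRNA", "gene", "count"])


# Sample data copied from the question, held in memory instead of a file.
sample = io.StringIO(
    "gRNA_A\ngene_a\n140626\ngene_b\n227598\ngene_c\n115781\n"
    "gRNA_B\ngene_a\n125003\ngene_b\n102000\ngene_c\n200300\n"
)

long_df = parse_grna_counts(sample)

# One row per gRNA, one column per gene.
wide_df = long_df.pivot(index="gRNA", columns="gene", values="count")
print(wide_df)
```

To read the real file, pass an open file handle instead of the StringIO object, e.g. `with open('gRNA.txt') as f: long_df = parse_grna_counts(f)`. Note that pivot raises a ValueError if the same gRNA/gene pair appears twice, so duplicated entries would have to be aggregated first (for example with pivot_table).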