Pandas dataframe manipulation/re-sizing of a single-column count file

Question

I have a file that looks like this:

I want to read this into a pandas dataframe and reshape it so that it looks like this:

Is this possible? If so, how?

Notes: it will not always be this size, so the solution needs to be size-independent. The input file will be at most ~200 gRNAs x 20 genes. There will be gRNA_somelettercombos,

Accepted Answer

You need to write a parser for your custom format, relying on the gRNA string to start a new group and then taking odd elements as keys and even elements as values:

    import pandas as pd

    d = {}
    current_rRNA = None
    gene = None
    with open('gRNA.txt') as f:
        for line in f:                    # iterate over lines
            line = line.strip()
            if not line:                  # skip blank lines
                continue
            if line.startswith('gRNA_'):  # start new group
                current_rRNA = line
                d[current_rRNA] = {}
            else:
                if gene:                  # even line of a group = data
                    d[current_rRNA][gene] = int(line)
                    gene = None
                else:                     # odd line of a group = gene name
                    gene = line

    df = pd.DataFrame.from_dict(d, orient='index')

output:

            gene_a  gene_b  gene_c
    gRNA_A  140626  227598  115781
    gRNA_B  125003  102000  200300
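To try the approach without a file on disk, the same logic can be wrapped in a function that accepts any iterable of lines. This is a minimal sketch, not part of the original answer: the name `parse_counts` and the sample text are hypothetical, and the input layout (a `gRNA_` header followed by alternating gene-name and count lines) is assumed from the description above. The second group deliberately omits `gene_b` to show that groups of different sizes are handled.

```python
import io

import pandas as pd


def parse_counts(lines):
    """Build a gRNA x gene table from alternating gene/count lines.

    Hypothetical helper: assumes each group starts with a line beginning
    with 'gRNA_', followed by gene-name lines each followed by a count line.
    """
    table = {}
    current = None  # gRNA header of the group being read
    gene = None     # pending gene name waiting for its count
    for raw in lines:
        line = raw.strip()
        if not line:                  # skip blank lines
            continue
        if line.startswith("gRNA_"):  # header line starts a new group
            current = line
            table[current] = {}
            gene = None
        elif gene is None:            # first line of a pair: gene name
            gene = line
        else:                         # second line of a pair: count
            table[current][gene] = int(line)
            gene = None
    return pd.DataFrame.from_dict(table, orient="index")


# Made-up sample input in the assumed layout; gRNA_B has no gene_b entry.
sample = io.StringIO(
    "gRNA_A\ngene_a\n140626\ngene_b\n227598\n"
    "gRNA_B\ngene_a\n125003\ngene_c\n200300\n"
)
df = parse_counts(sample)
print(df)
```

Genes that are missing for a given gRNA come out as NaN, which also turns the affected columns into floats. If a missing count should be read as zero (an assumption about the data, not something the question states), `df.fillna(0).astype(int)` restores integer columns.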

Answer