I'm working with the following table: input_test input_test2 input_test3 ip_test ip_test2 ip_test3 ENSG00000000003.15 1 1 1 3 3 3 ENSG00000000457.14 2 2 2 1 1 1 ENSG00000000460.17 2 2 2 3 3 3 ENSG00000001036.14 3 3 3 4 4 4 ENSG00000001167.14 3 3 3 5 5 5 My goal is to make a new column called translational efficiency that divides

Divide multiple columns in pandas

I’m working with the following table:

	input_test	input_test2	input_test3	ip_test	ip_test2	ip_test3
ENSG00000000003.15	1	1	1	3	3	3
ENSG00000000457.14	2	2	2	1	1	1
ENSG00000000460.17	2	2	2	3	3	3
ENSG00000001036.14	3	3	3	4	4	4
ENSG00000001167.14	3	3	3	5	5	5

My goal is to make a new column called translational efficiency that divides the averaged ip columns by the averaged input columns, which I’m thinking should look like this:

	input_test	input_test2	input_test3	ip_test	ip_test2	ip_test3	translational_efficiency
ENSG00000000003.15	1	1	1	3	3	3	3
ENSG00000000457.14	2	2	2	1	1	1	0.5
ENSG00000000460.17	2	2	2	3	3	3	1.5
ENSG00000001036.14	3	3	3	4	4	4	1.3
ENSG00000001167.14	3	3	3	5	5	5	1.6

Thus far, I’ve created a script with the following arguments:

python translational_efficiency.py --in_matrix all_reads_matrix.csv --ip_files ip_test ip_test2 ip_test3 --input_files input_test input_test2 input_test3 --save_path /Users/ks/Desktop/

Notice that the --ip_files and --input_files arguments take in multiple columns as a list that reflect the columns listed in the table.

I was hoping to do something like this:

import argparse
import pandas as pd


def trans_eff(in_matrix, ip_files, input_files, save_path):

    # call in original table without translational efficiency column

    df = pd.read_csv(in_matrix, index_col=False)

    # divide ip files columns by input files column into new column called translational_efficiency

    df['translational_efficiency'] = df[ip_files[0]] / df[input_files[0]]

if __name__ == '__main__':

    parser = argparse.ArgumentParser(description='Make a new matrix that includes translational '
                                                 'efficiencies from merge_matrix.py')

    parser.add_argument("--in_matrix",
                        help='matrix made by merge_matrix.py')

    parser.add_argument("--ip_files",
                        help='all ip files in as a list',
                        action='append',
                        nargs='*')

    parser.add_argument("--input_files",
                        help='all input files in as a list',
                        action='append',
                        nargs='*')

    parser.add_argument("--save_path",
                        help='path to save')

    # parse out arguments

    args = parser.parse_args()

    # create csv with translational efficiencies

    trans_eff(args.in_matrix, args.ip_files, args.input_files, args.save_path)

However, this gives me the following error: ValueError: Wrong number of items passed 6, placement implies 1

Answer

you can grab columns by prefix using startswith string method

input_cols = [col for col in df.columns if col.startswith('input_test')]
ip_cols = [col for col in df.columns if col.startswith('ip_test')]

and calculate mean on axis=1 for those columns (for each row) and have a new column translational_efficiency by dividing these mean.

df['translational_efficiency'] =  df[ip_cols].mean(axis=1) / df[input_cols].mean(axis=1)

As you described in updated question and comment, i guess you can use df[ip_files] and df[input_files] directly as they have column names. You can ignore column name extraction part

df['translational_efficiency'] =  df[ip_files].mean(axis=1) / df[input_files].mean(axis=1)

Advertisement

Answer