I would like to calculate RMS, Count, SUM to array inside all columns of pandas dataframe and then fulfill outputs into new three dataframes as shown below
P.S > solution should deal with N numbers of columns, in my case, I have around 300 columns x,y,z,a,b,c ……. etc …… N
ID x y z ….. EF407412 [471, 1084, 1360, 2284] [1408, 1572, 2277] [977, 1003, 1493, 1519, 1650, 1676, 2804] ….. KM043272 [2173] [1293, 2354,] [1200] ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..
Dataframe_RMS
ID x y z …..
EF407412 1454.5749 1792.4263 1685.6893 …..
KM043272 2173 1899.1004 1200 …..
KM043273 ….. ….. ….. …..
….. ….. ….. ….. …..
Dataframe_Count
ID x y z …..
EF407412 4 3 7 …..
KM043272 1 2 1 …..
KM043273 ….. ….. ….. …..
….. ….. ….. ….. …..
Advertisement
Answer
Updating answer as per the OP’s comment – for any number of Columns Check Below code:
import pandas as pd from ast import literal_eval import numpy as np df = pd.DataFrame({'ID':['EF407412','KM043272'] , 'x': ['[471, 1084, 1360, 2284]','[2173]'] , 'y': ['[1408, 1572, 2277]','[1293, 2354,]'] , 'z': ['[977, 1003, 1493, 1519, 1650, 1676, 2804]','[1200]']} ) col_num = df.shape[1]
COUNT
df[[i+"_count" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1]. format( *(tuple([len(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1). astype('str').str.split(',', expand=True).values df[['ID']+([ col for col in df.columns if col.endswith('count')])]
OUTPUT:
SUM
df[[i+"_sum" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1]. format( *(tuple([sum(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1). astype('str').str.split(',', expand=True).values df[['ID']+([ col for col in df.columns if col.endswith('sum')])]
Output:
RMS
df[[i+"_rms" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1]. format( *(tuple([np.sqrt(np.mean(np.square(literal_eval(x[col])))) for col in df.columns[1:col_num]] ))),axis=1). astype('str').str.split(',', expand=True).values df[['ID']+([ col for col in df.columns if col.endswith('rms')])]
Output: