Skip to content
Advertisement

Calculate RMS, Count, SUM to array inside all columns of pandas dataframe

I would like to calculate RMS, Count, SUM to array inside all columns of pandas dataframe and then fulfill outputs into new three dataframes as shown below

enter image description here

P.S > solution should deal with N numbers of columns, in my case, I have around 300 columns x,y,z,a,b,c ……. etc …… N

ID x y z ….. EF407412 [471, 1084, 1360, 2284] [1408, 1572, 2277] [977, 1003, 1493, 1519, 1650, 1676, 2804] ….. KM043272 [2173] [1293, 2354,] [1200] ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..

Dataframe_RMS
ID x y z ….. EF407412 1454.5749 1792.4263 1685.6893 ….. KM043272 2173 1899.1004 1200 ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..

Dataframe_Count
ID x y z ….. EF407412 4 3 7 ….. KM043272 1 2 1 ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..

Advertisement

Answer

Updating answer as per the OP’s comment – for any number of Columns Check Below code:

import pandas as pd 
from ast import literal_eval
import numpy as np 

df = pd.DataFrame({'ID':['EF407412','KM043272']
                   , 'x': ['[471, 1084, 1360, 2284]','[2173]']
                   , 'y': ['[1408, 1572, 2277]','[1293, 2354,]']
                   , 'z': ['[977, 1003, 1493, 1519, 1650, 1676, 2804]','[1200]']} )

col_num = df.shape[1]

COUNT

df[[i+"_count" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].
                                                    format( *(tuple([len(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1).
                                                    astype('str').str.split(',', expand=True).values

df[['ID']+([ col for col in df.columns if col.endswith('count')])]

OUTPUT:

enter image description here

SUM

df[[i+"_sum" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].
                                                    format( *(tuple([sum(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1).
                                                    astype('str').str.split(',', expand=True).values

df[['ID']+([ col for col in df.columns if col.endswith('sum')])]

Output:

enter image description here

RMS

df[[i+"_rms" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].
                                                    format( *(tuple([np.sqrt(np.mean(np.square(literal_eval(x[col])))) for col in df.columns[1:col_num]] ))),axis=1).
                                                    astype('str').str.split(',', expand=True).values

df[['ID']+([ col for col in df.columns if col.endswith('rms')])]

Output:

enter image description here

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement