Skip to content
Advertisement

Python/Pandas:How to process a column of data like a dictionary

i have a csv lie this

cel_id|PDCP.RxBytesUl
1001-1234-1|5QI1:0.0001;5QI2:0.0002;5QI3:0.0003;5QI4:0.0004;5QI5:0.0005;5QI6:0.0006;5QI7:0.0007;5QI8:0.0008;5QI9:0.0009
1001-1234-2|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0000;5QI8:0.0000;5QI9:0.0128
1001-1234-4|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0010;5QI8:0.0030;5QI9:0.0020

i would like to sum the values from column “PDCP.RxBytesUl”,

PDCP.RxBytesUl = 5QI1+5QI2+5QI3+5QI4+5QI5+5QI6+5QI7+5QI8+5QI9

finally,the result is like this

 cel_id      PDCP.RxBytesUl
1001-1234-1  0.0045
1001-1234-2  0.0163
1001-1234-4  0.0095

At first I wanted to convert this column into a dict(), but I found the format was not right, i have no idea, please help me, thank you

Advertisement

Answer

You can use Regex based solution:

df = pd.read_csv('input.csv',delimiter='|')

df['sum'] = df['PDCP.RxBytesUl'].str.extractall(':(d+(?:.d+)?)').astype('float').unstack().sum(axis=1)
df.drop('PDCP.RxBytesUl', axis=1, inplace=True)

df:

    cel_id      sum
0   1001-1234-1 0.0045
1   1001-1234-2 0.0163
2   1001-1234-4 0.0095

Better code Suggested by Shubham :)

df['sum'] = df['PDCP.RxBytesUl'].str.extractall(':([^;]+)').astype('float').sum(level=0)
df.drop('PDCP.RxBytesUl', axis=1, inplace=True)
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement