i have a csv lie this
JavaScript
x
5
1
cel_id|PDCP.RxBytesUl
2
1001-1234-1|5QI1:0.0001;5QI2:0.0002;5QI3:0.0003;5QI4:0.0004;5QI5:0.0005;5QI6:0.0006;5QI7:0.0007;5QI8:0.0008;5QI9:0.0009
3
1001-1234-2|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0000;5QI8:0.0000;5QI9:0.0128
4
1001-1234-4|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0010;5QI8:0.0030;5QI9:0.0020
5
i would like to sum the values from column “PDCP.RxBytesUl”,
PDCP.RxBytesUl = 5QI1+5QI2+5QI3+5QI4+5QI5+5QI6+5QI7+5QI8+5QI9
finally,the result is like this
JavaScript
1
5
1
cel_id PDCP.RxBytesUl
2
1001-1234-1 0.0045
3
1001-1234-2 0.0163
4
1001-1234-4 0.0095
5
At first I wanted to convert this column into a dict(), but I found the format was not right, i have no idea, please help me, thank you
Advertisement
Answer
You can use Regex based solution:
JavaScript
1
5
1
df = pd.read_csv('input.csv',delimiter='|')
2
3
df['sum'] = df['PDCP.RxBytesUl'].str.extractall(':(d+(?:.d+)?)').astype('float').unstack().sum(axis=1)
4
df.drop('PDCP.RxBytesUl', axis=1, inplace=True)
5
df:
JavaScript
1
5
1
cel_id sum
2
0 1001-1234-1 0.0045
3
1 1001-1234-2 0.0163
4
2 1001-1234-4 0.0095
5
Better code Suggested by Shubham :)
JavaScript
1
3
1
df['sum'] = df['PDCP.RxBytesUl'].str.extractall(':([^;]+)').astype('float').sum(level=0)
2
df.drop('PDCP.RxBytesUl', axis=1, inplace=True)
3