i have a csv lie this
cel_id|PDCP.RxBytesUl 1001-1234-1|5QI1:0.0001;5QI2:0.0002;5QI3:0.0003;5QI4:0.0004;5QI5:0.0005;5QI6:0.0006;5QI7:0.0007;5QI8:0.0008;5QI9:0.0009 1001-1234-2|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0000;5QI8:0.0000;5QI9:0.0128 1001-1234-4|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0010;5QI8:0.0030;5QI9:0.0020
i would like to sum the values from column “PDCP.RxBytesUl”,
PDCP.RxBytesUl = 5QI1+5QI2+5QI3+5QI4+5QI5+5QI6+5QI7+5QI8+5QI9
finally,the result is like this
cel_id PDCP.RxBytesUl 1001-1234-1 0.0045 1001-1234-2 0.0163 1001-1234-4 0.0095
At first I wanted to convert this column into a dict(), but I found the format was not right, i have no idea, please help me, thank you
Advertisement
Answer
You can use Regex based solution:
df = pd.read_csv('input.csv',delimiter='|') df['sum'] = df['PDCP.RxBytesUl'].str.extractall(':(d+(?:.d+)?)').astype('float').unstack().sum(axis=1) df.drop('PDCP.RxBytesUl', axis=1, inplace=True)
df:
cel_id sum 0 1001-1234-1 0.0045 1 1001-1234-2 0.0163 2 1001-1234-4 0.0095
Better code Suggested by Shubham :)
df['sum'] = df['PDCP.RxBytesUl'].str.extractall(':([^;]+)').astype('float').sum(level=0) df.drop('PDCP.RxBytesUl', axis=1, inplace=True)