Context: I'm trying to sum DataFrame columns whose names start with (or contain) one of the strings in a given list.
So with a config file like this:
{
    'exclude_granularity': True,
    'granularity_suffix_list': ['A', 'B']
}
And a dataframe like this:
tt = pd.DataFrame({'A_2':[1,2,3],'A_3':[3,4,2],'B_4':[5,2,1],'B_1':[8,2,1],'C_3':[2,4,2]})
How can I group the columns so that all columns starting with the same substring from granularity_suffix_list are summed together?
Desired output:
   A   B  C_3
0  4  13    2
1  6   4    4
2  5   2    2
Attempts: I was trying this:
if exclude_granularity == True:
    def correct_categories(cols):
        return [cat if col.startswith(cat) else col for col in cols for cat in granularity_suffix_list]
    df = df.groupby(correct_categories(df.columns), axis=1).sum()
But it doesn't work. Instead, the function returns a list like ['A_2','A','A_3','A','B_4','B'...], with one entry per column and prefix pair rather than one entry per column.
Thank you
Answer
Okay, I finally managed to solve what I wanted.
Posting the solution in case anyone finds it relevant:
import pandas as pd

tt = pd.DataFrame({'A_2':[1,2,3],'A_3':[3,4,2],'B_4':[5,2,1],'B_1':[8,2,1],'C_3':[2,4,2]})

granularity_suffix_list = ['A','B']

def correct_categories(cols_to_aggregate):
    # Build one group label per column: the matching prefix if there is one,
    # otherwise the column name itself.
    lst = []
    for column in cols_to_aggregate:
        if not column.startswith(tuple(granularity_suffix_list)):
            lst.append(column)
        else:
            # Use the first prefix in the list that the column starts with.
            lst.append(granularity_suffix_list[
                [i for i, w in enumerate(granularity_suffix_list) if column.startswith(w)][0]
            ])
    return lst

# Group the columns by their labels and sum each group.
df = tt.groupby(correct_categories(tt.columns), axis=1).sum()
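
A possible variation, offered as a sketch rather than the definitive way to do it: recent pandas releases deprecate axis=1 in groupby, so the same column grouping can be written by transposing, grouping the rows, and transposing back. The helper name group_label is hypothetical (it is not part of the solution above); it returns the first prefix a column name starts with, or the column name itself when nothing matches.

```python
import pandas as pd

tt = pd.DataFrame({'A_2': [1, 2, 3], 'A_3': [3, 4, 2], 'B_4': [5, 2, 1],
                   'B_1': [8, 2, 1], 'C_3': [2, 4, 2]})
granularity_suffix_list = ['A', 'B']


def group_label(column, prefixes=granularity_suffix_list):
    """Hypothetical helper: first prefix that `column` starts with, else `column`."""
    return next((p for p in prefixes if column.startswith(p)), column)


# Transposing turns the column labels into the index, so an ordinary
# row-wise groupby can be used. Passing a callable makes pandas apply it
# to each index label to obtain the group key.
df = tt.T.groupby(group_label).sum().T
print(df)
```

This assumes all columns are numeric, since transposing a frame with mixed dtypes would convert the values to object dtype before summing.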