I have a DataFrame that looks like this:

    import pandas as pd

    data = {'Place A': ['New York', 'Oslo', 'Oslo'], 'Place B': ['Paris', 'London', 'London'], 'Type': ['A', 'B', 'A'], 'Number': [34, 42, 24]}
    df = pd.DataFrame(data)
    print(df)

Place A | Place B | Type | Number |
---|---|---|---|
New York | Paris | A | 34 |
Oslo | London | B | 42 |
Oslo | London | A | 24 |
I need the percentage of each type for each route.
I don't know which command to use to get a DataFrame that looks like this:
xxx | Paris | Oslo | London | New York |
---|---|---|---|---|
Paris | NaN | NaN | NaN | type A : 100% |
Oslo | NaN | NaN | type A : 36,3% type B : 63,6% | NaN |
London | NaN | type A : 36,3% type B : 63,6% | NaN | NaN |
New York | type A : 100% | NaN | NaN | NaN |
Answer
IIUC, you could compute each type's share per route, build a label from it, and then reshape:

    # all distinct cities, used for both rows and columns
    cities = set([*df['Place A'], *df['Place B']])

    # share of each row within its (Place A, Place B) route
    df['total'] = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
    df['percent'] = df['Number']/df['total']
    df['value'] = df['Type']+': '+df['percent'].round(2).astype(str)

    # one cell per route, then a square city-by-city table
    df2 = (df.groupby(['Place A', 'Place B'])['value']
             .agg(', '.join).unstack(1)
             .rename_axis(index=None, columns=None)
             .reindex(index=cities, columns=cities)
          )

    # make output symmetrical
    df2.update(df2.T)

output:

                          Oslo  New York   Paris            London
    Oslo                   NaN       NaN     NaN  B: 0.64, A: 0.36
    New York               NaN       NaN  A: 1.0               NaN
    Paris                  NaN    A: 1.0     NaN               NaN
    London    B: 0.64, A: 0.36       NaN     NaN               NaN

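The result above shows fractions such as 0.64, while the question asks for labels like `type A : 100%`. Here is a minimal sketch of one way to get that format, assuming one decimal place is acceptable and that a route and its reverse never both appear in the data. The names `labelled`, `routes`, `mirrored`, and `table` are illustrative and not part of the original answer. It mirrors each route by swapping the two place columns instead of calling `update`, and sorts the cities so the layout is reproducible:

```python
import pandas as pd

df = pd.DataFrame({
    'Place A': ['New York', 'Oslo', 'Oslo'],
    'Place B': ['Paris', 'London', 'London'],
    'Type': ['A', 'B', 'A'],
    'Number': [34, 42, 24],
})

# Sorted list instead of a set, so row/column order is deterministic.
cities = sorted(set(df['Place A']) | set(df['Place B']))

# Share of each row within its (Place A, Place B) route, in percent.
route_total = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
percent = 100 * df['Number'] / route_total

# Labels such as "type A: 36.4%"; sorting by Type fixes the order inside a cell.
labelled = df.assign(
    value='type ' + df['Type'] + ': ' + percent.map('{:.1f}%'.format)
).sort_values('Type')

# One row per route with its joined labels.
routes = (
    labelled.groupby(['Place A', 'Place B'])['value']
    .agg(', '.join)
    .reset_index()
)

# Assumption: routes are undirected, so add each route with its places swapped.
mirrored = routes.rename(columns={'Place A': 'Place B', 'Place B': 'Place A'})

table = (
    pd.concat([routes, mirrored], ignore_index=True)
    .pivot(index='Place A', columns='Place B', values='value')
    .rename_axis(index=None, columns=None)
    .reindex(index=cities, columns=cities)
)
print(table)
```

Note that `pivot` raises an error on duplicate index/column pairs, so this sketch would need an extra aggregation step if the data contained both Oslo to London and London to Oslo.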
If you don't need the symmetrical output, you can keep the table indexed by Place A and Place B, which holds the same information in condensed form:

    df2 = df.groupby(['Place A', 'Place B'])['value'].agg(', '.join).unstack(1)

output:

    Place B             London   Paris
    Place A
    New York               NaN  A: 1.0
    Oslo      B: 0.64, A: 0.36     NaN
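
If numeric values are more convenient than strings, for example for further calculations, the same shares can be kept as numbers. This is a rough sketch, assuming `pivot_table` with `aggfunc='sum'` suits your data. The names `counts` and `shares` are illustrative and not from the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Place A': ['New York', 'Oslo', 'Oslo'],
    'Place B': ['Paris', 'London', 'London'],
    'Type': ['A', 'B', 'A'],
    'Number': [34, 42, 24],
})

# Sum 'Number' per route and type; each type becomes a column.
counts = df.pivot_table(
    index=['Place A', 'Place B'],
    columns='Type',
    values='Number',
    aggfunc='sum',
    fill_value=0,
)

# Divide each row by its route total to get per-route shares in percent.
shares = counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)
print(shares)
```

Each row of `shares` is one route and sums to 100, with a 0 where a type does not occur on that route.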