I have a dataframe that looks like this:

    import pandas as pd

    data = {'Place A': ['New York', 'Oslo', 'Oslo'],
            'Place B': ['Paris', 'London', 'London'],
            'Type': ['A', 'B', 'A'],
            'Number': [34, 42, 24]}
    df = pd.DataFrame(data)
    print(df)

Place A | Place B | Type | Number |
---|---|---|---|
New York | Paris | A | 34 |
Oslo | London | B | 42 |
Oslo | London | A | 24 |
I need the percentage of each type for each route (each pair of places).
I don't know which command to use to get a dataframe that looks like this:
xxx | Paris | Oslo | London | New York |
---|---|---|---|---|
Paris | NaN | NaN | NaN | type A : 100% |
Oslo | NaN | NaN | type A : 36,3% type B : 63,6% | NaN |
London | NaN | type A : 36,3% type B : 63,6% | NaN | NaN |
New York | type A : 100% | NaN | NaN | NaN |
Answer
IIUC, you could reshape the data, then compute the percent per group, and finally reshape again:

    cities = set([*df['Place A'], *df['Place B']])

    df['total'] = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
    df['percent'] = df['Number']/df['total']
    df['value'] = df['Type']+': '+df['percent'].round(2).astype(str)

    df2 = (df.groupby(['Place A', 'Place B'])['value']
             .agg(', '.join).unstack(1)
             .rename_axis(index=None, columns=None)
             .reindex(index=cities, columns=cities)
           )

    # make output symmetrical
    df2.update(df2.T)

output:

                          Oslo New York   Paris            London
    Oslo                   NaN      NaN     NaN  B: 0.64, A: 0.36
    New York               NaN      NaN  A: 1.0               NaN
    Paris                  NaN   A: 1.0     NaN               NaN
    London    B: 0.64, A: 0.36      NaN     NaN               NaN

If you don’t need the symmetrical output, you can keep the table indexed by Place A with Place B as columns, which holds the same information in condensed form:

    df2 = df.groupby(['Place A', 'Place B'])['value'].agg(', '.join).unstack(1)

output:

    Place B             London   Paris
    Place A
    New York               NaN  A: 1.0
    Oslo      B: 0.64, A: 0.36     NaN
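
The question asks for labels like "type A : 36,3%" rather than fractions such as `A: 0.36`. Below is a minimal sketch of the same groupby/unstack approach that formats the shares as percentages. The label format, the names `share`, `label` and `routes`, and the use of a sorted city list (for a deterministic row and column order) are my assumptions, not part of the answer above. It also rounds to one decimal, so 24/66 shows as 36.4% rather than the truncated 36,3% in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Place A': ['New York', 'Oslo', 'Oslo'],
    'Place B': ['Paris', 'London', 'London'],
    'Type': ['A', 'B', 'A'],
    'Number': [34, 42, 24],
})

# Share of each row within its (Place A, Place B) route.
share = df['Number'] / df.groupby(['Place A', 'Place B'])['Number'].transform('sum')

# Build labels such as "type A: 36.4%" (this label format is an assumption).
df['label'] = 'type ' + df['Type'] + ': ' + (share * 100).round(1).astype(str) + '%'

# A sorted list gives a deterministic row/column order.
cities = sorted({*df['Place A'], *df['Place B']})

routes = (
    df.sort_values('Type')                      # list types alphabetically in each cell
      .groupby(['Place A', 'Place B'])['label']
      .agg(', '.join)
      .unstack('Place B')
      .rename_axis(index=None, columns=None)
      .reindex(index=cities, columns=cities)
)

# Mirror each route so that A -> B and B -> A show the same label.
routes = routes.where(routes.notna(), routes.T)
print(routes)
```

With the sample data, the Oslo/London and London/Oslo cells both read `type A: 36.4%, type B: 63.6%`, and pairs with no route stay NaN.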