How can I create a percentage matrix based on a dataframe?

I have a dataframe that looks like this:

import pandas as pd

data = {'Place A': ['New York', 'Oslo', 'Oslo'], 'Place B': ['Paris', 'London', 'London'], 'Type': ['A', 'B', 'A'], 'Number': [34, 42, 24]}
df = pd.DataFrame(data)

Place A   Place B  Type  Number
New York  Paris    A     34
Oslo      London   B     42
Oslo      London   A     24

I need the percentage of each type for each route.

I don't know which command to use to get a dataframe that looks like this:

xxx       Paris          Oslo                           London                         New York
Paris     NaN            NaN                            NaN                            type A : 100%
Oslo      NaN            NaN                            type A : 36,3% type B : 63,6%  NaN
London    NaN            type A : 36,3% type B : 63,6%  NaN                            NaN
New York  type A : 100%  NaN                            NaN                            NaN



IIUC, you could reshape the data, compute the percentage per group, and finally reshape again:

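# all cities appearing in either column; converted to a list because
# reindex expects list-like labels (the resulting order is arbitrary)
cities = list({*df['Place A'], *df['Place B']})

# share of each row within its (Place A, Place B) route
df['total'] = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
df['percent'] = df['Number']/df['total']
df['value'] = df['Type']+': '+df['percent'].round(2).astype(str)

# one cell per route, then a square city-by-city table
df2 = (df.groupby(['Place A', 'Place B'])['value']
         .agg(', '.join).unstack(1)
         .rename_axis(index=None, columns=None)
         .reindex(index=cities, columns=cities)
      )

# make output symmetrical (one way: fill the missing cells from the transpose)
df2 = df2.combine_first(df2.T)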


                      Oslo New York   Paris            London
Oslo                   NaN      NaN     NaN  B: 0.64, A: 0.36
New York               NaN      NaN  A: 1.0               NaN
Paris                  NaN   A: 1.0     NaN               NaN
London    B: 0.64, A: 0.36      NaN     NaN               NaN
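
If the cells should read like the ones in the question (for example "type A : 36.4%"), the value column can be built with a percent format instead of a rounded fraction. This is only a minimal sketch: the `label` column name, the `labels` variable and the exact label layout are assumptions chosen to mimic the question, and the rows are sorted by type so that A is listed before B.

```python
import pandas as pd

data = {'Place A': ['New York', 'Oslo', 'Oslo'],
        'Place B': ['Paris', 'London', 'London'],
        'Type': ['A', 'B', 'A'],
        'Number': [34, 42, 24]}
df = pd.DataFrame(data)

# share of each row within its (Place A, Place B) route
route_total = df.groupby(['Place A', 'Place B'])['Number'].transform('sum')
share = df['Number'] / route_total

# hypothetical label layout, e.g. "type A : 36.4%" (one decimal place)
df['label'] = 'type ' + df['Type'] + ' : ' + share.map('{:.1%}'.format)

# one string per route, with types in alphabetical order
labels = (df.sort_values('Type', kind='stable')
            .groupby(['Place A', 'Place B'])['label']
            .agg(' '.join))
print(labels)
```

The resulting Series can be passed through the same unstack/reindex steps as the `value` column above to get the square table.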

If you don't need the symmetrical output, you can keep the table indexed by Place A and Place B, which holds the same information in a more condensed form:

df2 = df.groupby(['Place A', 'Place B'])['value'].agg(', '.join).unstack(1)


Place B             London   Paris
Place A                           
New York               NaN  A: 1.0
Oslo      B: 0.64, A: 0.36     NaN
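
As a possible alternative, if the percentages are needed as numbers rather than as strings, the types can be spread across columns with pivot_table and each row divided by its route total. This is a sketch of a different technique, not part of the approach above; the `counts` and `percent` names are made up for the example.

```python
import pandas as pd

data = {'Place A': ['New York', 'Oslo', 'Oslo'],
        'Place B': ['Paris', 'London', 'London'],
        'Type': ['A', 'B', 'A'],
        'Number': [34, 42, 24]}
df = pd.DataFrame(data)

# one row per route, one column per type; missing combinations count as 0
counts = df.pivot_table(index=['Place A', 'Place B'], columns='Type',
                        values='Number', aggfunc='sum', fill_value=0)

# divide each row by its route total and express the result in percent
percent = counts.div(counts.sum(axis=1), axis=0).mul(100).round(1)
print(percent)
```

Keeping the values numeric makes later filtering or plotting easier, at the cost of one column per type instead of one cell per route.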