
Showing the proportions of values across each column in a DataFrame in Python

I have created the following DataFrame:

dataset = pd.DataFrame(np.random.randint(0,3,size=(5, 8)), columns=list('ABCDEFGH'))

Now I wish to show the proportion of each value (0,1,2) across each column. Ideally I’d like to represent this as a stacked bar chart – Column names on the x axis (so 8 bars in total from A to H), with the different colours on the bars representing the proportion of each value (0,1,2).

What’s the easiest/simplest/most concise way to do this?

Edit: I’ve found an easy way to represent the proportions, not as a bar chart but as a DataFrame. See below:

df = pd.concat([dataset['A'].value_counts(normalize=True).mul(100),
               dataset['B'].value_counts(normalize=True).mul(100),
               dataset['C'].value_counts(normalize=True).mul(100),
               dataset['D'].value_counts(normalize=True).mul(100),
               dataset['E'].value_counts(normalize=True).mul(100),
               dataset['F'].value_counts(normalize=True).mul(100),
               dataset['G'].value_counts(normalize=True).mul(100),
               dataset['H'].value_counts(normalize=True).mul(100)],
               axis=1,keys=('proportions A','proportions B',
                           'proportions C', 'proportions D',
                           'proportions E', 'proportions F',
                           'proportions G', 'proportions H'))

However, is there a more concise way to code this? E.g. is there any way to turn the above code into a loop?


Answer

This seems to be the most efficient way. If you want to shorten it, is this what you are looking for? It is essentially your solution, just condensed with comprehensions.

df = pd.concat([dataset[colid].value_counts(normalize=True).mul(100) for colid in list('ABCDEFGH')],
              axis=1,keys=['proportions ' + colid for colid in list('ABCDEFGH')])

print(df)

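which results in something like the following (the exact values vary from run to run because the data is random; NaN means the value does not occur in that column):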

   proportions A  proportions B  proportions C  proportions D  proportions E  
0            NaN           20.0            NaN           60.0           20.0   
1           80.0           40.0           40.0           20.0           40.0   
2           20.0           40.0           60.0           20.0           40.0   

   proportions F  proportions G  proportions H  
0           20.0           40.0           80.0  
1           40.0           40.0           20.0  
2           40.0           20.0            NaN  
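
For the stacked bar chart asked about in the question, here is a minimal sketch rather than a definitive solution. It assumes matplotlib is installed; the helper names `column_proportions` and `plot_proportions` and the fixed random seed are made up for illustration and are not part of the original code. It uses `DataFrame.apply` instead of `pd.concat`, and fills the NaN entries with 0 so that every bar stacks to 100.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def column_proportions(frame, values=(0, 1, 2)):
    """Return the percentage of each value per column (hypothetical helper).

    Values that never occur in a column show up as 0 instead of NaN.
    """
    # value_counts runs once per column; apply aligns the results on the value.
    shares = frame.apply(lambda col: col.value_counts(normalize=True))
    # reindex fixes the row order and adds rows for values missing everywhere.
    return shares.reindex(list(values)).fillna(0).mul(100)


def plot_proportions(proportions):
    """Draw one stacked bar per original column (hypothetical helper)."""
    fig, ax = plt.subplots()
    # Transpose so the column names A..H end up on the x axis.
    proportions.T.plot(kind="bar", stacked=True, ax=ax)
    ax.set_xlabel("Column")
    ax.set_ylabel("Proportion (%)")
    ax.legend(title="Value")
    return ax


# Seed chosen arbitrarily so the example is repeatable (an assumption).
np.random.seed(0)
dataset = pd.DataFrame(np.random.randint(0, 3, size=(5, 8)),
                       columns=list("ABCDEFGH"))

proportions = column_proportions(dataset)
print(proportions)

ax = plot_proportions(proportions)
```

Call `plt.show()` afterwards to display the chart, or `ax.figure.savefig(...)` to write it to a file. The transpose is the only step the chart really needs: pandas draws one bar per row, so the columns A to H have to become rows first.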
User contributions licensed under: CC BY-SA