I have created the following DataFrame:
import numpy as np
import pandas as pd

dataset = pd.DataFrame(np.random.randint(0, 3, size=(5, 8)), columns=list('ABCDEFGH'))
Now I want to show the proportion of each value (0, 1, 2) within each column. Ideally I'd like to represent this as a stacked bar chart: column names on the x-axis (so 8 bars in total, from A to H), with the colours on each bar representing the proportion of each value (0, 1, 2).
What’s the easiest/simplest/most concise way to do this?
Edit: I've found an easy way to represent the proportions, not as a bar chart but as a DataFrame. See below:
df = pd.concat(
    [
        dataset['A'].value_counts(normalize=True).mul(100),
        dataset['B'].value_counts(normalize=True).mul(100),
        dataset['C'].value_counts(normalize=True).mul(100),
        dataset['D'].value_counts(normalize=True).mul(100),
        dataset['E'].value_counts(normalize=True).mul(100),
        dataset['F'].value_counts(normalize=True).mul(100),
        dataset['G'].value_counts(normalize=True).mul(100),
        dataset['H'].value_counts(normalize=True).mul(100),
    ],
    axis=1,
    keys=('proportions A', 'proportions B', 'proportions C', 'proportions D',
          'proportions E', 'proportions F', 'proportions G', 'proportions H'),
)
However, is there a more concise way to code this? E.g. is there any way to turn the above code into a loop?
Answer
Your approach seems to be a reasonably efficient way to do it. If the goal is to shorten it, is this what you are looking for? It is really your solution, just condensed with comprehensions.
# One value_counts Series per column, joined side by side
df = pd.concat(
    [dataset[colid].value_counts(normalize=True).mul(100) for colid in list('ABCDEFGH')],
    axis=1,
    keys=['proportions ' + colid for colid in list('ABCDEFGH')],
)
print(df)
which results in
   proportions A  proportions B  proportions C  proportions D  proportions E
0            NaN           20.0            NaN           60.0           20.0
1           80.0           40.0           40.0           20.0           40.0
2           20.0           40.0           60.0           20.0           40.0

   proportions F  proportions G  proportions H
0           20.0           40.0           80.0
1           40.0           40.0           20.0
2           40.0           20.0            NaN
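For the stacked bar chart asked about in the question, here is a minimal sketch of one possible route, not the only one. It swaps the concat for `DataFrame.apply` with `pd.Series.value_counts`, fills the NaN cells with 0 so every bar stacks to 100, and plots the transposed result. The fixed seed, the `proportions` name, the `plot_proportions` helper and the axis labels are all my own assumptions for illustration, and the plot itself needs matplotlib installed.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data; the fixed seed is an assumption
# added only so the example is reproducible.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.integers(0, 3, size=(5, 8)), columns=list('ABCDEFGH'))

# Percentage of each value within each column. A value that never occurs
# in a column comes back as NaN, so it is replaced with 0 here.
proportions = (
    dataset.apply(pd.Series.value_counts, normalize=True)
    .mul(100)
    .fillna(0)
    .sort_index()
)


def plot_proportions(props):
    """Draw one stacked bar per original column (hypothetical helper; needs matplotlib)."""
    # Transpose so the columns A..H end up on the x-axis and the values
    # 0, 1, 2 become the stacked segments.
    ax = props.T.plot(kind='bar', stacked=True)
    ax.set_xlabel('column')
    ax.set_ylabel('share of rows (%)')
    ax.legend(title='value')
    return ax
```

Calling `plot_proportions(proportions)` and then `matplotlib.pyplot.show()` should display the chart. Keeping the plotting in a function means the proportions table can still be computed and inspected on a machine without matplotlib.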