This may be a duplicate question, but I could not find a solution.
I want to make a frequency table in Python (pandas).
    import pandas as pd

    df = pd.DataFrame({
        'sample': ['A', 'A', 'B', 'C', 'B', 'C', 'C'],
        'group': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
        'category': ['a', 'b', 'a', 'b', 'c', 'a', 'c'],
    })
    df
    #  sample group category
    #0      A     X        a
    #1      A     X        b
    #2      B     Y        a
    #3      C     Y        b
    #4      B     Z        c
    #5      C     Z        a
    #6      C     Z        c
This is the expected result, which resembles a frequency table:
    #  sample group  a  b  c
    #0      A     X  1  1  0
    #1      B     Y  1  0  0
    #2      C     Y  0  1  0
    #3      B     Z  0  0  1
    #4      C     Z  1  0  1
I tried the crosstab, groupby, and pivot_table functions, but none of them produced the expected result.
    pd.crosstab(df.sample, df.category)
    # Does it only work with two variables?

    df.groupby(['sample', 'group']).category.value_counts(normalize=False)
    # I think this is close to my expected result,
    # but I want a layout like an adjacency matrix.
    #sample  group  category
    #A       X      a           1
    #               b           1
    #B       Y      a           1
    #        Z      c           1
    #C       Y      b           1
    #        Z      a           1
    #               c           1
    #Name: category, dtype: int64

    pd.pivot_table(df['sample'], df['group'], df['category'], aggfunc=','.join)
How can I make the expected result?
Answer
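Because DataFrame.sample already exists as a method, df.sample returns that method rather than the 'sample' column, so it is better to select columns with [] instead of dot notation. To cross-tabulate on multiple columns, pass them as a list: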
    df = pd.crosstab([df['sample'], df['group']], df['category'])
    print(df)
    category      a  b  c
    sample group
    A      X      1  1  0
    B      Y      1  0  0
           Z      0  0  1
    C      Y      0  1  0
           Z      1  0  1
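The crosstab result keeps sample and group in a MultiIndex, while the expected table in the question shows them as ordinary columns ordered by group. A minimal sketch of one way to get that layout follows. The names counts and flat are mine, and sorting by group and then sample is an assumption based on the row order shown in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'sample': ['A', 'A', 'B', 'C', 'B', 'C', 'C'],
    'group': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'category': ['a', 'b', 'a', 'b', 'c', 'a', 'c'],
})

# Count categories per (sample, group) pair; [] avoids the DataFrame.sample clash.
counts = pd.crosstab([df['sample'], df['group']], df['category'])

flat = (
    counts
    .reset_index()                      # move sample/group from the index into columns
    .rename_axis(None, axis=1)          # drop the leftover 'category' columns label
    .sort_values(['group', 'sample'])   # assumed ordering, taken from the question
    .reset_index(drop=True)             # renumber rows 0..n-1
)
print(flat)
```

If the row order does not matter, the sort_values and final reset_index steps can be dropped.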
With dot notation, df.sample resolves to the method instead of the column, so the result is wrong:

    df = pd.crosstab([df.sample, df.group], df.category)
    print(df)
    category                                                  a  b  c
    row_0                                              group
    <bound method NDFrame.sample of   sample group ... X      1  1  0
                                                       Y      1  1  0
                                                       Z      1  0  2
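The "bound method NDFrame.sample" label in that output is the giveaway that attribute access picked up the method rather than the column. A small check, assuming the original DataFrame from the question, makes the difference visible:

```python
import pandas as pd

df = pd.DataFrame({
    'sample': ['A', 'A', 'B', 'C', 'B', 'C', 'C'],
    'group': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'category': ['a', 'b', 'a', 'b', 'c', 'a', 'c'],
})

# Attribute lookup finds the DataFrame.sample method before the column.
print(callable(df.sample))

# Bracket indexing always returns the column as a Series.
print(type(df['sample']).__name__)

# 'group' does not clash with a DataFrame attribute, so both forms agree here.
print(df.group.equals(df['group']))
```

Using [] everywhere avoids having to remember which column names collide with DataFrame attributes.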