This may be a duplicate question, but I could not find a solution.
I want to make a frequency table in Python with pandas.
import pandas as pd

df = pd.DataFrame({
    'sample': ['A', 'A', 'B', 'C', 'B', 'C', 'C'],
    'group': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'category': ['a', 'b', 'a', 'b', 'c', 'a', 'c']
})
df
# sample group category
#0 A X a
#1 A X b
#2 B Y a
#3 C Y b
#4 B Z c
#5 C Z a
#6 C Z c
This is the expected result, laid out like a frequency table:
# sample group a b c
#0 A X 1 1 0
#1 B Y 1 0 0
#2 C Y 0 1 0
#3 B Z 0 0 1
#4 C Z 1 0 1
I tried the crosstab, groupby, and pivot_table functions, but none of them gave the correct result.
pd.crosstab(df.sample, df.category) #is it available with only two variables?
df.groupby(['sample', 'group']).category.value_counts(normalize=False)
# I think this is close to my expected result, but I want a form like an adjacency matrix
#sample  group  category
#A       X      a           1
#               b           1
#B       Y      a           1
#        Z      c           1
#C       Y      b           1
#        Z      a           1
#               c           1
#Name: category, dtype: int64
pd.pivot_table(df['sample'], df['group'], df['category'], aggfunc=','.join)
How can I make the expected result?
Answer
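Because DataFrame.sample already exists as a method, df.sample returns that method rather than your column, so it is better to use [] instead of dot notation. To group rows by multiple columns, pass them as a list: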
# store the table under a new name so the original df stays available
result = pd.crosstab([df['sample'], df['group']], df['category'])
print(result)
category      a  b  c
sample group
A      X      1  1  0
B      Y      1  0  0
       Z      0  0  1
C      Y      0  1  0
       Z      1  0  1
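The table above keeps sample and group in the index, while the expected result in the question shows them as ordinary columns. A minimal sketch of one way to get that flat layout, assuming the same data as in the question (the names `counts` and `flat` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'sample': ['A', 'A', 'B', 'C', 'B', 'C', 'C'],
    'group': ['X', 'X', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'category': ['a', 'b', 'a', 'b', 'c', 'a', 'c'],
})

# Count each category per (sample, group) pair; use [] because
# df.sample would be the DataFrame.sample method, not the column.
counts = pd.crosstab([df['sample'], df['group']], df['category'])

# Move sample/group from the index into columns and drop the
# leftover "category" label on the column axis.
flat = counts.reset_index().rename_axis(None, axis=1)
print(flat)
```

The rows come out sorted by sample and then group, which differs from the row order shown in the question; if that order matters, something like `flat.sort_values(['group', 'sample'])` should rearrange them.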
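With dot notation, df.sample refers to the DataFrame.sample method instead of the column, so the result is wrong:
wrong = pd.crosstab([df.sample, df.group], df.category)
print(wrong)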
category a b c
row_0 group
<bound method NDFrame.sample of sample group ... X 1 1 0
Y 1 1 0
Z 1 0 2
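To see why dot notation breaks only for the sample column, it may help to inspect what each expression actually returns. This is a small sketch on a hypothetical two-row frame, not the data from the question:

```python
import pandas as pd

# Tiny illustrative frame with the same column names as the question
df = pd.DataFrame({'sample': ['A', 'B'], 'group': ['X', 'Y']})

# DataFrame already has a method called sample, and attribute lookup
# finds the method before the column of the same name.
print(callable(df.sample))

# Bracket access always returns the column as a Series.
print(type(df['sample']))

# DataFrame has no attribute called group, so dot notation
# falls back to the column here.
print(callable(df.group))
```

The same clash can happen with any column whose name matches an existing DataFrame attribute, which is why bracket access is the safer habit.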