I used the following Python code:
df.stat.crosstab("age", "y").orderBy("age_y").show()
to create a crosstab of age against y from a Spark DataFrame.
However, I cannot find code to obtain the row percentages. For example, for age 18 the row percentages should be 5/12 = 41.7% for 'no' and 7/12 = 58.3% for 'yes', so the two percentages sum to 100%.
Could someone advise me on this? Many thanks in advance.
Answer
Simply add two columns using withColumn and your formula to calculate the percentages:
from pyspark.sql import functions as F

df1 = df.stat.crosstab("age", "y").orderBy("age_y")

result = df1.withColumn(
    "no_rp",
    F.round(F.col("no") / (F.col("no") + F.col("yes")) * 100, 2)
).withColumn(
    "yes_rp",
    F.round(F.col("yes") / (F.col("no") + F.col("yes")) * 100, 2)
)

result.show()

#+-----+---+---+-----+------+
#|age_y| no|yes|no_rp|yes_rp|
#+-----+---+---+-----+------+
#|   18|  5|  7|41.67| 58.33|
#|   19| 24| 11|68.57| 31.43|
#|   20| 35| 15| 70.0|  30.0|
#+-----+---+---+-----+------+
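If y can take more than two values, writing one withColumn call per category gets repetitive. The sketch below is one possible generalization, not part of the original answer: it builds the same formula as Spark SQL expression strings for every count column and applies them with selectExpr. The helper names row_percentage_exprs and add_row_percentages are hypothetical, and the code assumes the first crosstab column is named age_y and that column names contain no backticks.

```python
def row_percentage_exprs(count_cols, decimals=2):
    """Build Spark SQL expressions that turn each count column into a row percentage."""
    # Denominator: the row total, i.e. the sum of all count columns.
    total = " + ".join(f"`{c}`" for c in count_cols)
    return [
        f"round(`{c}` / ({total}) * 100, {decimals}) AS `{c}_rp`"
        for c in count_cols
    ]


def add_row_percentages(crosstab_df, key_col="age_y"):
    """Append one `<col>_rp` column per count column of a crosstab DataFrame."""
    # Every column except the key column is assumed to hold counts.
    count_cols = [c for c in crosstab_df.columns if c != key_col]
    return crosstab_df.selectExpr("*", *row_percentage_exprs(count_cols))
```

With the crosstab from the answer, add_row_percentages(df1).show() should give the same no_rp and yes_rp columns, since the generated expressions are the same formula written in SQL form.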