Skip to content
Advertisement

Polars equivalent of pandas expression df.groupby[‘col1′,’col2’][‘col3’].sum().unstack()

pandasdf=pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
pandasdf.groupby(["fruits","cars"])['B'].sum().unstack()

enter image description here

How can i create an equivalent truth table in polars?

Something like the below table into a truth table

df=pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)
df.groupby(["fruits","cars"]).agg(pl.col('B').sum()) #->truthtable

The efficiency of the code is important as the dataset is too large (for using it with apriori algorithm)

The unstack function in polars is different, polars alterative for pd.crosstab would also work.

Advertisement

Answer

It seems like you want tot do a pivot.

df = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
        "optional": [28, 300, None, 2, -30],
    }
)

df.pivot(values="B", index="cars", columns="fruits", aggregate_fn=pl.element().sum())
shape: (2, 3)
┌────────┬────────┬───────┐
│ cars   ┆ banana ┆ apple │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ i64    ┆ i64   │
╞════════╪════════╪═══════╡
│ beetle ┆ 6      ┆ 5     │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ audi   ┆ 4      ┆ null  │
└────────┴────────┴───────┘

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement