Skip to content
Advertisement

How to ignore NULL fields while concatenating strings from multiple columns in python polars?

I have a dataframe with person names with these fields – last, first and middle names, i’m trying to concatenating these fields to get a full_name column in a dataframe as below.

dfl.with_columns(
    pl.concat_str([pl.col('last_name'),
                   pl.col('first_name_or_initial'),
                   pl.col('middle_name_or_initial')],' ').alias('full_name')).select([
                   pl.col('full_name'),pl.col('last_name'),
                   pl.col('first_name_or_initial'),
                   pl.col('middle_name_or_initial')])

Here is the output:

enter image description here

Why I’m getting null in full_name after concatenating last, first and middle_names ? Here If any of field holds null while concatenating strings the result will be NULL like wise in above example last_name is Lee, first_name is James, middle_names is NULL hence full_name is NULL how to ignore NULL field when concatenating the fields.

Advertisement

Answer

We can apppend fill_null to the Expressions passed to concat_str:

dfl.with_columns(
    pl.concat_str(
        [
            pl.col("last_name").fill_null(""),
            pl.col("first_name_or_initial").fill_null(""),
            pl.col("middle_name_or_initial").fill_null(""),
        ],
        " ",
    ).str.rstrip().alias("full_name")
).select(
    [
        pl.col("full_name"),
        pl.col("last_name"),
        pl.col("first_name_or_initial"),
        pl.col("middle_name_or_initial"),
    ]
)
shape: (5, 4)
┌─────────────────────┬────────────┬───────────────────────┬────────────────────────┐
│ full_name           ┆ last_name  ┆ first_name_or_initial ┆ middle_name_or_initial │
│ ---                 ┆ ---        ┆ ---                   ┆ ---                    │
│ str                 ┆ str        ┆ str                   ┆ str                    │
╞═════════════════════╪════════════╪═══════════════════════╪════════════════════════╡
│ Dionne Thomas Roger ┆ Dionne     ┆ Thomas                ┆ Roger                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Lee James           ┆ Lee        ┆ James                 ┆ null                   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Lee James           ┆ Lee        ┆ James                 ┆ null                   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Latella Domenic C.  ┆ Latella    ┆ Domenic               ┆ C.                     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Trumpauska Arvydas  ┆ Trumpauska ┆ Arvydas               ┆ null                   │
└─────────────────────┴────────────┴───────────────────────┴────────────────────────┘

I’ve filled the null values with an empty string "", but you can choose whatever value you need.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement