What would be the most pythonic way of creating the following string concatenation:
We have an initial dataframe with some of the columns being:
- origin
- dest_1_country
- dest_1_city
- dest_2_country
- dest_2_city
- dest_3_country
- dest_3_city
- dest_4_country
- dest_4_city
We want to create an additional column that is the full route for every row in the dataframe and that could be generated by
df[‘full_route’] = df[‘origin].fillna(“”) + df[‘dest_1_country].fillna(“”) + df[‘dest_1_city].fillna(“”) + df[‘dest_2_country].fillna(“”) + df[‘dest_2_city].fillna(“”) + df[‘dest_3_country].fillna(“”) + df[‘dest_3_city].fillna(“”) + df[‘dest_4_country].fillna(“”) + df[‘dest_4_city].fillna(“”)
Obviously this cannot be the most pythonic way of getting the desired result given how cumbersome it is.. what if I had 100 cities in the df?
What would be the best way to achieving this in python?
Note: in the dataframe, there’re other columns that have nothing to do with the route and that shouldn’t be considered in the concatenation.
Thanks a lot!!
Advertisement
Answer
If you have this dataframe:
origin dest_1_country dest_1_city dest_2_country dest_2_city 0 a b c d e 1 f g h i j
Then you can do:
df["full_route"] = df.sum(axis=1) # df.fillna("").sum(axis=1) if you have NaNs print(df)
To concatenate all columns:
origin dest_1_country dest_1_city dest_2_country dest_2_city full_route 0 a b c d e abcde 1 f g h i j fghij
EDIT: If you want to concatenate “origin” and every “*city”/”*country” column, you can do:
df["full_route"] = df["origin"].fillna("") + df.filter( regex=r"country$|city$" ).fillna("").sum(axis=1) print(df)