Pandas string concatenation of row values of a column that have an implicit hierarchy

Tags: , , , ,



I have a dataframe signifying the temperature for three days for different regions in India. It’s given in the following image. original_dataframe

I need to generate another column in the same dataframe that concatenates the string values of the state and the city which is seen in ‘Col5’ as per the below image.

output_dataframe

I’m aware of merging the values based on columns. But can something like be achieved using pandas or udf in python. Request your insights.

Answer

  1. Create a new column with null-values, fill the new column with the state name if the row contains “/” (or any other character you use in your date-format).
  2. Then forward fill the new column with the state names and then fill the new column with city names from Col 1.
  3. Lastly, fix the rows that contain the state name by only including the state name once.

So something like this:

import pandas as pd
import numpy as np

df["Col 5"] = np.nan
df.loc[df['Col 2'].str.contains("/").fillna(False), "Col 5"] = df["Col 1"]
df["Col 5"] = df["Col 5"].ffill() + "|" + df['Col 1']
df.loc[df["Col 5"].str.split("|").str[0] == df["Col 5"].str.split("|").str[1], "Col 5"] = df["Col 5"].str.split("|").str[0]

Is that what you had in mind?



Source: stackoverflow