persons |
---|
John New York |
Janet New York |
Mike Denver |
Michelle Texas |
I want to split this column into two columns, name and city. I tried this:
import pandas as pd
df = pd.DataFrame({"persons": ["John New York", "Janet New York", "Mike Denver", "Michelle Texas"]})
df[["name", "city"]] = df.persons.str.split("New York", expand=True)
and it gives me this:
          persons            name  city
0   John New York            John
1  Janet New York           Janet
2     Mike Denver     Mike Denver  None
3  Michelle Texas  Michelle Texas  None
What I want is to split by cities and keep the separator in the city column like this:
          persons            name      city
0   John New York            John  New York
1  Janet New York           Janet  New York
2     Mike Denver     Mike Denver      None
3  Michelle Texas  Michelle Texas      None
Answer
You can split on a regular expression that wraps the separator in a capture group:
df[['name', 'city']] = df['persons'].str.split(r'(New York)', expand=True).iloc[:,:2]
print(df)

          persons            name      city
0   John New York            John  New York
1  Janet New York           Janet  New York
2     Mike Denver     Mike Denver      None
3  Michelle Texas  Michelle Texas      None
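One detail worth checking: splitting "John New York" on "New York" leaves the space before the city attached to the name, so the name values are actually "John " and "Janet ". A minimal self-contained sketch that also strips that whitespace might look like the following. The intermediate variable `parts` is a name introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"persons": ["John New York", "Janet New York", "Mike Denver", "Michelle Texas"]}
)

# Split on the city; the capture group keeps "New York" as its own column.
# Rows without a match keep the whole string in column 0 and get a missing value in column 1.
parts = df["persons"].str.split(r"(New York)", expand=True)

# Column 0 holds the text before the separator (with a trailing space when
# the city matched), column 1 holds the captured separator.
df["name"] = parts[0].str.strip()
df["city"] = parts[1]

print(df)
```

Assigning the two columns separately avoids the `.iloc[:, :2]` slice, at the cost of one extra line.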
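The capture group is what keeps the separator: as with Python's re.split, when the pattern contains capturing parentheses, the text matched by the group is returned as part of the result instead of being discarded.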
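The effect of the capture group can be seen in isolation with the standard library's `re.split`. This is only an illustration on single strings; the variable names are made up for the example.

```python
import re

text = "John New York"

# Without a capture group the separator is consumed by the split.
without_group = re.split(r"New York", text)

# With a capture group the matched separator is kept as its own element.
with_group = re.split(r"(New York)", text)

# A string that does not contain the separator comes back unsplit.
no_match = re.split(r"(New York)", "Mike Denver")

print(without_group)  # ['John ', '']
print(with_group)     # ['John ', 'New York', '']
print(no_match)       # ['Mike Denver']
```

The trailing empty string in the matching case is the text after the separator, which is why the answer keeps only the first two columns.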