| persons |
|---|
| John New York |
| Janet New York |
| Mike Denver |
| Michelle Texas |
I want to split this column into two columns, name and city. I tried this:
import pandas as pd

df = pd.DataFrame({"persons": ["John New York", "Janet New York", "Mike Denver", "Michelle Texas"]})
df[["name", "city"]] = df.persons.str.split("New York", expand=True)
and it gives me this:
persons name city
0 John New York John
1 Janet New York Janet
2 Mike Denver Mike Denver None
3 Michelle Texas Michelle Texas None
What I want is to split by cities and keep the separator in the city column like this:
persons name city
0 John New York John New York
1 Janet New York Janet New York
2 Mike Denver Mike Denver None
3 Michelle Texas Michelle Texas None
Answer
You can use a regex with a capture group, so that the matched separator is kept in the split result:
df[['name', 'city']] = df['persons'].str.split(r'(New York)', expand=True).iloc[:,:2]
print(df)
persons name city
0 John New York John New York
1 Janet New York Janet New York
2 Mike Denver Mike Denver None
3 Michelle Texas Michelle Texas None
Read more on how it works here.
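As a rough illustration of why the capture group keeps the separator, here is the same split done with Python's own `re.split` on single strings. This is a sketch that assumes pandas treats the multi-character pattern as a regular expression (its default behaviour, as far as I know); the variable names are made up for the example.

```python
import re

# Without a capture group, the matched separator is dropped from the result.
plain = re.split(r"New York", "John New York")

# With a capture group, the matched separator is kept as its own element.
captured = re.split(r"(New York)", "John New York")

# If the pattern does not match, the whole string comes back as one element.
no_match = re.split(r"(New York)", "Mike Denver")

print(plain)     # ['John ', '']
print(captured)  # ['John ', 'New York', '']
print(no_match)  # ['Mike Denver']
```

With `expand=True`, each list element becomes a column, so the captured split yields three columns and `.iloc[:, :2]` keeps the first two. Rows with no match only fill the first column, which is why their city is `None`. Note that the name keeps its trailing space (`'John '`); calling `.str.strip()` on that column would remove it.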