So I have a column that contains baseball team names, and I want to split it into the 2 new columns, that will contain separately city name and team name.
| Team |
|---|
| New York Giants |
| Atlanta Braves |
| Chicago Cubs |
| Chicago White Sox |
I would like to get something like this:
| Team | City | Franchise |
|---|---|---|
| New York Giants | New York | Giants |
| Atlanta Braves | Atlanta | Braves |
| Chicago Cubs | Chicago | Cubs |
| Chicago White Sox | Chicago | White Sox |
What I have tried so far?
- using
splitandrsplit–> it gets the job done, but can’t unify it. - did the count
df['cnt'] = df.asc.apply(lambda x: len(str(x).split(' ')))to get number of strings, so I know what kind of cases I have
There are 3 different cases:
- Standard one (e.g. Atlanta Braves)
- City with 2 strings (e.g. New York Giants)
- Franchise name with 2 strings (e.g. Chicago White Sox )
What I would like to do?
- Split based on conditions (
if cnt=2 then split on 1st occurence). Can’t find syntax for this, how this would go? - Update based on names (e.g.
if ['Col_name'].str.contains("York" or "Angeles") then split on 2nd occurence. Also, can’t find working syntax, example for this?
What would be a good approach to solve this?
Thanks!
Advertisement
Answer
Use:
#part of cities with space
cities = ['York','Angeles']
#test rows
m = df['Team'].str.contains('|'.join(cities))
#first split by first space to 2 new columns
df[['City','Franchise']] = df['Team'].str.split(n=1, expand=True)
#split by second space only filtered rows
s = df.loc[m, 'Team'].str.split(n=2)
#update values
df.update(pd.concat([s.str[:2].str.join(' '), s.str[2]], axis=1, ignore_index=True).set_axis(['City','Franchise'], axis=1))
print (df)
Team City Franchise
0 New York Giants New York Giants
1 Atlanta Braves Atlanta Braves
2 Chicago Cubs Chicago Cubs
3 Chicago White Sox Chicago White Sox