So I have a column that contains baseball team names, and I want to split it into the 2 new columns, that will contain separately city name and team name.
Team |
---|
New York Giants |
Atlanta Braves |
Chicago Cubs |
Chicago White Sox |
I would like to get something like this:
Team | City | Franchise |
---|---|---|
New York Giants | New York | Giants |
Atlanta Braves | Atlanta | Braves |
Chicago Cubs | Chicago | Cubs |
Chicago White Sox | Chicago | White Sox |
What I have tried so far?
- using
split
andrsplit
–> it gets the job done, but can’t unify it. - did the count
df['cnt'] = df.asc.apply(lambda x: len(str(x).split(' ')))
to get number of strings, so I know what kind of cases I have
There are 3 different cases:
- Standard one (e.g. Atlanta Braves)
- City with 2 strings (e.g. New York Giants)
- Franchise name with 2 strings (e.g. Chicago White Sox )
What I would like to do?
- Split based on conditions (
if cnt=2 then split on 1st occurence
). Can’t find syntax for this, how this would go? - Update based on names (e.g.
if ['Col_name'].str.contains("York" or "Angeles") then split on 2nd occurence
. Also, can’t find working syntax, example for this?
What would be a good approach to solve this?
Thanks!
Advertisement
Answer
Use:
#part of cities with space cities = ['York','Angeles'] #test rows m = df['Team'].str.contains('|'.join(cities)) #first split by first space to 2 new columns df[['City','Franchise']] = df['Team'].str.split(n=1, expand=True) #split by second space only filtered rows s = df.loc[m, 'Team'].str.split(n=2) #update values df.update(pd.concat([s.str[:2].str.join(' '), s.str[2]], axis=1, ignore_index=True).set_axis(['City','Franchise'], axis=1)) print (df) Team City Franchise 0 New York Giants New York Giants 1 Atlanta Braves Atlanta Braves 2 Chicago Cubs Chicago Cubs 3 Chicago White Sox Chicago White Sox