Skip to content
Advertisement

Pandas: Split and/or update columns, based on inconsistent data?

So I have a column that contains baseball team names, and I want to split it into the 2 new columns, that will contain separately city name and team name.

Team
New York Giants
Atlanta Braves
Chicago Cubs
Chicago White Sox

I would like to get something like this:

Team City Franchise
New York Giants New York Giants
Atlanta Braves Atlanta Braves
Chicago Cubs Chicago Cubs
Chicago White Sox Chicago White Sox

What I have tried so far?

  • using split and rsplit –> it gets the job done, but can’t unify it.
  • did the count df['cnt'] = df.asc.apply(lambda x: len(str(x).split(' '))) to get number of strings, so I know what kind of cases I have

There are 3 different cases:

  1. Standard one (e.g. Atlanta Braves)
  2. City with 2 strings (e.g. New York Giants)
  3. Franchise name with 2 strings (e.g. Chicago White Sox )

What I would like to do?

  • Split based on conditions (if cnt=2 then split on 1st occurence). Can’t find syntax for this, how this would go?
  • Update based on names (e.g. if ['Col_name'].str.contains("York" or "Angeles") then split on 2nd occurence . Also, can’t find working syntax, example for this?

What would be a good approach to solve this?

Thanks!

Advertisement

Answer

Use:

JavaScript
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement