For instance, the column I want to split is duration. It has data points like 110 or 2 seasons. I want to make a different column for the seasons, and in place of the season values in my current column it should say null, as this would change the column's type from string to int. (Screenshot of my data.)
I tried the split function, but that is for splitting within a single data point, not for separating different kinds of data points into their own columns.
Answer
I have replicated a portion of your dataframe to provide the solution below. Note that it also changes the NaN values to 'Null', as requested.
Creating the sample dataframe off of your screenshot:
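    import numpy as np
    import pandas as pd

    # Mixed column: missing values, integer runtimes, and season strings
    movies_dic = {
        'release_year': [2021, 2020, 2021, 2021, 2021, 1940, 2018, 2008, 2021],
        'duration': [np.nan, 94, 108, 97, 104, 60, '4 Seasons', 90, '1 Season'],
    }
    stack_df = pd.DataFrame(movies_dic)
    stack_df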
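Before splitting anything, it can help to check which Python types the column really holds. The snippet below is a small sketch on a shortened copy of the sample values; the names `duration` and `type_counts` are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Shortened copy of the sample 'duration' values (assumed, for illustration)
duration = pd.Series([np.nan, 94, 108, '4 Seasons', 90, '1 Season'], name='duration')

# A column mixing numbers and strings is stored with the generic object dtype
print(duration.dtype)

# Count how many values of each Python type the column contains
type_counts = duration.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If the data was read from a CSV file, every value may well show up as `str` here, in which case a type-based split will not separate anything and a text-based test would be needed instead.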
The issue is likely that the 'duration' column has the object dtype, meaning it contains both string and integer values. I have made two small functions that use the type of each value to allocate it to the right column. The first takes all the string rows and places them in the 'series_duration' column:
    def series(x):
        # Keep string values such as '4 Seasons'; everything else becomes 'Null'
        if isinstance(x, str):
            return x
        return 'Null'
Then the movies function keeps the integer values (i.e. those without the word ‘Season’ in them) as is:
    def movies(x):
        # Keep integer runtimes; strings and NaN become 'Null'
        if isinstance(x, int):
            return x
        return 'Null'

    # Fill series_duration first, because the next line overwrites duration
    stack_df['series_duration'] = stack_df['duration'].apply(series)
    stack_df['duration'] = stack_df['duration'].apply(movies)
    stack_df

       release_year duration series_duration
    0          2021     Null            Null
    1          2020       94            Null
    2          2021      108            Null
    3          2021       97            Null
    4          2021      104            Null
    5          1940       60            Null
    6          2018     Null       4 Seasons
    7          2008       90            Null
    8          2021     Null        1 Season
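One caveat: filling the gaps with the string 'Null' leaves both columns with the object dtype, so 'duration' is still not an integer column. If the goal is a true integer type, a variant that keeps real missing values may be closer to what was asked. The following is only a sketch under that assumption; `df` and `is_series` are illustrative names, and the season test is a plain case-insensitive text match.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    'release_year': [2021, 2020, 2021, 2021, 2021, 1940, 2018, 2008, 2021],
    'duration': [np.nan, 94, 108, 97, 104, 60, '4 Seasons', 90, '1 Season'],
})

# True where the value mentions "season"; numbers and NaN become False
is_series = df['duration'].astype(str).str.contains('season', case=False)

# Keep the season text on series rows only; other rows are left missing
df['series_duration'] = df['duration'].where(is_series)

# Turn non-numeric values into NaN, then switch to pandas' nullable integer dtype
df['duration'] = pd.to_numeric(df['duration'], errors='coerce').astype('Int64')

print(df.dtypes)
```

Here the missing entries are shown as `<NA>` rather than the text 'Null', which is what lets 'duration' hold integers. This version also handles runtimes stored as numeric strings such as '110', since `pd.to_numeric` parses them.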