How do I split a column into two in Python based on the data in it?

Question

For instance, the column I want to split is duration. It has data points like 110 or 2 seasons. I want to make a different column for seasons, and in place of the seasons my current column should say null, as this would change the type of the column from string to int.

[Screenshot of my data and of what I tried]

Accepted Answer

I have tried to replicate a portion of your dataframe in order to provide the solution below. Note that it will also change the np.nan values to 'Null', as requested.

Creating the sample dataframe from your screenshot:

    import numpy as np
    import pandas as pd

    movies_dic = {'release_year': [2021, 2020, 2021, 2021, 2021, 1940, 2018, 2008, 2021],
                  'duration': [np.nan, 94, 108, 97, 104, 60, '4 Seasons', 90, '1 Season']}
    stack_df = pd.DataFrame(movies_dic)
    stack_df

The issue is likely that the 'duration' column is of object dtype, i.e. it contains both string and integer values. I have written two small functions that check each value's type and allocate it to its respective column. The first, series, takes all the string rows and places them in the 'series_duration' column:

    def series(x):
        # Strings such as '4 Seasons' belong to TV shows; everything else becomes 'Null'
        if isinstance(x, str):
            return x
        return 'Null'

Then the movies function keeps the integer values (i.e. those without the word 'Season' in them) as they are:

    def movies(x):
        # Integers are movie runtimes; strings and NaN become 'Null'
        if isinstance(x, (int, np.integer)):
            return x
        return 'Null'

    # Build 'series_duration' first, before 'duration' is overwritten
    stack_df['series_duration'] = stack_df['duration'].apply(series)
    stack_df['duration'] = stack_df['duration'].apply(movies)
    stack_df

Output:

       release_year  duration  series_duration
    0          2021      Null             Null
    1          2020        94             Null
    2          2021       108             Null
    3          2021        97             Null
    4          2021       104             Null
    5          1940        60             Null
    6          2018      Null        4 Seasons
    7          2008        90             Null
    8          2021      Null         1 Season
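One caveat: writing the string 'Null' into 'duration' keeps the column as object dtype, so it will not become an integer column as the question asks. A minimal sketch of an alternative, assuming pandas 1.0 or later (for the nullable "Int64" dtype): it swaps the per-row functions for a vectorized pd.to_numeric(..., errors="coerce") and uses real missing values (NaN / <NA>) instead of the text 'Null'. The sample data is the same as above.

```python
import numpy as np
import pandas as pd

# Same sample data as above: 'duration' mixes ints, strings and NaN (object dtype)
stack_df = pd.DataFrame({
    "release_year": [2021, 2020, 2021, 2021, 2021, 1940, 2018, 2008, 2021],
    "duration": [np.nan, 94, 108, 97, 104, 60, "4 Seasons", 90, "1 Season"],
})

# Copy only the string values (e.g. "4 Seasons") into their own column;
# Series.where leaves NaN wherever the condition is False
is_text = stack_df["duration"].map(lambda x: isinstance(x, str))
stack_df["series_duration"] = stack_df["duration"].where(is_text)

# errors="coerce" turns non-numeric strings into NaN; the nullable "Int64"
# dtype then stores the whole-number runtimes as integers alongside <NA>
stack_df["duration"] = pd.to_numeric(stack_df["duration"], errors="coerce").astype("Int64")

print(stack_df.dtypes)
print(stack_df)
```

Keeping genuine missing values rather than the string 'Null' also means methods such as isna() and mean() work directly on the 'duration' column.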


Answer