I have a dataframe with a list of years in the first column. A second column is supposed to show the number of years listed in each row, but the counts are wrong:
   Years     Count_of_Years
0  []        2
1  []        2
2  ['2021']  6
3  ['2022']  6
4  []        2
This made me think that each cell contains a plain string rather than a list, and checking the type seems to confirm it:
type(df['Years'][0])
str
When I convert the column to a list using to_list(), it shows:
df['Years'].to_list()
'[]', '[]', "['2021']", "['2021']", '[]', '[]',
How do I convert it so that the Count_of_Years column shows the correct values?
Answer
If the values in the Years column are already strings, I would suggest using the str.count method with a regex pattern to count the matching occurrences:
df['new_count'] = df['Years'].str.count(r'\d{4}')
   Years     Count_of_Years  new_count
0  []        2               0
1  []        2               0
2  ['2021']  6               1
3  ['2022']  6               1
4  []        2               0
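As a rough, self-contained sketch of the above, the snippet below rebuilds a small dataframe like the one in the question and applies the str.count approach. It also shows one possible alternative that is not part of the answer: parsing each string into a real list with ast.literal_eval and taking its length. The sample data and the column names Years_parsed and parsed_count are made up for illustration, and the alternative assumes every cell holds a valid Python list literal.

```python
import ast

import pandas as pd

# Hypothetical sample data mirroring the question: lists stored as strings.
df = pd.DataFrame({"Years": ["[]", "[]", "['2021']", "['2022']", "[]"]})

# Approach from the answer: count runs of four digits inside each string.
df["new_count"] = df["Years"].str.count(r"\d{4}")

# Alternative (assumption: each cell is a valid Python list literal):
# parse the string into an actual list, then count its elements.
df["Years_parsed"] = df["Years"].apply(ast.literal_eval)
df["parsed_count"] = df["Years_parsed"].apply(len)

print(df[["Years", "new_count", "parsed_count"]])
```

The regex version counts any run of four digits, so it relies on every entry being a four-digit year. The parsed version counts list elements whatever they contain, but it raises an error on cells that are not valid literals.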