I have a dataframe with a list of years in the first column. A second column shows the number of years listed in each row.

          Years  Count_of_Years
    0        []               2
    1        []               2
    2  ['2021']               6
    3  ['2022']               6
    4        []               2

This made me think that the content of each cell is a plain string, and that seems to be the case when I check the type:

    type(df['Years'][0])

    str

When I convert the column to a list using to_list(), it shows:

    df['Years'].to_list()

    '[]',
    '[]',
    "['2021']",
    "['2021']",
    '[]',
    '[]',

How do I convert it so that Count_of_Years shows the correct values?
Answer
If the values in the Years column are already strings, I would suggest using the str.count method with a regex pattern to count the number of matching occurrences in each cell:

    df['new_count'] = df['Years'].str.count(r'\d{4}')


          Years  Count_of_Years  new_count
    0        []               2          0
    1        []               2          0
    2  ['2021']               6          1
    3  ['2022']               6          1
    4        []               2          0

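
If you would rather have real lists in the column instead of counting matches in the text, one possible alternative (not part of the approach above) is to parse each string with ast.literal_eval and take the length of the result. The following is a minimal sketch under assumptions: the sample data is reconstructed from the question, and the column names Years_parsed and parsed_count are made up for illustration. It also assumes every cell is a valid Python list literal.

```python
import ast

import pandas as pd

# Hypothetical sample data mirroring the question: each cell holds the
# string representation of a list, not an actual list.
df = pd.DataFrame({"Years": ["[]", "[]", "['2021']", "['2022']", "[]"]})

# Approach from the answer: count runs of four digits in each string.
df["new_count"] = df["Years"].str.count(r"\d{4}")

# Alternative: parse each string into a real Python list, then take its length.
# ast.literal_eval only evaluates literals, so it does not execute arbitrary code.
df["Years_parsed"] = df["Years"].apply(ast.literal_eval)
df["parsed_count"] = df["Years_parsed"].map(len)

print(df[["Years", "new_count", "parsed_count"]])
```

Both columns should agree for data shaped like this. The regex version is simpler if you only need the count, while the parsed version also gives you usable lists for later steps.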