I am iterating through a pandas DataFrame (originally a CSV file) and checking for specific keywords in each row of a certain column. If a keyword appears at least once, I add 1 to a score. There are about 7 keywords, and if the score is >= 6, I would like to set another column in the same row to a string (here “Software and application developer”) and save the score. Unfortunately, the score is the same in every row, which I find hard to believe. This is my code so far:
for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore
Answer
These two lines assign a constant to the whole column, not just to the current row:
data["occupation"]="Software and application developer"
data["score"]=devScore
They should use the row index instead:
for idx, row in data.iterrows():
    # ... same keyword checks as in the question ...
    data.loc[idx, "occupation"]="Software and application developer"
    data.loc[idx, "score"]=devScore
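For reference, here is a minimal end-to-end sketch of the row-wise fix. The sample rows, the column names "title" and "description", and the helper `dev_score` are made up for illustration; only "occupation" and "score" come from the question. It also stores the score for every row, which is an assumption about what you want. Note that "java" already matches inside "javascript", and "html" and "application" already match "html5" and "applications", so the longer variants do not need a separate check.

```python
import pandas as pd

# Hypothetical sample data; "title" and "description" are assumed column names.
data = pd.DataFrame({
    "title": ["php developer", "nurse"],
    "description": [
        "symfony javascript html applications",
        "hospital night shifts",
    ],
})
text_cols = ["title", "description"]

# Each inner list is one keyword group; a group adds at most one point.
KEYWORD_GROUPS = [
    ["developer", "developpeur"],
    ["symfony"],
    ["javascript"],
    ["java", "jee"],
    ["php"],
    ["html"],
    ["application"],
]


def dev_score(row):
    """Count how many keyword groups occur anywhere in the row's text."""
    text = " ".join(str(value) for value in row[text_cols]).lower()
    return sum(any(word in text for word in group) for group in KEYWORD_GROUPS)


# Create the target columns up front so every row has a defined value.
data["occupation"] = None
data["score"] = 0

for idx, row in data.iterrows():
    score = dev_score(row)
    # .loc[idx, column] writes a single cell instead of the whole column.
    data.loc[idx, "score"] = score
    if score >= 6:
        data.loc[idx, "occupation"] = "Software and application developer"

print(data[["score", "occupation"]])
```

With this sample data only the first row reaches the threshold, so only that row gets the occupation string, and each row keeps its own score.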