I am iterating through a pandas dataframe (originally a csv file) and checking for specific keywords in each row of a certain column. If a keyword appears at least once, I add 1 to a score. There are about 7 keywords, and if the score is >=6, I would like to assign a string (here “Software and application developer”) to another column in the same row and save the score. Unfortunately, the score is the same in every row, which I find hard to believe. This is my code so far:
for row in data.iterrows():
    devScore=0
    if row[1].str.contains("developer").any() | row[1].str.contains("developpeur").any():
        devScore=devScore+1
    if row[1].str.contains("symfony").any():
        devScore=devScore+1
    if row[1].str.contains("javascript").any():
        devScore=devScore+1
    if row[1].str.contains("java").any() | row[1].str.contains("jee").any():
        devScore=devScore+1
    if row[1].str.contains("php").any():
        devScore=devScore+1
    if row[1].str.contains("html").any() | row[1].str.contains("html5").any():
        devScore=devScore+1
    if row[1].str.contains("application").any() | row[1].str.contains("applications").any():
        devScore=devScore+1
    if devScore>=6:
        data["occupation"]="Software and application developer"
        data["score"]=devScore
Answer
You are assigning a constant to the whole column here, so every row ends up with the same value:
data["occupation"]="Software and application developer"
data["score"]=devScore
Assign to the current row only, using its index with .loc:
for idx, row in data.iterrows():
    # blah blah
    # .
    # .
    data.loc[idx, "occupation"]="Software and application developer"
    data.loc[idx, "score"]=devScore
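Putting the row-wise assignment together, a minimal runnable sketch might look like the following. The sample data, the column name "description", and the keyword_groups list are assumptions made for illustration; they are not from the question, which searches every column of the row rather than a single one.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the asker's csv file.
data = pd.DataFrame({
    "description": [
        "developer symfony javascript java php html application",
        "accountant with excel skills",
        "php developer building html applications",
    ]
})

# Each inner list counts once toward the score if any of its keywords appears.
keyword_groups = [
    ["developer", "developpeur"],
    ["symfony"],
    ["javascript"],
    ["java", "jee"],
    ["php"],
    ["html", "html5"],
    ["application", "applications"],
]

# Create the target columns up front so each row starts with a default value.
data["score"] = 0
data["occupation"] = None

for idx, row in data.iterrows():
    text = str(row["description"]).lower()
    # Count how many keyword groups have at least one substring match.
    dev_score = sum(any(k in text for k in group) for group in keyword_groups)
    # .loc[idx, column] writes to this row only, not to the whole column.
    data.loc[idx, "score"] = dev_score
    if dev_score >= 6:
        data.loc[idx, "occupation"] = "Software and application developer"

print(data[["score", "occupation"]])
```

Note that this sketch stores the score for every row and only sets the occupation when the score reaches 6; if the score should be saved only for matching rows, move that assignment inside the if block. Plain substring matching also means "javascript" counts as a match for "java", as it does with str.contains in the question.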