I have a dataframe called RawDatabase whose values I am snapping to a validation list called ValidationLists. I take a specific column from RawDatabase and compare its elements to the validation list. Each entry is snapped to the entry in the validation list that it most closely resembles.
The code looks like this:
    def GetStandardisedField(rawDatabase,validationLists,field):
        print('Standardising ', field,' ...')

        my_list = validationLists[field]

        l1=[]

        for x in rawDatabase[field]:

            choice = process.extractOne(x, my_list)[0]
            l1.append(choice)

        rawDatabase['choice']=l1
        rawDatabase[field] = rawDatabase['choice']
        del rawDatabase['choice']

        return rawDatabase

For example, rawDatabase[field] looks like:
    0       yes
    1    YES123
    2     nO023
    3         n
    4       NaN

and the validationList looks like:
    YES
    NO

I am trying to snap all the values so that the new rawDatabase[field] looks like:
    0    YES
    1    YES
    2     NO
    3     NO
    4

However, I seem to have a problem when I try to snap a NaN value to the validationList (even when I include NaN in the validationList as a test).
What is the best way to handle NaN values (so that the NaN value in the snapped dataset is blank)?
Answer
    import numpy as np
    from fuzzywuzzy import process

    l=['YES',"NO"]
    a=[]
    for x in df.Col1:
        try:
            # best match for x among the valid entries
            a.append(process.extract(x, l, limit=1)[0][0])
        except Exception:
            # entries that cannot be matched (such as NaN) stay NaN
            a.append(np.nan)

    df['target']=a
    df
    Out[1261]:
         Col1 target
    0     yes    YES
    1  YES123    YES
    2   nO023     NO
    3       n     NO
    4     NaN    NaN
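
The question asks for a blank rather than NaN in the snapped column. One way to get that is to check for missing values before matching instead of catching the exception afterwards. The following is only a sketch under stated assumptions: `is_missing` and `snap_value` are hypothetical helper names, and the similarity score comes from the standard library's `difflib.SequenceMatcher` as a stand-in so that the example runs without extra packages. It is not fuzzywuzzy's scoring, so results may differ on harder inputs.

```python
import math
from difflib import SequenceMatcher


def is_missing(value):
    """True for None and float NaN, the usual missing-value markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def snap_value(value, choices, blank=""):
    """Return the entry of `choices` most similar to `value`.

    Missing values are not matched at all; `blank` is returned instead.
    """
    if is_missing(value):
        return blank
    text = str(value).upper()
    # Stand-in scorer (assumption): difflib's ratio, compared case-insensitively.
    return max(choices, key=lambda c: SequenceMatcher(None, text, c.upper()).ratio())


# Sample data taken from the question.
raw = ["yes", "YES123", "nO023", "n", float("nan")]
snapped = [snap_value(v, ["YES", "NO"]) for v in raw]
print(snapped)  # ['YES', 'YES', 'NO', 'NO', '']
```

To keep fuzzywuzzy's matching, the `max(...)` line could presumably be replaced with `process.extractOne(text, choices)[0]`. On a dataframe column, something like `rawDatabase[field] = rawDatabase[field].map(lambda v: snap_value(v, my_list))` should apply the helper to each entry.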