I have a dataframe called RawDatabase whose values I am snapping to a validation list called ValidationLists. I take a specific column from RawDatabase and compare its elements to the validation list. Each entry is snapped to the validation-list entry it most closely resembles.
The code looks like this:
    def GetStandardisedField(rawDatabase, validationLists, field):
        print('Standardising ', field, ' ...')
        my_list = validationLists[field]
        l1 = []
        for x in rawDatabase[field]:
            choice = process.extractOne(x, my_list)[0]
            l1.append(choice)
        rawDatabase['choice'] = l1
        rawDatabase[field] = rawDatabase['choice']
        del rawDatabase['choice']
        return rawDatabase
In an example the rawDatabase[field] looks like:
    0       yes
    1    YES123
    2     nO023
    3         n
    4       NaN
and the validationList looks like:
    YES
    NO
I am trying to snap all the values so that the new rawDatabase[field] looks like:
    0    YES
    1    YES
    2     NO
    3     NO
    4
However, I seem to have a problem when I try to snap a NaN value to the validationList (even when I include NaN in the validationList as a test).
What is the best way to handle NaN values (so the NaN value in the snapped dataset is blank)?
Answer
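    import numpy as np
    from fuzzywuzzy import process

    l = ['YES', 'NO']
    a = []
    for x in df.Col1:
        try:
            # Best match is the first element of the top-scoring (choice, score) tuple
            a.append(process.extract(x, l, limit=1)[0][0])
        except Exception:
            # Values that cannot be matched as strings, such as NaN, stay NaN
            a.append(np.nan)
    df['target'] = a
    df

    Out[1261]:
         Col1 target
    0     yes    YES
    1  YES123    YES
    2   nO023     NO
    3       n     NO
    4     NaN    NaN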
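If you would rather check for missing values explicitly than rely on catching an exception, something along these lines might work. This is only a sketch under a few assumptions: it swaps in the standard library's difflib for fuzzywuzzy so that it runs without third-party packages (the similarity scores will differ from fuzzywuzzy's), it works on a plain list rather than a dataframe column, and the names is_missing and snap_values are made up for illustration. Missing values are written out as an empty string, which is the blank the question asks for.

```python
import difflib
import math


def is_missing(value):
    """Return True for None and float NaN, the usual missing-value markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def snap_values(values, choices, missing=""):
    """Snap each value to its closest entry in choices (hypothetical helper).

    Missing values are not matched at all; they become `missing`.
    """
    # Map upper-cased choices back to their original spelling so the
    # comparison is case-insensitive.
    lookup = {choice.upper(): choice for choice in choices}
    snapped = []
    for value in values:
        if is_missing(value):
            snapped.append(missing)
            continue
        # cutoff=0.0 means the best-scoring candidate is always returned,
        # however weak the match is.
        best = difflib.get_close_matches(
            str(value).upper(), list(lookup), n=1, cutoff=0.0
        )
        snapped.append(lookup[best[0]] if best else missing)
    return snapped


raw = ["yes", "YES123", "nO023", "n", float("nan")]
print(snap_values(raw, ["YES", "NO"]))  # ['YES', 'YES', 'NO', 'NO', '']
```

With a dataframe, the same idea should carry over by passing the column's values in and assigning the returned list back to the column. Because cutoff=0.0 always returns a match, you may want a higher cutoff if some entries should be left unmatched.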