How to drop dictionaries with NaN values from list

This seems like a fairly simple thing but I haven’t been able to find an answer for it here (yet).

I have a list of dictionaries, and some of the dictionaries in the list have NaN values. I just need to drop any dictionary from the list if it has a NaN value in it.

I’ve tried it a few different ways myself. Here’s one attempt with filter and a lambda function, which got a TypeError (“must be real number, not dict_values,” which makes sense):

import math

def remove_dictionaries_missing_data(list_of_dictionaries):
    return list(filter(lambda dictionary: not math.isnan(dictionary.values()), 
                                          list_of_dictionaries))

I also tried it with a couple nested loops and some code I really wasn’t sure about and got the same error:

import math

def remove_dictionaries_missing_data(list_of_dictionaries):
    cleaned_list = []
    for dictionary in list_of_dictionaries:
        if not math.isnan(dictionary[value] for value in dictionary.values()):
            cleaned_list.append(dictionary)
    return cleaned_list

… and finally with just a list comprehension (same error):

import math

def remove_movies_missing_data(movies):
    return [movie for movie in movies if not math.isnan(movie.values())]

EDIT:

Here’s a sample of the list I’m working with:

[{'year': 2013,
  'imdb': 'tt2005374',
  'title': 'The Frozen Ground',
  'test': 'nowomen-disagree',
  'clean_test': 'nowomen',
  'binary': 'FAIL',
  'budget': 19200000,
  'domgross': nan,
  'intgross': nan,
  'code': '2013FAIL',
  'budget_2013$': 19200000,
  'domgross_2013$': nan,
  'intgross_2013$': nan,
  'period code': 1.0,
  'decade code': 1.0},
 {'year': 2011,
  'imdb': 'tt1422136',
  'title': 'A Lonely Place to Die',
  'test': 'ok',
  'clean_test': 'ok',
  'binary': 'PASS',
  'budget': 4000000,
  'domgross': nan,
  'intgross': 442550.0,
  'code': '2011PASS',
  'budget_2013$': 4142763,
  'domgross_2013$': nan,
  'intgross_2013$': 458345.0,
  'period code': 1.0,
  'decade code': 1.0},
... ]

Answer

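dictionary.values() returns a view of all the values in the dictionary, not a single number, so math.isnan() rejects it with a TypeError. You need to call math.isnan() on the individual values instead, and only on the floats, because math.isnan() also raises a TypeError for strings such as the titles. You can use any() to do this: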

import math

def remove_dictionaries_missing_data(list_of_dictionaries):
    # Keep a dictionary only if none of its values is a float NaN.
    return [d for d in list_of_dictionaries
            if not any(isinstance(val, float) and math.isnan(val) for val in d.values())]
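
To see the filter in action, here is a minimal, self-contained sketch. The two-entry `movies` list is a trimmed-down, partly made-up stand-in for the data in the question (the second entry and its figures are hypothetical), and `nan` is assumed to be an ordinary float NaN, as produced by `float("nan")`.

```python
import math


def remove_dictionaries_missing_data(list_of_dictionaries):
    """Return only the dictionaries that contain no float NaN value."""
    return [
        d for d in list_of_dictionaries
        # isinstance() guards math.isnan() against strings and other non-floats
        if not any(isinstance(val, float) and math.isnan(val) for val in d.values())
    ]


nan = float("nan")  # assumption: the NaN values in the data are plain float NaNs

# Hypothetical sample: the first entry has a NaN, the second is complete.
movies = [
    {"title": "The Frozen Ground", "budget": 19200000, "domgross": nan},
    {"title": "Complete Example", "budget": 4000000, "domgross": 1000000.0},
]

cleaned = remove_dictionaries_missing_data(movies)
print([movie["title"] for movie in cleaned])  # prints ['Complete Example']
```

The original list is left unchanged; the function builds a new list. If the NaN values were stored as something other than a float (for example the string 'nan'), the isinstance check would skip them, so that case would need its own test.
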
User contributions licensed under: CC BY-SA