I am trying to filter out records whose field (`editions`) is null or an empty string in a data frame, like below:
```python
my_df[my_df.editions is not None]
my_df.shape
```
This gives me an error:
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-e1969e0af259> in <module>()
      1 my_df['editions'] = my['editions'].astype(str)
----> 2 my_df = my_df[my_df.editions is not None]
      3 my_df.shape

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1995 return self._getitem_multilevel(key)
   1996 else:
-> 1997 return self._getitem_column(key)
   1998
   1999 def _getitem_column(self, key):

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2002 # get column
   2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
   2005
   2006 # duplicate columns & possible reduce dimensionality

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1348 res = cache.get(item)
   1349 if res is None:
-> 1350 values = self._data.get(item)
   1351 res = self._box_item_values(item, values)
   1352 cache[item] = res

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3288
   3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
   3291 else:
   3292 indexer = np.arange(len(self.items))[isnull(self.items)]

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1945 return self._engine.get_loc(key)
   1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948
   1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: True
```
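The last line, `KeyError: True`, hints at what went wrong: `my_df.editions is not None` is Python's identity test applied to the Series object as a whole, not an element-wise check, so it evaluates to a single `True`. Indexing a DataFrame with that bare bool is then treated as a lookup for a column named `True`. A minimal reproduction, using hypothetical data in place of the asker's frame:

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame (column name taken from the question).
my_df = pd.DataFrame({"editions": ["1st", None, "", "2nd"]})

# `is not None` asks whether the Series object itself is None.
# It is not, so the whole expression is one plain Python bool.
key = my_df.editions is not None
print(key)  # True

# my_df[True] is interpreted as "give me the column named True",
# which does not exist, so pandas raises a KeyError.
try:
    my_df[key]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

An element-wise test such as `my_df.editions.notnull()` returns one bool per row instead, which is what boolean indexing expects.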
Or:

```python
my_df[my_df.editions != None]
my_df.shape
```
This one gave no error but didn’t filter out any None values.
I also tried:
```python
my_df = my_df[my_df.editions.notnull()]
```
This one doesn't give an error, but it doesn't filter out any None values either.
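A likely reason both of these keep every row is visible in line 1 of the traceback: `editions` was converted with `astype(str)` before filtering. In the pandas of that era, stringifying an object column turns each `None` into the literal text `'None'`, which is no longer a missing value, so `notnull()` and `!= None` are true for every row. The sketch below reproduces that on hypothetical data; it stringifies with `apply(str)` explicitly, on the assumption that how `astype(str)` treats missing values may differ between pandas versions.

```python
import pandas as pd

# Hypothetical column containing a real missing value (None) and an empty string.
editions = pd.Series(["1st", None, "", "2nd"], dtype=object)

# Before any conversion, notnull() detects the None.
print(editions.notnull().tolist())  # [True, False, True, True]

# Applying Python's str to every element turns None into the text "None",
# which pandas no longer considers missing.
as_text = editions.apply(str)
print(as_text.tolist())            # ['1st', 'None', '', '2nd']
print(as_text.notnull().tolist())  # [True, True, True, True]

# Filtering out nulls *before* converting keeps the null check meaningful.
kept = editions[editions.notnull()].astype(str)
print(kept.tolist())  # ['1st', '', '2nd']
```

Doing the null check first (or dropping the conversion entirely) avoids the problem; empty strings still need their own test.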
Could anyone please advise how to solve this problem? Thanks!
Answer
You can create a new DataFrame from the filtered rows.
DataFrame before:

```
a     b
1     9
2    10
3    11
4    12
5    13
6    14
7    15
8  null
```
Example:
```python
import pandas

my_df = pandas.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                          "b": [9, 10, 11, 12, 13, 14, 15, "null"]})

# Keep only the rows whose 'b' value is not the string "null"
my_df2 = my_df[(my_df['b'] != "null")]
print(my_df2)
```
DataFrame after:

```
a     b
1     9
2    10
3    11
4    12
5    13
6    14
7    15
```
This compares every value in column `b` with the string "null" and keeps only the rows that don't match. You can do the same with empty strings.
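Extending the same idea to the original question (nulls *and* empty strings), a sketch that combines `notna()` with `isin()` in one boolean mask. The column name `editions` comes from the question; the data and the list of placeholder strings are hypothetical. `'None'` is included on the assumption that some values were already stringified by `astype(str)`.

```python
import pandas as pd

# Hypothetical frame mixing real missing values with placeholder strings.
my_df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "editions": ["1st", None, "", "null", "None", "2nd"],
})

# Strings to treat as "empty"; adjust this list to match your data.
placeholders = ["", "null", "None"]

# Keep a row only if 'editions' is not missing AND not one of the placeholders.
mask = my_df["editions"].notna() & ~my_df["editions"].isin(placeholders)
filtered = my_df[mask]
print(filtered)
```

Here only the rows with `"1st"` and `"2nd"` survive. Testing `notna()` separately matters because `isin()` does not match `None` against any of the strings in the list.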