I have a transposed DataFrame `tr`:
| | 7128 | 8719 | 14051 | 14636 |
|---|---|---|---|---|
| JDUTC_0 | 2451957.36 | 2452149.36 | 2457243.98 | 2452531.89 |
| JDUTC_1 | 2451957.37 | 2452149.36 | 2457243.99 | 2452531.90 |
| JDUTC_2 | 2451957.37 | 2452149.36 | 2457244.00 | 2452531.91 |
| JDUTC_3 | NaN | 2452149.36 | NaN | NaN |
| JDUTC_4 | NaN | 2452149.36 | NaN | NaN |
| JDUTC_5 | NaN | 2452149.36 | NaN | NaN |
| JDUTC_6 | 1.23 | 2452149.37 | NaN | NaN |
| JDUTC_7 | NaN | NaN | NaN | NaN |
| JDUTC_8 | NaN | NaN | NaN | NaN |
| JDUTC_9 | NaN | NaN | NaN | NaN |
I create the dict `a` with this block of code:
```python
a = {}
b = []
for _, contents in tr.items():
    b.clear()
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b
    print(a)
```
This gives me the following output:
```
7128
[3, 4, 5, 7, 8, 9]
{7128: [3, 4, 5, 7, 8, 9]}
8719
[7, 8, 9]
{7128: [7, 8, 9], 8719: [7, 8, 9]}
14051
[3, 4, 5, 6, 7, 8, 9]
{7128: [3, 4, 5, 6, 7, 8, 9], 8719: [3, 4, 5, 6, 7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9]}
14636
[3, 4, 5, 6, 7, 8, 9]
{7128: [3, 4, 5, 6, 7, 8, 9], 8719: [3, 4, 5, 6, 7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9], 14636: [3, 4, 5, 6, 7, 8, 9]}
```
I expect `a` to look like this:
```
{7128: [3, 4, 5, 7, 8, 9], 8719: [7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9], 14636: [3, 4, 5, 6, 7, 8, 9]}
```
What am I doing wrong? Why does `a[_] = b` overwrite the values of all the previous keys, when `print(_)` confirms that `_` is always the next column label?
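The effect can be reproduced without pandas. In this minimal sketch (the keys and values are made up purely for illustration), assigning a list to a dict key stores a reference to that list object, not a copy, so clearing and refilling the same `b` changes what every key sees:

```python
# Hypothetical keys and values, chosen only to reproduce the aliasing effect.
a = {}
b = []
for key, nan_rows in [("x", [1, 2]), ("y", [7])]:
    b.clear()           # empties the ONE list object that b refers to
    b.extend(nan_rows)  # refills that same object
    a[key] = b          # stores a reference to it, not a copy

print(a)                 # {'x': [7], 'y': [7]}
print(a["x"] is a["y"])  # True: both keys point to the same list
```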
Answer
Using clearer variable names, I would change your code, after building the data frame like this:
```python
import numpy as np
import pandas as pd
from io import StringIO  # Python 3 location of StringIO

s = StringIO("""idx 7128 8719 14051 14636
JDUTC_0 2451957.36 2452149.36 2457243.98 2452531.89
JDUTC_1 2451957.37 2452149.36 2457243.99 2452531.90
JDUTC_2 2451957.37 2452149.36 2457244.00 2452531.91
JDUTC_3 NaN 2452149.36 NaN NaN
JDUTC_4 NaN 2452149.36 NaN NaN
JDUTC_5 NaN 2452149.36 NaN NaN
JDUTC_6 1.23 2452149.37 NaN NaN
JDUTC_7 NaN NaN NaN NaN
JDUTC_8 NaN NaN NaN NaN
JDUTC_9 NaN NaN NaN NaN""")
# Fields are separated by whitespace; "NaN" is parsed as a missing value.
tr = pd.read_csv(s, sep=r"\s+", index_col=0)
```
(Questions should include minimal working code, but people often forget the parts needed to run it, e.g. the code that builds the data frame, and the imports.)
to:
```python
a = {}
b = []
for name, values in tr.items():
    b.clear()  # problematic: every a[name] below ends up referencing this one list
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    a[name] = b
```
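`continue` and `pass` are not necessary here: `continue` only jumps to the next iteration, which happens anyway because nothing follows it in the loop body, and `pass` does nothing at all.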
In Python, the `else` branch is optional, so the loop shrinks to:
```python
for name, values in tr.items():
    b.clear()  # This is still problematic at this stage.
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
    a[name] = b
```
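If you prefer to keep the explicit loop, one minimal fix (a sketch, not something the answer itself proposes) is to create a new list for each column instead of clearing a shared one. The small DataFrame below is a hypothetical stand-in that only mirrors the shape of `tr`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tr: two columns with NaNs at different positions.
tr = pd.DataFrame({"7128": [1.0, np.nan, 2.0, np.nan],
                   "8719": [1.0, 2.0, np.nan, np.nan]})

a = {}
for name, values in tr.items():
    b = []  # a fresh list per column, so no two keys share one object
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
    a[name] = b

print(a)  # {'7128': [1, 3], '8719': [2, 3]}
```

Storing `b.copy()` in the original loop would also work; creating the list inside the loop simply states the intent more directly.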
Collecting data with for-loops like this is better done with a list comprehension:
```python
a = {}
for name, values in tr.items():
    # the comprehension builds a brand-new list for every column
    b = [ind for ind, val in enumerate(values) if np.isnan(val)]
    a[name] = b  # now the result is already correct!
```
Finally, Python also supports dictionary comprehensions, which turn the entire code into a one-liner (a readable one, once you are familiar with list comprehensions):
```python
a = {name: [i for i, x in enumerate(vals) if np.isnan(x)] for name, vals in tr.items()}
```
Checking the result (the keys are strings here because `read_csv` reads the header row as text):

```python
a
# {'7128': [3, 4, 5, 7, 8, 9], '8719': [7, 8, 9], '14051': [3, 4, 5, 6, 7, 8, 9], '14636': [3, 4, 5, 6, 7, 8, 9]}
```
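pandas can also find the NaN positions without a Python-level inner loop. The sketch below swaps in a different technique from the answer's comprehension, `Series.isna()` combined with `np.flatnonzero`, and uses a small hypothetical stand-in for `tr`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tr.
tr = pd.DataFrame({"7128": [1.0, np.nan, 2.0, np.nan],
                   "8719": [1.0, 2.0, np.nan, np.nan]})

# isna() yields a boolean mask; flatnonzero returns the positions where it is True.
a = {name: np.flatnonzero(col.isna().to_numpy()).tolist()
     for name, col in tr.items()}

print(a)  # {'7128': [1, 3], '8719': [2, 3]}
```

Like `enumerate`, this reports positions rather than index labels. One design difference: `isna()` also works on object columns, where `np.isnan` would raise a `TypeError`.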
List comprehensions lean toward Functional Programming (FP), which deals precisely with this problem by avoiding mutation (such as `b.append()` or `b.clear()`). Your case shows how easily mutation produces bugs: `a[name] = b` stores a reference to the single list `b`, not a copy, so every later `b.clear()` and `b.append()` changes the value seen under every key. It also illustrates why FP, although it looks brain-unfriendly at first sight, is actually the more brain-friendly way to program.
List comprehensions are the Pythonic form of "map", and an `if` inside a list comprehension is the Pythonic equivalent of "filter", two operations that FP programmers know as naturally as breathing.
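To make the map/filter correspondence concrete, here is a small sketch with made-up values. It uses `math.isnan` in place of `np.isnan` so that it runs with the standard library only; the comprehension and the explicit `map`/`filter` pipeline produce the same list:

```python
import math

values = [1.5, float("nan"), 2.0, float("nan")]  # hypothetical column values

# Comprehension: the "if" clause filters, the leading expression maps.
by_comprehension = [i for i, x in enumerate(values) if math.isnan(x)]

# The same thing spelled with filter (keep NaN pairs) and map (take the index).
by_map_filter = list(map(lambda pair: pair[0],
                         filter(lambda pair: math.isnan(pair[1]),
                                enumerate(values))))

print(by_comprehension)  # [1, 3]
print(by_map_filter)     # [1, 3]
```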