I’m trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame
that looks like this:
       length (m)  width (m)  thickness (cm)
    0         1.2        3.4             5.6
    1         7.8        9.0             1.2
    2         3.4        5.6             7.8
Currently, the measurement units are encoded in column names. Downsides include:
- column selection is awkward: df['width (m)'] vs. df['width']
- things will likely break if the units of my source data change
If I wanted to strip the units out of the column names, is there somewhere else that the information could be stored?
Answer
There isn’t a great way to do this right now; see the GitHub issue here for some discussion.
As a quick hack, you could do something like this, maintaining a separate dict with the units:
    In [3]: units = {}

    In [5]: newcols = []
       ...: for col in df:
       ...:     name, unit = col.split(' ')
       ...:     units[name] = unit
       ...:     newcols.append(name)

    In [6]: df.columns = newcols

    In [7]: df
    Out[7]:
       length  width  thickness
    0     1.2    3.4        5.6
    1     7.8    9.0        1.2
    2     3.4    5.6        7.8

    In [8]: units['length']
    Out[8]: '(m)'
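If you want the same idea as a reusable script, here is a minimal sketch. The helper name `strip_units` and the `HEADER` pattern are made up for illustration, and it assumes every unit appears as a trailing "(unit)" in the column name. Unlike the split above, it stores the unit without the parentheses. Attaching the dict to `DataFrame.attrs` is optional; pandas documents `attrs` as experimental, so it may not survive every operation.

```python
import re

import pandas as pd

# Sample data mirroring the frame in the question.
df = pd.DataFrame(
    {
        "length (m)": [1.2, 7.8, 3.4],
        "width (m)": [3.4, 9.0, 5.6],
        "thickness (cm)": [5.6, 1.2, 7.8],
    }
)

# Matches "name (unit)". This header format is an assumption.
HEADER = re.compile(r"^(?P<name>.+?)\s*\((?P<unit>[^()]+)\)$")


def strip_units(frame):
    """Return a renamed copy of frame plus a {column: unit} dict (hypothetical helper)."""
    units = {}
    renamed = {}
    for col in frame.columns:
        match = HEADER.match(col)
        if match is None:
            # No "(unit)" suffix: leave the column name alone, record no unit.
            continue
        renamed[col] = match["name"]
        units[match["name"]] = match["unit"]
    return frame.rename(columns=renamed), units


clean, units = strip_units(df)

# Optional: keep the units alongside the data (attrs is experimental in pandas).
clean.attrs["units"] = units
```

After this, `clean['width']` selects the column and `units['width']` gives its unit, while the original `df` keeps its old column names. The dict is still separate metadata, so it has to be updated by hand whenever the units of the source data change.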