Use a multidimensional index on a MultiIndex pandas dataframe?

I have a multiindex pandas dataframe that looks like this (called p_z):

JavaScript

I want to be able to select certain rows based on another dataframe (or numpy array) which is multidimensional. It would look like this as a pandas dataframe (called tofpid):

JavaScript

I also have it as an awkward array of shape (26692,), where each entry holds a variable number of subentries. This selection dataframe/array tells the p_z dataframe which rows to keep: in entry 0 of p_z, for example, it should keep subentries 0, 2, 4, 5, 7, and so on.
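
To make the layout described above concrete, here is a minimal sketch that builds two small frames with the same shape of index using pandas alone. The jagged lists, their values, and the helper `to_multiindex_frame` are hypothetical stand-ins for illustration only; they are not the data or the code from the question, which came from uproot.

```python
import pandas as pd

# Hypothetical jagged data: each inner list holds one entry's p_z values.
jagged_p_z = [[0.1, 0.2, 0.3], [1.1], [2.1, 2.2]]
# Hypothetical selection: the subentry positions to keep for each entry.
jagged_keep = [[0, 2], [], [1]]


def to_multiindex_frame(jagged, column):
    """Flatten a list of lists into a frame indexed by (entry, subentry)."""
    rows = [
        (entry, subentry, value)
        for entry, values in enumerate(jagged)
        for subentry, value in enumerate(values)
    ]
    frame = pd.DataFrame(rows, columns=["entry", "subentry", column])
    return frame.set_index(["entry", "subentry"])


p_z = to_multiindex_frame(jagged_p_z, "p_z")
tofpid = to_multiindex_frame(jagged_keep, "tofpid")

print(p_z)
print(tofpid)
```

Note that in `tofpid` the subentry level only counts the stored values; the subentries of `p_z` to keep are the values in the `tofpid` column itself.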

I can’t find a way to do this in pandas. I’m new to pandas, and even newer to MultiIndex, but I feel there ought to be a way. If it can be broadcast, even better, as I’ll be doing this over ~1500 dataframes of similar size. If it helps, these dataframes come from a *.root file imported using uproot (if there’s another way to do this without pandas, I’ll take it, but I would love to use pandas to keep things organised).

Edit: Here’s a reproducible example (courtesy of Jim Pivarski’s answer; thanks!).

JavaScript

Both of these dataframes are produced natively in uproot, but this will reproduce the same dataframes that uproot would (using the awkward library).

Answer

IIUC:

Input data:

JavaScript

Create a new multiindex from the columns (entry, tofpid) of your second dataframe:

JavaScript
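
As a rough sketch of this step, assuming both frames are indexed by (entry, subentry) and that the `tofpid` column holds the subentry positions to keep: the toy values below are invented for illustration, and using `isin` for the final selection is one reasonable option, not necessarily what the original answer did.

```python
import pandas as pd

# Toy stand-ins for the two frames; the values are hypothetical.
p_z = pd.DataFrame(
    {"p_z": [0.1, 0.2, 0.3, 1.1, 2.1, 2.2]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1)],
        names=["entry", "subentry"],
    ),
)
tofpid = pd.DataFrame(
    {"tofpid": [0, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (2, 0)], names=["entry", "subentry"]
    ),
)

# Pair each row's entry label with the subentry position stored in the
# tofpid column, giving the (entry, subentry) keys to keep in p_z.
keys = pd.MultiIndex.from_arrays(
    [tofpid.index.get_level_values("entry"), tofpid["tofpid"]],
    names=["entry", "subentry"],
)

# A boolean mask keeps p_z's own row order and silently ignores any key
# that does not exist in p_z, so no KeyError is raised.
selected = p_z[p_z.index.isin(keys)]
print(selected)
```

Because the mask is computed in one vectorized call, the same few lines can be wrapped in a function and applied to each of the many dataframes in turn.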

Output result:

JavaScript
User contributions licensed under: CC BY-SA