
Defining Parent For a Dataset with Several Conditions in Pandas

I have a CSV file with more than 10,000,000 rows, structured as shown below. Each group has an ID that serves as its unique identifier:

Data Format


The following conditions apply when defining the parent relationship:

  1. Each group MUST have 1 Head.
  2. It is OPTIONAL to have ONLY 1 Senior in each group.
  3. Each group MUST have AT LEAST one Junior.
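
These rules can be checked per group before assigning parents. A rough sketch, assuming columns named ID and Type and reading rule 1 as exactly one Head; the sample data and the names `counts` and `bad_ids` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: group 3 breaks rule 3 (it has no Junior).
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 3],
    "Type": ["Head", "Senior", "Junior", "Junior", "Head", "Junior", "Head"],
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
})

# Count how many rows of each Type every ID has.
# reindex guarantees all three columns exist even if a Type never occurs.
counts = pd.crosstab(df["ID"], df["Type"]).reindex(
    columns=["Head", "Senior", "Junior"], fill_value=0
)

# Rule 1: one Head. Rule 2: at most one Senior. Rule 3: at least one Junior.
valid = (counts["Head"] == 1) & (counts["Senior"] <= 1) & (counts["Junior"] >= 1)

# IDs of the groups that violate at least one rule.
bad_ids = valid.index[~valid].tolist()
print(bad_ids)
```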

EXPECTED RESULT


The code below works when a group has only one Junior. Is there a way to define the parent when a group has more than one Junior?



Answer

You could pivot the Type and Name columns, then forward-fill within each ID group. Then take the two right-most non-NaN entries in each row to get the Parent and Name.

Pivot and forward-fill:
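
A minimal sketch of this step, assuming a frame with columns ID, Type and Name whose rows are ordered Head, Senior, Junior within each ID. The sample data and the name `wide` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; column names ID, Type and Name are assumed.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2],
    "Type": ["Head", "Senior", "Junior", "Junior", "Head", "Junior"],
    "Name": ["A", "B", "C", "D", "E", "F"],
})

# One column per Type holding that row's Name; every other cell is NaN.
wide = df.pivot(columns="Type", values="Name")

# Put the columns in hierarchy order (pivot sorts them alphabetically).
# reindex also adds an all-NaN Senior column if no group has a Senior.
wide = wide.reindex(columns=["Head", "Senior", "Junior"])

# Forward-fill inside each ID group so every row carries the names
# of the levels above it, without leaking values across groups.
wide = wide.groupby(df["ID"]).ffill()
print(wide)
```

After the fill, each Junior row holds its group's Head and, where one exists, its Senior. This relies on the assumed row order: a Junior listed before its Senior would not pick the Senior up.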


A function to pull the last two non-NaN entries:
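
One way such a function might look. The name `last_two` and the output labels Parent and Name are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

def last_two(row: pd.Series) -> pd.Series:
    """Return the two right-most non-NaN values of a row as (Parent, Name)."""
    vals = row.dropna().tolist()
    # Left-pad with NaN so a row with a single value (a Head)
    # ends up with Parent = NaN and Name = that value.
    vals = [np.nan] * (2 - len(vals)) + vals[-2:]
    return pd.Series(vals, index=["Parent", "Name"])

# Usage on hand-built rows (hypothetical values):
junior_with_senior = last_two(pd.Series(["A", "B", "C"]))
junior_without_senior = last_two(pd.Series(["E", np.nan, "F"]))
head_only = last_two(pd.Series(["A", np.nan, np.nan]))
```

A Junior in a group with a Senior gets the Senior as Parent. Without a Senior, the NaN is skipped and the Head becomes the Parent.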


Apply it and drop the non-applicable rows:
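
An end-to-end sketch under the same assumptions (columns ID, Type, Name; rows ordered Head, Senior, Junior within each ID). Treating "non-applicable" as rows for which no parent was found is a guess, and all names and sample values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2],
    "Type": ["Head", "Senior", "Junior", "Junior", "Head", "Junior"],
    "Name": ["A", "B", "C", "D", "E", "F"],
})

def last_two(row):
    # Two right-most non-NaN values, left-padded with NaN if fewer than two.
    vals = row.dropna().tolist()
    vals = [np.nan] * (2 - len(vals)) + vals[-2:]
    return pd.Series(vals, index=["Parent", "Name"])

# Pivot Type into columns, order them by hierarchy, fill down per group.
wide = (
    df.pivot(columns="Type", values="Name")
      .reindex(columns=["Head", "Senior", "Junior"])
      .groupby(df["ID"])
      .ffill()
)

# Row-wise apply yields a frame with Parent and Name columns.
pairs = wide.apply(last_two, axis=1)
result = df[["ID", "Type"]].join(pairs)

# Assumed meaning of "non-applicable": rows without a parent (the Heads).
with_parent = result.dropna(subset=["Parent"])
print(with_parent)
```

For the sample, C and D both get B as Parent, and F gets E because group 2 has no Senior. A row-wise `apply` is slow on millions of rows, so it may be worth timing this on a subset first.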

User contributions licensed under: CC BY-SA